Compare commits

...

1268 Commits

Author SHA1 Message Date
Paul Blakey 73590d9573 tc: flower: Fix buffer overflow on large labels
Buffer is 64bytes, but label printing can take 66bytes printing
in hex, and will overflow when setting the string delimiter ('\0').

Fix that by increasing the print buffer size.

Example of overflowing ct_label:
ct_label 11111111111111111111111111111111/11111111111111111111111111111111

Fixes: 2fffb1c030 ("tc: flower: Add matching on conntrack info")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-12-06 13:44:50 -08:00
Stephen Hemminger 3f77bc6253 uapi: update to if_ether.h
Merged from 5.16-rc3

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-12-03 12:20:02 -08:00
Maxim Petrov 5f8bb902e1 ip/ipnexthop: fix unsigned overflow in parse_nh_group_type_res()
0UL has type 'unsigned long' which is likely to be 64bit on modern machines. At
the same time, the '{idle,unbalanced}_timer' variables are declared as u32, so
these variables cannot be greater than '~0UL / 100' when 'unsigned long' is 64
bits. In such condition it is still possible to pass the check but get the
overflow later when the timers are multiplied by 100 in 'addattr32'.

Fix the possible overflow by changing '~0UL' to 'UINT32_MAX'.

Fixes: 9167671822 ("nexthop: Add support for resilient nexthop groups")
Signed-off-by: Maxim Petrov <mmrmaximuzz@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-18 15:01:48 -08:00
Maxim Petrov 3184de3797 lib/bpf_legacy: remove always-true check
The 'name' field of the 'struct bpf_prog_info' is a plain C array. Thus, the
logical condition in bpf_dump_prog_info() is useless as the array address is
always true, so just remove it.

Signed-off-by: Maxim Petrov <mmrmaximuzz@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-18 15:01:04 -08:00
Stephen Hemminger 79026c1262 rdma: update uapi headers
Update the RDMA uapi headers from 5.16.0-rc1

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-18 10:00:19 -08:00
Stephen Hemminger fa58de9b0c vdpa: align uapi headers
Update vdpa headers based on 5.16.0-rc1 and remove redundant
copy.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-18 09:56:57 -08:00
[200~jiangheng be31c26484 lnstat: fix buffer overflow in header output
Running lnstat will cause core dump from reading past end of array.

Segmentation fault (core dumped)

The maximum  value of th.num_lines is HDR_LINES(10),  h should not be equal to th.num_lines, array th.hdr may be out of bounds.

Signed-off-by jiangheng <jiangheng12@huawei.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-17 13:41:10 -08:00
Maxim Petrov 0e94972590 tc/m_vlan: fix print_vlan() conditional on TCA_VLAN_ACT_PUSH_ETH
Fix the wild bracket in the if clause leading to the error in the condition.

Fixes: d61167dd88 ("m_vlan: add pop_eth and push_eth actions")
Signed-off-by: Maxim Petrov <mmrmaximuzz@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-17 11:13:12 -08:00
Davide Caratti 9bd5ab0f09 mptcp: fix JSON output when dumping endpoints by id
iproute ignores '-j' command line argument when dumping endpoints by id:

 [dcaratti@dcaratti iproute2]$ ./ip/ip -j mptcp endpoint show
 [{"address":"1.2.3.4","id":42,"signal":true,"backup":true}]
 [dcaratti@dcaratti iproute2]$ ./ip/ip -j mptcp endpoint show id 42
 1.2.3.4 id 42 signal backup

fix mptcp_addr_show() to use the proper JSON helpers.

Fixes: 7e0767cd86 ("add support for mptcp netlink interface")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-11 10:07:26 -08:00
Anssi Hannula a787d9ae10 man: tc-u32: Fix page to match new firstfrag behavior
Commit 690b11f4a6 ("tc: u32: Fix firstfrag filter.") applied in 2012
changed the "ip firstfrag" selector to not match non-fragmented packets
anymore.

However, the documentation added in f15a23966f ("tc: add a man page
for u32 filter") in 2015 includes an example that relies on the previous
behavior (non-fragmented packet counted as first fragment).

Due to this, the example does not work correctly and does not actually
classify regular SSH packets.

Modify the example to use a raw u16 selector on the fragment offset to
make it work, and also make the firstfrag description more clear about
the current behavior.

Fixes: f15a23966f ("tc: add a man page for u32 filter")
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Cc: Phil Sutter <phil@nwl.cc>
Cc: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-09 10:46:17 -08:00
Luca Boccassi af96c7b5dd Fix some typos detected by Lintian in manpages
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-09 10:45:44 -08:00
Stephen Hemminger 35c81b18c4 uapi: update vdpa.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-09 10:40:40 -08:00
David Ahern 50b668bdbf Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-04 09:45:31 -06:00
David Ahern 9c56d693f6 Merge branch 'can-tdc-plus-cleanups' into next
Vincent Mailhol  says:

====================

The main purpose is to add commandline support for Transmitter Delay
Compensation (TDC) in iproute. Other issues found during the
development of this feature also get addressed.

This patch series contains four patches which respectively:

  1. Correct the bittiming ranges in the print_usage function and add
  the units to give more clarity: some parameters are in milliseconds,
  some in nano seconds, some in time quantum and the newly TDC
  parameters introduced in this series would be in clock period.

  2. Do some code refactoring on function print_ctrlmode().

  3. factorize the many print_*(PRINT_JSON, ...) and fprintf
  occurrences in a single print_*(PRINT_ANY, ...) call and fix the
  signedness while doing that.

  4. report the value of the bitrate prescalers (brp and dbrp).

  5. adds command line support for the TDC in iproute and goes together
  with below series in the kernel:
  https://lore.kernel.org/linux-can/20210814091750.73931-1-mailhol.vincent@wanadoo.fr/T/#t

** Changelog **

>From RFC v5 to v6:
  * Dropped the RFC tag because the related patch series on the kernel
    side were pulled into net-next.
  * Remove the changes in include/uapi/linux/can/netlink.h because
    these should be pulled separately.
  * Add another patch (the second of this series) to do some cleanup
    on function print_ctrlmode().
  * Minor fixes in the patch comments (grammar, rephrasing).

>From RFC v4 to RFC v5:
  * Add the unit (bps, tq, ns or ms) in print_usage()
  * Rewrote void can_print_timing_min_max() to better factorize the
    code.
  * Rewrote the commit message of the two last patches (those related
    to TDC) to either add clarification of fix inacurracies.

>From v3 to RFC v4:
  * Reflect the changes made on the kernel side.

>From RFC v2 to v3:
  * Dropped the RFC tag. Now that the kernel patch reach the testing
    branch, I am finaly ready.
  * Regression fix: configuring a link with only nominal bittiming
    returned -EOPNOTSUPP
  * Added two more patches to the series:
      - iplink_can: fix configuration ranges in print_usage()
      - iplink_can: print brp and dbrp bittiming variables
  * Other small fixes on formatting.

>From RFC v1 to RFC v2:
  * Add an additional patch to the series to fix the issues reported
    by Stephen Hemminger
    Ref: https://lore.kernel.org/linux-can/20210506112007.1666738-1-mailhol.vincent@wanadoo.fr/T/#t

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-04 09:44:56 -06:00
Vincent Mailhol 0c263d7c36 iplink_can: add new CAN FD bittiming parameters: Transmitter Delay Compensation (TDC)
At high bit rates, the propagation delay from the TX pin to the RX pin
of the transceiver causes measurement errors: the sample point on the
RX pin might occur on the previous bit.

This issue is addressed in ISO 11898-1 section 11.3.3 "Transmitter
delay compensation" (TDC).

This patch brings command line support to nine TDC parameters which
were recently added to the kernel's CAN netlink interface in order to
implement TDC:
  - IFLA_CAN_TDC_TDCV_MIN: Transmitter Delay Compensation Value
    minimum value
  - IFLA_CAN_TDC_TDCV_MAX: Transmitter Delay Compensation Value
    maximum value
  - IFLA_CAN_TDC_TDCO_MIN: Transmitter Delay Compensation Offset
    minimum value
  - IFLA_CAN_TDC_TDCO_MAX: Transmitter Delay Compensation Offset
    maximum value
  - IFLA_CAN_TDC_TDCF_MIN: Transmitter Delay Compensation Filter
    window minimum value
  - IFLA_CAN_TDC_TDCF_MAX: Transmitter Delay Compensation Filter
    window maximum value
  - IFLA_CAN_TDC_TDCV: Transmitter Delay Compensation Value
  - IFLA_CAN_TDC_TDCO: Transmitter Delay Compensation Offset
  - IFLA_CAN_TDC_TDCF: Transmitter Delay Compensation Filter window

All those new parameters are nested together into the attribute
IFLA_CAN_TDC.

The TDC parameters extend the FD parameters. As such, the TDC
parameters must be specified together the "fd on" flag.

When "fd on" flag is provided, a tdc-mode parameter allows to specify
how to operate.  Valid options for tdc-mode are:

  * auto: the transmitter dynamically measures TDCV for each of the
    transmitted frames. As such, TDCV can not be manually provided. In
    this mode, the user must specify TDCO and may also specify TDCF if
    supported.

  * manual: use a static TDCV provided by the user. In this mode, the
    user must specify both TDCV and TDCO and may also specify TDCF if
    supported.

  * off: TDC is explicitly disabled.

  * tdc-mode parameter omitted (default mode): the kernel decides
    whether TDC should be enabled or not and if so, it calculates the
    TDC values. TDC parameters are an expert option and the average
    user is not expected to provide those, thus the presence of this
    "default mode".

If the fd flag is omitted, all the FD values (including TDC values)
remain unchanged.

If "fd off" flag is specified, all FD values (including TDC values)
are zeroed.

TDCV is always reported in manual mode. In auto mode, TDCV is reported
only if the value is available. Especially, the TDCV might not be
available if the controller has no feature to report it or if the
value in not yet available (i.e. no data sent yet and measurement did
not occur).

TDCF is reported only if tdcf_max is not zero (i.e. if supported by
the controller).

For reference, here are a few samples of how the output looks like:

| $ ip link set can0 type can bitrate 1000000 dbitrate 8000000 fd on tdco 7 tdcf 8 tdc-mode auto

| $ ip --details link show can0
| 1:  can0: <NOARP,ECHO> mtu 72 qdisc noop state DOWN mode DEFAULT group default qlen 10
|     link/can  promiscuity 0 minmtu 0 maxmtu 0
|     can <FD,TDC-AUTO> state STOPPED (berr-counter tx 0 rx 0) restart-ms 0
| 	  bitrate 1000000 sample-point 0.750
| 	  tq 12 prop-seg 29 phase-seg1 30 phase-seg2 20 sjw 1 brp 1
| 	  ES582.1/ES584.1: tseg1 2..256 tseg2 2..128 sjw 1..128 brp 1..512 brp_inc 1
| 	  dbitrate 8000000 dsample-point 0.700
| 	  dtq 12 dprop-seg 3 dphase-seg1 3 dphase-seg2 3 dsjw 1 dbrp 1
| 	  tdco 7 tdcf 8
| 	  ES582.1/ES584.1: dtseg1 2..32 dtseg2 1..16 dsjw 1..8 dbrp 1..32 dbrp_inc 1
| 	  tdco 0..127 tdcf 0..127
| 	  clock 80000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

| $ ip --details --json --pretty link show can0
| [ {
|         "ifindex": 1,
|         "ifname": "can0",
|         "flags": [ "NOARP","ECHO" ],
|         "mtu": 72,
|         "qdisc": "noop",
|         "operstate": "DOWN",
|         "linkmode": "DEFAULT",
|         "group": "default",
|         "txqlen": 10,
|         "link_type": "can",
|         "promiscuity": 0,
|         "min_mtu": 0,
|         "max_mtu": 0,
|         "linkinfo": {
|             "info_kind": "can",
|             "info_data": {
|                 "ctrlmode": [ "FD","TDC-AUTO" ],
|                 "state": "STOPPED",
|                 "berr_counter": {
|                     "tx": 0,
|                     "rx": 0
|                 },
|                 "restart_ms": 0,
|                 "bittiming": {
|                     "bitrate": 1000000,
|                     "sample_point": "0.750",
|                     "tq": 12,
|                     "prop_seg": 29,
|                     "phase_seg1": 30,
|                     "phase_seg2": 20,
|                     "sjw": 1,
|                     "brp": 1
|                 },
|                 "bittiming_const": {
|                     "name": "ES582.1/ES584.1",
|                     "tseg1": {
|                         "min": 2,
|                         "max": 256
|                     },
|                     "tseg2": {
|                         "min": 2,
|                         "max": 128
|                     },
|                     "sjw": {
|                         "min": 1,
|                         "max": 128
|                     },
|                     "brp": {
|                         "min": 1,
|                         "max": 512
|                     },
|                     "brp_inc": 1
|                 },
|                 "data_bittiming": {
|                     "bitrate": 8000000,
|                     "sample_point": "0.700",
|                     "tq": 12,
|                     "prop_seg": 3,
|                     "phase_seg1": 3,
|                     "phase_seg2": 3,
|                     "sjw": 1,
|                     "brp": 1,
|                     "tdc": {
|                         "tdco": 7,
|                         "tdcf": 8
|                     }
|                 },
|                 "data_bittiming_const": {
|                     "name": "ES582.1/ES584.1",
|                     "tseg1": {
|                         "min": 2,
|                         "max": 32
|                     },
|                     "tseg2": {
|                         "min": 1,
|                         "max": 16
|                     },
|                     "sjw": {
|                         "min": 1,
|                         "max": 8
|                     },
|                     "brp": {
|                         "min": 1,
|                         "max": 32
|                     },
|                     "brp_inc": 1,
|                     "tdc": {
|                         "tdco": {
|                             "min": 0,
|                             "max": 127
|                         },
|                         "tdcf": {
|                             "min": 0,
|                             "max": 127
|                         }
|                     }
|                 },
|                 "clock": 80000000
|             }
|         },
|         "num_tx_queues": 1,
|         "num_rx_queues": 1,
|         "gso_max_size": 65536,
|         "gso_max_segs": 65535
|     } ]

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-04 09:43:10 -06:00
Vincent Mailhol 0f7bb8d842 iplink_can: print brp and dbrp bittiming variables
Report the value of the bit-rate prescaler (brp) for both the nominal
and the data bittiming.

Currently, only the constant brp values (brp_{min,max,inc}) are being
reported. Also, brp is the only member of struct can_bittiming not
being reported.

Noticeably, brp could be calculated by hand from the other bittiming
parameters with below formula:

        brp = clock * tq / 1000000000

with clock in hertz and tq in nano second (thus the need of a 1
billion factor to convert it back to second).

But because above formula is not so trivial to remember and is
subjected to rounding errors, it makes sense to directly output
{d,}bpr.

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-04 09:42:54 -06:00
Vincent Mailhol 67f3c7a5cc iplink_can: use PRINT_ANY to factorize code and fix signedness
Current implementation heavily relies on some "if (is_json_context())"
switches to decide the context and then does some print_*(PRINT_JSON,
...) when in json context and some fprintf(...) else.

Furthermore, current implementation uses either print_int() or the
conversion specifier %d to print unsigned integers.

This patch factorizes each pairs of print_*(PRINT_JSON, ...) and
fprintf() into a single print_*(PRINT_ANY, ...) call. While doing this
replacement, it uses proper unsigned function print_uint() as well as
the conversion specifier %u when the parameter is an unsigned integer.

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-04 09:42:50 -06:00
Vincent Mailhol fd5e958c49 iplink_can: code refactoring of print_ctrlmode()
This patch only does cleanup and do not introduce any functional
changes.

We do some code refactoring of print_ctrlmode() in prevision of the
upcoming patch:

  - remove the first argument of print_ctrlmode(). It is a pointer to
    FILE and is never used.

  - add a new function argument: enum output_type t in order to
    specify the output type (i.e. PRINT_{FP,JSON,ANY}).

  - add a new function argument: const char *key in order to specify
    the name of the json array (e.g. "ctrlmode").

  - replace the _PF() macro with the print_flag() function to increase
    readability.

  - directly return if none of the flags are set (previously, this
    check was done before calling the function).

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-04 09:42:44 -06:00
Vincent Mailhol 8316df6e6d iplink_can: fix configuration ranges in print_usage() and add unit
The configuration ranges in print_usage() are taken from "Table 8 -
Time segments' minimum configuration ranges" in section 11.3.1.2
"Configuration of the bit time parameters" of ISO 11898-1.

The standard clearly specifies that "implementations may allow time
segments that exceed the minimum required configuration ranges
specified in Table 8".

Because no maximum ranges are given in the standard, all given ranges
{ a..b } are simply replaced with { NUMBER }.

The actual ranges are specific to each device and can be confirmed
doing:

$ ip --details link show can0
1: can0: <NOARP,ECHO> mtu 16 qdisc noop state DOWN mode DEFAULT group default qlen 10
    link/can  promiscuity 0 minmtu 0 maxmtu 0
    can state STOPPED restart-ms 0
	  ES582.1/ES584.1: tseg1 2..256 tseg2 2..128 sjw 1..128 brp 1..512 brp-inc 1
	  ES582.1/ES584.1: dtseg1 2..32 dtseg2 1..16 dsjw 1..8 dbrp 1..32 dbrp-inc 1
	  clock 80000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

Finally, the unit (bps, tq, ns or ms) are given. The rationale to add
the units is that the TDC parameters (that will be introduced in the
upcoming patches) are measured in a different unit than the other
bittiming parameters: clock period (a.k.a. minimum time quantum)
instead of time quantum. Adding the units disambiguates things.

For reference, before the change:
$ ip link set can0 type can help
Usage: ip link set DEVICE type can
	[ bitrate BITRATE [ sample-point SAMPLE-POINT] ] |
	[ tq TQ prop-seg PROP_SEG phase-seg1 PHASE-SEG1
 	  phase-seg2 PHASE-SEG2 [ sjw SJW ] ]

	[ dbitrate BITRATE [ dsample-point SAMPLE-POINT] ] |
	[ dtq TQ dprop-seg PROP_SEG dphase-seg1 PHASE-SEG1
 	  dphase-seg2 PHASE-SEG2 [ dsjw SJW ] ]

	[ loopback { on | off } ]
	[ listen-only { on | off } ]
	[ triple-sampling { on | off } ]
	[ one-shot { on | off } ]
	[ berr-reporting { on | off } ]
	[ fd { on | off } ]
	[ fd-non-iso { on | off } ]
	[ presume-ack { on | off } ]

	[ restart-ms TIME-MS ]
	[ restart ]

	[ termination { 0..65535 } ]

	Where: BITRATE	:= { 1..1000000 }
		  SAMPLE-POINT	:= { 0.000..0.999 }
		  TQ		:= { NUMBER }
		  PROP-SEG	:= { 1..8 }
		  PHASE-SEG1	:= { 1..8 }
		  PHASE-SEG2	:= { 1..8 }
		  SJW		:= { 1..4 }
		  RESTART-MS	:= { 0 | NUMBER }

...and after it:
$ ip link set can0 type can help
Usage: ip link set DEVICE type can
	[ bitrate BITRATE [ sample-point SAMPLE-POINT] ] |
	[ tq TQ prop-seg PROP_SEG phase-seg1 PHASE-SEG1
 	  phase-seg2 PHASE-SEG2 [ sjw SJW ] ]

	[ dbitrate BITRATE [ dsample-point SAMPLE-POINT] ] |
	[ dtq TQ dprop-seg PROP_SEG dphase-seg1 PHASE-SEG1
 	  dphase-seg2 PHASE-SEG2 [ dsjw SJW ] ]

	[ loopback { on | off } ]
	[ listen-only { on | off } ]
	[ triple-sampling { on | off } ]
	[ one-shot { on | off } ]
	[ berr-reporting { on | off } ]
	[ fd { on | off } ]
	[ fd-non-iso { on | off } ]
	[ presume-ack { on | off } ]
	[ cc-len8-dlc { on | off } ]

	[ restart-ms TIME-MS ]
	[ restart ]

	[ termination { 0..65535 } ]

	Where: BITRATE	:= { NUMBER in bps }
		  SAMPLE-POINT	:= { 0.000..0.999 }
		  TQ		:= { NUMBER in ns }
		  PROP-SEG	:= { NUMBER in tq }
		  PHASE-SEG1	:= { NUMBER in tq }
		  PHASE-SEG2	:= { NUMBER in tq }
		  SJW		:= { NUMBER in tq }
		  RESTART-MS	:= { 0 | NUMBER in ms }

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-04 09:42:23 -06:00
Taehee Yoo 6e15d27aae ip: add AMT support
Add basic support for Automatic Multicast Tunneling (AMT) network devices.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
2021-11-03 13:24:13 -06:00
David Ahern 9cae1de564 Import amt.h
Impor amt.h uapi from last kernel sync point

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-03 13:23:38 -06:00
David Ahern 258e350ca9 Update kernel headers
Update kernel headers to commit:
    cc0356d6a02e ("Merge tag 'x86_core_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-03 13:22:15 -06:00
Moshe Shemesh 047e9ae516 devlink: Fix cmd_dev_param_set() to check configuration mode
This patch is fixing a bug, when param set user command includes
configuration mode which is not supported, the tool may not respond
with error if the requested value is 0. In such case
cmd_dev_param_set_cb() won't find the requested configuration mode and
returns ctx->value as initialized (equal 0). Then cmd_dev_param_set()
may find that requested value equals current value and returns success.

Fixing the bug by adding a flag cmode_found which is set only if
cmd_dev_param_set_cb() finds the requested configuration mode.

Fixes: 13925ae9eb ("devlink: Add param command support")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-02 08:34:33 -07:00
Stephen Hemminger 7a8b7573a4 v5.15.0 2021-11-01 16:41:02 -07:00
Neta Ostrovsky ad3a118f88 rdma: Fix SRQ resource tracking information json
Fix the json output for the QPs that are associated with the SRQ -
The qpn are now displayed in a json array.

Sample output before the fix:
$ rdma res show srq lqpn 126-141 -j -p
[ {
        "ifindex":0,
	"ifname":"ibp8s0f0",
	"srqn":4,
	"type":"BASIC",
	"lqpn":["126-128,130-140"],
	"pdn":9,
	"pid":3581,
	"comm":"ibv_srq_pingpon"
    },{
	"ifindex":0,
	"ifname":"ibp8s0f0",
	"srqn":5,
	"type":"BASIC",
	"lqpn":["141"],
	"pdn":10,
	"pid":3584,
	"comm":"ibv_srq_pingpon"
    } ]

Sample output after the fix:
$ rdma res show srq lqpn 126-141 -j -p
[ {
        "ifindex":0,
	"ifname":"ibp8s0f0",
	"srqn":4,
	"type":"BASIC",
	"lqpn":["126-128","130-140"],
	"pdn":9,
	"pid":3581,
	"comm":"ibv_srq_pingpon"
    },{
	"ifindex":0,
	"ifname":"ibp8s0f0",
	"srqn":5,
	"type":"BASIC",
	"lqpn":["141"],
	"pdn":10,
	"pid":3584,
	"comm":"ibv_srq_pingpon"
    } ]

Fixes: 9b272e138d ("rdma: Add SRQ resource tracking information")
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-29 15:04:45 -07:00
Antoine Tenart 7a235a101b man: devlink-port: fix pfnum for devlink port add
When configuring a devlink PCI port, the pfnumber can be specified
using 'pfnum' and not 'pcipf' as stated in the man page. Fix this.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-29 15:03:44 -07:00
David Ahern e2947f6fd8 Merge branch 'managed-neighbor' into next
Daniel Borkmann  says:

====================

iproute2 patches to add support for managed neighbor entries as per recent
net-next commits:

  2ed08b5ead3c ("Merge branch 'Managed-Neighbor-Entries'")
  c47fedba94bc ("Merge branch 'minor-managed-neighbor-follow-ups'")

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-28 09:00:26 -06:00
Daniel Borkmann 9e009e78e7 ip, neigh: Add NTF_EXT_MANAGED support
Currently, ip neigh does not support the NTF_EXT_MANAGED flag. Add cmdline
support.

Usage example:

  # ./ip/ip n replace 192.168.178.30 dev enp5s0 managed extern_learn
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a managed extern_learn REACHABLE
  [...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-28 08:59:03 -06:00
Daniel Borkmann 040e52526c ip, neigh: Add missing NTF_USE support
Currently, ip neigh does not support the NTF_USE flag. Similar to other flags
such as extern_learn, add cmdline support. The flag dump support is explicitly
missing here, since the kernel does not propagate the flag back to user space.

Usage example:

  # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a extern_learn REACHABLE
  [...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-28 08:58:55 -06:00
Daniel Borkmann c76a3849ec ip, neigh: Fix up spacing in netlink dump
Fix up spacing to consistently add a single ' ' after an attribute has
been printed. Currently, it is a bit of a mix of before and after which
can lead to double spacing to be printed.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-28 08:58:50 -06:00
Nicolas Dichtel 76b30805f9 xfrm: enable to manage default policies
Two new commands to manage default policies:
 - ip xfrm policy setdefault
 - ip xfrm policy getdefault

And the corresponding part in 'ip xfrm monitor'.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-28 08:58:28 -06:00
David Ahern 2be7d99960 Merge branch 'rdma-optional-stats' into next
Mark Zhang  says:

====================

This is supplementary part of kernel series [1], which provides an
extension to the rdma statistics tool that allows to set or list
optional counters dynamically, using netlink.

Thanks

[1] https://www.spinics.net/lists/linux-rdma/msg106283.html

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-16 12:52:02 -06:00
Stephen Hemminger 229eaba507 uapi: pickup fix for xfrm ABI breakage
See kernel
Commit 844f7eaaed9 ("include/uapi/linux/xfrm.h: Fix XFRM_MSG_MAPPING ABI breakage")

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-15 17:40:30 -07:00
Nicolas Dichtel 95cd2a6204 iplink: enable to specify index when changing netns
When an interface is moved to another netns, it's possible to specify a
new ifindex. Let's add this support.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=eeb85a14ee34
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 18:05:09 -06:00
David Ahern a936a73fc2 Merge branch 'config-libdir' into next
Andrea Claudi  says:

====================

This series add support for the libdir parameter in iproute2 configure
script. The idea is to make use of the fact that packaging systems may
assume that 'configure' comes from autotools allowing a syntax similar
to the autotools one, and using it to tell iproute2 where the distro
expects to find its lib files.

Patches 1-2 fix a parsing issue on current configure options, that may
trigger an endless loop when no value is provided with some options;

Patch 3 fixes a parsing issue bailing out when more than one value is
provided for a single option;

Patch 4 simplifies options parsing, moving semantic checks out of the
while loop processing options;

Patch 5 introduces support for the --opt=value style on current options,
for uniformity;

Patch 6 adds the --prefix option, that may be used by some packaging
systems when calling the configure script;

Patch 7 finally adds the --libdir option, and also drops the static
LIBDIR var from the Makefile.

Changelog:
----------
v4 -> v5
  - bail out when multiple values are provided with a single option
  - simplify option parsing and reduce code duplication, as suggested
    by Phil Sutter
  - remove a nasty eval on libdir option processing

v3 -> v4
  - fix parsing issue on '--include_dir' and '--libbpf_dir'
  - split '--opt value' and '--opt=value' use cases, avoid code
    duplication moving semantic checks on value to dedicated functions

v2 -> v3
  - fix parsing error on prefix and libdir options.

v1 -> v2
  - consolidate '--opt value' and '--opt=value' use cases, as suggested
    by David Ahern.
  - added patch 2 to manage the --prefix option, used by the Debian
    packaging system, as reported by Luca Boccassi, and use it when
    setting lib directory.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:59:33 -06:00
Andrea Claudi cee0cf84bd configure: add the --libdir option
This commit allows users/packagers to choose a lib directory to store
iproute2 lib files.

At the moment iproute2 ship lib files in /usr/lib and offers no way to
modify this setting. However, according to the FHS, distros may choose
"one or more variants of the /lib directory on systems which support
more than one binary format" (e.g. /usr/lib64 on Fedora).

As Luca states in commit a3272b9372 ("configure: restore backward
compatibility"), packaging systems may assume that 'configure' is from
autotools, and try to pass it some parameters.

Allowing the '--libdir=/path/to/libdir' syntax, we can use this to our
advantage, and let the lib directory to be chosen by the distro
packaging system.

Note that LIBDIR uses "\${prefix}/lib" as default value because autoconf
allows this to be expanded to the --prefix value at configure runtime.
"\${prefix}" is replaced with the PREFIX value in check_lib_dir().

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:57:20 -06:00
Andrea Claudi 0ee1950b5c configure: add the --prefix option
This commit add the '--prefix' option to the iproute2 configure script.

This mimics the '--prefix' option that autotools configure provides, and
will be used later to allow users or packagers to set the lib directory.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:57:17 -06:00
Andrea Claudi 4b8bca5f9e configure: support --param=value style
This commit makes it possible to specify values for configure params
using the common autotools configure syntax '--param=value'.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:57:05 -06:00
Andrea Claudi 99245d1741 configure: simplify options parsing
This commit simplifies options parsing moving all the code not related to
parsing out of the case statement.

- The conditional shift after the assignments is moved right after the
  case, reducing code duplication.
- The semantic checks on the LIBBPF_FORCE value is moved after the loop
  like we already did for INCLUDE and LIBBPF_DIR.
- Finally, the loop condition is changed to check remaining arguments, thus
  making it possible to get rid of the null string case break.

As a bonus, now the help message states that on or off should follow
--libbpf_force

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:57:02 -06:00
Andrea Claudi c330d09794 configure: fix parsing issue with more than one value per option
With commit a9c3d70d90 ("configure: add options ability") users are no
more able to provide wrong command lines like:

$ ./configure --include_dir foo bar

The script simply bails out when user provides more than one value for a
single option. However, in doing so, it breaks backward compatibility with
some packaging system, which expects unknown options to be ignored.

Commit a3272b9372 ("configure: restore backward compatibility") fix this
issue, but makes it possible again for users to provide wrong command lines
such as the one above.

This fixes the issue simply ignoring autoconf-like options such as
'--opt=value'.

Fixes: a3272b9372 ("configure: restore backward compatibility")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:56:57 -06:00
Andrea Claudi 48c379bc2a configure: fix parsing issue on libbpf_dir option
configure is stuck in an endless loop if '--libbpf_dir' option is used
without a value:

$ ./configure --libbpf_dir
./configure: line 515: shift: 2: shift count out of range
./configure: line 515: shift: 2: shift count out of range
[...]

Fix it splitting 'shift 2' into two consecutive shifts, and making the
second one conditional to the number of remaining arguments.

A check is also provided after the while loop to verify the libbpf dir
exists; also, as LIBBPF_DIR does not have a default value, configure bails
out if the user does not specify a value after --libbpf_dir, thus avoiding
to produce an erroneous configuration.

Fixes: 7ae2585b86 ("configure: convert LIBBPF environment variables to command-line options")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:56:53 -06:00
Andrea Claudi 1d819dcc74 configure: fix parsing issue on include_dir option
configure is stuck in an endless loop if '--include_dir' option is used
without a value:

$ ./configure --include_dir
./configure: line 506: shift: 2: shift count out of range
./configure: line 506: shift: 2: shift count out of range
[...]

Fix it splitting 'shift 2' into two consecutive shifts, and making the
second one conditional to the number of remaining arguments.

A check is also provided after the while loop to verify the include dir
exists; this avoid to produce an erroneous configuration.

Fixes: a9c3d70d90 ("configure: add options ability")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:56:48 -06:00
Neta Ostrovsky 19ba785f16 rdma: Add optional-counters set/unset support
This patch provides an extension to the rdma statistics tool
that allows to set/unset optional counters set dynamically,
using new netlink commands.
Note that the optional counter statistic implementation is
driver-specific and may impact the performance.

Examples:
To enable a set of optional counters on link rocep8s0f0/1:
    $ sudo rdma statistic set link rocep8s0f0/1 optional-counters cc_rx_ce_pkts,cc_rx_cnp_pkts
To disable all optional counters on link rocep8s0f0/1:
    $ sudo rdma statistic unset link rocep8s0f0/1 optional-counters

Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:52:57 -06:00
Neta Ostrovsky 7d5cb70e94 rdma: Add stat "mode" support
This patch introduces the "mode" command, which presents the enabled or
supported (when the "supported" argument is available) optional
counters.

An optional counter is a vendor-specific counter that may be
dynamically enabled/disabled. This enhancement of hwcounters allows
exposing of counters which are for example mutual exclusive and cannot
be enabled at the same time, counters that might degrades performance,
optional debug counters, etc.

Examples:
To present currently enabled optional counters on link rocep8s0f0/1:
    $ rdma statistic mode link rocep8s0f0/1
    link rocep8s0f0/1 optional-counters cc_rx_ce_pkts

To present supported optional counters on link rocep8s0f0/1:
    $ rdma statistic mode supported link rocep8s0f0/1
    link rocep8s0f0/1 supported optional-counters cc_rx_ce_pkts,cc_rx_cnp_pkts,cc_tx_cnp_pkts

Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:52:53 -06:00
Neta Ostrovsky d480cb71f5 rdma: Update uapi headers
Update rdma_netlink.h file upto kernel commit 7301d0a9834c
("RDMA/nldev: Add support to get status of all counters")

Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:52:47 -06:00
David Ahern e4ca6a4965 Update kernel headers
Update kernel headers to commit:
    295711fa8fec ("Merge branch 'dpaa2-irq-coalescing'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:49:19 -06:00
Stephen Hemminger a31e7b7967 mptcp: cleanup include section.
David reported ipmptcp breaks hard the build when updating the
relevant kernel headers.

We should be more careful in the header section, explicitly
including all the required dependencies respecting the usual order
between systems and local headers.

Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:48:36 -06:00
Paul Chaignon a500c5ac87 lib/bpf: fix map-in-map creation without prepopulation
When creating map-in-maps, the outer map can be prepopulated using the
inner_idx field of inner maps. That field defines the index of the inner
map in the outer map. It is ignored if set to -1.

Commit 6d61a2b557 ("lib: add libbpf support") however started using
that field to identify inner maps. While iterating over all maps looking
for inner maps, maps with inner_idx set to -1 are erroneously skipped.
As a result, trying to create a map-in-map with prepopulation disabled
fails because the inner_id of the outer map is not correctly set.

This bug can be observed with strace -ebpf (notice the zero inner_map_fd
for the outer map creation):

    bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_ARRAY, key_size=4, value_size=130996, max_entries=1, map_flags=0, inner_map_fd=0, map_name="maglev_inner", map_ifindex=0, btf_fd=0, btf_key_type_id=0, btf_value_type_id=0}, 128) = 32
    bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH_OF_MAPS, key_size=2, value_size=4, max_entries=65536, map_flags=BPF_F_NO_PREALLOC, inner_map_fd=0, map_name="maglev_outer", map_ifindex=0, btf_fd=0, btf_key_type_id=0, btf_value_type_id=0}, 128) = -1 EINVAL (Invalid argument)

Fixes: 6d61a2b557 ("lib: add libbpf support")
Signed-off-by: Paul Chaignon <paul@isovalent.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-14 14:37:51 -07:00
Antoine Tenart 7c032cac10 man: devlink-port: remove extra .br
br. were added between options of the same command. That is not needed
and makes the output to be one 3 lines for no particular reason.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-11 19:27:12 -07:00
Antoine Tenart 04ee8e6f06 man: devlink-port: fix style
Values should be .I, square brackets should be used for optional values,
curly brackets for lists. Follow this in the devlink-port man page.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-11 19:27:12 -07:00
Antoine Tenart 14802d84d3 man: devlink-port: fix the devlink port add synopsis
When configuring a devlink PCI SF port, the sfnumber can be specified
using 'sfnum' and not 'pcisf' as stated in the man page. Fix this.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-11 19:27:12 -07:00
David Ahern 8cd517a805 Merge branch 'main' into next
Conflicts:
	ip/ipneigh.c

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-09 17:47:47 -06:00
David Ahern 763fd793fe Merge branch 'ioam-encap-modes' into next
Justin Iurman  says:

====================

Following the series applied to net-next (see [1]), here are the corresponding
changes to iproute2.

In the current implementation, IOAM can only be inserted directly (i.e., only
inside packets generated locally) by default, to be compliant with RFC8200.

This patch adds support for in-transit packets and provides the ip6ip6
encapsulation of IOAM (RFC8200 compliant). Therefore, three ioam6 encap modes
are defined:

 - inline: directly inserts IOAM inside packets (by default).

 - encap:  ip6ip6 encapsulation of IOAM inside packets.

 - auto:   either inline mode for packets generated locally or encap mode for
           in-transit packets.

With current iproute2 implementation, it is configured this way:

$ ip -6 r [...] encap ioam6 trace prealloc [...]

The old syntax does not change (for backwards compatibility) and implicitly uses
the inline mode. With the new syntax, an encap mode can be specified:

(inline mode)
$ ip -6 r [...] encap ioam6 mode inline trace prealloc [...]

(encap mode)
$ ip -6 r [...] encap ioam6 mode encap tundst fc00::2 trace prealloc [...]

(auto mode)
$ ip -6 r [...] encap ioam6 mode auto tundst fc00::2 trace prealloc [...]

A tunnel destination address must be configured when using the encap mode or the
auto mode.

  [1] https://lore.kernel.org/netdev/163335001045.30570.12527451523558030753.git-patchwork-notify@kernel.org/T/#m3b428d4142ee3a414ec803466c211dfdec6e0c09

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-09 17:37:12 -06:00
Justin Iurman 41020eb0fd Update documentation
This patch updates the IOAM documentation (ip-route man page) to reflect the
three encap modes that were introduced.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-09 17:35:54 -06:00
Justin Iurman 8fb522cde3 Add support for IOAM encap modes
This patch adds support for the three IOAM encap modes that were introduced:
inline, encap and auto.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-09 17:35:29 -06:00
Frank Villaro-Dixon 897772a735 cmd: use spaces instead of tabs for usage indentation
Fix rogue "tab after spaces" used for indentation of the documentation.
This causes rendering issues on terminals using a non-standard tab width.

Signed-off-by: Frank Villaro-Dixon <frank.villaro@infomaniak.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-06 10:00:49 -07:00
Nikolay Aleksandrov b840c620fe ip: nexthop: keep cache netlink socket open
Since we use the cache netlink socket for each nexthop we can keep it open
instead of opening and closing it on every add call. The socket is opened
once, on the first add call and then reused for the rest.

Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-05 08:34:29 -06:00
Jacob Keller b90174354d devlink: print maximum number of snapshots if available
Recently the kernel gained ability to report the maximum number of
snapshots a region can have. Print this value out if it was reported.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-05 08:31:30 -06:00
David Ahern 6448ed373c Update kernel headers
Update kernel headers to commit:
    49ed8dde3715 ("net: usb: use eth_hw_addr_set() for dev->addr_len cases")

Update to linux/mptcp.h is removed because it breaks compilation
of ipmptcp.c in a nontrivial way.

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-05 08:28:28 -06:00
Davide Caratti e7a98a96f0 mptcp: unbreak JSON endpoint list
the following command:

 # ip -j mptcp endpoint show

prints a JSON array that misses the terminating bracket. Fix this calling
delete_json_obj() to balance the call to new_json_obj().

Fixes: 7e0767cd86 ("add support for mptcp netlink interface")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Andrea Claudi <aclaudi@redhat.com>
2021-10-04 14:07:09 -07:00
David Ahern ec703e0629 Merge branch 'nexthop-cache' into next
Nikolay Aleksandrov  says:

====================

This set tries to help with an old ask that we've had for some time
which is to print nexthop information while monitoring or dumping routes.
The core problem is that people cannot follow nexthop changes while
monitoring route changes, by the time they check the nexthop it could be
deleted or updated to something else. In order to help them out I've
added a nexthop cache which is populated (only used if -d / show_details
is specified) while decoding routes and kept up to date while monitoring.
The nexthop information is printed on its own line starting with the
"nh_info" attribute and its embedded inside it if printing JSON. To
cache the nexthop entries I parse them into structures, in order to
reuse most of the code the print helpers have been altered so they rely
on prepared structures. Nexthops are now always parsed into a structure,
even if they won't be cached, that structure is later used to print the
nexthop and destroyed if not going to be cached. New nexthops (not found
in the cache) are retrieved from the kernel using a private netlink
socket so they don't disrupt an ongoing dump, similar to how interfaces
are retrieved and cached.

I have tested the set with the kernel forwarding selftests and also by
stressing it with nexthop create/update/delete in loops while monitoring.

Comments are very welcome as usual. :)

Changes since RFC:
 - reordered parse/print splits, in order to do that I have to parse
   resilient groups first, then add nh entry parsing so code has been
   reordered as well and patch order has changed, but there have been
   no functional changes (as before refactoring of old code is done in
   the first 8 patches and then patches 9-12 add the new cache and use it)
 - re-run all tests above

Patch breakdown:
Patches 1-2: update current route helpers to take parsed arguments so we
             can directly pass them from the nh_entry structure later
Patch     3: adds new nha_res_grp structure which describes a resilient
             nexhtop group
Patch     4: splits print_nh_res_group into a parse and print parts
             which use the new nha_res_grp structure
Patch     5: adds new nh_entry structure which describes a nexthop
Patch     6: factors out print_nexthop's attribute parsing into nh_entry
             structure used before printing
Patch     7: factors out print_nexthop's nh_entry structure printing
Patch     8: factors out ipnh_get's rtnl talk part and allows to use a
             different rt handle for the communication
Patch     9: adds nexthop cache and helpers to manage it, it uses the
             new __ipnh_get to retrieve nexthops
Patch    10: adds a new helper print_cache_nexthop_id that prints nexthop
             information from its id, if the nexthop is not found in the
             cache it fetches it
Patch    11: the new print_cache_nexthop_id helper is used when printing
             routes with show_details (-d) to output detailed nexthop
             information, the format after nh_info is the same as
             ip nexthop show
Patch    12: changes print_nexthop into print_cache_nexthop which always
             outputs the nexthop information and can also update the cache
             (based on process_cache argument), it's used to keep the
             cache up to date while monitoring

Example outputs (monitor):
[NEXTHOP]id 101 via 169.254.2.22 dev veth2 scope link proto unspec
[NEXTHOP]id 102 via 169.254.3.23 dev veth4 scope link proto unspec
[NEXTHOP]id 103 group 101/102 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec
[ROUTE]unicast 192.0.2.0/24 nhid 203 table 4 proto boot scope global
	nh_info id 203 group 201/202 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec
	nexthop via 169.254.2.12 dev veth3 weight 1
	nexthop via 169.254.3.13 dev veth5 weight 1

[NEXTHOP]id 204 via fe80:2::12 dev veth3 scope link proto unspec
[NEXTHOP]id 205 via fe80:3::13 dev veth5 scope link proto unspec
[NEXTHOP]id 206 group 204/205 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec
[ROUTE]unicast 2001:db8:1::/64 nhid 206 table 4 proto boot scope global metric 1024 pref medium
	nh_info id 206 group 204/205 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec
	nexthop via fe80:2::12 dev veth3 weight 1
	nexthop via fe80:3::13 dev veth5 weight 1

[NEXTHOP]id 2  encap mpls  200/300 via 10.1.1.1 dev ens20 scope link proto unspec onlink
[ROUTE]unicast 2.3.4.10 nhid 2 table main proto boot scope global
	nh_info id 2  encap mpls  200/300 via 10.1.1.1 dev ens20 scope link proto unspec onlink

JSON:
 {
        "type": "unicast",
        "dst": "198.51.100.0/24",
        "nhid": 103,
        "table": "3",
        "protocol": "boot",
        "scope": "global",
        "flags": [ ],
        "nh_info": {
            "id": 103,
            "group": [ {
                    "id": 101,
                    "weight": 11
                },{
                    "id": 102,
                    "weight": 45
                } ],
            "type": "resilient",
            "resilient_args": {
                "buckets": 512,
                "idle_timer": 0,
                "unbalanced_timer": 0,
                "unbalanced_time": 0
            },
            "scope": "global",
            "protocol": "unspec",
            "flags": [ ]
        },
        "nexthops": [ {
                "gateway": "169.254.2.22",
                "dev": "veth2",
                "weight": 11,
                "flags": [ ]
            },{
                "gateway": "169.254.3.23",
                "dev": "veth4",
                "weight": 45,
                "flags": [ ]
            } ]
  }

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:31:44 -06:00
Nikolay Aleksandrov 7ca868a7aa ip: nexthop: add print_cache_nexthop which prints and manages the nh cache
Add a new helper print_cache_nexthop replacing print_nexthop which can
update the nexthop cache if the process_cache argument is true. It is
used when monitoring netlink messages to keep the nexthop cache up to
date with nexthop changes happening. For the old callers and anyone
who's just dumping nexthops its _nocache version is used which is a
wrapper for print_cache_nexthop.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:59 -06:00
Nikolay Aleksandrov 5d5dc549ce ip: route: print and cache detailed nexthop information when requested
If -d (show_details) is used when printing/monitoring routes then print
detailed nexthop information in the field "nh_info". The nexthop is also
cached for future searches.

Output looks like:
 unicast 198.51.100.0/24 nhid 103 table 3 proto boot scope global
	 nh_info id 103 group 101/102 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec
	 nexthop via 169.254.2.22 dev veth2 weight 1
	 nexthop via 169.254.3.23 dev veth4 weight 1

The nh_info field has the same format as ip -d nexthop show would've had
for the same nexthop id.

For completeness the JSON version looks like:
 {
        "type": "unicast",
        "dst": "198.51.100.0/24",
        "nhid": 103,
        "table": "3",
        "protocol": "boot",
        "scope": "global",
        "flags": [ ],
        "nh_info": {
            "id": 103,
            "group": [ {
                    "id": 101
                },{
                    "id": 102
                } ],
            "type": "resilient",
            "resilient_args": {
                "buckets": 512,
                "idle_timer": 0,
                "unbalanced_timer": 0,
                "unbalanced_time": 0
            },
            "scope": "global",
            "protocol": "unspec",
            "flags": [ ]
        },
        "nexthops": [ {
                "gateway": "169.254.2.22",
                "dev": "veth2",
                "weight": 1,
                "flags": [ ]
            },{
                "gateway": "169.254.3.23",
                "dev": "veth4",
                "weight": 1,
                "flags": [ ]
            } ]
 }

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:55 -06:00
Nikolay Aleksandrov cb3d18c29e ip: nexthop: add a helper which retrieves and prints cached nh entry
Add a helper which looks for a nexthop in the cache and if not found
reads the entry from the kernel and caches it. Finally the entry is
printed.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:51 -06:00
Nikolay Aleksandrov 60a9703032 ip: nexthop: add cache helpers
Add a static nexthop cache in a hash with 1024 buckets and helpers to
manage it (link, unlink, find, add nexthop, del nexthop). Adding new
nexthops is done by creating a new rtnl handle and using it to retrieve
the nexthop so the helper is safe to use while already reading a
response (i.e. using the global rth).

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:41 -06:00
Nikolay Aleksandrov 53d7c43bd3 ip: nexthop: factor out ipnh_get_id rtnl talk into a helper
Factor out ipnh_get_id's rtnl talk portion into a separate helper which
will be reused later to retrieve nexthops for caching.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:36 -06:00
Nikolay Aleksandrov a2ca431215 ip: nexthop: factor out print_nexthop's nh entry printing
Factor out nexthop entry structure printing from print_nexthop,
effectively splitting it into parse and print parts.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:32 -06:00
Nikolay Aleksandrov 945c26db68 ip: nexthop: parse attributes into nh entry structure before printing
Factor out the nexthop attribute parsing and parse attributes into a
nexthop entry structure which is then used to print.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:28 -06:00
Nikolay Aleksandrov 7ec1cee630 ip: nexthop: add nh entry structure
Add a structure which describes a nexthop, it will be later used to
parse, print and cache nexthops.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:24 -06:00
Nikolay Aleksandrov 60a7515b89 ip: nexthop: split print_nh_res_group into parse and print parts
Now that we have resilient group structure split print_nh_res_group into
a parse and print functions, print_nexthop calls the parse function
first to parse the attributes into the structure and then uses the print
function to print the parsed structure.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:19 -06:00
Nikolay Aleksandrov cfb0a8729e ip: nexthop: add resilient group structure
Add a structure which describes a resilient nexthop group. It will be
later used for parsing.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:11 -06:00
Nikolay Aleksandrov 371e889da7 ip: export print_rta_gateway version which outputs prepared gateway string
Export a new __print_rta_gateway that takes a prepared gateway string to
print which is also used by print_rta_gateway for consistent format.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:06 -06:00
Nikolay Aleksandrov f72789965e ip: print_rta_if takes ifindex as device argument instead of attribute
We need print_rta_if() to take ifindex directly so later we can use it
with cached converted nexthop objects.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:23:31 -06:00
David Ahern c8c9111a4c Merge branch 'ax.25-netrom-rose' into next
Ralf Baechle  says:

====================

net-tools contain support for these three protocol but are deprecated and
no longer installed by default by many distributions.  Iproute2 otoh has
no support at all and will dump the addresses of these protocols which
actually are pretty human readable as hex numbers:

 # ip link show dev bpq0
3: bpq0: <UP,LOWER_UP> mtu 256 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ax25 88:98:60:a0:92:40:02 brd a2:a6:a8:40:40:40:00
 # ip link show dev nr0
4: nr0: <NOARP,UP,LOWER_UP> mtu 236 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/netrom 88:98:60:a0:92:40:0a brd 00:00:00:00:00:00:00
 # ip link show dev rose0
8: rose0: <NOARP,UP,LOWER_UP> mtu 249 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/rose 65:09:33:30:00 brd 00:00:00:00:00

This series adds basic support for the three protocols to print addresses:

 # ip link show dev bpq0
3: bpq0: <UP,LOWER_UP> mtu 256 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ax25 DL0PI-1 brd QST-0
 # ip link show dev nr0
4: nr0: <NOARP,UP,LOWER_UP> mtu 236 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/netrom DL0PI-5 brd *
 # ip link show dev rose0
8: rose0: <NOARP,UP,LOWER_UP> mtu 249 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/rose 6509333000 brd 0000000000

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-23 20:03:11 -06:00
Ralf Baechle e2cc9840ea ROSE: Print decoded addresses rather than hex numbers.
NETROM is a OSI layer 3 protocol sitting on top of AX.25.  It uses BCD-
encoded 10 digit telephone numbers as addresses.  Without this ip will
print a ROSE addresses like

  link/rose 12:34:56:78:90 brd 00:00:00:00:00

which is readable but ugly.  With this applied it ROSE addresses will be
printed as

  link/rose 1234567890 brd 0000000000

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-23 20:02:51 -06:00
Ralf Baechle 26c5782fab ROSE: Add rose_ntop implementation.
ROSE addresses are ten digit numbers, basically like North American
telephone numbers.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-23 20:02:45 -06:00
Ralf Baechle fd4c1c8168 NETROM: Print decoded addresses rather than hex numbers.
NETROM is an OSI layer 3 protocol sitting on top of AX.25.  It also uses
AX.25 addresses.  Without this commit ip will print NETROM address like

  link/generic 98:92:9c:aa:b0:40:02 brd 00:00:00:00:00:00:00

while with this commit the decoded result

  link/generic LINUX-1 brd *

is much more eye friendly.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-23 20:02:40 -06:00
Ralf Baechle c63b769ad4 NETROM: Add netrom_ntop implementation.
NETROM uses AX.25 addresses so this is a simple wrapper around ax25_ntop1.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-23 20:02:37 -06:00
Ralf Baechle 399ae00af5 AX.25: Print decoded addresses rather than hex numbers.
Before this, ip would have printed the AX.25 address configured for an
AX.25 interface's default addresses as:

  link/ax25 98:92:9c:aa:b0:40:02 brd a2:a6:a8:40:40:40:00

which is pretty unreadable.  With this commit ip will decode AX.25
addresses like

  link/ax25 LINUX-1 brd QST-0

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-23 20:02:34 -06:00
Ralf Baechle 3a92669b3a AX.25: Add ax25_ntop implementation.
AX.25 addresses are based on Amateur radio callsigns followed by an SSID
like XXXXXX-SS where the callsign is up to 6 characters which are either
letters or digits and the SSID is a decimal number in the range 0..15.
Amateur radio callsigns are assigned by a country's relevant authorities
and are 3..6 characters though a few countries have assigned callsigns
longer than that.  AX.25 is not able to handle such longer callsigns.

Being based on HDLC AX.25 encodes addresses by shifting them one bit left
thus zeroing bit 0, the HDLC extension bit for all but the last bit of
a packet's address field but for our purposes here we're not considering
the HDLC extension bit that is it will always be zero.

Linux' internal representation of AX.25 addresses in Linux is very similar
to this on the on-air or on-the-wire format.  The callsign is padded to
6 octets by adding spaces, followed by the SSID octet then all 7 octets
are left-shifted by one byte.

This for example turns "LINUX-1" where the callsign is LINUX and SSID is 1
into 98:92:9c:aa:b0:40:02.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-23 20:02:30 -06:00
Andrea Claudi 2f5825cb38 lib: bpf_legacy: fix bpffs mount when /sys/fs/bpf exists
bpf selftests using iproute2 fails with:

$ ip link set dev veth0 xdp object ../bpf/xdp_dummy.o section xdp_dummy
Continuing without mounted eBPF fs. Too old kernel?
mkdir (null)/globals failed: No such file or directory
Unable to load program

This happens when the /sys/fs/bpf directory exists. In this case, mkdir
in bpf_mnt_check_target() fails with errno == EEXIST, and the function
returns -1. Thus bpf_get_work_dir() does not call bpf_mnt_fs() and the
bpffs is not mounted.

Fix this in bpf_mnt_check_target(), returning 0 when the mountpoint
exists.

Fixes: d4fcdbbec9 ("lib/bpf: Fix and simplify bpf_mnt_check_target()")
Reported-by: Mingyu Shi <mshi@redhat.com>
Reported-by: Jiri Benc <jbenc@redhat.com>
Suggested-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-22 17:30:52 -07:00
Puneet Sharma d756c08a3d tc/f_flower: fix port range parsing
Provided port range in tc rule are parsed incorrectly.
Even though range is passed as min-max. It throws an error.

$ tc filter add dev eth0 ingress handle 100 priority 10000 protocol ipv4 flower ip_proto tcp dst_port 10368-61000 action pass
max value should be greater than min value
Illegal "dst_port"

Fixes: 8930840e67 ("tc: flower: Classify packets based port ranges")
Signed-off-by: Puneet Sharma <pusharma@akamai.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-22 17:28:48 -07:00
Gokul Sivakumar ebbb701714 lib: bpf_legacy: add prog name, load time, uid and btf id in prog info dump
The BPF program name is included when dumping the BPF program info and the
kernel only stores the first (BPF_PROG_NAME_LEN - 1) bytes for the program
name.

$ sudo ip link show dev docker0
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:4c:df:a4:54 brd ff:ff:ff:ff:ff:ff
    prog/xdp id 789 name xdp_drop_func tag 57cd311f2e27366b jited

The BPF program load time (ns since boottime), UID of the user who loaded
the program and the BTF ID are also included when dumping the BPF program
information when the user expects a detailed ip link info output.

$ sudo ip -details link show dev docker0
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:4c:df:a4:54 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
    bridge forward_delay 1500 hello_time 200 max_age 2000 ageing_time 30000 stp_state 0 priority 32768 vlan_filt
ering 0 vlan_protocol 802.1Q bridge_id 8000.2:42:4c:df:a4:54 designated_root 8000.2:42:4c:df:a4:54 root_port 0 r
oot_path_cost 0 topology_change 0 topology_change_detected 0 hello_timer    0.00 tcn_timer    0.00 topology_chan
ge_timer    0.00 gc_timer  265.36 vlan_default_pvid 1 vlan_stats_enabled 0 vlan_stats_per_port 0 group_fwd_mask
0 group_address 01:80:c2:00:00:00 mcast_snooping 1 mcast_router 1 mcast_query_use_ifaddr 0 mcast_querier 0 mcast
_hash_elasticity 16 mcast_hash_max 4096 mcast_last_member_count 2 mcast_startup_query_count 2 mcast_last_member_
interval 100 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query
_response_interval 1000 mcast_startup_query_interval 3124 mcast_stats_enabled 0 mcast_igmp_version 2 mcast_mld_v
ersion 1 nf_call_iptables 0 nf_call_ip6tables 0 nf_call_arptables 0 addrgenmode eui64 numtxqueues 1 numrxqueues
1 gso_max_size 65536 gso_max_segs 65535
    prog/xdp id 789 name xdp_drop_func tag 57cd311f2e27366b jited load_time 2676682607316255 created_by_uid 0 btf_id 708

Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-21 09:16:32 -06:00
David Ahern 75c5054e7a Merge branch 'main' into next
Conflicts:
	include/uapi/linux/virtio_ids.h

Signed-off-by: David Ahern <dsahern@gmail.com>
2021-09-14 10:46:48 -06:00
Stephen Hemminger 92e32f7791 uapi: updates from 5.15-rc1
Small changes to virtio etc.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-13 15:07:58 -07:00
Lahav Schlesinger 0431e1e724 ip: Support filter links/neighs with no master
Commit d3432bf10f17 ("net: Support filtering interfaces on no master")
in the kernel added support for filtering interfaces/neighbours that
have no master interface.

This patch completes it and adds this support to iproute2:
1. ip link show nomaster
2. ip address show nomaster
3. ip neighbour {show | flush} nomaster

Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2021-09-12 11:17:18 -06:00
Lennert Buytenhek 12b3d6a2ad man: ip-macsec: fix gcm-aes-256 formatting issue
The 'ip link add' invocation template at the top of the ip-macsec man
page formats with a pair of extra double quotes:

   ip  link  add  link DEVICE name NAME type macsec [ [ address <lladdr> ]
   port PORT | sci <u64> ]  [  cipher  {  default  |  gcm-aes-128  |  gcm-
   aes-256"}][" icvlen ICVLEN ] [ encrypt { on | off } ] [ send_sci { on |

This is due to missing whitespace around the gcm-aes-256 identifier
in the source file.

Fixes: b16f525323 ("Add support for configuring MACsec gcm-aes-256 cipher type.")
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2021-09-12 11:13:26 -06:00
David Ahern 917d913b2e Merge branch 'main' into next
Conflicts:
	include/uapi/linux/virtio_ids.h

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-08 15:13:49 -06:00
David Ahern d0cba0d1f6 Merge branch 'bridge-mcast_router' into next
Nikolay Aleksandrov  says:

====================

This set adds support for vlan port/bridge multicast router option. It is
similar to the already existing bridge-wide mcast_router control. Patch 01
moves attribute adding and parsing together for vlan option setting,
similar to global vlan option setting. It simplifies adding new options
because we can avoid reserved values and additional checks. Patch 02
adds the new mcast_router option and updates the related man page.

Example:
 # mark port ens16 as a permanent mcast router for vlan 100
 $ bridge vlan set dev ens16 vid 100 mcast_router 2
 # disable mcast router for port ens16 and vlan 200
 $ bridge vlan set dev ens16 vid 200 mcast_router 0
 $ bridge -d vlan show
 port              vlan-id
 ens16             1 PVID Egress Untagged
                     state forwarding mcast_router 1
                   100
                     state forwarding mcast_router 2
                   200
                     state forwarding mcast_router 0

Note that this set depends on the latest kernel uapi headers.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 17:03:58 -06:00
Nikolay Aleksandrov ae895504c6 bridge: vlan: add support for mcast_router option
Add support for setting and dumping per-vlan/interface mcast_router
option. It controls the mcast router mode of a vlan/interface pair.
For bridge devices only modes 0 - 2 are allowed. The possible modes
are:
 0 - disabled
 1 - automatic router presence detection (default)
 2 - permanent router
 3 - temporary router (available only for ports)

Example:
 # mark port ens16 as a permanent mcast router for vlan 100
 $ bridge vlan set dev ens16 vid 100 mcast_router 2
 # disable mcast router for port ens16 and vlan 200
 $ bridge vlan set dev ens16 vid 200 mcast_router 0
 $ bridge -d vlan show
 port              vlan-id
 ens16             1 PVID Egress Untagged
                     state forwarding mcast_router 1
                   100
                     state forwarding mcast_router 2
                   200
                     state forwarding mcast_router 0

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 17:00:31 -06:00
Nikolay Aleksandrov 12fbe3e4eb bridge: vlan: set vlan option attributes while parsing
Set vlan option attributes immediately while parsing to simplify the
checks, avoid having reserved values (e.g. -1 for unset var) and have
more limited scope for the variables. This is also similar to how global
vlan options are set. The attribute setting and checks are moved with
option parsing, no functional changes intended.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 17:00:31 -06:00
David Ahern db28c944d8 Update kernel headers
Update kernel headers to commit:
    27151f177827 ("Merge tag 'perf-tools-for-v5.15-2021-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 16:59:38 -06:00
Stephen Hemminger 6d676ad934 ip: rewrite routel in python
Not sure if anyone uses the routel script. The script was
a combination of ip route, shell and awk doing command scraping.
It is now possible to do this much better using the JSON
output formats and python.

Rewriting also fixes the bug where the old script could not parse
the current output format.  At the end was getting:
/usr/bin/routel: 48: shift: can't shift that many

The new script also has IPv6 as option.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 16:31:24 -06:00
Stephen Hemminger 1eaebad2c5 ip: remove routef script
This script is old and limited to IPv4.
Using ip route command directly is better option.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 16:31:23 -06:00
Stephen Hemminger adddf30cd8 ip: remove ifcfg script
This script was from olden days of ifcfg.
I don't see any distribution using it and it is time to put
it out to pasture.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 16:31:19 -06:00
Stephen Hemminger 2c8110881b ip: remove old rtpr script
This script was a one off hack for a special case.
Now that ip commands have better formatting, there is no
real reason for it.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 16:30:36 -06:00
David Marchand e7e0e2ce65 iptuntap: fix multi-queue flag display
When creating a tap with multi_queue flag, this flag is not displayed
when dumping:

$ ip tuntap add tap23 mode tap multi_queue
$ ip tuntap
tap23: tap persist0x100

While at it, add a space between known flags and hexdump of unknown
ones.

Fixes: c41e038f48 ("iptuntap: allow creation of multi-queue tun/tap device")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-02 08:41:17 -07:00
Nikolay Aleksandrov deef844b1e man: ip-link: remove double of
Remove double "of".

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-02 08:40:36 -07:00
Luca Boccassi a3272b9372 configure: restore backward compatibility
Commit a9c3d70d90 broke backward compatibility
by making 'configure' error out if parameters are passed, instead of
ignoring them.
Sometimes packaging systems detect 'configure' and assume it's from
autotools, and pass a bunch of options. Eg:

 dh_auto_configure
	./configure --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-option-checking --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking

Ignore unknown options again instead of erroring out.

Fixes: a9c3d70d90 ("configure: add options ability")

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-02 08:39:48 -07:00
Luca Boccassi ceba59308d tree-wide: fix some typos found by Lintian
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-02 08:39:48 -07:00
Stephen Hemminger 7a70524270 ip: remove leftovers from IPX and DECnet
Iproute2 has not supported DECnet or IPX since version 5.0.
There were some leftover support in the ip options flags
and parsing, remove these.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-01 14:03:53 -07:00
Stephen Hemminger 8ab1834e56 uapi: update headers from 5.15 merge
New headers from 5.15 early merge.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-01 14:02:50 -07:00
Hangbin Liu 6d0d35bab9 ip/bond: add lacp active support
lacp_active specifies whether to send LACPDU frames periodically.
If set on, the LACPDU frames are sent along with the configured lacp_rate
setting. If set off, the LACPDU frames acts as "speak when spoken to".

v2: use strcmp instead of match for new options.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2021-09-01 12:51:44 -07:00
David Ahern 926ad64104 Update kernel headers
Update kernel headers to commit:
    88be32634905 ("Merge branch 'dsa-tagger-helpers'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
Ilya Dmitrichenko c730bd0b11 ip/tunnel: always print all known attributes
Presently, if a Geneve or VXLAN interface was created with 'external',
it's not possible for a user to determine e.g. the value of 'dstport'
after creation. This change fixes that by avoiding early returns.

This change partly reverts commit 00ff4b8e31 ("ip/tunnel: Be consistent
when printing tunnel collect metadata").

Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
Justin Iurman df8912ede2 ipioam6: use print_nl instead of print_null
This patch addresses Stephen's comment:

"""
> +        print_null(PRINT_ANY, "", "\n", NULL);

Use print_nl() since it handles the case of oneline output.
Plus in JSON the newline is meaningless.
"""

It also removes two useless print_null's.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
Peilin Ye 7e7270bb1f tc/skbmod: Introduce SKBMOD_F_ECN option
Recently we added SKBMOD_F_ECN option support to the kernel; support it in
the tc-skbmod(8) front end, and update its man page accordingly.

The 2 least significant bits of the Traffic Class field in IPv4 and IPv6
headers are used to represent different ECN states [1]:

	0b00: "Non ECN-Capable Transport", Non-ECT
	0b10: "ECN Capable Transport", ECT(0)
	0b01: "ECN Capable Transport", ECT(1)
	0b11: "Congestion Encountered", CE

This new option, "ecn", marks ECT(0) and ECT(1) IPv{4,6} packets as CE,
which is useful for ECN-based rate limiting.  For example:

	$ tc filter add dev eth0 parent 1: protocol ip prio 10 \
		u32 match ip protocol 1 0xff flowid 1:2 \
		action skbmod \
		ecn

The updated tc-skbmod SYNOPSIS looks like the following:

	tc ... action skbmod { set SETTABLE | swap SWAPPABLE | ecn } ...

Only one of "set", "swap" or "ecn" shall be used in a single tc-skbmod
command.  Trying to use more than one of them at a time is considered
undefined behavior; pipe multiple tc-skbmod commands together instead.
"set" and "swap" only affect Ethernet packets, while "ecn" only affects
IP packets.

Depends on kernel patch "net/sched: act_skbmod: Add SKBMOD_F_ECN option
support", as well as iproute2 patch "tc/skbmod: Remove misinformation
about the swap action".

[1] https://en.wikipedia.org/wiki/Explicit_Congestion_Notification

Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
Justin Iurman 86c596ed91 IOAM man8
This patch provides man8 documentation for IOAM inside ip, ip-ioam and ip-route.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
Justin Iurman 2d83c71082 New IOAM6 encap type for routes
This patch provides a new encap type for routes to insert an IOAM pre-allocated
trace:

$ ip -6 ro ad fc00::1/128 encap ioam6 trace prealloc type 0x800000 ns 1 size 12 dev eth0

where:
 - "trace" and "prealloc" may appear as useless but just anticipate for future
   implementations of other ioam option types.
 - "type" is a bitfield (=u32) defining the IOAM pre-allocated trace type (see
   the corresponding uapi).
 - "ns" is an IOAM namespace ID attached to the pre-allocated trace.
 - "size" is the trace pre-allocated size in bytes; must be a 4-octet multiple;
   limited size (see IOAM6_TRACE_DATA_SIZE_MAX).

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
Justin Iurman f0b3808afa Add, show, link, remove IOAM namespaces and schemas
This patch provides support for adding, listing and removing IOAM namespaces
and schemas with iproute2. When adding an IOAM namespace, both "data" (=u32)
and "wide" (=u64) are optional. Therefore, you can either have none, one of
them, or both at the same time. When adding an IOAM schema, there is no
restriction on "DATA" except its size (see IOAM6_MAX_SCHEMA_DATA_LEN). By
default, an IOAM namespace has no active IOAM schema (meaning an IOAM namespace
is not linked to an IOAM schema), and an IOAM schema is not considered
as "active" (meaning an IOAM schema is not linked to an IOAM namespace). It is
possible to link an IOAM namespace with an IOAM schema, thanks to the last
command below (meaning the IOAM schema will be considered as "active" for the
specific IOAM namespace).

$ ip ioam
Usage:	ip ioam { COMMAND | help }
	ip ioam namespace show
	ip ioam namespace add ID [ data DATA32 ] [ wide DATA64 ]
	ip ioam namespace del ID
	ip ioam schema show
	ip ioam schema add ID DATA
	ip ioam schema del ID
	ip ioam namespace set ID schema { ID | none }

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
David Ahern acbdef9386 Import ioam6 uapi headers
Import ioam6 uapi headers from kernel headers at last sync commit.

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
David Ahern 2d6fa30bb8 Update kernel headers
Update kernel headers to commit:
    1187c8c4642d ("net: phy: mscc: make some arrays static const, makes object smaller")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
Gokul Sivakumar 508ad89c82 ipneigh: add support to print brief output of neigh cache in tabular format
Make use of the already available brief flag and print the basic details of
the IPv4 or IPv6 neighbour cache in a tabular format for better readability
when the brief output is expected.

$ ip -br neigh
172.16.12.100                           bridge0          b0:fc:36:2f:07:43
172.16.12.174                           bridge0          8c:16:45:2f:bc:1c
172.16.12.250                           bridge0          04:d9:f5:c1:0c:74
fe80::267b:9f70:745e:d54d               bridge0          b0:fc:36:2f:07:43
fd16:a115:6a62:0:8744:efa1:9933:2c4c    bridge0          8c:16:45:2f:bc:1c
fe80::6d9:f5ff:fec1:c74                 bridge0          04:d9:f5:c1:0c:74

And add "ip neigh show" to the list of ip sub commands mentioned in the man
page that support the brief output in tabular format.

Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
David Ahern fb843668fb Merge branch 'bridge-vlan-global-mcast' into next
Nikolay Aleksandrov  says:

====================

This set adds support for vlan multicast options. The feature is
globally controlled by a new bridge option called mcast_vlan_snooping
which is added by patch 01. Then patches 2-5 add support for dumping
global vlan options and filtering on vlan id. Patch 06 adds support for
setting global vlan options and then patches 07-18 add all the new
global vlan options, finally patch 19 adds support for dumping vlan
multicast router ports. These options are identical in meaning, names and
functionality as the bridge-wide ones.

All the new vlan global commands are under the global keyword:
 $ bridge vlan global show [ vid VID dev DEVICE ]
 $ bridge vlan global set vid VID dev DEVICE ...

I've added command examples in each commit message. The patch-set is a
bit bigger but the global options follow the same pattern so I don't see
a point in breaking them. All man page descriptions have been taken from
the same current bridge-wide mcast options. The only additional iproute2
change which is left to do is the per-vlan mcast router control which
I'll send separately. Note to properly use this set you'll need the
updated kernel headers where mcast router was moved from a global option
to per-vlan/per-device one (changed uapi enum which was in net-next).

Example:
 # enable vlan mcast snooping globally
 $ ip link set dev bridge type bridge mcast_vlan_snooping 1
 # enable mcast querier on vlan 100
 $ bridge vlan global set dev bridge vid 100 mcast_querier 1
 # show vlan 100's global options
 $ bridge -s vlan global show vid 100
port              vlan-id
bridge            100
                    mcast_snooping 1 mcast_querier 1 mcast_igmp_version 2 mcast_mld_version 1 mcast_last_member_count 2 mcast_last_member_interval 100 mcast_startup_query_count 2 mcast_startup_query_interval 3125 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query_response_interval 1000

A following kernel patch-set will add selftests which use these commands.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:32:31 -06:00
Nikolay Aleksandrov 72222cd467 bridge: vlan: add support for dumping router ports
Add dump support for vlan multicast router ports and their details if
requested. If details are requested we print 1 entry per line, otherwise
we print all router ports on a single line similar to how mdb prints
them.

Looks like:
$ bridge vlan global show vid 100
 port              vlan-id
 bridge            100
                     mcast_snooping 1 mcast_querier 0 mcast_igmp_version 2 mcast_mld_version 1 mcast_last_member_count 2 mcast_last_member_interval 100 mcast_startup_query_count 2 mcast_startup_query_interval 3125 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query_response_interval 1000
                     router ports: ens20 ens16

Looks like (with -s):
 $ bridge -s vlan global show vid 100
 port              vlan-id
 bridge            100
                     mcast_snooping 1 mcast_querier 0 mcast_igmp_version 2 mcast_mld_version 1 mcast_last_member_count 2 mcast_last_member_interval 100 mcast_startup_query_count 2 mcast_startup_query_interval 3125 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query_response_interval 1000
                     router ports: ens20   187.57 temp
                                   ens16   118.27 temp

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:29:31 -06:00
Nikolay Aleksandrov 7ad5505bb5 bridge: vlan: add global mcast_querier option
Add control and dump support for the global mcast_querier option which
controls if the bridge will act as a multicast querier for that vlan.
Syntax: $ bridge vlan global set dev bridge vid 1 mcast_querier 1

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:29:26 -06:00
Nikolay Aleksandrov 061da2e222 bridge: vlan: add global mcast_startup_query_interval option
Add control and dump support for the global mcast_startup_query_interval
option which controls the interval between queries in the startup phase.
To be consistent with the same bridge-wide option the value is reported
with USER_HZ granularity and the same granularity is expected when setting
it.
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_startup_query_interval 15000

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:29:12 -06:00
Nikolay Aleksandrov 60dcd5c318 bridge: vlan: add global mcast_query_response_interval option
Add control and dump support for the global mcast_query_response_interval
option which sets the Max Response Time/Maximum Response Delay for IGMP/MLD
queries sent by the bridge. To be consistent with the same bridge-wide
option the value is reported with USER_HZ granularity and the same
granularity is expected when setting it.
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_query_response_interval 13000

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:28:47 -06:00
Nikolay Aleksandrov 0e4cfa0370 bridge: vlan: add global mcast_query_interval option
Add control and dump support for the global mcast_query_interval
option which controls the interval between queries sent by the bridge
after the end of the startup phase. To be consistent with the same
bridge-wide option the value is reported with USER_HZ granularity and
the same granularity is expected when setting it.
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_query_interval 13000

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:28:29 -06:00
Nikolay Aleksandrov ebcee09ca1 bridge: vlan: add global mcast_querier_interval option
Add control and dump support for the global mcast_querier_interval
option which controls the interval after which if no other router
queries are seen the bridge will start sending its own queries.
To be consistent with the same bridge-wide option the value is reported
with USER_HZ granularity and the same granularity is expected when
setting it.
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_querier_interval 13000

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:28:12 -06:00
Nikolay Aleksandrov 3ae784f589 bridge: vlan: add global mcast_membership_interval option
Add control and dump support for the global mcast_membership_interval
option which controls the interval after which the bridge will leave a
group if no reports have been received for it. To be consistent with the
same bridge-wide option the value is reported with USER_HZ granularity and
the same granularity is expected when setting it.
The default is 26000 (260 seconds).
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_membership_interval 13000

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:27:56 -06:00
Nikolay Aleksandrov 2b6cc38d52 bridge: vlan: add global mcast_last_member_interval option
Add control and dump support for the global mcast_last_member_interval
option which controls the interval between queries to find remaining
members of a group after a leave message. To be consistent with the same
bridge-wide option the value is reported with USER_HZ granularity and
the same granularity is expected when setting it.
The default is 100 (1 second).
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_last_member_interval 200

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:27:45 -06:00
Nikolay Aleksandrov 7cc7dbf447 bridge: vlan: add global mcast_startup_query_count option
Add control and dump support for the global mcast_startup_query_count
option which controls the number of queries the bridge will send on the
vlan during startup phase (default 2).
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_startup_query_count 5

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:27:28 -06:00
Nikolay Aleksandrov 3399c0759f bridge: vlan: add global mcast_last_member_count option
Add control and dump support for the global mcast_last_member_count option
which controls the number of queries the bridge will send on the vlan after
a leave is received (default 2).
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_last_member_count 10

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:26:07 -06:00
Nikolay Aleksandrov a8d7212a4f bridge: vlan: add global mcast_mld_version option
Add control and dump support for the global mcast_mld_version option
which controls the MLD version on the vlan (default 1).
Syntax: $ bridge vlan global set dev bridge vid 1 mcast_mld_version 2

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:25:17 -06:00
Nikolay Aleksandrov 29fada0f41 bridge: vlan: add global mcast_igmp_version option
Add control and dump support for the global mcast_igmp_version option
which controls the IGMP version on the vlan (default 2).
Syntax: $ bridge vlan global set dev bridge vid 1 mcast_igmp_version 3

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:24:09 -06:00
Nikolay Aleksandrov 1f608d590c bridge: vlan: add global mcast_snooping option
Add control and dump support for the global mcast_snooping option which
controls if multicast snooping is enabled or disabled for a single vlan.
Syntax: $ bridge vlan global set dev bridge vid 1 mcast_snooping 1

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:23:26 -06:00
Nikolay Aleksandrov dee5eb05e5 bridge: vlan: add support to set global vlan options
Add support to change global vlan options via a new vlan global
set subcommand similar to the current vlan set subcommand. The man page
and help are updated accordingly. The command works only with bridge
devices. It doesn't support any options yet.

Syntax: $ bridge vlan global set vid VID dev DEV

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:21:13 -06:00
Nikolay Aleksandrov ecf6d8b4a1 bridge: vlan: add support for vlan filtering when dumping options
In order to allow vlan filtering when dumping options we need to move
all print operations into the option dumping functions and add the
filtering after we've parsed the nested attributes so we can extract the
start and end vlan ids.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:21:09 -06:00
Nikolay Aleksandrov 720f8613bd bridge: vlan: add support to show global vlan options
Add support for new bridge vlan command grouping called global which
operates on global options. The first command it supports is "show".
To do that we update print_vlan_rtm to recognize the global vlan options
attribute and parse it properly.
Man page and help are also updated with the new command.

Syntax is: $ bridge vlan global show [ vid VID ] [ dev DEV ]

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:21:04 -06:00
Nikolay Aleksandrov d3a961a9b1 bridge: vlan: skip unknown attributes when printing options
Skip unknown attributes when printing vlan options in print_vlan_rtm.
Make sure print_vlan_opts doesn't accept attributes it doesn't understand.
Currently we print only one type, later global vlan options support will
be added.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:21:00 -06:00
Nikolay Aleksandrov 312e22fe79 bridge: vlan: factor out vlan option printing
Factor out the code which prints current per-vlan options from
print_vlan_rtm without any changes, later we'll filter based on the vlan
attribute and add support for global vlan option printing.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:20:53 -06:00
Nikolay Aleksandrov d2eecb9d1d ip: bridge: add support for mcast_vlan_snooping
Add support for mcast_vlan_snooping option which controls per-vlan
multicast snooping, also update the man page.
Syntax: $ ip link set dev bridge type bridge mcast_vlan_snooping 0/1

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:20:03 -06:00
Stephen Hemminger 169f36a0c9 v5.14.0 2021-08-31 11:57:59 -07:00
Jakub Kicinski 85b0e73c77 ss: fix fallback to procfs for raw sockets
Jonas reports that ss -awp does not display any RAW sockets
on a Knoppix 4.4 kernel.

sockdiag_send() diverts to tcpdiag_send() to try the older
netlink interface. tcpdiag_send() works for TCP and DCCP
but not other protocols. Instead of rejecting unsupported
protocols (and missing RAW and SCTP) match on supported ones.

Link: https://lore.kernel.org/netdev/20210815231738.7b42bad4@mmluhan/
Reported-and-tested-by: Jonas Bechtel <post@jbechtel.de>
Fixes: 41fe6c34de ("ss: Add inet raw sockets information gathering via netlink diag interface")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-18 15:03:46 -07:00
Stephen Hemminger 1afde09498 uapi: update neighbour.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-18 14:09:34 -07:00
Gokul Sivakumar 10ecd12690 man: bridge: fix the typo to change "-c[lor]" into "-c[olor]" in man page
Fixes: 3a1ca9a5b ("bridge: update man page for new color and json changes")
Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-18 14:04:53 -07:00
Gokul Sivakumar 057d3c6d37 bridge: fdb: don't colorize the "dev" & "dst" keywords in "bridge -c fdb"
To be consistent with the colorized output of "ip" command and to increase
readability, stop highlighting the "dev" & "dst" keywords in the colorized
output of "bridge -c fdb" cmd.

Example: in the following "bridge -c fdb" entry, only "00:00:00:00:00:00",
"vxlan100" and "2001:db8:2::1" fields should be highlighted in color.

00:00:00:00:00:00 dev vxlan100 dst 2001:db8:2::1 self permanent

Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-18 14:04:53 -07:00
Gokul Sivakumar 82149efee9 bridge: reorder cmd line arg parsing to let "-c" detected as "color" option
As per the man/man8/bridge.8 page, the shorthand cmd line arg "-c" can be
used to colorize the bridge cmd output. But while parsing the args in while
loop, matches() detects "-c" as "-compressedvlans" instead of "-color", so
fix this by doing the check for "-color" option first before checking for
"-compressedvlans".

Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-18 14:04:53 -07:00
Hangbin Liu 3a09567f7d ip/bond: add arp_validate filter support
Add arp_validate filter support based on kernel commit 896149ff1b2c
("bonding: extend arp_validate to be able to receive unvalidated arp-only traffic")

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-18 14:02:44 -07:00
Parav Pandit 355c49ffa5 devlink: Show port state values in man page and in the help command
Port function state can have either of the two values - active or
inactive. Update the documentation and help command for these two
values to tell user about it.

With the introduction of state, hw_addr and state are optional.
Hence mark them as optional in man page that also aligns with the help
command output.

Fixes: bdfb9f1bd6 ("devlink: Support set of port function state")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-11 15:02:30 -07:00
Hangbin Liu ebaa603b30 ip/bond: add lacp active support
lacp_active specifies whether to send LACPDU frames periodically.
If set on, the LACPDU frames are sent along with the configured lacp_rate
setting. If set off, the LACPDU frames acts as "speak when spoken to".

v2: use strcmp instead of match for new options.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2021-08-11 12:26:20 -06:00
David Ahern 8d6134b204 Update kernel headers
Update kernel headers to commit:
    88be32634905 ("Merge branch 'dsa-tagger-helpers'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-11 12:23:33 -06:00
Ilya Dmitrichenko 51d8fc708c ip/tunnel: always print all known attributes
Presently, if a Geneve or VXLAN interface was created with 'external',
it's not possible for a user to determine e.g. the value of 'dstport'
after creation. This change fixes that by avoiding early returns.

This change partly reverts commit 00ff4b8e31 ("ip/tunnel: Be consistent
when printing tunnel collect metadata").

Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-11 12:17:52 -06:00
Justin Iurman 71ba9c18e0 ipioam6: use print_nl instead of print_null
This patch addresses Stephen's comment:

"""
> +        print_null(PRINT_ANY, "", "\n", NULL);

Use print_nl() since it handles the case of oneline output.
Plus in JSON the newline is meaningless.
"""

It also removes two useless print_null's.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-11 12:16:09 -06:00
Phil Sutter 9b7ea92b9e tc: u32: Fix key folding in sample option
In between Linux kernel 2.4 and 2.6, key folding for hash tables changed
in kernel space. When iproute2 dropped support for the older algorithm,
the wrong code was removed and kernel 2.4 folding method remained in
place. To get things functional for recent kernels again, restoring the
old code alone was not sufficient - additional byteorder fixes were
needed.

While being at it, make use of ffs() and thereby align the code with how
kernel determines the shift width.

Fixes: 267480f553 ("Backout the 2.4 utsname hash patch.")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-10 20:02:43 -07:00
Andrea Claudi d1eacf12b5 lib: bpf_glue: remove useless assignment
The value of s used inside the cycle is the result of strstr(), so this
assignment is useless.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-10 20:01:54 -07:00
Andrea Claudi 50a4127022 lib: bpf_legacy: fix potential NULL-pointer dereference
If bpf_map_fetch_name() returns NULL, strlen() hits a NULL-pointer
dereference on outer_map_name.

Fix this checking outer_map_name value, and returning false when NULL,
as already done for inner_map_name before.

Fixes: 6d61a2b557 ("lib: add libbpf support")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-10 19:55:12 -07:00
Jacob Keller 954a0077c8 devlink: fix infinite loop on flash update for drivers without status
When processing device flash update, cmd_dev_flash function waits until
the flash process has completed. This requires the following two
conditions to both be true:

a) we've received an exit status from the child process
b) we've received the DEVLINK_CMD_FLASH_UPDATE_END *or*
   we haven't received any status notifications from the driver.

The original devlink flash status monitoring code in 9b13cddfe2
("devlink: implement flash status monitoring") was written assuming that
a driver will either send no status updates, or it will send at least
one DEVLINK_CMD_FLASH_UPDATE_STATUS before DEVLINK_CMD_FLASH_UPDATE_END.

Newer versions of the kernel since commit 52cc5f3a166a ("devlink: move flash
end and begin to core devlink") in v5.10 moved handling of the
DEVLINK_CMD_FLASH_UPDATE_END into the core stack, and will send this
regardless of whether or not the driver sends any of its own status
notifications.

The handling of DEVLINK_CMD_FLASH_UPDATE_END in cmd_dev_flash_status_cb
has an additional condition that it must not be the first message.
Otherwise, it falls back to treating it like
a DEVLINK_CMD_FLASH_UPDATE_STATUS.

This is wrong because it can lead to an infinite loop if a driver does
not send any status updates.

In this case, the kernel will send DEVLINK_CMD_FLASH_UPDATE_END without
any DEVLINK_CMD_FLASH_UPDATE_STATUS. The devlink application will see
that ctx->not_first is false, and will treat this like any other status
message. Thus, ctx->not_first will be set to 1.

The loop condition to exit flash update will thus never be true, since
we will wait forever, because ctx->not_first is true, and
ctx->received_end is false.

This leads to the application appearing to process the flash update, but
it will never exit.

Fix this by simply always treating DEVLINK_CMD_FLASH_UPDATE_END the same
regardless of whether its the first message or not.

This is obviously the correct thing to do: once we've received the
DEVLINK_CMD_FLASH_UPDATE_END the flash update must be finished. For new
kernels this is always true, because we send this message in the core
stack after the driver flash update routine finishes.

For older kernels, some drivers may not have sent any
DEVLINK_CMD_FLASH_UPDATE_STATUS or DEVLINK_CMD_FLASH_UPDATE_END. This is
handled by the while loop conditional that exits if we get a return
value from the child process without having received any status
notifications.

An argument could be made that we should exit immediately when we get
either the DEVLINK_CMD_FLASH_UPDATE_END or an exit code from the child
process. However, at a minimum it makes no sense to ever process
DEVLINK_CMD_FLASH_UPDATE_END as if it were a DEVLINK_CMD_FLASH_UPDATE_STATUS.

This is easy to test as it is triggered by the selftests for the
netdevsim driver, which has a test case for both with and without status
notifications.

Fixes: 9b13cddfe2 ("devlink: implement flash status monitoring")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-10 19:54:39 -07:00
Feng Zhou be99929d60 lib/bpf: Fix btf_load error lead to enable debug log
Use tc with no verbose, when bpf_btf_attach fail,
the conditions:
"if (fd < 0 && (errno == ENOSPC || !ctx->log_size))"
will make ctx->log_size != 0. And then, bpf_prog_attach,
ctx->log_size != 0. so enable debug log.
The verifier log sometimes is so chatty on larger programs.
bpf_prog_attach is failed.
"Log buffer too small to dump verifier log 16777215 bytes (9 tries)!"

BTF load failure does not affect prog load. prog still work.
So when BTF/PROG load fail, enlarge log_size and re-fail with
having verbose.

Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-10 19:53:54 -07:00
Peilin Ye e78411948d tc/skbmod: Introduce SKBMOD_F_ECN option
Recently we added SKBMOD_F_ECN option support to the kernel; support it in
the tc-skbmod(8) front end, and update its man page accordingly.

The 2 least significant bits of the Traffic Class field in IPv4 and IPv6
headers are used to represent different ECN states [1]:

	0b00: "Non ECN-Capable Transport", Non-ECT
	0b10: "ECN Capable Transport", ECT(0)
	0b01: "ECN Capable Transport", ECT(1)
	0b11: "Congestion Encountered", CE

This new option, "ecn", marks ECT(0) and ECT(1) IPv{4,6} packets as CE,
which is useful for ECN-based rate limiting.  For example:

	$ tc filter add dev eth0 parent 1: protocol ip prio 10 \
		u32 match ip protocol 1 0xff flowid 1:2 \
		action skbmod \
		ecn

The updated tc-skbmod SYNOPSIS looks like the following:

	tc ... action skbmod { set SETTABLE | swap SWAPPABLE | ecn } ...

Only one of "set", "swap" or "ecn" shall be used in a single tc-skbmod
command.  Trying to use more than one of them at a time is considered
undefined behavior; pipe multiple tc-skbmod commands together instead.
"set" and "swap" only affect Ethernet packets, while "ecn" only affects
IP packets.

Depends on kernel patch "net/sched: act_skbmod: Add SKBMOD_F_ECN option
support", as well as iproute2 patch "tc/skbmod: Remove misinformation
about the swap action".

[1] https://en.wikipedia.org/wiki/Explicit_Congestion_Notification

Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-08 11:56:55 -06:00
David Ahern 09d8ce3db1 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-04 09:24:12 -06:00
David Ahern e8763fc9ab Merge branch 'ipv6-oam' into next
Justin Iurman says:

====================

The IOAM patchset was merged recently (see net-next commits [1,2,3,4,5,6]).
Therefore, this patchset provides support for IOAM inside iproute2, as well as
manpage documentation. Here is a summary of added features inside iproute2.

(1) configure IOAM namespaces and schemas:

$ ip ioam
Usage:  ip ioam { COMMAND | help }
        ip ioam namespace show
        ip ioam namespace add ID [ data DATA32 ] [ wide DATA64 ]
        ip ioam namespace del ID
        ip ioam schema show
        ip ioam schema add ID DATA
        ip ioam schema del ID
        ip ioam namespace set ID schema { ID | none }

(2) provide a new encap type to insert the IOAM pre-allocated trace:

$ ip -6 ro ad fc00::1/128 encap ioam6 trace prealloc type 0x800000 ns 1 size 12 dev eth0

  [1] db67f219fc9365a0c456666ed7c134d43ab0be8a
  [2] 9ee11f0fff205b4b3df9750bff5e94f97c71b6a0
  [3] 8c6f6fa6772696be0c047a711858084b38763728
  [4] 3edede08ff37c6a9370510508d5eeb54890baf47
  [5] de8e80a54c96d2b75377e0e5319a64d32c88c690
  [6] 968691c777af78d2daa2ee87cfaeeae825255a58

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-02 11:34:09 -06:00
Justin Iurman 78832863ef IOAM man8
This patch provides man8 documentation for IOAM inside ip, ip-ioam and ip-route.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-02 11:33:35 -06:00
Justin Iurman 32f4969d44 New IOAM6 encap type for routes
This patch provides a new encap type for routes to insert an IOAM pre-allocated
trace:

$ ip -6 ro ad fc00::1/128 encap ioam6 trace prealloc type 0x800000 ns 1 size 12 dev eth0

where:
 - "trace" and "prealloc" may appear as useless but just anticipate for future
   implementations of other ioam option types.
 - "type" is a bitfield (=u32) defining the IOAM pre-allocated trace type (see
   the corresponding uapi).
 - "ns" is an IOAM namespace ID attached to the pre-allocated trace.
 - "size" is the trace pre-allocated size in bytes; must be a 4-octet multiple;
   limited size (see IOAM6_TRACE_DATA_SIZE_MAX).

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-02 11:33:31 -06:00
Justin Iurman 2909812583 Add, show, link, remove IOAM namespaces and schemas
This patch provides support for adding, listing and removing IOAM namespaces
and schemas with iproute2. When adding an IOAM namespace, both "data" (=u32)
and "wide" (=u64) are optional. Therefore, you can either have none, one of
them, or both at the same time. When adding an IOAM schema, there is no
restriction on "DATA" except its size (see IOAM6_MAX_SCHEMA_DATA_LEN). By
default, an IOAM namespace has no active IOAM schema (meaning an IOAM namespace
is not linked to an IOAM schema), and an IOAM schema is not considered
as "active" (meaning an IOAM schema is not linked to an IOAM namespace). It is
possible to link an IOAM namespace with an IOAM schema, thanks to the last
command below (meaning the IOAM schema will be considered as "active" for the
specific IOAM namespace).

$ ip ioam
Usage:	ip ioam { COMMAND | help }
	ip ioam namespace show
	ip ioam namespace add ID [ data DATA32 ] [ wide DATA64 ]
	ip ioam namespace del ID
	ip ioam schema show
	ip ioam schema add ID DATA
	ip ioam schema del ID
	ip ioam namespace set ID schema { ID | none }

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-02 11:33:05 -06:00
David Ahern e53f4cd504 Import ioam6 uapi headers
Import ioam6 uapi headers from kernel headers at last sync commit.

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-02 11:32:26 -06:00
David Ahern 236696e52c Update kernel headers
Update kernel headers to commit:
    1187c8c4642d ("net: phy: mscc: make some arrays static const, makes object smaller")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-02 10:25:09 -06:00
Gokul Sivakumar cf866f0a5a ipneigh: add support to print brief output of neigh cache in tabular format
Make use of the already available brief flag and print the basic details of
the IPv4 or IPv6 neighbour cache in a tabular format for better readability
when the brief output is expected.

$ ip -br neigh
172.16.12.100                           bridge0          b0:fc:36:2f:07:43
172.16.12.174                           bridge0          8c:16:45:2f:bc:1c
172.16.12.250                           bridge0          04:d9:f5:c1:0c:74
fe80::267b:9f70:745e:d54d               bridge0          b0:fc:36:2f:07:43
fd16:a115:6a62:0:8744:efa1:9933:2c4c    bridge0          8c:16:45:2f:bc:1c
fe80::6d9:f5ff:fec1:c74                 bridge0          04:d9:f5:c1:0c:74

And add "ip neigh show" to the list of ip sub commands mentioned in the man
page that support the brief output in tabular format.

Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-02 10:14:50 -06:00
Peilin Ye c06d313d86 tc/skbmod: Remove misinformation about the swap action
Currently man 8 tc-skbmod says that "...the swap action will occur after
any smac/dmac substitutions are executed, if they are present."

This is false.  In fact, trying to "set" and "swap" in a single skbmod
command causes the "set" part to be completely ignored.  As an example:

	$ tc filter add dev eth0 parent 1: protocol ip prio 10 \
		matchall action skbmod \
        	set dmac AA:AA:AA:AA:AA:AA smac BB:BB:BB:BB:BB:BB \
        	swap mac

The above command simply does a "swap", without setting DMAC or SMAC to
AA's or BB's.  The root cause of this is in the kernel, see
net/sched/act_skbmod.c:tcf_skbmod_init():

	parm = nla_data(tb[TCA_SKBMOD_PARMS]);
	index = parm->index;
	if (parm->flags & SKBMOD_F_SWAPMAC)
		lflags = SKBMOD_F_SWAPMAC;
		^^^^^^^^^^^^^^^^^^^^^^^^^^

Doing a "=" instead of "|=" clears all other "set" flags when doing a
"swap".  Discourage using "set" and "swap" in the same command by
documenting it as undefined behavior, and update the "SYNOPSIS" section
as well as tc -help text accordingly.

If one really needs to e.g. "set" DMAC to all AA's then "swap" DMAC and
SMAC, one should do two separate commands and "pipe" them together.

Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-22 15:14:29 -07:00
Roi Dayan 71d36000dc police: Fix normal output back to what it was
With the json support fix the normal output was
changed. set it back to what it was.
Print overhead with print_size().
Print newline before ref.

Fixes: 0d5cf51e0d ("police: Add support for json output")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-17 11:14:30 -07:00
Lahav Schlesinger f760bff328 ipmonitor: Fix recvmsg with ancillary data
A successful call to recvmsg() causes msg.msg_controllen to contain the length
of the received ancillary data. However, the current code in the 'ip' utility
doesn't reset this value after each recvmsg().

This means that if a call to recvmsg() doesn't have ancillary data, then
'msg.msg_controllen' will be set to 0, causing future recvmsg() which do
contain ancillary data to get MSG_CTRUNC set in msg.msg_flags.

This fixes 'ip monitor' running with the all-nsid option - With this option the
kernel passes the nsid as ancillary data. If while 'ip monitor' is running an
even on the current netns is received, then no ancillary data will be sent,
causing 'msg.msg_controllen' to be set to 0, which causes 'ip monitor' to
indefinitely print "[nsid current]" instead of the real nsid.

Fixes: 449b824ad1 ("ipmonitor: allows to monitor in several netns")
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-17 11:13:36 -07:00
Stephen Hemminger 7a7e9ed98f uapi: headers update
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-17 11:12:47 -07:00
Christian Schürmann 1f2c908d53 man8/ip-tunnel.8: fix typo, 'encaplim' is not a valid option
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-15 09:31:51 -07:00
Alexander Mikhalitsyn 115e987035 libnetlink: check error handler is present before a call
Fix nullptr dereference of errhndlr from rtnl_dump_filter_arg
struct in rtnl_dump_done and rtnl_dump_error functions.

Fixes: 459ce6e3d7 ("ip route: ignore ENOENT during save if RT_TABLE_MAIN is being dumped")
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Roi Dayan <roid@nvidia.com>
Cc: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Reported-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-11 10:33:44 -07:00
Stephen Hemminger 0015ada629 libnetlink: cosmetic changes
Don't initialize arguments that are NULL, and format initialization
in a more logical way.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-07 07:39:07 -07:00
Alexander Mikhalitsyn 459ce6e3d7 ip route: ignore ENOENT during save if RT_TABLE_MAIN is being dumped
We started to use in-kernel filtering feature which allows to get only
needed tables (see iproute_dump_filter()). From the kernel side it's
implemented in net/ipv4/fib_frontend.c (inet_dump_fib), net/ipv6/ip6_fib.c
(inet6_dump_fib). The problem here is that behaviour of "ip route save"
was changed after
c7e6371bc ("ip route: Add protocol, table id and device to dump request").
If filters are used, then kernel returns ENOENT error if requested table
is absent, but in newly created net namespace even RT_TABLE_MAIN table
doesn't exist. It is really allocated, for instance, after issuing
"ip l set lo up".

Reproducer is fairly simple:
$ unshare -n ip route save > dump
Error: ipv4: FIB table does not exist.
Dump terminated

Expected result here is to get empty dump file (as it was before this
change).

v2: reworked, so, now it takes into account NLMSGERR_ATTR_MSG
(see nl_dump_ext_ack_done() function). We want to suppress error messages
in stderr about absent FIB table from kernel too.

v3: reworked to make code clearer. Introduced rtnl_suppressed_errors(),
rtnl_suppress_error() helpers. User may suppress up to 3 errors (may be
easily extended by changing SUPPRESS_ERRORS_INIT macro).

v4: reworked, rtnl_dump_filter_errhndlr() was introduced. Thanks
to Stephen Hemminger for comments and suggestions

v5: space fixes, commit message reformat, empty initializers

Fixes: c7e6371bc ("ip route: Add protocol, table id and device to dump request")
Cc: David Ahern <dsahern@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-07 07:32:56 -07:00
Stephen Hemminger 8f85d085fe uapi: update kernel headers from 5.14-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-06 17:07:24 -07:00
Martynas Pumputis 83d4d61bc9 libbpf: fix attach of prog with multiple sections
When BPF programs which consists of multiple executable sections via
iproute2+libbpf (configured with LIBBPF_FORCE=on), we noticed that a
wrong section can be attached to a device. E.g.:

    # tc qdisc replace dev lxc_health clsact
    # tc filter replace dev lxc_health ingress prio 1 \
        handle 1 bpf da obj bpf_lxc.o sec from-container
    # tc filter show dev lxc_health ingress filter protocol all
        pref 1 bpf chain 0 filter protocol all pref 1 bpf chain 0
        handle 0x1 bpf_lxc.o:[__send_drop_notify] <-- WRONG SECTION
        direct-action not_in_hw id 38 tag 7d891814eda6809e jited

After taking a closer look into load_bpf_object() in lib/bpf_libbpf.c,
we noticed that the filter used in the program iterator does not check
whether a program section name matches a requested section name
(cfg->section). This can lead to a wrong prog FD being used to attach
the program.

Fixes: 6d61a2b557 ("lib: add libbpf support")
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Acked-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-06 16:59:39 -07:00
David Ahern 02c06ffc13 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-07-01 14:29:42 +00:00
Stephen Hemminger fc3511962d lib: remove blank line at eof
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-29 13:20:44 -07:00
Stephen Hemminger 0e7ea3e8fe v5.13.0 2021-06-29 11:24:17 -07:00
Ben Hutchings 33cf9306c8 devlink: Fix printf() type mismatches on 32-bit architectures
devlink currently uses "%lu" to format values of type uint64_t,
but on 32-bit architectures uint64_t is defined as unsigned
long long and this does not work correctly.

Fix this by using the standard macro PRIu64 instead.

Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-29 11:10:14 -07:00
Ben Hutchings 4ac0383a59 utils: Fix BIT() to support up to 64 bits on all architectures
devlink and vdpa use BIT() together with 64-bit flag fields.  devlink
is already using bit numbers greater than 31 and so does not work
correctly on 32-bit architectures.

Fix this by making BIT() use uint64_t instead of unsigned long.

Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-29 11:10:14 -07:00
Stephen Hemminger c73fb66070 uapi: update headers to 5.13
Final 5.13 header update

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-28 10:19:08 -07:00
Roi Dayan 6f15c21719 devlink: Fix link errors on some systems
On some systems we fail to link because of missing math lib.
add -lm to devlink.

    LINK     devlink
../lib/libutil.a(utils_math.o): In function `get_rate':
utils_math.c:(.text+0xcc): undefined reference to `floor'
../lib/libutil.a(utils_math.o): In function `get_size':
utils_math.c:(.text+0x384): undefined reference to `floor'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:16: devlink] Error 1
make: *** [Makefile:64: all] Error 2

Fixes: 6c70aca76e ("devlink: Add port func rate support")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-26 14:57:27 -07:00
Asbjørn Sloth Tønnesen 2ff4761db4 tc: pedit: add decrement operation
Implement a decrement operation for ttl and hoplimit.

Since this is just syntactic sugar, it goes that:

  tc filter add ... action pedit ex munge ip ttl dec ...
  tc filter add ... action pedit ex munge ip6 hoplimit dec ...

is just a more readable version of this:

  tc filter add ... action pedit ex munge ip ttl add 0xff ...
  tc filter add ... action pedit ex munge ip6 hoplimit add 0xff ...

This feature was suggested by some pseudo tc examples in Mellanox's
documentation[1], but wasn't present in neither their mlnx-iproute2
nor iproute2.

Tested with skip_sw on Mellanox ConnectX-6 Dx.

[1] https://docs.mellanox.com/pages/viewpage.action?pageId=47033989

v3:
   - Use dedicated flags argument in parse_cmd() (David Ahern)
   - Minor rewording of the man page

v2:
   - Fix whitespace issue (Stephen Hemminger)
   - Add to usage info in explain()

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-26 04:45:19 +00:00
Asbjørn Sloth Tønnesen bc5e8473aa tc: pedit: parse_cmd: add flags argument
This patch just prepares the flags argument, so it's
available to the next patch.

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-26 04:44:35 +00:00
Sergey Ryazanov 6acccd52a2 iplink: support for WWAN devices
The WWAN subsystem has been extended to generalize the per data channel
network interfaces management. This change implements support for WWAN
links handling. And actively uses the earlier introduced ip-link
capability to specify the parent by its device name.

The WWAN interface for a new data channel should be created with a
command like this:

ip link add dev wwan0-2 parentdev wwan0 type wwan linkid 2

Where: wwan0 is the modem HW device name (should be taken from
/sys/class/wwan) and linkid is an identifier of the opened data
channel.

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-26 04:40:57 +00:00
Sergey Ryazanov 362da458a4 iplink: add support for parent device
Add support for specifying a parent device (struct device) by its name
during the link creation and printing parent name in the links list.
This option will be used to create WWAN links and possibly by other
device classes that do not have a "natural parent netdev".

Add the parent device bus name printing for links list info
completeness. But do not add a corresponding command line argument, as
we do not have a use case for this attribute.

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-26 04:40:22 +00:00
David Ahern 083e2706e1 Import wwan.h uapi file
Import wwan.h uapi file at version from last kernel headers sync.

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-26 04:39:47 +00:00
Stephen Hemminger 8316825a52 man: fix syntax for ip link property
The ip link property add/delete requires a device; but the
device argument was not show on the man page.
It is correct in the usage message.

Fixes: 3aa0e51be6 ("ip: add support for alternative name addition/deletion/list")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-24 11:54:04 -07:00
Paolo Lungaroni 3e26254f31 seg6: add support for SRv6 End.DT46 Behavior
We introduce the new "End.DT46" action for supporting the SRv6 End.DT46
Behavior in iproute2.
The SRv6 End.DT46 Behavior, defined in RFC 8986 [1] section 4.8, can be
used to implement L3 VPNs based on Segment Routing over IPv6 networks in
multi-tenants environments and it is capable of handling both IPv4 and
IPv6 tenant traffic at the same time.
The SRv6 End.DT46 Behavior decapsulates the received packets and it
performs the IPv4 or IPv6 routing lookup in the routing table of the
tenant.

As for the End.DT4 and for the End.DT6 in VRF mode, the SRv6 End.DT46
Behavior leverages a VRF device in order to force the routing lookup into
the associated routing table using the "vrftable" attribute.

To make the End.DT46 work properly, it must be guaranteed that the
routing table used for routing lookup operations is bound to one and
only one VRF during the tunnel creation. Such constraint has to be
enforced by enabling the VRF strict_mode sysctl parameter, i.e.:

 $ sysctl -wq net.vrf.strict_mode=1

Note that the same approach is used for the End.DT4 Behavior and for the
End.DT6 Behavior in VRF mode.

An SRv6 End.DT46 Behavior instance can be created as follows:

 $ ip -6 route add 2001:db8::1 encap seg6local action End.DT46 vrftable 100 dev vrf100

Standard Output:
 $ ip -6 route show 2001:db8::1
 2001:db8::1  encap seg6local action End.DT46 vrftable 100 dev vrf100 metric 1024 pref medium

JSON Output:
$ ip -6 -j -p route show 2001:db8::1
[ {
        "dst": "2001:db8::1",
        "encap": "seg6local",
        "action": "End.DT46",
        "vrftable": 100,
        "dev": "vrf100",
        "metric": 1024,
        "flags": [ ],
        "pref": "medium"
} ]

This patch updates the route.8 man page and the ip route help with the
information related to End.DT46.
Considering that the same information was missing for the SRv6 End.DT4 and
the End.DT6 Behaviors, we have also added it.

[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-enddt46-decapsulation-and-s

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Paolo Lungaroni <paolo.lungaroni@uniroma2.it>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-22 15:36:17 +00:00
David Ahern 1d11326a57 Update kernel headers
Update kernel headers to commit:
    ef2c3ddaa4ed ("ibmvnic: Use strscpy() instead of strncpy()")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-22 15:33:45 +00:00
Guillaume Nault f8879e85f0 utils: bump max args number to 512 for batch files
Large tc filters can have many arguments. For example the following
filter matches the first 7 MPLS LSEs, pops all of them, then updates
the Ethernet header and redirects the resulting packet to eth1.

filter add dev eth0 ingress handle 44 priority 100 \
  protocol mpls_uc flower mpls                     \
    lse depth 1 label 1040076 tc 4 bos 0 ttl 175   \
    lse depth 2 label 89648 tc 2 bos 0 ttl 9       \
    lse depth 3 label 63417 tc 5 bos 0 ttl 185     \
    lse depth 4 label 593135 tc 5 bos 0 ttl 67     \
    lse depth 5 label 857021 tc 0 bos 0 ttl 181    \
    lse depth 6 label 239239 tc 1 bos 0 ttl 254    \
    lse depth 7 label 30 tc 7 bos 1 ttl 237        \
  action mpls pop protocol mpls_uc pipe            \
  action mpls pop protocol mpls_uc pipe            \
  action mpls pop protocol mpls_uc pipe            \
  action mpls pop protocol mpls_uc pipe            \
  action mpls pop protocol mpls_uc pipe            \
  action mpls pop protocol mpls_uc pipe            \
  action mpls pop protocol ipv6 pipe               \
  action vlan pop_eth pipe                         \
  action vlan push_eth                             \
    dst_mac 00:00:5e:00:53:7e                      \
    src_mac 00:00:5e:00:53:03 pipe                 \
  action mirred egress redirect dev eth1

This filter has 149 arguments, so it can't be used with tc -batch
which is limited to a 100.

Let's bump the limit to 512. That should leave a lot of room for big
batch commands.

v2:
   -Define the limit in utils.h (Stephen Hemminger)
   -Bump the limit even higher (256 -> 512) (Stephen Hemminger)

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-18 02:57:05 +00:00
Stephen Hemminger e1d3ac755d uapi: update kernel headers to 5.13-rc6
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-17 15:54:05 -07:00
David Ahern d8b3b9d32d Merge branch 'devlink-rate-support' into next
Dmytro Linkin says:

====================

Series implements devlink rate commands, which are:
- Dump particular or all rate objects (JSON or non-JSON)
- Add/Delete node rate object
- Set tx rate share/max values for rate object
- Set/Unset parent rate object for other rate object

Examples:

Display all rate objects:

    # devlink port function rate show
    pci/0000:03:00.0/1 type leaf parent some_group
    pci/0000:03:00.0/2 type leaf tx_share 12Mbit
    pci/0000:03:00.0/some_group type node tx_share 1Gbps tx_max 5Gbps

Display leaf rate object bound to the 1st devlink port of the
pci/0000:03:00.0 device:

    # devlink port function rate show pci/0000:03:00.0/1
    pci/0000:03:00.0/1 type leaf

Display node rate object with name some_group of the pci/0000:03:00.0
device:

    # devlink port function rate show pci/0000:03:00.0/some_group
    pci/0000:03:00.0/some_group type node

Display leaf rate object rate values using IEC units:

    # devlink -i port function rate show pci/0000:03:00.0/2
    pci/0000:03:00.0/2 type leaf 11718Kibit

Display pci/0000:03:00.0/2 leaf rate object as pretty JSON output:

    # devlink -jp port function rate show pci/0000:03:00.0/2
    {
        "rate": {
            "pci/0000:03:00.0/2": {
                "type": "leaf",
                "tx_share": 1500000
            }
        }
    }

Create node rate object with name "1st_group" on pci/0000:03:00.0 device:

    # devlink port function rate add pci/0000:03:00.0/1st_group

Create node rate object with specified parameters:

    # devlink port function rate add pci/0000:03:00.0/2nd_group \
        tx_share 10Mbit tx_max 30Mbit parent 1st_group

Set parameters to the specified leaf rate object:

    # devlink port function rate set pci/0000:03:00.0/1 \
        tx_share 2Mbit tx_max 10Mbit

Set leaf's parent to "1st_group":

    # devlink port function rate set pci/0000:03:00.0/1 parent 1st_group

Unset leaf's parent:

    # devlink port function rate set pci/0000:03:00.0/1 noparent

Delete node rate object:

    # devlink port function rate del pci/0000:03:00.0/2nd_group

Rate values can be specified in bits or bytes per second (bit|bps), with
any SI (k, m, g, t) or IEC (ki, mi, gi, ti) prefix. Bare number means
bits per second. Units also printed in "show" command output, but not
necessarily the same which were specified with "set" or "add" command.
-i/--iec switch force output in IEC units. JSON output always print
values as bytes per sec.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-12 04:38:34 +00:00
Dmytro Linkin dedf895184 devlink: Add ISO/IEC switch
Add -i/--iec switch to print rate values using binary prefixes.
Update devlink(8) and devlink-rate(8) pages.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-12 04:38:13 +00:00
Dmytro Linkin 6c70aca76e devlink: Add port func rate support
Implement user commands to manage devlink port func rate objects.
List all rate commands:

    $ devlink port func rate help

or just

    $ devlink port func rate

To list all OR particular rate object:

    $ devlink port func rate show
    pci/0000:03:00.0/some_group: type node
    pci/0000:03:00.0/0: type leaf
    pci/0000:03:00.0/1: type leaf

    $ devlink prot func rate show pci/0000:03:00.0/1
    pci/0000:03:00.0/0: type leaf

    $ devlink prot func rate show pci/0000:03:00.0/some_group
    pci/0000:03:00.0/some_group: type node

Rate object of type "leaf" created by it's driver where name is the name
of corresponding devlink port. Rate object of type "node" represents
rate group created by the user using commands:

    $ devlink port func rate add pci/0000:03:00.0/some_group

or with defining tx rate limits

    $ devlink port func rate add pci/0000:03:00.0/some_group \
        tx_shara 10kbit tx_max 100mbit

NOTE: node name cannot be a decimal value because it conflicts with
devlink port indexes.

To delete node object:

    $ devlink port func rate del pci/0000:03:00.0/some_group

Set rate limits of existing rate object:

    $ devlink prot func rate set pci/0000:03:00.0/0 \
        tx_share 5MBps tx_max 25GBps

    $ devlink prot func rate set pci/0000:03:00.0/some_group \
        tx_share 0

Both SET and ADD commands accept any units of rates defined in IEC
60027-2 standard.

NOTE: rate value 0 means that rate is unlimited. Such value is also
ommited in show command output.

NOTE: In SHOW command output rate values will be printed with suffixes
as well, but in JSON output they are always units of Bps.

Set or unset parent of existing rate object:

    $ devlink prot func rate set pci/0000:03:00.0/0 parent some_group

    $ devlink port func rate set pci/0000:03:00.0/0 noparent

NOTE: Setting parent to empty ("") name due to kernel logic means unset
parent and shouldn't be used to avoid unexpected parent unsets.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-12 04:38:06 +00:00
Dmytro Linkin 95339955c5 devlink: Add helper function to validate object handler
Every handler argument validated in two steps, first of which, form
checking, expects identifier is few words separated by slashes.
For device and region handlers just checked if identifier have expected
number of slashes.
Add generic function to do that and make code cleaner & consistent.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-12 04:37:21 +00:00
David Ahern 85903c9a29 Update kernel headers
Update kernel headers to commit:
    76cf404c40ae ("Merge branch 'ipa-mem-2'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-11 02:38:23 +00:00
Parav Pandit fbd4b581cb devlink: Add optional controller user input
A user optionally provides the external controller number when user
wants to create devlink port for the external controller.

An example on eswitch system:
$ devlink dev eswitch set pci/0033:01:00.0 mode switchdev

$ devlink port show
pci/0033:01:00.0/196607: type eth netdev enP51p1s0f0np0 flavour physical port 0 splittable false
pci/0033:01:00.0/131072: type eth netdev eth0 flavour pcipf controller 1 pfnum 0 external true splittable false
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port add pci/0033:01:00.0 flavour pcisf pfnum 0 sfnum 77 controller 1
pci/0033:01:00.0/163840: type eth netdev eth1 flavour pcisf controller 1 pfnum 0 sfnum 77 external true splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-11 02:28:49 +00:00
Roi Dayan 0d5cf51e0d police: Add support for json output
Change to use the print wrappers instead of fprintf().

This is example output of the options part before this commit:

        "options": {
            "handle": 1,
            "in_hw": true,
            "actions": [ {
                    "order": 1 police 0x2 ,
                    "control_action": {
                        "type": "drop"
                    },
                    "control_action": {
                        "type": "continue"
                    }overhead 0b linklayer unspec
        ref 1 bind 1
,
                    "used_hw_stats": [ "delayed" ]
                } ]
        }

This is the output of the same dump with this commit:

        "options": {
            "handle": 1,
            "in_hw": true,
            "actions": [ {
                    "order": 1,
                    "kind": "police",
                    "index": 2,
                    "control_action": {
                        "type": "drop"
                    },
                    "control_action": {
                        "type": "continue"
                    },
                    "overhead": 0,
                    "linklayer": "unspec",
                    "ref": 1,
                    "bind": 1,
                    "used_hw_stats": [ "delayed" ]
                } ]
        }

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-11 02:28:36 +00:00
Eric Dumazet 52f136f640 tc: fq: add horizon attributes
Commit 39d010504e6b ("net_sched: sch_fq: add horizon attribute")
added kernel support for horizon attributes in linux-5.8

$ tc -s -d qd sh dev wlp2s0
qdisc fq 8006: root refcnt 2 limit 10000p flow_limit 100p buckets 1024 orphan_mask 1023 quantum 3028b initial_quantum 15140b low_rate_threshold 550Kbit refill_delay 40ms timer_slack 10us horizon 10s horizon_drop
 Sent 690924 bytes 3234 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  flows 112 (inactive 104 throttled 0)
  gc 0 highprio 0 throttled 2 latency 8.25us

$ tc qd change dev wlp2s0 root fq horizon 500ms horizon_cap

$ tc -s -d qd sh dev wlp2s0
qdisc fq 8006: root refcnt 2 limit 10000p flow_limit 100p buckets 1024 orphan_mask 1023 quantum 3028b initial_quantum 15140b low_rate_threshold 550Kbit refill_delay 40ms timer_slack 10us horizon 500ms horizon_cap
 Sent 831220 bytes 3844 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  flows 122 (inactive 120 throttled 0)
  gc 0 highprio 0 throttled 2 latency 8.25us

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-07 02:56:01 +00:00
Hangbin Liu 7ae2585b86 configure: convert LIBBPF environment variables to command-line options
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-03 03:25:59 +00:00
Hangbin Liu a9c3d70d90 configure: add options ability
There are more and more global environment variables that land everywhere
in configure, which is making user hard to know which one does what.
Using command-line options would make it easier for users to learn or
remember the config options.

This patch converts the INCLUDE variable to command option first. Check
if the first variable has '-' to compile with the old INCLUDE path
setting method.

Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-03 03:25:11 +00:00
Roman Mashak 9d9b1a84a5 ss: update ss man page
'-b' option allows to request BPF filter opcodes, however
currently the kernel returns only classic BPF filter, so
reflect this in man page.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-01 15:55:06 -07:00
Ariel Levkovich 825bd5dacb tc: f_flower: Add missing ct_state flags to usage description
Add ct_state flags rpl and inv to the commands usage
description

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-27 14:40:05 +00:00
Ariel Levkovich 7fda6c588a tc: f_flower: Add option to match on related ct state
Add support for matching on ct_state flag related.
The related state indicates a packet is associated with an existing
connection.

Example:
$ tc filter add dev ens1f0_0 ingress prio 1 chain 1 proto ip flower \
  ct_state -est-rel+trk \
  action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_0 ingress prio 1 chain 1 proto ip flower \
  ct_state +rel+trk \
  action mirred egress redirect dev ens1f0_1

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-27 14:39:14 +00:00
Florian Westphal d3740fdc26 libgenl: make genl_add_mcast_grp set errno on error
genl_add_mcast_grp doesn't set errno in all cases.

On kernels that support mptcp but lack event support (all kernels <= 5.11)
MPTCP_PM_EV_GRP_NAME won't be found and ip will exit with

    "can't subscribe to mptcp events: Success"

Set errno to a meaningful value (ENOENT) when the group name isn't found
and also cover other spots where it returns nonzero with errno unset.

Fixes: ff619e4fd3 ("mptcp: add support for event monitoring")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-05-17 11:59:37 -07:00
Heiko Thiery c5b72cc56b lib/fs: fix issue when {name,open}_to_handle_at() is not implemented
With commit d5e6ee0dac the usage of functions name_to_handle_at() and
open_by_handle_at() are introduced. But these function are not available
e.g. in uclibc-ng < 1.0.35. To have a backward compatibility check for the
availability in the configure script and in case of absence do a direct
syscall.

Fixes: d5e6ee0dac ("ss: introduce cgroup2 cache and helper functions")
Cc: Dmitry Yakunin <zeil@yandex-team.ru>
Cc: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Heiko Thiery <heiko.thiery@gmail.com>
Reviewed-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-17 02:31:29 +00:00
David Ahern 62c88ed940 config.mk: Rerun configure when it is newer than config.mk
config.mk needs to be re-generated any time configure is changed.
Rename the existing make target and add a check that the config.mk
file needs to exist and must be newer than configure script.

Signed-off-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Petr Vorel <petr.vorel@gmail.com>
Tested-by: Petr Vorel <petr.vorel@gmail.com>
2021-05-17 02:13:56 +00:00
Jakub Kicinski 49437375b6 ip: dynamically size columns when printing stats
This change makes ip -s -s output size the columns
automatically. I often find myself using json
output because the normal output is unreadable.
Even on a laptop after 2 days of uptime byte
and packet counters almost overflow their columns,
let alone a busy server.

For max readability switch to right align.

Before:

    RX: bytes  packets  errors  dropped missed  mcast
    8227918473 8617683  0       0       0       0
    RX errors: length   crc     frame   fifo    overrun
               0        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    691937917  4727223  0       0       0       0
    TX errors: aborted  fifo   window heartbeat transns
               0        0       0       0       10

After:

    RX:  bytes packets errors dropped  missed   mcast
    8228633710 8618408      0       0       0       0
    RX errors:  length    crc   frame    fifo overrun
                     0      0       0       0       0
    TX:  bytes packets errors dropped carrier collsns
     692006303 4727740      0       0       0       0
    TX errors: aborted   fifo  window heartbt transns
                     0      0       0       0      10

More importantly, with large values before:

    RX: bytes  packets  errors  dropped overrun mcast
    126570234447969 15016149200 0       0       0       0
    RX errors: length   crc     frame   fifo    missed
               0        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    126570234447969 15016149200 0       0       0       0
    TX errors: aborted  fifo   window heartbeat transns
               0        0       0       0       10

Note that in this case we have full shift by a column,
e.g. the value under "dropped" is actually for "errors" etc.

After:

    RX:       bytes     packets errors dropped  missed   mcast
    126570234447969 15016149200      0       0       0       0
    RX errors:           length    crc   frame    fifo overrun
                              0      0       0       0       0
    TX:       bytes     packets errors dropped carrier collsns
    126570234447969 15016149200      0       0       0       0
    TX errors:          aborted   fifo  window heartbt transns
                              0      0       0       0      10

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-09 22:51:59 +00:00
Paolo Lungaroni 02ca3aabe9 seg6: add counters support for SRv6 Behaviors
We introduce the "count" optional attribute for supporting counters in SRv6
Behaviors as defined in [1], section 6. For each SRv6 Behavior instance,
counters defined in [1] are:

 - the total number of packets that have been correctly processed;
 - the total amount of traffic in bytes of all packets that have been
   correctly processed;

In addition, we introduce a new counter that counts the number of packets
that have NOT been properly processed (i.e. errors) by an SRv6 Behavior
instance.

Each SRv6 Behavior instance can be configured, at the time of its creation,
to make use of counters specifing the "count" attribute as follows:

 $ ip -6 route add 2001:db8::1 encap seg6local action End count dev eth0

per-behavior counters can be shown by adding "-s" to the iproute2 command
line, i.e.:

 $ ip -s -6 route show 2001:db8::1
 2001:db8::1 encap seg6local action End packets 0 bytes 0 errors 0 dev eth0

[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-counters

v2:
 - add help and route.8 man page updates

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Paolo Lungaroni <paolo.lungaroni@uniroma2.it>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-09 22:20:59 +00:00
Andrea Claudi e44786b269 tc: htb: improve burst error messages
When a wrong value is provided for "burst" or "cburst" parameters, the
resulting error message is unclear and can be misleading:

$ tc class add dev dummy0 parent 1: classid 1:1 htb rate 100KBps burst errtrigger
Illegal "buffer"

The message claims an illegal "buffer" is provided, but neither the
inline help nor the man page list "buffer" among the htb parameters, and
the only way to know that "burst", "maxburst" and "buffer" are synonyms
is to look into tc/q_htb.c.

This commit tries to improve this simply changing the error string to
the parameter name provided in the user-given command, clearly pointing
out where the wrong value is.

$ tc class add dev dummy0 parent 1: classid 1:1 htb rate 100KBps burst errtrigger
Illegal "burst"

$ tc class add dev dummy0 parent 1: classid 1:1 htb rate 100Kbps maxburst errtrigger
Illegal "maxburst"

Reported-by: Sebastian Mitterle <smitterl@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-09 22:13:22 +00:00
Andrea Claudi 28ee49e515 tipc: bail out if key is abnormally long
tipc segfaults when called with an abnormally long key:

$ tipc node set key 0123456789abcdef0123456789abcdef0123456789abcdef
*** buffer overflow detected ***: terminated

Fix this returning an error if key length is longer than
TIPC_AEAD_KEYLEN_MAX.

Fixes: 24bee3bf97 ("tipc: add new commands to set TIPC AEAD key")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-09 22:08:47 +00:00
Andrea Claudi 93c267bfb4 tipc: bail out if algname is abnormally long
tipc segfaults when called with an abnormally long algname:

$ tipc node set key 0x1234 algname supercalifragilistichespiralidososupercalifragilistichespiralidoso
*** buffer overflow detected ***: terminated

Fix this returning an error if provided algname is longer than
TIPC_AEAD_ALG_NAME.

Fixes: 24bee3bf97 ("tipc: add new commands to set TIPC AEAD key")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-09 22:08:47 +00:00
Hoang Le 459f280813 tipc: call a sub-routine in separate socket
When receiving a result from first query to netlink, we may exec
a another query inside the callback. If calling this sub-routine
in the same socket, it will be discarded the result from previous
exection.
To avoid this we perform a nested query in separate socket.

Fixes: 2021028306 ("tipc: use the libmnl functions in lib/mnl_utils.c")
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-09 22:08:47 +00:00
Tyson Moore 0d95472a4b tc-cake: update docs to include LE diffserv
Linux kernel commit b8392808eb3fc28e ("sch_cake: add RFC 8622 LE PHB
support to CAKE diffserv handling") added packets with LE diffserv to
the Bulk priority tin. Update the documentation to reflect this change.

Signed-off-by: Tyson Moore <tyson@tyson.me>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-06 14:59:52 +00:00
Andrea Claudi 2d212aae55 dcb: fix memory leak
main() dinamically allocates dcb, but when dcb_help() is called it
returns without freeing it.

Fix this using a goto, as it is already done in the same function.

Fixes: 67033d1c1c ("Add skeleton of a new tool, dcb")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-06 14:48:02 +00:00
Andrea Claudi cfd89a6f8b dcb: fix return value on dcb_cmd_app_show
dcb_cmd_app_show() is supposed to return EINVAL if an incorrect argument
is provided.

Fixes: 8e9bed1493 ("dcb: Add a subtool for the DCB APP object")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-06 14:47:57 +00:00
Andrea Claudi 3296d4fe77 lib: bpf_legacy: avoid to pass invalid argument to close()
In function bpf_obj_open, if bpf_fetch_prog_arg() return an error, we
end up in the out: path with a negative value for fd, and pass it to
close.

Avoid this checking for fd to be positive.

Fixes: 32e93fb7f6 ("{f,m}_bpf: allow for sharing maps")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-06 14:43:54 +00:00
Andrea Claudi a2f1f66075 tc: q_ets: drop dead code from argument parsing
Checking for nbands to be at least 1 at this point is useless. Indeed:
- ets requires "bands", "quanta" or "strict" to be specified
- if "bands" is specified, nbands cannot be negative, see parse_nbands()
- if "strict" is specified, nstrict cannot be negative, see
  parse_nbands()
- if "quantum" is specified, nquanta cannot be negative, see
  parse_quantum()
- if "bands" is not specified, nbands is set to nstrict+nquanta
- the previous if statement takes care of the case when none of them are
  specified and nbands is 0, terminating execution.

Thus nbands cannot be < 1 at this point and this code cannot be executed.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-06 14:42:44 +00:00
Jakub Kicinski 570d2cf0ec ip: align the name of the 'nohandler' stat
Before:

    RX: bytes  packets  errors  dropped missed  mcast
    8848233056 8548168  0       0       0       0
    RX errors: length   crc     frame   fifo    overrun   nohandler
               0        0       0       0       0       101
    TX: bytes  packets  errors  dropped carrier collsns compressed
    1142925945 4683483  0       0       0       0       101
    TX errors: aborted  fifo   window heartbeat transns
               0        0       0       0       14

After:

    RX: bytes  packets  errors  dropped missed  mcast
    8848297833 8548461  0       0       0       0
    RX errors: length   crc     frame   fifo    overrun nohandler
               0        0       0       0       0       101
    TX: bytes  packets  errors  dropped carrier collsns compressed
    1143049820 4683865  0       0       0       0       101
    TX errors: aborted  fifo   window heartbeat transns
               0        0       0       0       14

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-06 14:41:19 +00:00
David Ahern c3f852754f Update kernel headers
Update kernel headers to commit:
    8621436671f3 ("smc: disallow TCP_ULP in smc_setsockopt()")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-06 14:16:04 +00:00
David Ahern c79fcefaaf Merge branch 'rdma-copy-on-fork' into next
Gal Pressman  says:

====================

This is the userspace part for the new copy-on-fork attribute added to
the get sys netlink command.

The new attribute indicates that the kernel copies DMA pages on fork,
hence fork support through madvise and MADV_DONTFORK is not needed.

Kernel series was merged:
https://lore.kernel.org/linux-rdma/20210418121025.66849-1-galpress@amazon.com/

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-03 14:45:19 +00:00
Gal Pressman bce4247869 rdma: Add copy-on-fork to get sys command
The new attribute indicates that the kernel copies DMA pages on fork,
hence fork support through madvise and MADV_DONTFORK is not needed.

If the attribute is not reported (expected on older kernels),
copy-on-fork is disabled.

Example:
$ rdma sys
netns shared copy-on-fork on

Signed-off-by: Gal Pressman <galpress@amazon.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-03 14:43:13 +00:00
Gal Pressman 212e2c1d0c rdma: update uapi headers
Update rdma_netlink.h file upto kernel commit
6cc9e215eb27 ("RDMA/nldev: Add copy-on-fork attribute to get sys command")

Signed-off-by: Gal Pressman <galpress@amazon.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-03 14:43:06 +00:00
Jianguo Wu 7f1d58d1a1 mptcp: make sure flag signal is set when add addr with port
When add address with port, it is mean to send an ADD_ADDR to remote,
so it must have flag signal set.

Fixes: 42fbca91cd ("mptcp: add support for port based endpoint")
Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-30 14:30:24 +00:00
David Ahern e1e089d1f2 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-28 15:48:28 +00:00
Jethro Beekman d56dcd3549 ip: Add nodst option to macvlan type source
The default behavior for source MACVLAN is to duplicate packets to
appropriate type source devices, and then do the normal destination MACVLAN
flow. This patch adds an option to skip destination MACVLAN processing if
any matching source MACVLAN device has the option set.

This allows setting up a "catch all" device for source MACVLAN: create one
or more devices with type source nodst, and one device with e.g. type vepa,
and incoming traffic will be received on exactly one device.

Signed-off-by: Jethro Beekman <kernel@jbeekman.nl>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-28 15:45:59 +00:00
David Ahern e5f1505e53 Merge branch 'rdma-resource-tracking' into next
Leon Romanovsky  says:

====================

This is the user space part of already accepted to the kernel series
that extends RDMA netlink interface to return uverbs context and SRQ
information.

The accepted kernel series can be seen here:
https://lore.kernel.org/linux-rdma/20210422133459.GA2390260@nvidia.com/

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-28 15:37:32 +00:00
Neta Ostrovsky 9b272e138d rdma: Add SRQ resource tracking information
Sample output:

$ rdma res show srq
dev ibp8s0f0 srqn 0 type BASIC pdn 3 comm [ib_ipoib]
dev ibp8s0f0 srqn 4 type BASIC lqpn 125-128,130-140 pdn 9 pid 3581 comm ibv_srq_pingpon
dev ibp8s0f0 srqn 5 type BASIC lqpn 141-156 pdn 10 pid 3584 comm ibv_srq_pingpon
dev ibp8s0f0 srqn 6 type BASIC lqpn 157-172 pdn 11 pid 3590 comm ibv_srq_pingpon
dev ibp8s0f1 srqn 0 type BASIC pdn 3 comm [ib_ipoib]
dev ibp8s0f1 srqn 1 type BASIC lqpn 329-344 pdn 4 pid 3586 comm ibv_srq_pingpon

$ rdma res show srq lqpn 126-141
dev ibp8s0f0 srqn 4 type BASIC lqpn 126-128,130-140 pdn 9 pid 3581 comm ibv_srq_pingpon
dev ibp8s0f0 srqn 5 type BASIC lqpn 141 pdn 10 pid 3584 comm ibv_srq_pingpon

$ rdma res show srq lqpn 127
dev ibp8s0f0 srqn 4 type BASIC lqpn 127 pdn 9 pid 3581 comm ibv_srq_pingpon

Reviewed-by: Ido Kalir <idok@nvidia.com>
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-28 15:37:16 +00:00
Neta Ostrovsky 4278941285 rdma: Add context resource tracking information
Sample output:

$ rdma res show ctx
dev ibp8s0f0 ctxn 0 pid 980 comm ibv_rc_pingpong
dev ibp8s0f0 ctxn 1 pid 981 comm ibv_rc_pingpong
dev ibp8s0f0 ctxn 2 pid 992 comm ibv_rc_pingpong
dev ibp8s0f1 ctxn 0 pid 984 comm ibv_rc_pingpong
dev ibp8s0f1 ctxn 1 pid 987 comm ibv_rc_pingpong

$ rdma res show ctx dev ibp8s0f1
dev ibp8s0f1 ctxn 0 pid 984 comm ibv_rc_pingpong
dev ibp8s0f1 ctxn 1 pid 987 comm ibv_rc_pingpong

Reviewed-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Ido Kalir <idok@nvidia.com>
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-28 15:36:59 +00:00
Neta Ostrovsky 4c61b5b9df rdma: Update uapi headers
Update rdma_netlink.h file upto kernel commit c6c11ad3ab9f
("RDMA/nldev: Add QP numbers to SRQ information")

Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-28 15:36:21 +00:00
David Ahern a5ea744ca2 Update kernel headers
Update kernel headers to commit:
    99ba0ea616aa ("sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-28 15:35:30 +00:00
Stephen Hemminger 2363bc99f9 Merge git://git.kernel.org/pub/scm/network/iproute2/iproute2-next
Required manual fix of devlink/devlink.c

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-27 19:39:39 -07:00
Stephen Hemminger 1fdea28051 v5.12.0 2021-04-27 11:59:09 -07:00
Stephen Hemminger a3fb3fcb7d remove trailing whitespace
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-27 11:55:53 -07:00
Andrea Claudi e1ad689545 lib: bpf_legacy: fix missing socket close when connect() fails
In functions bpf_{send,recv}_map_fds(), when connect fails after a
socket is successfully opened, we return with error missing a close on
the socket.

Fix this closing the socket if opened and using a single return point
for both the functions.

Fixes: 6256f8c9e4 ("tc, bpf: finalize eBPF support for cls and act front-end")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-26 21:05:19 -07:00
Andrea Claudi 92af24c907 lib: bpf_legacy: treat 0 as a valid file descriptor
As stated in the man page(), open returns a non-negative integer as a
file descriptor. Hence, when checking for its return value to be ok, we
should include 0 as a valid value.

This fixes a covscan warning about a missing close() in this function.

Fixes: ecb05c0f99 ("bpf: improve error reporting around tail calls")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-26 21:05:19 -07:00
Andrea Claudi 932fe3453f tc: e_bpf: fix memory leak in parse_bpf()
envp_run is dinamically allocated with a malloc, and not freed in the
out: return path. This commit fix it.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-26 21:05:19 -07:00
Andrea Claudi 38ef5bb7b4 ip: netns: fix missing netns close on some error paths
In functions netns_pids() and netns_identify_pid(), the netns file is
not closed on some error paths.

Fix this using a conditional close and a single return point on both
functions.

Fixes: 44b563269e ("ip-nexthop: support flush by id")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-26 21:04:02 -07:00
Nikolay Aleksandrov c72de3713d bridge: vlan: dump port only if there are any vlans
When I added support for new vlan rtm dumping, I made a mistake in the
output format when there are no vlans on the port. This patch fixes it by
not printing ports without vlan entries (similar to current situation).

Example (no vlans):
$ bridge -d vlan show
port              vlan-id

Fixes: e5f87c8341 ("bridge: vlan: add support for the new rtm dump call")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-26 02:32:46 +00:00
Tony Ambardar e705b19d48 ip: drop 2-char command assumption
The 'ip' utility hardcodes the assumption of being a 2-char command, where
any follow-on characters are passed as an argument:

  $ ./ip-full help
  Object "-full" is unknown, try "ip help".

This confusing behaviour isn't seen with 'tc' for example, and was added in
a 2005 commit without documentation. It was noticed during testing of 'ip'
variants built/packaged with different feature sets (e.g. w/o BPF support).

Mitigate the problem by redoing the command without the 2-char assumption
if the follow-on characters fail to parse as a valid command.

Fixes: 351efcde4e ("Update header files to 2.6.14")
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-26 02:29:42 +00:00
Stephen Hemminger b5a6ed9cc9 uapi: add missing virtio related headers
The build of iproute2 relies on having correct copy of santized
kernel headers. The vdpa utility introduced a dependency on
the vdpa related headers, but these headers were not present
in iproute2 repo.

Fixes: c2ecc82b9d ("vdpa: Add vdpa tool")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-23 10:36:17 -07:00
Andrea Claudi 81bfd01a4c lib: move get_task_name() from rdma
The function get_task_name() is used to get the name of a process from
its pid, and its implementation is similar to ip/iptuntap.c:pid_name().

Move it to lib/fs.c to use a single implementation and make it easily
reusable.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:22:16 +00:00
David Ahern 75a35d50a3 Merge branch 'bridge-vlan' into next
Nikolay Aleksandrov  says:

====================

From: Nikolay Aleksandrov <nikolay@nvidia.com>

This set extends the bridge vlan code to use the new vlan RTM calls
which allow to dump detailed per-port, per-vlan information and also to
manipulate the per-vlan options. It also allows to monitor any vlan
changes (add/del/option change). The rtm vlan dumps have an extensible
format which allows us to add new options and attributes easily, and
also to request the kernel to filter on different vlan information when
dumping. The new kernel dump code tries to use compressed vlan format as
much as possible (it includes netlink attributes for vlan start and
end) to reduce the number of generated messages and netlink traffic.
The iproute2 support is activated by using the "-d" flag when showing
vlan information, that will cause it to use the new rtm dump call and
get all the detailed information, if "-s" is also specified it will dump
per-vlan statistics as well. Obviously in that case the vlans cannot be
compressed. To change per-vlan options (currently only STP state is
supported) a new vlan command is added - "set". It can be used to set
options of bridge or port vlans and vlan ranges can be used, all of the
new vlan option code uses extack to show more understandable errors.
The set adds the first supported per-vlan option - STP state.
Man pages and usage information are updated accordingly.

Example:
 $ bridge -d vlan show
 port              vlan-id
 ens13             1 PVID Egress Untagged
                     state forwarding
 bridge            1 PVID Egress Untagged
                     state forwarding

 $ bridge vlan set vid 1 dev ens13 state blocking
 $ bridge -d vlan show
 port              vlan-id
 ens13             1 PVID Egress Untagged
                     state blocking
 bridge            1 PVID Egress Untagged
                     state forwarding

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:20:13 +00:00
Nikolay Aleksandrov c311404780 bridge: monitor: add support for vlan monitoring
Add support for vlan activity monitoring, we display vlan notifications on
vlan add/del/options change. The man page and help are also updated
accordingly.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:13:39 +00:00
Nikolay Aleksandrov e5f87c8341 bridge: vlan: add support for the new rtm dump call
Use the new bridge vlan rtm dump helper to dump all of the available
vlan information when -details (-d) is used with vlan show. It is also
capable of dumping vlan stats if -statistics (-s) is added.
Currently this is the only interface capable of dumping per-vlan
options. The vlan dump format is compatible with current vlan show, it
uses the same helpers to dump vlan information. The new addition is one
line which will contain the per-vlan options (similar to ip -d link show
for ports). Currently only the vlan STP state is printed.
The call uses compressed vlan format by default.

Example:
$ bridge -s -d vlan show
port              vlan-id
virbr1            1 PVID Egress Untagged
                    state forwarding

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:13:34 +00:00
Nikolay Aleksandrov 34c14bea22 libnetlink: add bridge vlan dump request helper
Add rtnl bridge vlan dump request helper which will be used to retrieve
bridge vlan information and options.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:13:29 +00:00
Nikolay Aleksandrov 04e2783d5e bridge: vlan: add option set command and state option
Add a new per-vlan option set command. It allows to manipulate vlan
options, those can be bridge-wide or per-port depending on what device
is specified. The first option that can be set is the vlan STP state,
it is identical to the bridge port STP state. The man page is also
updated accordingly.

Example:
 $ bridge vlan set vid 10 dev br0 state learning
or a range:
 $ bridge vlan set vid 10-20 dev swp1 state blocking

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:13:24 +00:00
Nikolay Aleksandrov f2f52fcabe bridge: add parse_stp_state helper
Add a helper which parses an STP state string to its numeric value.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:13:20 +00:00
Nikolay Aleksandrov f07516c3b0 bridge: rename and export print_portstate
Rename print_portstate to print_stp_state in preparation for use by vlan
code as well (per-vlan state), and export it. To be in line with the new
naming rename also port_states to stp_states as they'll be used for
vlans, too.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:13:09 +00:00
Florian Westphal ff619e4fd3 mptcp: add support for event monitoring
This adds iproute2 support for mptcp event monitoring, e.g. creation,
establishment, address announcements from the peer, subflow establishment
and so on.

While the kernel-generated events are primarily aimed at mptcpd (e.g. for
subflow management), this is also useful for debugging.

This adds print support for the existing events.

Sample output of 'ip mptcp monitor':
[       CREATED] token=83f3a692 remid=0 locid=0 saddr4=10.0.1.2 daddr4=10.0.1.1 sport=58710 dport=10011
[   ESTABLISHED] token=83f3a692 remid=0 locid=0 saddr4=10.0.1.2 daddr4=10.0.1.1 sport=58710 dport=10011
[SF_ESTABLISHED] token=83f3a692 remid=0 locid=1 saddr4=10.0.2.2 daddr4=10.0.1.1 sport=40195 dport=10011 backup=0
[        CLOSED] token=83f3a692

Signed-off-by: Florian Westphal <fw@strlen.de>
2021-04-22 05:10:25 +00:00
David Ahern 98040c2dc1 Update kernel headers
Update kernel headers to commit:
    5d869070569a ("net: phy: marvell: don't use empty switch default case")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:09:39 +00:00
Andrea Claudi c8216fabe8 rdma: stat: fix return code
libmnl defines MNL_CB_OK as 1 and MNL_CB_ERROR as -1. rdma uses these
return codes, and stat_qp_show_parse_cb() should do the same.

Fixes: 16ce4d2366 ("rdma: stat: initialize ret in stat_qp_show_parse_cb()")
Reported-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-20 18:08:38 -07:00
Andrea Claudi 16ce4d2366 rdma: stat: initialize ret in stat_qp_show_parse_cb()
In the unlikely case in which the mnl_attr_for_each_nested() cycle is
not executed, this function return an uninitialized value.

Fix this initializing ret to 0.

Fixes: 5937552b42 ("rdma: Add "stat qp show" support")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-13 19:16:55 -07:00
Andrea Claudi 6a2c51da99 nexthop: fix memory leak in add_nh_group_attr()
grps is dinamically allocated with a calloc, and not freed in a return
path in the for cycle. This commit fix it.

While at it, make the function use a single return point.

Fixes: 63df8e8543 ("Add support for nexthop objects")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-13 19:16:55 -07:00
Andrea Claudi 6801ae8273 q_cake: remove useless check on argv
In cake_parse_opt(), *argv is checked not to be null when parsing for
overhead and mpu parameters. However this is useless, since *argv
matches right before for "overhead" or "mpu".

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-13 19:16:55 -07:00
Andrea Claudi 6b8fa2ea2d devlink: always check strslashrsplit() return value
strslashrsplit() return value is not checked in __dl_argv_handle(),
despite the fact that it can return EINVAL.

This commit fix it and make __dl_argv_handle() return error if
strslashrsplit() return an error code.

Fixes: 2f85a9c535 ("devlink: allow to parse both devlink and port handle in the same time")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-13 19:16:55 -07:00
Stephen Hemminger cc718c191b uapi: update can.h
Upstream commit to force packing on ARM OABI

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-13 19:14:34 -07:00
Stephen Hemminger 06d0bbf1ee erspan: fix JSON output
The format for erspan/erspan6 output is not valid JSON, as on version 2 a
valueless key was presented. The direction should be value and erspan_dir
should be the key.

Fixes: 2897636267 ("erspan: add erspan version II support")
Cc: u9012063@gmail.com
Reported-by: Christian Pössinger <christian@poessinger.com>
Signed-off-by: Christian Pössinger <christian@poessinger.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-10 09:52:48 -07:00
Chunmei Xu 44b563269e ip-nexthop: support flush by id
since id is unique for nexthop, it is heavy to dump all nexthops.
use existing delete_nexthop to support flush by id

Signed-off-by: Chunmei Xu <xuchunmei@linux.alibaba.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-08 15:38:58 +00:00
Hoang Le 2021028306 tipc: use the libmnl functions in lib/mnl_utils.c
To avoid code duplication, tipc should be converted to use the helper
functions for working with libmnl in lib/mnl_utils.c

Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-03 01:10:54 +00:00
Stephen Hemminger e77a0d3dc9 uapi: bpf.h update from upstream
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-30 16:38:05 -07:00
Baowen Zheng cf9ae1bd31 police: add support for packet-per-second rate limiting
Allow a policer action to enforce a rate-limit based on packets-per-second,
configurable using a packet-per-second rate and burst parameters.

e.g.
 # $TC actions add action police pkts_rate 1000 pkts_burst 200 index 1
 # $TC actions ls action police
 total acts 1

	action order 0:  police 0x1 rate 0bit burst 0b mtu 4096Mb pkts_rate 1000 pkts_burst 200
	ref 1 bind 0

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-30 03:04:50 +00:00
Cooper Lees 16430e9afd Add Open/R to rt_protos
- Open Routing is using ID 99 for it's installed routes
- https://github.com/facebook/openr
- Kernel has accepted 99 in `rtnetlink.h`

Signed-of-by: Cooper Lees <me@cooperlees.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-30 03:04:09 +00:00
Petr Machata 7384c15e0e ip: Fix batch processing
After the comment cited below, batch mode neglects to set the global
variable batch_mode to a non-zero value. Netns and VRF commands use this
variable, and break in batch mode. Fix by setting the value again.

Fixes: 1d9a81b8c9 ("Unify batch processing across tools")
Reported-by: Tim Rice <trice@posteo.net>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-22 16:30:21 -07:00
David Ahern 76bfc185f2 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-21 17:16:01 +00:00
Sabrina Dubroca 3c75135835 ip: xfrm: add support for tfcpad
This patch adds support for setting and displaying the Traffic Flow
Confidentiality attribute for an XFRM state, which allows padding ESP
packets to a specified length.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-21 17:15:07 +00:00
Stephen Hemminger 872689d431 uapi: minor header update for l2tp
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-20 09:36:07 -07:00
Stephen Hemminger 87d6d395d1 README: remove doc instructions
The out of date documentation was removed in 2017, but the instructions
in the README were not removed.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-20 09:29:02 -07:00
David Ahern fa505da84b Merge branch 'nexthop-resilient-hash' into next
Petr Machata  says:

====================

Support for resilient next-hop groups was recently accepted to Linux
kernel[1]. Resilient next-hop groups add a layer of indirection between the
SKB hash and the next hop. Thus the hash is used to reference a hash table
bucket, which is then used to reference a particular next hop. This allows
the system more flexibility when assigning SKB hash space to next hops.
Previously, each next hop had to be assigned a continuous range of SKB hash
space. With a hash table as an intermediate layer, it is possible to
reassign next hops with a hash table bucket granularity. In turn, this
mends issues with traffic flow redirection resulting from next hop removal
or adjustments in next-hop weights.

In this patch set, introduce support for resilient next-hop groups to
iproute2.

- Patch #1 brings include/uapi/linux/nexthop.h and /rtnetlink.h up to date.

- Patches #2 and #3 add new helpers that will be useful later.

- Patch #4 extends the ip/nexthop sub-tool to accept group type as a
  command line argument, and to dispatch based on the specified type.

- Patch #5 adds the support for resilient next-hop groups.

- Patch #6 adds the support for resilient next-hop group bucket interface.

To illustrate the usage, consider the following commands:

 # ip nexthop add id 1 via 192.0.2.2 dev dummy1
 # ip nexthop add id 2 via 192.0.2.3 dev dummy1
 # ip nexthop add id 10 group 1/2 type resilient \
	buckets 8 idle_timer 60 unbalanced_timer 300

The last command creates a resilient next-hop group. It will have 8
buckets, each bucket will be considered idle when no traffic hits it for at
least 60 seconds, and if the table remains out of balance for 300 seconds,
it will be forcefully brought into balance.

And this is how the next-hop group bucket interface looks:

 # ip nexthop bucket show id 10
 id 10 index 0 idle_time 5.59 nhid 1
 id 10 index 1 idle_time 5.59 nhid 1
 id 10 index 2 idle_time 8.74 nhid 2
 id 10 index 3 idle_time 8.74 nhid 2
 id 10 index 4 idle_time 8.74 nhid 1
 id 10 index 5 idle_time 8.74 nhid 1
 id 10 index 6 idle_time 8.74 nhid 1
 id 10 index 7 idle_time 8.74 nhid 1

[1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=2a0186a37700b0d5b8cc40be202a62af44f02fa2

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-19 15:05:29 +00:00
Ido Schimmel 2be6d18b30 nexthop: Add support for nexthop buckets
Add ability to dump multiple nexthop buckets and get a specific one.
Example:

 # ip nexthop add id 10 group 1/2 type resilient buckets 8
 # ip nexthop
 id 1 via 192.0.2.2 dev dummy10 scope link
 id 2 via 192.0.2.19 dev dummy20 scope link
 id 10 group 1/2 type resilient buckets 8 idle_timer 120 unbalanced_timer 0 unbalanced_time 0
 # ip nexthop bucket
 id 10 index 0 idle_time 28.1 nhid 2
 id 10 index 1 idle_time 28.1 nhid 2
 id 10 index 2 idle_time 28.1 nhid 2
 id 10 index 3 idle_time 28.1 nhid 2
 id 10 index 4 idle_time 28.1 nhid 1
 id 10 index 5 idle_time 28.1 nhid 1
 id 10 index 6 idle_time 28.1 nhid 1
 id 10 index 7 idle_time 28.1 nhid 1
 # ip nexthop bucket show nhid 1
 id 10 index 4 idle_time 53.59 nhid 1
 id 10 index 5 idle_time 53.59 nhid 1
 id 10 index 6 idle_time 53.59 nhid 1
 id 10 index 7 idle_time 53.59 nhid 1
 # ip nexthop bucket get id 10 index 5
 id 10 index 5 idle_time 81 nhid 1
 # ip -j -p nexthop bucket get id 10 index 5
 [ {
         "id": 10,
         "bucket": {
             "index": 5,
             "idle_time": 104.89,
             "nhid": 1
         },
         "flags": [ ]
     } ]

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-19 15:01:25 +00:00
Ido Schimmel 9167671822 nexthop: Add support for resilient nexthop groups
Add ability to configure resilient nexthop groups and show their current
configuration. Example:

 # ip nexthop add id 10 group 1/2 type resilient buckets 8
 # ip nexthop show id 10
 id 10 group 1/2 type resilient buckets 8 idle_timer 120 unbalanced_timer 0
 # ip -j -p nexthop show id 10
 [ {
         "id": 10,
         "group": [ {
                 "id": 1
             },{
                 "id": 2
             } ],
         "type": "resilient",
         "resilient_args": {
             "buckets": 8,
             "idle_timer": 120,
             "unbalanced_timer": 0
         },
         "flags": [ ]
     } ]

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-19 15:01:18 +00:00
Ido Schimmel b82d6b81fa nexthop: Add ability to specify group type
Next patches are going to add a 'resilient' nexthop group type, so allow
users to specify the type using the 'type' argument. Currently, only
'mpath' type is supported.

These two commands are equivalent:

 # ip nexthop add id 10 group 1/2/3
 # ip nexthop add id 10 group 1/2/3 type mpath

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-19 15:00:49 +00:00
Petr Machata 28fb925d8b nexthop: Extract a helper to parse a NH ID
NH ID extraction is a common operation, and will become more common still
with the resilient NH groups support. Add a helper that does what it
usually done and returns the parsed NH ID.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-19 15:00:43 +00:00
Petr Machata e757f741e9 json_print: Add print_tv()
Add a helper to dump a timeval. Print by first converting to double and
then dispatching to print_color_float().

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-19 15:00:08 +00:00
David Ahern a5b355c08c Update kernel headers
Update kernel headers to commit:
    38cb57602369 ("selftests: net: forwarding: Fix a typo")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-19 14:59:17 +00:00
Stephen Hemminger 6639fce430 ip: cleanup help message text
Wrap help message text at 80 characters, and put list of things
in alpha order.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-18 11:24:06 -07:00
Tony Ambardar 06bee37c1c lib/bpf: add missing limits.h includes
Several functions in bpf_glue.c and bpf_libbpf.c rely on PATH_MAX, which is
normally included from <limits.h> in other iproute2 source files.

It fixes errors seen using gcc 10.2.0, binutils 2.35.1 and musl 1.1.24:

bpf_glue.c: In function 'get_libbpf_version':
bpf_glue.c:46:11: error: 'PATH_MAX' undeclared (first use in this function);
did you mean 'AF_MAX'?
   46 |  char buf[PATH_MAX], *s;
      |           ^~~~~~~~
      |           AF_MAX

Reported-by: Rui Salvaterra <rsalvaterra@gmail.com>
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-16 22:53:53 -07:00
Sabrina Dubroca 6050055387 ip: xfrm: limit the length of the security context name when printing
Security context names are not guaranteed to be NUL-terminated by the
kernel, so we can't just print them using %s directly. The length of
the string is determined by sctx->ctx_len, so we can use that to limit
what fprintf outputs.

While at it, factor that out to a separate function, since the exact
same code is used to print the security context for both policies and
states.

Fixes: b2bb289a57 ("xfrm security context support")
Reported-by: Paul Wouters <pwouters@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-16 22:53:28 -07:00
David Ahern 27ca8989c1 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-15 15:08:01 +00:00
Toke Høiland-Jørgensen 60204c81e4 q_cake: Fix incorrect printing of signed values in class statistics
The deficit returned from the kernel is signed, but was printed with a %u
specifier in the format string, leading to negative values to be printed as
high unsigned values instead. In addition, we passed a negative value to
sprint_time() even though that expects an unsigned value. Fix this by
changing the format specifier and reversing the sign of negative time
values.

Fixes: 714444c0cb ("Add support for CAKE qdisc")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-08 19:05:19 -08:00
Roi Dayan 9f366536ed dcb: Fix compilation warning about reallocarray
In older distros we need bsd/stdlib.h but newer distro doesn't
need it. Also old distro will need libbsd-devel installed and newer
doesn't. To remove a possible dependency on libbsd-devel replace usage
of reallocarray to realloc.

dcb_app.c: In function ‘dcb_app_table_push’:
dcb_app.c:68:25: warning: implicit declaration of function ‘reallocarray’; did you mean ‘realloc’?

Fixes: 8e9bed1493 ("dcb: Add a subtool for the DCB APP object")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-03 18:56:39 -08:00
Luca Boccassi 6739068fb0 iproute: fix printing resolved localhost
format_host_rta_r might return a cached hostname
via its return value and not use the input buffer.

Before:

$ ip -resolve -6 route
 dev lo proto kernel metric 256 pref medium

After:

$ ip/ip -resolve -6 route
localhost dev lo proto kernel metric 256 pref medium

Bug-Debian: https://bugs.debian.org/983591

Reported-by: Axel Scheepers <axel.scheepers76@gmail.com>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-03 18:54:16 -08:00
Parav Pandit c54e7bd605 devlink: Add error print when unknown values specified
When user specifies either unknown flavour or unknown state during
devlink port commands, return appropriate error message.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-03 04:00:16 +00:00
Parav Pandit 62ff25e51b devlink: Use generic socket helpers from library
User generic socket helpers from library for netlink generic socket
access.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-03 04:00:10 +00:00
Parav Pandit e3a4067e52 utils: Introduce helper routines for generic socket recv
Introduce helper for generic socket receive helper and introduce helper
to build command with custom family and version.

Use API in subsequent devlink patch.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-03 04:00:04 +00:00
Parav Pandit 03662000e4 devlink: Use library provided string processing APIs
User helper routines provided by library for counting slash and
splitting string on delimiter.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-03 03:59:58 +00:00
Paolo Abeni 42fbca91cd mptcp: add support for port based endpoint
The feature is supported by the kernel since 5.11-net-next,
let's allow user-space to use it.

Just parse and dump an additional, per endpoint, u16 attribute

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-01 00:15:10 +00:00
David Ahern 455c9f5361 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-01 00:07:57 +00:00
Stephen Hemminger 9d00602f82 vdpa: add .gitignore
Ignore the resulting binary vdpa.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-23 23:12:14 -08:00
Stephen Hemminger 5e0e73c347 Update kernel headers from 5.12-pre rc
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-23 23:11:12 -08:00
Stephen Hemminger 52c5f3f043 Merge git://git.kernel.org/pub/scm/network/iproute2/iproute2-next 2021-02-23 23:03:42 -08:00
Stephen Hemminger bbddfcec6c v5.11.0 2021-02-23 09:34:11 -08:00
Andrea Claudi b2d44b9a95 lib/fs: Fix single return points for get_cgroup2_*
Functions get_cgroup2_id() and get_cgroup2_path() may call close() with
a negative argument.
Avoid that making the calls conditional on the file descriptors.

get_cgroup2_path() may also return NULL leaking a file descriptor.
Ensure this does not happen using a single return point.

Fixes: d5e6ee0dac ("ss: introduce cgroup2 cache and helper functions")
Fixes: 8f1cd119b3 ("lib: fix checking of returned file handle size for cgroup")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 18:20:44 -08:00
Andrea Claudi 1de363b180 lib/fs: avoid double call to mkdir on make_path()
make_path() function calls mkdir two times in a row. The first one it
stores mkdir return code, and then it calls it again to check for errno.

This seems unnecessary, as we can use the return code from the first
call and check for errno if not 0.

Fixes: ac3415f5c1 ("lib/fs: Fix and simplify make_path()")
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 18:20:44 -08:00
Andrea Claudi d4fcdbbec9 lib/bpf: Fix and simplify bpf_mnt_check_target()
As stated in commit ac3415f5c1 ("lib/fs: Fix and simplify make_path()"),
calling stat() before mkdir() is racey, because the entry might change in
between.

As the call to stat() seems to only check for target existence, we can
simply call mkdir() unconditionally and catch all errors but EEXIST.

Fixes: 95ae9a4870 ("bpf: fix mnt path when from env")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
2021-02-22 18:19:01 -08:00
Andrea Claudi 1e25de9a92 lib/namespace: fix ip -all netns return code
When ip -all netns {del,exec} are called and no netns is present, ip
exit with status 0. However this does not happen if no netns has been
created since boot time: in that case, indeed, the NETNS_RUN_DIR is not
present and netns_foreach() exit with code 1.

$ ls /var/run/netns
ls: cannot access '/var/run/netns': No such file or directory
$ ip -all netns exec ip link show
$ echo $?
1
$ ip -all netns del
$ echo $?
1
$ ip netns add test
$ ip netns del test
$ ip -all netns del
$ echo $?
0
$ ls -a /var/run/netns
.  ..

This leaves us in the unpleasant situation where the same command, when
no netns is present, does the same stuff (in this case, nothing), but
exit with two different statuses.

Fix this treating ENOENT in a different way from other errors, similarly
to what we already do in ipnetns.c netns_identify_pid()

Fixes: e998e118dd ("lib: Exec func on each netns")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 18:17:56 -08:00
Andrea Claudi e833dbe140 ip: lwtunnel: seg6: bail out if table ids are invalid
When table and vrftable are used in SRv6, ip should bail out if table
ids are not valid, and return a proper error message to the user.

Achieve this simply checking rtnl_rttable_a2n return value, as we
already do in the rest of iproute.

Fixes: 0486388a87 ("add support for table name in SRv6 End.DT* behaviors")
Fixes: 69629b4e43 ("seg6: add support for vrftable attribute in SRv6 End.DT4/DT6 behaviors")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 18:11:48 -08:00
Andrea Claudi 546f738220 tc: m_gate: use SPRINT_BUF when needed
sprint_time64() uses SPRINT_BSIZE-1 as a constant buffer lenght in its
implementation, however m_gate uses shorter buffers when calling it.

Fix this using SPRINT_BUF macro to get the buffer, thus getting a
SPRINT_BSIZE-long buffer.

Fixes: 07d5ee70b5 ("iproute2-next:tc:action: add a gate control action")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 18:11:03 -08:00
Vladimir Oltean e1d79d49ed man8/bridge.8: be explicit that "flood" is an egress setting
Talking to varios people, it became apparent that there is a certain
ambiguity in the description of these flags. They refer to egress
flooding, which should perhaps be stated more clearly.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 11:19:38 -08:00
Vladimir Oltean 14f528a556 man8/bridge.8: explain self vs master for "bridge fdb add"
The "usually hardware" and "usually software" distinctions make no
sense, try to clarify what these do based on the actual kernel behavior.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 11:19:38 -08:00
Vladimir Oltean b64ceb687d man8/bridge.8: fix which one of self/master is default for "bridge fdb"
The bridge program does:

fdb_modify:
	/* Assume self */
	if (!(req.ndm.ndm_flags&(NTF_SELF|NTF_MASTER)))
		req.ndm.ndm_flags |= NTF_SELF;

which is clearly against the documented behavior. The only thing we can
do, sadly, is update the documentation.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 11:19:38 -08:00
Vladimir Oltean 10130bfafe man8/bridge.8: explain what a local FDB entry is
Explaining the "local" flag by saying that it is "a local permanent fdb
entry" is not very helpful, be more specific.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 11:19:38 -08:00
Vladimir Oltean ae3cb3d34d man8/bridge.8: document that "local" is default for "bridge fdb add"
The bridge does this:

fdb_modify:
	/* Assume permanent */
	if (!(req.ndm.ndm_state&(NUD_PERMANENT|NUD_REACHABLE)))
		req.ndm.ndm_state |= NUD_PERMANENT;

So let's make the user aware of the fact that if they don't want local
entries, they need to specify some other flag like "static".

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 11:19:38 -08:00
Vladimir Oltean 1261459c64 man8/bridge.8: document the "permanent" flag for "bridge fdb add"
The bridge program parses "local" and "permanent" in just the same way,
so it makes sense to tell that to users:

fdb_modify:
		} else if (matches(*argv, "local") == 0 ||
			   matches(*argv, "permanent") == 0) {
			req.ndm.ndm_state |= NUD_PERMANENT;

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 11:19:38 -08:00
Ido Kalir 675e2df632 rdma: Fix statistics bind/unbing argument handling
The dump isn't supported for the statistics bind/unbind commands
because they operate on specific QP counters. This is different
from query commands that can operate on many objects at the same
time.

Let's check the user input and ensure that arguments are valid.

Fixes: a6d0773ebe ("rdma: Add stat manual mode support")
Signed-off-by: Ido Kalir <idok@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 10:52:39 -08:00
Thayne McCombs c7897ec2a6 ss: Make leading ":" always optional for sport and dport
The sport and dport conditions in expressions were inconsistent on
whether there should be a ":" at the beginning of the port when only a
port was provided depending on the family. The link and netlink
families required a ":" to work. The vsock family required the ":"
to be absent. The inet and inet6 families work with or without a leading
":".

This makes the leading ":" optional in all cases, so if sport or dport
are used, then it works with a leading ":" or without one, as inet and
inet6 did.

Signed-off-by: Thayne McCombs <astrothayne@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-14 22:09:37 -07:00
Amit Cohen 33e2471e8f ip route: Print "rt_offload_failed" indication
The kernel signals when offload fails using the 'RTM_F_OFFLOAD_FAILED'
flag. Print it to help users understand the offload state of the route.
The "rt_" prefix is used in order to distinguish it from the offload state
of nexthops, similar to "rt_offload" and "rt_trap".

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-13 17:50:15 -07:00
David Ahern 34de4b26bf Update kernel headers
Update kernel headers to commit:
    c4762993129f ("Merge branch 'skbuff-introduce-skbuff_heads-bulking-and-reusing'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-13 17:48:05 -07:00
Oleksandr Mazur c946f5d3e4 devlink: add support for port params get/set
Add implementation for the port parameters
getting/setting.
Add bash completion for port param.
Add man description for port param.

Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-11 09:21:24 -07:00
David Ahern 143610383d Merge branch 'vdpa' into next
Parav Pandit  says:

====================

Linux vdpa interface allows vdpa device management functionality.
This includes adding, removing, querying vdpa devices.

vdpa interface also includes showing supported management devices
which support such operations.

This patchset includes kernel uapi headers and a vdpa tool.

examples:

$ vdpa mgmtdev show
vdpasim:
  supported_classes net

$ vdpa mgmtdev show -jp
{
    "show": {
        "vdpasim": {
            "supported_classes": [ "net" ]
        }
    }
}

Create a vdpa device of type networking named as "foo2" from
the management device vdpasim_net:

$ vdpa dev add mgmtdev vdpasim_net name foo2

Show the newly created vdpa device by its name:
$ vdpa dev show foo2
foo2: type network mgmtdev vdpasim_net vendor_id 0 max_vqs 2 max_vq_size 25=
6

$ vdpa dev show foo2 -jp
{
    "dev": {
        "foo2": {
            "type": "network",
            "mgmtdev": "vdpasim_net",
            "vendor_id": 0,
            "max_vqs": 2,
            "max_vq_size": 256
        }
    }
}

Delete the vdpa device after its use:
$ vdpa dev del foo2

An example of PCI PF, VF and SF management device:
pci/0000:03.00:0
  supported_classes
    net
pci/0000:03.00:4
  supported_classes
    net
auxiliary/mlx5_core.sf.8
  supported_classes
    net

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-11 09:16:49 -07:00
Parav Pandit c2ecc82b9d vdpa: Add vdpa tool
vdpa tool is created to create, delete and query vdpa devices.
examples:
Show vdpa management device that supports creating, deleting vdpa devices.

$ vdpa mgmtdev show
vdpasim:
  supported_classes net

$ vdpa mgmtdev show -jp
{
    "show": {
        "vdpasim": {
            "supported_classes": [ "net" ]
        }
    }
}

Create a vdpa device of type networking named as "foo2" from
the management device vdpasim_net:

$ vdpa dev add mgmtdev vdpasim_net name foo2

Show the newly created vdpa device by its name:
$ vdpa dev show foo2
foo2: type network mgmtdev vdpasim_net vendor_id 0 max_vqs 2 max_vq_size 256

$ vdpa dev show foo2 -jp
{
    "dev": {
        "foo2": {
            "type": "network",
            "mgmtdev": "vdpasim_net",
            "vendor_id": 0,
            "max_vqs": 2,
            "max_vq_size": 256
        }
    }
}

Delete the vdpa device after its use:
$ vdpa dev del foo2

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-11 09:09:15 -07:00
Parav Pandit 6c76994982 utils: Add helper to map string to unsigned int
In subsequent patch need to map a string to a unsigned int.
Hence, add an API to map a string to unsigned int.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-11 09:09:10 -07:00
Parav Pandit b822275ad8 utils: Add generic socket helpers
Subsequent patch needs to
(a) query and use socket family
(b) send/receive messages using this family

Hence add helper routines to open, close, query family and to perform
send receive operations.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-11 09:09:07 -07:00
Parav Pandit bd3709c3a7 utils: Add helper routines for indent handling
Subsequent patch needs to use 2 char indentation for nested objects.
Hence introduce a generic helpers to allocate, deallocate, increment,
decrement and to print indent block.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-11 09:08:13 -07:00
Parav Pandit 5a6bf92a95 Add kernel headers
Add kernel headers to commit from kernel tree [1].
  6acba4951632 ("vdpa_sim_net: Add support for user supported devices")

[1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-11 09:07:47 -07:00
Paul Blakey 049708a002 tc: flower: Add support for ct_state reply flag
Matches on conntrack rpl ct_state.

Example:
$ tc filter add dev ens1f0_0 ingress prio 1 chain 1 proto ip flower \
  ct_state +trk+est+rpl \
  action mirred egress redirect dev ens1f0_1
$ tc filter add dev ens1f0_1 ingress prio 1 chain 1 proto ip flower \
  ct_state +trk+est-rpl \
  action mirred egress redirect dev ens1f0_0

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-04 21:54:28 -07:00
Maxim Mikityanskiy b8b8b6d4c9 tc/htb: Hierarchical QoS hardware offload
This commit adds support for configuring HTB in offload mode. HTB
offload eliminates the single qdisc lock in the datapath and offloads
the algorithm to the NIC. The new 'offload' parameter is added to
enable this mode:

    # tc qdisc replace dev eth0 root handle 1: htb offload

Classes are created as usual, but filters should be moved to clsact for
lock-free classification (filters attached to HTB itself are not
supported in the offload mode):

    # tc filter add dev eth0 egress protocol ip flower dst_port 80
    action skbedit priority 1:10

tc qdisc show and tc class show will indicate whether the offload is
enabled. Example output:

$ tc qdisc show dev eth1
qdisc htb 1: root offloaded r2q 10 default 0 direct_packets_stat 0 direct_qlen 1000 offload
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
$ tc class show dev eth1
class htb 1:101 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:1 root rate 100Gbit ceil 100Gbit burst 0b cburst 0b  offload
class htb 1:103 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:102 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:105 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:104 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:107 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:106 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:108 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
$ tc -j qdisc show dev eth1
[{"kind":"htb","handle":"1:","root":true,"offloaded":true,"options":{"r2q":10,"default":"0","direct_packets_stat":0,"direct_qlen":1000,"offload":null}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}}]

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-04 21:54:13 -07:00
Thayne McCombs b7e5002456 ss: always prefer family as part of host condition to default family
ss accepts an address family both with the -f option and as part of a
host condition. However, if the family in the host condition is
different than the the last -f option, then which family is actually
used depends on the order that different families are checked.

This changes parse_hostcond to check all family prefixes before parsing
the rest of the address, so that the host condition's family always has
a higher priority than the "preferred" family.

Signed-off-by: Thayne McCombs <astrothayne@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-04 21:48:16 -07:00
Stephen Hemminger 2741208502 uapi: pick up rpl.h fix
Upstream change to fix byte order issues.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-03 08:16:16 -08:00
Luca Boccassi 5a37254b71 iproute: force rtm_dst_len to 32/128
Since NETLINK_GET_STRICT_CHK was enabled, the kernel rejects commands
that pass a prefix length, eg:

 ip route get `1.0.0.0/1
  Error: ipv4: Invalid values in header for route get request.
 ip route get 0.0.0.0/0
  Error: ipv4: rtm_src_len and rtm_dst_len must be 32 for IPv4

Since there's no point in setting a rtm_dst_len that we know is going
to be rejected, just force it to the right value if it's passed on
the command line. Print a warning to stderr to notify users.

Bug-Debian: https://bugs.debian.org/944730
Reported-By: Clément 'wxcafé' Hertling <wxcafe@wxcafe.net>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-02 14:32:47 -08:00
Thayne McCombs 38957a2f6c ss: Add clarification about host conditions with multiple familes to man
In creating documentation for expressions I ran into an interesting case
where if you use two different familie types in the expression, such as
in `ss 'sport inet:ssh or src unix:/run/*'`, then you would only get the
results for one address family (in this case unix sockets).

The reason is that in parse_hostcond if the family is specified we
remove any previously added families from filter->families, and
preserve the "states" if any states are set. I tried changing this to
not reset the families, but ran into some issues with Invalid Argument
errors in inet_show_netlink, I think related to the state.

I can dig into that more if supporting this is useful, but I'm not sure
if these types of expressions would actually be useful in practice. Or
perhaps an error should be given if an expression contains conditions
with multiple families (besides inet and inet6)?

Anyway, for now, this patch just notes the limitation in the man page.

Signed-off-by: Thayne McCombs <astrothayne@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-02 14:30:40 -08:00
Thayne McCombs df361a27c2 Add documentation of ss filter to man page
This adds some documentation of the syntax for the FILTER argument to
the ss command to the ss (8) man page.

Signed-off-by: Thayne McCombs <astrothayne@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-02 14:24:03 -08:00
Edwin Peer 9764761888 iplink: print warning for missing VF data
The kernel might truncate VF info in IFLA_VFINFO_LIST. Compare the
expected number of VFs in IFLA_NUM_VF to how many were found in the
list and warn accordingly.

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-02 14:18:42 -08:00
Paolo Abeni 3d6d9e6e67 ss: do not emit warn while dumping MPTCP on old kernels
Prior to this commit, running 'ss' on a kernel older than v5.9
bumps an error message:

RTNETLINK answers: Invalid argument

When asked to dump protocol number > 255 - that is: MPTCP - 'ss'
adds an INET_DIAG_REQ_PROTOCOL attribute, unsupported by the older
kernel.

Avoid the warning ignoring filter issues when INET_DIAG_REQ_PROTOCOL
is used.

Additionally older kernel end-up invoking tcpdiag_send(), which
in turn will try to dump DCCP socks. Bail early in such function,
as the kernel does not implement an MPTCPDIAG_GET request.

Reported-by: "Rantala, Tommi T. (Nokia - FI/Espoo)" <tommi.t.rantala@nokia.com>
Fixes: 9c3be2c0ee ("ss: mptcp: add msk diag interface support")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-02 14:17:14 -08:00
Vladimir Oltean 4712a46174 man: tc-taprio.8: document the full offload feature
Since this feature's introduction in commit 9c66d1564676 ("taprio: Add
support for hardware offloading") from kernel v5.4, it never got
documented in the man pages. Due to this reason, we see customer reports
of seemingly contradictory information: the community manpages claim
there is no support for full offload, nonetheless many silicon vendors
have already implemented it.

This patch documents the full offload feature (enabled by specifying
"flags 2" to the taprio qdisc) and gives one more example that tries to
illustrate some of the finer points related to the usage.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-02 14:12:27 -08:00
Guillaume Nault 86d9660dc1 iplink_bareudp: cleanup help message and man page
* Fix PROTO description in help message (mpls isn't a valid argument).

 * Remove SRCPORTMIN description from help message since it doesn't
   appear in the syntax string.

 * Use same keywords in help message and in man page.

 * Use the "ethertype" option name (.B ethertype) rather than the
   option value (.I ETHERTYPE) in the man page description of
   [no]multiproto.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-02 14:11:32 -08:00
David Ahern d10f2a4bd8 Merge branch 'devlink-port-mgmt' into next
Parav Pandit  says:

====================

This patchset implements devlink port add, delete and function state
management commands.

An example sequence for a PCI SF:

Set the device in switchdev mode:
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

View ports in switchdev mode:
$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 s=
plittable false

Add a subfunction port for PCI PF 0 with sfnumber 88:
$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:08:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfn=
um 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

Show a newly added port:
$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf contro=
ller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

Set the function state to active:
$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:8=
8 state active

Show the port in JSON format:
$ devlink port show pci/0000:06:00.0/32768 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "active",
                "opstate": "attached"
            }
        }
    }
}

Set the function state to active:
$ devlink port function set pci/0000:06:00.0/32768 state inactive

Delete the port after use:
$ devlink port del pci/0000:06:00.0/32768

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-02 02:45:49 +00:00
Parav Pandit bdfb9f1bd6 devlink: Support set of port function state
Support set operation of the devlink port function state.

Example of a PCI SF port function which supports the state:

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:08:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88 state active

$ devlink port show pci/0000:06:00.0/32768 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "active",
                "opstate": "attached"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-02 02:06:48 +00:00
Parav Pandit 249465d3bf devlink: Support get port function state
Print port function state and operational state whenever reported by
kernel.

Example of a PCI SF port function which supports the state:

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:08:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88

$ devlink port show pci/0000:06:00.0/32768 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "inactive",
                "opstate": "detached"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-02 02:06:41 +00:00
Parav Pandit 331bf89ad0 devlink: Supporting add and delete of devlink port
Enable user to add and delete the devlink port.

Examples for adding and deleting one SF port:

Examples of add, show and delete commands:
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

Add devlink port of flavour 'pcipf' for PF number 0 SF number 88:

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

Delete newly added devlink port
$ devlink port del pci/0000:06:00.0/32768

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-02 02:06:36 +00:00
Parav Pandit 836a1365b7 devlink: Introduce PCI SF port flavour and attribute
Introduce PCI SF port flavour and port attributes such as PF
number and SF number.

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:08:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88 state active

$ devlink port show pci/0000:06:00.0/32768 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "active",
                "opstate": "attached"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-02 02:06:30 +00:00
Parav Pandit a9642c5fa6 devlink: Introduce and use string to number mapper
Instead of using static mapping in code, introduce a helper routine to
map a value to string.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-02 02:01:53 +00:00
David Ahern 1e61902180 Update kernel headers
Update kernel headers to commit:
    14e8e0f60088 ("tcp: shrink inet_connection_sock icsk_mtup enabled and probe_size")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-02 01:58:51 +00:00
Oliver Hartkopp 2ce313d1bb iplink_can: add Classical CAN frame LEN8_DLC support
The len8_dlc element is filled by the CAN interface driver and used for CAN
frame creation by the CAN driver when the CAN_CTRLMODE_CC_LEN8_DLC flag is
supported by the driver and enabled via netlink configuration interface.

Add the command line support for cc-len8-dlc for Linux 5.11+

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-29 15:49:23 +00:00
Jarod Wilson 7887500008 bond: support xmit_hash_policy=vlan+srcmac
There's a new transmit hash policy being added to the bonding driver that
is a simple XOR of vlan ID and source MAC, xmit_hash_policy vlan+srcmac.
This trivial patch makes it configurable and queryable via iproute2.

$ sudo modprobe bonding mode=2 max_bonds=1 xmit_hash_policy=0

$ sudo ip link set bond0 type bond xmit_hash_policy vlan+srcmac

$ ip -d link show bond0
11: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ce:85:5e:24:ce:90 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
    bond mode balance-xor miimon 0 updelay 0 downdelay 0 peer_notify_delay 0 use_carrier 1 arp_interval 0 arp_validate none arp_all_targets any
primary_reselect always fail_over_mac none xmit_hash_policy vlan+srcmac resend_igmp 1 num_grat_arp 1 all_slaves_active 0 min_links 0 lp_interval 1
packets_per_slave 1 lacp_rate slow ad_select stable tlb_dynamic_lb 1 addrgenmode eui64 numtxqueues 16 numrxqueues 16 gso_max_size 65536 gso_max_segs
65535

$ grep Hash /proc/net/bonding/bond0
Transmit Hash Policy: vlan+srcmac (5)

$ sudo ip link add test type bond help
Usage: ... bond [ mode BONDMODE ] [ active_slave SLAVE_DEV ]
                [ clear_active_slave ] [ miimon MIIMON ]
                [ updelay UPDELAY ] [ downdelay DOWNDELAY ]
                [ peer_notify_delay DELAY ]
                [ use_carrier USE_CARRIER ]
                [ arp_interval ARP_INTERVAL ]
                [ arp_validate ARP_VALIDATE ]
                [ arp_all_targets ARP_ALL_TARGETS ]
                [ arp_ip_target [ ARP_IP_TARGET, ... ] ]
                [ primary SLAVE_DEV ]
                [ primary_reselect PRIMARY_RESELECT ]
                [ fail_over_mac FAIL_OVER_MAC ]
                [ xmit_hash_policy XMIT_HASH_POLICY ]
                [ resend_igmp RESEND_IGMP ]
                [ num_grat_arp|num_unsol_na NUM_GRAT_ARP|NUM_UNSOL_NA ]
                [ all_slaves_active ALL_SLAVES_ACTIVE ]
                [ min_links MIN_LINKS ]
                [ lp_interval LP_INTERVAL ]
                [ packets_per_slave PACKETS_PER_SLAVE ]
                [ tlb_dynamic_lb TLB_DYNAMIC_LB ]
                [ lacp_rate LACP_RATE ]
                [ ad_select AD_SELECT ]
                [ ad_user_port_key PORTKEY ]
                [ ad_actor_sys_prio SYSPRIO ]
                [ ad_actor_system LLADDR ]

BONDMODE := balance-rr|active-backup|balance-xor|broadcast|802.3ad|balance-tlb|balance-alb
ARP_VALIDATE := none|active|backup|all
ARP_ALL_TARGETS := any|all
PRIMARY_RESELECT := always|better|failure
FAIL_OVER_MAC := none|active|follow
XMIT_HASH_POLICY := layer2|layer2+3|layer3+4|encap2+3|encap3+4|vlan+srcmac
LACP_RATE := slow|fast
AD_SELECT := stable|bandwidth|count

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-23 18:33:15 +00:00
wenxu c94fd71b34 tc: flower: add tc conntrack inv ct_state support
Matches on conntrack inv ct_state.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-23 18:16:35 +00:00
David Ahern c81a173f6b Update kernel headers
Update kernel headers to commit:
    59a49d9617e2 ("Merge branch 'mlxsw-expose-number-of-physical-ports'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-23 18:15:57 +00:00
Luca Boccassi 8498ca92d7 vrf: fix ip vrf exec with libbpf
The size of bpf_insn is passed to bpf_load_program instead of the number
of elements as it expects, so ip vrf exec fails with:

$ sudo ip link add vrf-blue type vrf table 10
$ sudo ip link set dev vrf-blue up
$ sudo ip/ip vrf exec vrf-blue ls
Failed to load BPF prog: 'Invalid argument'
last insn is not an exit or jmp
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
Kernel compiled with CGROUP_BPF enabled?

https://bugs.debian.org/980046

Reported-by: Emmanuel DECAEN <Emmanuel.Decaen@xsalto.com>

Signed-off-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-18 12:32:17 -08:00
Luca Boccassi 8dca565b17 vrf: print BPF log buffer if bpf_program_load fails
Necessary to understand what is going on when bpf_program_load fails

Signed-off-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-18 12:32:11 -08:00
Roi Dayan 1a22ad2721 build: Fix link errors on some systems
Since moving get_rate() and get_size() from tc to lib, on some
systems we fail to link because of missing math lib.
Move the functions that require math lib to their own c file
and add -lm to dcb that now use those functions.

../lib/libutil.a(utils.o): In function `get_rate':
utils.c:(.text+0x10dc): undefined reference to `floor'
../lib/libutil.a(utils.o): In function `get_size':
utils.c:(.text+0x1394): undefined reference to `floor'
../lib/libutil.a(json_print.o): In function `sprint_size':
json_print.c:(.text+0x14c0): undefined reference to `rint'
json_print.c:(.text+0x14f4): undefined reference to `rint'
json_print.c:(.text+0x157c): undefined reference to `rint'

Fixes: f3be0e6366 ("lib: Move get_rate(), get_rate64() from tc here")
Fixes: 44396bdfcc ("lib: Move get_size() from tc here")
Fixes: adbe5de966 ("lib: Move sprint_size() from tc here, add print_size()")

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Tested-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-18 12:28:47 -08:00
David Ahern b553cffa9f Merge branch 'dcb-app-dcbx' into next
Petr Machata  says:

====================

Add support to the dcb tool for the following two DCB objects:

- APP, which allows configuration of traffic prioritization rules based on
  several possible packet headers.

- DCBX, which is a 1-byte bitfield of flags that configure whether the DCBX
  protocol is implemented in the device or in the host, and which version
  of the protocol should be used.

Patch #1 adds a new helper for finding a name of a given dsfield value.
This is useful for APP DSCP-to-priority rules, which can use human-readable
DSCP names.

Patches #2, #3 and #4 extend existing interfaces for, respectively, parsing
of the X:Y mappings, for setting a DCB object, and for getting a DCB
object.

In patch #5, support for the command line argument -N / --Numeric is
added. The APP tool later uses it to decide whether to format DSCP values
as human-readable strings or as plain numbers.

Patches #6 and #7 add the subtools themselves and their man pages.

v2:
- Two patches dropped and sent to iproute2 branch as "dcb: Fixes".
  This patch set now depends on that one.
- Patch #5:
    - Make it -N / --Numeric instead of -n / --no-nice-names
    - Rename the flag from no_nice_names to numeric as well
- Patch #6:
    - Adjust to s/no_nice_names/numeric/ from another patch.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:10:27 +00:00
Petr Machata 89d11ea596 dcb: Add a subtool for the DCBX object
The Linux DCBX object is a 1-byte bitfield of flags that configure whether
the DCBX protocol is implemented in the device or in the host, and which
version of the protocol should be used. Add a tool to access the per-port
Linux DCBX object.

For example:

	# dcb dcbx set dev eni1np1 host ieee
	# dcb dcbx show dev eni1np1
	host ieee

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:09:29 +00:00
Petr Machata 8e9bed1493 dcb: Add a subtool for the DCB APP object
DCB APP interfaces are standardized in 802.1q-2018, and allow configuration
of traffic prioritization rules based on several possible headers.

Add a dcb subtool for maintenance and display of the APP table. For
example:

    # dcb app add dev eni1np1 dscp-prio 0:0 CS3:3 CS6:6
    # dcb app show dev eni1np1
    dscp-prio 0:0 CS3:3 CS6:6
    # dcb app add dev eni1np1 dscp-prio CS3:4
    # dcb app show dev eni1np1
    dscp-prio 0:0 CS3:3 CS3:4 CS6:6
    # dcb app replace dev eni1np1 dscp-prio CS3:5
    # dcb app show dev eni1np1
    dscp-prio 0:0 CS3:5 CS6:6

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:09:29 +00:00
Petr Machata 0aebd32b82 dcb: Support -N to suppress translation to human-readable names
Some DSCP values can be translated to symbolic names. That may not be
always desirable. Introduce a command-line option similar to other tools,
-N or --Numeric, to suppress this translation.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:09:29 +00:00
Petr Machata e59876ff55 dcb: Generalize dcb_get_attribute()
The function dcb_get_attribute() assumes that the caller knows the exact
size of the looked-for payload. It also assumes that the response comes
wrapped in an DCB_ATTR_IEEE nest. The former assumption does not hold for
the IEEE APP table, which has variable size. The latter one does not hold
for DCBX, which is not IEEE-nested, and also for any CEE attributes, which
would come CEE-nested.

Factor out the payload extractor from the current dcb_get_attribute() code,
and put into a helper. Then rewrite dcb_get_attribute() compatibly in terms
of the new function. Introduce dcb_get_attribute_va() as a thin wrapper for
IEEE-nested access, and dcb_get_attribute_bare() for access to attributes
that are not nested.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:09:29 +00:00
Petr Machata 69290c32dc dcb: Generalize dcb_set_attribute()
The function dcb_set_attribute() takes a fully-formed payload as an
argument. For callers that need to build a nested attribute, such as is the
case for DCB APP table, this is not great, because with libmnl, they would
need to construct a separate netlink message just to pluck out the payload
and hand it over to this function.

Currently, dcb_set_attribute() also always wraps the payload in an
DCB_ATTR_IEEE container, because that is what all the dcb subtools so far
needed. But that is not appropriate for DCBX in particular, and in fact a
handful other attributes, as well as any CEE payloads.

Instead, generalize this code by adding parameters for constructing a
custom payload and for fetching the response from a custom response
attribute. Then add dcb_set_attribute_va(), which takes a callback to
invoke in the right place for the nest to be built, and
dcb_set_attribute_bare(), which is similar to dcb_set_attribute(), but does
not encapsulate the payload in an IEEE container. Rewrite
dcb_set_attribute() compatibly in terms of the new functions.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:09:29 +00:00
Petr Machata c13216f7a6 lib: Generalize parse_mapping()
The function parse_mapping() assumes the key is a number, with a single
configurable exception, which is using "all" to mean "all possible keys".
If a caller wishes to use symbolic names instead of numbers, they cannot
reuse this function.

To facilitate reuse in these situations, convert parse_mapping() into a
helper, parse_mapping_gen(), which instead of an allow-all boolean takes a
generic key-parsing callback. Rewrite parse_mapping() in terms of this
newly-added helper and add a pair of key parsers, one for just numbers,
another for numbers and the keyword "all". Publish the latter as well.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:09:29 +00:00
Petr Machata bf244ee677 lib: rt_names: Add rtnl_dsfield_get_name()
For formatting DSCP (not full dsfield), it would be handy to be able to
just get the name from the name table, and not get any of the remaining
cruft related to formatting. Add a new entry point to just fetch the
name table string uninterpreted. Use it from rtnl_dsfield_n2a().

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:09:29 +00:00
David Ahern fa2881b664 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 03:57:29 +00:00
Guillaume Nault 676a1a708f tc: flower: fix json output with mpls lse
The json output of the TCA_FLOWER_KEY_MPLS_OPTS attribute was invalid.

Example:

  $ tc filter add dev eth0 ingress protocol mpls_uc flower mpls \
      lse depth 1 label 100                                     \
      lse depth 2 label 200

  $ tc -json filter show dev eth0 ingress
    ...{"eth_type":"8847",
        "  mpls":["    lse":["depth":1,"label":100],
                  "    lse":["depth":2,"label":200]]}...

This is invalid as the arrays, introduced by "[", can't contain raw
string:value pairs. Those must be enclosed into "{}" to form valid json
ojects. Also, there are spurious whitespaces before the mpls and lse
strings because of the indentation used for normal output.

Fix this by putting all LSE parameters (depth, label, tc, bos and ttl)
into the same json object. The "mpls" key now directly contains a list
of such objects.

Also, handle strings differently for normal and json output, so that
json strings don't get spurious indentation whitespaces.

Normal output isn't modified.
The json output now looks like:

  $ tc -json filter show dev eth0 ingress
    ...{"eth_type":"8847",
        "mpls":[{"depth":1,"label":100},
                {"depth":2,"label":200}]}...

Fixes: eb09a15c12 ("tc: flower: support multiple MPLS LSE match")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-16 09:13:36 -08:00
Petr Machata 934919b991 dcb: Change --Netns/-N to --netns/-n
This to keep compatible with the major tools, ip and tc. Also
document the option in the man page, which was neglected.

Fixes: 67033d1c1c ("Add skeleton of a new tool, dcb")
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-16 09:12:15 -08:00
Petr Machata b4c0cad06e dcb: Plug a leaking DCB socket buffer
DCB socket buffer is allocated in dcb_init(), but never freed(). Free it
in dcb_fini().

Fixes: 67033d1c1c ("Add skeleton of a new tool, dcb")
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-16 09:12:15 -08:00
Petr Machata 2e99c28161 dcb: Set values with RTM_SETDCB type
dcb currently sends all netlink messages with a type RTM_GETDCB, even the
set ones. Change to the appropriate type.

Fixes: 67033d1c1c ("Add skeleton of a new tool, dcb")
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-16 09:12:15 -08:00
Stephen Hemminger 8b4b132261 uapi: update if_link.h from upstream
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-16 09:09:35 -08:00
Petr Machata ffe58c9185 include: uapi: Carry dcbnl.h
To allow building a new suite of DCB tools on an older kernel, carry a copy
of dcbnl.h.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-16 09:09:28 -08:00
Patrisious Haddad 537995c6d5 rdma: Add support for the netlink extack
Add support in rdma for extack errors to be received
in userspace when sent from kernel, so now netlink extack
error messages sent from kernel would be printed for the
user.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-10 17:18:42 +00:00
Ido Schimmel 9bd498bfcd ipmonitor: Mention "nexthop" object in help and man page
Before:

 # ip monitor help
 Usage: ip monitor [ all | LISTofOBJECTS ] [ FILE ] [ label ] [all-nsid] [dev DEVICE]
 LISTofOBJECTS := link | address | route | mroute | prefix |
                  neigh | netconf | rule | nsid
 FILE := file FILENAME

After:

 # ip monitor help
 Usage: ip monitor [ all | LISTofOBJECTS ] [ FILE ] [ label ] [all-nsid] [dev DEVICE]
 LISTofOBJECTS := link | address | route | mroute | prefix |
                  neigh | netconf | rule | nsid | nexthop
 FILE := file FILENAME

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-10 17:17:32 +00:00
Ido Schimmel 043e03a369 nexthop: Fix usage output
Before:

 # ip nexthop help
 Usage: ip nexthop { list | flush } [ protocol ID ] SELECTOR
        ip nexthop { add | replace } id ID NH [ protocol ID ]
        ip nexthop { get| del } id ID
 SELECTOR := [ id ID ] [ dev DEV ] [ vrf NAME ] [ master DEV ]
             [ groups ] [ fdb ]
 NH := { blackhole | [ via ADDRESS ] [ dev DEV ] [ onlink ]
       [ encap ENCAPTYPE ENCAPHDR ] | group GROUP ] }
 GROUP := [ id[,weight]>/<id[,weight]>/... ]
 ENCAPTYPE := [ mpls ]
 ENCAPHDR := [ MPLSLABEL ]

After:

 # ip nexthop help
 Usage: ip nexthop { list | flush } [ protocol ID ] SELECTOR
        ip nexthop { add | replace } id ID NH [ protocol ID ]
        ip nexthop { get | del } id ID
 SELECTOR := [ id ID ] [ dev DEV ] [ vrf NAME ] [ master DEV ]
             [ groups ] [ fdb ]
 NH := { blackhole | [ via ADDRESS ] [ dev DEV ] [ onlink ]
         [ encap ENCAPTYPE ENCAPHDR ] | group GROUP [ fdb ] }
 GROUP := [ <id[,weight]>/<id[,weight]>/... ]
 ENCAPTYPE := [ mpls ]
 ENCAPHDR := [ MPLSLABEL ]

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-10 17:14:08 +00:00
Stephen Hemminger 2953235e61 uapi: update kernel headers to 5.11 pre rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-24 19:38:35 -08:00
Stephen Hemminger 2639bdc176 Merge git://git.kernel.org/pub/scm/network/iproute2/iproute2-next into main 2020-12-24 19:29:15 -08:00
Stephen Hemminger c9c64b8d1e 5.10.0 2020-12-21 10:28:53 -08:00
Guillaume Nault cb0debfe2d testsuite: Add mpls packet matching tests for tc flower
Match all MPLS fields using smallest and highest possible values.
Test the two ways of specifying MPLS header matching:

  * with the basic mpls_{label,tc,bos,ttl} keywords (match only on the
    first LSE),

  * with the more generic "lse" keyword (allows matching at different
    depth of the MPLS label stack).

This test file allows to find problems like the one fixed by
Linux commit 7fdd375e3830 ("net: sched: Fix dump of MPLS_OPT_LSE_LABEL
attribute in cls_flower").

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-16 04:14:26 +00:00
David Ahern c01dec8475 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-16 04:06:06 +00:00
Thomas Karlsson 42f5642a40 iplink:macvlan: Added bcqueuelen parameter
This patch allows the user to set and retrieve the
IFLA_MACVLAN_BC_QUEUE_LEN parameter via the bcqueuelen
command line argument

This parameter controls the requested size of the queue for
broadcast and multicast packages in the macvlan driver.

If not specified, the driver default (1000) will be used.

Note: The request is per macvlan but the actually used queue
length per port is the maximum of any request to any macvlan
connected to the same port.

For this reason, the used queue length IFLA_MACVLAN_BC_QUEUE_LEN_USED
is also retrieved and displayed in order to aid in the understanding
of the setting. However, it can of course not be directly set.

Signed-off-by: Thomas Karlsson <thomas.karlsson@paneda.se>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-16 04:02:07 +00:00
Andrea Claudi c8faeca5ad ss: mptcp: fix add_addr_accepted stat print
add_addr_accepted value is not printed if add_addr_signal value is 0.
Fix this properly looking for add_addr_accepted value, instead.

Fixes: 9c3be2c0ee ("ss: mptcp: add msk diag interface support")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-15 13:59:13 -08:00
Andrea Claudi 0d78e8eabf tc: pedit: fix memory leak in print_pedit
keys_ex is dinamically allocated with calloc on line 770, but
is not freed in case of error at line 823.

Fixes: 081d6c310d ("tc: pedit: Support JSON dumping")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-14 09:24:08 -08:00
Andrea Claudi ec1346acbe devlink: fix memory leak in cmd_dev_flash()
nlg_ntf is dinamically allocated in mnlg_socket_open(), and is freed on
the out: return path. However, some error paths do not free it,
resulting in memory leak.

This commit fix this using mnlg_socket_close(), and reporting the
correct error number when required.

Fixes: 9b13cddfe2 ("devlink: implement flash status monitoring")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-14 09:23:24 -08:00
Andrea Claudi 309e6027e5 man: tc-flower: fix manpage
Commit 924c43778a ("man: tc-ct.8: Add manual page for ct tc action")
add man page for tc-ct, but it brings with it a bogus block of text
in the benning of tc-flower man page.

This commit simply removes it.

Fixes: 924c43778a ("man: tc-ct.8: Add manual page for ct tc action")
Reported-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-14 09:22:53 -08:00
David Ahern ee50fd58dc Merge branch 'dcb-pfc-buffer-maxrate' into next
Petr Machata  says:
====================

Add support to the dcb tool for the following three DCB objects:

- PFC, for "Priority-based Flow Control", allows configuration of priority
  lossiness, and related toggles.

- DCBNL buffer interfaces are an extension to the 802.1q DCB interfaces and
  allow configuration of port headroom buffers.

- DCBNL maxrate interfaces are an extension to the 802.1q DCB interfaces
  and allow configuration of rate with which traffic in a given traffic
  class is sent.

Patches #1-#4 fix small issues in the current DCB code and man pages.

Patch #5 adds new helpers to the DCB dispatcher.

Patches #6 and #7 add support for command line arguments -s and -i. These
enable, respectively, display of statistical counters, and ISO/IEC mode of
rate units.

Patches #8-#10 add the subtools themselves and their man pages.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:48:38 +00:00
Petr Machata 117939d9bd dcb: Add a subtool for the DCB maxrate object
DCBNL maxrate interfaces are an extension to the 802.1q DCB interfaces and
allow configuration of rate with which traffic in a given traffic class is
sent.

Add a dcb subtool to allow showing and tweaking of this per-TC maximum
rate. For example:

    # dcb maxrate show dev eni1np1
    tc-maxrate 0:25Gbit 1:25Gbit 2:25Gbit 3:25Gbit 4:25Gbit 5:25Gbit 6:100Gbit 7:25Gbit

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:42:07 +00:00
Petr Machata 2e36f91000 dcb: Add a subtool for the DCB buffer object
DCBNL buffer interfaces are an extension to the 802.1q DCB interfaces and
allow configuration of port headroom buffers.

Add a dcb subtool to allow showing and tweaking of buffer priority mapping
and buffer sizes. For example:

    # dcb buf show dev eni1np1
    prio-buffer 0:0 1:0 2:0 3:3 4:0 5:0 6:6 7:0
    buffer-size 0:10000 1:0 2:0 3:70000 4:0 5:0 6:10000 7:0
    total-size 221072

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:42:03 +00:00
Petr Machata 6567cb588b dcb: Add a subtool for the DCB PFC object
PFC, for "Priority-based Flow Control", allows configuration of priority
lossiness, and related toggles.

Add a dcb subtool to allow showing and tweaking of individual PFC
configuration options, and querying statistics. For example:

    # dcb pfc show dev eni1np1
    pfc-cap 8 macsec-bypass on delay 0
    pg-pfc 0:off 1:on 2:off 3:off 4:off 5:off 6:off 7:on
    requests 0:0 1:217 2:0 3:0 4:0 5:0 6:0 7:28
    indications 0:0 1:179 2:0 3:0 4:0 5:0 6:0 7:18

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:41:58 +00:00
Petr Machata 808dd741fc dcb: Add -i to enable IEC mode
Allow switching "dcb" into the ISO/IEC mode of units by passing -i.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:41:54 +00:00
Petr Machata 6e9687db04 dcb: Add -s to enable statistics
Allow selective display of statistical counters by passing -s.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:41:50 +00:00
Petr Machata 11a72186a0 dcb: Add dcb_set_u32(), dcb_set_u64()
The DCB buffer object has a settable array of 32-bit quantities, and the
maxrate object of 64-bit ones. Adjust dcb_parse_mapping() and related
helpers to support 64-bit values in mappings, and add appropriate helpers.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:41:45 +00:00
Petr Machata 7e94711c71 man: dcb-ets: Remove an unnecessary empty line
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:41:40 +00:00
Petr Machata a7c2eaac39 dcb: ets: Change the way show parameters are given in synopsis
None, one, or many parameters can be given on the command line, but
the current synopsis allows only none or one. Fix it.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:41:22 +00:00
Petr Machata 12d41d0184 dcb: ets: Fix help display for "show" subcommand
"dcb ets show dev X help" currently shows full "ets" help instead of just
help for the show command. Fix it.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:41:19 +00:00
Petr Machata 7fe954ee34 dcb: Remove unsupported command line arguments from getopt_long()
getopt_long() currently includes "c" and "n" in the short option string.
These probably slipped in as a cut'n'paste, and are not actually accepted.
Remove them.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:40:32 +00:00
Stephen Hemminger 376367d917 uapi: merge in change to bpf.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-14 08:07:06 -08:00
David Ahern 6e9bfdcdde Merge branch 'devlink-reload' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:43:41 +00:00
Moshe Shemesh c2d7c45c32 devlink: Add reload stats to dev show
Show reload statistics through devlink dev show using devlink stats
flag. The reload statistics show the history per reload action type and
limit. Add remote reload statistics to show the history of actions
performed due devlink reload commands initiated by remote host.

Output examples:
$ devlink dev show -s
pci/0000:82:00.0:
  stats:
      reload:
          driver_reinit:
            unspecified 2
          fw_activate:
            unspecified 1 no_reset 0
      remote_reload:
          driver_reinit:
            unspecified 0
          fw_activate:
            unspecified 0 no_reset 0
pci/0000:82:00.1:
  stats:
      reload:
          driver_reinit:
            unspecified 0
          fw_activate:
            unspecified 0 no_reset 0
      remote_reload:
          driver_reinit:
            unspecified 1
          fw_activate:
            unspecified 1 no_reset 0

$ devlink dev show -s -jp
{
    "dev": {
        "pci/0000:82:00.0": {
            "stats": {
                "reload": {
                    "driver_reinit": {
                        "unspecified": 2
                    },
                    "fw_activate": {
                        "unspecified": 1,
                        "no_reset": 0
                    }
                },
                "remote_reload": {
                    "driver_reinit": {
                        "unspecified": 0
                    },
                    "fw_activate": {
                        "unspecified": 0,
                        "no_reset": 0
                    }
                }
            }
        },
        "pci/0000:82:00.1": {
            "stats": {
                "reload": {
                    "driver_reinit": {
                        "unspecified": 0
                    },
                    "fw_activate": {
                        "unspecified": 0,
                        "no_reset": 0
                    }
                },
                "remote_reload": {
                    "driver_reinit": {
                        "unspecified": 1
                    },
                    "fw_activate": {
                        "unspecified": 1,
                        "no_reset": 0
                    }
                }
            }
        }
    }
}

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:42:15 +00:00
Moshe Shemesh 0c0023ad71 devlink: Add pr_out_dev() helper function
Add pr_out_dev() helper function and use it both by cmd_dev_show_cb()
and by cmd_mon_show_cb().

Dev stats will be added on the next patch to dev context, so
cmd_mon_show_cb() should print the whole dev context and not just dev
handle.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:42:09 +00:00
Moshe Shemesh f28c910274 devlink: Add devlink reload action and limit options
Add reload action and reload limit to devlink reload command to enable
the user to select the reload action required and constrains limits on
these actions that he may want to ensure.

The following reload actions are supported:
  driver_reinit: driver entities re-initialization, applying
                 devlink-param and devlink-resource values.
  fw_activate: firmware activate.

The uAPI is backward compatible, if the reload action option is omitted
from the reload command, the driver reinit action will be used.
Note that when required to do firmware activation some drivers may need
to reload the driver. On the other hand some drivers may need to reset
the firmware to reinitialize the driver entities. Therefore, the devlink
reload command returns the actions which were actually performed.

By default reload actions are not limited and driver implementation may
include reset or downtime as needed to perform the actions. However, if
reload limit is selected, the driver should perform only if it can do it
while keeping the limit constraints.

Reload limit added:
  no_reset: No reset allowed, no down time allowed, no link flap and no
            configuration is lost.

Command examples:
$devlink dev reload pci/0000:82:00.0 action driver_reinit
reload_actions_performed:
  driver_reinit

$devlink dev reload pci/0000:82:00.0 action fw_activate
reload_actions_performed:
  driver_reinit fw_activate

devlink dev reload pci/0000:82:00.1 action driver_reinit -jp
{
    "reload": {
        "reload_actions_performed": [ "driver_reinit" ]
    }
}

devlink dev reload pci/0000:82:00.0 action fw_activate -jp
{
    "reload": {
        "reload_actions_performed": [ "driver_reinit","fw_activate" ]
    }
}

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:40:00 +00:00
David Ahern 120cdeb1b7 Merge branch 'rate-size-parsing-output' into next
Petr Machata says:
==================

The DCB tool will have commands that deal with buffer sizes and traffic
rates. TC is another tool that has a number of such commands, and functions
to support them: get_size(), get_rate/64(), s/print_size() and
s/print_rate(). In this patchset, these functions are moved from TC to lib/
for possible reuse and modernized.

s/print_rate() has a hidden parameter of a global variable use_iec, which
made the conversion non-trivial. The parameter was made explicit,
print_rate() converted to a mostly json_print-like function, and
sprint_rate() retired in favor of the new print_rate. Patches #1 and #2
deal with this.

The intention was to treat s/print_size() similarly, but unfortunately two
use cases of sprint_size() cannot be converted to a json_print-like
print_size(), and the function sprint_size() had to remain as a discouraged
backdoor to print_size(). This is done in patch #3.

Patch #4 then improves the code of sprint_size() a little bit.

Patch #5 fixes a buglet in formatting small rates in IEC mode.

Patches #6 and #7 handle a routine movement of, respectively,
get_rate/64() and get_size() from tc to lib.

This patchset does not actually add any new uses of these functions. A
follow-up patchset will add subtools for management of DCB buffer and DCB
maxrate objects that will make use of them.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:32:17 +00:00
Petr Machata 44396bdfcc lib: Move get_size() from tc here
The function get_size() serves for parsing of sizes using a handly notation
that supports units and their prefixes, such as 10Kbit. This will be useful
for the DCB buffer size parsing. Move the function from TC to the general
library, so that it can be reused.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:30:50 +00:00
Petr Machata f3be0e6366 lib: Move get_rate(), get_rate64() from tc here
The functions get_rate() and get_rate64() are useful for parsing rate-like
values. The DCB tool will find these useful in the maxrate subtool.
Move them over to lib so that they can be easily reused.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:30:44 +00:00
Petr Machata aaeda2a768 lib: print_color_rate(): Fix formatting small rates in IEC mode
ISO/IEC units are distinguished from the decadic ones by using a prefixes
like "Ki", "Mi" instead of "K" and "M". The current code inserts the letter
"i" after the decadic unit when in IEC mode. However it does so even when
the prefix is an empty string, formatting 1Kbit in IEC mode as "1000ibit".
Fix by omitting the letter if there is no prefix.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:30:41 +00:00
Petr Machata a0a4b6618c lib: sprint_size(): Uncrustify the code a bit
Ideally this and the rate printing would both be converted to a common
helper, but unfortunately the two format differently and this would break
tests and scripts out there. So just make the code look less like a wad of
hay.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:30:36 +00:00
Petr Machata adbe5de966 lib: Move sprint_size() from tc here, add print_size()
When displaying sizes of various sorts, tc commonly uses the function
sprint_size() to format the size into a buffer as a human-readable string.
This string is then displayed either using print_string(), or in some code
even fprintf(). As a result, a typical sequence of code when formatting a
size is something like the following:

	SPRINT_BUF(b);
	print_uint(PRINT_JSON, "foo", NULL, foo);
	print_string(PRINT_FP, NULL, "foo %s ", sprint_size(foo, b));

For a concept as broadly useful as size, it would be better to have a
dedicated function in json_print.

To that end, move sprint_size() from tc_util to json_print. Add helpers
print_size() and print_color_size() that wrap arount sprint_size() and
provide the JSON dispatch as appropriate.

Since print_size() should be the preferred interface, convert vast majority
of uses of sprint_size() to print_size(). Two notable exceptions are:

- q_tbf, which does not show the size as such, but uses the string
  "$human_readable_size/$cell_size" even in JSON. There is simply no way to
  have print_size() emit the same text, because print_size() in JSON mode
  should of course just use the raw number, without human-readable frills.

- q_cake, which relies on the existence of sprint_size() in its macro-based
  formatting helpers. There might be ways to convert this particular case,
  but given q_tbf simply cannot be converted, leave it as is.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:30:25 +00:00
Petr Machata 60265cc226 lib: Move print_rate() from tc here; modernize
The functions print_rate() and sprint_rate() are useful for formatting
rate-like values. The DCB tool would find these useful in the maxrate
subtool. However, the current interface to these functions uses a global
variable use_iec as a flag indicating whether 1024- or 1000-based powers
should be used when formatting the rate value. For general use, a global
variable is not a great way of passing arguments to a function. Besides, it
is unlike most other printing functions in that it deals in buffers and
ignores JSON.

Therefore make the interface to print_rate() explicit by converting use_iec
to an ordinary parameter. Since the interface changes anyway, convert it to
follow the pattern of other json_print functions (except for the
now-explicit use_iec parameter). Move to json_print.c.

Add a wrapper to tc, so that all the call sites do not need to repeat the
use_iec global variable argument, and convert all call sites.

In q_cake.c, the conversion is not straightforward due to usage of a macro
that is shared across numerous data types. Simply hand-roll the
corresponding code, which seems better than making an extra helper for one
call site.

Drop sprint_rate() now that everybody just uses print_rate().

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:30:15 +00:00
Petr Machata cdd9425315 Move the use_iec declaration to the tools
The tools "ip" and "tc" use a flag "use_iec", which indicates whether, when
formatting rate values, the prefixes "K", "M", etc. should refer to powers
of 1024, or powers of 1000. The flag is currently kept as a global variable
in "ip" and "tc", but is nonetheless declared in util.h.

Instead, move the declaration to tool-specific headers ip/ip_common.h and
tc/tc_common.h.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:28:43 +00:00
Paolo Lungaroni 69629b4e43 seg6: add support for vrftable attribute in SRv6 End.DT4/DT6 behaviors
We introduce the "vrftable" attribute for supporting the SRv6 End.DT4 and
End.DT6 behaviors in iproute2.
The "vrftable" attribute indicates the routing table associated with
the VRF device used by SRv6 End.DT4/DT6 for routing IPv4/IPv6 packets.

The SRv6 End.DT4/DT6 is used to implement IPv4/IPv6 L3 VPNs based on Segment
Routing over IPv6 networks in multi-tenants environments.
It decapsulates the received packets and it performs the IPv4/IPv6 routing
lookup in the routing table of the tenant.

The SRv6 End.DT4/DT6 leverages a VRF device in order to force the routing
lookup into the associated routing table using the "vrftable" attribute.

Some examples:
 $ ip -6 route add 2001:db8::1 encap seg6local action End.DT4 vrftable 100 dev eth0
 $ ip -6 route add 2001:db8::2 encap seg6local action End.DT6 vrftable 200 dev eth0

Standard Output:
 $ ip -6 route show 2001:db8::1
 2001:db8::1  encap seg6local action End.DT4 vrftable 100 dev eth0 metric 1024 pref medium

JSON Output:
$ ip -6 -j -p route show 2001:db8::2
[ {
        "dst": "2001:db8::2",
        "encap": "seg6local",
        "action": "End.DT6",
        "vrftable": 200,
        "dev": "eth0",
        "metric": 1024,
        "flags": [ ],
        "pref": "medium"
} ]

v2:
 - no changes made: resubmit after pulling out this patch from the kernel
   patchset.

v1:
 - mixing this patch with the kernel patchset confused patckwork.

Signed-off-by: Paolo Lungaroni <paolo.lungaroni@cnit.it>
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:27:42 +00:00
David Ahern cfad32569f Update kernel headers
Update kernel headers to commit:
    afae3cc2da10 ("net: atheros: simplify the return expression of atl2_phy_setup_autoneg_adv()")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:25:34 +00:00
David Ahern 8065d28218 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-04 16:25:12 +00:00
David Ahern b3c4a55064 Only compile mnl_utils when HAVE_MNL is defined
New lib/mnl_utils.c fails to compile if libmnl is not installed:

  mnl_utils.c:9:10: fatal error: libmnl/libmnl.h: No such file or directory
      9 | #include <libmnl/libmnl.h>

Make it dependent on HAVE_MNL.

Fixes: 72858c7b77 ("lib: Extract from devlink/mnlg a helper, mnlu_socket_open()")
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-04 16:19:05 +00:00
Stephen Hemminger 2e80ae89ca Merge branch 'gcc-10' into main 2020-12-03 08:33:06 -08:00
Luca Boccassi 755b1c584e tc/mqprio: json-ify output
As reported by a Debian user, mqprio output in json mode is
invalid:

{
     "kind": "mqprio",
     "handle": "8021:",
     "dev": "enp1s0f0",
     "root": true,
     "options": { tc 2 map 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0
          queues:(0:3) (4:7)
          mode:channel
          shaper:dcb}
}

json-ify it, while trying to maintain the same formatting
for standard output.

New output:

{
    "kind": "mqprio",
    "handle": "8001:",
    "root": true,
    "options": {
        "tc": 2,
        "map": [ 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ],
        "queues": [ [ 0, 3 ], [ 4, 7 ] ],
        "mode": "channel",
        "shaper": "dcb"
    }
}

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=972784

Reported-by: Roméo GINON <romeo.ginon@ilexia.com>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-03 08:32:42 -08:00
Luca Boccassi 975c4944e8 ip/netns: use flock when setting up /run/netns
If multiple ip processes are ran at the same time to set up
separate network namespaces, and it is the first time so /run/netns
has to be set up first, and they end up doing it at the same time,
the processes might enter a recursive loop creating thousands of
mount points, which might crash the system depending on resources
available.

Try to take a flock on /run/netns before doing the mount() dance, to
ensure this cannot happen. But do not try too hard, and if it fails
continue after printing a warning, to avoid introducing regressions.

First reported on Debian: https://bugs.debian.org/949235

To reproduce (WARNING: run in a VM to avoid system lockups):

for i in {0..9}
do
        strace -e trace=mount -e inject=mount:delay_exit=1000000 ip \
 netns add "testnetns$i" 2>&1 | tee "$i.log" &
done
wait

The strace is to ensure the problem always reproduces, to add an
artificial synchronization point after the first mount().

Reported-by: Etienne Dechamps <etienne@edechamps.fr>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-03 08:31:23 -08:00
Vlad Buslov ea130da81e tc: implement support for action terse dump
Implement support for action terse dump using new TCA_ACT_FLAG_TERSE_DUMP
value of TCA_ROOT_FLAGS tlv. Set the flag when user requested it with
following example CLI (-br for 'brief'):

$ tc -s -br actions ls action tunnel_key
total acts 2

        action order 0: tunnel_key       index 1
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

        action order 1: tunnel_key       index 2
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

In terse mode dump only outputs essential data needed to identify the
action (kind, index) and stats, if requested by the user.

Signed-off-by: Vlad Buslov <vlad@buslov.dev>
Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-03 03:51:06 +00:00
Vlad Buslov 00fffb2d79 tc: use TCA_ACT_ prefix for action flags
Use TCA_ACT_FLAG_LARGE_DUMP_ON alias according to new preferred naming for
action flags.

Signed-off-by: Vlad Buslov <vlad@buslov.dev>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-03 03:49:14 +00:00
David Ahern 23683dec32 Update kernel headers
Update kernel headers to commit:
    cec85994c6b4 ("bareudp: constify device_type declaration")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-03 03:47:07 +00:00
Sergey Ryazanov d7190d4ced ip: add IP_LIB_DIR environment variable
Do not hardcode /usr/lib/ip as a path and allow libraries path
configuration in run-time.

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-02 16:37:07 +00:00
Stephen Hemminger fb054cb336 uapi: update devlink.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-29 21:17:22 -08:00
Stephen Hemminger c95d63e4fb uapi: update devlink.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-29 21:16:50 -08:00
Stephen Hemminger cae2e9291a f_u32: fix compiler gcc-10 compiler warning
With gcc-10 it complains about array subscript error.

f_u32.c: In function ‘u32_parse_opt’:
f_u32.c:1113:24: warning: array subscript 0 is outside the bounds of an interior zero-length array ‘struct tc_u32_key[0]’ [-Wzero-length-bounds]
 1113 |    hash = sel2.sel.keys[0].val & sel2.sel.keys[0].mask;
      |           ~~~~~~~~~~~~~^~~
In file included from tc_util.h:11,
                 from f_u32.c:26:
../include/uapi/linux/pkt_cls.h:253:20: note: while referencing ‘keys’
  253 |  struct tc_u32_key keys[0];
      |

This is because the keys are actually allocated in the second element
of the parent structure.

Simplest way to address the warning is to assign directly to the keys
in the containing structure.

This has always been in iproute2 (pre-git) so no Fixes.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-29 16:20:33 -08:00
Stephen Hemminger c014983921 misc: fix compiler warning in ifstat and nstat
The code here was doing strncpy() in a way that causes gcc 10
warning about possible string overflow. Just use strlcpy() which
will null terminate and bound the string as expected.

This has existed since start of git era so no Fixes tag.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-29 16:20:31 -08:00
Stephen Hemminger 2319db9052 tc: fix compiler warnings in ip6 pedit
Gcc-10 complains about referencing a zero size array.
This occurs because the array of keys is actually in the following
structure which is part of the overall selector.

The original code was safe, but better to just use the key
array directly.

Fixes: 2d9a8dc439 ("tc: p_ip6: Support pedit of IPv6 dsfield")
Cc: petrm@mellanox.com
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-29 16:20:23 -08:00
Stephen Hemminger 5bdc4e9151 bridge: fix string length warning
Gcc-10 complains about possible string length overflow.
This can't happen Ethernet address format is always limited to
18 characters or less. Just resize the temp buffer.

Fixes: 70dfb0b883 ("iplink: bridge: export bridge_id and designated_root")
Cc: nikolay@cumulusnetworks.com
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-29 16:20:16 -08:00
Stephen Hemminger f817699939 devlink: fix uninitialized warning
GCC-10 complains about uninitialized variable.

devlink.c: In function ‘cmd_dev’:
devlink.c:2803:12: warning: ‘val_u32’ may be used uninitialized in this function [-Wmaybe-uninitialized]
 2803 |    val_u16 = val_u32;
      |    ~~~~~~~~^~~~~~~~~
devlink.c:2747:11: note: ‘val_u32’ was declared here
 2747 |  uint32_t val_u32;
      |           ^~~~~~~

This is a false positive because it can't figure out the control flow
when the parse returns error.

Fixes: 2557dca2b0 ("devlink: Add string to uint{8,16,32} conversion for generic parameters")
Cc: shalomt@mellanox.com
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-29 16:19:36 -08:00
Vladimir Oltean c29f65db34 bridge: add support for L2 multicast groups
Extend the 'bridge mdb' command for the following syntax:
bridge mdb add dev br0 port swp0 grp 01:02:03:04:05:06 permanent

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-29 20:54:02 +00:00
Luca Boccassi f5c1246e6a Add dcb/.gitignore
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-29 20:39:47 +00:00
David Ahern f98ce50046 Merge branch 'libbpf' into next
Hangbin Liu  says:

====================

This series converts iproute2 to use libbpf for loading and attaching
BPF programs when it is available. This means that iproute2 will
correctly process BTF information and support the new-style BTF-defined
maps, while keeping compatibility with the old internal map definition
syntax.

This is achieved by checking for libbpf at './configure' time, and using
it if available. By default the system libbpf will be used, but static
linking against a custom libbpf version can be achieved by passing
LIBBPF_DIR to configure. LIBBPF_FORCE can be set to on to force configure
abort if no suitable libbpf is found (useful for automatic packaging
that wants to enforce the dependency), or set off to disable libbpf check
and build iproute2 with legacy bpf.

The old iproute2 bpf code is kept and will be used if no suitable libbpf
is available. When using libbpf, wrapper code ensures that iproute2 will
still understand the old map definition format, including populating
map-in-map and tail call maps before load.

The examples in bpf/examples are kept, and a separate set of examples
are added with BTF-based map definitions for those examples where this
is possible (libbpf doesn't currently support declaratively populating
tail call maps).

At last, Thanks a lot for Toke's help on this patch set.

v6:
a) print runtime libbpf version in ip -V and tc -V

v5:
a) Fix LIBBPF_DIR typo and description, use libbpf DESTDIR as LIBBPF_DIR
   dest.
b) Fix bpf_prog_load_dev typo.
c) rebase to latest iproute2-next.

v4:
a) Make variable LIBBPF_FORCE able to control whether build iproute2
   with libbpf or not.
b) Add new file bpf_glue.c to for libbpf/legacy mixed bpf calls.
c) Fix some build issues and shell compatibility error.

v3:
a) Update configure to Check function bpf_program__section_name() separately
b) Add a new function get_bpf_program__section_name() to choose whether to
use bpf_program__title() or not.
c) Test build the patch on Fedora 33 with libbpf-0.1.0-1.fc33 and
   libbpf-devel-0.1.0-1.fc33

v2:
a) Remove self defined IS_ERR_OR_NULL and use libbpf_get_error() instead.
b) Add ipvrf with libbpf support.

Here are the test results with patched iproute2:
== Show libbpf version
$ ip -V
ip utility, iproute2-5.9.0, libbpf 0.1.0
$ tc -V
tc utility, iproute2-5.9.0, libbpf 0.1.0

== setup env
$ clang -O2 -Wall -g -target bpf -c bpf_graft.c -o btf_graft.o
$ clang -O2 -Wall -g -target bpf -c bpf_map_in_map.c -o btf_map_in_map.o
$ clang -O2 -Wall -g -target bpf -c bpf_shared.c -o btf_shared.o
$ clang -O2 -Wall -g -target bpf -c legacy/bpf_cyclic.c -o bpf_cyclic.o
$ clang -O2 -Wall -g -target bpf -c legacy/bpf_graft.c -o bpf_graft.o
$ clang -O2 -Wall -g -target bpf -c legacy/bpf_map_in_map.c -o bpf_map_in_map.o
$ clang -O2 -Wall -g -target bpf -c legacy/bpf_shared.c -o bpf_shared.o
$ clang -O2 -Wall -g -target bpf -c legacy/bpf_tailcall.c -o bpf_tailcall.o
$ rm -rf /sys/fs/bpf/xdp/globals
$ /root/iproute2/ip/ip link add type veth
$ /root/iproute2/ip/ip link set veth0 up
$ /root/iproute2/ip/ip link set veth1 up

== Load objs
$ /root/iproute2/ip/ip link set veth0 xdp obj bpf_graft.o sec aaa
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 4 tag 3056d2382e53f27c jited
$ ls /sys/fs/bpf/xdp/globals
jmp_tc
$ bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
4: xdp  name cls_aaa  tag 3056d2382e53f27c  gpl
        loaded_at 2020-10-22T08:04:21-0400  uid 0
        xlated 80B  jited 71B  memlock 4096B
        btf_id 5
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj bpf_map_in_map.o sec ingress
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 8 tag 4420e72b2a601ed7 jited
$ ls /sys/fs/bpf/xdp/globals
jmp_tc  map_inner  map_outer
$ bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
8: xdp  name imain  tag 4420e72b2a601ed7  gpl
        loaded_at 2020-10-22T08:04:23-0400  uid 0
        xlated 336B  jited 193B  memlock 4096B  map_ids 3
        btf_id 10
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj bpf_shared.o sec ingress
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 12 tag 9cbab549c3af3eab jited
$ ls /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef:
map_sh

/sys/fs/bpf/xdp/globals:
jmp_tc  map_inner  map_outer
$ bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
4: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
12: xdp  name imain  tag 9cbab549c3af3eab  gpl
        loaded_at 2020-10-22T08:04:25-0400  uid 0
        xlated 224B  jited 139B  memlock 4096B  map_ids 4
        btf_id 15
$ /root/iproute2/ip/ip link set veth0 xdp off

== Load objs again to make sure maps could be reused
$ /root/iproute2/ip/ip link set veth0 xdp obj bpf_graft.o sec aaa
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 16 tag 3056d2382e53f27c jited
$ ls /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef:
map_sh

/sys/fs/bpf/xdp/globals:
jmp_tc  map_inner  map_outer
$ bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
4: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
16: xdp  name cls_aaa  tag 3056d2382e53f27c  gpl
        loaded_at 2020-10-22T08:04:27-0400  uid 0
        xlated 80B  jited 71B  memlock 4096B
        btf_id 20
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj bpf_map_in_map.o sec ingress
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 20 tag 4420e72b2a601ed7 jited
$ ls /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef:
map_sh

/sys/fs/bpf/xdp/globals:
jmp_tc  map_inner  map_outer
$ bpftool map show                                                                                                                                                                   [236/4518]
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
4: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
20: xdp  name imain  tag 4420e72b2a601ed7  gpl
        loaded_at 2020-10-22T08:04:29-0400  uid 0
        xlated 336B  jited 193B  memlock 4096B  map_ids 3
        btf_id 25
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj bpf_shared.o sec ingress
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 24 tag 9cbab549c3af3eab jited
$ ls /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef:
map_sh

/sys/fs/bpf/xdp/globals:
jmp_tc  map_inner  map_outer
$ bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
4: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
24: xdp  name imain  tag 9cbab549c3af3eab  gpl
        loaded_at 2020-10-22T08:04:31-0400  uid 0
        xlated 224B  jited 139B  memlock 4096B  map_ids 4
        btf_id 30
$ /root/iproute2/ip/ip link set veth0 xdp off
$ rm -rf /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals

== Testing if we can load new-style objects (using xdp-filter as an example)
$ /root/iproute2/ip/ip link set veth0 xdp obj /usr/lib64/bpf/xdpfilt_alw_all.o sec xdp_filter
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 28 tag e29eeda1489a6520 jited
$ ls /sys/fs/bpf/xdp/globals
filter_ethernet  filter_ipv4  filter_ipv6  filter_ports  xdp_stats_map
$ bpftool map show
5: percpu_array  name xdp_stats_map  flags 0x0
        key 4B  value 16B  max_entries 5  memlock 4096B
        btf_id 35
6: percpu_array  name filter_ports  flags 0x0
        key 4B  value 8B  max_entries 65536  memlock 1576960B
        btf_id 35
7: percpu_hash  name filter_ipv4  flags 0x0
        key 4B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
8: percpu_hash  name filter_ipv6  flags 0x0
        key 16B  value 8B  max_entries 10000  memlock 1142784B
        btf_id 35
9: percpu_hash  name filter_ethernet  flags 0x0
        key 6B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
$ bpftool prog show
28: xdp  name xdpfilt_alw_all  tag e29eeda1489a6520  gpl
        loaded_at 2020-10-22T08:04:33-0400  uid 0
        xlated 2408B  jited 1405B  memlock 4096B  map_ids 9,5,7,8,6
        btf_id 35
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj /usr/lib64/bpf/xdpfilt_alw_ip.o sec xdp_filter
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 32 tag 2f2b9dbfb786a5a2 jited
$ ls /sys/fs/bpf/xdp/globals
filter_ethernet  filter_ipv4  filter_ipv6  filter_ports  xdp_stats_map
$ bpftool map show
5: percpu_array  name xdp_stats_map  flags 0x0
        key 4B  value 16B  max_entries 5  memlock 4096B
        btf_id 35
6: percpu_array  name filter_ports  flags 0x0
        key 4B  value 8B  max_entries 65536  memlock 1576960B
        btf_id 35
7: percpu_hash  name filter_ipv4  flags 0x0
        key 4B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
8: percpu_hash  name filter_ipv6  flags 0x0
        key 16B  value 8B  max_entries 10000  memlock 1142784B
        btf_id 35
9: percpu_hash  name filter_ethernet  flags 0x0
        key 6B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
$ bpftool prog show
32: xdp  name xdpfilt_alw_ip  tag 2f2b9dbfb786a5a2  gpl
        loaded_at 2020-10-22T08:04:35-0400  uid 0
        xlated 1336B  jited 778B  memlock 4096B  map_ids 7,8,5
        btf_id 40
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj /usr/lib64/bpf/xdpfilt_alw_tcp.o sec xdp_filter
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 36 tag 18c1bb25084030bc jited
$ ls /sys/fs/bpf/xdp/globals
filter_ethernet  filter_ipv4  filter_ipv6  filter_ports  xdp_stats_map
$ bpftool map show
5: percpu_array  name xdp_stats_map  flags 0x0
        key 4B  value 16B  max_entries 5  memlock 4096B
        btf_id 35
6: percpu_array  name filter_ports  flags 0x0
        key 4B  value 8B  max_entries 65536  memlock 1576960B
        btf_id 35
7: percpu_hash  name filter_ipv4  flags 0x0
        key 4B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
8: percpu_hash  name filter_ipv6  flags 0x0
        key 16B  value 8B  max_entries 10000  memlock 1142784B
        btf_id 35
9: percpu_hash  name filter_ethernet  flags 0x0
        key 6B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
$ bpftool prog show
36: xdp  name xdpfilt_alw_tcp  tag 18c1bb25084030bc  gpl
        loaded_at 2020-10-22T08:04:37-0400  uid 0
        xlated 1128B  jited 690B  memlock 4096B  map_ids 6,5
        btf_id 45
$ /root/iproute2/ip/ip link set veth0 xdp off
$ rm -rf /sys/fs/bpf/xdp/globals

== Load new btf defined maps
$ /root/iproute2/ip/ip link set veth0 xdp obj btf_graft.o sec aaa
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 40 tag 3056d2382e53f27c jited
$ ls /sys/fs/bpf/xdp/globals
jmp_tc
$ bpftool map show
10: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
40: xdp  name cls_aaa  tag 3056d2382e53f27c  gpl
        loaded_at 2020-10-22T08:04:39-0400  uid 0
        xlated 80B  jited 71B  memlock 4096B
        btf_id 50
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj btf_map_in_map.o sec ingress
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 44 tag 4420e72b2a601ed7 jited
$ ls /sys/fs/bpf/xdp/globals
jmp_tc  map_outer
$ bpftool map show
10: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
11: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
13: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
44: xdp  name imain  tag 4420e72b2a601ed7  gpl
        loaded_at 2020-10-22T08:04:41-0400  uid 0
        xlated 336B  jited 193B  memlock 4096B  map_ids 13
        btf_id 55
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj btf_shared.o sec ingress
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 48 tag 9cbab549c3af3eab jited
$ ls /sys/fs/bpf/xdp/globals
jmp_tc  map_outer  map_sh
$ bpftool map show
10: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
11: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
13: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
14: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
48: xdp  name imain  tag 9cbab549c3af3eab  gpl
        loaded_at 2020-10-22T08:04:43-0400  uid 0
        xlated 224B  jited 139B  memlock 4096B  map_ids 14
        btf_id 60
$ /root/iproute2/ip/ip link set veth0 xdp off
$ rm -rf /sys/fs/bpf/xdp/globals

== Test load objs by tc
$ /root/iproute2/tc/tc qdisc add dev veth0 ingress
$ /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_cyclic.o sec 0xabccba/0
$ /root/iproute2/tc/tc filter add dev veth0 parent ffff: bpf obj bpf_graft.o
$ /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_tailcall.o sec 42/0
$ /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_tailcall.o sec 42/1
$ /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_tailcall.o sec 43/0
$ /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_tailcall.o sec classifier
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
$ ls /sys/fs/bpf/xdp/37e88cb3b9646b2ea5f99ab31069ad88db06e73d /sys/fs/bpf/xdp/fc68fe3e96378a0cba284ea6acbe17e898d8b11f /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/37e88cb3b9646b2ea5f99ab31069ad88db06e73d:
jmp_tc

/sys/fs/bpf/xdp/fc68fe3e96378a0cba284ea6acbe17e898d8b11f:
jmp_ex  jmp_tc  map_sh

/sys/fs/bpf/xdp/globals:
jmp_tc
$ bpftool map show
15: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
        owner_prog_type sched_cls  owner jited
16: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
        owner_prog_type sched_cls  owner jited
17: prog_array  name jmp_ex  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
        owner_prog_type sched_cls  owner jited
18: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 2  memlock 4096B
        owner_prog_type sched_cls  owner jited
19: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
52: sched_cls  name cls_loop  tag 3e98a40b04099d36  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 168B  jited 133B  memlock 4096B  map_ids 15
        btf_id 65
56: sched_cls  name cls_entry  tag 0fbb4d9310a6ee26  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 144B  jited 121B  memlock 4096B  map_ids 16
        btf_id 70
60: sched_cls  name cls_case1  tag e06a3bd62293d65d  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 328B  jited 216B  memlock 4096B  map_ids 19,17
        btf_id 75
66: sched_cls  name cls_case1  tag e06a3bd62293d65d  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 328B  jited 216B  memlock 4096B  map_ids 19,17
        btf_id 80
72: sched_cls  name cls_case1  tag e06a3bd62293d65d  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 328B  jited 216B  memlock 4096B  map_ids 19,17
        btf_id 85
78: sched_cls  name cls_case1  tag e06a3bd62293d65d  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 328B  jited 216B  memlock 4096B  map_ids 19,17
        btf_id 90
79: sched_cls  name cls_case2  tag ee218ff893dca823  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 336B  jited 218B  memlock 4096B  map_ids 19,18
        btf_id 90
80: sched_cls  name cls_exit  tag e78a58140deed387  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 288B  jited 177B  memlock 4096B  map_ids 19
        btf_id 90

I also run the following upstream kselftest with patches iproute2 and
all passed.

test_lwt_ip_encap.sh
test_xdp_redirect.sh
test_tc_redirect.sh
test_xdp_meta.sh
test_xdp_veth.sh
test_xdp_vlan.sh

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 22:24:15 -07:00
Hangbin Liu 71c7c1fb4f examples/bpf: add bpf examples with BTF defined maps
Users should try use the new BTF defined maps instead of struct
bpf_elf_map defined maps. The tail call examples are not added yet
as libbpf doesn't currently support declaratively populating tail call
maps.

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 22:14:08 -07:00
Hangbin Liu 1ac8285a69 examples/bpf: move struct bpf_elf_map defined maps to legacy folder
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 22:14:06 -07:00
Hangbin Liu 6d61a2b557 lib: add libbpf support
This patch converts iproute2 to use libbpf for loading and attaching
BPF programs when it is available, which is started by Toke's
implementation[1]. With libbpf iproute2 could correctly process BTF
information and support the new-style BTF-defined maps, while keeping
compatibility with the old internal map definition syntax.

The old iproute2 bpf code is kept and will be used if no suitable libbpf
is available. When using libbpf, wrapper code in bpf_legacy.c ensures that
iproute2 will still understand the old map definition format, including
populating map-in-map and tail call maps before load.

In bpf_libbpf.c, we init iproute2 ctx and elf info first to check the
legacy bytes. When handling the legacy maps, for map-in-maps, we create
them manually and re-use the fd as they are associated with id/inner_id.
For pin maps, we only set the pin path and let libbp load to handle it.
For tail calls, we find it first and update the element after prog load.

Other maps/progs will be loaded by libbpf directly.

[1] https://lore.kernel.org/bpf/20190820114706.18546-1-toke@redhat.com/

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 22:14:05 -07:00
Hangbin Liu dc800a4ed4 lib: make ipvrf able to use libbpf and fix function name conflicts
There are directly calls in libbpf for bpf program load/attach.
So we could just use two wrapper functions for ipvrf and convert
them with libbpf support.

Function bpf_prog_load() is removed as it's conflict with libbpf
function name.

bpf.c is moved to bpf_legacy.c for later main libbpf support in
iproute2.

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 22:14:04 -07:00
Hangbin Liu 503e9229b0 iproute2: add check_libbpf() and get_libbpf_version()
This patch aim to add basic checking functions for later iproute2
libbpf support.

First we add check_libbpf() in configure to see if we have bpf library
support. By default the system libbpf will be used, but static linking
against a custom libbpf version can be achieved by passing libbpf DESTDIR
to variable LIBBPF_DIR for configure.

Another variable LIBBPF_FORCE is used to control whether to build iproute2
with libbpf. If set to on, then force to build with libbpf and exit if
not available. If set to off, then force to not build with libbpf.

When dynamically linking against libbpf, we can't be sure that the
version we discovered at compile time is actually the one we are
using at runtime. This can lead to hard-to-debug errors. So we add
a new file lib/bpf_glue.c and a helper function get_libbpf_version()
to get correct libbpf version at runtime.

Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 22:14:02 -07:00
David Ahern ee5d4b24e3 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 22:04:48 -07:00
Roi Dayan ed40b7e2ae tc flower: fix parsing vlan_id and vlan_prio
When protocol is vlan then eth_type is set to the vlan eth type.
So when parsing vlan_id and vlan_prio need to check tc_proto
is vlan and not eth_type.

Fixes: 4c551369e0 ("tc flower: use right ethertype in icmp/arp parsing")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:45:20 -07:00
Petr Machata ca5ec9a17a ip: iptuntap: Convert to use print_on_off()
Instead of rolling a custom on-off printer, use the one added to utils.c.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:43:41 -07:00
Petr Machata 66e574c4c5 ip: ipnetconf: Convert to use print_on_off()
Instead of rolling a custom on-off printer, use the one added to utils.c.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:43:34 -07:00
Petr Machata 07d82b4a79 ip: iplink_bridge_slave: Convert to use print_on_off()
Instead of rolling a custom on-off printer, use the one added to utils.c.
Note that _print_onoff() has an extra parameter for a JSON-specific flag
name. However that argument is not used, and never was. Therefore when
moving over to print_on_off(), drop this argument.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:43:30 -07:00
Petr Machata 3e0d2a73ba ip: iplink_bridge_slave: Port over to parse_on_off()
Invoke parse_on_off() from bridge_slave_parse_on_off() instead of
hand-rolling one. Exit on failure, because the invarg that was ivoked here
before would.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:43:27 -07:00
Petr Machata 5f685d064b ip: iplink: Convert to use parse_on_off()
Invoke parse_on_off() instead of rolling a custom function.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:43:23 -07:00
Petr Machata 94d12fd796 bridge: link: Convert to use print_on_off()
Instead of rolling a custom on-off printer, use the one added to utils.c.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:43:19 -07:00
Petr Machata 9262ccc3ed bridge: link: Port over to parse_on_off()
Convert bridge/link.c from a custom on_off parser to the new global one.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:43:14 -07:00
David Ahern e1ae6efbb8 Merge branch 'nexthop-flags' into next
Ido Schimmel  says:

====================

From: Ido Schimmel <idosch@nvidia.com>

Patch #1 prints the recently added 'RTNH_F_TRAP' flag.

Patch #2 makes sure that nexthop flags are always printed for nexthop
objects. Even when the nexthop does not have a device, such as a
blackhole nexthop or a group.

Example output with netdevsim:

$ ip nexthop
id 1 via 192.0.2.2 dev eth0 scope link trap
id 2 blackhole trap
id 3 group 2 trap

Example output with mlxsw:

$ ip nexthop
id 1 via 192.0.2.2 dev swp3 scope link offload
id 2 blackhole offload
id 3 group 2 offload

Tested with fib_nexthops.sh that uses "ip nexthop" output:

Tests passed: 164
Tests failed:   0

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-22 12:46:30 -07:00
Ido Schimmel 0788678991 nexthop: Always print nexthop flags
Currently, the nexthop flags are only printed when the nexthop has a
nexthop device. The offload / trap indication is therefore not printed
for nexthop groups.

Instead, always print the nexthop flags, regardless if the nexthop has a
nexthop device or not.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-22 12:43:56 -07:00
Ido Schimmel 3de35f41be ip route: Print "trap" nexthop indication
The kernel can now signal that a nexthop is trapping packets instead of
forwarding them. Print the flag to help users understand the offload
state of each nexthop.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-22 12:42:20 -07:00
David Ahern db8b149b16 Update kernel headers
Update kernel headers to commit:
    f9e425e99b07 ("octeontx2-af: Add support for RSS hashing based on Transport protocol field")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-22 12:41:23 -07:00
Stephen Hemminger 7a49ff9d79 bridge: report correct version
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-15 08:58:52 -08:00
Zahari Doychev 4c551369e0 tc flower: use right ethertype in icmp/arp parsing
Currently the icmp and arp parsing functions are called with incorrect
ethtype in case of vlan or cvlan filter options. In this case either
cvlan_ethtype or vlan_ethtype has to be used. The ethtype is now updated
each time a vlan ethtype is matched during parsing.

Signed-off-by: Zahari Doychev <zahari.doychev@linux.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 20:07:38 -07:00
David Ahern 1ed00380b0 Merge branch 'dcb-tool' into next
Petr Machata  says:
====================

The Linux DCB interface allows configuration of a broad range of
hardware-specific attributes, such as TC scheduling, flow control, per-port
buffer configuration, TC rate, etc.

Currently a common libre tool for configuration of DCB is OpenLLDP. This
suite contains a daemon that uses Linux DCB interface to configure HW
according to the DCB TLVs exchanged over an interface. The daemon can also
be controlled by a client, through which the user can adjust and view the
configuration. The downside of using OpenLLDP is that it is somewhat
heavyweight and difficult to use in scripts, and does not support
extensions such as buffer and rate commands.

For access to many HW features, one would be perfectly fine with a
fire-and-forget tool along the lines of "ip" or "tc". For scripting in
particular, this would be ideal. This author is aware of one such tool,
mlnx_qos from Mellanox OFED scripts collection[1].

The downside here is that the tool is very verbose, the command line
language is awkward to use, it is not packaged in Linux distros, and
generally has the appearance of a very vendor-specific tool, despite not
being one.

This patchset addresses the above issues by providing a seed of a clean,
well-documented, easily usable, extensible fire-and-forget tool for DCB
configuration:

    # dcb ets set dev eni1np1 \
                  tc-tsa all:strict 0:ets 1:ets 2:ets \
		  tc-bw all:0 0:33 1:33 2:34

    # dcb ets show dev eni1np1 tc-tsa tc-bw
    tc-tsa 0:ets 1:ets 2:ets 3:strict 4:strict 5:strict 6:strict 7:strict
    tc-bw 0:33 1:33 2:34 3:0 4:0 5:0 6:0 7:0

    # dcb ets set dev eni1np1 tc-bw 1:30 2:37

    # dcb -j ets show dev eni1np1 | jq '.tc_bw[2]'
    37

The patchset proceeds as follows:

- Many tools in iproute2 have an option to work in batch mode, where the
  commands to run are given in a file. The code to handle batching is
  largely the same independent of the tool in question. In patch #1, add a
  helper to handle the batching, and migrate individual tools to use it.

- A number of configuration options come in a form of an on-off switch.
  This in turn can be considered a special case of parsing one of a given
  set of strings. In patch #2, extract helpers to parse one of a number of
  strings, on top of which build an on-off parser.

  Currently each tool open-codes the logic to parse the on-off toggle. A
  future patch set will migrate instances of this code over to the new
  helpers.

- The on/off toggles from previous list item sometimes need to be dumped.
  While in the FP output, one typically wishes to maintain consistency with
  the command line and show actual strings, "on" and "off", in JSON output
  one would rather use booleans. This logic is somewhat annoying to have to
  open-code time and again. Therefore in patch #3, add a helper to do just
  that.

- The DCB tool is built on top of libmnl. Several routines will be
  basically the same in DCB as they are currently in devlink. In patches
  #4-#6, extract them to a new module, mnl_utils, for easy reuse.

- Much of DCB is built around arrays. A syntax similar to the iplink_vlan's
  ingress-qos-map / egress-qos-map is very handy for describing changes
  done to such arrays. Therefore in patch #7, extract a helper,
  parse_mapping(), which manages parsing of key-value arrays. In patch #8,
  fix a buglet in the helper, and in patch #9, extend it to allow setting
  of all array elements in one go.

- In patch #10, add a skeleton of "dcb", which contains common helpers and
  dispatches to subtools for handling of individual objects. The skeleton
  is empty as of this patch.

  In patch #11, add "dcb_ets", a module for handling of specifically DCB
  ETS objects.

  The intention is to gradually add handlers for at least PFC, APP, peer
  configuration, buffers and rates.

[1] https://github.com/Mellanox/mlnx-tools/tree/master/ofed_scripts

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:48:52 -07:00
Petr Machata ef15b07601 dcb: Add a subtool for the DCB ETS object
ETS, for "Enhanced Transmission Selection", is a set of configurations that
permit configuration of mapping of priorities to traffic classes, traffic
selection algorithm to use per traffic class, bandwidth allocation, etc.

Add a dcb subtool to allow showing and tweaking of individual ETS
configuration options. For example:

    # dcb ets show dev eni1np1
    willing on ets_cap 8 cbs off
    tc-bw 0:0 1:0 2:0 3:0 4:100 5:0 6:0 7:0
    pg-bw 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0
    tc-tsa 0:strict 1:strict 2:strict 3:strict 4:ets 5:strict 6:strict 7:strict
    prio-tc 0:1 1:3 2:5 3:0 4:0 5:0 6:0 7:0
    reco-tc-bw 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0
    reco-tc-tsa 0:strict 1:strict 2:strict 3:strict 4:strict 5:strict 6:strict 7:strict
    reco-prio-tc 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:19 -07:00
Petr Machata 67033d1c1c Add skeleton of a new tool, dcb
The Linux DCB interface allows configuration of a broad range of
hardware-specific attributes, such as TC scheduling, flow control, per-port
buffer configuration, TC rate, etc. Add a new tool to show that
configuration and tweak it.

DCB allows configuration of several objects, and possibly could expand to
pre-standard CEE interfaces. Therefore the tool itself is a lean shell that
dispatches to subtools each dedicated to one of the objects.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:19 -07:00
Petr Machata 66a2d71487 lib: parse_mapping: Recognize a keyword "all"
The DCB tool will have to provide an interface to a number of fixed-size
arrays. Unlike the egress- and ingress-qos-map, it makes good sense to have
an interface to set all members to the same value. For example to set
strict priority on all TCs besides select few, or to reset allocated
bandwidth to all zeroes, again besides several explicitly-given ones.

To support this usage, extend the parse_mapping() with a boolean that
determines whether this special use is supported. If "all" is given and
recognized, mapping_cb is called with the key of -1.

Have iplink_vlan pass false for allow_all.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata bc3523ae70 lib: parse_mapping: Update argc, argv on error
Currently argc and argv are not updated unless parsing of all of the
mapping was successful. However in that case, "ip link" will point at the
wrong argument when complaining:

    # ip link add name eth0.100 link eth0 type vlan id 100 egress 1:1 2:foo
    Error: argument "1" is wrong: invalid egress-qos-map

Update argc and argv even in the case of parsing error, so that the right
element is indicated.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata 28e663ee65 lib: Extract from iplink_vlan a helper to parse key:value arrays
VLAN netdevices have two similar attributes: ingress-qos-map and
egress-qos-map. These attributes can be configured with a series of
802.1-priority-to-skb-priority (and vice versa) mappings. A reusable helper
along those lines will be handy for configuration of various
priority-to-tc, tc-to-algorithm, and other arrays in DCB.

Therefore extract the logic to a function parse_mapping(), move to utils.c,
and dispatch to utils.c from iplink_vlan.c. That necessitates extraction of
a VLAN-specific parse_qos_mapping(). Do that, and propagate addattr_l()
return value up, unlike the original.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata 6dd778e837 lib: Extract from devlink/mnlg a helper, mnlu_socket_recv_run()
Receiving a message in libmnl is a somewhat involved operation. Devlink's
mnlg library has an implementation that is going to be handy for other
tools as well. Extract it into a new helper.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata dd78dfc7be lib: Extract from devlink/mnlg a helper, mnlu_msg_prepare()
Allocation of a new netlink message with the two usual headers is reusable
with other netlink netlink message types. Extract it into a helper,
mnlu_msg_prepare(). Take the second header as an argument, instead of
passing in parameters to initialize it, and copy it in.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata 72858c7b77 lib: Extract from devlink/mnlg a helper, mnlu_socket_open()
This little dance of mnl_socket_open(), option setting, and bind, is the
same regardless of tool. Extract into a new module that should hold helpers
for working with libmnl, mnl_util.c.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata 9091ff0251 lib: json_print: Add print_on_off()
The value of a number of booleans is shown as "on" and "off" in the plain
output, and as an actual boolean in JSON mode. Add a function that does
that.

RDMA tool already uses a function named print_on_off(). This function
always shows "on" and "off", even in JSON mode. Since there are probably
very few if any consumers of this interface at this point, migrate it to
the new central print_on_off() as well.

Signed-off-by: Petr Machata <me@pmachata.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata 82604d2852 lib: Add parse_one_of(), parse_on_off()
Take from the macsec code parse_one_of() and adapt so that it passes the
primary result as the main return value, and error result through a
pointer. That is the simplest way to make the code reusable across data
types without introducing extra magic.

Also from macsec take the specialization of parse_one_of() for parsing
specifically the strings "off" and "on".

Convert the macsec code to the new helpers.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata 1d9a81b8c9 Unify batch processing across tools
The code for handling batches is largely the same across iproute2 tools.
Extract a helper to handle the batch, and adjust the tools to dispatch to
this helper. Sandwitch the invocation between prologue / epilogue code
specific for each tool.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Guillaume Nault 8682f588bf tc-mpls: fix manpage example and help message string
Manpage:
 * Remove the extra "and to ip packets" part from command description
   to make it more understandable.

 * Redirect packets to eth1, instead of eth0, as told in the
   description.

Help string:
 * "mpls pop" can be followed by a CONTROL keyword.

 * "mpls modify" can also set the MPLS_BOS field.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-08 10:49:28 -08:00
Guillaume Nault 7c7a0fe0c8 tc-vlan: fix help and error message strings
* "vlan pop" can be followed by a CONTROL keyword.

 * Add missing space in error message.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-08 10:49:18 -08:00
Stephen Hemminger 72f88bd42a uapi: update kernel headers from 5.10-rc2
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-08 10:47:27 -08:00
Stephen Hemminger b90c39be33 rdma: fix spelling error in comment
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-08 10:44:19 -08:00
Stephen Hemminger c8424b73e1 man: fix spelling errors
Lots of little typo errors on man pages.
Found by running codespell

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-08 10:40:30 -08:00
Stephen Hemminger cbf6481797 tc/m_gate: fix spelling errors
Fix spelling errors in error messages.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-08 10:34:23 -08:00
Stephen Hemminger 14b189f066 uapi: updates from 5.10-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-03 08:29:53 -08:00
David Ahern 51f28eb928 Merge branch 'tc-terse-dump' into next
Vlad Buslov  says:

====================

Implement support for terse dump mode which provides only essential
classifier/action info (handle, stats, cookie, etc.). Use new
TCA_DUMP_FLAGS_TERSE flag to prevent copying of unnecessary data from
kernel.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-31 09:18:43 -06:00
Vlad Buslov 477ca0dfb4 tc: implement support for terse dump
Implement support for classifier/action terse dump using new TCA_DUMP_FLAGS
tlv with only available flag value TCA_DUMP_FLAGS_TERSE. Set the flag when
user requested it with following example CLI (-br for 'brief'):

$ tc -s -br filter show dev ens1f0 ingress
filter protocol ip pref 49151 flower chain 0
filter protocol ip pref 49151 flower chain 0 handle 0x1
  not_in_hw
        action order 1: gact    Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

filter protocol ip pref 49152 flower chain 0
filter protocol ip pref 49152 flower chain 0 handle 0x1
  not_in_hw
        action order 1: gact    Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

In terse mode dump only outputs essential data needed to identify the
filter and action (handle, cookie, etc.) and stats, if requested by the
user. The intention is to significantly improve rule dump rate by omitting
all static data that do not change after rule is created.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-31 09:15:15 -06:00
Vlad Buslov a99ebeeef2 tc: skip actions that don't have options attribute when printing
Modify implementations that return error from action_until->print_aopt()
callback to silently skip actions that don't have their corresponding
TCA_ACT_OPTIONS attribute set (some actions already behave like this).
Print action kind before returning from action_until->print_aopt()
callbacks. This is necessary to support terse dump mode in following patch
in the series.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-31 09:14:01 -06:00
Johannes Berg 9fc5bf734f libnetlink: define __aligned conditionally
On some systems (e.g. current Debian/stable) the inclusion
of utils.h pulled in some other things that may end up
defining __aligned, in a possibly different way than what
we had here.

Use our own definition only if there isn't one already.

Fixes: d5acae244f ("libnetlink: add nl_print_policy() helper")
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-28 10:24:02 -07:00
David Ahern eb12cc9ae1 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-25 15:08:12 -06:00
Guillaume Nault f1298d7660 m_mpls: test the 'mac_push' action after 'modify'
Commit 02a261b5ba ("m_mpls: add mac_push action") added a matches()
test for the "mac_push" string before the test for "modify".
This changes the previous behaviour as 'action m' used to match
"modify" while it now matches "mac_push".

Revert to the original behaviour by moving the "mac_push" test after
"modify".

Fixes: 02a261b5ba ("m_mpls: add mac_push action")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-25 15:07:13 -06:00
David Ahern 2b7a768408 Merge branch 'tipc-encryption' into next
Tuong Lien  says:

====================

This series adds two new options in the 'iproute2/tipc' command, enabling users
to use the new TIPC encryption features, i.e. the master key and rekeying which
have been recently merged in kernel.

The help menu of the "tipc node set key" command is also updated accordingly:

 # tipc node set key --help
Usage: tipc node set key KEY [algname ALGNAME] [PROPERTIES]
       tipc node set key rekeying REKEYING

KEY
  Symmetric KEY & SALT as a composite ASCII or hex string (0x...) in form:
  [KEY: 16, 24 or 32 octets][SALT: 4 octets]

ALGNAME
  Cipher algorithm [default: "gcm(aes)"]

PROPERTIES
  master                - Set KEY as a cluster master key
  <empty>               - Set KEY as a cluster key
  nodeid NODEID         - Set KEY as a per-node key for own or peer

REKEYING
  INTERVAL              - Set rekeying interval (in minutes) [0: disable]
  now                   - Trigger one (first) rekeying immediately

EXAMPLES
  tipc node set key this_is_a_master_key master
  tipc node set key 0x746869735F69735F615F6B657931365F73616C74
  tipc node set key this_is_a_key16_salt algname "gcm(aes)" nodeid 1001002
  tipc node set key rekeying 600

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-20 09:05:40 -06:00
Tuong Lien 2bf1ba5a5c tipc: add option to set rekeying for encryption
As supported in kernel, the TIPC encryption rekeying can be tuned using
the netlink attribute - 'TIPC_NLA_NODE_REKEYING'. Now we add the
'rekeying' option correspondingly to the 'tipc node set key' command so
that user will be able to perform that tuning:

tipc node set key rekeying REKEYING

where the 'REKEYING' value can be:

INTERVAL              - Set rekeying interval (in minutes) [0: disable]
now                   - Trigger one (first) rekeying immediately

For example:
$ tipc node set key rekeying 60
$ tipc node set key rekeying now

The command's help menu is also updated with these descriptions for the
new command option.

Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-20 09:04:45 -06:00
Tuong Lien 5fb3681885 tipc: add option to set master key for encryption
In addition to the support of master key in kernel, we add the 'master'
option to the 'tipc node set key' command for user to be able to
specify a key as master key during the key setting. This is carried out
by turning on the new netlink flag - 'TIPC_NLA_NODE_KEY_MASTER'.
For example:

$ tipc node set key "this_is_a_master_key" master

The command's help menu is also updated to give a better description of
all the available options.

Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-20 09:04:37 -06:00
David Ahern b4edd6a8a6 Merge branch 'tc-mpls-l2-vpn' into next
Guillaume Nault  says:

====================

This patch series adds the possibility for TC to tunnel Ethernet frames
over MPLS.

Patch 1 allows adding or removing the Ethernet header.
Patch 2 allows pushing an MPLS LSE before the MAC header.

By combining these actions, it becomes possible to encapsulate an
entire Ethernet frame into MPLS, then add an outer Ethernet header
and send the resulting frame to the next hop.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-20 08:57:47 -06:00
Guillaume Nault 02a261b5ba m_mpls: add mac_push action
Add support for the new TCA_MPLS_ACT_MAC_PUSH action (kernel commit
a45294af9e96 ("net/sched: act_mpls: Add action to push MPLS LSE before
Ethernet header")). This action let TC push an MPLS header before the
MAC header of a frame.

Example (encapsulate all outgoing frames with label 20, then add an
outer Ethernet header):
 # tc filter add dev ethX matchall \
       action mpls mac_push label 20 ttl 64 \
       action vlan push_eth dst_mac 0a:00:00:00:00:02 \
                            src_mac 0a:00:00:00:00:01

This patch also adds an alias for ETH_P_TEB, since it is useful when
decapsulating MPLS packets that contain an Ethernet frame.

With MAC_PUSH, there's no previous Ethertype to modify. However, the
"protocol" option is still needed, because the kernel uses it to set
skb->protocol. So rename can_modify_ethtype() to can_set_ethtype().

Also add a test suite for m_mpls, which covers the new action and the
pre-existing ones.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-20 08:57:08 -06:00
Guillaume Nault d61167dd88 m_vlan: add pop_eth and push_eth actions
Add support for the new TCA_VLAN_ACT_POP_ETH and TCA_VLAN_ACT_PUSH_ETH
actions (kernel commit 19fbcb36a39e ("net/sched: act_vlan:
Add {POP,PUSH}_ETH actions"). These action let TC remove or add the
Ethernet at the head of a frame.

Drop an Ethernet header:
 # tc filter add dev ethX matchall action vlan pop_eth

Push an Ethernet header (the original frame must have no MAC header):
 # tc filter add dev ethX matchall action vlan \
       push_eth dst_mac 0a:00:00:00:00:02 src_mac 0a:00:00:00:00:01

Also add a test suite for m_vlan, which covers these new actions and
the pre-existing ones.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-20 08:36:38 -06:00
Jacob Keller 3342688a66 devlink: display elapsed time during flash update
For some devices, updating the flash can take significant time during
operations where no status can meaningfully be reported. This can be
somewhat confusing to a user who sees devlink appear to hang on the
terminal waiting for the device to update.

Recent changes to the kernel interface allow such long running commands
to provide a timeout value indicating some upper bound on how long the
relevant action could take.

Provide a ticking counter of the time elapsed since the previous status
message in order to make it clear that the program is not simply stuck.

Display this message whenever the status message from the kernel
indicates a timeout value. Additionally also display the message if
we've received no status for more than couple of seconds. If we elapse
more than the timeout provided by the status message, replace the
timeout display with "timeout reached".

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-17 09:30:06 -06:00
Stephen Hemminger cb7ce51cc1 v5.9.0 2020-10-15 15:18:35 -07:00
zhangkaiheb@126.com 78ace1c211 tc: fq: clarify the length of orphan_mask.
Signed-off-by: kai zhang <zhangkaiheb@126.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-15 15:16:52 -07:00
Jan Engelhardt 0ca1312c20 ip: add error reporting when RTM_GETNSID failed
`ip addr` when run under qemu-user-riscv64, fails. This likely is due
to qemu-5.1 not doing translation of RTM_GETNSID calls. Aborting ip
completely is not helpful for the user however. This patch reworks
the error handling.

Before:

rtest:/ # ip a
2: host0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
request send failed: Operation not supported
    link/ether 46:3f:2d:88:3d:db brd ff:ff:ff:ff:ff:ffrtest:/ #

Afterwards:

rtest:/ # ip a
2: host0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
rtnl_send(RTM_GETNSID): Operation not supported. Continuing anyway.
    link/ether 46:3f:2d:88:3d:db brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.72.147/28 brd 192.168.72.159 scope global host0
       valid_lft forever preferred_lft forever
    inet6 fe80::443f:2dff:fe88:3ddb/64 scope link
       valid_lft forever preferred_lft forever

Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-12 08:10:25 -07:00
Dmitry Yakunin 58c3c55f38 lib: ignore invalid mounts in cg_init_map
In case of bad entries in /proc/mounts just skip cgroup cache initialization.
Cgroups in output will be shown as "unreachable:cgroup_id".

Fixes: d5e6ee0dac ("ss: introduce cgroup2 cache and helper functions")
Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Reported-by: Donald Sharp <sharpd@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-11 23:02:35 -07:00
Stephen Hemminger 003b9af516 uapi: add new SNMP entry
Update to snmp.h from 5.9

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-11 22:50:22 -07:00
David Ahern b5a583fb32 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:11:09 -06:00
Johannes Berg 7812012849 genl: ctrl: print op -> policy idx mapping
Newer kernels can dump per-op policies, so print out the new
mapping attribute to indicate which op has which policy.

v2:
 * print out both do/dump policy idx
v3:
 * fix userspace API which renumbered after patch rebasing

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:10:09 -06:00
David Ahern 91c54917cd Merge branch 'bridge-igmpv3-mldv2' into next
Nikolay Aleksandrov  says:

====================
This set adds support for IGMPv3/MLDv2 attributes, they're mostly
read-only at the moment. The only new "set" option is the source address
for S,G entries. It is added in patch 01 (see the patch commit message for
an example). Patch 02 shows a missing flag (fast_leave) for
completeness, then patch 03 shows the new IGMPv3/MLDv2 flags:
added_by_star_ex and blocked. Patches 04-06 show the new extra
information about the entry's state when IGMPv3/MLDv2 are enabled. That
includes its filter mode (include/exclude), source list with timers and
origin protocol (currently only static/kernel), in order to show the new
information the user must use "-d"/show_details.
Here's the output of a few IGMPv3 entries:
 dev bridge port ens12 grp 239.0.0.1 src 20.21.22.23 temp filter_mode include proto kernel  blocked    0.00
 dev bridge port ens12 grp 239.0.0.1 src 8.9.10.11 temp filter_mode include proto kernel  blocked    0.00
 dev bridge port ens12 grp 239.0.0.1 src 1.2.3.1 temp filter_mode include proto kernel  blocked    0.00
 dev bridge port ens12 grp 239.0.0.1 temp filter_mode exclude source_list 20.21.22.23/0.00,8.9.10.11/0.00,1.2.3.1/0.00 proto kernel    26.65

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:09:14 -06:00
Nikolay Aleksandrov 86588450c5 bridge: mdb: print protocol when available
Print the mdb entry's protocol (i.e. who added it)  when it's available if
the user requested to show details (-d). Currently the only possible
values are RTPROT_STATIC (user-space added) or RTPROT_KERNEL
(automatically added by kernel). The value is kernel controlled.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:07:50 -06:00
Nikolay Aleksandrov 2de81d1eff bridge: mdb: print source list when available
Print the mdb entry's source list when it's available if the user
requested to show details (-d). Each source has an associated timer
which controls if traffic should be forwarded to that S,G entry (if the
timer is non-zero traffic is forwarded, otherwise it's not).
Currently the source list is kernel controlled and can't be changed by
user-space.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:07:45 -06:00
Nikolay Aleksandrov 1d28c48046 bridge: mdb: print filter mode when available
Print the mdb entry's filter mode when it's available if the user
requested to show details (-d). It can be either include or exclude.
Currently it's kernel controlled and can't be changed by user-space.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:07:39 -06:00
Nikolay Aleksandrov e331677ea2 bridge: mdb: show igmpv3/mldv2 flags
With IGMPv3/MLDv2 support we have 2 new flags:
 - added_by_star_ex: set when the S,G entry was automatically created
                     because of a *,G entry in EXCLUDE mode
 - blocked: set when traffic for the S,G entry for that port has to be
            blocked
Both flags are used only on the new S,G entries and are currently kernel
managed, i.e. similar to other flags which can't be set from user-space.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:07:34 -06:00
Nikolay Aleksandrov f94e8b0749 bridge: mdb: print fast_leave flag
We're not showing the fast_leave flag when it's set. Currently that can
be only when an mdb entry is being deleted due to fast leave, so it will
only affect mdb monitor.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:07:30 -06:00
Nikolay Aleksandrov 547b319762 bridge: mdb: add support for source address
This patch adds the user-space control and dump of mdb entry source
address. When setting the new MDBA_SET_ENTRY_ATTRS nested attribute is
used and inside is added MDBE_ATTR_SOURCE based on the address family.
When dumping we look for MDBA_MDB_EATTR_SOURCE and if present we add the
"src x.x.x.x" output. The source address will be always shown as it's
needed to match the entry to modify it from user-space.

Example:
 $ bridge mdb add dev bridge port ens13 grp 239.0.0.1 src 1.2.3.4 permanent vid 100
 $ bridge mdb show
 dev bridge port ens13 grp 239.0.0.1 src 1.2.3.4 permanent vid 100

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:07:25 -06:00
David Ahern f905191a48 Update kernel headers
Update kernel headers to commit:
    bc081a693a56 ("Merge branch 'Offload-tc-vlan-mangle-to-mscc_ocelot-switch'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:04:57 -06:00
Antony Antony 4322b13c8d ip xfrm: support setting XFRMA_SET_MARK_MASK attribute in states
The XFRMA_SET_MARK_MASK attribute can be set in states (4.19+)
It is optional and the kernel default is 0xffffffff
It is the mask of XFRMA_SET_MARK(a.k.a. XFRMA_OUTPUT_MARK in 4.18)

e.g.
./ip/ip xfrm state add output-mark 0x6 mask 0xab proto esp \
 auth digest_null 0 enc cipher_null ''
ip xfrm state
src 0.0.0.0 dst 0.0.0.0
	proto esp spi 0x00000000 reqid 0 mode transport
	replay-window 0
	output-mark 0x6/0xab
	auth-trunc digest_null 0x30 0
	enc ecb(cipher_null)
	anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
	sel src 0.0.0.0/0 dst 0.0.0.0/0

Signed-off-by: Antony Antony <antony@phenome.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-07 00:10:47 -06:00
Jiri Pirko 8dc1db80e4 devlink: Add health reporter test command support
Add health reporter test command and allow user to trigger a test event.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-07 00:08:53 -06:00
Jacob Keller 012164718b devlink: support setting the overwrite mask attribute
The recently added DEVLINK_ATTR_FLASH_UPDATE_OVERWRITE_MASK allows
userspace to indicate how a device should handle subsections of a flash
component when updating. For example, a flash component might contain
vital data such as PCIe serial number or configuration fields such as
settings that control device bootup.

The overwrite mask allows specifying whether the device should overwrite
these subsections when updating from the provided image. If nothing is
specified, then the update is expected to preserve all vital fields and
configuration.

Add support for specifying the overwrite mask using the new "overwrite"
option to the flash command line.

By specifying "overwrite identifiers", the user request that the flash
update should overwrite any settings in the updated flash component with
settings from the provided flash image

  $devlink dev flash pci/0000:af:00.0 file flash_image.bin overwrite identifiers

By specifying "overwrite settings" the user requests that the flash update
should overwrite any settings in the updated flash component with setting
values from the provided flash image.

  $devlink dev flash pci/0000:af:00.0 file flash_image.bin overwrite settings

These options may be combined, in which case both subsections will be sent
in the overwrite mask, resulting in a request to overwrite all settings and
identifiers stored in the updated flash components.

  $devlink dev flash pci/0000:af:00.0 file flash_image.bin overwrite settings overwrite identifiers

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-07 00:02:16 -06:00
David Ahern 34be2d2619 Update kernel headers
Update kernel headers to commit:
    9faebeb2d800 ("Merge branch 'ethtool-allow-dumping-policies-to-user-space'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-07 00:01:26 -06:00
Stephen Hemminger be1bea8432 addr: Fix noprefixroute and autojoin for IPv4
These were reported as IPv6-only and ignored:

     # ip address add 192.0.2.2/24 dev dummy5 noprefixroute
     Warning: noprefixroute option can be set only for IPv6 addresses
     # ip address add 224.1.1.10/24 dev dummy5 autojoin
     Warning: autojoin option can be set only for IPv6 addresses

This enables them back for IPv4.

Fixes: 9d59c86e57 ("iproute2: ip addr: Organize flag properties structurally")
Signed-off-by: Adel Belhouane <bugs.a.b@free.fr>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-06 15:15:56 -07:00
Eyal Birger e410c963e3 ipntable: add missing ndts_table_fulls ntable stat
Used for tracking neighbour table overflows.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-06 15:07:10 -07:00
Kamal Heib 10414de9e6 ip: iplink_ipoib.c: Remove extra spaces
Remove the extra space between the reported ipoib attrs - use only one
space instead of two.

Fixes: de0389935f ("iplink: Added support for the kernel IPoIB RTNL ops")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-09-30 22:29:05 -07:00
Ciara Loftus d2be31d9b6 ss: add support for xdp statistics
The patch exposes statistics for XDP sockets which can be useful for
debugging purposes.

The stats exposed are:
    rx dropped
    rx invalid
    rx queue full
    rx fill ring empty
    tx invalid
    tx ring empty

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-29 09:21:24 -06:00
David Ahern f481515c89 Update kernel headers
Update kernel headers to commit:
    280095713ce2 ("Merge branch 'ibmvnic-refactor-some-send-handle-functions'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-29 09:13:21 -06:00
Stephen Hemminger 03fb6fa1d8 uapi: update headers from 5.9-rc7
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-09-28 13:50:36 -07:00
Jan Engelhardt fece144abc build: avoid make jobserver warnings
I observe:

	» make -j8 CCOPTS=-ggdb3
	lib
	make[1]: warning: -j8 forced in submake: resetting jobserver mode.
	make[1]: Nothing to be done for 'all'.
	ip
	make[1]: warning: -j8 forced in submake: resetting jobserver mode.
	    CC       ipntable.o

MFLAGS is a historic variable of some kind; removing it fixes the
jobserver issue.

Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-09-28 13:49:24 -07:00
Jakub Kicinski b8663da049 ip: promote missed packets to the -s row
missed_packet_errors are much more commonly reported:

linux$ git grep -c '[.>]rx_missed_errors ' -- drivers/ | wc -l
64
linux$ git grep -c '[.>]rx_over_errors ' -- drivers/ | wc -l
37

Plus those drivers are generally more modern than those
using rx_over_errors.

Since recently merged kernel documentation makes this
preference official, let's make ip -s output more informative
and let rx_missed_errors take the place of rx_over_errors.

Before:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:0a:f7:c1:4d:38 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    6.04T      4.67G    0       0       0       67.7M
    RX errors: length   crc     frame   fifo    missed
               0        0       0       0       7
    TX: bytes  packets  errors  dropped carrier collsns
    3.13T      2.76G    0       0       0       0
    TX errors: aborted  fifo   window heartbeat transns
               0        0       0       0       6

After:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:0a:f7:c1:4d:38 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped missed  mcast
    6.04T      4.67G    0       0       7       67.7M
    RX errors: length   crc     frame   fifo    overrun
               0        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    3.13T      2.76G    0       0       0       0
    TX errors: aborted  fifo   window heartbeat transns
               0        0       0       0       6

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-22 20:23:29 -06:00
David Ahern cec67df974 Merge branch 'devlink-controller-external-info' into next
Parav Pandit  says:

====================

For certain devlink port flavours controller number and optionally external=
 attributes are reported by the kernel.

(a) controller number indicates that a given port belong to which local or =
external controller.
(b) external port attribute indicates that if a given port is for external =
or local controller.

This short series shows this attributes to user.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-22 20:17:48 -06:00
Parav Pandit 748cbad33b devlink: Show controller number of a devlink port
Show the controller number of the devlink port whenever kernel reports
it.

Example of a PCI VF port for an external controller number 1:

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev ens2f0c1pf0vf1 flavour pcivf controller 1 pfnum 0 vfnum 1 external true splittable false
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port show pci/0000:06:00.0/2 -jp
{
    "port": {
        "pci/0000:06:00.0/2": {
            "type": "eth",
            "netdev": "ens2f0c1pf0vf1",
            "flavour": "pcivf",
            "controller": 1,
            "pfnum": 0,
            "vfnum": 1,
            "external": true,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-22 20:13:09 -06:00
Parav Pandit 8fadd01101 devlink: Show external port attribute
If a port is for an external controller, port's external attribute is
set. Show such external attribute.

An example of an external controller port for PCI VF:

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev ens2f0c1pf0vf1 flavour pcivf pfnum 0 vfnum 1 external true splittable false
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port show pci/0000:06:00.0/2 -jp
{
    "port": {
        "pci/0000:06:00.0/2": {
            "type": "eth",
            "netdev": "ens2f0c1pf0vf1",
            "flavour": "pcivf",
            "pfnum": 0,
            "vfnum": 1,
            "external": true,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-22 20:13:04 -06:00
David Ahern 454429e8b4 Update kernel headers
Update kernel headers to commit:
    748d1c8a425e ("Merge branch 'devlink-Use-nla_policy-to-validate-range'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-22 20:10:43 -06:00
Roman Mashak aba44dc2ea ip: updated ip-link man page
Added description of link flags allmulticast, promisc and trailers.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-09-14 20:42:04 -07:00
Wei Wang ad34d5fadb iproute2: ss: add support to expose various inet sockopts
This commit adds support to expose the following inet socket options:
-- recverr
-- is_icsk
-- freebind
-- hdrincl
-- mc_loop
-- transparent
-- mc_all
-- nodefrag
-- bind_address_no_port
-- recverr_rfc4884
-- defer_connect
with the option --inet-sockopt. The individual option is only shown
when set.

Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-08 20:36:06 -06:00
David Ahern c8eb4b52c1 Update kernel headers
Update kernel headers to commit:
4349abdb409b ("net: dsa: don't print non-fatal MTU error if not supported")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-08 20:35:28 -06:00
Hoang Le abee772ff1 tipc: support 128bit node identity for peer removing
Problem:
In kernel upstream, we add the support to set node identity with
128bit. However, we are still using legacy format in command tipc
peer removing. Then, we got a problem when trying to remove
offline node i.e:

$ tipc node list
Node Identity                    Hash     State
d6babc1c1c6d                     1cbcd7ca down

$ tipc peer remove address d6babc1c1c6d
invalid network address, syntax: Z.C.N
error: No such device or address

Solution:
We add the support to remove a specific node down with 128bit
node identifier, as an alternative to legacy 32-bit node address.

Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-01 20:01:39 -06:00
Roopa Prabhu 6fd53b2a1c iplink: add support for protodown reason
This patch adds support for recently
added link IFLA_PROTO_DOWN_REASON attribute.
IFLA_PROTO_DOWN_REASON enumerates reasons
for the already existing IFLA_PROTO_DOWN link
attribute.

$ cat /etc/iproute2/protodown_reasons.d/r.conf
0 mlag
1 evpn
2 vrrp
3 psecurity

$ ip link set dev vx10 protodown on protodown_reason vrrp on
$ip link show dev vx10
14: vx10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
DEFAULT group default qlen 1000
    link/ether f2:32:28:b8:35:ff brd ff:ff:ff:ff:ff:ff protodown on
protodown_reason <vrrp>
$ip -p -j link show dev vx10
[ {
	<snip>
        "proto_down": true,
        "proto_down_reason": [ "vrrp" ]
} ]
$ip link set dev vx10 protodown_reason mlag on
$ip link show dev vx10
14: vx10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
DEFAULT group default qlen 1000
    link/ether f2:32:28:b8:35:ff brd ff:ff:ff:ff:ff:ff protodown on
protodown_reason <mlag,vrrp>
$ip -p -j link show dev vx10
[ {
	<snip>
        "proto_down": true,
        "protodown_reason": [ "mlag","vrrp" ]
} ]

$ip -p -j link show dev vx10
$ip link set dev vx10 protodown off protodown_reason vrrp off
Error: Cannot clear protodown, active reasons.
$ip link set dev vx10 protodown off protodown_reason mlag off
$

Note: for somereason the json and non-json key for protodown
are different (protodown and proto_down). I have kept the
same for protodown reason for consistency (protodown_reason and
proto_down_reason).

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-01 19:52:13 -06:00
Antony Antony af27494d2e ip xfrm: support printing XFRMA_SET_MARK_MASK attribute in states
The XFRMA_SET_MARK_MASK attribute is set in states (4.19+).
It is the mask of XFRMA_SET_MARK(a.k.a. XFRMA_OUTPUT_MARK in 4.18)

sample output: note the output-mark mask
ip xfrm state
	src 192.1.2.23 dst 192.1.3.33
	proto esp spi 0xSPISPI reqid REQID mode tunnel
	replay-window 32 flag af-unspec
	output-mark 0x3/0xffffff
	aead rfc4106(gcm(aes)) 0xENCAUTHKEY 128
	if_id 0x1

Signed-off-by: Antony Antony <antony@phenome.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-01 19:49:29 -06:00
David Ahern 275eed9be5 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-01 19:46:20 -06:00
Phil Sutter 23203b750e ip link: Fix indenting in help text
Indenting of 'ip link set' options below 'link-netns' was wrong, they
should be on the same level as the above.

While being at it, fix closing brackets in vf-specific options. Also
write node/port_guid parameters in upper-case without curly braces: They
are supposed to be replaced by values, not put literally.

Fixes: 8589eb4efd ("treewide: refactor help messages")
Fixes: 5a3ec4ba64 ("iplink: Update usage in help message")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-31 12:32:26 -07:00
Johannes Berg cc889b8241 genl: ctrl: support dumping netlink policy
Support dumping the netlink policy of a given generic netlink
family, the policy (with any sub-policies if appropriate) is
exported by the kernel in a general fashion.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-08-24 21:35:14 -06:00
Johannes Berg d5acae244f libnetlink: add nl_print_policy() helper
This prints out the data from the given nested attribute
to the given FILE pointer, interpreting the firmware that
the kernel has for showing netlink policies.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-08-24 21:35:07 -06:00
Johannes Berg 784fa9f62f libnetlink: add rtattr_for_each_nested() iteration macro
This is useful for iterating elements in a nested attribute,
if they're not parsed with a strict length limit or such.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-08-24 21:34:29 -06:00
Murali Karicheri ea6aeeb90c ip: iplink: prp: update man page for new parameter
PRP support requires a proto parameter which is 0 for hsr and 1 for
prp. Default is hsr and is backward compatible.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-22 21:14:12 -07:00
Murali Karicheri 68f027724b iplink: hsr: add support for creating PRP device similar to HSR
This patch enhances the iplink command to add a proto parameters to
create PRP device/interface similar to HSR. Both protocols are
quite similar and requires a pair of Ethernet interfaces. So re-use
the existing HSR iplink command to create PRP device/interface as
well. Use proto parameter to differentiate the two protocols.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-22 21:14:12 -07:00
Amit Cohen 8e6bce735a devlink: Add fflush() in cmd_mon_show_cb()
Similar to other print functions we need to flush buffered data
in order to work with pipes and output redirects.

Without it, stdout output is buffered and not written to the disk.

This is useful when writing scripts that rely on devlink-monitor output.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-22 21:13:11 -07:00
Sascha Hauer 7e7a1d107b iproute2: ip maddress: Check multiaddr length
ip maddress add|del takes a MAC address as argument, so insist on
getting a length of ETH_ALEN bytes. This makes sure the passed argument
is actually a MAC address and especially not an IPv4 address which
was previously accepted and silently taken as a MAC address.

While at it, do not print *argv in the error path as this has been
modified by ll_addr_a2n() and doesn't contain the full string anymore,
which can lead to misleading error messages.

Also while at it, replace the hardcoded buffer size with the actual
buffer size using sizeof().

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-22 21:12:30 -07:00
Stephen Hemminger bf538de59d uapi: update bpf.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-16 16:09:52 -07:00
Leon Romanovsky db6d6becb0 rdma: Properly print device and link names in CLI output
The citied commit broke the CLI output and printed ifindex/ifname
instead of dev/link.

Before:
[leonro@vm ~]$ rdma res show qp
link mlx5_0/lqpn 1 type GSI state RTS sq-psn 0 comm ib_core
[leonro@vm ~]$ rdma res show cq
ifindex 0 ifname rocep0s9 cqn 0 cqe 1023 users 2 poll-ctx WORKQUEUE adaptive-moderation on comm ib_core

After:
[leonro@vm ~]$ rdma res show qp
link mlx5_0/- lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core]
[leonro@vm ~]$ rdma res show cq
dev rocep0s9 cqn 0 cqe 1023 users 2 poll-ctx WORKQUEUE adaptive-moderation on comm [ib_core]

It was missed because rdmatool mostly used in JSON mode.

Fixes: b0a688a542 ("rdma: Rewrite custom JSON and prints logic to use common API")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-16 15:50:02 -07:00
Leon Romanovsky 7ded3c97b9 rdma: Fix owner name for the kernel resources
Owner of kernel resources is printed in different format than user
resources to easy with the reader by simply looking on the name.
The kernel owner will have "[ ]" around the name.

Before this change:
[leonro@vm ~]$ rdma res show qp
link rocep0s9/1 lqpn 1 type GSI state RTS sq-psn 58 comm ib_core

After this change:
[leonro@vm ~]$ rdma res show qp
link rocep0s9/1 lqpn 1 type GSI state RTS sq-psn 58 comm [ib_core]

Fixes: b0a688a542 ("rdma: Rewrite custom JSON and prints logic to use common API")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-16 15:50:02 -07:00
Stephen Hemminger 52d767aff8 uapi: update kernel headers
pre-rc1 version of Linux kernel headers.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-11 13:18:41 -07:00
Mark Zhang e8e8f16ed1 rdma: Document the new "pid" criteria for auto mode
Document the new supported criteria of auto mode. Examples:
$ rdma statistic qp set link mlx5_2/1 auto pid on
$ rdma statistic qp set link mlx5_2/1 auto pid,type on

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Ido Kalir <idok@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-06 16:26:12 +00:00
Mark Zhang e28133316d rdma: Add "PID" criteria support for statistic counter auto mode
With this new criteria, QPs have different PIDs will be bound to
different counters in auto mode. This can be used in combination with
other criteria like "type". Examples:

$ rdma statistic qp set link mlx5_2/1 auto pid on
$ rdma statistic qp set link mlx5_2/1 auto type,pid on
$ rdma statistic qp set link mlx5_2/1 auto off
$ rdma statistic qp show link mlx5_0 qp-type UD

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Ido Kalir <idok@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-06 16:26:04 +00:00
Mark Zhang cb69794736 rdma: update uapi headers
Update rdma_netlink.h file upto kernel commit 76251e15ea73
("RDMA/counter: Add PID category support in auto mode")

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Ido Kalir <idok@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-06 16:25:04 +00:00
David Ahern e572e3af0d Merge branch 'main' into next
Conflicts:
	bridge/fdb.c
	man/man8/bridge.8

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-06 16:21:35 +00:00
Stephen Hemminger 53159d8115 v5.8.0 2020-08-03 10:03:42 -07:00
Stephen Hemminger d530608d33 lnstat: use same version as iproute2
Lnstat was trying to be different and have its own version.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-03 10:02:47 -07:00
Stephen Hemminger fbef655568 replace SNAPSHOT with auto-generated version string
Replace the iproute2 snapshot with a version string which is
autogenerated as part of the build process using git describe.

This will also allow seeing if the version of the command
is built from the same sources is as upstream.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-03 10:02:47 -07:00
Vasundhara Volam 7332b188a6 devlink: Add board.serial_number to info subcommand.
Add support for reading board serial_number to devlink info
subcommand. Example:

$ devlink dev info pci/0000:af:00.0 -jp
{
    "info": {
        "pci/0000:af:00.0": {
            "driver": "bnxt_en",
            "serial_number": "00-10-18-FF-FE-AD-1A-00",
            "board.serial_number": "433551F+172300000",
            "versions": {
                "fixed": {
                    "board.id": "7339763 Rev 0.",
                    "asic.id": "16D7",
                    "asic.rev": "1"
                },
                "running": {
                    "fw": "216.1.216.0",
                    "fw.psid": "0.0.0",
                    "fw.mgmt": "216.1.192.0",
                    "fw.mgmt.api": "1.10.1",
                    "fw.ncsi": "0.0.0.0",
                    "fw.roce": "216.1.16.0"
                }
            }
        }
    }
}

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-03 15:35:56 +00:00
Petr Vaněk a7f1974f6e ip-xfrm: add support for oseq-may-wrap extra flag
This flag allows to create SA where sequence number can cycle in
outbound packets if set.

Signed-off-by: Petr Vaněk <pv@excello.cz>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-03 14:57:25 +00:00
David Ahern 91922a4121 Update kernel headers
Update kernel headers to commit:
    bd0b33b24897 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-03 14:56:28 +00:00
Danielle Ratson e5f4165a9e devlink: Expose port split ability
Add a new attribute that indicates the port split ability to devlink port.

Expose the attribute to user space as RO value, for example:

$devlink port show swp1
pci/0000:03:00.0/61: type eth netdev swp1 flavour physical port 1
splittable false lanes 1

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-03 14:37:14 +00:00
Danielle Ratson fcbc6c1c71 devlink: Expose number of port lanes
Add a new attribute that indicates the port's number of lanes to devlink port.

Expose the attribute to user space as RO value, for example:

$devlink port show swp1
pci/0000:03:00.0/61: type eth netdev swp1 flavour physical port 1 lanes 1

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-03 14:36:52 +00:00
Julien Fortin cb17e0cc57 bridge: fdb show: fix fdb entry state output for json context
bridge json fdb show is printing an incorrect / non-machine readable
value, when using -j (json output) we are expecting machine readable
data that shouldn't require special handling/parsing.

$ bridge -j fdb show | \
python -c \
'import sys,json;print(json.dumps(json.loads(sys.stdin.read()),indent=4))'
[
    {
	"master": "br0",
	"mac": "56:23:28:4f:4f:e5",
	"flags": [],
	"ifname": "vx0",
	"state": "state=0x80"  <<<<<<<<< with the patch: "state": "0x80"
    }
]

Fixes: c7c1a1ef51 ("bridge: colorize output and use JSON print library")

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-29 18:08:46 -07:00
Briana Oursler 41a9e38469 tc: Add space after format specifier
Add space after format specifier in print_string call. Fixes broken
qdisc tests within tdc testing suite. Per suggestion from Petr Machata,
remove a space and change spacing in tc/q_event.c to complete the fix.

Tested fix in tdc using:
./tdc.py -c qdisc

All qdisc RED tests return ok.

Fixes: d0e450438571("tc: q_red: Add support for qevents "mark" and "early_drop")
Signed-off-by: Briana Oursler <briana.oursler@gmail.com>
Tested-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-29 17:03:46 +00:00
Anton Danilov 65c0c4d21b bridge: fdb: the 'dynamic' option in the show/get commands
In most of cases a user wants to see only the dynamic mac addresses
in the fdb output. But currently the 'fdb show' displays tons of
various self entries, those only waste the output without any useful
goal.

New option 'dynamic' for 'show' and 'get' commands forces display
only relevant records.

Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-27 16:41:39 -07:00
Matthieu Baerts 3a53ff7e58 mptcp: show all endpoints when no ID is specified
According to 'ip mptcp help', 'endpoint show' can accept no argument:

  ip mptcp endpoint show [ id ID ]

It makes sense to print all endpoints when no filter is used.

So here if the following command is used, all endpoints are printed:

  ip mptcp endpoint show

Same as:

  ip mptcp endpoint

Fixes: 7e0767cd ("add support for mptcp netlink interface")
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-27 16:39:58 -07:00
David Ahern 1ca65af1c5 Merge branch 'devlink-port-health' into next
Moshe Shemesh  says:

====================

Implement commands for interaction with per-port devlink health
reporters. To do this, adapt devlink-health for usage of port handles
with any existing devlink-health subcommands. Add devlink-port health
subcommand as an alias for devlink-health.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-23 00:34:07 +00:00
Vladyslav Tarasiuk 1fe8c44bd9 devlink: Update devlink-health and devlink-port manpages
Describe support for per-port reporters in devlink-health and
devlink-port commands.

Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-23 00:32:37 +00:00
Vladyslav Tarasiuk 211c8d6ca9 devlink: Add devlink port health command
Add devlink port health show subcommand which displays information about
specified port reporter or all present port reporters as in the example.
Device and port reporters can be distinguished by a handle being used.

Make other devlink-health subcommands be aliased by devlink port health.
Refactor devlink-health commands for usage of port handles in order to
interact with port reporters.

Change devlink health show output to dump information about both device
and port reporters with correct handles.

Example:
$ devlink health show
pci/0000:00:0b.0:
  reporter fw
    state healthy error 0 recover 0 auto_dump true
  reporter fw_fatal
    state healthy error 0 recover 0 grace_period 1200000 auto_recover true auto_dump true
pci/0000:00:0b.0/1:
  reporter tx
    state healthy error 0 recover 0 grace_period 10000 auto_recover true auto_dump true
  reporter rx
    state healthy error 0 recover 0 grace_period 10000 auto_recover true auto_dump true

$ devlink health show pci/0000:00:0b.0/1 reporter rx
Which is equivalent to:
$ devlink port health show pci/0000:00:0b.0/1 reporter rx
pci/0000:00:0b.0/1:
  reporter rx
    state healthy error 0 recover 0 grace_period 10000 auto_recover true auto_dump true

$ devlink port health show pci/0000:00:0b.0/1 reporter rx -j --pretty
{
    "health": {
         "pci/0000:00:0b.0/1": [ {
                 "reporter": "rx",
                 "state": "healthy",
                 "error": 0,
                 "recover": 0,
                 "grace_period": 500,
                 "auto_recover": true,
                 "auto_dump": true
              } ]
    }
}

$ devlink health set pci/0000:00:0b.0/1 reporter rx grace_period 5000
Which is equivalent to:
$ devlink port health set pci/0000:00:0b.0/1 reporter rx grace_period 5000

$ devlink port health show pci/0000:00:0b.0/1 reporter rx
pci/0000:00:0b.0/1:
  reporter rx
    state healthy error 0 recover 0 grace_period 5000 auto_recover true auto_dump true

Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-23 00:32:32 +00:00
Vladyslav Tarasiuk e533faa72e devlink: Add a possibility to print arrays of devlink port handles
Add a capability of printing port handles for arrays in non-JSON format
in devlink-health manner.

Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-23 00:32:26 +00:00
Stephen Hemminger 848b1b8e04 uapi: update bpf.h
Upstrean 5.8-rc6 changes.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-21 09:18:15 -07:00
Guillaume Nault 4735df15a2 testsuite: Add tests for bareudp tunnels
Test the plain MPLS (unicast and multicast) and IP (v4 and v6) modes.
Also test the multiproto option for MPLS and for IP.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-20 13:30:55 -07:00
Anton Danilov 8f5a602f7a misc: make the pattern matching case-insensitive
To improve the usability better use case-insensitive pattern-matching
in ifstat, nstat and ss tools.

Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-20 13:29:55 -07:00
Jamie Gloudon 66702fb9ba tc/m_estimator: Print proper value for estimator interval in raw.
While looking at the estimator code, I noticed an incorrect interval
number printed in raw for the handles. This patch fixes the formatting.

Before patch:

root@bytecenter.fr:~# tc -r filter add dev eth0 ingress estimator
250ms 999ms matchall action police avrate 12mbit conform-exceed drop
[estimator i=4294967294 e=2]

After patch:

root@bytecenter.fr:~# tc -r filter add dev eth0 ingress estimator
250ms 999ms matchall action police avrate 12mbit conform-exceed drop
[estimator i=-2 e=2]

Signed-off-by: Jamie Gloudon <jamie.gloudon@gmx.fr>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-20 13:25:56 -07:00
David Ahern 7b6361bf61 Merge branch 'tc-qevent-block' into next
Petr Machata  says:

====================

When a list of filters at a given block is requested, tc first validates
that the block exists before doing the filter query. Currently the
validation routine checks ingress and egress blocks. But now that blocks
can be bound to qevents as well, qevent blocks should be looked for as
well:

    # ip link add up type dummy
    # tc qdisc add dev dummy1 root handle 1: \
         red min 30000 max 60000 avpkt 1000 qevent early_drop block 100
    # tc filter add block 100 pref 1234 handle 102 matchall action drop
    # tc filter show block 100
    Cannot find block "100"

This patchset fixes this issue:

    # tc filter show block 100
    filter protocol all pref 1234 matchall chain 0
    filter protocol all pref 1234 matchall chain 0 handle 0x66
      not_in_hw
            action order 1: gact action drop
             random type none pass val 0
             index 2 ref 1 bind 1

In patch #1, the helpers and necessary infrastructure is introduced,
including a new qdisc_util callback that implements sniffing out bound
blocks in a given qdisc.

In patch #2, RED implements the new callback.

v3:
- Patch #1:
    - Do not pass &ctx->found directly to has_block. Do it through a
      helper variable, so that the callee does not overwrite the result
      already stored in ctx->found.

v2:
- Patch #1:
    - In tc_qdisc_block_exists_cb(), do not initialize 'q'.
    - Propagate upwards errors from q->has_block.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-20 16:36:41 +00:00
Petr Machata 02dce2fdce tc: q_red: Implement has_block for RED
In order for "tc filter show block X" to find a given block, implement the
has_block callback.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-20 16:34:49 +00:00
Petr Machata af0e036c09 tc: Look for blocks in qevents
When a list of filters at a given block is requested, tc first validates
that the block exists before doing the filter query. Currently the
validation routine checks ingress and egress blocks. But now that blocks
can be bound to qevents as well, qevent blocks should be looked for as
well.

In order to support that, extend struct qdisc_util with a new callback,
has_block. That should report whether, give the attributes in TCA_OPTIONS,
a blocks with a given number is bound to a qevent. In
tc_qdisc_block_exists_cb(), invoke that callback when set.

Add a helper to the tc_qevent module that walks the list of qevents and
looks for a given block. This is meant to be used by the individual qdiscs.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-20 16:34:02 +00:00
Paolo Abeni 9c3be2c0ee ss: mptcp: add msk diag interface support
This implement support for MPTCP sockets type, comprising
extended socket info. Note that we need to add an extended
attribute carrying the actual protocol number to the diag
request.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-14 23:57:36 +00:00
David Ahern beaf281cff Update kernel headers
Update kernel headers to commit:
    81adcd65b685 ("ksz884x: switch from 'pci_' to 'dma_' API")

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-14 23:56:53 +00:00
David Ahern b78c480532 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-14 23:52:43 +00:00
Eyal Birger f33a871b80 ip xfrm: policy: support policies with IF_ID in get/delete/deleteall
The XFRMA_IF_ID attribute is set in policies for them to be
associated with an XFRM interface (4.19+).

Add support for getting/deleting policies with this attribute.

For supporting 'deleteall' the XFRMA_IF_ID attribute needs to be
explicitly copied.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-13 08:51:37 -07:00
Eyal Birger ee93c1107f ip xfrm: update man page on setting/printing XFRMA_IF_ID in states/policies
In commit aed63ae1ac ("ip xfrm: support setting/printing XFRMA_IF_ID attribute in states/policies")
I added the ability to set/print the xfrm interface ID without updating
the man page.

Fixes: aed63ae1ac ("ip xfrm: support setting/printing XFRMA_IF_ID attribute in states/policies")
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-13 08:51:37 -07:00
Hoang Huu Le ca75a86337 tipc: fixed a compile warning in tipc/link.c
Fixes: 5027f233e3 ("tipc: add link broadcast get")
Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-13 08:43:32 -07:00
Julien Fortin 8fc09aff8d bridge: fdb get: add missing json init (new_json_obj)
'bridge fdb get' has json support but the json object is never initialized

before patch:

$ bridge -j fdb get 56:23:28:4f:4f:e5 dev vx0
56:23:28:4f:4f:e5 dev vx0 master br0 permanent
$

after patch:

$ bridge -j fdb get 56:23:28:4f:4f:e5 dev vx0 | \
python -c \
'import sys,json;print(json.dumps(json.loads(sys.stdin.read()),indent=4))'
[
    {
        "master": "br0",
        "mac": "56:23:28:4f:4f:e5",
        "flags": [],
        "ifname": "vx0",
        "state": "permanent"
    }
]
$

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-13 08:41:42 -07:00
Tony Ambardar 650591a7a7 configure: support ipset version 7 with kernel version 5
The configure script checks for ipset v6 availability but doesn't test
for v7, which is backward compatible and used on kernel v5.x systems.
Update the script to test for both ipset versions. Without this change,
the tc ematch function em_ipset will be disabled.

Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-08 08:48:02 -07:00
Andrea Claudi a8d6f51c84 ip address: remove useless include
utils.h is included two times in ipaddress.c, there is no need for that.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-08 08:47:28 -07:00
Stephen Hemminger 0689785782 genl: use <> for system includes
Be consistent about local versus system headers.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-08 08:41:24 -07:00
Stephen Hemminger a12b203c78 rtacct: drop unused header 2020-07-08 08:40:20 -07:00
Stephen Hemminger d44bcd2fbf iplink_bareudp: use common include syntax
Follow the precedent of other parts of iproute2 follow the example of:
  Standard libc headers
  Linux headers

  Iproute2 support headers

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-08 08:38:58 -07:00
Louis Peens 7c8d7848c7 devlink: add 'disk' to 'fw_load_policy' string validation
The 'fw_load_policy' devlink parameter supports the 'disk' value
since kernel v5.4, seems like there was some oversight in adding
this to iproute, fixed by this patch.

Signed-off-by: Louis Peens <louis.peens@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:14:57 -07:00
Ido Schimmel 2d4c3f65e2 devlink: Document zero policer identifier
When setting a policer to a trap group, a value of "0" will unbind the
currently bound policer from the group.

The behavior is intentional and tested in kernel selftests, so document
it.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Suggested-by: Alex Kushnarov <alexanderk@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:14:24 -07:00
Guillaume Nault eb09a15c12 tc: flower: support multiple MPLS LSE match
Add the new "mpls" keyword that can be used to match MPLS fields in
arbitrary Label Stack Entries.
LSEs are introduced by the "lse" keyword and followed by LSE options:
"depth", "label", "tc", "bos" and "ttl". The depth is manadtory, the
other options are optionals.

For example, the following filter drops MPLS packets having two labels,
where the first label is 21 and has TTL 64 and the second label is 22:

$ tc filter add dev ethX ingress proto mpls_uc flower mpls \
    lse depth 1 label 21 ttl 64 \
    lse depth 2 label 22 bos 1 \
    action drop

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:12:43 -07:00
Guillaume Nault a6c5c952ab ip link: initial support for bareudp devices
Bareudp devices provide a generic L3 encapsulation for tunnelling
different protocols like MPLS, IP, NSH, etc. inside a UDP tunnel.

This patch is based on original work from Martin Varghese:
https://lore.kernel.org/netdev/1570532361-15163-1-git-send-email-martinvarghesenokia@gmail.com/

Examples:

  - ip link add dev bareudp0 type bareudp dstport 6635 ethertype mpls_uc

This creates a bareudp tunnel device which tunnels L3 traffic with
ethertype 0x8847 (unicast MPLS traffic). The destination port of the
UDP header will be set to 6635. The device will listen on UDP port 6635
to receive traffic.

  - ip link add dev bareudp0 type bareudp dstport 6635 ethertype ipv4 multiproto

Same as the MPLS example, but for IPv4. The "multiproto" keyword allows
the device to also tunnel IPv6 traffic.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:11:05 -07:00
Dmitry Yakunin 8f1cd119b3 lib: fix checking of returned file handle size for cgroup
Before this patch check is happened only in case when we try to find
cgroup at cgroup2 mount point.

v2:
  - add Fixes line before Signed-off-by (David Ahern)

Fixes: d5e6ee0dac ("ss: introduce cgroup2 cache and helper functions")
Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:05:54 -07:00
Sorah Fukumori 9e5d246877 ip fou: respect preferred_family for IPv6
ip(8) accepts -family ipv6 (-6) option at the toplevel. It is
straightforward to support the existing option for modifying listener
on IPv6 addresses.

Maintain the backward compatibility by leaving ip fou -6 flag
implemented, while it's removed from the usage message.

Signed-off-by: Sorah Fukumori <her@sorah.jp>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:03:09 -07:00
Anton Danilov d80a05b795 tc: improve the qdisc show command
Before can be possible show only all qeueue disciplines on an interface.
There wasn't a way to get the qdisc info by handle or parent, only full
dump of the disciplines with a following grep/sed usage.

Now new and old options work as expected to filter a qdisc by handle or
parent.

Full syntax of the qdisc show command:

tc qdisc { show | list } [ dev STRING ] [ QDISC_ID ] [ invisible ]
  QDISC_ID := { root | ingress | handle QHANDLE | parent CLASSID }

This change doesn't require any changes in the kernel.

Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:00:51 -07:00
Stephen Hemminger 085622b1f5 uapi: update bpf.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:00:51 -07:00
Bjarni Ingi Gislason 860a5d12d5 devlint-health.8: use a single-font macro for a single argument
Use a single font macro for a single argument.

  Remove unnecessary quotes for a single-font macro.

  Join two lines into one.

  The output of "nroff" and "groff" is unchanged.

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:00:47 -07:00
Bjarni Ingi Gislason f9bc806c9d devlink-dev.8: use a single-font macro for one argument
Use a single-font macro for one argument.

  Remove unnecessary quotes for a single font macro.

  Join some lines into one.

  The output of "nroff" and "groff" is unchanged, except for a font
change in two lines.

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:00:38 -07:00
Bjarni Ingi Gislason 472fb39d55 devlink.8: Use a single-font macro for a single argument
Use a single-font macro for a single argument

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:00:34 -07:00
Bjarni Ingi Gislason 57cfcc62af man8/bridge.8: fix misuse of two-fonts macros
Use a single-font macro for a single argument.

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:00:28 -07:00
Bjarni Ingi Gislason 2df0dc2437 libnetlink.3: display section numbers in roman font, not boldface
Typeset section numbers in roman font, see man-pages(7).

###

  Details:

Output is from: test-groff -b -mandoc -T utf8 -rF0 -t -w w -z

  [ "test-groff" is a developmental version of "groff" ]

<./man/man3/libnetlink.3>:53 (macro BR): only 1 argument, but more are expected
<./man/man3/libnetlink.3>:132 (macro BR): only 1 argument, but more are expected
<./man/man3/libnetlink.3>:134 (macro BR): only 1 argument, but more are expected
<./man/man3/libnetlink.3>:197 (macro BR): only 1 argument, but more are expected
<./man/man3/libnetlink.3>:198 (macro BR): only 1 argument, but more are expected

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
2020-07-06 10:46:23 -07:00
David Ahern a5c9d01c5c Merge branch 'rdma-raw-format-dumps' into next
Leon Romanovsky  says:

====================

The following series adds support to get the RDMA resource data in RAW
format. The main motivation for doing this is to enable vendors to
return the entire QP/CQ/MR data without a need from the vendor to set
each field separately.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 18:11:49 +00:00
Maor Gottlieb e2bbf737e6 rdma: Add support to get MR in raw format
Add the required support to print MR data in raw format.
Example:

$rdma res show mr dev mlx5_1 mrn 2 -r -j
[{"ifindex":7,"ifname":"mlx5_1",
"data":[0,4,255,254,0,0,0,0,0,0,0,0,16,28,0,216,...]}]

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 18:11:37 +00:00
Maor Gottlieb 94323e9611 rdma: Add support to get CQ in raw format
Add the required support to print CQ data in raw format.
Example:

$rdma res show cq dev mlx5_2 cqn 1 -r -j
[{"ifindex":8,"ifname":"mlx5_2",
"data":[0,4,255,254,0,0,0,0,0,0,0,0,16,28,...]}]

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 18:11:33 +00:00
Maor Gottlieb 7c01e0fc9c rdma: Add support to get QP in raw format
Add 'raw' argument to get the resource in raw format.
When RDMA_NLDEV_ATTR_RES_RAW is set in the netlink message,
then the resource fields are in raw format, print it as byte array.

Example:
$rdma res show qp link rocep0s12f0/1 lqpn 1137 -j -r
[{"ifindex":7,"ifname":"mlx5_1","port":1,
"data":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...]}]

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 18:11:00 +00:00
Maor Gottlieb 8f23492823 rdma: update uapi headers
Update rdma_netlink.h file upto kernel commit 65959522f806
("RDMA: Add support to dump resource tracker in RAW format")

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 18:10:50 +00:00
David Ahern 79ea01927c Merge branch 'tc-qevents' into next
Petr Machata  says:

====================

To allow configuring user-defined actions as a result of inner workings of
a qdisc, a concept of qevents was recently introduced to the kernel.
Qevents are attach points for TC blocks, where filters can be put that are
executed as the packet hits well-defined points in the qdisc algorithms.
The attached blocks can be shared, in a manner similar to clsact ingress
and egress blocks, arbitrary classifiers with arbitrary actions can be put
on them, etc.

For example:

 # tc qdisc add dev eth0 root handle 1: \
	red limit 500K avpkt 1K qevent early_drop block 10
 # tc filter add block 10 \
	matchall action mirred egress mirror dev eth1

This patch set introduces the corresponding iproute2 support. Patch #1 adds
the new netlink attribute enumerators. Patch #2 adds a set of helpers to
implement qevents, and #3 adds a generic documentation to tc.8. Patch #4
then adds two new qevents to the RED qdisc: mark and early_drop.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:45:48 +00:00
Petr Machata d0e4504385 tc: q_red: Add support for qevents "mark" and "early_drop"
The "early_drop" qevent matches packets that have been early-dropped. The
"mark" qevent matches packets that have been ECN-marked.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:37:49 +00:00
Petr Machata 3cf51fb3c8 man: tc: Describe qevents
Add some general remarks about qevents.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:37:45 +00:00
Petr Machata 01bb0bcd00 tc: Add helpers to support qevent handling
Introduce a set of helpers to make it easy to add support for qevents into
qdisc.

The idea behind this is that qevent types will be generally reused between
qdiscs, rather than each having a completely idiosyncratic set of qevents.
The qevent module holds functions for parsing, dumping and formatting of
these common qevent types, and for dispatch to the appropriate set of
handlers based on the qevent name.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:37:27 +00:00
Po Liu bc4d9f982f action police: make 'mtu' could be set independently in police action
Current police action must set 'rate' and 'burst'. 'mtu' parameter
set the max frame size and could be set alone without 'rate' and 'burst'
in some situation. Offloading to hardware for example, 'mtu' could limit
the flow max frame size.

Signed-off-by: Po Liu <po.liu@nxp.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:34:04 +00:00
Po Liu 3c5570706b action police: change the print message quotes style
Change the double quotes to single quotes in fprintf message to make it
more readable.

Signed-off-by: Po Liu <po.liu@nxp.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:33:59 +00:00
Alexandre Cassen 30f3beea0d add support to keepalived rtm_protocol
Following inclusion in net-next, extend rtnl_rtprot_tab and rt_protos
to support Keepalived.

Signed-off-by: Alexandre Cassen <acassen@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:03:45 +00:00
David Ahern 482e463d6c Merge branch 'devlink-port-mac-addr' into next
Parav Pandit  says:

====================

Currently ip link set dev <pfndev> vf <vf_num> <param> <value> has
few below limitations.

1. Command is limited to set VF parameters only.
It cannot set the default MAC address for the PCI PF.

2. It can be set only on system where PCI SR-IOV is supported.
In smartnic based system, eswitch of a NIC resides on a different
embedded cpu which has the VF and PF representors for the SR-IOV
support on a host system in which this smartnic is plugged-in.

3. It cannot setup the function attributes of sub-function described
in detail in comprehensive RFC [1] and [2].

This series covers the first small part to let user query and set MAC
address (hardware address) of a PCI PF/VF which is represented by
devlink port.

[1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/
[2] https://marc.info/?l=linux-netdev&m=158555928517777&w=2

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 14:49:53 +00:00
Parav Pandit 4dca81e9a8 devlink: Support setting port function hardware address
Support setting devlink port function hardware address.

Example of a PCI VF port which supports a port function:
Set hardware address of the VF's port function.

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port function set pci/0000:06:00.0/2 hw_addr 00:11:22:33:44:55

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
  function:
    hw_addr 00:11:22:33:44:55

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 14:49:32 +00:00
Parav Pandit b3adafd154 devlink: Support querying hardware address of port function
Add support to query the hardware address of function represented
by devlink port function.

Example of a PCI VF port which supports a port function:
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
  function:
    hw_addr 00:11:22:33:44:66

$ devlink port show pci/0000:06:00.0/2 -jp
{
    "port": {
        "pci/0000:06:00.0/2": {
            "type": "eth",
            "netdev": "enp6s0pf0vf1",
            "flavour": "pcivf",
            "pfnum": 0,
            "vfnum": 1,
            "function": {
                "hw_addr": "00:11:22:33:44:66"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 14:49:22 +00:00
Parav Pandit 2de449df19 devlink: Move devlink port code at start to reuse
To reuse print routines for port function in subsequent patch, move
print routine specific to devlink device at start of the file.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 14:48:34 +00:00
David Ahern e17466e484 Update kernel headers
Update kernel headers to commit:
   e1f046704404 ("Merge branch 'qlogic-use-generic-power-management'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 14:33:15 +00:00
Stephen Hemminger 2f31d12a25 man/tc: remove obsolete reference to ipchains
It isn't Linux 2.2 anymore.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-24 12:13:46 -07:00
Roi Dayan 473d18e219 ip address: Fix loop initial declarations are only allowed in C99
On some distros, i.e. rhel 7.6, compilation fails with the following:

ipaddress.c: In function ‘lookup_flag_data_by_name’:
ipaddress.c:1260:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
  for (int i = 0; i < ARRAY_SIZE(ifa_flag_data); ++i) {
  ^
ipaddress.c:1260:2: note: use option -std=c99 or -std=gnu99 to compile your code

This commit fixes the single place needed for compilation to pass.

Fixes: 9d59c86e57 ("iproute2: ip addr: Organize flag properties structurally")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-11 15:05:20 -07:00
Stephen Hemminger 3d66d83d25 uapi: update to magic.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-11 09:52:38 -07:00
Ido Schimmel abda1e9d2b devlink: Add 'mirror' trap action
Allow setting 'mirror' trap action for traps that support it. Extend the
devlink-trap man page and bash completion accordingly.

Example:

# devlink -jp trap show netdevsim/netdevsim10 trap igmp_query
{
    "trap": {
        "netdevsim/netdevsim10": [ {
                "name": "igmp_query",
                "type": "control",
                "generic": true,
                "action": "mirror",
                "group": "mc_snooping"
            } ]
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-11 09:51:10 -07:00
Ido Schimmel fd71244a20 devlink: Add 'control' trap type
This type is used for traps that trap control packets such as ARP
request and IGMP query to the CPU.

Example:

# devlink -jp trap show netdevsim/netdevsim10 trap igmp_v1_report
{
    "trap": {
        "netdevsim/netdevsim10": [ {
                "name": "igmp_v1_report",
                "type": "control",
                "generic": true,
                "action": "trap",
                "group": "mc_snooping"
            } ]
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-11 09:51:10 -07:00
Stephen Hemminger 12fafa27c7 devlink: update include files
Use the tool iwyu to get more complete list of includes for
all the bits used by devlink.

This should also fix build with musl libc.

Fixes: c4dfddccef ("fix JSON output of mon command")
Reported-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-11 09:49:46 -07:00
Roopa Prabhu 468f787f64 bridge: support for nexthop id in fdb entries
This patch adds support to assign a nexthop group
id to an fdb entry.

$bridge fdb add 02:02:00:00:00:13 dev vx10 nhid 102 self

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-06-11 15:52:58 +00:00
Roopa Prabhu a56d17463c ipnexthop: support for fdb nexthops
This patch adds support to add and delete
ecmp nexthops of type fdb. Such nexthops can
be linked to vxlan fdb entries.

$ip nexthop add id 12 via 172.16.1.2 fdb
$ip nexthop add id 13 via 172.16.1.3 fdb
$ip nexthop add id 102 group 12/13 fdb

$bridge fdb add 02:02:00:00:00:13 dev vx10 nhid 102 self

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-06-11 15:52:29 +00:00
David Ahern 5f6f17db3b Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-06-08 14:40:54 +00:00
Stephen Hemminger e4932ae6b3 uapi: update headers
Update kernel headers from 5.8.0 merge

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-05 08:36:54 -07:00
Stephen Hemminger 0a5dbbeddb Merge git://git.kernel.org/pub/scm/network/iproute2/iproute2-next 2020-06-05 08:33:29 -07:00
Stephen Hemminger 1bfa3b3f66 v5.7.0 2020-06-02 20:35:00 -07:00
Donald Sharp 2c78aba2fb nexthop: Fix Deletion display
Actually display that deletions are happening
when monitoring nexthops.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-01 08:08:46 -07:00
Stephen Hemminger 6facadcfb6 uapi: fix comment in xfrm.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-01 08:07:02 -07:00
Ian K. Coolidge 5413a735a6 iproute2: ip addr: Add support for setting 'optimistic'
optimistic DAD is controllable via sysctl for an interface
or all interfaces on the system. This would affect addresses
added by the kernel only.

Recent kernels, however, have enabled support for adding optimistic
address via userspace. This plumbs that support.

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-31 23:01:33 +00:00
Ian K. Coolidge 9d59c86e57 iproute2: ip addr: Organize flag properties structurally
This creates a nice systematic way to check that the various flags are
mutable from userspace and that the address family is valid.

Mutability properties are preserved to avoid introducing any behavioral
change in this CL. However, previously, immutable flags were ignored and
fell through to this confusing error:

Error: either "local" is duplicate, or "dadfailed" is a garbage.

But now, they just warn more explicitly:

Warning: dadfailed option is not mutable from userspace
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-31 23:01:22 +00:00
Roman Mashak bd4b8c632e tc: report time an action was first used
Have print_tm() dump firstuse value along with install, lastuse
and expires.

v2: Resubmit after 'master' merged into next

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-31 22:51:19 +00:00
Andrea Claudi 354efaec38 bpf: Fixes a snprintf truncation warning
gcc v9.3.1 reports:

bpf.c: In function ‘bpf_get_work_dir’:
bpf.c:784:49: warning: ‘snprintf’ output may be truncated before the last format character [-Wformat-truncation=]
  784 |  snprintf(bpf_wrk_dir, sizeof(bpf_wrk_dir), "%s/", mnt);
      |                                                 ^
bpf.c:784:2: note: ‘snprintf’ output between 2 and 4097 bytes into a destination of size 4096
  784 |  snprintf(bpf_wrk_dir, sizeof(bpf_wrk_dir), "%s/", mnt);
      |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix this simply checking snprintf return code and properly handling the error.

Fixes: e42256699c ("bpf: make tc's bpf loader generic and move into lib")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-27 15:05:25 -07:00
Andrea Claudi 358abfe004 Revert "bpf: replace snprintf with asprintf when dealing with long buffers"
This reverts commit c0325b0638.
It introduces a segfault in bpf_make_custom_path() when custom pinning is used.

This happens because asprintf allocates exactly the space needed to hold a
string in the buffer passed as its first argument, but if this buffer is later
used in strcat() or similar we have a buffer overrun.

As the aim of commit c0325b0638 is simply to fix a compiler warning, it
seems safe and reasonable to revert it.

Fixes: c0325b0638 ("bpf: replace snprintf with asprintf when dealing with long buffers")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-27 15:05:25 -07:00
David Ahern e50290e687 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-27 02:08:27 +00:00
Tuong Lien 9a25abde3a tipc: enable printing of broadcast rcv link stats
This commit allows printing the statistics of a broadcast-receiver link
using the same tipc command, but with additional 'link' options:

$ tipc link stat show --help
Usage: tipc link stat show [ link { LINK | SUBSTRING | all } ]

With:
+ 'LINK'      : print the stats of the specific link 'LINK';
+ 'SUBSTRING' : print the stats of all the links having the 'SUBSTRING'
                in name;
+ 'all'       : print all the links' stats incl. the broadcast-receiver
                ones;

Also, a link stats can be reset in the usual way by specifying the link
name in command.

For example:

$ tipc l st sh l br
Link <broadcast-link>
  Window:50 packets
  RX packets:0 fragments:0/0 bundles:0/0
  TX packets:5011125 fragments:4968774/149643 bundles:38402/307061
  RX naks:781484 defs:0 dups:0
  TX naks:0 acks:0 retrans:330259
  Congestion link:50657  Send queue max:0 avg:0

Link <broadcast-link:1001001>
  Window:50 packets
  RX packets:95146 fragments:95040/1980 bundles:1/2
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:380938 defs:83962 dups:403
  TX naks:8362 acks:0 retrans:170662
  Congestion link:0  Send queue max:0 avg:0

Link <broadcast-link:1001002>
  Window:50 packets
  RX packets:0 fragments:0/0 bundles:0/0
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:400546 defs:0 dups:0
  TX naks:0 acks:0 retrans:159597
  Congestion link:0  Send queue max:0 avg:0

$ tipc l st sh l 1001002
Link <1001003:data0-1001002:data0>
  ACTIVE  MTU:1500  Priority:10  Tolerance:1500 ms  Window:50 packets
  RX packets:99546 fragments:0/0 bundles:33/877
  TX packets:629 fragments:0/0 bundles:35/828
  TX profile sample:8 packets average:390 octets
  0-64:75% -256:0% -1024:0% -4096:25% -16384:0% -32768:0% -66000:0%
  RX states:488714 probes:7397 naks:0 defs:4 dups:5
  TX states:27734 probes:18016 naks:5 acks:2305 retrans:0
  Congestion link:0  Send queue max:0 avg:0

Link <broadcast-link:1001002>
  Window:50 packets
  RX packets:0 fragments:0/0 bundles:0/0
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:400546 defs:0 dups:0
  TX naks:0 acks:0 retrans:159597
  Congestion link:0  Send queue max:0 avg:0

$ tipc l st re l broadcast-link:1001002

$ tipc l st sh l broadcast-link:1001002
Link <broadcast-link:1001002>
  Window:50 packets
  RX packets:0 fragments:0/0 bundles:0/0
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:0 defs:0 dups:0
  TX naks:0 acks:0 retrans:0
  Congestion link:0  Send queue max:0 avg:0

Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-27 02:07:22 +00:00
Alexander Aring 9f91f1b7b8 lwtunnel: add support for rpl segment routing
This patch adds support for rpl segment routing settings.
Example:

ip -n ns0 -6 route add 2001::3 encap rpl segs \
fe80::c8fe:beef:cafe:cafe,fe80::c8fe:beef:cafe:beef dev lowpan0

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-27 00:03:17 +00:00
Roman Mashak db35e411ec tc: action: fix time values output in JSON format
Report tcf_t values in seconds, not jiffies, in JSON format as it is now
for stdout.

v2: use PRINT_ANY, drop the useless casts and fix the style (Stephen Hemminger)

Fixes: 2704bd6255 ("tc: jsonify actions core")
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-19 21:19:04 -07:00
Stephen Hemminger 1c7aa12104 uapi: update to bpf.h
Part of the zero-length array changes

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-19 14:31:54 -07:00
Eric Dumazet d7c67a6ed4 utils: remove trailing zeros in print_time() and print_time64()
Before :

tc qd sh dev eth1

... refill_delay 40.0ms timer_slack 10.000us horizon 10.000s

After :
... refill_delay 40ms timer_slack 10us horizon 10s

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-19 14:30:30 -07:00
Paul Blakey 924c43778a man: tc-ct.8: Add manual page for ct tc action
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-19 14:30:24 -07:00
Maciej Fijalkowski 42796dcd36 tc: mqprio: reject queues count/offset pair count higher than num_tc
Provide a sanity check that will make sure whether queues count/offset
pair count will not exceed the actual number of TCs being created.

Example command that is invalid because there are 4 count/offset pairs
whereas num_tc is only 2.

 # tc qdisc add dev enp96s0f0 root mqprio num_tc 2 map 0 0 0 0 1 1 1 1
queues 4@0 4@4 4@8 4@12 hw 1 mode channel

Store the parsed count/offset pair count onto a dedicated variable that
will be compared against opt.num_tc after all of the command line
arguments were parsed. Bail out if this count is higher than opt.num_tc
and let user know about it.

Drivers were swallowing such commands as they were iterating over
count/offset pairs where num_tc was used as a delimiter, so this is not
a big deal, but better catch such misconfiguration at the command line
argument parsing level.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-18 14:57:15 +00:00
Dmitry Yakunin 7bd9188581 ss: add checks for bc filter support
As noted by David Ahern, now if some bytecode filter is not supported
by running kernel printed error message is not clear. This patch is attempt to
detect such case and print correct message. This is done by providing checking
function for new filter types. As example check function for cgroup filter
is implemented. It sends correct lightweight request (idiag_states = 0)
with zero cgroup condition to the kernel and checks returned errno. If filter
is not supported EINVAL is returned. Result of checking is cached to
avoid extra checks if several same filters are specified.

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-13 14:28:38 +00:00
Dmitry Yakunin 14f4bda590 ss: add support for cgroup v2 information and filtering
This patch introduces two new features: obtaining cgroup information and
filtering sockets by cgroups. These features work based on cgroup v2 ID
field in the socket (kernel should be compiled with CONFIG_SOCK_CGROUP_DATA).

Cgroup information can be obtained by specifying --cgroup flag and now contains
only pathname. For faster pathname lookups cgroup cache is implemented. This
cache is filled on ss startup and missed entries are resolved and saved
on the fly.

Cgroup filter extends EXPRESSION and allows to specify cgroup pathname
(relative or absolute) to obtain sockets attached only to this cgroup.
Filter syntax: ss [ cgroup PATHNAME ]
Examples:
    ss -a cgroup /sys/fs/cgroup/unified (or ss -a cgroup .)
    ss -a cgroup /sys/fs/cgroup/unified/cgroup1 (or ss -a cgroup cgroup1)

v2:
  - style fixes (David Ahern)

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-13 14:28:35 +00:00
Dmitry Yakunin d5e6ee0dac ss: introduce cgroup2 cache and helper functions
This patch prepares infrastructure for matching sockets by cgroups.
Two helper functions are added for transformation between cgroup v2 ID
and pathname. Cgroup v2 cache is implemented as hash table indexed by ID.
This cache is needed for faster lookups of socket cgroup.

v2:
  - style fixes (David Ahern)

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-13 14:28:04 +00:00
Po Liu 965a5f6a1b iproute2-next: add gate action man page
This patch is to add the man page for the tc gate action.

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-13 02:20:12 +00:00
Po Liu 07d5ee70b5 iproute2-next:tc:action: add a gate control action
Introduce a ingress frame gate control flow action.
Tc gate action does the work like this:
Assume there is a gate allow specified ingress frames can pass at
specific time slot, and also drop at specific time slot. Tc filter
chooses the ingress frames, and tc gate action would specify what slot
does these frames can be passed to device and what time slot would be
dropped.
Tc gate action would provide an entry list to tell how much time gate
keep open and how much time gate keep state close. Gate action also
assign a start time to tell when the entry list start. Then driver would
repeat the gate entry list cyclically.
For the software simulation, gate action require the user assign a time
clock type.

Below is the setting example in user space. Tc filter a stream source ip
address is 192.168.0.20 and gate action own two time slots. One is last
200ms gate open let frame pass another is last 100ms gate close let
frames dropped.

 # tc qdisc add dev eth0 ingress
 # tc filter add dev eth0 parent ffff: protocol ip \

            flower src_ip 192.168.0.20 \
            action gate index 2 clockid CLOCK_TAI \
            sched-entry open 200000000ns -1 8000000b \
            sched-entry close 100000000ns

 # tc chain del dev eth0 ingress chain 0

"sched-entry" follow the name taprio style. Gate state is
"open"/"close". Follow the period nanosecond. Then next -1 is internal
priority value means which ingress queue should put to. "-1" means
wildcard. The last value optional specifies the maximum number of
MSDU octets that are permitted to pass the gate during the specified
time interval, the overlimit frames would be dropped.

Below example shows filtering a stream with destination mac address is
10:00:80:00:00:00 and ip type is ICMP, follow the action gate. The gate
action would run with one close time slot which means always keep close.
The time cycle is total 200000000ns. The base-time would calculate by:

     1357000000000 + (N + 1) * cycletime

When the total value is the future time, it will be the start time.
The cycletime here would be 200000000ns for this case.

 #tc filter add dev eth0 parent ffff:  protocol ip \
           flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \
           action gate index 12 base-time 1357000000000ns \
           sched-entry CLOSE 200000000ns \
           clockid CLOCK_TAI

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-13 02:19:46 +00:00
David Ahern 0e9b227e2d Update kernel headers and import tc_gate.h
Update kernel headers to commit:
    fb9f2e92864f ("net: dsa: tag_sja1105: appease sparse checks for ethertype accessors")
and import tc_act/tc_gate.h

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-13 02:18:15 +00:00
Eric Dumazet 0ecb90b33c tc: fq: fix two issues
My latest patch missed the fact that this file got JSON support.

Also fixes a spelling error added during JSON change.

Fixes: be9ca9d541 ("tc: fq: add timer_slack parameter")
Fixes: d15e2bfc04 ("tc: fq: add support for JSON output")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-05 10:27:26 -07:00
Stephen Hemminger 8142c76232 ss: update to bw print
Display kilobit with the standard suffix.
Add comment to describe where data rate suffixes come from.
Add support for terrabit.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-05 10:18:58 -07:00
Jakub Kicinski ec04b6fc24 devlink: support kernel-side snapshot id allocation
Make ID argument optional and read the snapshot info
that kernel sends us.

$ devlink region new netdevsim/netdevsim1/dummy
netdevsim/netdevsim1/dummy: snapshot 0
$ devlink -jp region new netdevsim/netdevsim1/dummy
{
    "regions": {
        "netdevsim/netdevsim1/dummy": {
            "snapshot": [ 1 ]
        }
    }
}
$ devlink region show netdevsim/netdevsim1/dummy
netdevsim/netdevsim1/dummy: size 32768 snapshot [0 1]

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-05 17:10:27 +00:00
Eric Dumazet e133fa9c73 ss: add support for Gbit speeds in sprint_bw()
Also use 'g' specifier instead of 'f' to remove trailing zeros,
and increase precision.

Examples of output :
 Before        After
 8.0Kbps       8Kbps
 9.9Mbps       9.92Mbps
 55001Mbps     55Gbps

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-05 09:50:22 -07:00
David Ahern 8c109059b5 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-05 16:49:38 +00:00
David Ahern c1b21f5286 Import rpl.h and rpl_iptunnel.h uapi headers
Import rpl.h and rpl_iptunnel.h as of kernel commit:
    354d86141796 ("Merge branch 'net-reduce-dynamic-lockdep-keys'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-05 16:23:14 +00:00
Davide Caratti 3175bca718 tc: full JSON support for 'bpf' filter
example using eBPF:

 # tc filter add dev dummy0 ingress bpf \
 > direct-action obj ./bpf/filter.o sec tc-ingress
 # tc  -j filter show dev dummy0 ingress | jq
 [
   {
     "protocol": "all",
     "pref": 49152,
     "kind": "bpf",
     "chain": 0
   },
   {
     "protocol": "all",
     "pref": 49152,
     "kind": "bpf",
     "chain": 0,
     "options": {
       "handle": "0x1",
       "bpf_name": "filter.o:[tc-ingress]",
       "direct-action": true,
       "not_in_hw": true,
       "prog": {
         "id": 101,
         "tag": "a04f5eef06a7f555",
         "jited": 1
       }
     }
   }
 ]

v2:
 - use print_nl(), thanks to Andrea Claudi
 - use print_0xhex() for filter handle, thanks to Stephen Hemminger

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-05 16:19:06 +00:00
David Ahern ae57e82da0 Update kernel headers
Update kernel headers to commit:
    354d86141796 ("Merge branch 'net-reduce-dynamic-lockdep-keys'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-05 16:11:22 +00:00
Benjamin Poirier 0501fe734f Replace open-coded instances of print_nl()
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-04 17:13:53 -07:00
Benjamin Poirier e0c457b1a5 bridge: Align output columns
Use fixed column widths to improve readability.

Before:
root@vsid:/src/iproute2# ./bridge/bridge vlan tunnelshow
port    vlan-id tunnel-id
vx0      2       2
         1010-1020       1010-1020
         1030    65556
vx-longname      2       2

After:
root@vsid:/src/iproute2# ./bridge/bridge vlan tunnelshow
port              vlan-id    tunnel-id
vx0               2          2
                  1010-1020  1010-1020
                  1030       65556
vx-longname       2          2

Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-04 17:13:53 -07:00
Benjamin Poirier 5a07a5df5a json_print: Return number of characters printed
When outputting in normal mode, forward the return value from
color_fprintf().

Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-04 17:13:53 -07:00
Benjamin Poirier b262a9becb bridge: Fix output with empty vlan lists
Consider this configuration:

ip link add br0 type bridge
ip link add vx0 type vxlan dstport 4789 external
ip link set dev vx0 master br0
bridge vlan del vid 1 dev vx0
ip link add vx1 type vxlan dstport 4790 external
ip link set dev vx1 master br0

	root@vsid:/src/iproute2# ./bridge/bridge vlan
	port    vlan-id
	br0      1 PVID Egress Untagged

	vx0     None
	vx1      1 PVID Egress Untagged

	root@vsid:/src/iproute2#

Note the useless and inconsistent empty lines.

	root@vsid:/src/iproute2# ./bridge/bridge vlan tunnelshow
	port    vlan-id tunnel-id
	br0
	vx0     None
	vx1

What's the difference between "None" and ""?

	root@vsid:/src/iproute2# ./bridge/bridge -j -p vlan tunnelshow
	[ {
		"ifname": "br0",
		"tunnels": [ ]
	    },{
		"ifname": "vx1",
		"tunnels": [ ]
	    } ]

Why does vx0 appear in normal output and not json output?
Why output an empty list for br0 and vx1?

Fix these inconsistencies and avoid outputting entries with no values. This
makes the behavior consistent with other iproute2 commands, for example
`ip -6 addr`: if an interface doesn't have any ipv6 addresses, it is not
part of the listing.

Fixes: 8652eeb3ab ("bridge: vlan: support for per vlan tunnel info")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-04 17:13:53 -07:00
Benjamin Poirier 91b1b49ed3 bridge: Fix typo
Fixes: 7abf5de677 ("bridge: vlan: add support to display per-vlan statistics")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-04 17:13:53 -07:00
Benjamin Poirier 594b2d7799 bridge: Use consistent column names in vlan output
Fix singular vs plural. Add a hyphen to clarify that each of those are
single fields.

Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-04 17:13:53 -07:00
Xin Long 4e578c78fe tc: f_flower: add options support for erspan
This patch is to add TCA_FLOWER_KEY_ENC_OPTS_ERSPAN's parse and
print to implement erspan options support in m_tunnel_key, like
Commit 56155d4df8 ("tc: f_flower: add geneve option match
support to flower") for geneve options support.

Option is expressed as version:index:dir:hwid, dir and hwid will
be parsed when version is 2, while index will be parsed when
version is 1. erspan doesn't support multiple options.

With this patch, users can add and dump erspan options like:

  # ip link add name erspan1 type erspan external
  # tc qdisc add dev erspan1 ingress
  # tc filter add dev erspan1 protocol ip parent ffff: \
      flower \
        enc_src_ip 10.0.99.192 \
        enc_dst_ip 10.0.99.193 \
        enc_key_id 11 \
        erspan_opts 1:2:0:0/1:255:0:0 \
        ip_proto udp \
        action mirred egress redirect dev eth1
  # tc -s filter show dev erspan1 parent ffff:

     filter protocol ip pref 49152 flower chain 0 handle 0x1
       eth_type ipv4
       ip_proto udp
       enc_dst_ip 10.0.99.193
       enc_src_ip 10.0.99.192
       enc_key_id 11
       erspan_opts 1:2:0:0/1:255:0:0
       not_in_hw
         action order 1: mirred (Egress Redirect to device eth1) stolen
         index 1 ref 1 bind 1
         Action statistics:
         Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
         backlog 0b 0p requeues 0

v1->v2:
  - no change.
v2->v3:
  - no change.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print version, index, dir and hwid as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-01 16:33:27 +00:00
Xin Long 93c8d5f72f tc: f_flower: add options support for vxlan
This patch is to add TCA_FLOWER_KEY_ENC_OPTS_VXLAN's parse and
print to implement vxlan options support in m_tunnel_key, like
Commit 56155d4df8 ("tc: f_flower: add geneve option match
support to flower") for geneve options support.

Option is expressed a 32bit number for gbp only, and vxlan
doesn't support multiple options.

With this patch, users can add and dump vxlan options like:

  # ip link add name vxlan1 type vxlan dstport 0 external
  # tc qdisc add dev vxlan1 ingress
  # tc filter add dev vxlan1 protocol ip parent ffff: \
      flower \
        enc_src_ip 10.0.99.192 \
        enc_dst_ip 10.0.99.193 \
        enc_key_id 11 \
        vxlan_opts 65793/4008635966 \
        ip_proto udp \
        action mirred egress redirect dev eth1
  # tc -s filter show dev vxlan1 parent ffff:

     filter protocol ip pref 49152 flower chain 0 handle 0x1
       eth_type ipv4
       ip_proto udp
       enc_dst_ip 10.0.99.193
       enc_src_ip 10.0.99.192
       enc_key_id 11
       vxlan_opts 65793/4008635966
       not_in_hw
         action order 1: mirred (Egress Redirect to device eth1) stolen
         index 3 ref 1 bind 1
         Action statistics:
         Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
         backlog 0b 0p requeues 0

v1->v2:
  - get_u32 with base = 0 for gbp.
v2->v3:
  - implement proper JSON array for opts.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print gbp as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-01 16:33:22 +00:00
Xin Long 668fd9b25d tc: m_tunnel_key: add options support for erpsan
This patch is to add TCA_TUNNEL_KEY_ENC_OPTS_ERSPAN's parse and
print to implement erspan options support in m_tunnel_key, like
Commit 6217917a38 ("tc: m_tunnel_key: Add tunnel option support
to act_tunnel_key") for geneve options support.

Option is expressed as version:index:dir:hwid, dir and hwid will
be parsed when version is 2, while index will be parsed when
version is 1. erspan doesn't support multiple options.

With this patch, users can add and dump erspan options like:

  # ip link add name erspan1 type erspan external
  # tc qdisc add dev eth0 ingress
  # tc filter add dev eth0 protocol ip parent ffff: \
      flower indev eth0 \
        ip_proto udp \
        action tunnel_key \
          set src_ip 10.0.99.192 \
          dst_ip 10.0.99.193 \
          dst_port 6081 \
          id 11 \
          erspan_opts 1:2:0:0 \
      action mirred egress redirect dev erspan1
  # tc -s filter show dev eth0 parent ffff:

     filter protocol ip pref 49151 flower chain 0 handle 0x1
       indev eth0
       eth_type ipv4
       ip_proto udp
       not_in_hw
         action order 1: tunnel_key  set
         src_ip 10.0.99.192
         dst_ip 10.0.99.193
         key_id 11
         dst_port 6081
         erspan_opts 1:2:0:0
         csum pipe
           index 2 ref 1 bind 1
         ...
v1->v2:
  - no change.
v2->v3:
  - no change.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print version, index, dir and hwid as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-01 16:33:18 +00:00
Xin Long f72c3ad00f tc: m_tunnel_key: add options support for vxlan
This patch is to add TCA_TUNNEL_KEY_ENC_OPTS_VXLAN's parse and
print to implement vxlan options support in m_tunnel_key, like
Commit 6217917a38 ("tc: m_tunnel_key: Add tunnel option support
to act_tunnel_key") for geneve options support.

Option is expressed a 32bit number for gbp only, and vxlan
doesn't support multiple options.

With this patch, users can add and dump vxlan options like:

  # ip link add name vxlan1 type vxlan dstport 0 external
  # tc qdisc add dev eth0 ingress
  # tc filter add dev eth0 protocol ip parent ffff: \
      flower indev eth0 \
        ip_proto udp \
        action tunnel_key \
          set src_ip 10.0.99.192 \
          dst_ip 10.0.99.193 \
          dst_port 6081 \
          id 11 \
          vxlan_opts 65793 \
      action mirred egress redirect dev vxlan1
  # tc -s filter show dev eth0 parent ffff:

     filter protocol ip pref 49152 flower chain 0 handle 0x1
       indev eth0
       eth_type ipv4
       ip_proto udp
       not_in_hw
         action order 1: tunnel_key  set
         src_ip 10.0.99.192
         dst_ip 10.0.99.193
         key_id 11
         dst_port 6081
         vxlan_opts 65793
         ...

v1->v2:
  - get_u32 with base = 0 for gbp.
  - use to print_unint("0x%x") to print gbp.
v2->v3:
  - implement proper JSON array for opts.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print gbp as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-01 16:33:14 +00:00
Xin Long 39fa047938 iproute_lwtunnel: add options support for erspan metadata
This patch is to add LWTUNNEL_IP_OPTS_ERSPAN's parse and print to implement
erspan options support in iproute_lwtunnel.

Option is expressed as version:index:dir:hwid, dir and hwid will be parsed
when version is 2, while index will be parsed when version is 1. All of
these are numbers. erspan doesn't support multiple options.

With this patch, users can add and dump erspan options like:

  # ip netns add a
  # ip netns add b
  # ip -n a link add eth0 type veth peer name eth0 netns b
  # ip -n a link set eth0 up
  # ip -n b link set eth0 up
  # ip -n a addr add 10.1.0.1/24 dev eth0
  # ip -n b addr add 10.1.0.2/24 dev eth0
  # ip -n b link add erspan1 type erspan key 1 seq erspan 123 \
    local 10.1.0.2 remote 10.1.0.1
  # ip -n b addr add 1.1.1.1/24 dev erspan1
  # ip -n b link set erspan1 up
  # ip -n b route add 2.1.1.0/24 dev erspan1
  # ip -n a link add erspan1 type erspan key 1 seq local 10.1.0.1 external
  # ip -n a addr add 2.1.1.1/24 dev erspan1
  # ip -n a link set erspan1 up
  # ip -n a route add 1.1.1.0/24 encap ip id 1 \
    erspan_opts 2:123:1:2 dst 10.1.0.2 dev erspan1
  # ip -n a route show
  # ip netns exec a ping 1.1.1.1 -c 1

   1.1.1.0/24  encap ip id 1 src 0.0.0.0 dst 10.1.0.2 ttl 0 tos 0
     erspan_opts 2:0:1:2 dev erspan1 scope link

   PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
   64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=0.124 ms

v1->v2:
  - improve the changelog.
  - use PRINT_ANY to support dumping with json format.
v2->v3:
  - implement proper JSON object for opts instead of just bunch of strings.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print version, index, dir and hwid as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-01 16:33:09 +00:00
Xin Long b1bc0f3892 iproute_lwtunnel: add options support for vxlan metadata
This patch is to add LWTUNNEL_IP_OPTS_VXLAN's parse and print to implement
vxlan options support in iproute_lwtunnel.

Option is expressed a number for gbp only, and vxlan doesn't support
multiple options.

With this patch, users can add and dump vxlan options like:

  # ip netns add a
  # ip netns add b
  # ip -n a link add eth0 type veth peer name eth0 netns b
  # ip -n a link set eth0 up
  # ip -n b link set eth0 up
  # ip -n a addr add 10.1.0.1/24 dev eth0
  # ip -n b addr add 10.1.0.2/24 dev eth0
  # ip -n b link add vxlan1 type vxlan id 1 local 10.1.0.2 \
    remote 10.1.0.1 dev eth0 ttl 64 gbp
  # ip -n b addr add 1.1.1.1/24 dev vxlan1
  # ip -n b link set vxlan1 up
  # ip -n b route add 2.1.1.0/24 dev vxlan1
  # ip -n a link add vxlan1 type vxlan local 10.1.0.1 dev eth0 ttl 64 \
    gbp external
  # ip -n a addr add 2.1.1.1/24 dev vxlan1
  # ip -n a link set vxlan1 up
  # ip -n a route add 1.1.1.0/24 encap ip id 1 \
    vxlan_opts 1110 dst 10.1.0.2 dev vxlan1
  # ip -n a route show
  # ip netns exec a ping 1.1.1.1 -c 1

   1.1.1.0/24  encap ip id 1 src 0.0.0.0 dst 10.1.0.2 ttl 0 tos 0
     vxlan_opts 1110 dev vxlan1 scope link

   PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
   64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=0.111 ms

v1->v2:
  - improve the changelog.
  - get_u32 with base = 0 for gbp.
  - use PRINT_ANY to support dumping with json format.
v2->v3:
  - implement proper JSON array for opts.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print gbp as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-01 16:33:03 +00:00
Xin Long ca7614d4c6 iproute_lwtunnel: add options support for geneve metadata
This patch is to add LWTUNNEL_IP(6)_OPTS and LWTUNNEL_IP_OPTS_GENEVE's
parse and print to implement geneve options support in iproute_lwtunnel.

Options are expressed as class:type:data and multiple options may be
listed using a comma delimiter, class and type are numbers and data
is a hex string.

With this patch, users can add and dump geneve options like:

  # ip netns add a
  # ip netns add b
  # ip -n a link add eth0 type veth peer name eth0 netns b
  # ip -n a link set eth0 up; ip -n b link set eth0 up
  # ip -n a addr add 10.1.0.1/24 dev eth0
  # ip -n b addr add 10.1.0.2/24 dev eth0
  # ip -n b link add geneve1 type geneve id 1 remote 10.1.0.1 ttl 64
  # ip -n b addr add 1.1.1.1/24 dev geneve1
  # ip -n b link set geneve1 up
  # ip -n b route add 2.1.1.0/24 dev geneve1
  # ip -n a link add geneve1 type geneve external
  # ip -n a addr add 2.1.1.1/24 dev geneve1
  # ip -n a link set geneve1 up
  # ip -n a route add 1.1.1.0/24 encap ip id 1 geneve_opts \
    1:1:1212121234567890,1:1:1212121234567890,1:1:1212121234567890 \
    dst 10.1.0.2 dev geneve1
  # ip -n a route show
  # ip netns exec a ping 1.1.1.1 -c 1

   1.1.1.0/24  encap ip id 1 src 0.0.0.0 dst 10.1.0.2 ttl 0 tos 0
     geneve_opts 1:1:1212121234567890,1:1:1212121234567890 ...

   PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
   64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=0.079 ms

v1->v2:
  - improve the changelog.
  - use PRINT_ANY to support dumping with json format.
v2->v3:
  - implement proper JSON array for opts instead of just bunch of strings.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print class and type as uint and print data as hex string.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-01 16:31:58 +00:00
Jacob Keller 7ae84fedcb devlink: add support for DEVLINK_CMD_REGION_NEW
Add support to request that a new snapshot be taken immediately for
a devlink region. To avoid confusion, the desired snapshot id must be
provided.

Note that if a region does not support snapshots on demand, the kernel
will reject the request with -EOPNOTSUP.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-29 22:31:55 -07:00
Stephen Hemminger 2b93f66863 uapi: update bpf.h
Minor spelling in comment
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-29 22:30:48 -07:00
Petr Machata 081d6c310d tc: pedit: Support JSON dumping
The action pedit does not currently support dumping to JSON. Convert
print_pedit() to the print_* family of functions so that dumping is correct
both in plain and JSON mode. In plain mode, the output is character for
character the same as it was before. In JSON mode, this is an example dump:

$ tc filter add dev dummy0 ingress prio 125 flower \
         action pedit ex munge udp dport set 12345 \
	                 munge ip ttl add 1        \
			 munge offset 10 u8 clear
$ tc -j filter show dev dummy0 ingress | jq
[
  {
    "protocol": "all",
    "pref": 125,
    "kind": "flower",
    "chain": 0
  },
  {
    "protocol": "all",
    "pref": 125,
    "kind": "flower",
    "chain": 0,
    "options": {
      "handle": 1,
      "keys": {},
      "not_in_hw": true,
      "actions": [
        {
          "order": 1,
          "kind": "pedit",
          "control_action": {
            "type": "pass"
          },
          "nkeys": 3,
          "index": 1,
          "ref": 1,
          "bind": 1,
          "keys": [
            {
              "htype": "udp",
              "offset": 0,
              "cmd": "set",
              "val": "3039",
              "mask": "ffff0000"
            },
            {
              "htype": "ipv4",
              "offset": 8,
              "cmd": "add",
              "val": "1000000",
              "mask": "ffffff"
            },
            {
              "htype": "network",
              "offset": 8,
              "cmd": "set",
              "val": "0",
              "mask": "ffff00ff"
            }
          ]
        }
      ]
    }
  }
]

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-30 02:43:23 +00:00
William Tu 846b6b2da8 erspan: Add type I version 0 support.
The Type I ERSPAN frame format is based on the barebones
IP + GRE(4-byte) encapsulation on top of the raw mirrored frame.
Both type I and II use 0x88BE as protocol type. Unlike type II
and III, no sequence number or key is required.

To creat a type I erspan tunnel device:
$ ip link add dev erspan11 type erspan \
	local 172.16.1.100 remote 172.16.1.200 \
	erspan_ver 0

CC: Dmitriy Andreyevskiy <dandreye@cisco.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-30 02:40:10 +00:00
Paolo Abeni 0c42c6b130 man: ip.8: add reference to mptcp man-page
While at it, additionally fix a mandoc warning in mptcp.8

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-29 17:36:14 +00:00
David Ahern d38f2a10dd Merge branch 'mptcp' into next
Paolo Abeni  says:

====================

This introduces support for the MPTCP PM netlink interface, allowing admins
to configure several aspects of the MPTCP path manager. The subcommand is
documented with a newly added man-page.

This series also includes support for MPTCP subflow diag.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-29 16:50:25 +00:00
Paolo Abeni 2d8b5fe93e man: mptcp man page
describe the mptcp subcommands implemented so far.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-29 16:47:45 +00:00
Davide Caratti 712fdd98c0 ss: allow dumping MPTCP subflow information
[root@f31 packetdrill]# ss -tni

 ESTAB    0        0           192.168.82.247:8080           192.0.2.1:35273
          cubic wscale:7,8 [...] tcp-ulp-mptcp flags:Mec token:0000(id:0)/5f856c60(id:0) seq:b810457db34209a5 sfseq:1 ssnoff:0 maplen:190

Additionally extends ss manpage to describe the new entry layout.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-29 16:44:55 +00:00
Paolo Abeni 7e0767cd86 add support for mptcp netlink interface
Implement basic commands to:
- manipulate MPTCP endpoints list
- manipulate MPTCP connection limits

Examples:
1. Allows multiple subflows per MPTCP connection
   $ ip mptcp limits set subflows 2

2. Accept ADD_ADDR announcement from the peer (server):
   $ ip mptcp limits set add_addr_accepted 2

3. Add a ipv4 address to be annunced for backup subflows:
   $ ip mptcp endpoint add 10.99.1.2 signal backup

4. Add an ipv6 address used as source for additional subflows:
   $ ip mptcp endpoint add 2001::2 subflow

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-29 16:43:18 +00:00
David Ahern 02ade5a8ea Update kernel headers and import mptcp.h
Update kernel headers to commit
    790ab249b55d ("net: ethernet: fec: Prevent MII event after MII_SPEED write")

and import mptcp.h

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-29 16:41:39 +00:00
Eric Dumazet be9ca9d541 tc: fq: add timer_slack parameter
Commit 583396f4ca4d ("net_sched: sch_fq: enable use of hrtimer slack")
added TCA_FQ_TIMER_SLACK parameter, with a default value of 10 usec.

Add the corresponding tc support to get/set this tunable.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-27 14:56:42 -07:00
Eric Dumazet 7868f802e2 tc: fq_codel: add drop_batch parameter
Commit 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
added the new TCA_FQ_CODEL_DROP_BATCH_SIZE parameter, set by default to 64.

Add to tc command the ability to get/set the drop_batch

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-27 14:56:42 -07:00
Xin Long d27fc6390c xfrm: also check for ipv6 state in xfrm_state_keep
As commit f9d696cf41 ("xfrm: not try to delete ipcomp states when using
deleteall") does, this patch is to fix the same issue for ip6 state where
xsinfo->id.proto == IPPROTO_IPV6.

  # ip xfrm state add src 2000::1 dst 2000::2 spi 0x1000 \
    proto comp comp deflate mode tunnel sel src 2000::1 dst \
    2000::2 proto gre
  # ip xfrm sta deleteall
  Failed to send delete-all request
  : Operation not permitted

Note that the xsinfo->proto in common states can never be IPPROTO_IPV6.

Fixes: f9d696cf41 ("xfrm: not try to delete ipcomp states when using deleteall")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-27 14:50:37 -07:00
Jiri Pirko 0149dabf2a tc: m_action: check cookie hex string len
Check the cookie hex string len is dividable by 2 as the valid hex
string always should be.

Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-27 14:50:27 -07:00
David Ahern 60f1075c21 Merge branch 'macsec-offload' into next
Igor Russkikh  says:

====================

From: Mark Starovoytov <mstarovoitov@marvell.com>

This series adds support for selecting the offloading mode of a MACsec
interface at link creation time.
Available modes are for now 'off', 'phy' and 'mac', 'off' being the default
when an interface is created.

First patch adds support for MAC offloading.

Last patch allows a user to change the offloading mode at runtime
through a new attribute, `ip link add link ... offload`:

  # ip link add link enp1s0 type macsec encrypt on offload off
  # ip link add link enp1s0 type macsec encrypt on offload phy
  # ip link add link enp1s0 type macsec encrypt on offload mac

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-26 18:32:20 +00:00
Mark Starovoytov bcbeb35ca4 macsec: add support for specifying offload at link add time
This patch adds support for configuring offload mode upon MACsec
device creation.

If offload mode is not specified, then netlink attribute is not
added. Default behavior on the kernel side in this case is
backward-compatible (offloading is disabled by default).

Example:
$ ip link add link eth0 macsec0 type macsec port 11 encrypt on offload mac

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-26 18:32:03 +00:00
Mark Starovoytov 998534c99e macsec: add support for MAC offload
This patch enables MAC HW offload usage in iproute, since MACSec
implementation supports it now.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-26 18:31:37 +00:00
Stephen Hemminger b831c5ffcc bridge: man page spelling fixes
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:48:57 -07:00
Bastien Roucariès 8d5d91fd58 State of bridge STP port are now case insensitive
Improve use experience

Signed-off-by: Bastien Roucariès <rouca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:45:37 -07:00
Bastien Roucariès 498883a00f Document root_block option
Root_block is also called root port guard, document it.

Signed-off-by: Bastien Roucariès <rouca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:45:37 -07:00
Bastien Roucariès 19bbebc459 Better documentation of BDPU guard
Document that guard disable the port and how to reenable it

Signed-off-by: Bastien Roucariès <rouca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:45:37 -07:00
Bastien Roucariès 420febf961 Document BPDU filter option
Disabled state is also BPDU filter

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:45:37 -07:00
Bastien Roucariès 1cad8f8d78 Improve hairpin mode description
Mention VEPA and reflective relay.

Signed-off-by: Bastien Roucariès <rouca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:45:37 -07:00
Bastien Roucariès 706f7d35e2 Better documentation of mcast_to_unicast option
This option is useful for Wifi bridge but need some tweak.

Document it from kernel patches documentation

Signed-off-by: Bastien Roucariès <rouca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:45:37 -07:00
Brian Norris 8b9d5728c1 man: replace $(NETNS_ETC_DIR) and $(NETNS_RUN_DIR) in ip-netns(8)
These can be configured to different paths. Reflect that in the
generated documentation.

Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:39:27 -07:00
Brian Norris 48e05899d0 man: add ip-netns(8) as generation target
Prepare for adding new variable substitutions. Unify the sed rules while
we're at it, since there's no need to write this out 4 times.

Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:39:27 -07:00
Benjamin Lee f03ad792f3 tc: fq_codel: fix class stat deficit is signed int
The fq_codel class stat deficit is a signed int.  This is a regression
from when JSON output was added.

Fixes: 997f2dc193 ("tc: Add JSON output of fq_codel stats")
Signed-off-by: Benjamin Lee <ben@b1c1l1.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:34:56 -07:00
Odin Ugedal 14d2df8874 q_cake: properly print memlimit
Load memlimit so that it will be printed if it isn't set to zero.

Also add a space to properly print it.

Signed-off-by: Odin Ugedal <odin@ugedal.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:33:15 -07:00
Odin Ugedal 6f883f168c q_cake: Make fwmark uint instead of int
This will help avoid overflow, since setting it to 0xffffffff would
result in -1 when converted to integer, resulting in being "-1", setting
the fwmark to 0x00.

Signed-off-by: Odin Ugedal <odin@ugedal.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:33:15 -07:00
Odin Ugedal e07c57e94e tc_util: detect overflow in get_size
This detects overflow during parsing of value using get_size:

eg. running:

$ tc qdisc add dev lo root cake memlimit 11gb

currently gives a memlimit of "3072Mb", while with this patch it errors
with 'illegal value for "memlimit": "11gb"', since memlinit is an
unsigned integer.

Signed-off-by: Odin Ugedal <odin@ugedal.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:31:01 -07:00
Eran Ben Elisha 4aa0c9c9f8 devlink: Add devlink health auto_dump command support
Add support for configuring auto_dump attribute per reporter.
With this attribute, one can indicate whether the devlink kernel core
should execute automatic dump on error.

The change will be reflected in show, set and man commands.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-19 22:27:13 +00:00
David Ahern 59ba1dd011 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-19 22:26:27 +00:00
Benjamin Lee fe821d64e6 man: tc-htb.8: fix class prio is not mandatory
Fix description for htb class prio parameter to indicate it's not
mandatory.

Signed-off-by: Benjamin Lee <ben@b1c1l1.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-13 14:04:00 -07:00
Benjamin Lee 6ecd0198c0 man: tc-htb.8: add missing class parameter quantum
Add description for htb class parameter quantum.

Signed-off-by: Benjamin Lee <ben@b1c1l1.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-13 14:04:00 -07:00
Benjamin Lee d8d59421b6 man: tc-htb.8: add missing qdisc parameter r2q
Add description for htb qdisc parameter r2q.

Signed-off-by: Benjamin Lee <ben@b1c1l1.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-13 14:04:00 -07:00
Petr Machata 20927e0525 ip: link_gre: Do not send ERSPAN attributes to GRE tunnels
In the commit referenced below, ip link started sending ERSPAN-specific
attributes even for GRE and gretap tunnels. Fix by more carefully
distinguishing between the GRE/tap and ERSPAN modes. Do not show
ERSPAN-related help in GRE/tap mode, likewise do not accept ERSPAN
arguments, or send ERSPAN attributes.

Fixes: 83c543af87 ("erspan: set erspan_ver to 1 by default")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-13 14:02:54 -07:00
Jiri Pirko c4dfddccef devlink: fix JSON output of mon command
The current JSON output of mon command is broken. Fix it and make sure
that the output is a valid JSON. Also, handle SIGINT gracefully to allow
to end the JSON properly.

Example:
$ devlink mon -j -p
{
    "mon": [ {
            "command": "new",
            "dev": {
                "netdevsim/netdevsim10": {}
            }
        },{
            "command": "new",
            "port": {
                "netdevsim/netdevsim10/0": {
                    "type": "notset",
                    "flavour": "physical",
                    "port": 1
                }
            }
        },{
            "command": "new",
            "port": {
                "netdevsim/netdevsim10/0": {
                    "type": "eth",
                    "netdev": "eth0",
                    "flavour": "physical",
                    "port": 1
                }
            }
        },{
            "command": "new",
            "port": {
                "netdevsim/netdevsim10/0": {
                    "type": "notset",
                    "flavour": "physical",
                    "port": 1
                }
            }
        },{
            "command": "del",
            "port": {
                "netdevsim/netdevsim10/0": {
                    "type": "notset",
                    "flavour": "physical",
                    "port": 1
                }
            }
        },{
            "command": "del",
            "dev": {
                "netdevsim/netdevsim10": {}
            }
        } ]
}

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-13 13:59:12 -07:00
David Ahern 5c762c3bc2 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-09 14:42:33 +00:00
Petr Machata 74c8610f3b man: tc-pedit: Drop the claim that pedit ex is only for IPv4
This sentence predates addition of extended pedit for IPv6 packets.

Reported-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-09 14:39:59 +00:00
Petr Machata f91f788c70 man: tc-pedit: Add examples for dsfield and retain
Describe a way to update just the DSCP and just the ECN part of the
dsfield. That is useful on its own, but also it shows how retain works.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-09 14:39:58 +00:00
Petr Machata 2d9a8dc439 tc: p_ip6: Support pedit of IPv6 dsfield
Support keywords dsfield, traffic_class and tos in the IPv6 context.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-09 14:39:58 +00:00
Jiri Pirko 1c3ed78001 devlink: remove unused "jw" field
This field is not used. Remove it.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-09 14:39:28 +00:00
Stephen Hemminger 27136cab54 man/tc-actions: fix formatting
Fix error from make check.
n-old.tmac: <standard input>: line 86: 'R' is a string (producing the registered sign), not a macro.
Error in tc-actions.8

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-06 10:07:54 -07:00
Jiri Pirko e00248d296 man: add man page for devlink dpipe
Add simple man page for devlink dpipe.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-06 10:06:00 -07:00
Jiri Pirko 885f4b0d7a devlink: remove "dev" object sub help messages
Remove duplicate sub help messages for "dev" object and have them all
show help message for "dev".

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-06 10:00:32 -07:00
Jiri Pirko b2522187d8 devlink: Fix help message for dpipe
Have one help message for all dpipe commands, as it is done for the rest
of the devlink object. Possible and required options to the help.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-06 10:00:32 -07:00
Jiri Pirko 342f462efa devlink: rename dpipe_counters_enable struct field to dpipe_counters_enabled
To be consistent with the rest of the code and name of netlink
attribute, rename the dpipe_counters_enable struct fielt
to dpipe_counters_enabled.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-06 10:00:32 -07:00
Jiri Pirko 192e7b3ffa devlink: Add alias "counters_enabled" for "counters" option
To be consistent with netlink attribute name and also with the
"dpipe table show" output, add "counters_enabled" for "counters" in
"dpipe table set" command.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-06 10:00:32 -07:00
Jiri Pirko 0b1875cdc6 devlink: fix encap mode manupulation
DEVLINK_ATTR_ESWITCH_ENCAP_MODE netlink attribute carries enum. But the
code assumes bool value. Fix this by treating the encap mode in the same
way as other eswitch mode attributes, switching from "enable"/"disable"
to "basic"/"none", according to the enum. Maintain the backward
compatibility to allow user to pass "enable"/"disable" too. Also to be
in-sync with the rest of the "mode" commands, rename to "encap-mode".
Adjust the help and man page accordingly.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-06 10:00:32 -07:00
Jiri Pirko 90ce848b05 devlink: Fix help and man of "devlink health set" command
Fix the help and man page of "devlink health set" command to be aligned
with the rest of helps and man pages.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-06 10:00:32 -07:00
Jiri Pirko b37a863cb2 devlink: remove custom bool command line options parsing
Change the code that is doing custom processing of boolean command line
options to use dl_argv_bool(). Extend strtobool() to accept
"enable"/"disable" to maintain current behavior.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-06 10:00:32 -07:00
Stephen Hemminger 5d10f24fdd Merge ../iproute2-next 2020-04-06 10:00:12 -07:00
Jiri Pirko 0827cc53f3 tc: show used HW stats types
If kernel provides the attribute, show the used HW stats types.

Example:

$ tc filter add dev enp3s0np1 ingress proto ip handle 1 pref 1 flower dst_ip 192.168.1.1 action drop
$ tc -s filter show dev enp3s0np1 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  dst_ip 192.168.1.1
  in_hw in_hw_count 2
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 10 sec used 10 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
        used_hw_stats immediate     <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-31 23:30:04 +00:00
Ido Schimmel 0141ca64b8 bash-completion: devlink: Extend bash-completion for new commands
Extend bash-completion for two new commands:

devlink trap policer set DEV policer POLICER [ rate RATE ] [ burst BURST ]
devlink trap policer show DEV policer POLICER

And for "policer" / "nopolicer" parameters in existing command:

devlink trap group set DEV group GROUP [ action { trap | drop } ]
                       [ policer POLICER ] [ nopolicer ]

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-31 23:25:13 +00:00
Ido Schimmel 02a2a6683f devlink: Add ability to bind policer to trap group
Add ability to associate a policer with a trap group. The policer can be
unbound by using the 'nopolicer' keyword. In which case, the value
encoded in the 'DEVLINK_ATTR_TRAP_POLICER_ID' attribute will be '0'.
This is consistent with ip-link 'nomaster' keyword and the 'IFLA_MASTER'
attribute.

Example:

# devlink trap group set netdevsim/netdevsim10 group l3_drops policer 2
# devlink -jp trap group show netdevsim/netdevsim10 group l3_drops
{
    "trap_group": {
        "netdevsim/netdevsim10": [ {
                "name": "l3_drops",
                "generic": true,
                "policer": 2
            } ]
    }
}

# devlink trap group set netdevsim/netdevsim10 group l3_drops nopolicer
# devlink -jp trap group show netdevsim/netdevsim10 group l3_drops
{
    "trap_group": {
        "netdevsim/netdevsim10": [ {
                "name": "l3_drops",
                "generic": true
            } ]
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-31 23:25:07 +00:00
Ido Schimmel a66af55693 devlink: Add devlink trap policer set and show commands
The trap policer set command allows the user to set the parameters of
the packet trap policer, such as rate and burst size. Example:

# devlink trap policer set netdevsim/netdevsim10 policer 1 rate 1000 burst 32

The trap policer show command allows the user to get the current
parameters of an individual policer or a dump of all policers in case
one is not specified. When '-s' is specified the policer's statistics
are shown. Example:

# devlink -jps trap policer show netdevsim/netdevsim10 policer 1
{
    "trap_policer": {
        "netdevsim/netdevsim10": [ {
                "policer": 1,
                "rate": 1000,
                "burst": 32,
                "stats": {
                    "rx": {
                        "dropped": 53
                    }
                }
            } ]
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-31 23:24:35 +00:00
David Ahern ce9191ffee Update kernel headers
Update kernel headers to commit:
    7f80ccfe9968 ("net: ipv6: rpl_iptunnel: Fix potential memory leak in rpl_do_srh_inline")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-31 23:23:28 +00:00
Stephen Hemminger 29981db0e0 v5.6.0 2020-03-30 08:06:08 -07:00
Andrea Claudi 0641bed8a3 man: bridge.8: fix bridge link show description
When multiple bridges are present, 'bridge link show' diplays ports
for all bridges. Make this clear in the command description, and
point out the user to the ip command to display ports for a specific
bridge.

Reported-by: Marc Muehlfeld <mmuehlfe@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-03-30 08:01:02 -07:00
Danielle Ratson 5a3faf2949 bash-completion: devlink: add bash-completion function
Add function for command completion for devlink in bash, and update Makefile
to install it under /usr/share/bash-completion/completions/.

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-25 16:46:09 +00:00
Petr Machata 6c10fdca70 tc: q_red: Support 'nodrop' flag
Recognize the new configuration option of the RED Qdisc, "nodrop". Add
support for passing flags through TCA_RED_FLAGS, and use it when passing
TC_RED_NODROP flag.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-25 16:45:37 +00:00
Jakub Kicinski 1c74c20cbe tc: m_action: rename hw stats type uAPI
Follow the kernel rename to shorten the identifiers.
Rename hw_stats_type to hw_stats.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-25 16:42:33 +00:00
David Ahern 1ff1edb6d5 Update kernel headers
Update kernel headers to commit:
    cd556e40fdf3 ("devlink: expand the devlink-info documentation")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-25 16:41:49 +00:00
Stephen Hemminger adcab267b8 uapi: update linux/in.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-03-20 11:05:26 -07:00
Jiri Pirko 341903dd3b tc: m_action: introduce support for hw stats type
Introduce support for per-action hw stats type config.

This patch allows user to specify one of the following types of HW
stats for added action:
immediate - queried during dump time
delayed - polled from HW periodically or sent by HW in async manner
disabled - no stats needed

Note that if "hw_stats" option is not passed, user does not care about
the type, just expects any type of stats.

Examples:
$ tc filter add dev enp0s16np28 ingress proto ip handle 1 pref 1 flower skip_sw dst_ip 192.168.1.1 action drop hw_stats disabled
$ tc -s filter show dev enp0s16np28 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  dst_ip 192.168.1.1
  skip_sw
  in_hw in_hw_count 2
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 7 sec used 2 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
        hw_stats disabled

$ tc filter add dev enp0s16np28 ingress proto ip handle 1 pref 1 flower skip_sw dst_ip 192.168.1.1 action drop hw_stats immediate
$ tc -s filter show dev enp0s16np28 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  dst_ip 192.168.1.1
  skip_sw
  in_hw in_hw_count 2
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 11 sec used 4 sec
        Action statistics:
        Sent 102 bytes 1 pkt (dropped 1, overlimits 0 requeues 0)
        Sent software 0 bytes 0 pkt
        Sent hardware 102 bytes 1 pkt
        backlog 0b 0p requeues 0
        hw_stats immediate

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-20 16:18:44 +00:00
David Ahern 25091a761f Update kernel headers
Update kernel headers to commit:
    3fd177cb2b47 ("net: stmmac: dwmac_lib: remove unnecessary checks in dwmac_dma_reset()")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-20 16:17:55 +00:00
Guillaume Nault 72cc0bafb9 iproute2: fix MPLS label parsing
The initial value of "label" in parse_mpls() is 0xffffffff. Therefore
we should test for this value, and not 0, to detect if a label has been
provided. The "!label" test not only fails to detect a missing label
parameter, it also prevents the use of the IPv4 explicit NULL label,
which actually equals 0.

Reproducer:
  $ ip link add name dm0 type dummy
  $ tc qdisc add dev dm0 ingress

  $ tc filter add dev dm0 parent ffff: matchall action mpls push
  Error: act_mpls: Label is required for MPLS push.
  We have an error talking to the kernel
  --> Filter was pushed to the kernel, where it got rejected.

  $ tc filter add dev dm0 parent ffff: matchall action mpls push label 0
  Error: argument "label" is required
  --> Label 0 was rejected by iproute2.

Expected result:
  $ tc filter add dev dm0 parent ffff: matchall action mpls push
  Error: argument "label" is required
  --> Filter was directly rejected by iproute2.

  $ tc filter add dev dm0 parent ffff: matchall action mpls push label 0
  --> Filter is accepted.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-03-15 09:56:53 -07:00
Andrea Claudi d9b868436a nexthop: fix error reporting in filter dump
nh_dump_filter is missing a return value check in two cases.
Fix this simply adding an assignment to the proper variable.

Fixes: 63df8e8543 ("Add support for nexthop objects")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-03-15 09:54:42 -07:00
Leslie Monis 94c4ce822c Revert "tc: pie: change maximum integer value of tc_pie_xstats->prob"
This reverts commit 92cfe3260e.

Kernel commit 3f95f55eb55d ("net: sched: pie: change tc_pie_xstats->prob")
removes the need to change the maximum integer value of
tc_pie_stats->prob here.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-10 18:29:26 +00:00
Leslie Monis 92cfe3260e tc: pie: change maximum integer value of tc_pie_xstats->prob
Kernel commit 105e808c1da2 ("pie: remove pie_vars->accu_prob_overflows"),
changes the maximum value of tc_pie_xstats->prob from (2^64 - 1) to
(2^56 - 1).

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-09 02:46:45 +00:00
David Ahern ad19a18e42 Merge branch 'macsec-offload' into next
Antoine Tenart  says:

====================

This series adds support for selecting and reporting the offloading mode
of a MACsec interface. Available modes are for now 'off' and 'phy',
'off' being the default when an interface is created. Modes are not only
'off' and 'on' as the MACsec operations can be offloaded to multiple
kinds of specialized hardware devices, at least to PHYs and Ethernet
MACs. The later isn't currently supported in the kernel though.

The first patch adds support for reporting the offloading mode currently
selected for a given MACsec interface through the `ip macsec show`
command:

   # ip macsec show
   18: macsec0: protect on validate strict sc off sa off encrypt on send_sci on end_station off scb off replay off
       cipher suite: GCM-AES-128, using ICV length 16
       TXSC: 3e5035b67c860001 on SA 0
           0: PN 1, state on, key 00000000000000000000000000000000
       RXSC: b4969112700f0001, state on
           0: PN 1, state on, key 01000000000000000000000000000000
->     offload: phy
   19: macsec1: protect on validate strict sc off sa off encrypt on send_sci on end_station off scb off replay off
       cipher suite: GCM-AES-128, using ICV length 16
       TXSC: 3e5035b67c880001 on SA 0
           1: PN 1, state on, key 00000000000000000000000000000000
       RXSC: b4969112700f0001, state on
           1: PN 1, state on, key 01000000000000000000000000000000
->     offload: off

The second patch allows an user to change the offloading mode at runtime
through a new subcommand, `ip macsec offload`:

  # ip macsec offload macsec0 phy
  # ip macsec offload macsec0 off

If a mode isn't supported, `ip macsec offload` will report an issue
(-EOPNOTSUPP).

Giving the offloading mode when a macsec interface is created was
discussed; it is not implemented in this series. It could come later
on, when needed, as we'll still want to support updating the offloading
mode at runtime (what's implemented in this series).
====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-04 19:59:35 +00:00
Antoine Tenart c15674d80d macsec: add an accessor for validate_str
This patch adds an accessor for the validate_str array, to handle future
changes adding a member.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-04 19:57:41 +00:00
Antoine Tenart 69166f909b man: document the ip macsec offload command
Add a description of the `ip macsec offload` command used to select the
offloading mode on a macsec interface when the underlying device
supports it.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-04 19:57:36 +00:00
Antoine Tenart 791bc7ee48 macsec: add support for changing the offloading mode
MacSEC can now be offloaded to specialized hardware devices. Offloading
is off by default when creating a new MACsec interface, but the mode can
be updated at runtime. This patch adds a new subcommand,
`ip macsec offload`, to allow users to select the offloading mode of a
MACsec interface. It takes the mode to switch to as an argument, which
can for now either be 'off' or 'phy':

  # ip macsec offload macsec0 phy
  # ip macsec offload macsec0 off

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-04 19:57:30 +00:00
Antoine Tenart da6abdba09 macsec: report the offloading mode currently selected
This patch adds support to report the MACsec offloading mode currently
being enabled, which as of now can either be 'off' or 'phy'. This
information is reported through the `ip macsec show` command:

  # ip macsec show
  18: macsec0: protect on validate strict sc off sa off encrypt on send_sci on end_station off scb off replay off
      cipher suite: GCM-AES-128, using ICV length 16
      TXSC: 3e5035b67c860001 on SA 0
          0: PN 1, state on, key 00000000000000000000000000000000
      RXSC: b4969112700f0001, state on
          0: PN 1, state on, key 01000000000000000000000000000000
      offload: phy
  19: macsec1: protect on validate strict sc off sa off encrypt on send_sci on end_station off scb off replay off
      cipher suite: GCM-AES-128, using ICV length 16
      TXSC: 3e5035b67c880001 on SA 0
          1: PN 1, state on, key 00000000000000000000000000000000
      RXSC: b4969112700f0001, state on
          1: PN 1, state on, key 01000000000000000000000000000000
      offload: off

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-04 19:56:41 +00:00
Parav Pandit a5c44b821c devlink: Introduce devlink port flavour virtual
Currently PCI PF and VF devlink devices register their ports as
physical port in non-representors mode.

Introduce a new port flavour as virtual so that virtual devices can
register 'virtual' flavour to make it more clear to users.

An example of one PCI PF and 2 PCI virtual functions, each having
one devlink port.

$ devlink port show
pci/0000:06:00.0/1: type eth netdev ens2f0 flavour physical port 0
pci/0000:06:00.2/1: type eth netdev ens2f2 flavour virtual port 0
pci/0000:06:00.3/1: type eth netdev ens2f3 flavour virtual port 0

Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-04 19:48:11 +00:00
Jiri Pirko 4fe07b8146 devlink: add trap metadata type for flow action cookie
Flow action cookie has been recently added to kernel, print it out.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-04 19:46:29 +00:00
David Ahern b6b8e40bf7 Update kernel headers
Update kernel headers to commit
ef71037047b0 ("Merge branch 'act_ct-software-offload-of-established-flows-fixes'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-04 19:44:21 +00:00
David Ahern b6de0bf7db Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-02-28 22:42:49 +00:00
Stephen Hemminger b5a77cf701 uapi: update bpf.h
Updated upstream

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-28 13:55:38 -08:00
Andrea Claudi 31824e2299 man: rdma-statistic: Add filter description
Add description for filters on rdma statistics show command.
Also add a filter description on the help message of the command.
Additionally, fix some whitespace issue in the man page.

Reported-by: Zhaojuan Guo <zguo@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-28 13:53:00 -08:00
Andrea Claudi 8f1c9d4a3c man: rdma.8: Add missing resource subcommand description
Add resource subcommand in the OBJECT section and a short
description for it.

Reported-by: Zhaojuan Guo <zguo@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-28 13:53:00 -08:00
Xin Long f9d696cf41 xfrm: not try to delete ipcomp states when using deleteall
In kernel space, ipcomp(sub) states used by main states are not
allowed to be deleted by users, they would be freed only when
all main states are destroyed and no one uses them.

In user space, ip xfrm sta deleteall doesn't filter these ipcomp
states out, and it causes errors:

  # ip xfrm state add src 192.168.0.1 dst 192.168.0.2 spi 0x1000 \
      proto comp comp deflate mode tunnel sel src 192.168.0.1 dst \
      192.168.0.2 proto gre
  # ip xfrm sta deleteall
  Failed to send delete-all request
  : Operation not permitted

This patch is to fix it by filtering ipcomp states with a check
xsinfo->id.proto == IPPROTO_IPIP.

Fixes: c7699875be ("Import patch ipxfrm-20040707_2.diff")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-28 13:50:58 -08:00
Andrea Claudi 229bb886a3 man: ip.8: Add missing vrf subcommand description
Add description to the vrf subcommand and a reference to the
dedicated man page.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-28 13:48:23 -08:00
Donald Sharp 320c5c6e09 ip route: Do not imply pref and ttl-propagate are per nexthop
Currently `ip -6 route show` gives us this output:

sharpd@eva ~/i/ip (master)> ip -6 route show
::1 dev lo proto kernel metric 256 pref medium
4:5::6:7 nhid 18 proto static metric 20
        nexthop via fe80::99 dev enp39s0 weight 1
        nexthop via fe80::44 dev enp39s0 weight 1 pref medium

Displaying `pref medium` as the last bit of output implies
that the RTA_PREF is a per nexthop value, when it is infact
a per route piece of data.

Change the output to display RTA_PREF and RTA_TTL_PROPAGATE
before the RTA_MULTIPATH data is shown:

sharpd@eva ~/i/ip (master)> ./ip -6 route show
::1 dev lo proto kernel metric 256 pref medium
4:5::6:7 nhid 18 proto static metric 20 pref medium
        nexthop via fe80::99 dev enp39s0 weight 1
        nexthop via fe80::44 dev enp39s0 weight 1

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-28 13:42:59 -08:00
Andrea Claudi 2c7056ac26 nstat: print useful error messages in abort() cases
When nstat temporary file is corrupted or in some other corner cases,
nstat use abort() to stop its execution. This can puzzle some users,
wondering what is the reason for the crash.

This commit replaces abort() with some meaningful error messages and exit()

Reported-by: Renaud Métrich <rmetrich@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-23 13:52:08 -08:00
Xin Long 83c543af87 erspan: set erspan_ver to 1 by default
Commit 2897636267 ("erspan: add erspan version II support")
breaks the command:

 # ip link add erspan1 type erspan key 1 seq erspan 123 \
    local 10.1.0.2 remote 10.1.0.1

as erspan_ver is set to 0 by default, then IFLA_GRE_ERSPAN_INDEX
won't be set in gre_parse_opt().

  # ip -d link show erspan1
    ...
    erspan remote 10.1.0.1 local 10.1.0.2 ... erspan_index 0 erspan_ver 1
                                              ^^^^^^^^^^^^^^

This patch is to change to set erspan_ver to 1 by default.

Fixes: 2897636267 ("erspan: add erspan version II support")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-23 13:50:17 -08:00
Stephen Hemminger 0a6ea03be4 uapi: update magic.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-11 08:16:42 -08:00
Moshe Shemesh 5023df6a21 devlink: Add health error recovery status monitoring
Add support for devlink health error recovery status monitoring.
Update devlink-monitor man page accordingly.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-02-10 05:29:24 +00:00
Mohit P. Tahiliani 9dced637f8 tc: add support for FQ-PIE packet scheduler
This patch adds support for the FQ-PIE packet Scheduler

Principles:
  - Packets are classified on flows.
  - This is a Stochastic model (as we use a hash, several flows might
                                be hashed to the same slot)
  - Each flow has a PIE managed queue.
  - Flows are linked onto two (Round Robin) lists,
    so that new flows have priority on old ones.
  - For a given flow, packets are not reordered.
  - Drops during enqueue only.
  - ECN capability is off by default.
  - ECN threshold (if ECN is enabled) is at 10% by default.
  - Uses timestamps to calculate queue delay by default.

Usage:
tc qdisc ... fq_pie [ limit PACKETS ] [ flows NUMBER ]
                    [ target TIME ] [ tupdate TIME ]
                    [ alpha NUMBER ] [ beta NUMBER ]
                    [ quantum BYTES ] [ memory_limit BYTES ]
                    [ ecn_prob PERCENTAGE ] [ [no]ecn ]
                    [ [no]bytemode ] [ [no_]dq_rate_estimator ]

defaults:
  limit: 10240 packets, flows: 1024
  target: 15 ms, tupdate: 15 ms (in jiffies)
  alpha: 1/8, beta : 5/4
  quantum: device MTU, memory_limit: 32 Mb
  ecnprob: 10%, ecn: off
  bytemode: off, dq_rate_estimator: off

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: V. Saicharan <vsaicharan1998@gmail.com>
Signed-off-by: Mohit Bhasi <mohitbhasi1998@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-04 03:24:39 -08:00
Peter Junos 39995691b5 ss: fix tests to reflect compact output
This fixes broken tests in commit c4f5862994 ("ss: use compact output
for undetected screen width")

It also escapes stars as grep is used and more bugs could sneak under
the radar with the previous solution.

Signed-off-by: Peter Junos <petoju@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-04 03:21:06 -08:00
Stephen Hemminger 8f9f2b9cdf devlink: fix warning from unchecked write
Warning seen on Ubuntu

devlink.c: In function ‘cmd_dev_flash’:
devlink.c:3071:3: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Wunused-result]
 3071 |   write(pipe_w, &err, sizeof(err));
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fixes: 9b13cddfe2 ("devlink: implement flash status monitoring")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-02 04:20:58 -08:00
Andrea Claudi 5cdeb77cd6 ip link: xstats: fix TX IGMP reports string
This restore the string format we have before jsonification, adding a
missing space between v2 and v3 on TX IGMP reports string.

Fixes: a9bc23a792 ("ip: bridge: add xstats json support")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-29 10:11:35 -08:00
Andrea Claudi 38dd041bfe ip-xfrm: Fix help messages
After commit 8589eb4efd ("treewide: refactor help messages") help
messages for xfrm state and policy are broken, printing many times the
same protocol in UPSPEC section:

$ ip xfrm state help
[...]
UPSPEC := proto { { tcp | tcp | tcp | tcp } [ sport PORT ] [ dport PORT ] |
                  { icmp | icmp | icmp } [ type NUMBER ] [ code NUMBER ] |
                  gre [ key { DOTTED-QUAD | NUMBER } ] | PROTO }

This happens because strxf_proto function is non-reentrant and gets called
multiple times in the same fprintf instruction.

This commit fix the issue avoiding calls to strxf_proto() with a constant
param, just hardcoding strings for protocol names.

Fixes: 8589eb4efd ("treewide: refactor help messages")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-29 10:11:14 -08:00
David Ahern 8e66c8c112 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-29 15:16:54 +00:00
Stephen Hemminger f9ed2db593 uapi: updates to tcp.h, snmp.h and if_bridge.h
Upstream changes

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-29 05:48:19 -08:00
Stephen Hemminger 74da271551 uapi/pkt_sched: upstream changes from fq_pie
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-29 05:46:43 -08:00
Stephen Hemminger aa6d6b223b uapi: update bpf.h and btf.h
Upstream headers from 5.6 pre rc1

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-29 05:45:53 -08:00
Stephen Hemminger d80d22d5fd Merge branch 'master' of git://git.kernel.org/pub/scm/network/iproute2/iproute2-next
Resolved conflict in tc/f_flower.c
2020-01-29 05:44:53 -08:00
Stephen Hemminger d4df55404a v5.5.0 2020-01-27 05:53:09 -08:00
Ron Diskin ff360fe984 devlink: Replace pr_out_bool/uint() wrappers with common print functions
Replace calls for pr_out_bool() and pr_out_uint() with direct calls
to common json_print library function print_bool() and print_uint().

Signed-off-by: Ron Diskin <rondi@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-27 05:43:54 -08:00
Ron Diskin 5a71671a94 devlink: Replace pr_#type_value wrapper functions with common functions
Replace calls for pr_bool/uint/uint64_value with direct calls for the
matching common json_print library function: print_bool(), print_uint()
and print_u64()

Signed-off-by: Ron Diskin <rondi@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-27 05:43:54 -08:00
Ron Diskin 3666eb0eb6 devlink: Replace pr_out_str wrapper function with common function
Replace calls for pr_out_str() and pr_out_str_value() with direct calls to
common json_print library functions.

Signed-off-by: Ron Diskin <rondi@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-27 05:43:54 -08:00
Ron Diskin b20555e402 devlink: Replace json prints by common library functions
Substitute json prints to use json_print.c common library functions,
instead of directly calling jsonw_functions.

Signed-off-by: Ron Diskin <rondi@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-27 05:43:54 -08:00
Ron Diskin 98e48e7dd0 json_print: Add new json object function not as array item
Currently new json object opens (and delete_json_obj closes) the object as
an array, what adds prints for the matching bracket '[' ']' at the
start/end of the object. This patch adds new_json_obj_plain() and the
matching delete_json_obj_plain() to enable opening and closing json object,
not as array and leave it to the using function to decide which type of
object to open/close as the main object.

Signed-off-by: Ron Diskin <rondi@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-27 05:43:54 -08:00
Ron Diskin 31ca29b2be json_print: Introduce print_#type_name_value
Until now print_#type functions supported printing constant names and
unknown (variable) values only.
Add functions to allow printing when the name is also sent to the
function as a variable.

Signed-off-by: Ron Diskin <rondi@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-27 05:43:54 -08:00
Leslie Monis eae5f4b5c8 tc: parse attributes with NLA_F_NESTED flag
The kernel now requires all new nested attributes to set the
NLA_F_NESTED flag. Enable tc {qdisc,class,filter} to parse
attributes that have the NLA_F_NESTED flag set.

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-22 03:45:48 +00:00
Sabrina Dubroca 22aec42679 ip: xfrm: add espintcp encapsulation
While at it, convert xfrm_xfrma_print and xfrm_encap_type_parse to use
the UAPI macros for encap_type as suggested by David Ahern, and add the
UAPI udp.h header (sync'd from ipsec-next to get the TCP_ENCAP_ESPINTCP
definition).

Co-developed-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-22 03:42:01 +00:00
David Ahern 4df5ad933c Update kernel headers and import udp.h
Update kernel headers to commit:
    4f2c17e0f332 ("Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next")

and import udp.h for the next patch.

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-22 03:40:26 +00:00
Roi Dayan 919046d326 tc: flower: fix print with oneline option
This commit fix all location in flower to use _SL_ instead of \n for
newline to allow support for oneline option.

Example before this commit:

filter protocol ip pref 2 flower chain 0 handle 0x1
  indev ens1f0
  dst_mac 11:22:33:44:55:66
  eth_type ipv4
  ip_proto tcp
  src_ip 2.2.2.2
  src_port 99
  dst_port 1-10\  tcp_flags 0x5/5
  ip_flags frag
  ct_state -trk\  ct_zone 4\  ct_mark 255
  ct_label 00000000000000000000000000000000
  skip_hw
  not_in_hw\    action order 1: ct zone 5 pipe
         index 1 ref 1 bind 1 installed 287 sec used 287 sec
        Action statistics:\     Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0\

Example output after this commit:

filter protocol ip pref 2 flower chain 0 handle 0x1 \  indev ens1f0\  dst_mac 11:22:33:44:55:66\  eth_type ipv4\  ip_proto tcp\  src_ip 2.2.2.2\  src_port 99\  dst_port 1-10\  tcp_flags 0x5/5\  ip_flags frag\  ct_state -trk\  ct_zone 4\  ct_mark 255\  ct_label 00000000000000000000000000000000\  skip_hw\  not_in_hw\action order 1: ct zone 5 pipe
         index 1 ref 1 bind 1 installed 346 sec used 346 sec
        Action statistics:\     Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0\

Signed-off-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-21 15:40:21 -08:00
Ethan Sommer 5f78bc3e1d make yacc usage POSIX compatible
config: put YACC in config.mk and use environmental variable if present

ss:
use YACC variable instead of hardcoding bison
place options before source file argument
use -b to specify file prefix instead of output file, as -o isn't POSIX
compatible, this generates ssfilter.tab.c instead of ssfilter.c
replace any references to ssfilter.c with references to ssfilter.tab.c

tc:
use -p flag to set name prefix instead of bison-specific api.prefix
directive
remove unneeded bison-specific directives
use -b instead of -o, replace references to previously generated
emp_ematch.yacc.[ch] with references to newly generated
emp_ematch.tab.[ch]

Signed-off-by: Ethan Sommer <e5ten.arch@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-20 09:43:22 -08:00
Jan Engelhardt 31f45088c9 build: fix build failure with -fno-common
$ make CCOPTS=-fno-common
gcc ... -o ip
ld: rt_names.o (symbol from plugin): in function "rtnl_rtprot_n2a":
(.text+0x0): multiple definition of "numeric"; ip.o (symbol from plugin):(.text+0x0): first defined here

gcc ... -o tipc
ld: ../lib/libutil.a(utils.o):(.bss+0xc): multiple definition of `pretty';
tipc.o:tipc.c:28: first defined here

References: https://bugzilla.opensuse.org/1160244
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-20 09:40:59 -08:00
Stephen Hemminger f4d7ce9bfa ip: use print_nl() to handle one line mode
The helper function print_nl() does the right thing and prints
the newline or backslash.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-20 09:32:51 -08:00
Vladis Dronov 970db267a0 ip: fix link type and vlan oneline output
Move link type printing in print_linkinfo() so multiline output does not
break link options line. Add oneline support for vlan's ingress and egress
qos maps.

Before the fix:

5: veth90.4000@veth90: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 26:9a:05:af:db:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 0 maxmtu 65535
    vlan protocol 802.1Q id 4000 <REORDER_HDR>               the option line is broken ^^^
      ingress-qos-map { 1:2 }
      egress-qos-map { 2:1 } addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

5: veth90.4000@veth90: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\    link/ether 26:9a:05:af:db:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 0 maxmtu 65535 \    vlan protocol 802.1Q id 4000 <REORDER_HDR>
      ingress-qos-map { 1:2 }   <<< a multiline output despite -oneline
      egress-qos-map { 2:1 } addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

After the fix:

5: veth90.4000@veth90: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 26:9a:05:af:db:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 0 maxmtu 65535 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    vlan protocol 802.1Q id 4000 <REORDER_HDR>
      ingress-qos-map { 1:2 }
      egress-qos-map { 2:1 }

5: veth90.4000@veth90: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\    link/ether 26:9a:05:af:db:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 0 maxmtu 65535 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 \    vlan protocol 802.1Q id 4000 <REORDER_HDR> \      ingress-qos-map { 1:2 } \      egress-qos-map { 2:1 }

Fixes: 5c302d518f ("vlan support")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206241
Reported-by: George Shuklin <george.shuklin@gmail.com>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-20 09:28:39 -08:00
David Ahern 46eb08dc2e Merge branch 'tc-ets-qdisc' into next
Petr Machata  says:

====================

A new Qdisc, "ETS", has been accepted into Linux at kernel commit
6bff00170277 ("Merge branch 'ETS-qdisc'"). Add iproute2 support for this
Qdisc.

Patch #1, changes libnetlink to admit NLA_F_NESTED in nested attributes.
Patch #2 then adds ETS support as such.

Examples (taken from the kernel patchset):

- Add a Qdisc with 6 bands, 3 strict and 3 ETS with 45%-30%-25% weights:

    # tc qdisc add dev swp1 root handle 1: \
	ets strict 3 quanta 4500 3000 2500 priomap 0 1 1 1 2 3 4 5
    # tc qdisc sh dev swp1
    qdisc ets 1: root refcnt 2 bands 6 strict 3 quanta 4500 3000 2500 priomap 0 1 1 1 2 3 4 5 5 5 5 5 5 5 5 5

- Tweak quantum of one of the classes of the previous Qdisc:

    # tc class ch dev swp1 classid 1:4 ets quantum 1000
    # tc qdisc sh dev swp1
    qdisc ets 1: root refcnt 2 bands 6 strict 3 quanta 1000 3000 2500 priomap 0 1 1 1 2 3 4 5 5 5 5 5 5 5 5 5
    # tc class ch dev swp1 classid 1:3 ets quantum 1000
    Error: Strict bands do not have a configurable quantum.

- Purely strict Qdisc with 1:1 mapping between priorities and TCs:

    # tc qdisc add dev swp1 root handle 1: \
	ets strict 8 priomap 7 6 5 4 3 2 1 0
    # tc qdisc sh dev swp1
    qdisc ets 1: root refcnt 2 bands 8 strict 8 priomap 7 6 5 4 3 2 1 0 7 7 7 7 7 7 7 7

- Use "bands" to specify number of bands explicitly. Underspecified bands
  are implicitly ETS and their quantum is taken from MTU. The following
  thus gives each band the same weight:

    # tc qdisc add dev swp1 root handle 1: \
	ets bands 8 priomap 7 6 5 4 3 2 1 0
    # tc qdisc sh dev swp1
    qdisc ets 1: root refcnt 2 bands 8 quanta 1514 1514 1514 1514 1514 1514 1514 1514 priomap 7 6 5 4 3 2 1 0 7 7 7 7 7 7 7 7

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-18 21:55:02 +00:00
Petr Machata d2773f1261 tc: Add support for ETS Qdisc
Add a new module to generate and parse options specific to the ETS Qdisc.

Example output:

    bands 8 strict 3 priomap 0 1 2 3 4 5 6 7
qdisc ets 1: root refcnt 2 offloaded bands 8 strict 3 quanta 1514 1514 1514 1514 1514 priomap 0 1 2 3 4 5 6 7 7 7 7 7 7 7 7 7
[
  {
    "kind": "ets",
    "handle": "1:",
    "root": true,
    "refcnt": 2,
    "offloaded": true,
    "options": {
      "bands": 8,
      "strict": 3,
      "quanta": [1514, 1514, 1514, 1514, 1514],
      "priomap": [0, 1, 2, 3, 4, 5, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7]
    }
  }
]

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-18 21:54:12 +00:00
Petr Machata 0da4cfaa5d libnetlink: parse_rtattr_nested should allow NLA_F_NESTED flag
In kernel commit 8cb081746c03 ("netlink: make validation more configurable
for future strictness"), Linux started implicitly flagging nests with
NLA_F_NESTED, unless the nest is created with nla_nest_start_noflag().

The ETS code uses nla_nest_start() where possible, so it does not work with
the current iproute2 code. Have libnetlink catch up by admitting the flag
in the attribute.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-18 21:46:12 +00:00
Ido Schimmel ed81a2a040 ip route: Print "rt_offload" and "rt_trap" indication
The kernel now signals the offload state of a route using the
'RTM_F_OFFLOAD' and 'RTM_F_TRAP' flags. Print these to help users
understand the offload state of each route. The "rt_" prefix is used in
order to distinguish it from the offload state of nexthops.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-18 21:40:20 +00:00
David Ahern 8b802d20e4 Update kernel headers
Update kernel headers to commit
    9aaa29494030 ("Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-18 21:39:15 +00:00
Tuong Lien d5391e186f tipc: fix clang warning in tipc/node.c
When building tipc with clang, the following warning is found:

tipc
    CC       bearer.o
    CC       cmdl.o
    CC       link.o
    CC       media.o
    CC       misc.o
    CC       msg.o
    CC       nametable.o
    CC       node.o
node.c:182:24: warning: field 'key' with variable sized type 'struct tipc_aead_key' not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
                struct tipc_aead_key key;

This commit fixes it by putting the memory area allocated for the user
input key along with the variable-sized 'key' structure in the 'union'
form instead.

Fixes: 24bee3bf97 ("tipc: add new commands to set TIPC AEAD key")
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-06 13:12:48 -08:00
Stephen Hemminger f8bebea915 tc: skbprio: add support for JSON output
Print limit in JSON

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-06 13:12:02 -08:00
Stephen Hemminger 1d6b73be70 tc: prio: fix space in JSON tag
The priomap should not have extra space in the tag.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-06 13:11:41 -08:00
Peter Junos c4f5862994 ss: use compact output for undetected screen width
This change fixes calculation of width in case user pipes the output.

SS output output works correctly when stdout is a terminal. When one
pipes the output, it tries to use 80 or 160 columns. That adds a
line-break if user has terminal width of 100 chars and output is of
the similar width. No width is assumed here.

To reproduce the issue, call
ss | less
and see every other line empty if your screen is between 80 and 160
columns wide.

This second version of the patch fixes screen_width being set to arbitrary
value.

Signed-off-by: Peter Junos <petoju@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-02 18:38:08 +00:00
David Ahern 404f2de114 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-02 17:49:45 +00:00
Andy Roulin 39ac2d2b80 iplink: bond: print lacp actor/partner oper states as strings
The 802.3ad/LACP actor/partner operating states are only printed as
numbers, e.g,

ad_actor_oper_port_state 15

Add an additional output in ip link show that prints a string describing
the individual 3ad bit meanings in the following way:

ad_actor_oper_port_state_str <active,short_timeout,aggregating,in_sync>

JSON output is also supported, the field becomes a json array:

"ad_actor_oper_port_state_str":
	["active","short_timeout","aggregating","in_sync"]

Signed-off-by: Andy Roulin <aroulin@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-02 17:45:32 +00:00
David Ahern a6cf98c23f Update kernel headers
Update kernel headers to commit:
    fe23d63422c8 Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-02 17:26:50 +00:00
Leslie Monis e819d3a03d tc: fq_codel: fix missing statistic in JSON output
Print JSON object even if tc_fq_codel_xstats->class_stats.drop_next
is negative.

Cc: Toke Høiland-Jørgensen <toke@toke.dk>
Fixes: 997f2dc193 ("tc: Add JSON output of fq_codel stats")
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:57:27 -08:00
Leslie Monis 669314e817 tc: tbf: add support for JSON output
Enable proper JSON output for the TBF Qdisc.
Also, fix the style of the statement that's calculating "latency" in
tbf_print_opt().

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:57:27 -08:00
Leslie Monis 85fdef052b tc: sfq: add support for JSON output
Enable proper JSON output for the SFQ Qdisc.
Use the long double format specifier to print the value of
"probability".
Also, fix the indentation in the online output of the contents in the
tc_sfqred_stats structure.

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:57:27 -08:00
Leslie Monis 46d032d002 tc: sfb: add support for JSON output
Enable proper JSON output for the SFB Qdisc.
Make the output for options "rehash" and "db" explicit.
Use the long double format specifier to print probability values.
Use sprint_time() to print time values.
Also, fix the indentation in sfb_print_opt().

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:57:27 -08:00
Leslie Monis 0154d096c5 tc: pie: add support for JSON output
Enable proper JSON output for the PIE Qdisc.
Use sprint_time() to print the value of tc_pie_xstats->delay.
Use the long double format specifier to print tc_pie_xstats->prob.
Also, fix the indentation in the oneline output of statistics and update
the man page to reflect this change.

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:57:27 -08:00
Leslie Monis f6564ed60d tc: hhf: add support for JSON output
Enable proper JSON output for the HHF Qdisc.
Also, use sprint_size() to print size values.

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:57:27 -08:00
Leslie Monis d15e2bfc04 tc: fq: add support for JSON output
Enable proper JSON output for the FQ Qdisc.
Use the "KEY VALUE" format for oneline output of statistics instead of
"VALUE KEY", and remove unnecessary commas from the output.
Use sprint_size() to print size values in fq_print_opt().
Use sprint_time64() to print time values in fq_print_xstats().
Also, update the man page to reflect the changes in the output format.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:57:27 -08:00
Leslie Monis 90a50a6fa2 tc: codel: add support for JSON output
Enable proper JSON output for the CoDel Qdisc.

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:57:27 -08:00
Leslie Monis d3136b1e80 tc: choke: add support for JSON output
Enable proper JSON output for the choke Qdisc.
Also, use the long double format specifier to print the value of
"probability".

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:57:27 -08:00
Leslie Monis d8f673074b tc: cbs: add support for JSON output
Enable proper JSON output for the CBS Qdisc.

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:57:27 -08:00
Stephen Hemminger 2dda733f6d utils: fix indentation
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:53:09 -08:00
Antony Antony 2cf4d7af72 ip: xfrm if_id -ve value is error
if_id is u32, error on -ve values instead of setting to 0

after :
 ip link add ipsec1 type xfrm dev lo if_id -10
 Error: argument "-10" is wrong: if_id value is invalid

before : note xfrm if_id 0
 ip link add ipsec1 type xfrm dev lo if_id -10
 ip -d  link show dev ipsec1
 9: ipsec1@lo: <NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/none 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500
    xfrm if_id 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

Fixes: 286446c1e8 ("ip: support for xfrm interfaces")
Signed-off-by: Antony Antony <antony@phenome.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-25 12:38:13 -08:00
Vivien Didelot 0dcf36db1a iplink: add support for STP xstats
Add support for the BRIDGE_XSTATS_STP xstats, as follow:

    # ip link xstats type bridge_slave dev lan4 stp
    lan4
                        STP BPDU:  RX: 0 TX: 61
                        STP TCN:   RX: 0 TX: 0
                        STP Transitions: Blocked: 2 Forwarding: 1

Or below as JSON:

    # ip -j -p link xstats type bridge_slave dev lan0 stp
    [ {
            "ifname": "lan0",
            "stp": {
                "rx_bpdu": 0,
                "tx_bpdu": 500,
                "rx_tcn": 0,
                "tx_tcn": 0,
                "transition_blk": 0,
                "transition_fwd": 0
            }
        } ]

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-12-17 16:31:39 +00:00
Michal Kubecek 2b8e6995fe ip link: show permanent hardware address
Display permanent hardware address of an interface in output of
"ip link show" and "ip addr show". To reduce noise, permanent address is
only shown if it is different from current one.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-12-17 16:28:02 +00:00
David Ahern 974f889c2d Update kernel headers
Update kernel headers to commit:
    6f6dded1385c ("Merge branch 'WireGuard-CI-and-housekeeping'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-12-17 16:25:26 +00:00
Aya Levin af646bf953 devlink: Fix fmsg nesting in non JSON output
When an object or an array opening follows a name (label), add a new
line and indentation before printing the label. When name (label) is
followed by a value, print both at the same line.

Prior to this patch nesting was not visible in a non JSON output:
JSON:
{
    "Common config": {
        "SQ": {
            "stride size": 64,
            "size": 1024
        },
        "CQ": {
            "stride size": 64,
            "size": 1024
        } },
    "SQs": [ {
            "channel ix": 0,
            "sqn": 10,
            "HW state": 1,
            "stopped": false,
            "cc": 0,
            "pc": 0,
            "CQ": {
                "cqn": 6,
                "HW status": 0
            }
         },{
            "channel ix": 0,
            "sqn": 14,
            "HW state": 1,
            "stopped": false,
            "cc": 0,
            "pc": 0,
            "CQ": {
                "cqn": 10,
                "HW status": 0
            }
         } ]
}

Before this patch:
Common Config: SQ: stride size: 64 size: 1024
CQ: stride size: 64 size: 1024
SQs:
  channel ix: 0 tc: 0 txq ix: 0 sqn: 10 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 6 HW status: 0
  channel ix: 1 tc: 0 txq ix: 1 sqn: 14 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 10 HW status: 0

With this patch:
Common config:
  SQ:
    stride size: 64 size: 1024
    CQ:
      stride size: 64 size: 1024
SQs:
  channel ix: 0 sqn: 10 HW state: 1 stopped: false cc: 0 pc: 0
  CQ:
    cqn: 6 HW status: 0
  channel ix: 1 sqn: 14 HW state: 1 stopped: false cc: 0 pc: 0
  CQ:
    cqn: 10 HW status: 0

Fixes: 7b8baf834d ("devlink: Add devlink health diagnose command")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:52:37 -08:00
Aya Levin 746b66b005 devlink: Add a new time-stamp format for health reporter's dump
Introduce a new attribute representing a new time-stamp format: current
time in ns (to comply with y2038) instead of jiffies. If the new
attribute was received, translate the time-stamp accordingly (ns).

Fixes: 2f1242efe9 ("devlink: Add devlink health show command")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:52:37 -08:00
Aya Levin f678a2d08e devlink: Print health reporter's dump time-stamp in a helper function
Add pr_out_dump_reporter prefix to the helper function's name and
encapsulate the print in it.

Fixes: 2f1242efe9 ("devlink: Add devlink health show command")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:52:37 -08:00
Benjamin Poirier 1a500c78ae bridge: Fix tunnelshow json output
repeats for "vlan tunnelshow" what commit 0f36267485 ("bridge: fix vlan
show formatting") did for "vlan show". This fixes problems in json output.

Note that the resulting json output format of "vlan tunnelshow" is not the
same as the original, introduced in commit 8652eeb3ab ("bridge: vlan:
support for per vlan tunnel info"). Changes similar to the ones done for
"vlan show" in commit 0f36267485 ("bridge: fix vlan show formatting") are
carried over to "vlan tunnelshow".

Fixes: c7c1a1ef51 ("bridge: colorize output and use JSON print library")
Fixes: 0f36267485 ("bridge: fix vlan show formatting")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:49:28 -08:00
Benjamin Poirier 955a20be02 bridge: Deduplicate vlan show functions
print_vlan() and print_vlan_tunnel() are almost identical copies, save for
a missing newline in the latter which leads to broken output of "vlan
tunnelshow" in normal mode.

Fixes: c7c1a1ef51 ("bridge: colorize output and use JSON print library")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:49:28 -08:00
Benjamin Poirier dfa13e2273 bridge: Fix vni printing
Since commit c7c1a1ef51 ("bridge: colorize output and use JSON print
library"), print_range() is used for vid (16bits) and vni. However, the
latter are 32bits so they get truncated. They got truncated even before
that commit though.

Fixes: 8652eeb3ab ("bridge: vlan: support for per vlan tunnel info")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:49:28 -08:00
Benjamin Poirier 1f53ba7297 bridge: Fix BRIDGE_VLAN_TUNNEL attribute sizes
As per the kernel's vlan_tunnel_policy, IFLA_BRIDGE_VLAN_TUNNEL_VID and
IFLA_BRIDGE_VLAN_TUNNEL_FLAGS have type NLA_U16.

Fixes: 8652eeb3ab ("bridge: vlan: support for per vlan tunnel info")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:49:28 -08:00
Benjamin Poirier df1262155c bridge: Fix src_vni argument in man page
"SRC VNI" is only one argument and should appear as such. Moreover, this
argument to the src_vni option is documented under three forms: "SRC_VNI",
"SRC VNI" and "VNI" in different places. Consistenly use the simplest form,
"VNI".

Fixes: c5b176e5ba ("bridge: fdb: add support for src_vni option")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:49:28 -08:00
Benjamin Poirier 43b0b6ec84 bridge: Fix typo in error messages
Fixes: 9eff0e5cc4 ("bridge: Add vlan configuration support")
Fixes: 7abf5de677 ("bridge: vlan: add support to display per-vlan statistics")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:49:28 -08:00
Benjamin Poirier d88a6a98e8 testsuite: Fix line count test
a substring match is not enough, ex: 10 != 1

Fixes: 30383b074d ("tests: Add output testing")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:49:28 -08:00
Benjamin Poirier 15322f46c3 json_print: Remove declaration without implementation
Fixes: 6377572f0a ("ip: ip_print: add new API to print JSON or regular format output")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:49:28 -08:00
Paolo Lungaroni 0486388a87 add support for table name in SRv6 End.DT* behaviors
it allows to specify also the table name in addition to the table number in
SRv6 End.DT* behaviors.

To add an End.DT6 behavior route specifying the table by name:

    $ ip -6 route add 2001:db8::1 encap seg6local action End.DT6 table main dev eth0

The ip route show to print output this route:

    $ ip -6 route show 2001:db8::1
    2001:db8::1  encap seg6local action End.DT6 table main dev eth0 metric 1024 pref medium

The JSON output:
    $ ip -6 -j -p route show 2001:db8::1
    [ {
            "dst": "2001:db8::1",
            "encap": "seg6local",
            "action": "End.DT6",
            "table": "main",
            "dev": "eth0",
            "metric": 1024,
            "flags": [ ],
            "pref": "medium"
        } ]

Signed-off-by: Paolo Lungaroni <paolo.lungaroni@cnit.it>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-12-11 17:22:07 +00:00
Stephen Hemminger 7b0d424abe tc: do not output newline in oneline mode
In oneline mode the line seperator should be \
but several parts of tc aren't doing it right.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-12-11 17:21:10 +00:00
David Ahern f39da545b6 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-12-11 17:13:49 +00:00
Brian Vazquez 9eee92a41a ss: fix end-of-line printing in misc/ss.c
The previous change to ss to show header broke the printing of
end-of-line for the last entry.

Tested:

diff <(./ss.old -nltp) <(misc/ss -nltp)
38c38
< LISTEN   0  128   [::1]:35417  [::]:*  users:(("foo",pid=65254,fd=116))
\ No newline at end of file

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-05 12:19:00 -08:00
Brian Vazquez 908985c670 tc: fix warning in tc/q_pie.c
Warning was:
q_pie.c:202:22: error: implicit conversion from 'unsigned long' to
'double'

Fixes: 492ec9558b ("tc: pie: change maximum integer value of tc_pie_xstats->prob")
Cc: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-05 12:18:54 -08:00
Brian Vazquez cad1b0bc5f tc: fix warning in tc/m_ct.c
Warning was:
m_ct.c:370:13: warning: variable 'nat' is used uninitialized whenever
'if' condition is false

Cc: Paul Blakey <paulb@mellanox.com>
Fixes: c8a494314c ("tc: Introduce tc ct action")
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-04 11:32:53 -08:00
Bjarni Ingi Gislason 9ab56784a2 man: Fix unequal number of .RS and .RE macros
Add missing or excessive ".RE" macros.

  Remove an excessive ".EE" macro.

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-04 11:13:12 -08:00
Gautam Ramakrishnan 920700a425 tc: pie: add dq_rate_estimator option
PIE now uses per packet timestamps to calculate queuing
delay. The average dequeue rate based queue delay
calculation is now made optional. This patch adds the option
to enable or disable the use of Little's law to calculate
queuing delay.

Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-04 10:49:42 -08:00
Stephen Hemminger 42060e8d35 tc_util: break long lines
Try to keep lines less than 100 characters or so.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-04 10:45:47 -08:00
Eric Dumazet 81b365eb50 tc_util: support TCA_STATS_PKT64 attribute
Kernel exports 64bit packet counters for qdisc/class stats in linux-5.5

Tested:

$ tc -s -d qd sh dev eth1 | grep pkt
 Sent 4041158922097 bytes 46393862190 pkt (dropped 0, overlimits 0 requeues 2072)
 Sent 501362903764 bytes 5762621697 pkt (dropped 0, overlimits 0 requeues 247)
 Sent 533282357858 bytes 6128246542 pkt (dropped 0, overlimits 0 requeues 329)
 Sent 515878280709 bytes 5875638916 pkt (dropped 0, overlimits 0 requeues 267)
 Sent 516221011694 bytes 5933395197 pkt (dropped 0, overlimits 0 requeues 258)
 Sent 513175109761 bytes 5898402114 pkt (dropped 0, overlimits 0 requeues 231)
 Sent 480207942964 bytes 5519535407 pkt (dropped 0, overlimits 0 requeues 229)
 Sent 483111196765 bytes 5552917950 pkt (dropped 0, overlimits 0 requeues 240)
 Sent 497920120322 bytes 5723104387 pkt (dropped 0, overlimits 0 requeues 271)
$ tc -s -d cl sh dev eth1 | grep pkt
 Sent 513196316238 bytes 5898645862 pkt (dropped 0, overlimits 0 requeues 231)
 Sent 533304444981 bytes 6128500406 pkt (dropped 0, overlimits 0 requeues 329)
 Sent 480227709687 bytes 5519762597 pkt (dropped 0, overlimits 0 requeues 229)
 Sent 501383660279 bytes 5762860276 pkt (dropped 0, overlimits 0 requeues 247)
 Sent 483131168192 bytes 5553147506 pkt (dropped 0, overlimits 0 requeues 240)
 Sent 515899485505 bytes 5875882649 pkt (dropped 0, overlimits 0 requeues 267)
 Sent 497940747031 bytes 5723341475 pkt (dropped 0, overlimits 0 requeues 271)
 Sent 516242376893 bytes 5933640774 pkt (dropped 0, overlimits 0 requeues 258)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-04 10:43:46 -08:00
Stephen Hemminger 214299be7b uapi: update to magic.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-04 10:42:32 -08:00
Tuong Lien 24bee3bf97 tipc: add new commands to set TIPC AEAD key
Two new commands are added as part of 'tipc node' command:

 $tipc node set key KEY [algname ALGNAME] [nodeid NODEID]
 $tipc node flush key

which enable user to set and remove AEAD keys in kernel TIPC (requires
the kernel option - 'TIPC_CRYPTO').

For the 'set key' command, the given 'nodeid' parameter decides the
mode to be applied to the key, particularly:

- If NODEID is empty, the key is a 'cluster' key which will be used for
all message encryption/decryption from/to the node (i.e. both TX & RX).
The same key will be set in the other nodes.

- If NODEID is own node, the key is used for message encryption (TX)
from the node. Whereas, if NODEID is a peer node, the key is for
message decryption (RX) from that peer node. This is the 'per-node-key'
mode that each nodes in the cluster has its specific (TX) key.

Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-25 23:14:11 +00:00
David Ahern 7438afd2cc Update kernel headers
Update kernel headers to commit:
    c431047c4efe ("enetc: add support Credit Based Shaper(CBS) for hardware offload")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-25 23:13:09 +00:00
Eli Britstein 482fd40adf tc: flower: support masked port destination and source match
Extend destination and source port match to support masks, accepting
both decimal and hexadecimal formats.
Also add missing documentation to synopsis in manpage.

$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 protocol ip parent ffff: prio 1 flower skip_hw \
      ip_proto tcp dst_port 1234/0xff00 action drop

$ tc -s filter show dev eth0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  ip_proto tcp
  dst_port 1234/0xff00
  skip_hw
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 26 sec used 26 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

$ tc -p -j filter show dev eth0 parent ffff:
        "options": {
            "keys": {
                "dst_port": 1234,
                "dst_port_mask": 65280
                ...

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-25 21:37:08 +00:00
Eli Britstein 75fb816d9f tc_util: add functions for big endian masked numbers
Add functions for big endian masked numbers as a pre-step towards masked
port numbers.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-25 21:37:01 +00:00
Eli Britstein b20dcd0b31 tc: flower: add u16 big endian parse option
Add u16 big endian parse option as a pre-step towards TCP/UDP/SCTP
ports usage.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-25 21:36:25 +00:00
Moshe Shemesh 5ccc365740 ip: fix oneline output
Ip tool oneline option should output each record on a single line. While
oneline option is active the variable _SL_ replaces line feeds with the
'\' character. However, at the end of print_linkinfo() the variable _SL_
shouldn't be used, otherwise the whole output is on a single line.

Before this fix:
$ip -o link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode
DEFAULT group default qlen 1000\    link/loopback 00:00:00:00:00:00 brd
00:00:00:00:00:00\2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
qdisc fq_codel state UP mode DEFAULT group default qlen 1000\
link/ether 52:54:00:60:0a:db brd ff:ff:ff:ff:ff:ff\3: eth1:
<BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode
DEFAULT group default qlen 1000\    link/ether 00:50:56:1b:05:cd brd
ff:ff:ff:ff:ff:ff\

After this fix:
$ip -o link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode
DEFAULT group default qlen 1000\    link/loopback 00:00:00:00:00:00 brd
00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state
UP mode DEFAULT group default qlen 1000\    link/ether 52:54:00:60:0a:db
brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state
UP mode DEFAULT group default qlen 1000\    link/ether 00:50:56:1b:05:cd
brd ff:ff:ff:ff:ff:ff

Fixes: 3aa0e51be6 ("ip: add support for alternative name addition/deletion/list")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-25 21:34:13 +00:00
David Ahern 3d9608b923 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-25 21:33:28 +00:00
Stephen Hemminger 74ea2526bf v5.4.0 2019-11-25 08:07:24 -08:00
Stephen Hemminger 1285475fe0 netem: remove redundant README
The README for distribution format was already in netem/
2019-11-25 08:02:43 -08:00
Stephen Hemminger 6fec41d5ef remove no longer useful README for lnstat
Superseded by man page.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-25 08:01:23 -08:00
Jakub Kicinski 668bd8d356 devlink: fix requiring either handle
devlink sb occupancy show requires device or port handle.
It passes both device and port handle bits as required to
dl_argv_parse() so since commit 1896b100af ("devlink: catch
missing strings in dl_args_required") devlink will now
complain that only one is present:

$ devlink sb occupancy show pci/0000:06:00.0/0
BUG: unknown argument required but not found

Drop the bit for the handle which was not found from required.

Reported-by: Shalom Toledo <shalomt@mellanox.com>
Fixes: 1896b100af ("devlink: catch missing strings in dl_args_required")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Tested-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-21 21:38:52 +00:00
David Ahern 536dcd2016 Merge branch 'master' into next
Conflicts:
	include/uapi/linux/devlink.h

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-20 02:31:01 +00:00
Ido Kalir b0a688a542 rdma: Rewrite custom JSON and prints logic to use common API
Instead of doing open-coded solution to generate JSON and prints, let's
reuse existing infrastructure and APIs to do the same as ip/*.

Before this change:
 if (rd->json_output)
     jsonw_uint_field(rd->jw, "sm_lid", sm_lid);
 else
     pr_out("sm_lid %u ", sm_lid);

After this change:
 print_uint(PRINT_ANY, "sm_lid", "sm_lid %u ", sm_lid);

All the print functions are converted to support color but for now the
type of color is COLOR_NONE. This is done as a preparation to addition
of color enable option. Such change will require rewrite of command line
arguments parser which is out-of-scope for this patch.

Signed-off-by: Ido Kalir <idok@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-20 02:27:36 +00:00
Danit Goldberg 738728cc6c ip link: Add support to get SR-IOV VF node GUID and port GUID
Extend iplink to show VF GUIDs (IFLA_VF_IB_NODE_GUID, IFLA_VF_IB_PORT_GUID),
giving the ability for user-space application to print GUID values.
This ability is added to the one of setting new node GUID and port GUID values.

Suitable ip link command:
- ip link show <device>

For example:
- ip link set ib4 vf 0 node_guid 22:44:33:00:33:11:00:33
- ip link set ib4 vf 0 port_guid 10:21:33:12:00:11:22:10
- ip link show ib4
ib4: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN mode DEFAULT group default qlen 256
link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
vf 0     link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff,
spoof checking off, NODE_GUID 22:44:33:00:33:11:00:33, PORT_GUID 10:21:33:12:00:11:22:10, link-state disable, trust off, query_rss off

Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-20 02:25:50 +00:00
Stephen Hemminger a7fa739d12 uapi: devlink.h health timestamp
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-19 11:38:17 -08:00
Eli Britstein 9479ec1ed0 tc: flower: fix output for ip tos and ttl
Fix the output for ip tos and ttl to be numbers in JSON format.

Example:
$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 protocol ip parent ffff: prio 1 flower skip_hw \
      ip_tos 5/0xf action drop

Non JSON format remains the same:
$ tc filter show dev eth0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  ip_tos 5/0xf
  skip_hw
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1

JSON format is changed (partial output):
$ tc -p -j filter show dev eth0 parent ffff:
Before:
        "options": {
            "keys": {
                "ip_tos": "0x5/f",
                ...
After:
        "options": {
            "keys": {
                "ip_tos": 5,
                "ip_tos_mask": 15,
                ...

Fixes: 6ea2c2b1cf ("tc: flower: add support for matching on ip tos and ttl")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-19 11:36:05 -08:00
Eli Britstein bb3ee8b313 tc_util: fix JSON prints for ct-mark and ct-zone
Fix the output of ct-mark and ct-zone (both for matches and actions) to
be different in JSON/non-JSON mode.

Example:
$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 protocol ip parent ffff: prio 1 flower skip_hw \
      ct_zone 5 ct_mark 6/0xf action ct commit zone 7 mark 8/0xf drop

Non JSON format remains the same:
$ tc filter show dev eth0 parent ffff:
$ tc -s filter show dev ens1f0_0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  ct_zone 5
  ct_mark 6/0xf
  skip_hw
  not_in_hw
        action order 1: ct commit mark 8/0xf zone 7 drop
         index 1 ref 1 bind 1 installed 108 sec used 108 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

JSON format is changed (partial output):
$ tc -p -j filter show dev eth0 parent ffff:
Before:
        "options": {
            "keys": {
                "ct_zone": "5",
                "ct_mark": "6/0xf"
                ...
        "actions": [ {
                "order": 1,
                "kind": "ct",
                "action": "commit",
                "mark": "8/0xf",
                "zone": "7",
                ...
After:
        "options": {
            "keys": {
                "ct_zone": 5,
                "ct_mark": 6,
                "ct_mark_mask": 15
                ...
        "actions": [ {
                "order": 1,
                "kind": "ct",
                "action": "commit",
                "mark": 8,
                "mark_mask": 15,
                "zone": 7,
                ...

Fixes: c8a494314c ("tc: Introduce tc ct action")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-19 11:36:05 -08:00
Eli Britstein 99d5ee8368 tc: flower: fix newline prints for ct-mark and ct-zone
Matches of ct-mark and ct-zone were printed all in the same line. Fix
that so each ct match is printed in a separate line.

Example:
$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 protocol ip parent ffff: prio 1 flower skip_hw \
      ct_zone 5 ct_mark 6/0xf action ct commit zone 7 mark 8/0xf drop

Before:
$ tc -s filter show dev eth0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4 ct_zone 5 ct_mark 6/0xf
  skip_hw
  not_in_hw
        action order 1: ct commit mark 8/0xf zone 7 drop
         index 1 ref 1 bind 1 installed 31 sec used 31 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

After:
$ tc -s filter show dev eth0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  ct_zone 5
  ct_mark 6/0xf
  skip_hw
  not_in_hw
        action order 1: ct commit mark 8/0xf zone 7 drop
         index 1 ref 1 bind 1 installed 108 sec used 108 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

Fixes: c8a494314c ("tc: Introduce tc ct action")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-19 11:36:05 -08:00
Eli Britstein 746e6c0fd3 tc_util: add an option to print masked numbers with/without a newline
Add an option to print masked numbers with or without a newline, as a
pre-step towards using a common function.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-19 11:36:05 -08:00
Eli Britstein 04b215015b tc_util: introduce a function to print JSON/non-JSON masked numbers
Introduce a function to print masked number with a different output for
JSON or non-JSON methods, as a pre-step towards printing numbers using
this common function.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-19 11:36:05 -08:00
Roman Mashak cc08619c3c man: tc-ematch.8: documented canid() ematch rule
tc-ematch.8 was missing the description of canid() ematch rule, so document
this.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-17 12:31:04 -08:00
Roman Mashak 5d5c394726 man: tc-ematch.8: update list of filter using extended matches
Extended match rules are currently supported by basic, flow and cgroup
filters, so update the man page.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-17 12:28:01 -08:00
Stephen Hemminger d24f5ae3f2 uapi: SPDX license updates
Upstream changes to SPDX licenses in headers.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-14 09:24:10 -08:00
Hritik Vijay 5883c6eba5 ss: show header for --processes/-p
ss by default shows headers for every column but omits it for --processes
for no apparent reason. This patch adds the "Process" header.

Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-14 09:20:14 -08:00
Guillaume Nault 130f549604 man: remove ppp from list of devices not allowed to change netns
PPP devices can be moved to different network namespaces. The feature
was added by commit 79c441ae505c ("ppp: implement x-netns support")
in Linux 4.3.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-14 09:19:39 -08:00
David Ahern 611d3123be Merge branch 'nsid-cleanup' into next
Guillaume Nault  says:

====================

It's currently hard to review ipnetns. The netns ids are inconsistently
treated as signed or unsigned and most helper functions aren't prepared
to use negative ids.

Netns id attributes can be negative: NETNSA_NSID_NOT_ASSIGNED =3D=3D -1.
So let's consistently treat nsids as signed and also reject negative
values in functions that are supposed to only handle assigned netns
ids.

While there, let's drop the extra blank line generated by some command
line parsing errors (patch 5/5).

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-09 01:34:24 +00:00
Guillaume Nault d0b645a51e ipnetns: remove blank lines printed by invarg() messages
Since invarg() automatically adds a '\n' character, having one in the
error message generates an extra blank line.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-09 01:33:06 +00:00
Guillaume Nault 1c9b69276c ipnetns: don't print unassigned nsid in json export
Don't output the nsid and current-nsid json keys if they're not set.
Otherwise a parser would have to special case the "not-assigned"
string.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-09 01:33:05 +00:00
Guillaume Nault 08ba67db7b ipnetns: harden helper functions wrt. negative netns ids
Negative values are invalid netns ids. Ensure that helper functions
don't accidentally try to process them.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-09 01:33:03 +00:00
Guillaume Nault df6da60bcb ipnetns: fix misleading comment about 'ip monitor nsid'
'ip monitor nsid' doesn't call print_nsid().

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-09 01:33:02 +00:00
Guillaume Nault f19966efee ipnetns: treat NETNSA_NSID and NETNSA_CURRENT_NSID as signed
These attributes are signed (with -1 meaning NETNSA_NSID_NOT_ASSIGNED).
So let's use rta_getattr_s32() and print_int() instead of their
unsigned counterpart to avoid confusion.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-09 01:33:00 +00:00
Jakub Kicinski c3f69bf923 devlink: allow full range of resource sizes
Resource size is a 64 bit attribute at netlink level.
Make the command line argument 64 bit as well.

Fixes: 8cd6440958 ("devlink: Add support for devlink resource abstraction")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-09 00:39:50 +00:00
Jakub Kicinski 1896b100af devlink: catch missing strings in dl_args_required
Currently if dl_args_required doesn't contain a string
for a given option the fact that the option is missing
is silently ignored.

Add a catch-all case and print a generic error.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-09 00:39:44 +00:00
Jakub Kicinski 9a0a2fcbf4 devlink: fix referencing namespace by PID
netns parameter for devlink reload is supposed to take PID
as well as string name. However, the PID parsing has two
bugs:
 - the opts->netns member is unsigned so the < 0
   condition is always false;
 - the parameter list is not rewinded after parsing as
   a name, so parsing as a pid uses the wrong argument.

Fixes: 08e8e1ca3e ("devlink: extend reload command to add support for network namespace change")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-09 00:39:03 +00:00
David Ahern 081140bbc4 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-09 00:38:37 +00:00
Jakub Kicinski 0932814458 devlink: require resource parameters
If devlink resource set parameters are not provided it crashes:
$ devlink resource set netdevsim/netdevsim0
Segmentation fault (core dumped)

This is because even though DL_OPT_RESOURCE_PATH and
DL_OPT_RESOURCE_SIZE are passed as o_required, the validation
table doesn't contain a relevant string.

Fixes: 8cd6440958 ("devlink: Add support for devlink resource abstraction")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-07 20:36:08 -08:00
Vlad Buslov fb2e033add tc: implement support for action flags
Implement setting and printing of action flags with single available flag
value "no_percpu" that translates to kernel UAPI TCA_ACT_FLAGS value
TCA_ACT_FLAGS_NO_PERCPU_STATS. Update man page with information regarding
usage of action flags.

Example usage:

 # tc actions add action gact drop no_percpu
 # sudo tc actions list action gact
 total acts 1

        action order 0: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 0
        no_percpu

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-02 07:44:23 -07:00
David Ahern 17a948c80a Update kernel headers
Update kernel headers to commit:
    c23fcbbc6aa4 ("tc-testing: added tests with cookie for conntrack TC action")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-02 07:43:01 -07:00
Michał Łyszczek eca5123948 libnetlink.c, ss.c: properly handle fread() errors
fread(3) returns size_t data type which is unsigned, thus check
`if (fread(...) < 0)' is always false. To check if fread(3) has
failed, user should check error indicator with ferror(3).

This commit also changes read logic a little bit by being less
forgiving for errors. Previous logic was checking if fread(3)
read *at least* required ammount of data, now code checks if
fread(3) read *exactly* expected ammount of data. This makes
sense because code parses very specific binary file, and reading
even 1 less/more byte than expected, will later corrupt data anyway.

Signed-off-by: Michał Łyszczek <michal.lyszczek@bofc.pl>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-01 09:05:41 -07:00
Vlad Buslov cb83101626 tc: remove duplicated NEXT_ARG_FWD() in parse_ct()
Function parse_ct() manually calls NEXT_ARG_FWD() after
parse_action_control_dflt(). This is redundant because
parse_action_control_dflt() modifies argc and argv itself. Moreover, such
implementation parses out any following actions option. For example, adding
action ct with cookie errors:

$ sudo tc actions add action ct cookie 111111111111
Bad action type 111111111111
Usage: ... gact <ACTION> [RAND] [INDEX]
Where:  ACTION := reclassify | drop | continue | pass | pipe |
                  goto chain <CHAIN_INDEX> | jump <JUMP_COUNT>
        RAND := random <RANDTYPE> <ACTION> <VAL>
        RANDTYPE := netrand | determ
        VAL : = value not exceeding 10000
        JUMP_COUNT := Absolute jump from start of action list
        INDEX := index value used

With fix:

$ sudo tc actions add action ct cookie 111111111111
$ sudo tc actions list action ct
total acts 1

        action order 0: ct zone 0 pipe
         index 1 ref 1 bind 0
        cookie 111111111111

Fixes: c8a494314c ("tc: Introduce tc ct action")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-01 09:04:28 -07:00
Julien Fortin 4f73cd7f0d ip: fix ip route show json output for multipath nexthops
print_rta_multipath doesn't support JSON output:

{
    "dst":"27.0.0.13",
    "protocol":"bgp",
    "metric":20,
    "flags":[],
    "gateway":"169.254.0.1"dev uplink-1 weight 1 ,
    "flags":["onlink"],
    "gateway":"169.254.0.1"dev uplink-2 weight 1 ,
    "flags":["onlink"]
},

since RTA_MULTIPATH has nested objects we should print them
in a json array.

With the path we have the following output:

{
    "flags": [],
    "dst": "36.0.0.13",
    "protocol": "bgp",
    "metric": 20,
    "nexthops": [
        {
            "weight": 1,
            "flags": [
                "onlink"
            ],
            "gateway": "169.254.0.1",
            "dev": "uplink-1"
        },
        {
            "weight": 1,
            "flags": [
                "onlink"
            ],
            "gateway": "169.254.0.1",
            "dev": "uplink-2"
        }
    ]
}

Fixes: 663c3cb231 ("iproute: implement JSON and color output")

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-01 09:03:54 -07:00
Michał Łyszczek 6749801b06 rdma/sys.c: fix possible out-of-bound array access
netns_modes_str[] array has 2 elements, when netns_mode is 2,
condition (2 <= 2) will be true and `mode_str = netns_modes_str[2]'
will be executed, which will result in out-of-bound read.

Signed-off-by: Michał Łyszczek <michal.lyszczek@bofc.pl>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-28 10:33:27 -07:00
David Ahern 7534b3cd8c Merge branch 'alt-names' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-28 07:38:24 -07:00
Jiri Pirko afd67550c2 ip: allow to use alternative names as handle
Extend ll_name_to_index() to get the index of a netdevice using
alternative interface name. Allow alternative long names to pass checks
in couple of ip link/addr commands.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-28 07:35:29 -07:00
Jiri Pirko 3aa0e51be6 ip: add support for alternative name addition/deletion/list
Implement addition/deletion of lists of properties, currently
alternative ifnames. Also extent the ip link show command to list them.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-28 07:35:29 -07:00
Jiri Pirko 20fbe90771 lib/ll_map: cache alternative names
Alternative names are related to the "parent name". That means,
whenever ll_remember_index() is called to add/delete/update and it founds
the "parent name" im object by ifindex, processes related
alternative name im objects too. Put them in a list which holds the
relationship with the parent.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-28 07:35:29 -07:00
David Ahern 9a1e4561f1 Merge branch 'rdma-mr-stats' into next
Leon Romanovsky  says:

====================

This is supplementary part of "ODP information and statistics"
kernel series.
https://lore.kernel.org/linux-rdma/20191016062308.11886-1-leon@kernel.org

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-27 10:28:49 -07:00
Erez Alfasi 5c78ffa0e5 rdma: Document MR statistics
Add document of accessing the MR counters into
the rdma-statistic man pages.

Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-10-27 10:28:38 -07:00
Erez Alfasi 33552ade17 rdma: Add "stat show mr" support
Show MR counters statistics. Filters are also enabled.

Examples:
~$: rdma stat show mr
dev mlx5_0 mrn 8 page_faults 1221 page_invalidations 0
dev mlx5_0 mrn 9 page_faults 1221 page_invalidations 0

~$: rdma stat show mr mrn 8
dev mlx5_0 mrn 8 page_faults 1221 page_invalidations 0

Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-10-27 10:28:30 -07:00
David Ahern c9dc3af42e Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-27 09:53:46 -07:00
Stephen Hemminger 085ab19bc3 don't install examples
No longer relevant
2019-10-23 10:21:06 -07:00
Stephen Hemminger e15011b5e5 remove out of date README
The original old README refers to stuff from the pre 2.6
era including cbz. Just kill it.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-23 10:19:45 -07:00
Michał Łyszczek b7f28e0bd9 ipnetns: do not check netns NAME when -all is specified
When `-all' argument is specified netns runs cmd on all namespaces
and NAME is not used, but netns nevertheless checks if argv[1] is a
valid namespace name ignoring the fact that argv[1] contains cmd
and not NAME. This results in bug where user cannot specify
absolute path to command.

    # ip -all netns exec /usr/bin/whoami
    Invalid netns name "/usr/bin/whoami"

This forces user to have his command in PATH.

Solution is simply to not validate argv[1] when `-all' argument is
specified.

Signed-off-by: Michał Łyszczek <michal.lyszczek@bofc.pl>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-23 09:18:08 -07:00
Stephen Hemminger d49e5c2437 examples: remove diffserv
The diffserv examples here are out of date and incomplete.
Remove them rather than try and fix them.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-23 09:13:55 -07:00
Stephen Hemminger 86c0bf5982 examples: remove gaiconf
The gaiconf script is a workaround for something now handled
in distros as part of libc.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-23 09:12:19 -07:00
Stephen Hemminger 14fd32d3c6 examples: remove out of date cbq stuff
The examples around cbq are out of date and never updated.
There are better ways to achieve same kind of thing with more
modern qdisc.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-22 09:53:26 -07:00
Nicolas Dichtel 6ed2915f9c ip-netns.8: document target-nsid and nsid options of list-id
This is a follow up of the commit eaefb07804 ("ipnetns: enable to dump
nsid conversion table").

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-16 12:18:37 -07:00
Nicolas Dichtel 63ab204e7b ip-netns.8: document the 'auto' keyword of 'ip netns set'
This is a follow up of the commit ebe3ce2fcc ("ipnetns: parse nsid as a
signed integer").

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-16 12:18:37 -07:00
Florent Fourcot 10d39984b7 man: remove "defaut group" sentence on ip link
By default, all devices are listed, not only the default group.

Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: Romain Bellan <romain.bellan@wifirst.fr>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-16 12:18:37 -07:00
Davide Caratti 14cadc707b ss: allow dumping kTLS info
now that INET_DIAG_INFO requests can dump TCP ULP information, extend 'ss'
to allow diagnosing kTLS when it is attached to a TCP socket. While at it,
import kTLS uAPI definitions from the latest net-next tree.

CC: Andrea Claudi <aclaudi@redhat.com>
Co-developed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-14 20:07:21 -07:00
David Ahern 4c23b12865 Update kernel headers and import tls.h
Update kernel headers to commit:
    85a83a8fca7f ("Merge branch 'PTP-driver-refactoring-for-SJA1105-DSA'")

and add tls.h.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-14 20:07:20 -07:00
Nicolas Dichtel eaefb07804 ipnetns: enable to dump nsid conversion table
This patch enables to dump/get nsid from a netns into another netns.

Example:
$ ./test.sh
+ ip netns add foo
+ ip netns add bar
+ touch /var/run/netns/init_net
+ mount --bind /proc/1/ns/net /var/run/netns/init_net
+ ip netns set init_net 11
+ ip netns set foo 12
+ ip netns set bar 13
+ ip netns
init_net (id: 11)
bar (id: 13)
foo (id: 12)
+ ip -n foo netns set init_net 21
+ ip -n foo netns set foo 22
+ ip -n foo netns set bar 23
+ ip -n foo netns
init_net (id: 21)
bar (id: 23)
foo (id: 22)
+ ip -n bar netns set init_net 31
+ ip -n bar netns set foo 32
+ ip -n bar netns set bar 33
+ ip -n bar netns
init_net (id: 31)
bar (id: 33)
foo (id: 32)
+ ip netns list-id target-nsid 12
nsid 21 current-nsid 11 (iproute2 netns name: init_net)
nsid 22 current-nsid 12 (iproute2 netns name: foo)
nsid 23 current-nsid 13 (iproute2 netns name: bar)
+ ip -n foo netns list-id target-nsid 21
nsid 11 current-nsid 21 (iproute2 netns name: init_net)
nsid 12 current-nsid 22 (iproute2 netns name: foo)
nsid 13 current-nsid 23 (iproute2 netns name: bar)
+ ip -n bar netns list-id target-nsid 33 nsid 32
nsid 32 current-nsid 32 (iproute2 netns name: foo)
+ ip -n bar netns list-id target-nsid 31 nsid 32
nsid 12 current-nsid 32 (iproute2 netns name: foo)
+ ip netns list-id nsid 13
nsid 13 (iproute2 netns name: bar)

CC: Petr Oros <poros@redhat.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Tested-by: Petr Oros <poros@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-14 13:04:19 -07:00
Aya Levin 94ff4d6882 devlink: Fix inconsistency between command input and output
In devlink health show command the reporter's name parameter is called
reporter, but in the output the reporter's name is referred to as name

Before this patch:
$ devlink health show pci/0000:04:00.0 reporter tx
pci/0000:04:00.0:
   name tx
     state healthy error 0 recover 0 grace_period 500 auto_recover true

After this patch:
$ devlink health show pci/0000:04:00.0 reporter tx
pci/0000:04:00.0:
   reporter tx
     state healthy error 0 recover 0 grace_period 500 auto_recover true

Reported-by: Jiri Pirko <jiri@mellanox.com>
Fixes: 2f1242efe9 ("devlink: Add devlink health show command")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-08 20:22:13 -07:00
Aya Levin 7dd3d51b91 devlink: Left justification on FMSG output
FMSG output is dynamic, space separator must be on the left hand side of
the value. Otherwise output has redundant left indentation regardless
the hierarchy.

Before the patch:
 Common config: SQ: stride size: 64 size: 1024
 CQ: stride size: 64 size: 1024
 SQs:
   channel ix: 0 tc: 0 txq ix: 0 sqn: 10 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 6 HW status: 0
   channel ix: 1 tc: 0 txq ix: 1 sqn: 14 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 10 HW status: 0
   channel ix: 2 tc: 0 txq ix: 2 sqn: 18 HW state: 1 stopped: false cc: 5 pc: 5 CQ: cqn: 14 HW status: 0
   channel ix: 3 tc: 0 txq ix: 3 sqn: 22 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 18 HW status: 0

With the patch:
Common config: SQ: stride size: 64 size: 1024
CQ: stride size: 64 size: 1024
SQs:
  channel ix: 0 tc: 0 txq ix: 0 sqn: 10 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 6 HW status: 0
  channel ix: 1 tc: 0 txq ix: 1 sqn: 14 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 10 HW status: 0
  channel ix: 2 tc: 0 txq ix: 2 sqn: 18 HW state: 1 stopped: false cc: 5 pc: 5 CQ: cqn: 14 HW status: 0
  channel ix: 3 tc: 0 txq ix: 3 sqn: 22 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 18 HW status: 0

Fixes: 844a61764c ("devlink: Add helper functions for name and value separately")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-08 20:22:13 -07:00
Aya Levin 56b725a442 devlink: Add helper for left justification print
Introduce a helper function which wraps code that adds a left hand side
space separator unless it follows a newline.

Fixes: e3d0f0c0e3 ("devlink: add option to generate JSON output")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-08 20:22:13 -07:00
Andrea Claudi e047ca988f tc: fix segmentation fault on gact action
tc segfaults if gact action is used without action or index:

$ ip link add type dummy
$ tc actions add action pipe index 1
$ tc filter add dev dummy0 parent ffff: protocol ip \
  pref 10 u32 match ip src 127.0.0.2 flowid 1:10 action gact
Segmentation fault

We expect tc to fail gracefully with an error message.

This happens if gact is the last argument of the incomplete
command. In this case the "gact" action is parsed, the macro
NEXT_ARG_FWD() is executed and the next matches() crashes
because of null argv pointer.

To avoid this, simply use NEXT_ARG() instead.

With this change in place:

$ ip link add type dummy
$ tc actions add action pipe index 1
$ tc filter add dev dummy0 parent ffff: protocol ip \
  pref 10 u32 match ip src 127.0.0.2 flowid 1:10 action gact
Command line is not complete. Try option "help"

Fixes: fa49588973 ("tc: Fix binding of gact action by index.")
Reported-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-08 20:18:51 -07:00
Damien Robert 7c503d88d2 man: add reference to `ip route add encap ... src`
The ability to specify the source adresse for 'encap ip' / 'encap ip6'
was added in commit 94a8722f2f but the man
page was not updated.

Also fixes a missing page in ip-route.8.in.

Signed-off-by: Damien Robert <damien.olivier.robert+git@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-08 20:18:15 -07:00
David Ahern 47a4c1533c Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2019-10-07 22:02:36 +00:00
Jiri Pirko 08e8e1ca3e devlink: extend reload command to add support for network namespace change
Extend existing devlink reload command by adding option "netns" by which
user can instruct kernel to reload the devlink instance into specified
network namespace.

Example:

$ ip netns add testns1
$ devlink dev reload netdevsim/netdevsim10 netns testns1

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2019-10-07 22:00:49 +00:00
Jiri Pirko 29993df876 devlink: introduce cmdline option to switch to a different namespace
Similar to ip tool, add an option to devlink to operate under certain
network namespace. Unfortunately, "-n" is already taken, so use "-N"
instead.

Example:

$ devlink -N testns1 dev show

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2019-10-07 21:59:50 +00:00
Leon Romanovsky f93134841e rdma: Relax requirement to have PID for HW objects
RDMA has weak connection between PIDs and HW objects, because
the latter tied to file descriptors for their lifetime management.

The outcome of such connection is that for the following scenario,
the returned PID will be 0 (not-valid):
 1. Create FD and context
 2. Share it with ephemeral child
 3. Create any object and exit that child

This flow was revealed in testing environment and of course real users
are not running such scenario, because it makes no sense at all in RDMA
world.

Let's do two changes in the code to support such workflow anyway:
 1. Remove need to provide PID/kernel name. Code already supports it,
    just need to remove extra validation.
 2. Ball-out in case PID is 0.

Link: https://lore.kernel.org/linux-rdma/20191002123245.18153-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2019-10-07 21:54:30 +00:00
David Ahern 9dcd8788fe Update kernel headers
Update kernel headers to commit:
    940f13821528 ("Merge branch 'dpaa2-eth-misc-cleanup'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2019-10-07 20:43:13 +00:00
Stephen Hemminger 2d0445c67b uapi: update btf from 5.4-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-01 08:55:01 -07:00
Roopa Prabhu 6284236237 ipneigh: neigh get support
This patch adds support to lookup a neigh entry
using recently added support in the kernel using RTM_GETNEIGH

example:
$ip neigh get 10.0.2.4 dev test-dummy0
10.0.2.4 dev test-dummy0 lladdr de:ad:be:ef:13:37 PERMANENT

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Tested-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-01 08:23:43 -07:00
Roopa Prabhu 4ed5ad7bd3 bridge: fdb get support
This patch adds support to lookup a bridge fdb entry
using recently added support in the kernel using RTM_GETNEIGH
(and AF_BRIDGE family).

example:
$bridge fdb get 02:02:00:00:00:03 dev test-dummy0 vlan 1002
02:02:00:00:00:03 dev test-dummy0 vlan 1002 master bridge

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Tested-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-01 08:22:32 -07:00
Julien Fortin 4ecefff3cf ip: fix ip route show json output for multipath nexthops
print_rta_multipath doesn't support JSON output:

{
    "dst":"27.0.0.13",
    "protocol":"bgp",
    "metric":20,
    "flags":[],
    "gateway":"169.254.0.1"dev uplink-1 weight 1 ,
    "flags":["onlink"],
    "gateway":"169.254.0.1"dev uplink-2 weight 1 ,
    "flags":["onlink"]
},

since RTA_MULTIPATH has nested objects we should print them
in a json array.

With the path we have the following output:

{
    "flags": [],
    "dst": "36.0.0.13",
    "protocol": "bgp",
    "metric": 20,
    "nexthops": [
        {
            "weight": 1,
            "flags": [
                "onlink"
            ],
            "gateway": "169.254.0.1",
            "dev": "uplink-1"
        },
        {
            "weight": 1,
            "flags": [
                "onlink"
            ],
            "gateway": "169.254.0.1",
            "dev": "uplink-2"
        }
    ]
}

Fixes: 663c3cb231 ("iproute: implement JSON and color output")

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-01 07:59:47 -07:00
Thomas Haller 0d82ee9939 man: add note to ip-macsec manual about necessary key management
The man page of ip-macsec and the existance of the tool makes it seem like
the user could just configure static keys once, and be done with it. That is
not the case. Some form or key management must be done in user space.

Add a note about that.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-09-26 14:11:27 -07:00
David Ahern 8c2093e5d2 ip vrf: Add json support for show command
Add json support to 'ip vrf sh':
$ ip -j -p vrf ls
[ {
        "name": "mgmt",
        "table": 1001
    } ]

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-24 19:35:41 -07:00
David Ahern 92754430a6 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-24 19:34:34 -07:00
Stephen Hemminger 8d88c37724 uapi: update headers from 5.4-rc
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-09-24 12:38:57 -07:00
Stephen Hemminger 38e9ba9dc9 Merge ../iproute2-next 2019-09-24 12:37:33 -07:00
Stephen Hemminger 18e631bd4b v5.3.0 2019-09-24 12:32:05 -07:00
Joe Stringer e4c4685fd6 bpf: Fix race condition with map pinning
If two processes attempt to invoke bpf_map_attach() at the same time,
then they will both create maps, then the first will successfully pin
the map to the filesystem and the second will not pin the map, but will
continue operating with a reference to its own copy of the map. As a
result, the sharing of the same map will be broken from the two programs
that were concurrently loaded via loaders using this library.

Fix this by adding a retry in the case where the pinning fails because
the map already exists on the filesystem. In that case, re-attempt
opening a fd to the map on the filesystem as it shows that another
program already created and pinned a map at that location.

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-09-24 12:29:38 -07:00
David Ahern a32692ac9c Merge branch 'master' into next
Conflicts:
	devlink/devlink.c

Fixed the conflict by updating the numbering for all new attributes
after the ones in master branch.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-19 07:55:53 -07:00
Jiri Pirko 88fdbf8030 devlink: add reload failed indication
Add indication about previous failed devlink reload.

Example outputs:

$ devlink dev
netdevsim/netdevsim10: reload_failed true
$ devlink dev -j -p
{
    "dev": {
        "netdevsim/netdevsim10": {
            "reload_failed": true
        }
    }
}

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-19 07:51:47 -07:00
Andrea Claudi c0325b0638 bpf: replace snprintf with asprintf when dealing with long buffers
This reduces stack usage, as asprintf allocates memory on the heap.

This indirectly fixes a snprintf truncation warning (from gcc v9.2.1):

bpf.c: In function ‘bpf_get_work_dir’:
bpf.c:784:49: warning: ‘snprintf’ output may be truncated before the last format character [-Wformat-truncation=]
  784 |  snprintf(bpf_wrk_dir, sizeof(bpf_wrk_dir), "%s/", mnt);
      |                                                 ^
bpf.c:784:2: note: ‘snprintf’ output between 2 and 4097 bytes into a destination of size 4096
  784 |  snprintf(bpf_wrk_dir, sizeof(bpf_wrk_dir), "%s/", mnt);
      |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fixes: e42256699c ("bpf: make tc's bpf loader generic and move into lib")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-19 07:49:46 -07:00
Nicolas Dichtel 80d0e62673 link_xfrm: don't force to set phydev
Since linux commit 22d6552f827e ("xfrm interface: fix management of
phydev"), phydev is not mandatory anymore.

Note that it also could be useful before the above commit to not force the
user to put a phydev (the kernel was checking it anyway).
For example, it was useful to not set it in case of x-netns, because the
phydev is not available in the current netns:

Before the patch:
$ ip netns add foo
$ ip link add xfrm1 type xfrm dev eth1 if_id 1
$ ip link set xfrm1 netns foo
$ ip -n foo link set xfrm1 type xfrm dev eth1 if_id 2
Cannot find device "eth1"
$ ip -n foo link set xfrm1 type xfrm if_id 2
must specify physical device

Fixes: 286446c1e8 ("ip: support for xfrm interfaces")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Matt Ellison <matt@arroyo.io>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-09-17 17:26:21 +02:00
Andrea Claudi 6296d51825 man: ss.8: add documentation for drop counter
After commit 6df9c7a06a ("ss: add SK_MEMINFO_DROPS display") ss -m
displays also a drop counter for each socket.

This commit properly document it into the man page.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-09-17 17:25:25 +02:00
Mark Zhang 4e2d9fc4d8 rdma: Check comm string before print in print_comm()
Broken kernels (not-upstream) can provide wrong empty "comm" field.
It causes to segfault while printing in JSON format.

Fixes: 8ecac46a60 ("rdma: Add QP resource tracking information")
Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-09-17 17:17:38 +02:00
Jiri Pirko 9b13cddfe2 devlink: implement flash status monitoring
Listen to status notifications coming from kernel during flashing and
put them on stdout to inform user about the status.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-16 07:49:25 -07:00
Jiri Pirko 853be43f9e devlink: implement flash update status monitoring
Kernel sends notifications about flash update status, so implement these
messages for monitoring.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-16 07:49:24 -07:00
Dirk van der Merwe 850de16f12 devlink: unknown 'fw_load_policy' string validation
The 'fw_load_policy' devlink parameter now supports an unknown value.

Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-15 10:55:16 -07:00
Dirk van der Merwe c240e6748e devlink: add 'reset_dev_on_drv_probe' devlink param
Add support for the new devlink parameter along with string to uint
conversion.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-15 10:52:53 -07:00
David Dai 1157a6fc36 iproute2-next: police: support 64bit rate and peakrate in tc utility
For high speed adapter like Mellanox CX-5 card, it can reach upto
100 Gbits per second bandwidth. Currently htb already supports 64bit rate
in tc utility. However police action rate and peakrate are still limited
to 32bit value (upto 32 Gbits per second). Taking advantage of the 2 new
attributes TCA_POLICE_RATE64 and TCA_POLICE_PEAKRATE64 from kernel,
tc can use them to break the 32bit limit, and still keep the backward
binary compatibility.

Tested-by: David Dai <zdai@linux.vnet.ibm.com>
Signed-off-by: David Dai <zdai@linux.vnet.ibm.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-15 10:39:19 -07:00
David Ahern 3d72f125c3 Update kernel headers
Update kernel headers to commit:
    aa2eaa8c272a ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-15 10:32:58 -07:00
David Ahern 2caa8012e8 nexthop: Add space after blackhole
Add a space after 'blackhole' is missing to properly separate the
protocol when it is given.

Fixes: 63df8e8543 ("Add support for nexthop objects")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-09-04 12:05:43 -07:00
Andrea Claudi 4fb98f0895 devlink: fix segfault on health command
devlink segfaults when using grace_period without reporter

$ devlink health set pci/0000:00:09.0 grace_period 3500
Segmentation fault

devlink is instead supposed to gracefully fail printing a warning
message

$ devlink health set pci/0000:00:09.0 grace_period 3500
Reporter's name is expected.

This happens because DL_OPT_HEALTH_REPORTER_NAME and
DL_OPT_HEALTH_REPORTER_GRACEFUL_PERIOD are both defined as BIT(27).
When dl_opts_put() parse options and grace_period is set, it erroneously
tries to set reporter name to null.

This is fixed simply shifting by 1 bit enumeration starting with
DL_OPT_HEALTH_REPORTER_GRACEFUL_PERIOD.

Fixes: b18d89195b ("devlink: Add devlink health set command")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-09-04 12:01:19 -07:00
Donald Sharp 84b9168328 ip nexthop: Allow flush|list operations to specify a specific protocol
In the case where we have a large number of nexthops from a specific
protocol, allow the flush and list operations to take a protocol
to limit the commands scopes.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-04 07:48:20 -07:00
David Ahern 1a5141715e Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-04 07:48:15 -07:00
Stephen Hemminger 98631f134d uapi: update bpf.h header
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-29 16:20:21 -07:00
David Ahern efaef0be95 Merge branch 'devlink-trap' into next
Ido Schimmel  says:

====================

From: Ido Schimmel <idosch@mellanox.com>

This patchset adds devlink-trap support in iproute2.

Patch #1 increases the number of options devlink can handle.

Patches #2-#3 gradually add support for all devlink-trap commands.

Patch #4 adds a man page for devlink-trap.

See individual commit messages for example usage and output.

Changes in v2:
* Remove report option and monitor command since monitoring is done
  using drop monitor

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-08-18 11:51:38 -07:00
Ido Schimmel a7a56f6f9d devlink: Add man page for devlink-trap
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-08-18 11:50:32 -07:00
Ido Schimmel 4ede9e9d56 devlink: Add devlink trap group set and show commands
These commands are similar to the trap set and show commands, but
operate on a trap group and not individual traps. Example:

# devlink trap group set netdevsim/netdevsim10 group l3_drops action trap
# devlink -jps trap group show netdevsim/netdevsim10 group l3_drops
{
    "trap_group": {
        "netdevsim/netdevsim10": [ {
                "name": "l3_drops",
                "generic": true,
                "stats": {
                    "rx": {
                        "bytes": 0,
                        "packets": 0
                    }
                }
            } ]
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-08-18 11:49:48 -07:00
Ido Schimmel ef12d6dafa devlink: Add devlink trap set and show commands
The trap set command allows the user to set the action of an individual
trap. Example:

# devlink trap set netdevsim/netdevsim10 trap blackhole_route action trap

The trap show command allows the user to get the current status of an
individual trap or a dump of all traps in case one is not specified.
When '-s' is specified the trap's statistics are shown. When '-v' is
specified the metadata types the trap can provide are shown. Example:

# devlink -jvps trap show netdevsim/netdevsim10 trap blackhole_route
{
    "trap": {
        "netdevsim/netdevsim10": [ {
                "name": "blackhole_route",
                "type": "drop",
                "generic": true,
                "action": "trap",
                "group": "l3_drops",
                "metadata": [ "input_port" ],
                "stats": {
                    "rx": {
                        "bytes": 0,
                        "packets": 0
                    }
                }
            } ]
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-08-18 11:49:27 -07:00
Ido Schimmel b83220db37 devlink: Increase number of supported options
Currently, the number of supported options is capped at 32 which is a
problem given we are about to add a few more and go over the limit.

Increase the limit to 64 options.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-08-18 11:48:52 -07:00
David Ahern e3af717a8d Update kernel headers
Update kernel headers to commit:
    d83d508b74c4 ("Merge branch 'stmmac-next'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-08-18 11:48:02 -07:00
David Ahern 7ad06c82e7 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-08-18 11:40:30 -07:00
Donald Sharp dba639a598 ip nexthop: Add space to display properly when showing a group
When displaying a nexthop group made up of other nexthops, the display
line shows this when you have additional data at the end:

id 42 group 43/44/45/46/47/48/49/50/51/52/53/54/55/56/57/58/59/60/61/62/63/64/65/66/67/68/69/70/71/72/73/74proto zebra

Modify code so that it shows:

id 42 group 43/44/45/46/47/48/49/50/51/52/53/54/55/56/57/58/59/60/61/62/63/64/65/66/67/68/69/70/71/72/73/74 proto zebra

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-15 13:21:15 -07:00
Stephen Hemminger 260dc56ae3 lib: fix spelling errors
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-12 18:21:10 -07:00
Stephen Hemminger 69df9bf981 tc: fix spelling errors
Minor spelling errors found by codespell

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-12 18:18:51 -07:00
Stephen Hemminger 42a66ee5f3 uapi: update socket.h
Upstream change to resolve gcc-9 issues.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-12 10:58:49 -07:00
Ido Schimmel 395370035e tc: Fix block-handle support for filter operations
The revert of batchsize accidently reverted more than it should
and broke shared block functionality.  Fix this by restoring the
original functionality.

To reproduce:

	dst_ip 192.0.2.0/24 action drop
Unknown filter "block", hence option "10" is unparsable

Fixes: e991c04d64 ("Revert "tc: Add batchsize feature for filter and actions"")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-12 10:31:24 -07:00
Andrea Claudi a8360dd3f2 ip tunnel: add json output
Add json support on iptunnel and ip6tunnel.
The plain text output format should remain the same.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-08-07 12:00:58 -07:00
Gal Pressman 39307384ce rdma: Add driver QP type string
RDMA resource tracker now tracks driver QPs as well, add driver QP type
string to qp_types_to_str function.

Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-08-07 12:00:01 -07:00
David Ahern 74ddde9b5f Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-08-07 11:59:19 -07:00
Patrick Talbert 2d7cb22240 ss: sctp: Formatting tweak in sctp_show_info for locals
'locals' output does not include a leading space so it runs up against
skmem:() output. Add a leading space to fix it.

Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-06 08:50:11 -07:00
Patrick Talbert 18db049f6f ss: sctp: fix typo for nodelay
nodealy should be nodelay.

Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-06 08:49:53 -07:00
Jiri Pirko efd12cd2d9 devlink: finish queue.h to list.h transition
Loose the "q" from the names and name the structure fields in the same
way rest of the code does. Also, fix list_add arg order which leads
to segfault.

Fixes: 33267017fa ("iproute2: devlink: port from sys/queue.h to list.h")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-05 11:35:23 -07:00
Stephen Hemminger 4dd599fdb8 tc: fflush after each command in batch mode
Restore behaviour of tc batch mode.
Flush stdout after each command.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-02 09:34:55 -07:00
Stephen Hemminger e991c04d64 Revert "tc: Add batchsize feature for filter and actions"
This reverts commit 485d0c6001.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-02 09:34:51 -07:00
Stephen Hemminger bfdda70d59 Revert "tc: fix batch force option"
This reverts commit b133392468.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-02 09:34:46 -07:00
Stephen Hemminger 350bc27cf3 Revert "tc: flush after each command in batch mode"
This reverts commit d66fdfda71.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-02 09:34:42 -07:00
Stephen Hemminger 11120881d9 Revert "tc: Remove pointless assignments in batch()"
This reverts commit 6358bbc381.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-02 09:34:36 -07:00
Yamin Friedman 432b21bec7 rdma: Document adaptive-moderation
Add document of setting the adaptive-moderation for the ib device.

Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-08-02 09:30:56 -07:00
Yamin Friedman 8a56ef325c rdma: Control CQ adaptive moderation (DIM)
In order to set adaptive-moderation for an ib device the command is:
rdma dev set [DEV] adaptive-moderation [on|off]

rdma dev show -d
0: mlx5_0: node_type ca fw 16.25.0319 node_guid 248a:0703:00a5:29d0
sys_image_guid 248a:0703:00a5:29d0 adaptive-moderation on
caps: <BAD_PKEY_CNTR, BAD_QKEY_CNTR, AUTO_PATH_MIG, CHANGE_PHY_PORT,
PORT_ACTIVE_EVENT, SYS_IMAGE_GUID, RC_RNR_NAK_GEN, MEM_WINDOW, XRC,
MEM_MGT_EXTENSIONS, BLOCK_MULTICAST_LOOPBACK, MEM_WINDOW_TYPE_2B,
RAW_IP_CSUM, CROSS_CHANNEL, MANAGED_FLOW_STEERING, SIGNATURE_HANDOVER,
ON_DEMAND_PAGING, SG_GAPS_REG, RAW_SCATTER_FCS, PCI_WRITE_END_PADDING>

rdma resource show cq
dev mlx5_0 cqn 0 cqe 1023 users 4 poll-ctx UNBOUND_WORKQUEUE
adaptive-moderation off comm [ib_core]

Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-08-02 09:30:56 -07:00
Stephen Hemminger 067925e2e1 json_print: drop extra semi-colons
The _PRINT_FUNC() macro expands to a function call.
Putting a semi-colon is unnecessary and causes warnings with -pedantic

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-29 08:45:32 -07:00
Kurt Kanzenbach c875433b14 utils: Fix get_s64() function
get_s64() uses internally strtoll() to parse the value out of a given
string. strtoll() returns a long long. However, the intermediate variable is
long only which might be 32 bit on some systems. So, fix it.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-29 08:44:20 -07:00
Stephen Hemminger ab45d91d6a iplink: document 'change' option to ip link
Add the command alias "change" to man page.
Don't show it on usage, since it is not commonly used.

Reported-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Matteo Croce <mcroce@redhat.com>
2019-07-29 08:43:24 -07:00
Antonio Borneo 36e584ad8a iplink_can: fix format output of clock with flag -details
The command
	ip -details link show can0
prints in the last line the value of the clock frequency attached
to the name of the following value "numtxqueues", e.g.
	clock 49500000numtxqueues 1 numrxqueues 1 gso_max_size
	 65536 gso_max_segs 65535

Add the missing space after the clock value.

Signed-off-by: Antonio Borneo <borneo.antonio@gmail.com>
2019-07-26 15:05:20 -07:00
Sergei Trofimovich 33267017fa iproute2: devlink: port from sys/queue.h to list.h
sys/queue.h does not exist on linux-musl targets and fails build as:

    devlink.c:28:10: fatal error: sys/queue.h: No such file or directory
       28 | #include <sys/queue.h>
          |          ^~~~~~~~~~~~~

The change ports to list.h API and drops dependency of 'sys/queue.h'.
The API maps one-to-one.

Build-tested on linux-musl and linux-glibc.

Bug: https://bugs.gentoo.org/690486
CC: Stephen Hemminger <stephen@networkplumber.org>
CC: netdev@vger.kernel.org
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-26 15:05:20 -07:00
Stephen Hemminger b89d6202c9 uapi: update kernel headers from 5.3-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-22 09:45:09 -07:00
Mark Zhang ca084842da rdma: Document counter statistic
Add document of accessing the QP counter, including bind/unbind a QP
to a counter manually or automatically, and dump counter statistics.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-19 10:51:13 -07:00
Mark Zhang a7137e517f rdma: Add default counter show support
Show default counter statistics, which are same through the sysfs
interface: /sys/class/infiniband/<dev>/ports/<port>/hw_counters/

Example:
$ rdma stat show link mlx5_2/1
link mlx5_2/1 rx_write_requests 8 rx_read_requests 4 rx_atomic_requests 0
out_of_buffer 0 out_of_sequence 0 duplicate_request 0 rnr_nak_retry_err 0
packet_seq_err 0 implied_nak_seq_err 0 local_ack_timeout_err 0
resp_local_length_error 0 resp_cqe_error 0 req_cqe_error 0
req_remote_invalid_request 0 req_remote_access_errors 0
resp_remote_access_errors 0 resp_cqe_flush_error 0 req_cqe_flush_error 0
rp_cnp_ignored 0 rp_cnp_handled 0 np_ecn_marked_roce_packets 0
np_cnp_sent 0 rx_icrc_encapsulated 0

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-19 10:51:13 -07:00
Mark Zhang a6d0773ebe rdma: Add stat manual mode support
In manual mode a QP can be manually bound to a counter. If the counter
id(cntn) is not specified that kernel will allocate one. After a
successful bind, the cntn can be seen through "rdma statistic qp show".
And in unbind if lqpn is not specified then all QPs on this counter will
be unbound.
The manual and auto mode are mutual-exclusive.

Examples:
$ rdma statistic qp bind link mlx5_2/1 lqpn 178
$ rdma statistic qp bind link mlx5_2/1 lqpn 178 cntn 4
$ rdma statistic qp unbind link mlx5_2/1 cntn 4
$ rdma statistic qp unbind link mlx5_2/1 cntn 4 lqpn 178

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-19 10:51:13 -07:00
Mark Zhang cbe10b4e44 rdma: Make get_port_from_argv() returns valid port in strict port mode
When strict_port is set, make get_port_from_argv() returns failure if
no valid port is specified.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-19 10:51:13 -07:00
Mark Zhang 887fc739eb rdma: Add rdma statistic counter per-port auto mode support
With per-QP statistic counter support, a user is allowed to monitor
specific QPs categories, which are bound to/unbound from counters
dynamically allocated/deallocated.

In per-port "auto" mode, QPs are bound to counters automatically
according to common criteria. For example a per "type"(qp type)
scheme, where in each process all QPs have same qp type are bind
automatically to a single counter.
Currently only "type" (qp type) is supported. Examples:

$ rdma statistic qp set link mlx5_2/1 auto type on
$ rdma statistic qp set link mlx5_2/1 auto off

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-19 10:51:13 -07:00
Mark Zhang 1b2ca7ada7 rdma: Add get per-port counter mode support
Add an interface to show which mode is active. Two modes are supported:
- "auto": In this mode all QPs belong to one category are bind automatically
  to a single counter set. Currently only "qp type" is supported;
- "manual": In this mode QPs are bound to a counter manually.

Examples:
$ rdma statistic qp mode
0/1: mlx5_0/1: qp auto off
1/1: mlx5_1/1: qp auto off
2/1: mlx5_2/1: qp auto type on
3/1: mlx5_3/1: qp auto off

$ rdma statistic qp mode link mlx5_0
0/1: mlx5_0/1: qp auto off

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-19 10:51:13 -07:00
Mark Zhang 5937552b42 rdma: Add "stat qp show" support
This patch presents link, id, task name, lqpn, as well as all sub
counters of a QP counter.
A QP counter is a dynamically allocated statistic counter that is
bound with one or more QPs. It has several sub-counters, each is
used for a different purpose.

Examples:
$ rdma stat qp show
link mlx5_2/1 cntn 5 pid 31609 comm client.1 rx_write_requests 0
rx_read_requests 0 rx_atomic_requests 0 out_of_buffer 0 out_of_sequence 0
duplicate_request 0 rnr_nak_retry_err 0 packet_seq_err 0
implied_nak_seq_err 0 local_ack_timeout_err 0 resp_local_length_error 0
resp_cqe_error 0 req_cqe_error 0 req_remote_invalid_request 0
req_remote_access_errors 0 resp_remote_access_errors 0
resp_cqe_flush_error 0 req_cqe_flush_error 0
    LQPN: <178>
$ rdma stat show link rocep1s0f5/1
link rocep1s0f5/1 rx_write_requests 0 rx_read_requests 0 rx_atomic_requests 0 out_of_buffer 0 duplicate_request 0
rnr_nak_retry_err 0 packet_seq_err 0 implied_nak_seq_err 0 local_ack_timeout_err 0 resp_local_length_error 0 resp_cqe_error 0
req_cqe_error 0 req_remote_invalid_request 0 req_remote_access_errors 0 resp_remote_access_errors 0 resp_cqe_flush_error 0
req_cqe_flush_error 0 rp_cnp_ignored 0 rp_cnp_handled 0 np_ecn_marked_roce_packets 0 np_cnp_sent 0
$ rdma stat show link rocep1s0f5/1 -p
link rocep1s0f5/1
    rx_write_requests 0
    rx_read_requests 0
    rx_atomic_requests 0
    out_of_buffer 0
    duplicate_request 0
    rnr_nak_retry_err 0
    packet_seq_err 0
    implied_nak_seq_err 0
    local_ack_timeout_err 0
    resp_local_length_error 0
    resp_cqe_error 0
    req_cqe_error 0
    req_remote_invalid_request 0
    req_remote_access_errors 0
    resp_remote_access_errors 0
    resp_cqe_flush_error 0
    req_cqe_flush_error 0
    rp_cnp_ignored 0
    rp_cnp_handled 0
    np_ecn_marked_roce_packets 0
    np_cnp_sent 0

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-19 10:51:13 -07:00
Stephen Hemminger 51a8f9f8fb uapi: fix bpf comment typo
From upstream.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-19 10:49:36 -07:00
Ivan Delalande ed54f76484 json: fix backslash escape typo in jsonw_puts
Fixes: fcc16c22 ("provide common json output formatter")
Signed-off-by: Ivan Delalande <colona@arista.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-19 10:48:38 -07:00
Vedang Patel a794d05237 tc: taprio: Update documentation
Add documentation for the latest options, flags and txtime-delay, to the
taprio manpage.

This also adds an example to run tc in txtime offload mode.

Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 15:47:07 -07:00
Vedang Patel 1738a16de9 tc: etf: Add documentation for skip_sock_check.
Document the newly added option (skip_sock_check) on the etf man-page.

Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 15:47:02 -07:00
Vedang Patel a5e6ee3b34 taprio: add support for setting txtime_delay.
This adds support for setting the txtime_delay parameter which is useful
for the txtime offload mode of taprio.

Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 15:46:36 -07:00
Vinicius Costa Gomes ee000bf217 taprio: Add support for setting flags
This allows a new parameter, flags, to be passed to taprio. Currently, it
only supports enabling the txtime-assist mode. But, we plan to add
different modes for taprio (e.g. hardware offloading) and this parameter
will be useful in enabling those modes.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 15:46:31 -07:00
Vedang Patel d9114263d0 etf: Add skip_sock_check
ETF Qdisc currently checks for a socket with SO_TXTIME socket option. If
either is not present, the packet is dropped. In the future commits, we
want other Qdiscs to add packet with launchtime to the ETF Qdisc. Also,
there are some packets (e.g. ICMP packets) which may not have a socket
associated with them.  So, add an option to skip this check.

Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 15:44:21 -07:00
David Ahern 86545eaffb Merge branch 'tc-conntrack' into next
Paul Blakey  says:

====================

This patch series add connection tracking capabilities in tc.
It does so via a new tc action, called act_ct, and new tc flower classifier matching.
Act ct and relevant flower matches, are still under review in net-next mailing list.

Usage is as follows:
$ tc qdisc add dev ens1f0_0 ingress
$ tc qdisc add dev ens1f0_1 ingress

$ tc filter add dev ens1f0_0 ingress \
  prio 1 chain 0 proto ip \
  flower ip_proto tcp ct_state -trk \
  action ct zone 2 pipe \
  action goto chain 2
$ tc filter add dev ens1f0_0 ingress \
  prio 1 chain 2 proto ip \
  flower ct_state +trk+new \
  action ct zone 2 commit mark 0xbb nat src addr 5.5.5.7 pipe \
  action mirred egress redirect dev ens1f0_1
$ tc filter add dev ens1f0_0 ingress \
  prio 1 chain 2 proto ip \
  flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \
  action ct nat pipe \
  action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_1 ingress \
  prio 1 chain 0 proto ip \
  flower ip_proto tcp ct_state -trk \
  action ct zone 2 pipe \
  action goto chain 1
$ tc filter add dev ens1f0_1 ingress \
  prio 1 chain 1 proto ip \
  flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \
  action ct nat pipe \
  action mirred egress redirect dev ens1f0_0

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 15:42:13 -07:00
Paul Blakey 2fffb1c030 tc: flower: Add matching on conntrack info
Matches on conntrack state, zone, mark, and label.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 15:41:30 -07:00
Paul Blakey c8a494314c tc: Introduce tc ct action
New tc action to send packets to conntrack module, commit
them, and set a zone, labels, mark, and nat on the connection.

It can also clear the packet's conntrack state by using clear.

Usage:
   ct clear
   ct commit [force] [zone] [mark] [label] [nat]
   ct [nat] [zone]

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 15:41:02 -07:00
David Ahern f47081beff Import tc_act/tc_ct.h uapi file
Import include/uapi/linux/tc_act/tc_ct.h header from commit of last
kernel headers sync.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 15:40:07 -07:00
Paul Blakey 18aa9f5583 tc: add NLA_F_NESTED flag to all actions options nested block
Strict netlink validation now requires this flag on all nested
attributes, add it for action options.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 15:38:09 -07:00
Andrea Claudi ee713339d3 tunnel: factorize printout of GRE key and flags
print_tunnel() functions in ip6tunnel.c and iptunnel.c contains
the same code to print out GRE key and flags

This commit factorize the code in a helper function in tunnel.c

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 10:19:47 -07:00
Andrea Claudi d035cc1b4e ip tunnel: warn when changing IPv6 tunnel without tunnel name
Tunnel change fails if a tunnel name is not specified while using
'ip -6 tunnel change'. However, no warning message is printed and
no error code is returned.

$ ip -6 tunnel add ip6tnl1 mode ip6gre local fd::1 remote fd::2 tos inherit ttl 127 encaplimit none dev dummy0
$ ip -6 tunnel change dev dummy0 local 2001🔢:1 remote 2001🔢:2
$ ip -6 tunnel show ip6tnl1
ip6tnl1: gre/ipv6 remote fd::2 local fd::1 dev dummy0 encaplimit none hoplimit 127 tclass inherit flowlabel 0x00000 (flowinfo 0x00000000)

This commit checks if tunnel interface name is equal to an empty
string: in this case, it prints a warning message to the user.
It intentionally avoids to return an error to not break existing
script setup.

This is the output after this commit:
$ ip -6 tunnel add ip6tnl1 mode ip6gre local fd::1 remote fd::2 tos inherit ttl 127 encaplimit none dev dummy0
$ ip -6 tunnel change dev dummy0 local 2001🔢:1 remote 2001🔢:2
Tunnel interface name not specified
$ ip -6 tunnel show ip6tnl1
ip6tnl1: gre/ipv6 remote fd::2 local fd::1 dev dummy0 encaplimit none hoplimit 127 tclass inherit flowlabel 0x00000 (flowinfo 0x00000000)

Reviewed-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-16 12:26:21 -07:00
Andrea Claudi ad04dbc5b4 Revert "ip6tunnel: fix 'ip -6 {show|change} dev <name>' cmds"
This reverts commit ba126dcad2.
It breaks tunnel creation when using 'dev' parameter:

$ ip link add type dummy
$ ip -6 tunnel add ip6tnl1 mode ip6ip6 remote 2001:db8:ffff💯:2 local 2001:db8:ffff💯:1 hoplimit 1 tclass 0x0 dev dummy0
add tunnel "ip6tnl0" failed: File exists

dev parameter must be used to specify the device to which
the tunnel is binded, and not the tunnel itself.

Reported-by: Jianwen Ji <jiji@redhat.com>
Reviewed-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-16 12:25:05 -07:00
Stephen Hemminger 78d3832335 uapi: rdma netlink.h update
From upstream 5.3-rc

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-16 11:58:44 -07:00
Stephen Hemminger 03dafe13f4 uapi: update uapi/magic.h
From upstream

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-16 11:56:58 -07:00
Aya Levin f359942a25 devlink: Remove enclosing array brackets binary print with json format
Keep pr_out_binary_value function only for printing. Inner relations
like array grouping should be done outside the function.

Fixes: 844a61764c ("devlink: Add helper functions for name and value separately")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-15 13:50:55 -07:00
Aya Levin 1d05cca2fd devlink: Fix binary values print
Fix function pr_out_binary_value() to start printing the binary buffer
from offset 0 instead of offset 1. Remove redundant new line at the
beginning of the output

Example:
With patch:
 mlx5e_txqsq:
   05 00 00 00 05 00 00 00 01 00 00 00 00 00 00 00
   00 00 00 00 00 00 00 00 8e 6e 3a 13 07 00 00 00
   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   c0
Without patch
  mlx5e_txqsq:

  00 00 00 05 00 00 00 01 00 00 00 00 00 00 00 00
  00 00 00 00 00 00 00 8e 6e 3a 13 07 00 00 00 00
  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c0

Fixes: 844a61764c ("devlink: Add helper functions for name and value separately")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-15 13:50:55 -07:00
Aya Levin b4d97ef57f devlink: Change devlink health dump show command to dumpit
Although devlink health dump show command is given per reporter, it
returns large amounts of data. Trying to use the doit cb results in
OUT-OF-BUFFER error. This complementary patch raises the DUMP flag in
order to invoke the dumpit cb. We're safe as no existing drivers
implement the dump health reporter option yet.

Fixes: 041e6e651a ("devlink: Add devlink health dump show command")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-15 13:50:55 -07:00
Matteo Croce 1f420318bd utils: don't match empty strings as prefixes
iproute has an utility function which checks if a string is a prefix for
another one, to allow use of abbreviated commands, e.g. 'addr' or 'a'
instead of 'address'.

This routine unfortunately considers an empty string as prefix
of any pattern, leading to undefined behaviour when an empty
argument is passed to ip:

    # ip ''
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever

    # tc ''
    qdisc noqueue 0: dev lo root refcnt 2

    # ip address add 192.0.2.0/24 '' 198.51.100.1 dev dummy0
    # ip addr show dev dummy0
    6: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether 02:9d:5e:e9:3f:c0 brd ff:ff:ff:ff:ff:ff
        inet 192.0.2.0/24 brd 198.51.100.1 scope global dummy0
           valid_lft forever preferred_lft forever

Rewrite matches() so it takes care of an empty input, and doesn't
scan the input strings three times: the actual implementation
does 2 strlen and a memcpy to accomplish the same task.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-15 13:48:48 -07:00
Andrea Claudi 6bc13e4a20 tc: util: constrain percentage in 0-100 interval
parse_percent() currently allows to specify negative percentages
or value above 100%. However this does not seems to make sense,
as the function is used for probabilities or bandiwidth rates.

Moreover, using negative values leads to erroneous results
(using Bernoulli loss model as example):

$ ip link add test type dummy
$ ip link set test up
$ tc qdisc add dev test root netem loss gemodel -10% limit 10
$ tc qdisc show dev test
qdisc netem 800c: root refcnt 2 limit 10 loss gemodel p 90% r 10% 1-h 100% 1-k 0%

Using values above 100% we have instead:

$ ip link add test type dummy
$ ip link set test up
$ tc qdisc add dev test root netem loss gemodel 140% limit 10
$ tc qdisc show dev test
qdisc netem 800f: root refcnt 2 limit 10 loss gemodel p 40% r 60% 1-h 100% 1-k 0%

This commit changes parse_percent() with a check to ensure
percentage values stay between 1.0 and 0.0.
parse_percent_rate() function, which already employs a similar
check, is adjusted accordingly.

With this check in place, we have:

$ ip link add test type dummy
$ ip link set test up
$ tc qdisc add dev test root netem loss gemodel -10% limit 10
Illegal "loss gemodel p"

Fixes: 927e3cfb52 ("tc: B.W limits can now be specified in %.")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-15 13:45:59 -07:00
Stephen Hemminger fda6f26e9b uapi: fix bpf.h link
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-11 15:36:29 -07:00
Stephen Hemminger d5ddb441a5 tc: print all error messages to stderr
Many tc modules were printing error messages to stdout.
This is problematic if using JSON or other output formats.
Change all these places to use fprintf(stderr, ...) instead.

Also, remove unnecessary initialization and places
where else is used after error return.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-11 15:35:07 -07:00
David Ahern 1f250b6c53 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-10 14:41:13 -07:00
David Ahern 25c567bdcd Merge branch 'tc-mpls-action' into next
John Hurley  says:

====================

Recent kernel additions to TC allows the manipulation of MPLS headers as
filter actions.

The following patchset creates an iproute2 interface to the new actions
and includes documentation on how to use it.

v1->v2:
- change error from print_string() to fprintf(strerr,) (Stephen Hemminger)
- split long line in explain() message (David Ahern)
- use _SL_ instead of /n in print message (David Ahern)

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-10 14:07:42 -07:00
John Hurley 3b810b3b9a man: update man pages for TC MPLS actions
Add a man page describing the newly added TC mpls manipulation actions.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-10 14:06:36 -07:00
John Hurley fb57b0920f tc: add mpls actions
Create a new action type for TC that allows the pushing, popping, and
modifying of MPLS headers.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-10 14:06:32 -07:00
John Hurley 11d7087a4e lib: add mpls_uc and mpls_mc as link layer protocol names
Update the llproto_names array to allow users to reference the mpls
protocol ids with the names 'mpls_uc' for unicast MPLS and 'mpls_mc' for
multicast.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-10 14:06:28 -07:00
David Ahern 1827694858 Import tc_mpls.h uapi header
Import tc_mpls.h uapi header from kernel headers at commit:
        1ff2f0fa450e ("net/mlx5e: Return in default case statement in tx_post_resync_params")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-10 14:05:19 -07:00
Parav Pandit 056399399a devlink: Introduce PCI PF and VF port flavour and attribute
Introduce PCI PF and VF port flavour and port attributes such as PF
number and VF number.

$ devlink port show
pci/0000:05:00.0/0: type eth netdev eth0 flavour pcipf pfnum 0
pci/0000:05:00.0/1: type eth netdev eth1 flavour pcivf pfnum 0 vfnum 0
pci/0000:05:00.0/2: type eth netdev eth2 flavour pcivf pfnum 0 vfnum 1

Signed-off-by: Parav Pandit <parav@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-10 13:57:19 -07:00
Vincent Bernat eb105e0d94 ip: bond: add peer notification delay support
Ability to tweak the delay between gratuitous ND/ARP packets has been
added in kernel commit 07a4ddec3ce9 ("bonding: add an option to
specify a delay between peer notifications"), through
IFLA_BOND_PEER_NOTIF_DELAY attribute. Add support to set and show this
value.

Example:

    $ ip -d link set bond0 type bond peer_notify_delay 1000
    $ ip -d link l dev bond0
    2: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue
    state UP mode DEFAULT group default qlen 1000
        link/ether 50:54:33:00:00:01 brd ff:ff:ff:ff:ff:ff
        bond mode active-backup active_slave eth0 miimon 100 updelay 0
    downdelay 0 peer_notify_delay 1000 use_carrier 1 arp_interval 0
    arp_validate none arp_all_targets any primary eth0
    primary_reselect always fail_over_mac active xmit_hash_policy
    layer2 resend_igmp 1 num_grat_arp 5 all_slaves_active 0 min_links
    0 lp_interval 1 packets_per_slave 1 lacp_rate slow ad_select
    stable tlb_dynamic_lb 1 addrgenmode eu

Signed-off-by: Vincent Bernat <vincent@bernat.ch>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-10 13:54:09 -07:00
David Ahern 01db6c4174 Update kernel headers
Update kernel headers to commit:
    1ff2f0fa450e ("net/mlx5e: Return in default case statement in tx_post_resync_params")

import include/uapi/linux/const.h per new dependency in
include/uapi/linux/pkt_cls.h.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-10 13:52:48 -07:00
Roman Mashak 26a49de4db tc: document 'mask' parameter in skbedit man page
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-09 17:31:16 -07:00
Roman Mashak 82f3df2028 tc: added mask parameter in skbedit action
Add 32-bit missing mask attribute in iproute2/tc, which has been long
supported by the kernel side.

v2: print value in hex with print_hex() as suggested by Stephen Hemminger.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-09 17:31:16 -07:00
Andrea Claudi 89ce8012d7 ip-route: fix json formatting for metrics
Setting metrics for routes currently lead to non-parsable
json output. For example:

$ ip link add type dummy
$ ip route add 192.168.2.0 dev dummy0 metric 100 mtu 1000 rto_min 3
$ ip -j route | jq
parse error: ':' not as part of an object at line 1, column 319

Fixing this opening a json object in the metrics array and using
print_string() instead of fprintf().

This is the output for the above commands applying this patch:

$ ip -j route | jq
[
  {
    "dst": "192.168.2.0",
    "dev": "dummy0",
    "scope": "link",
    "metric": 100,
    "flags": [],
    "metrics": [
      {
        "mtu": 1000,
        "rto_min": 3
      }
    ]
  }
]

Fixes: 663c3cb231 ("iproute: implement JSON and color output")
Fixes: 968272e791 ("iproute: refactor metrics print")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reported-by: Frank Hofmann <fhofmann@cloudflare.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-09 17:30:06 -07:00
Parav Pandit 2eb23f3e7a devlink: Show devlink port number
Show devlink port number whenever kernel reports that attribute.

An example output for a physical port.
$ devlink port show
pci/0000:06:00.1/65535: type eth netdev eth1_p1 flavour physical port 1

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-09 14:56:14 -07:00
David Ahern a257456f96 ss: Change resolve_services to numeric
Commit ca697cee4c ("ip: add a new parameter -Numeric") changed
!resolve_services to numeric in ss.c.

A commit in master:
  d791e75d74 ("ss: in --numeric mode, print raw numbers for data rates")
added another reference to !resolve_services. Convert it to numeric.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-09 14:54:34 -07:00
David Ahern 830ac9abe6 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-09 14:26:44 -07:00
Stephen Hemminger af2583437e v5.2.0 2019-07-08 11:09:59 -07:00
Andrea Claudi 90f0b587d8 tc: netem: fix r parameter in Bernoulli loss model
As the man page for tc netem states:

    To use the Bernoulli model, the only needed parameter is p while the
    others will be set to the default values r=1-p, 1-h=1 and 1-k=0.

However r parameter is erroneusly set to 1, and not to 1-p.
Fix this using the same approach of the 4-state loss model.

Fixes: 3c7950af59 ("netem: add support for 4 state and GE loss model")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-08 08:17:22 -07:00
Tomasz Torcz d791e75d74 ss: in --numeric mode, print raw numbers for data rates
ss by default shows data rates in human-readable form - as Mbps/Gbps etc.
 Enhance --numeric mode to show raw values in bps, without conversion.

  Signed-of-by: Tomasz Torcz <tomasz.torcz@nordea.com>

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-08 08:16:23 -07:00
Andrea Claudi c95e17dcba man: tc-netem.8: fix URL for netem page
URL for netem page on sources section points to a no more existent
resource. Fix this using the correct URL.

Fixes: cd72dcf13c ("netem: add man-page")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-02 17:44:51 -07:00
Denis Kirjanov 0f48f9f46a ipaddress: correctly print a VF hw address in the IPoIB case
Current code assumes that we print ethernet mac and
that doesn't work in the IPoIB case with SRIOV-enabled hardware

Before:
11: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 256
        link/infiniband
80:00:00:66:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a4:3e:7c brd
00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
        vf 0 MAC 14:80:00:00:66:fe, spoof checking off, link-state
disable,
    trust off, query_rss off
    ...

After:
11: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 256
        link/infiniband
80:00:00:66:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a4:3e:7c brd
00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
        vf 0     link/infiniband
80:00:00:66:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a4:3e:7c brd
00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff, spoof
checking off, link-state disable, trust off, query_rss off

v1->v2: updated kernel headers to uapi commit
v2->v3: fixed alignment
v3->v4: aligned print statements as used through the source

Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
[ committer note: flipped argument order for print_vfinfo to keep fp first
  and fixed alignment issues ]
2019-06-28 16:20:12 -07:00
David Ahern ea985eb42d Update kernel headers
Update kernel headers to commit:
    5cdda5f1d6ad ("ipv4: enable route flushing in network namespaces")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-28 16:14:25 -07:00
Andrea Claudi 1e5746d5e1 utils: move parse_percent() to tc_util
As parse_percent() is used only in tc.

This reduces ip, bridge and genl binaries size:

$ bloat-o-meter -t bridge/bridge bridge/bridge.new
add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-109 (-109)
Total: Before=50973, After=50864, chg -0.21%

$ bloat-o-meter -t genl/genl genl/genl.new
add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-109 (-109)
Total: Before=30298, After=30189, chg -0.36%

$ bloat-o-meter ip/ip ip/ip.new
add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-109 (-109)
Total: Before=674164, After=674055, chg -0.02%

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-28 16:06:26 -07:00
Hoang Le 1df17579d8 tipc: support interface name when activating UDP bearer
Support for indicating interface name has an ip address in parallel
with specifying ip address when activating UDP bearer.
This liberates the user from keeping track of the current ip address
for each device.

Old command syntax:
$tipc bearer enable media udp name NAME localip IP

New command syntax:
$tipc bearer enable media udp name NAME [localip IP|dev DEVICE]

v2:
    - Removed initial value for fd
    - Fixed the returning value for cmd_bearer_validate_and_get_addr
      to make its consistent with using: zero or non-zero
v3: - Switch to use helper 'get_ifname' to retrieve interface name
v4: - Replace legacy SIOCGIFADDR by netlink
v5: - Fix leaky rtnl_handle

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-28 16:03:16 -07:00
Baruch Siach d0272f5404 devlink: fix libc and kernel headers collision
Since commit 2f1242efe9 ("devlink: Add devlink health show command") we
use the sys/sysinfo.h header for the sysinfo(2) system call. But since
iproute2 carries a local version of the kernel struct sysinfo, this
causes a collision with libc that do not rely on kernel defined sysinfo
like musl libc:

In file included from devlink.c:25:0:
.../sysroot/usr/include/sys/sysinfo.h:10:8: error: redefinition of 'struct sysinfo'
 struct sysinfo {
        ^~~~~~~
In file included from ../include/uapi/linux/kernel.h:5:0,
                 from ../include/uapi/linux/netlink.h:5,
                 from ../include/uapi/linux/genetlink.h:6,
                 from devlink.c:21:
../include/uapi/linux/sysinfo.h:8:8: note: originally defined here
 struct sysinfo {
        ^~~~~~~

Move the sys/sysinfo.h userspace header before kernel headers, and
suppress the indirect include of linux/sysinfo.h.

Cc: Aya Levin <ayal@mellanox.com>
Cc: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-28 15:20:00 -07:00
Baruch Siach ee09370a72 devlink: fix format string warning for 32bit targets
32bit targets define uint64_t as long long unsigned. This leads to the
following build warning:

devlink.c: In function ‘pr_out_u64’:
devlink.c:1729:11: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘uint64_t {aka long long unsigned int}’ [-Wformat=]
    pr_out("%s %lu", name, val);
           ^
devlink.c:59:21: note: in definition of macro ‘pr_out’
   fprintf(stdout, ##args);   \
                     ^~~~

Use uint64_t specific conversion specifiers in the format string to fix
that.

Cc: Aya Levin <ayal@mellanox.com>
Cc: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-28 15:20:00 -07:00
Andrea Claudi 68c46872ce ip address: do not set mngtmpaddr option for IPv4 addresses
'mngtmpaddr' option make the kernel manage temporary addresses
created from the specified one as template on behalf of Privacy
Extensions (RFC3041). This option should be available only for
IPv6 addresses, as correctly stated in the manpage.

However it is possible to set mngtmpaddr on IPv4 addresses, too:

$ ip link add dummy0 type dummy
$ ip -4 addr add 192.168.1.1 dev dummy0 mngtmpaddr
$ ip a
1: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
   link/ether 1a:6d:c6:96:ca:f8 brd ff:ff:ff:ff:ff:ff
   inet 192.168.1.1/32 scope global mngtmpaddr dummy0
      valid_lft forever preferred_lft forever

Fix this adding a check on the protocol family before setting
IFA_F_MANAGETEMPADDR flag.

Fixes: 5b7e21c417 ("add support for IFA_F_MANAGETEMPADDR")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-28 15:18:28 -07:00
Andrea Claudi e4448b6c7d ip address: do not set home option for IPv4 addresses
'home' option designates a IPv6 address as "home address" as
defined in RFC 6275. This option should be available only for
IPv6 addresses, as correctly stated in the manpage.

However it is possible to set home on IPv4 addresses, too:

$ ip link add dummy0 type dummy
$ ip -4 addr add 192.168.1.1 dev dummy0 home
$ ip a
1: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
   link/ether 1a:6d:c6:96:ca:f8 brd ff:ff:ff:ff:ff:ff
   inet 192.168.1.1/32 scope global home dummy0
      valid_lft forever preferred_lft forever

Fix this adding a check on the protocol family before setting
IFA_F_HOMEADDRESS flag.

Fixes: bac735c53a ("enabled to manipulate the flags of IFA_F_HOMEADDRESS or IFA_F_NODAD from ip.")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-28 15:18:28 -07:00
Andrea Claudi 8ae99cc46d ip address: do not set nodad option for IPv4 addresses
Duplicate Address Detection (RFC 4862) is available only for IPv6
addresses. As a consequence, 'nodad' option, turning it off, should
be available only for IPv6, and is defined like that in the man page.

However it is possible to set nodad on IPv4 addresses, too:

$ ip link add dummy0 type dummy
$ ip -4 addr add 192.168.1.1 dev dummy0 nodad
$ ip a
1: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
   link/ether 1a:6d:c6:96:ca:f8 brd ff:ff:ff:ff:ff:ff
   inet 192.168.1.1/32 scope global nodad dummy0
      valid_lft forever preferred_lft forever

Fix this adding a check on the protocol family before setting
IFA_F_NODAD flag.

Fixes: bac735c53a ("enabled to manipulate the flags of IFA_F_HOMEADDRESS or IFA_F_NODAD from ip.")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-28 15:18:28 -07:00
Stefano Brivio b5cf263670 iproute: Set flags and attributes on dump to get IPv6 cached routes to be flushed
With a current (5.1) kernel version, IPv6 exception routes can't be listed
(ip -6 route list cache) or flushed (ip -6 route flush cache). Kernel
support for this is being added back. Relevant net-next commits:

  564c91f7e563 fib_frontend, ip6_fib: Select routes or exceptions dump from RTM_F_CLONED
  ef11209d4219 Revert "net/ipv6: Bail early if user only wants cloned entries"
  3401bfb1638e ipv6/route: Don't match on fc_nh_id if not set in ip6_route_del()
  bf9a8a061ddc ipv6/route: Change return code of rt6_dump_route() for partial node dumps
  1e47b4837f3b ipv6: Dump route exceptions if requested
  40cb35d5dc04 ip6_fib: Don't discard nodes with valid routing information in fib6_locate_1()

However, to allow the kernel to filter routes based on the RTM_F_CLONED
flag, we need to make sure this flag is always passed when we want cached
routes to be dumped, and we can also pass table and output interface
attributes to have the kernel filtering on them, if requested by the user.

Use the existing iproute_dump_filter() as a filter for the dump request in
iproute_flush(). This way, 'ip -6 route flush cache' works again.

v2: Instead of creating a separate 'filter' function dealing with
    RTM_F_CACHED only, use the existing iproute_dump_filter() and get
    table and oif kernel filtering for free. Suggested by David Ahern.

Fixes: aba5acdfdb ("(Logical change 1.3)")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-26 14:27:00 -07:00
Hangbin Liu 5a403866f3 ip/iptoken: fix dump error when ipv6 disabled
When we disable IPv6 from the start up (ipv6.disable=1), there will be
no IPv6 route info in the dump message. If we return -1 when
ifi->ifi_family != AF_INET6, we will get error like

$ ip token list
Dump terminated

which will make user feel confused. There is no need to return -1 if the
dump message not match. Return 0 is enough.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-26 14:23:12 -07:00
Stephen Hemminger f799505372 devlink: replace print macros with functions
Using functions is safer, and printing is not performance
critical.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-26 09:18:18 -07:00
Eyal Birger bfa757e02f tc: adjust xtables_match and xtables_target to changes in recent iptables
iptables commit 933400b37d09 ("nft: xtables: add the infrastructure to translate from iptables to nft")
added an additional member to struct xtables_match and struct xtables_target.

This change is available for libxtables12 and up.
Add these members conditionally to support both newer and older versions.

Fixes: dd29621578 ("tc: add em_ipt ematch for calling xtables matches from tc matching context")
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-24 16:12:17 -07:00
David Ahern f7eef91897 Merge branch 'master' into next
Conflicts:
	include/uapi/linux/snmp.h

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-21 15:59:24 -07:00
Jakub Kicinski b3cf1167e7 tc: q_netem: JSON-ify the output
Add JSON output support to q_netem.

The normal output is untouched.

In JSON output always use seconds as the base of time units,
and non-percentage numbers (0.01 instead of 1%). Try to always
report the fields, even if they are zero.
All this should make the output more machine-friendly.

v2: less macroes

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-21 15:51:35 -07:00
Nicolas Dichtel 6d77d9c6ae ip monitor: display interfaces from all groups
Only interface from group 0 were displayed.

ip monitor calls ipaddr_reset_filter() and there is no reason to not reset
the filter group in this function.

Fixes: c4fdf75d3d ("ip link: fix display of interface groups")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-21 12:59:50 -07:00
Matteo Croce b2e2922373 netns: make netns_{save,restore} static
The netns_{save,restore} functions are only used in ipnetns.c now, since
the restore is not needed anymore after the netns exec command.
Move them in ipnetns.c, and make them static.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-20 14:30:41 -07:00
Matteo Croce d81d4ba15d ip vrf: use hook to change VRF in the child
On vrf exec, reset the VRF associations in the child process, via the
new hook added to cmd_exec(). In this way, the parent doesn't have to
reset the VRF associations before spawning other processes.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-20 14:30:41 -07:00
Matteo Croce 903818fbf9 netns: switch netns in the child when executing commands
'ip netns exec' changes the current netns just before executing a child
process, and restores it after forking. This is needed if we're running
in batch or do_all mode.
Some cleanups must be done both in the parent and in the child: the
parent must restore the previous netns, while the child must reset any
VRF association.
Unfortunately, if do_all is set, the VRF are not reset in the child, and
the spawned processes are started with the wrong VRF context. This can
be triggered with this script:

	# ip -b - <<-'EOF'
		link add type vrf table 100
		link set vrf0 up
		link add type dummy
		link set dummy0 vrf vrf0 up
		netns add ns1
	EOF
	# ip -all -b - <<-'EOF'
		vrf exec vrf0 true
		netns exec setsid -f sleep 1h
	EOF
	# ip vrf pids vrf0
	  314  sleep
	# ps 314
	  PID TTY      STAT   TIME COMMAND
	  314 ?        Ss     0:00 sleep 1h

Refactor cmd_exec() and pass to it a function pointer which is called in
the child before the final exec. In the netns exec case the function just
resets the VRF and switches netns.

Doing it in the child is less error prone and safer, because the parent
environment is always kept unaltered.

After this refactor some utility functions became unused, so remove them.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-20 14:30:41 -07:00
Pete Morici b16f525323 Add support for configuring MACsec gcm-aes-256 cipher type.
Signed-off-by: Pete Morici <pmorici@dev295.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-18 09:55:51 -07:00
Andrea Claudi 8063feebba Makefile: use make -C
make provides a handy -C option to change directory before reading
the makefiles or doing anything else.

Use that instead of the "cd dir && make && cd .." pattern, thus
simplifying sintax for some makefiles.

Changes from v1:
- Drop an obviously wrong leftover on testsuite/iproute2/Makefile

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-18 09:52:58 -07:00
Stephen Hemminger 77a380379f uapi: update headers and add if_link.h and if_infiniband.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-18 09:48:21 -07:00
Michael Forney 578cadcc68 ipmroute: Prevent overlapping storage of `filter` global
This variable has the same name as `struct xfrm_filter filter` in
ip/ipxfrm.c, but overrides that definition since `struct rtfilter`
is larger.

This is visible when built with -Wl,--warn-common in LDFLAGS:

	/usr/bin/ld: ipxfrm.o: warning: common of `filter' overridden by larger common from ipmroute.o

Signed-off-by: Michael Forney <mforney@mforney.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-18 09:43:29 -07:00
Hangbin Liu ca697cee4c ip: add a new parameter -Numeric
Add a new parameter '-Numeric' to show the number of protocol, scope,
dsfield, etc directly instead of converting it to human readable name.
Do the same on tc and ss.

This patch is based on David Ahern's previous patch.

Suggested-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-18 08:37:47 -07:00
David Ahern e92d221022 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-14 07:29:40 -07:00
David Ahern 82cdb4d445 tools: Fix include path for generate_nlmsg
Compile of tools directory fails with:

make -C tools
    CC       generate_nlmsg
../../lib/libnetlink.c:28:27: fatal error: linux/nexthop.h: No such file or directory
 #include <linux/nexthop.h>
                           ^
compilation terminated.

Add local uapi to build path.

Fixes: 74829ca7dd ("libnetlink: Add helper to create nexthop dump request")
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-14 06:50:55 -07:00
Andrea Claudi 41bf0c69c0 Makefile: use make -C to change directory
make provides a handy -C option to change directory before reading
the makefiles or doing anything else.

Use that instead of the "cd dir && make && cd .." pattern, thus
simplifying sintax for some makefiles.

Changes from v1:
- Drop an obviously wrong leftover in testsuite/iproute2/Makefile

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-and-tested-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-14 06:44:39 -07:00
Stephen Hemminger b0a09ace39 testsuite: intent if/else in Makefile
Indent both arms of if/else equally.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-12 08:48:33 -07:00
Moshe Shemesh c934da8aaa devlink: mnlg: Catch returned error value of dumpit commands
Devlink commands which implements the dumpit callback may return error.
The netlink function netlink_dump() sends the errno value as the payload
of the message, while answering user space with NLMSG_DONE.
To enable receiving errno value for dumpit commands we have to check for
it in the message. If it is a negative value then the dump returned an
error so we should set errno accordingly and check for ext_ack in case
it was set.

Fixes: 049c58539f ("devlink: mnlg: Add support for extended ack")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-12 08:43:14 -07:00
David Ahern 2357abbbfa Merge branch 'nexthop-objects' into next
David Ahern  says:

====================

This set adds support for nexthop objects to the ip command. The syntax
for nexthop objects is identical to the current 'ip route .. nexthop ...'
syntax making it easy to convert existing use cases.

v2
- Fixed header use in rtnl_nexthopdump_req as noted by roopa
- made rth_del static per Stephen's request and fixed coding style
- removed print_nh_gateway and exported print_rta_gateway to reuse
  the iproute.c code (keeps consistency in output)
- added examples to commit message
- fixed monitor use when specific groups requested
- fixed usage in 'ip nexthop'
- added manpage

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-11 10:32:07 -07:00
David Ahern e7cd93e7af ipmonitor: Add nexthop option to monitor
Add capability to ip-monitor to listen and dump nexthop messages.
Since the nexthop group = 32 which exceeds the max groups bit
field, 2 separate flags are needed - one that defaults on to indicate
nexthop group is joined by default and a second that indicates a
specific selection by the user (e.g, ip mon nexthop route).

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-11 10:31:30 -07:00
David Ahern 12387e2c14 ip route: Add option to use nexthop objects
Add nhid option for routes to use nexthop objects by id.

Example:
  $ ip nexthop add id 1 via 10.99.1.2 dev veth1
  $ ip route add 10.100.1.0/24 nhid 1
  $ ip route ls
  ...
  10.100.1.0/24 nhid 1 via 10.99.1.2 dev veth1

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-11 10:31:28 -07:00
David Ahern 42cce67e71 ip: Add man page for nexthop command
Document 'ip nexthop' options in a man page with a few examples.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-11 10:31:06 -07:00
David Ahern 63df8e8543 Add support for nexthop objects
Add nexthop subcommand to ip. Implement basic commands for creating,
deleting and dumping nexthop objects. Syntax follows 'nexthop' syntax
from existing 'ip route' command.

Examples:
1. Single path
    $ ip nexthop add id 1 via 10.99.1.2 dev veth1
    $ ip nexthop ls
    id 1 via 10.99.1.2 src 10.99.1.1 dev veth1 scope link

2. ECMP
    $ ip nexthop add id 2 via 10.99.3.2 dev veth3
    $ ip nexthop add id 1001 group 1/2
      --> creates a nexthop group with 2 component nexthops:
          id 1 and id 2 both the same weight

    $ ip nexthop ls
    id 1 via 10.99.1.2 src 10.99.1.1 dev veth1 scope link
    id 2 via 10.99.3.2 src 10.99.3.1 dev veth3 scope link
    id 1001 group 1/2

3. Weighted multipath
    $ ip nexthop add id 1002 group 1,10/2,20
      --> creates a nexthop group with 2 component nexthops:
          id 1 with a weight of 10 and id 2 with a weight of 20

    $ ip nexthop ls
    id 1 via 10.99.1.2 src 10.99.1.1 dev veth1 scope link
    id 2 via 10.99.3.2 src 10.99.3.1 dev veth3 scope link
    id 1001 group 1/2
    id 1002 group 1,10/2,20

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-11 10:30:58 -07:00
David Ahern 48a1e96d90 ip route: Export print_rt_flags, print_rta_if and print_rta_gateway
Export print_rt_flags and print_rta_if for use by the nexthop
command.

Change print_rta_gateway to take the family versus rtmsg struct and
export for use by the nexthop command.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-11 10:30:55 -07:00
David Ahern 74829ca7dd libnetlink: Add helper to create nexthop dump request
Add rtnl_nexthopdump_req to initiate a dump request of nexthop objects.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-11 10:30:53 -07:00
David Ahern 10631938f1 uapi: Import nexthop object API
Add nexthop.h from kernel with the uapi for nexthop objects.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-11 10:30:50 -07:00
David Ahern 9860becfe3 libnetlink: Add helper to add a group via setsockopt
groups > 31 have to be joined using the setsockopt. Since the nexthop
group is 32, add a helper to allow 'ip monitor' to listen for nexthop
messages.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-11 10:30:48 -07:00
David Ahern 7392401027 lwtunnel: Pass encap and encap_type attributes to lwt_parse_encap
lwt_parse_encap currently assumes the encap attribute is RTA_ENCAP
and the type is RTA_ENCAP_TYPE. Change lwt_parse_encap to take these
as input arguments for reuse by nexthop code which has the attributes
as NHA_ENCAP and NHA_ENCAP_TYPE.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-11 10:30:46 -07:00
David Ahern 2360b8cb21 libnetlink: Set NLA_F_NESTED in rta_nest
Kernel now requires NLA_F_NESTED to be set on new nested
attributes. Set NLA_F_NESTED in rta_nest.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-11 10:30:39 -07:00
Mahesh Bandewar ba126dcad2 ip6tunnel: fix 'ip -6 {show|change} dev <name>' cmds
Inclusion of 'dev' is allowed by the syntax but not handled
correctly by the command. It produces no output for show
command and falsely successful for change command but does
not make any changes.

can be verified with the following steps
  # ip -6 tunnel add ip6tnl1 mode ip6gre local fd::1 remote fd::2 tos inherit ttl 127 encaplimit none
  # ip -6 tunnel show ip6tnl1
  <correct output>
  # ip -6 tunnel show dev ip6tnl1
  <no output but correct output after this change>
  # ip -6 tunnel change dev ip6tnl1 local 2001🔢:1 remote 2001🔢:2 encaplimit none ttl 127 tos inherit allow-localremote
  # echo $?
  0
  # ip -6 tunnel show ip6tnl1
  <no changes applied, but changes are correctly applied after this change>

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-10 10:43:09 -07:00
Matteo Croce 80a931d41c ip: reset netns after each command in batch mode
When creating a new netns or executing a program into an existing one,
the unshare() or setns() calls will change the current netns.
In batch mode, this can run commands on the wrong interfaces, as the
ifindex value is meaningful only in the current netns. For example, this
command fails because veth-c doesn't exists in the init netns:

    # ip -b - <<-'EOF'
        netns add client
        link add name veth-c type veth peer veth-s netns client
        addr add 192.168.2.1/24 dev veth-c
    EOF
    Cannot find device "veth-c"
    Command failed -:7

But if there are two devices with the same name in the init and new netns,
ip will build a wrong ll_map with indexes belonging to the new netns,
and will execute actions in the init netns using this wrong mapping.
This script will flush all eth0 addresses and bring it down, as it has
the same ifindex of veth0 in the new netns:

    # ip addr
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
        link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
        inet 192.168.122.76/24 brd 192.168.122.255 scope global dynamic eth0
           valid_lft 3598sec preferred_lft 3598sec

    # ip -b - <<-'EOF'
        netns add client
        link add name veth0 type veth peer name veth1
        link add name veth-ns type veth peer name veth0 netns client
        link set veth0 down
        address flush veth0
    EOF

    # ip addr
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default qlen 1000
        link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
    3: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether c2:db:d0:34:13:4a brd ff:ff:ff:ff:ff:ff
    4: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether ca:9d:6b:5f:5f:8f brd ff:ff:ff:ff:ff:ff
    5: veth-ns@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether 32:ef:22:df:51:0a brd ff:ff:ff:ff:ff:ff link-netns client

The same issue can be triggered by the netns exec subcommand with a
sligthy different script:

    # ip netns add client
    # ip -b - <<-'EOF'
        netns exec client true
        link add name veth0 type veth peer name veth1
        link add name veth-ns type veth peer name veth0 netns client
        link set veth0 down
        address flush veth0
    EOF

Fix this by adding two netns_{save,reset} functions, which are used
to get a file descriptor for the init netns, and restore it after
each batch command.
netns_save() is called before the unshare() or setns(),
while netns_restore() is called after each command.

Fixes: 0dc34c7713 ("iproute2: Add processless network namespace support")
Reviewed-and-tested-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-10 10:42:14 -07:00
David Ahern 9a4f0ba478 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-10 10:32:07 -07:00
Kevin Darbyshire-Bryant d7f2bccd0f tc: add support for action act_ctinfo
ctinfo is a tc action restoring data stored in conntrack marks to
various fields.  At present it has two independent modes of operation,
restoration of DSCP into IPv4/v6 diffserv and restoration of conntrack
marks into packet skb marks.

It understands a number of parameters specific to this action in
additional to the usual action syntax.  Each operating mode is
independent of the other so all options are optional, however not
specifying at least one mode is a bit pointless.

Usage: ... ctinfo [dscp mask [statemask]] [cpmark [mask]] [zone ZONE]
		  [CONTROL] [index <INDEX>]

DSCP mode

dscp enables copying of a DSCP stored in the conntrack mark into the
ipv4/v6 diffserv field.  The mask is a 32bit field and specifies where
in the conntrack mark the DSCP value is located.  It must be 6
contiguous bits long. eg. 0xfc000000 would restore the DSCP from the
upper 6 bits of the conntrack mark.

The DSCP copying may be optionally controlled by a statemask.  The
statemask is a 32bit field, usually with a single bit set and must not
overlap the dscp mask.  The DSCP restore operation will only take place
if the corresponding bit/s in conntrack mark ANDed with the statemask
yield a non zero result.

eg. dscp 0xfc000000 0x01000000 would retrieve the DSCP from the top 6
bits, whilst using bit 25 as a flag to do so.  Bit 26 is unused in this
example.

CPMARK mode

cpmark enables copying of the conntrack mark to the packet skb mark.  In
this mode it is completely equivalent to the existing act_connmark
action.  Additional functionality is provided by the optional mask
parameter, whereby the stored conntrack mark is logically ANDed with the
cpmark mask before being stored into skb mark.  This allows shared usage
of the conntrack mark between applications.

eg. cpmark 0x00ffffff would restore only the lower 24 bits of the
conntrack mark, thus may be useful in the event that the upper 8 bits
are used by the DSCP function.

Usage: ... ctinfo [dscp mask [statemask]] [cpmark [mask]] [zone ZONE]
		  [CONTROL] [index <INDEX>]
where :
	dscp MASK is the bitmask to restore DSCP
	     STATEMASK is the bitmask to determine conditional restoring
	cpmark MASK mask applied to restored packet mark
	ZONE is the conntrack zone
	CONTROL := reclassify | pipe | drop | continue | ok |
		   goto chain <CHAIN_INDEX>

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-10 10:24:38 -07:00
David Ahern ed624243da uapi: Import tc_ctinfo uapi
Add tc_ctinfo.h uapi file from kernel.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-10 10:23:32 -07:00
David Ahern b2f8eb7f8a Update kernel headers
Update kernel headers to commit:
    ad3a9ee0b623 ("ocelot: remove unused variable 'rc' in vcap_cmd()")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-10 09:39:08 -07:00
Davide Caratti 0ee4d17954 tc: simple: don't hardcode the control action
the following TDC test case:

 b776 - Replace simple action with invalid goto chain control

checks if the kernel correctly validates the 'goto chain' control action,
when it is specified in 'act_simple' rules. The test systematically fails
because the control action is hardcoded in parse_simple(), i.e. it is not
parsed by command line arguments, so its value is constantly TC_ACT_PIPE.
Because of that, the following command:

 # tc action add action simple sdata "test" drop index 7

installs an 'act_simple' rule that never drops packets, and whose 'index'
is the first IDR available, plus an 'act_gact' rule with 'index' equal to
7, that drops packets.

Use parse_action_control_dflt(), like we did on many other TC actions, to
make the control action configurable also with 'act_simple'. The expected
results of test b776 are summarized below:

 iproute2
   v       kernel->| 5.1-rc2 (and previous)  | 5.1-rc3 (and subsequent)
 ------------------+-------------------------+-------------------------
 5.1.0             | FAIL (bad IDR)          | FAIL (bad IDR)
 5.1.0(patched)    | FAIL (no rule/bad sdata)| PASS

Changes since v1:
 - reword commit message, thanks Stephen Hemminger

Fixes: 087f46ee4e ("tc: introduce simple action")
CC: Andrea Claudi <aclaudi@redhat.com>
CC: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-06 14:43:08 -07:00
Roman Mashak fa49588973 tc: Fix binding of gact action by index.
The following operation fails:
% sudo tc actions add action pipe index 1
% sudo tc filter add dev lo parent ffff: \
       protocol ip pref 10 u32 match ip src 127.0.0.2 \
       flowid 1:10 action gact index 1

Bad action type index
Usage: ... gact <ACTION> [RAND] [INDEX]
Where:  ACTION := reclassify | drop | continue | pass | pipe |
                  goto chain <CHAIN_INDEX> | jump <JUMP_COUNT>
        RAND := random <RANDTYPE> <ACTION> <VAL>
        RANDTYPE := netrand | determ
        VAL : = value not exceeding 10000
        JUMP_COUNT := Absolute jump from start of action list
        INDEX := index value used

However, passing a control action of gact rule during filter binding works:

% sudo tc filter add dev lo parent ffff: \
       protocol ip pref 10 u32 match ip src 127.0.0.2 \
       flowid 1:10 action gact pipe index 1

Binding by reference, i.e. by index, has to consistently work with
any tc action.

Since tc is sensitive to the order of keywords passed on the command line,
we can teach gact to skip parsing arguments as soon as it sees 'gact'
followed by 'index' keyword.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-06 14:41:31 -07:00
Parav Pandit 2cc10ce81d devlink: Increase bus, device buffer size to 64 bytes
Device name on mdev bus is 36 characters long which follow standard uuid
RFC 4122.
This is probably the longest name that a kernel will return for a
device.

Hence increase the buffer size to 64 bytes.

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-06 14:41:17 -07:00
Davide Caratti 4ae441e3d1 man: tc-skbedit.8: document 'inheritdsfield'
while at it, fix missing square bracket near 'ptype' and a typo in the
action description (it's -> its).

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-04 09:39:53 -07:00
David Ahern 339b14ab5e Merge branch 'rdma-net-namespace' into next
Parav Pandit  says:

====================

RDMA subsystem can be running in either of the modes.
(a) Sharing RDMA devices among multiple net namespaces or
(b) Exclusive mode where RDMA device is bound to single net namespace

This patch series adds
(1) query command to query rdma subsystem sharing mode
(2) set command to change rdma subsystem sharing mode
(3) assign rdma device to a net namespace

rdma tool examples:
(a) Query current rdma subsys net namespace sharing mode
$ rdma sys show
netns shared

(b) Change rdma subsys mode to exclusive mode
$ rdma sys set netns exclusive

$ rdma sys show
netns exclusive

(c) Assign rdma device to a specific newly created net namespace
$ ip netns add foo
$ rdma dev set mlx5_1 netns foo

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-31 15:10:55 -07:00
Parav Pandit c2ffce5d39 rdma: Add man page for rdma dev set netns command
Add man page to describe additional set netns command
for rdma device.

Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-31 15:10:33 -07:00
Parav Pandit d17a0248a2 rdma: Add an option to set net namespace of rdma device
Enrich rdmatool with an option to set network namespace of RDMA
device. After successful execution of it, rdma device will
be accessible only in assigned network namespace.

rdma tool command examples and output.

First set netns mode to exclusive.

$ rdma system set netns exclusive

Now create network namespace and assign RDMA device to this
network namespace.

$ ip netns add foo
$ rdma dev set mlx5_1 netns foo

Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-31 15:10:32 -07:00
Parav Pandit e861272015 rdma: Add man pages for rdma system commands
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-31 15:10:31 -07:00
Parav Pandit c4572a465b rdma: Add an option to query,set net namespace sharing sys parameter
Enrich rdmatool with an option to query rdma subsystem parameter
whether rdma devices are shared among multiple network namespaces
or exclusive to single network namespace.

rdma tool command examples and output.

$ rdma system show
netns shared

$ rdma system set netns exclusive

$ rdma system show
netns exclusive

Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-31 15:10:29 -07:00
Nicolas Dichtel c442234858 iplink: don't try to get ll addr len when creating an iface
It will obviously fail. This is a follow up of the
commit 757837230a ("lib: suppress error msg when filling the cache").

Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-30 11:03:20 -07:00
Nikolay Aleksandrov a9661b8b0f bridge: mdb: restore text output format
While I fixed the mdb json output, I did overlook the text output.
This patch returns the original text output format:
 dev <bridge> port <port> grp <mcast group> <temp|permanent> <flags> <timer>
Example (old format, restored by this patch):
 dev br0 port eth8 grp 239.1.1.11 temp

Example (changed format after the commit below):
 23: br0  eth8  239.1.1.11  temp

We had some reports of failing scripts which were parsing the output.
Also the old format matches the bridge mdb command syntax which makes
it easier to build commands out of the output.

Fixes: c7c1a1ef51 ("bridge: colorize output and use JSON print library")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-30 11:01:53 -07:00
David Ahern 2761769472 Update kernel headers
Update kernel headers to commit:
    1167187f2759 ("Merge branch 'qed-Fix-inifinite-spinning-of-PTP-poll-thread'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-29 12:36:58 -07:00
Lukasz Czapnik 767b6fd620 tc: flower: fix port value truncation
sscanf truncates read port values silently without any error. As sscanf
man says:
(...) sscanf() conform to C89 and C99 and POSIX.1-2001. These standards
do not specify the ERANGE error.

Replace sscanf with safer get_be16 that returns error when value is out
of range.

Example:
tc filter add dev eth0 protocol ip parent ffff: prio 1 flower ip_proto
tcp dst_port 70000 hw_tc 1

Would result in filter for port 4464 without any warning.

Fixes: 8930840e67 ("tc: flower: Classify packets based port ranges")
Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-28 12:27:01 -07:00
Nicolas Dichtel 757837230a lib: suppress error msg when filling the cache
Before the patch:
$ ip netns add foo
$ ip link add name veth1 address 2a:a5:5c:b9:52:89 type veth peer name veth2 address 2a:a5:5c:b9:53:90 netns foo
RTNETLINK answers: No such device
RTNETLINK answers: No such device

But the command was successful. This may break script. Let's remove those
error messages.

Fixes: 55870dfe7f ("Improve batch and dump times by caching link lookups")
Reported-by: Philippe Guibert <philippe.guibert@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-28 12:23:52 -07:00
Stephen Hemminger 1bb38f6c5e uapi: minor upstream btf.h header change
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-24 15:51:06 -07:00
Paolo Abeni 6eccf7ecdb m_mirred: don't bail if the control action is missing
The mirred act admits an optional control action, defaulting
to TC_ACT_PIPE. The parsing code currently emits an error message
if the control action is not provided on the command line, even
if the command itself completes with no error.

This change shuts down the error message, using the appropriate
parsing helper.

Fixes: e67aba5595 ("tc: actions: add helpers to parse and print control actions")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-22 11:51:31 -07:00
Stephen Hemminger cd35c95423 man: fix macaddr section of ip-link
The formatting of setting mac address was confusing.
Break lines and fix highlighting.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-21 11:27:14 -07:00
Matteo Croce 8589eb4efd treewide: refactor help messages
Every tool in the iproute2 package have one or more function to show
an help message to the user. Some of these functions print the help
line by line with a series of printf call, e.g. ip/xfrm_state.c does
60 fprintf calls.
If we group all the calls to a single one and just concatenate strings,
we save a lot of libc calls and thus object size. The size difference
of the compiled binaries calculated with bloat-o-meter is:

        ip/ip:
        add/remove: 0/0 grow/shrink: 5/15 up/down: 103/-4796 (-4693)
        Total: Before=672591, After=667898, chg -0.70%
        ip/rtmon:
        add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-54 (-54)
        Total: Before=48879, After=48825, chg -0.11%
        tc/tc:
        add/remove: 0/2 grow/shrink: 31/10 up/down: 882/-6133 (-5251)
        Total: Before=351912, After=346661, chg -1.49%
        bridge/bridge:
        add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-459 (-459)
        Total: Before=70502, After=70043, chg -0.65%
        misc/lnstat:
        add/remove: 0/1 grow/shrink: 1/0 up/down: 48/-486 (-438)
        Total: Before=9960, After=9522, chg -4.40%
        tipc/tipc:
        add/remove: 0/0 grow/shrink: 1/1 up/down: 18/-62 (-44)
        Total: Before=79182, After=79138, chg -0.06%

While at it, indent some strings which were starting at column 0,
and use tabs where possible, to have a consistent style across helps.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-20 14:35:07 -07:00
David Ahern 2cc9b5f4fa Merge branch 'iproute2-master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-20 14:34:26 -07:00
Stephen Hemminger f99ea67684 rdma: update uapi headers
Based on 5.2-rc
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-18 06:38:39 -07:00
Gal Pressman 7087f7c0ce rdma: Update node type strings
Fix typo in usnic_udp node type and add a string for the unspecified
node type.

Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-18 06:38:35 -07:00
Stephen Hemminger b60ed9a372 uapi: merge bpf.h from 5.2
Upstream commit to fix spelling errors.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-15 09:53:07 -07:00
Stephen Hemminger 2f31cb4fd6 uapi: add sockios.h
Forgot to add this to earlier commit.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-15 09:51:15 -07:00
Stephen Hemminger 441f67de19 mailmap: map David's mail address
Cleans up multiple mail addresses in shortlog output.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-15 09:50:42 -07:00
Stephen Hemminger a99b08624d mailmap: add myself
Put entries in for past commit mail addresses

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-15 09:30:47 -07:00
Stephen Hemminger afa588490b uapi: update headers to import asm-generic/sockios.h
import asm-generic/sockios.h to fix the compile errors from the
movement of timestamp macros.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-13 14:56:15 -07:00
Stephen Hemminger 0812dc7025 uapi: add include/linux/net.h
All kernel headers must come from this repo,
and ss is including linux/net.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-13 14:54:26 -07:00
David Ahern d53d7ce382 Merge branch 'iproute2-master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-10 12:01:01 -07:00
David Ahern 5106aeaaf2 Update kernel headers and add asm-generic/sockios.h
Update kernel headers to commit
    b970afcfcabd ("Merge tag 'powerpc-5.2-1'")

and import asm-generic/sockios.h to fix the compile errors from the
movement of timestamp macros.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-10 10:06:41 -07:00
Stephen Hemminger f9339f8a8f uapi: update to elf-em header
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-10 08:56:52 -07:00
Stephen Hemminger f9e2cf35eb Merge ../iproute2-next 2019-05-10 08:55:11 -07:00
Stephen Hemminger 3eea00d777 v5.1.0 2019-05-10 08:45:14 -07:00
Phil Sutter cd21ae4013 ip-xfrm: Respect family in deleteall and list commands
Allow to limit 'ip xfrm {state|policy} list' output to a certain address
family and to delete all states/policies by family.

Although preferred_family was already set in filters, the filter
function ignored it. To enable filtering despite the lack of other
selectors, filter.use has to be set if family is not AF_UNSPEC.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-06 13:32:44 -07:00
Zhiqiang Liu 9bf2c538a0 ipnetns: use-after-free problem in get_netnsid_from_name func
Follow the following steps:
 # ip netns add net1
 # export MALLOC_MMAP_THRESHOLD_=0
 # ip netns list
then Segmentation fault (core dumped) will occur.

In get_netnsid_from_name func, answer is freed before
rta_getattr_u32(tb[NETNSA_NSID]), where tb[] refers to answer`s
content. If we set MALLOC_MMAP_THRESHOLD_=0, mmap will be adoped to
malloc memory, which will be freed immediately after calling free
func.  So reading tb[NETNSA_NSID] will access the released memory
after free(answer).

Here, we will call get_netnsid_from_name(tb[NETNSA_NSID]) before free(answer).

Fixes: 86bf43c7c2 ("lib/libnetlink: update rtnl_talk to support malloc buff at run time")
Reported-by: Huiying Kou <kouhuiying@huawei.com>
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-06 08:36:18 -07:00
Ido Schimmel e73306048f devlink: Fix monitor command
The command is supposed to allow users to filter events related to
certain objects, but returns an error when an object is specified:

# devlink mon dev
Command "dev" not found

Fix this by allowing the command to process the specified objects.

Example:

# devlink/devlink mon dev &
# echo "10 1" > /sys/bus/netdevsim/new_device
[dev,new] netdevsim/netdevsim10

# devlink/devlink mon port &
# echo "11 1" > /sys/bus/netdevsim/new_device
[port,new] netdevsim/netdevsim11/0: type notset flavour physical
[port,new] netdevsim/netdevsim11/0: type eth netdev eth1 flavour physical

# devlink/devlink mon &
# echo "12 1" > /sys/bus/netdevsim/new_device
[dev,new] netdevsim/netdevsim12
[port,new] netdevsim/netdevsim12/0: type notset flavour physical
[port,new] netdevsim/netdevsim12/0: type eth netdev eth2 flavour physical

Fixes: a3c4b484a1 ("add devlink tool")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-06 08:35:44 -07:00
Vinicius Costa Gomes 92f4b6032e taprio: Add support for cycle_time and cycle_time_extension
This allows a cycle-time and a cycle-time-extension to be specified.

Specifying a cycle-time will truncate that cycle, so when that instant
is reached, the cycle will start from its beginning.

A cycle-time-extension may cause the last entry of a cycle, just
before the start of a new schedule (the base-time of the "admin"
schedule) to be extended by at maximum "cycle-time-extension"
nanoseconds. The idea of this feauture, as described by the IEEE
802.1Q, is too avoid too narrow gate states.

Example:

tc qdisc change dev IFACE parent root handle 100 taprio \
	      sched-entry S 0x1 1000000 \
	      sched-entry S 0x0 2000000 \
	      sched-entry S 0x1 3000000 \
	      sched-entry S 0x0 4000000 \
	      cycle-time-extension 100000 \
	      cycle-time 9000000 \
	      base-time 12345678900000000

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-04 09:22:15 -07:00
Vinicius Costa Gomes 602fae856d taprio: Add support for changing schedules
This allows for a new schedule to be specified during runtime, without
removing the current one.

For that, the semantics of the 'tc qdisc change' operation in the
context of taprio is that if "change" is called and there is a running
schedule, a new schedule is created and the base-time (let's call it
X) of this new schedule is used so at instant X, it becomes the
"current" schedule. So, in short, "change" doesn't change the current
schedule, it creates a new one and sets it up to it becomes the
current one at some point.

In IEEE 802.1Q terms, it means that we have support for the
"Oper" (current and read-only) and "Admin" (future and mutable)
schedules.

Example of creating the first schedule, then adding a new one:

(1)
tc qdisc add dev IFACE parent root handle 100 taprio \
      	      num_tc 1 \
	      map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
	      queues 1@0 \
	      sched-entry S 0x1 1000000 \
	      sched-entry S 0x0 2000000 \
	      sched-entry S 0x1 3000000 \
	      sched-entry S 0x0 4000000 \
	      base-time 100000000 \
	      clockid CLOCK_TAI

(2)
tc qdisc change dev IFACE parent root handle 100 taprio \
	      base-time 7500000000000 \
	      sched-entry S 0x0 5000000 \
              sched-entry S 0x1 5000000 \

It was necessary to fix a bug, so the clockid doesn't need to be
specified when changing the schedule.

Most of the changes are related to make it easier to reuse the same
function for printing the "admin" and "oper" schedules.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-04 09:22:15 -07:00
Paolo Abeni c865c52365 tc: add support for plug qdisc
sch_plug can be used to perform functional qdisc unit tests
controlling explicitly the queuing behaviour from user-space.

Plug support lacks since its introduction in 2012. This change
introduces basic support, to control the tc status.

v1 -> v2:
 - use the SPDX identifier

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-04 09:22:14 -07:00
David Ahern fd6580972b Update kernel headers
Update kernel headers to commit
   a734d1f4c2fc ("net: openvswitch: return an error instead of doing BUG_ON()")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-04 09:13:26 -07:00
David Ahern 420b36a874 uapi: wrap SIOCGSTAMP and SIOCGSTAMPNS in ifndef
These warnings:
    ../include/uapi/linux/sockios.h:42:0: warning: "SIOCGSTAMP" redefined
    ../include/uapi/linux/sockios.h:43:0: warning: "SIOCGSTAMPNS" redefined

are from kernel commit 0768e17073dc5 ("net: socket: implement 64-bit
timestamps"). This commit moved the definitions of SIOCGSTAMP and
SIOCGSTAMPNS from include/asm-generic/sockios.h to
include/uapi/linux/sockios.h. Older OS'es already define them in
/usr/include/asm-generic/sockios.h resulting in ugly compile errors now:

In file included from ll_types.c:24:0:
../include/uapi/linux/sockios.h:42:0: warning: "SIOCGSTAMP" redefined
 #define SIOCGSTAMP SIOCGSTAMP_OLD

In file included from /usr/include/x86_64-linux-gnu/asm/sockios.h:1:0,
                 from /usr/include/asm-generic/socket.h:5,
                 from /usr/include/x86_64-linux-gnu/asm/socket.h:1,
                 from /usr/include/x86_64-linux-gnu/bits/socket.h:368,
                 from /usr/include/x86_64-linux-gnu/sys/socket.h:38,
                 from ll_types.c:17:
/usr/include/asm-generic/sockios.h:11:0: note: this is the location of the previous definition
 #define SIOCGSTAMP 0x8906  /* Get stamp (timeval) */

so wrap them in #ifndef.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-03 08:23:08 -07:00
Josh Hunt 296b5de724 ss: add option to print socket information on one line
Multi-line output in ss makes it difficult to search for things with
grep. This new option will make it easier to find sockets matching
certain criteria with simple grep commands.

Example without option:
$ ss -emoitn
State      Recv-Q Send-Q Local Address:Port               Peer Address:Port
ESTAB      0      0      127.0.0.1:13265              127.0.0.1:36743               uid:1974 ino:48271 sk:1 <->
	 skmem:(r0,rb2227595,t0,tb2626560,f0,w0,o0,bl0,d0) ts sack reno wscale:7,7 rto:211 rtt:10.245/16.616 ato:40 mss:65483 cwnd:10 bytes_acked:41865496 bytes_received:21580440 segs_out:242496 segs_in:351446 data_segs_out:242495 data_segs_in:242495 send 511.3Mbps lastsnd:2383 lastrcv:2383 lastack:2342 pacing_rate 1022.6Mbps rcv_rtt:92427.6 rcv_space:43725 minrtt:0.007

Example with new option:
$ ss -emoitnO
State    Recv-Q Send-Q          Local Address:Port            Peer Address:Port
ESTAB    0      0                   127.0.0.1:13265              127.0.0.1:36743 uid:1974 ino:48271 sk:1 <-> skmem:(r0,rb2227595,t0,tb2626560,f0,w0,o0,bl0,d0) ts sack reno wscale:7,7 rto:211 rtt:10.067/16.429 ato:40 mss:65483 pmtu:65535 rcvmss:536 advmss:65483 cwnd:10 bytes_sent:41868244 bytes_acked:41868244 bytes_received:21581866 segs_out:242512 segs_in:351469 data_segs_out:242511 data_segs_in:242511 send 520.4Mbps lastsnd:14355 lastrcv:14355 lastack:14314 pacing_rate 1040.7Mbps delivery_rate 74837.7Mbps delivered:242512 app_limited busy:1861946ms rcv_rtt:92427.6 rcv_space:43725 rcv_ssthresh:43690 minrtt:0.007

Signed-off-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-02 16:06:06 -07:00
Ido Schimmel 517ea57c6d devlink: Increase column size for larger shared buffers
With current number of spaces the output is mangled if the shared buffer
is congested.

Before:

# devlink sb occupancy show swp25
swp25:
  pool: 0:    33384960/39344256 1:          0/0       2:          0/0       3:          0/0
        4:          0/720     5:          0/0       6:          0/0       7:          0/0
        8:          0/288     9:          0/0      10:          0/0
  itc:  0(0): 33272064/39344256 1(0):       0/0       2(0):       0/0       3(0):       0/0
        4(0):       0/0       5(0):       0/0       6(0):       0/0       7(0):       0/0
  etc:  0(4):       0/720     1(4):       0/0       2(4):       0/0       3(4):       0/0
        4(4):       0/0       5(4):       0/0       6(4):       0/0       7(4):       0/0
        8(8):       0/288     9(8):       0/0      10(8):       0/0      11(8):       0/0
       12(8):       0/0      13(8):       0/0      14(8):       0/0      15(8):       0/0

After:

# devlink sb occupancy show swp25
swp25:
  pool: 0:      39070080/39344256   1:             0/0          2:             0/0          3:             0/0
        4:             0/720        5:             0/0          6:             0/0          7:             0/0
        8:             0/288        9:             0/0         10:             0/0
  itc:  0(0):   39062016/39344256   1(0):          0/0          2(0):          0/0          3(0):          0/0
        4(0):          0/0          5(0):          0/0          6(0):          0/0          7(0):          0/0
  etc:  0(4):          0/720        1(4):          0/0          2(4):          0/0          3(4):          0/0
        4(4):          0/0          5(4):          0/0          6(4):          0/0          7(4):          0/0
        8(8):          0/288        9(8):          0/0         10(8):          0/0         11(8):          0/0
       12(8):          0/0         13(8):          0/0         14(8):          0/0         15(8):          0/0

v2:
* Increase number of spaces to make the change more future-proof

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-30 11:23:08 -07:00
Nikolay Aleksandrov 09e0528cf9 ip: mroute: add fflush to print_mroute
Similar to other print functions we need to flush buffered data
in order to work with pipes and output redirects.

After this patch ip monitor mroute &>log works properly.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-29 15:04:18 -07:00
Lucas Siba 2019-04-20 11:40 UTC 4d9e90f36b Update tc-bpf.8 man page examples
This patch updates the tc-bpf.8 example application for changes to the
struct bpf_elf_map definition. In it's current form, things compile, but
the resulting object file is rejected by the verifier when attempting to
load it through tc.

Signed-off-by: Lucas Siba <lucas.siba@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
[ dropped the unnecessary flags initialization on commit ]
2019-04-26 14:05:47 -07:00
David Ahern 10fb5faec1 Merge branch 'iproute2-master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-26 11:13:54 -07:00
Mike Manning 3f2e457ae4 iplink_vlan: add support for VLAN bridge binding flag
This patch adds support for the VLAN bridge binding flag that is
provided in net-next kernel by the series merged by 1ab839281cf7
("net-support-binding-vlan-dev-link-state-to-vlan-member-bridge-ports")

Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-26 11:12:58 -07:00
David Ahern 70de8a7fa7 Update kernel headers
Update kernel headers to commit
    148f025d41a8 ("Merge branch 'hns3-next'")

Note, these warnings:
../include/uapi/linux/sockios.h:42:0: warning: "SIOCGSTAMP" redefined
../include/uapi/linux/sockios.h:43:0: warning: "SIOCGSTAMPNS" redefined

are due to kernel commit
    0768e17073dc5 ("net: socket: implement 64-bit timestamps")

which moved the definitions from include/asm-generic/sockios.h
to include/uapi/linux/sockios.h

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-26 11:11:03 -07:00
Stephen Hemminger 38983334f6 tc/ematch: fix deprecated yacc warning
Newer versions of Bison deprecated some directives.

    YACC     emp_ematch.yacc.c
emp_ematch.y:11.1-14: warning: deprecated directive, use ‘%define parse.error verbose’ [-Wdeprecated]
 %error-verbose
 ^~~~~~~~~~~~~~
emp_ematch.y:12.1-22: warning: deprecated directive, use ‘%define api.prefix {ematch_}’ [-Wdeprecated]
 %name-prefix "ematch_"

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-24 15:10:22 -07:00
Thomas Haller 62de07faf7 iprule: always print realms keyword for rule
# rule add priority 10 realms 1/0xF
    # rule add priority 10 realms 0/0xF
    # ip rule
    10:     from all lookup main 15
    10:     from all lookup main realms 1/15

The previous behavior was there since the beginning.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-24 15:06:15 -07:00
Thomas Haller 927632d4da iprule: refactor print_rule() to use leading space before printing attribute
When printing the actions, we avoid adding the trailing space after the
attribute. Possibly because we expect the action to be the last output
on the line and not end with a space.

But for FR_ACT_TO_TBL nothing is printed. That means, we add double
spaces if a protocol is printed as well:

    # ip rule add priority 10 protocol 10 type 1

will be printed as

    10:     from all lookup 1  proto mrt

The only visible effect of the patch is to avoid the double-space and
avoid a trailing space if the action is FR_ACT_TO_TBL.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-24 15:06:15 -07:00
Thomas Haller 461f0405f3 iprule: avoid trailing space in print_rule() after printing protocol
It seems print_rule() tries to avoid a trailing space at the end
of the line. At least, when printing details about the actions,
they no longer append the space. Probably expecting to be the
last attribute that will be printed.

Don't let the protocol add the trailing space. The space at the end
of the line should be printed consistently (or not).

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-24 15:06:15 -07:00
Thomas Haller 6f87b544ca iprule: avoid printing extra space after gateway for nat action
For all other actions we avoid the trailing space, so do it here
as well.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-24 15:06:15 -07:00
Kristian Evensen 112112b8eb ip fou: Support binding FOU ports
This patch adds support for binding FOU ports using iproute2.
Kernel-support was added in 1713cb37bf67 ("fou: Support binding FoU
socket").

The parse function now handles new arguments for setting the
binding-related attributes, while the print function writes the new
attributes if they are set. Also, the man page has been updated.

v2->v3:
* Remove redundant ll_init_map()-calls (thanks David Ahern).

v1->v2 (all changes suggested by David Ahern):
* Fix reverse Christmas tree ordering.
* Remove redundant peer_port_set-variable, it is enough to check
peer_port.
* Add proper error handling of invalid local/peer addresses.
* Use interface name and not index.
* Remove updating fou-header file, it is already done.

Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-22 11:42:54 -07:00
Nikolay Aleksandrov 90306a1440 iplink: bridge: add support for vlan_stats_per_port
Add support for manipulating and showing the vlan_stats_per_port bridge
option which can be toggled only when there are no port VLANs
configured. Also update the man page with the new option.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-21 06:47:39 -07:00
Ido Schimmel 185ba5e2d4 ipneigh: Print neighbour offload indication
Print the offload indication in case it is set on the neighbour.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-21 06:23:23 -07:00
Nikolay Aleksandrov 6e982e7b9b bridge: vlan: fix standard stats output
Each of the commits below broke the vlan stats output in a different
way:
- 45fca4ed94 ("bridge: fix vlan show stats formatting")
 Added a second print of an interface name (e.g. eth4eth4)
- c7c1a1ef51 ("bridge: colorize output and use JSON print library")
 Broke normal vlan stats output by not printing a new line after them
 Also printed interfaces without any vlans when printing stats

This fix is not pretty but it brings back the previous behaviour.

Before this fix:
$ bridge -s vlan show
port             vlan id
br0br0              1 PVID Egress Untagged
                   RX: 0 bytes 0 packets
                   TX: 0 bytes 0 packets 4
                   RX: 0 bytes 0 packets
                   TX: 0 bytes 0 packetseth4eth4             4
                   RX: 0 bytes 0 packets
                   TX: 0 bytes 0 packetsroot@debian:~/

After this fix:
$ bridge -s vlan show
port             vlan id
br0              1 PVID Egress Untagged
                   RX: 0 bytes 0 packets
                   TX: 0 bytes 0 packets
                 4
                   RX: 0 bytes 0 packets
                   TX: 0 bytes 0 packets
eth4             4
                   RX: 0 bytes 0 packets
                   TX: 0 bytes 0 packets

Fixes: 45fca4ed94 ("bridge: fix vlan show stats formatting")
Fixes: c7c1a1ef51 ("bridge: colorize output and use JSON print library")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-17 16:30:05 -07:00
Nikolay Aleksandrov b11b495e7c bridge: mdb: restore valid json output
Since the commit below mdb's json output has been invalid and also with
changed format. Restore it to a valid json like the previous format.
Also takes care of a double "Deleted" print when monitoring for changes.

Example bridge -p -d -j mdb show:
 [ {
        "mdb": [ {
                "index": 4,
                "dev": "virbr0",
                "port": "vnet2",
                "grp": "ff02::202",
                "state": "temp",
                "flags": [ ]
            },{
                "index": 4,
                "dev": "virbr0",
                "port": "vnet2",
                "grp": "ff02::1:fffb:1939",
                "state": "temp",
                "flags": [ ]
            },{
                "index": 6,
                "dev": "virbr1",
                "port": "vnet7",
                "grp": "ff02::202",
                "state": "temp",
                "flags": [ ]
            },{
                "index": 6,
                "dev": "virbr1",
                "port": "vnet7",
                "grp": "ff02::1:ffd0:f61f",
                "state": "temp",
                "flags": [ ]
            } ],
        "router": {
            "virbr0": [ {
                    "port": "vnet1"
                },{
                    "port": "vnet0"
                } ],
            "virbr1": [ {
                    "port": "vnet5"
                } ]
        }
    } ]

Fixes: c7c1a1ef51 ("bridge: colorize output and use JSON print library")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-17 16:27:06 -07:00
Beniamino Galvani d6abae5a7a ip: add missing space after 'external' in detailed mode
Add a missing space after the 'external' keyword in the detailed mode
of tunnel links output:

 # ip -d link
 79: geneve1: <BROADCAST,MULTICAST> mtu 65465 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/ether da:e9:e4:2b:f9:d4 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65465
     geneve externaladdrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
 80: vxlan1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/ether 7a:a8:19:07:da:01 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
     vxlan externaladdrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
 84: gre1@NONE: <NOARP> mtu 1476 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/none 00:00:00:00 brd 00:00:00:00 promiscuity 0 minmtu 0 maxmtu 0
     gre externaladdrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
 87: ip6gre1@NONE: <NOARP> mtu 1448 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/gre6 :: brd :: promiscuity 0 minmtu 0 maxmtu 0
     ip6gre externaladdrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
 88: ip6tnl1@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/tunnel6 :: brd :: promiscuity 0 minmtu 68 maxmtu 65407
     ip6tnl externaladdrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
 90: ipip1@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/ipip 0.0.0.0 brd 0.0.0.0 promiscuity 0 minmtu 0 maxmtu 0
     ipip externaladdrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

Fixes: 00ff4b8e31 ("ip/tunnel: Be consistent when printing tunnel collect metadata")
Reviewed-and-tested-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Beniamino Galvani <bgalvani@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-17 16:26:31 -07:00
David Ahern 188c7fe6ea Update kernel headers
Update kernel headers to commit
    6b0a7f84ea1f ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-17 14:07:48 -07:00
David Ahern 43de4ef694 Merge branch 'iproute2-master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-17 13:59:44 -07:00
Eyal Birger aed63ae1ac ip xfrm: support setting/printing XFRMA_IF_ID attribute in states/policies
The XFRMA_IF_ID attribute is set in policies/states for them to be
associated with an XFRM interface (4.19+).

Add support for setting / displaying this attribute.

Note that 0 is a valid value therefore set XFRMA_IF_ID if any value
was provided in command line.

Tested-by: Antony Antony <antony@phenome.org>
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-11 15:26:43 -07:00
Ralf Baechle 8391023680 ip: display netrom link type
For a NETROM "ip link show dev nr0" will show

4: nr0: <NOARP,UP,LOWER_UP> mtu 236 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/generic 88:98:6a:a4:84:40:0a brd 00:00:00:00:00:00:00

But rather link/netrom is expected to be displayed.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-11 15:25:50 -07:00
Matt Ellison 286446c1e8 ip: support for xfrm interfaces
Interfaces take a 'if_id' which is an interface id which can be set on
an xfrm policy as its interface lookup key (XFRMA_IF_ID).

Signed-off-by: Matt Ellison <matt@arroyo.io>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-05 15:05:00 -07:00
Toke Høiland-Jørgensen d5d27f27d8 q_cake: Add support for setting the fwmark option
This adds support for the newly added fwmark option to CAKE, which allows
overriding the tin selection from the per-packet firewall marks. The fwmark
field is a bitmask that is applied to the fwmark to select the tin.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-05 15:01:31 -07:00
Stephen Hemminger 41fc3fa04c uapi: update bpf.h
Updated bpf.h from 5.1-rc

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-05 15:00:48 -07:00
David Ahern 806a553c81 Merge branch 'rdma-dynamic-link-create' into next
Steve Wise  says:

====================

This series adds rdmatool support for creating/deleting rdma links.
This will be used, mainly, by soft rdma drivers to allow adding/deleting
rdma links over netdev interfaces.  It provides the user side for
the following kernel changes merged in linux-5.1.

Changes since v2:

- move checks for required parameters in the parameter handlers
- move final 'link add' processing to link_add_netdev()
- added reviewed-by tags

Changes since v1:

- move error receive checking from rd_sendrecv_msg() to rd_recv_msg().
- Add rd->suppress_errors to allow control over whether errors when
  reading a response should be ignored.  Namely: resource queries can
  get errors like "none found" when querying for resources, and this
  error should not be displayed.  So on a rd object basis, error
  suppression can be controlled.
- Rebased on rdma/for-next UABI (no need to sync rdma_netlink.h now)
- use chains of struct rd_cmd and rd_exec_cmd vs open coding the parsing
  for the 'link add' command.
- minor nit resolution
- added .mailmap file.  If this is not desired for iproute2, then please
  drop the patch.

Changes since RFC:

- add rd_sendrecv_msg() and make use of it in dev_set as well
  as the new link commands.
- fixed problems with the man pages
- changed the command line to use "netdev" as the keyword
  for the network device, do avoid confused with the ib_device
  name.
- got rid of the "type" parameter for link delete.  Also pass
  down the device index instead of the name, using the common
  rd services for validating the device name and fetching the
  index.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-03 12:05:17 -07:00
Steve Wise 1d45bf724e rdma: man page update for link add/delete
Update the 'rdma link' man page with 'link add/delete' info.

Signed-off-by: Steve Wise <larrystevenwise@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-03 12:04:33 -07:00
Steve Wise 4336c5821a rdma: add 'link add/delete' commands
Add new 'link' subcommand 'add' and 'delete' to allow binding a soft-rdma
device to a netdev interface.

EG:

rdma link add rxe_eth0 type rxe netdev eth0
rdma link delete rxe_eth0

Signed-off-by: Steve Wise <larrystevenwise@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-03 12:04:30 -07:00
Steve Wise 8f5cfd23cd rdma: add helper rd_sendrecv_msg()
This function sends the constructed netlink message and then
receives the response.

Change rd_recv_msg() to display any error messages.

Change 'rdma dev set' to use rd_sendrecv_msg().

Signed-off-by: Steve Wise <larrystevenwise@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-03 12:04:25 -07:00
Steve Wise 65147bbe8f Add .mailmap file
.mailmap allows tracking multiple email addresses to the proper user name.

Signed-off-by: Steve Wise <larrystevenwise@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-03 12:04:00 -07:00
Leslie Monis 519ace17f9 tc: pie: update man page
Update man page to reflect the changes made in Linux.

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-29 14:26:00 -07:00
Leslie Monis 492ec9558b tc: pie: change maximum integer value of tc_pie_xstats->prob
tc_pie_xstats->prob has a maximum value of (2^64 - 1).

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-29 14:26:00 -07:00
Stephen Hemminger 6754e1d978 ip: fix typo in iplink_vlan usage message
Need to use bar "|" rather than slash to indicate alternatives.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-27 07:56:07 -07:00
Hoang Le 35114a4cfe tipc: add link broadcast man page
Add a man page describing tipc link broadcast command get and set

Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-26 16:09:21 -07:00
Hoang Le 5027f233e3 tipc: add link broadcast get
The command prints the actually method that multicast
is running in the system.
Also 'ratio' value for AUTOSELECT method.

A sample usage is shown below:
$tipc link get broadcast
BROADCAST

$tipc link get broadcast
AUTOSELECT ratio:30%

$tipc link get broadcast -j -p
[ {
        "method": "AUTOSELECT"
    },{
        "ratio": 30
    } ]

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-26 16:09:16 -07:00
Hoang Le 0ea46945bc tipc: add link broadcast set method and ratio
The command added here makes it possible to forcibly configure the
broadcast link to use either broadcast or replicast, in addition to
the already existing auto selection algorithm.

A sample usage is shown below:
$tipc link set broadcast BROADCAST
$tipc link set broadcast AUTOSELECT ratio 25

$tipc link set broadcast -h
Usage: tipc link set broadcast PROPERTY

PROPERTIES
 BROADCAST         - Forces all multicast traffic to be
                     transmitted via broadcast only,
                     irrespective of cluster size and number
                     of destinations

 REPLICAST         - Forces all multicast traffic to be
                     transmitted via replicast only,
                     irrespective of cluster size and number
                     of destinations

 AUTOSELECT        - Auto switching to broadcast or replicast
                     depending on cluster size and destination
                     node number

 ratio SIZE        - Set the AUTOSELECT criteria, percentage of
                     destination nodes vs cluster size

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-26 16:08:49 -07:00
David Ahern cdeb2674aa Update kernel headers
Update kernel headers to
    fa7e428c6b7e ("openvswitch: add seqadj extension when NAT is used.")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-26 16:08:05 -07:00
Stephen Hemminger f76ad635f2 man: break long lines in man page sources
No impact for output, just easier to edit.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-22 10:05:31 -07:00
Tobias Jungel b5a754b1db ip: bridge: add mcast to unicast config flag
This adds configuration for the IFLA_BRPORT_MCAST_TO_UCAST flag that
allows multicast packets to be replicated as unicast packets.

Signed-off-by: Tobias Jungel <tobias.jungel@bisdn.de>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-22 09:44:49 -07:00
David Ahern 3efdd43667 Merge branch 'iproute2-master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-20 00:38:08 -07:00
Stephen Hemminger dd4a2b6833 uapi: bpf add set_ce
New api from upstream.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-19 10:37:55 -07:00
Stephen Hemminger 828132fdd1 uapi: in6.h add router alert isolate
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-19 10:37:28 -07:00
Stephen Hemminger a7cd7baded uapi: add CAKE FWMARK
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-19 10:36:56 -07:00
Stephen Hemminger 8e1554625d rdma: update uapi headers from 5.1-rc1
Update from upstream.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-19 10:34:32 -07:00
Stephen Hemminger 50cf634899 Merge branch 'master' of ../iproute2-next 2019-03-19 10:32:45 -07:00
Stephen Hemminger 93d0d92f61 v5.0.0 2019-03-19 10:06:19 -07:00
Matteo Croce a0a639d9c0 ip route: get: print JSON output when -j is given
The ip -j option to print output as JSON is ignored when using 'route get':

    $ ip -j route get 127.0.0.1
    local 127.0.0.1 dev lo src 127.0.0.1 uid 1000
        cache <local>

Enable JSON output in iproute_get(), and don't let print_cache_flags() close
the JSON output, as it's not always the last called JSON function.

Tested on different route types:

    $ ip -j -p route get 127.0.0.1
    [ {
            "type": "local",
            "dst": "127.0.0.1",
            "dev": "lo",
            "prefsrc": "127.0.0.1",
            "flags": [ ],
            "uid": 1000,
            "cache": [ "local" ]
        } ]

    $ ip -d -j -p route get 192.0.2.1
    [ {
            "type": "unicast",
            "dst": "192.0.2.1",
            "gateway": "192.168.85.1",
            "dev": "wlp3s0",
            "table": "main",
            "prefsrc": "192.168.85.2",
            "flags": [ ],
            "uid": 1000,
            "cache": [ ]
        } ]

Fixes: 663c3cb231 ("iproute: implement JSON and color output")
Acked-by: Phil Sutter <phil@nwl.cc>
Reviewed-and-tested-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-19 09:50:01 -07:00
Matteo Croce 0736617738 ip route: print route type in JSON output
ip route generates an invalid JSON if the route type has to be printed,
eg. when detailed mode is active, or the type is different that unicast:

    $ ip -d -j -p route show
    [ {"unicast",
            "dst": "192.168.122.0/24",
            "dev": "virbr0",
            "protocol": "kernel",
            "scope": "link",
            "prefsrc": "192.168.122.1",
            "flags": [ "linkdown" ]
        } ]

    $ ip -j -p route show
    [ {"unreachable",
            "dst": "192.168.23.0/24",
            "flags": [ ]
        },{"prohibit",
            "dst": "192.168.24.0/24",
            "flags": [ ]
        },{"blackhole",
            "dst": "192.168.25.0/24",
            "flags": [ ]
        } ]

Fix it by printing the route type as the "type" attribute:

    $ ip -d -j -p route show
    [ {
            "type": "unicast",
            "dst": "default",
            "gateway": "192.168.85.1",
            "dev": "wlp3s0",
            "protocol": "dhcp",
            "scope": "global",
            "metric": 600,
            "flags": [ ]
        },{
            "type": "unreachable",
            "dst": "192.168.23.0/24",
            "protocol": "boot",
            "scope": "global",
            "flags": [ ]
        },{
            "type": "prohibit",
            "dst": "192.168.24.0/24",
            "protocol": "boot",
            "scope": "global",
            "flags": [ ]
        },{
            "type": "blackhole",
            "dst": "192.168.25.0/24",
            "protocol": "boot",
            "scope": "global",
            "flags": [ ]
        } ]

Fixes: 663c3cb231 ("iproute: implement JSON and color output")
Acked-by: Phil Sutter <phil@nwl.cc>
Reviewed-and-tested-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-19 09:49:36 -07:00
Kevin 'ldir' Darbyshire-Bryant ef1e02e6ac tc: m_connmark: fix action error messages
action m_connmark returns error messages identifying itself as the
'simple' action instead of 'connmark' action. e.g.

tc filter add dev eth0 protocol all u32 match u32 0 0 flowid 1:1 \
	action connmark index wrong
simple: Illegal "index"
bad action parsing
parse_action: bad value (3:connmark)!
Illegal "action"

In what is most likely a copy/paste error from the simple action example
code, fix connmark error messages to identify themselves as coming from
connmark.

tc filter add dev eth0 protocol all u32 match u32 0 0 flowid 1:1 \
	action connmark index wrong
connmark: Illegal "index"
bad action parsing
parse_action: bad value (3:connmark)!
Illegal "action"

While we're here also fixup the 'Illegal "Zone"' error code to say
'Illegal "zone"' instead of 'Illegal "index"'

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-19 09:49:07 -07:00
David Ahern 7081aa6556 Merge branch 'bond-bridge-xstats-json' into next
Nikolay Aleksandrov  says:

====================

This set adds json output support to the xstats API (patch 01) and then
adds json support to the bridge xstats output (patch 02) and adds xstats
output support (both plain text and json) for the bonding (patch 03).
It doesn't change the bridge's plain text output, but it fixes an
inconsistency that could happen if new bridge xstats attributes were
added (print the interface name once for each group of xstats attrs).

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-15 14:01:35 -07:00
Nikolay Aleksandrov 440c5075d6 ip: bond: add xstats support
Add bond and bond_slave xstats support with optional json output.
Example:
- Plain text:
$ ip link xstats type bond 802.3ad
 bond0
                    LACPDU Rx 2017
                    LACPDU Tx 2038
                    LACPDU Unknown type Rx 0
                    LACPDU Illegal Rx 0
                    Marker Rx 0
                    Marker Tx 0
                    Marker response Rx 0
                    Marker response Tx 0
                    Marker unknown type Rx 0

- JSON:
$ ip -j -p link xstats type bond 802.3ad
  [ {
        "ifname": "bond0",
        "802.3ad": {
            "lacpdu_rx": 219,
            "lacpdu_tx": 241,
            "lacpdu_unknown_rx": 0,
            "lacpdu_illegal_rx": 0,
            "marker_rx": 0,
            "marker_tx": 0,
            "marker_response_rx": 0,
            "marker_response_tx": 0,
            "marker_unknown_rx": 0
        }
    } ]

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-15 13:58:16 -07:00
Nikolay Aleksandrov a9bc23a792 ip: bridge: add xstats json support
Add json support for bridge's xstats output.
The plain text output format should remain the same.
Note that this patch pulls the interface out of the attribute
loop, this was an oversight when the set was upstreamed. This does not
change the output format, but fixes it when new xstats attributes show
up.

Example:
$ ip -p -j link xstats type bridge
  [ {
        "ifname": "br0",
        "multicast": {
            "igmp_queries": {
                "rx_v1": 0,
                "rx_v2": 32,
                "rx_v3": 0,
                "tx_v1": 0,
                "tx_v2": 0,
                "tx_v3": 0
            },
            "igmp_reports": {
                "rx_v1": 0,
                "rx_v2": 32,
                "rx_v3": 0,
                "tx_v1": 0,
                "tx_v2": 0,
                "tx_v3": 0
            },
            "igmp_leaves": {
                "rx": 0,
                "tx": 0
            },
            "igmp_parse_errors": 0,
            "mld_queries": {
                "rx_v1": 33,
                "rx_v2": 0,
                "tx_v1": 0,
                "tx_v2": 0
            },
            "mld_reports": {
                "rx_v1": 66,
                "rx_v2": 2,
                "tx_v1": 0,
                "tx_v2": 0
            },
            "mld_leaves": {
                "rx": 0,
                "tx": 0
            },
            "mld_parse_errors": 0
        }
    } ]

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-15 13:58:09 -07:00
Nikolay Aleksandrov 8ff3d1d3a3 ip: xstats: add json output support
This adds only initial object support if json argument is specified.
Later patches convert the current xstats users to json.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-15 13:55:57 -07:00
Stephen Hemminger f36f8fe535 ipaddress: print error message on stderr
Convention is to print error messages only on stderr.
Helps when scripting.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-15 08:30:26 -07:00
Thomas Haller 546109a7cf iprule: fix printing hint about unresolved iifname and oifname
was displayed as

    10:     from all iif eth1 [detached] goto 10000unresolved proto mrt

now:

    10:     from all iif eth1 [detached] goto 10000 [unresolved] proto mrt

Fixes: 0dd4ccc56c ("iprule: add json support")

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-07 16:14:09 -08:00
David Ahern be029b3a58 Merge branch 'iproute2-master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-05 07:55:05 -08:00
Roopa Prabhu c5b176e5ba bridge: fdb: add support for src_vni option
We already print src_vni for a fdb entry when present.
This patch adds the ability to set src_vni on a fdb
entry. When not specified, kernel will use vni specified
on the vxlan device. This can be used on a vxlan fdb entry
when the vxlan device is in external or collect metadata
mode.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-05 07:52:34 -08:00
Petr Vorel 8b5f9338e5 man: Document COLORFGBG environment variable
Default colors are not contrast enough on dark backround
and this functionality, which uses more suitable colors
is hidden in the code.

Signed-off-by: Petr Vorel <pvorel@suse.cz>
2019-03-01 11:09:18 -08:00
Dmytro Linkin 2f103545a5 tc/pedit: Fix wrong pedit ipv6 structure id
Tc pedit action with more than two ip6 munge in a row cause infinite
loop.

Example:

$ tc filter add dev eth0 protocol ipv6 parent ffff: \
flower ip_proto sctp \
    action pedit ex \
        munge ip6 hoplimit set 0x1 \
        munge ip6 src set 2001:0db8:0:f101::1 \
        munge that cause infinite loop

The example command never returns, instead of failing with parse error
as expected. Pedit ipv6 structure has wrong id, which leads to the
creation linked list with one node in tc/m_pedit.c:get_pedit_kind(),
referring to itself. This node is created if command have two ip6 munge
in a row, and any third ip6 munge will cause infinite loop.
Changing this id from "ipv6" to "ip6" solves the problem.

Fixes: f3e1b2448a ("pedit: Introduce ipv6 support")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-01 11:05:00 -08:00
David Ahern 83a8b8d9f1 Merge branch 'devlink-health' into next
Aya Levin  says:

====================

This series adds support for devlink health commands:
 devlink health show     [ DEV reporter REPORTER_NAME ]
 devlink health recover    DEV reporter REPORTER_NAME
 devlink health diagnose   DEV reporter REPORTER_NAME
 devlink health dump show  DEV reporter REPORTER_NAME
 devlink health dump clear DEV reporter REPORTER_NAME
 devlink health set        DEV reporter REPORTER_NAME { grace_period | auto_recover } { msec | boolean }

The first patch refactors the validation of input parameters, which
grow way too long. Second and third patches fix bugs that were
discovered during the devlink health development. The forth patch adds
helper functions which enable output of value and labels separately.
Patches 5-10 add the devlink health functionality by command, the last
is the man page.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 08:00:19 -08:00
Aya Levin 3147e0d372 devlink: Add devlink-health man page
Add a man page describing devlink health's command set. Also add a
reference link from devlink main man page.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:56:21 -08:00
Aya Levin b18d89195b devlink: Add devlink health set command
Add devlink set command which enables the user to configure parameters
related to the devlink health mechanism per reporter.
1) grace_period [msec] time interval between auto recoveries.
2) auto_recover [true/false] whether the devlink should execute automatic
recover on error.
Add a helper function to retrieve a boolean value as an input parameter.
Example:
$ devlink health set pci/0000:00:09.0 reporter tx grace_period 3500
$ devlink health set pci/0000:00:09.0 reporter tx auto_recover false

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:56:16 -08:00
Aya Levin 04b583d0ff devlink: Add devlink health dump clear command
Add devlink dump clear command which deletes the last saved dump file.
Clearing the last saved dump enables a new dump file to be saved.
Example:
$ devlink health dump clear pci/0000:00:09.0 reporter tx

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:56:10 -08:00
Aya Levin 041e6e651a devlink: Add devlink health dump show command
Add devlink dump show command which displays the last saved dump.
Devlink health saves a single dump. If a dump is not already stored
by the devlink for this reporter, devlink generates a new dump. The dump
can be generated automatically when a reporter reports on an
error or manually by user's request.
The dump's output is defined by the reporter. The command uses the
infra structure for flexible format output introduced in previous patch.
Example:
$ devlink health dump show pci/0000:00:09.0 reporter tx

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:56:04 -08:00
Aya Levin 7b8baf834d devlink: Add devlink health diagnose command
Add devlink health diagnose command: enabling retrieval of diagnostics data
by the user on a reporter on a device. The command's output is a
free text defined by the reporter.

This patch also introduces an infra structure for flexible format
output. This allow the command to display different data fields
according to the reporter.
Example:
$ devlink health diagnose pci/0000:00:0a.0 reporter tx
SQs:
  sqn: 4403 HW state: 1 stopped: false
  sqn: 4408 HW state: 1 stopped: false
  sqn: 4413 HW state: 1 stopped: false
  sqn: 4418 HW state: 1 stopped: false
  sqn: 4423 HW state: 1 stopped: false

$ devlink health diagnose pci/0000:00:0a.0 reporter tx -jp
{
 "SQs":[
      {
       "sqn":4403,
       "HW state":1,
       "stopped":false
     },
      {
       "sqn":4408,
       "HW state":1,
       "stopped":false
     },
      {
       "sqn":4413,
       "HW state":1,
       "stopped":false
     },
      {
       "sqn":4418,
       "HW state":1,
       "stopped":false
     },
      {
       "sqn":4423,
       "HW state":1,
       "stopped":false
     }
   ]
}

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:55:57 -08:00
Aya Levin 9083a1344d devlink: Add devlink health recover command
Add devlink health recover command which enables the user to initiate a
recovery on a reporter (if a recovery cb was supplied by the reporter).
This operation will increment the recoveries counter displayed in the
show command.
Example:
$ devlink health recover pci/0000:00:09.0 reporter tx

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:55:50 -08:00
Aya Levin 2f1242efe9 devlink: Add devlink health show command
Add devlink health show command which displays status and configuration
info on a specific reporter on a device or dump the info on all
reporters on all devices. Add helper functions to display status and
dump's time stamp.
Example:
$ devlink health show pci/0000:00:09.0 reporter tx
pci/0000:00:09.0:
 name tx
  state healthy error 0 recover 1 last_dump_date 2019-02-14 last_dump_time 10:10:10 grace_period 600 auto_recover true
$ devlink health show pci/0000:00:09.0 reporter tx -jp
{
 "health":{
  "pci/0000:00:0a.0":[
     {
     "name":"tx",
     "state":"healthy",
     "error":0,
     "recover":1,
     "last_dump_date":"2019-Feb-14",
     "last_dump_time":"10:10:10",
     "grace_period":600,
     "auto_recover":true
    }
  ]
}

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:55:41 -08:00
Aya Levin 844a61764c devlink: Add helper functions for name and value separately
Add a new helper functions which outputs only values (without name
label) for different types: boolean, uint, uint64, string and binary.
In addition add a helper function which prints only the name label.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:55:35 -08:00
Aya Levin 8257e6c49c devlink: Fix boolean JSON print
This patch removes the inverted commas from boolean values in JSON
format: true/false instead of "true"/"false".

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:55:30 -08:00
Aya Levin 86648a1960 devlink: Fix print of uint64_t
This patch prints uint64_t with its corresponding format and avoid implicit
cast to uint32_t.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:54:32 -08:00
Aya Levin ae72e65518 devlink: Refactor validation of finding required arguments
Introducing argument's metadata structure matching a bitmap flag per
required argument and an error message if missing. Using this static
array to refactor validation of finding required arguments in devlink
command line and to ease further maintenance.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:53:36 -08:00
Leon Romanovsky 359f69f76f rdma: Add the prefix for driver attributes
There is a need to distinguish between driver vs. general exposed
attributes. The most common use case is to expose some internal
garbage under extremely common and sexy name, e.g. pi, ci e.t.c

In order to achieve that, we will add "drv_" prefix to all strings
which were received through RDMA_NLDEV_ATTR_DRIVER_* attributes.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>a
Tested-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-27 08:25:47 -08:00
Jakub Kicinski d326d79c22 devlink: add support for updating device flash
Add new command for updating flash of devices via devlink API.
Example:

$ cp flash-boot.bin /lib/firmware/
$ devlink dev flash pci/0000:05:00.0 file flash-boot.bin

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-27 08:25:14 -08:00
David Ahern 41fda879a1 Update kernel headers
Update kernel headers to commit:
    ff8285f81822 ("net: sched: pie: fix 64-bit division")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-27 08:23:22 -08:00
David Ahern 65a94784fb Merge branch 'rdma-object-ids' into next
Leon Romanovsky says:

====================

This series adds ability to present and query all known to rdmatool
object by their respective, unique IDs (e.g. pdn. mrn, cqn e.t.c).
All objects which have "parent" object has this information too.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:13:21 -08:00
Leon Romanovsky 78728b7ee0 rdma: Provide and reuse filter functions
Globally replace all filter function in safer variants of those
is_filtered functions, which take into account the availability/lack
of netlink attributes.

Such conversion allowed to fix a number of places in the code, where
the previous implementation didn't honor filter requests if netlink
attribute wasn't present.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:09:24 -08:00
Leon Romanovsky 5a823593d6 rdma: Perform single .doit call to query specific objects
If user provides specific index, we can speedup query
by using .doit callback and save full dump and filtering
after that.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:09:15 -08:00
Leon Romanovsky 127ff95610 rdma: Unify netlink attribute checks prior to prints
Place check if netlink attribute available in general place,
instead of doing the same check in many paces.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:09:09 -08:00
Leon Romanovsky 6da9d2517c rdma: Move QP code to separate function
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:09:05 -08:00
Leon Romanovsky f9a73796d1 rdma: Place PD parsing print routine into separate function
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:09:00 -08:00
Leon Romanovsky 46695227d6 rdma: Move MR code to be suitable for per-line parsing
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:07:46 -08:00
Leon Romanovsky 83ea72289e rdma: Refactor CQ prints
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:07:40 -08:00
Leon Romanovsky 7d06b31f0e rdma: Simplify CM_ID print code
Refactor our the CM_ID print code.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:07:33 -08:00
Leon Romanovsky 05846c9cd3 rdma: Simplify code to reuse existing functions
Remove duplicated functions in favour general res_print_uint() call.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:07:28 -08:00
Leon Romanovsky 835d83216b rdma: Properly mark RDMAtool license
RDMA subsystem is dual-licensed with "GPL-2.0 OR Linux-OpenIB" proper
license and Mellanox submission are supposed to have this type of license.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:07:19 -08:00
Leon Romanovsky 687daf98f9 rdma: Move resource QP logic to separate file
Logically separate resource QP logic to separate file,
in order to make PD specific logic self-contained.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:06:56 -08:00
Leon Romanovsky 438fac3a25 rdma: Move out resource CM-ID logic to separate file
Logically separate resource CM-ID logic to separate file,
in order to make CM-ID specific logic self-contained.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:06:48 -08:00
Leon Romanovsky fcdd2e0c68 rdma: Move out resource CQ logic to separate file
Logically separate resource CQ logic to separate file,
in order to make CQ specific logic self-contained.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:06:40 -08:00
Leon Romanovsky 42ed283e4a rdma: Refactor out resource MR logic to separate file
Logically separate resource MR logic to separate file,
in order to make MR specific logic self-contained.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:06:25 -08:00
Leon Romanovsky cc6131276c rdma: Move resource PD logic to separate file
Logically separate resource PD logic to separate file,
in order to make PD specific logic self-contained.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:06:12 -08:00
Leon Romanovsky 96f59e7fdc rdma: Provide parent context index for all objects except CM_ID
Allow users to correlate allocated object with relevant parent

[leonro@server ~]$ rdma res show pd
dev mlx5_0 users 5 pid 0 comm [ib_core] pdn 1
dev mlx5_0 users 7 pid 0 comm [ib_ipoib] pdn 2
dev mlx5_0 users 0 pid 0 comm [mlx5_ib] pdn 3
dev mlx5_0 users 2 pid 548 comm ibv_rc_pingpong ctxn 0 pdn 4

[leonro@server ~]$ rdma res show cq cqn 0-100
dev mlx5_0 cqe 2047 users 6 poll-ctx UNBOUND_WORKQUEUE pid 0 comm [ib_core] cqn 2
dev mlx5_0 cqe 255 users 2 poll-ctx SOFTIRQ pid 0 comm [mlx5_ib] cqn 3
dev mlx5_0 cqe 511 users 1 poll-ctx DIRECT pid 0 comm [ib_ipoib] cqn 4
dev mlx5_0 cqe 255 users 1 poll-ctx DIRECT pid 0 comm [ib_ipoib] cqn 5
dev mlx5_0 cqe 255 users 0 poll-ctx SOFTIRQ pid 0 comm [mlx5_ib] cqn 6
dev mlx5_0 cqe 511 users 2 pid 548 comm ibv_rc_pingpong cqn 7 ctxn 0

[leonro@server ~]$ rdma res show mr
dev mlx5_0 mrlen 4096 pid 548 comm ibv_rc_pingpong mrn 4 pdn 0

[leonro@nps-server-14-015 ~]$ /images/leonro/src/iproute2/rdma/rdma res show qp
link mlx5_0/1 lqpn 0 type SMI state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_0/1 lqpn 1 type GSI state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_0/1 lqpn 7 type UD state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_0/1 lqpn 8 type UD state RTS sq-psn 0 pid 0 comm [ib_ipoib]
link mlx5_0/1 lqpn 9 pdn 4 rqpn 0 type RC state INIT rq-psn 0 sq-psn 0 path-mig-state MIGRATED pid 548 comm ibv_rc_pingpong

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:05:43 -08:00
Leon Romanovsky 1dc035865d rdma: Provide unique indexes for all visible objects
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:04:02 -08:00
Leon Romanovsky beac6a3990 rdma: Remove duplicated print code
There is no need to keep same print functions for
uint32_t and uint64_t, unify them into one function.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:03:58 -08:00
Leon Romanovsky a985cc06bd rdma: update uapi headers
Update rdma_netlink.h file upto kernel commit
f2a0e45f36b0 RDMA/nldev: Don't expose number of not-visible entries

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:03:14 -08:00
David Ahern 55870dfe7f Improve batch and dump times by caching link lookups
ip route uses ll_name_to_index and ll_index_to_name to convert between
device names and indices. At the moment both use for the ioctl based glibc
functions if_nametoindex and if_indextoname and does not cache the result.
When using a batch file or dumping large number of routes this means the
same device lookups can be done repeatedly adding unnecessary overhead
(socket + ioctl + close for each device lookup).

Add a new function, ll_link_get, to send a netlink based RTM_GETLINK. If
successful, cache the result in idx_head and name_head so future lookups
can re-use the entry. Update ll_name_to_index and ll_index_to_name to use
ll_link_get and only fallback to the glibc functions if it fails.

With this change the time to install 720,022 routes with 2 ecmp nexthops
where the nexthop device is given is reduced from 31.4 seconds to 19.2
seconds. A dump of those routes drops from 13.3 to 2.8 seconds.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-22 18:51:20 -08:00
David Ahern db1aafd883 ip link: Drop cache entry on any changes
Remove any entry from the link cache when the link is modified.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-22 18:51:18 -08:00
David Ahern 25c6339b22 ll_map: Add function to remove link cache entry by index
Add ll_drop_by_index to remove an entry from the link cache.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-22 18:51:15 -08:00
David Ahern 9f78e995a8 Merge branch 'iproute2-master' into next
Conflicts:
	misc/ss.c

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-22 18:50:39 -08:00
Stefano Brivio aa5bd6a252 ss: Render buffer to output every time a number of chunks are allocated
Eric reported that, with 10 million sockets, ss -emoi (about 1000 bytes
output per socket) can easily lead to OOM (buffer would grow to 10GB of
memory).

Limit the maximum size of the buffer to five chunks, 1M each. Render and
flush buffers whenever we reach that.

This might make the resulting blocks slightly unaligned between them, with
occasional loss of readability on lines occurring every 5k to 50k sockets
approximately. Something like (from ss -tu):

[...]
CLOSE-WAIT   32       0           192.168.1.50:35232           10.0.0.1:https
ESTAB        0        0           192.168.1.50:53820           10.0.0.1:https
ESTAB       0        0           192.168.1.50:46924            10.0.0.1:https
CLOSE-WAIT  32       0           192.168.1.50:35228            10.0.0.1:https
[...]

However, I don't actually expect any human user to scroll through that
amount of sockets, so readability should be preserved when it matters.

The bulk of the diffstat comes from moving field_next() around, as we now
call render() from it. Functionally, this is implemented by six lines of
code, most of them in field_next().

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Fixes: 691bd854bf ("ss: Buffer raw fields first, then render them as a table")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-21 14:45:45 -08:00
Thomas De Schampheleire 9700927a00 ss: fix compilation under glibc < 2.18
Commit c759116a0b introduced support for
AF_VSOCK. This define is only provided since glibc version 2.18, so
compilation fails when using older toolchains.

Provide the necessary definitions if needed.

Signed-off-by: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-21 14:40:52 -08:00
Stephen Hemminger 6f618a6a82 uapi: update inet_diag_info.h
Upstream changes.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-21 14:24:07 -08:00
Vivien Didelot 02723cf230 bridge: make mcast_flood description consistent
This patch simply changes the description of the mcast_flood flag
with "flood" instead of "be flooded with" to avoid confusion, and be
consistent with the description of the flooding flag, which "Controls
whether a given port will *flood* unicast traffic for which there is
no FDB entry."

At the same time, fix the documentation for the "flood" flag which
is incorrectly described as "flooding on" or "flooding off".

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-21 14:22:05 -08:00
Jiri Pirko 0e7e181945 devlink: relax dpipe table show dependency on resources
Dpipe table show command has a depencency on getting resources.
If resource get command is not supported by the driver, dpipe table
show fails. However, resource is only additional information
in dpipe table show output. So relax the dependency and let
the dpipe tables be shown even if resources get command fails.

Fixes: ead180274c ("devlink: Add support for resource/dpipe relation")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-21 14:21:19 -08:00
Phil Sutter d7cf2416fc ip-address: Use correct max attribute value in print_vf_stats64()
IFLA_VF_MAX is larger than the highest valid index in vf array.

Fixes: a1b99717c7 ("Add displaying VF traffic statistics")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-21 14:16:08 -08:00
Thomas Haller f5f8e96953 ip-rule: fix json key "to_tbl" for unspecific rule action
The key should not be called "to_tbl" because it is exactly
not a FR_ACT_TO_TBL action. Change it to "action".

    # ip rule add blackhole
    # ip -j rule | python -m json.tool
    ...
    {
        "priority": 0,
        "src": "all",
        "to_tbl": "blackhole"
    },

This is an API break of JSON output as it was added in v4.17.0.
Still change it as the API is relatively new and unstable.

Fixes: 0dd4ccc56c ("iprule: add json support")

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-19 15:21:06 -08:00
Luca Boccassi c2f9dc14c4 ip route: get: allow zero-length subnet mask
A /0 subnet mask is theoretically valid, but ip route get doesn't allow
it:

$ ip route get 1.0.0.0/0
need at least a destination address

Change the check and remember whether we found an address or not, since
according to the documentation it's a mandatory parameter.

$ ip/ip route get 1.0.0.0/0
1.0.0.0 via 192.168.1.1 dev eth0 src 192.168.1.91 uid 1000
    cache

Reported-by: Clément Hertling <wxcafe@wxcafe.net>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-19 15:19:31 -08:00
Matteo Croce 619765fe14 iplink: document XDP subcommand to force the XDP mode.
When attaching an eBPF program to a device, ip link can force the XDP mode
by using the xdp{generic,drv,offload} keyword instead of just 'xdp'.
Document this behaviour also in the help output.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Fixes: 14683814 ("bpf: add xdpdrv for requesting XDP driver mode")
Fixes: 1b5e8094 ("bpf: allow requesting XDP HW offload")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-13 14:02:44 -08:00
Konstantin Khlebnikov 0f3f0ca3a2 ss: add option --tos for requesting ipv4 tos and ipv6 tclass
Also show socket class_id/priority used by classful qdisc.
Kernel report this together with tclass since commit
("inet_diag: fix reporting cgroup classid and fallback to priority")

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-13 13:59:11 -08:00
Eric Dumazet bb5ae621d0 lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg()
In the past, we tried to increase the buffer size up to 32 KB in order
to reduce number of syscalls per dump.

Commit 2d34851cd3 ("lib/libnetlink: re malloc buff if size is not enough")
brought the size back to 4KB because the kernel can not know the application
is ready to receive bigger requests.

See kernel commits 9063e21fb026 ("netlink: autosize skb lengthes") and
d35c99ff77ec ("netlink: do not enter direct reclaim from netlink_dump()")
for more details.

Fixes: 2d34851cd3 ("lib/libnetlink: re malloc buff if size is not enough")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hangbin Liu <liuhangbin@gmail.com>
Cc: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-13 13:51:44 -08:00
Davide Caratti ca81444303 use print_{,h}hu instead of print_uint when format specifier is %{,h}hu
in this way, a useless cast to unsigned int is avoided in bpf_print_ops()
and print_tunnel().

Tested with:
 # ./tdc.py -c bpf

Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Cc: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-10 19:00:59 -08:00
Marcos Antonio Moraes 9e46c5c206 tc: use bits not mbits/sec in rate percent
As /sys/class/net/<iface>/speed indicates a value in Mbits/sec, the
conversion is necessary to create the correct limits.

This guarantees the same result for the following commands in an
1000Mbit/sec device:

tc class add ... htb rate 500Mbit
tc class add ... htb rate 50%

Fixes: 927e3cfb52 ("tc: B.W limits can now be specified in %.")
Signed-off-by: Marcos Antonio Moraes <marcos.antonio@digirati.com.br>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-08 09:59:45 -08:00
Stephen Hemminger 817204d0b0 tc: avoid problems with hard coded rate string length
The parse_percent_rate function assumed the buffer was 20 characters.
Better to pass length in case the size ever changes.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-06 10:49:47 -08:00
Stephen Hemminger 2d603d55a8 tc: fix memory leak in error path
If value passed to parse_percent was not valid, it would
leak the dynamic allocation from sscanf.

Fixes: 927e3cfb52 ("tc: B.W limits can now be specified in %.")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-06 10:41:58 -08:00
Jakub Kicinski 05bc89e95e devlink: add info subcommand
Add support for reading the device serial number, driver name
and various versions.  Example:

$ devlink dev info pci/0000:82:00.0
pci/0000:82:00.0:
  driver nfp
  serial_number 16240145
  versions:
      fixed:
        board.id AMDA0081-0001
        board.rev 15
        board.vendor SMA
        board.model hydrogen
      running:
        fw.mgmt 010181.010181.0101d4
        fw.cpld 0x1030000
        fw.app abm-d372b6
        fw.undi 0.0.2
        chip.init AMDA-0081-0001  20160318164536
      stored:
        fw.mgmt 010181.010181.0101d4
        fw.app abm-d372b6
        fw.undi 0.0.2
        chip.init AMDA-0081-0001  20160318164536

$ devlink -jp dev info pci/0000:82:00.0
{
    "info": {
        "pci/0000:82:00.0": {
            "driver": "nfp",
            "serial_number": "16240145",
            "versions": {
                "fixed": {
                    "board.id": "AMDA0081-0001",
                    "board.rev": "15",
                    "board.vendor": "SMA",
                    "board.model": "hydrogen"
                },
                "running": {
                    "fw.mgmt": "010181.010181.0101d4",
                    "fw.cpld": "0x1030000",
                    "fw.app": "abm-d372b6",
                    "fw.undi": "0.0.2",
                    "chip.init": "AMDA-0081-0001  20160318164536"
                },
                "stored": {
                    "fw.mgmt": "010181.010181.0101d4",
                    "fw.app": "abm-d372b6",
                    "fw.undi": "0.0.2",
                    "chip.init": "AMDA-0081-0001  20160318164536"
                }
            }
        }
    }
}

v5:
 - remove spurious new line.
v4:
 - more commit message improvements.
v3:
 - show up-to-date output in the commit message.
v2 (Jiri):
 - remove filtering;
 - add example in the commit message.
RFCv2:
 - make info subcommand of dev.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-06 08:46:50 -08:00
Jakub Kicinski 9d886c6609 devlink: report cell size
Print the value of DEVLINK_ATTR_SB_POOL_CELL_SIZE, if reported.

Example:
pci/0000:82:00.0:
  sb 1 pool 0 type egress size 40945664 thtype static cell_size 2048
  sb 2 pool 0 type egress size 258867200 thtype static cell_size 10240
...

v3: - don't double space.
v2: - fix spelling.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-06 08:46:22 -08:00
David Ahern 6b2d60bdfc Update kernel headers
Update kernel headers to commit:
bfbae2eafe05 ("Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-06 08:45:41 -08:00
Yonghong Song 3da6d055d9 bpf: add btf func and func_proto kind support
The issue is discovered for bpf selftest test_skb_cgroup.sh.
Currently we have,
  $ ./test_skb_cgroup_id.sh
  Wait for testing link-local IP to become available ... OK
  Object has unknown BTF type: 13!
  [PASS]

In the above the BTF type 13 refers to BTF kind
BTF_KIND_FUNC_PROTO.
This patch added support of BTF_KIND_FUNC_PROTO and
BTF_KIND_FUNC during type parsing.
With this patch, I got
  $ ./test_skb_cgroup_id.sh
  Wait for testing link-local IP to become available ... OK
  [PASS]

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-05 15:29:20 -08:00
Ido Schimmel 264be1d887 bridge: fdb: Fix FDB dump with strict checking disabled
While iproute2 correctly uses ifinfomsg struct as the ancillary header
when requesting an FDB dump on old kernels, it sets the message type to
RTM_GETLINK. This results in wrong reply being returned.

Fix this by using RTM_GETNEIGH instead.

Before:
$ bridge fdb show brport dummy0
Not RTM_NEWNEIGH: 00000158 00000010 00000002

After:
$ bridge fdb show brport dummy0
2a:0b:41:1c:92:d3 vlan 1 master br0 permanent
2a:0b:41:1c:92:d3 master br0 permanent
33:33:00:00:00:01 self permanent
01:00:5e:00:00:01 self permanent

Fixes: 05880354c2 ("bridge: fdb: Fix filtering with strict checking disabled")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: LiLiang <liali@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-05 15:27:28 -08:00
Chris Mi 17ed56fdf3 libnetlink: linkdump_req: AF_PACKET family also expects ext_filter_mask
Without this fix, the VF info can't be showed using command
"ip link".

146: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 24:8a:07:ad:78:52 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 02:25:d0:12:01:01, spoof checking off, link-state auto, trust off, query_rss off
    vf 1 MAC 02:25:d0:12:01:02, spoof checking off, link-state auto, trust off, query_rss off

Fixes: d97b16b2c9 ("libnetlink: linkdump_req: Only AF_UNSPEC family expects an ext_filter_mask")

Signed-off-by: Chris Mi <chrism@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-05 15:25:43 -08:00
Davide Caratti e8a3d76919 tc: add 'kind' property to 'csum' action
unlike other TC actions already supporting JSON printout, 'csum' does not
print the value of TCA_KIND in the 'kind' property: remove 'csum' word
from 'csum' property, and add a separate 'kind' property containing the
action name. The human-readable printout is preserved.

Tested with:
 # ./tdc.py -c csum

Cc: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-03 09:10:38 -08:00
Davide Caratti 52d57f6bbd tc: full JSON support for 'bpf' actions
Add full JSON output support in the dump of 'act_bpf'.

Example using eBPF:

 # tc actions flush action bpf
 # tc action add action bpf object bpf/action.o section 'action-ok'
 # tc -j action list action bpf | jq
 [
   {
     "total acts": 1
   },
   {
     "actions": [
       {
         "order": 0,
         "kind": "bpf",
         "bpf_name": "action.o:[action-ok]",
         "prog": {
           "id": 33,
           "tag": "a04f5eef06a7f555",
           "jited": 1
         },
         "control_action": {
           "type": "pipe"
         },
         "index": 1,
         "ref": 1,
         "bind": 0
       }
     ]
   }
 ]

Example using cBPF:

 # tc actions flush action bpf
 # a=$(mktemp)
 # tcpdump -ddd not ether proto 0x888e >$a
 # tc action add action bpf bytecode-file $a index 42
 # rm $a
 # tc -j action list action bpf | jq
 [
   {
     "total acts": 1
   },
   {
     "actions": [
       {
         "order": 0,
         "kind": "bpf",
         "bytecode": {
           "length": 4,
           "insns": [
             {
               "code": 40,
               "jt": 0,
               "jf": 0,
               "k": 12
             },
             {
               "code": 21,
               "jt": 0,
               "jf": 1,
               "k": 34958
             },
             {
               "code": 6,
               "jt": 0,
               "jf": 0,
               "k": 0
             },
             {
               "code": 6,
               "jt": 0,
               "jf": 0,
               "k": 262144
             }
           ]
         },
         "control_action": {
           "type": "pipe"
         },
         "index": 42,
         "ref": 1,
         "bind": 0
       }
     ]
   }
 ]

Tested with:
 # ./tdc.py -c bpf

Cc: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-03 09:10:10 -08:00
Björn Töpel 2abc3d76e3 ss: add AF_XDP support
AF_XDP is an address family that is optimized for high performance
packet processing.

This patch adds AF_XDP support to ss(8) so that sockets can be queried
and monitored.

Example:
$ sudo ss --xdp -e -p -m
Recv-Q      Send-Q           Local Address:Port             Peer Address:Port

0           0                   enp134s0f0:q20                          *
 users:(("xdpsock",pid=17787,fd=3)) ino:39424 sk:4
        rx(entries:2048)
        tx(entries:2048)
        umem(id:1,size:8388608,num_pages:2048,chunk_size:2048,headroom:0,ifindex:7,
qid:20,zc:0,refs:1)
        fr(entries:2048)
        cr(entries:2048) skmem:(r0,rb212992,t0,tb212992,f0,w0,o0,bl0,d0)
0           0                    enp24s0f0:q0                           *
 users:(("xdpsock",pid=17780,fd=3)) ino:37384 sk:5
        rx(entries:2048)
        tx(entries:2048)
        umem(id:0,size:8388608,num_pages:2048,chunk_size:2048,headroom:0,ifindex:6,
qid:0,zc:1,refs:1)
        fr(entries:2048)
        cr(entries:2048) skmem:(r0,rb212992,t0,tb212992,f0,w0,o0,bl0,d0)

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-30 20:57:45 -08:00
David Ahern f79b7733b4 Update kernel headers and add xdp_diag.h
Update kernel headers to commit:
c829f5f52db9 ("cxgb4: cxgb4_tc_u32: use struct_size() in kvzalloc()")

and import xdp_diag.h for the next patch.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-29 18:34:40 -08:00
Matteo Croce e3dbcb2a12 netns: add subcommand to attach an existing network namespace
ip tracks namespaces with dummy files in /var/run/netns/, but can't see
namespaces created with other tools.
Creating the dummy file and bind mounting the correct procfs entry will
make ip aware of that namespace.
Add an ip netns subcommand to automate this task.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Reviewed-by: Andrea Claudi <aclaudi@redhat.com>
Tested-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-29 18:18:03 -08:00
Stephen Hemminger 6f1940da8e tc: replace left side comparison
The kernel (and iproute2) don't use the if (NULL == x) style
and instead prefer if (!x)

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-28 08:51:03 -08:00
Hans Dedecker 2874714662 f_flower: fix build with musl libc
XATTR_SIZE_MAX requires the usage of linux/limits.h; let's include it

Signed-off-by: Hans Dedecker <dedeckeh@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-01-25 09:20:03 +13:00
wenxu 3d65cefbef iproute: Set ip/ip6 lwtunnel flags
ip l add dev tun type gretap external
ip r a 10.0.0.1 encap ip dst 192.168.152.171 id 1000 dev gretap

For gretap example when the command set the id but don't set the
TUNNEL_KEY flags. There is no key field in the send packet

User can set flags with key, csum, seq
ip r a 10.0.0.1 encap ip dst 192.168.152.171 id 1000 key csum dev gretap

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-01-25 09:17:27 +13:00
David Ahern b45664e064 Merge 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-22 08:30:38 -08:00
Jakub Kicinski 8513f4a926 ip route: get: only set RTM_F_LOOKUP_TABLE flag for IPv4
Kernel ignores the RTM_F_LOOKUP_TABLE flag for all families
but IPv4.  Don't set it, otherwise it may fall foul of
strict checking policies.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-01-22 16:04:13 +13:00
Adi Nissim dc0332b1e8 tc: m_tunnel_key: Allow key-less tunnels
Change the id parameter of the tunnel_key set action from mandatory to
optional.

Some tunneling protocols (e.g. GRE) specify the id as an optional field.

Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-01-22 16:04:07 +13:00
Stephen Hemminger 3bc2dc7668 uapi: in.h change
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-01-22 16:03:31 +13:00
Benedict Wong a6af9f2e61 xfrm: add option to hide keys in state output
ip xfrm state show currently dumps keys unconditionally. This limits its
use in logging, as security information can be leaked.

This patch adds a nokeys option to ip xfrm ( state show | monitor ), which
prevents the printing of keys. This allows ip xfrm state show to be used
in logging without exposing keys.

Signed-off-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-21 08:31:20 -08:00
Cong Wang b0ca46a1f8 tc: add hit counter for matchall
Cc: Martin Olsson <martin.olsson+netdev@sentorsecurity.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-21 08:30:07 -08:00
David Ahern dad02ef478 Update kernel headers
Update kernel headers to commit
28f9d1a3d4fe ("Merge branch 'mlxsw-spectrum_router-Add-GRE-tunnel-support-for-Spectrum-2'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-21 08:29:26 -08:00
Leon Romanovsky b058f969df rdma: Add unbound workqueue to list of poll context types
Kernel commit f794809a7259 ("IB/core: Add an unbound WQ type to the new CQ API")
added new CQ poll context type, reflect this change in rdmatool.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-21 08:19:44 -08:00
Leon Romanovsky 54a0ade6d4 clang-format: add configuration file
The codebase of iproute2 follows Linux kernel coding style,
so it will be very helpful to reuse existing clang configuration
file to reliably format code.

For more information see kernel commit d4ef8d3ff005
("clang-format: add configuration file").

Updated upto commit v5.0-rc1 with small number of ForEachMacros.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-17 13:38:23 -08:00
Luca Boccassi 0cf061183e Makefile: check manpages for syntax errors
Pass the same parameters Lintian uses in Debian.

$ make check
<...>
Checking manpages for syntax errors...
<standard input>:48: warning: macro `Q' not defined
Error in tc-taprio.8
Makefile:27: recipe for target 'check' failed

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-01-14 08:01:51 -08:00
Luca Boccassi 8242808ced man: tc-taprio.8: fix syntax error
.Q does not exist so groff complains and the "queues" word is actually
not displayed.

Fixes: 579acb4bc5 ("taprio: Add manpage for tc-taprio(8)")

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-01-14 08:01:51 -08:00
Luca Boccassi cffeeb3946 man: ss.8: more line breaks
groff stiff complains about unbreakable lines:
  96: warning [p 2, 3.0i]: can't break line

Indent it some more.

Fixes: 7f5047524c ("man: ss.8: break and indent long line")

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-01-14 08:01:51 -08:00
David Ahern 3d14706e54 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-07 16:30:13 -08:00
Dmitry V. Levin db4ad742e1 configure: fix typo in check_xt_old_internal_h
Fixes: 377a09902a ("configure: Minor code cleanup")
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-01-07 14:42:01 -08:00
Stephen Hemminger 80e5ddec14 rdma: update uapi headers
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-01-07 11:41:39 -08:00
Stephen Hemminger e1ccc46bdd uapi: update headers from 4.21-rc1
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2019-01-07 11:39:26 -08:00
Stephen Hemminger 724ec5aeb0 Merge ../iproute2-next 2019-01-07 11:36:41 -08:00
David Ahern 97b44d571d libnetlink: linkdump_req is done for AF_BRIDGE as well
The bridge command 'vlan show' calls rtnl_linkdump_req_filter for
family AF_BRIDGE. Update rtnl_linkdump_req_filter to send the filter
for that family as well.

Fixes: d97b16b2c9 ("libnetlink: linkdump_req: Only AF_UNSPEC family expects an ext_filter_mask")
Reported-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
2019-01-07 08:36:58 -08:00
David Ahern dfa2c3787f Merge branch 'iproute2-master' into iproute2-next
Conflicts:
	ip/iprule.c

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-04 12:22:47 -08:00
David Ahern 6267d9533b Merge branch 'strict-updates' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-04 12:19:37 -08:00
David Ahern 05880354c2 bridge: fdb: Fix filtering with strict checking disabled
Older kernels expect an ifinfomsg struct as the ancillary header, and
after kernel commit bd961c9bc664 ("rtnetlink: fix rtnl_fdb_dump() for ndmsg
header") can handle either ifinfomsg or ndmsg. Strict data checking only
allows ndmsg.

Use the new RTNL_HANDLE_F_STRICT_CHK flag to know which header to send.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
2019-01-04 12:17:19 -08:00
David Ahern 285033bfeb libnetlink: Add RTNL_HANDLE_F_STRICT_CHK flag
Add RTNL_HANDLE_F_STRICT_CHK flag and set in rth flags to let know
commands know if the kernel supports strict checking.

Extracted from patch from Ido to fix filtering with strict checking
enabled.

Cc: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-04 12:17:17 -08:00
David Ahern 66b4199f22 bridge: Update fdb show to use rtnl_neighdump_req
Add fdb_dump_filter to set filter attributes in dump request
and convert fdb_show to use rtnl_neighdump_req.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-04 12:17:15 -08:00
David Ahern 101ec10a76 ip neigh: Convert do_show_or_flush to use rtnl_neighdump_req
Add ipneigh_dump_filter to add filter attributes to the neighbor
dump request and update do_show_or_flush to use rtnl_neighdump_req.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-04 12:17:13 -08:00
David Ahern f255ab1225 libnetlink: Add filter function to rtnl_neighdump_req
Add filter function to rtnl_neighdump_req and a buffer to the
request for the filter functions to append attributes.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-04 12:17:11 -08:00
Ido Schimmel 66e8e73edc bridge: fdb: Use 'struct ndmsg' for FDB dumping
Since commit aea41afcfd ("ip bridge: Set NETLINK_GET_STRICT_CHK on
socket") iproute2 uses strict checking on kernels that support it. This
causes FDB dumping to fail [1], as iproute2 uses 'struct ifinfomsg'
whereas the kernel expects 'struct ndmsg'.

Note that with this change iproute2 continues to work on old kernels
that do not support strict checking, but contain the fix introduced in
kernel commit bd961c9bc664 ("rtnetlink: fix rtnl_fdb_dump() for ndmsg
header").

[1]
# bridge fdb show
[ 5365.137224] netlink: 4 bytes leftover after parsing attributes in process `bridge'.
Error: bytes leftover after parsing attributes.
Dump terminated

Fixes: aea41afcfd ("ip bridge: Set NETLINK_GET_STRICT_CHK on socket")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-30 16:56:34 -08:00
Michael Guralnik 40fc8c2cec rdma: Add print of link CapabilityMask2 flags
CapabilityMask2 is defined in IBTA spec as a member of PortInfo.
Add translation to string of new CapabilityMask2 expansion of link caps.

The flags are concatenated to current caps print as seen in this example
printing EXT_INFO flag:

root@server-22 $ rdma -d link
1/1: mlx5_0/1: subnet_prefix fe80:0000:0000:0000 lid 2 sm_lid 2 lmc 0
	state ACTIVE physical_state LINK_UP
caps: <SM, TRAP, SL_MAP, SYS_IMAGE_GUID, CABLE_INFO, EXTENDED_SPEEDS,
	CAP_MASK2, CM, DEVICE_MGMT, VENDOR_CLASS, CAP_MASK_NOTICE,
	CLIENT_REG, OTHER_LOCAL_CHANGES, MULT_PKER_TRAP, EXT_INFO>

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:41:21 -08:00
David Ahern 0c187b7f24 Merge branch 'strict-dumps' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:37:44 -08:00
David Ahern 6b83edc061 neighbor: Add support for protocol attribute
Add support to set protocol on neigh entries and to print the protocol
on dumps.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:37:12 -08:00
David Ahern 8d4f35de17 ip route: Rename do_ipv6 to dump_family
do_ipv6 is really the preferred dump family. Rename it to make
that apparent.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:36:51 -08:00
David Ahern aea41afcfd ip bridge: Set NETLINK_GET_STRICT_CHK on socket
iproute2 has been updated for the new strict policy in the kernel. Add a
helper to call setsockopt to enable the feature. Add a call to ip.c and
bridge.c

The setsockopt fails on older kernels and the error can be safely ignored
- any new fields or attributes are ignored by the older kernel.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:36:29 -08:00
David Ahern 8847097850 ip address: Set device index in dump request
Add a filter function to rtnl_addrdump_req to set device index in the
address dump request if the user is filtering addresses by device. In
addition, add a new ipaddr_link_get to do a single RTM_GETLINK request
instead of a device dump yet still store the data in the linfo list.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:35:49 -08:00
David Ahern 7ca9cee8d8 ip address: Split ip_linkaddr_list into link and addr functions
Split ip_linkaddr_list into one function that generates a list of devices
and a second that generates the list of addresses.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:35:14 -08:00
David Ahern e41ede8939 mroute: Add table id attribute for kernel side filtering
Similar to 'ip route' add the table id to the dump request for
kernel side filtering if it is supported.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:34:50 -08:00
David Ahern 98ce99273f mroute: fix up family handling
Only ipv4 and ipv6 have multicast routing. Set family
accordingly and just return for other cases.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:34:28 -08:00
David Ahern c7e6371bc4 ip route: Add protocol, table id and device to dump request
Add protocol, table id and device to dump request if set in filter. If
kernel side filtering is supported it is used to reduce the amount of
data sent to userspace.

Older kernels do not parse attributes on a route dump request, so these
are silently ignored and ip will do the filtering in userspace.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:33:59 -08:00
David Ahern 43fd93ae46 ip route: Remove rtnl_rtcache_request
Add a filter option to rtnl_routedump_req and use it to set rtm_flags
removing the need for rtnl_rtcache_request for dump requests.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:33:34 -08:00
David Ahern d97b16b2c9 libnetlink: linkdump_req: Only AF_UNSPEC family expects an ext_filter_mask
Only AF_UNSPEC handled by rtnl_dump_ifinfo expects an ext_filter_mask
on a dump request. Update the linkdump request functions to only set
and send ext_filter_mask for AF_UNSPEC.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:33:05 -08:00
David Ahern 92e03242c4 libnetlink: Use NLMSG_LENGTH to set nlmsg_len
Change nlmsg_len from sizeof(req) to use NLMSG_LENGTH on the header.
2 of the inner headers are not 4-byte aligned, so add a 0-length buf
after the header with the __aligned(NLMSG_ALIGNTO) to ensure the size
of the request is large enough. Use NLMSG_ALIGN in NLMSG_LENGTH to set
nlmsg_len.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:32:57 -08:00
David Ahern 2750252d7e libnetlink: dump extack string in done message
Print any extack message that has been appended to a NLMSG_DONE message.
To avoid duplication, move the existing print code to a new helper.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:32:31 -08:00
David Ahern fdce94d0d1 Update kernel headers
Update kernel headers to commit
ce28bb445388 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-22 07:36:52 -08:00
David Ahern 17689d3075 Update kernel headers
Update kernel headers to commit
   055722716c39 ("tipc: fix uninitialized value for broadcast retransmission")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-19 12:47:29 -08:00
Stephen Hemminger db9af1f1e3 testsuite: drop unrunnable test
The classifier testbed test never worked and was always being
skipped. It depended on some files it tests/cls which never made
it into the iproute2 git repository.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-19 12:10:36 -08:00
Stephen Hemminger e5cd5a51f9 doc: remove trailing whitespace
Run whitespace scrubbing script to remove unnecessary trailing
blanks at end of line and end of file.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-19 12:02:38 -08:00
David Ahern 6065ddfaa7 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-19 12:02:17 -08:00
Stephen Hemminger 738aebe52b drop support for DECnet
DECnet belongs in the history museum of dead protocols along
with Appletalk and IPX.

Linux support has outlived its natural life and the time has
come to remove it from iproute2. Dead code is a source
of bugs and exploits.

If anyone actually has DECnet running on some old distribution
they can just keep to the old version of iproute2.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-13 12:50:01 -08:00
David Ahern fbe7da2306 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-07 13:02:08 -08:00
Shalom Toledo a463fd4fa4 devlink: Add support for 'fw_load_policy' generic parameter
Add string to uint conversion for 'fw_load_policy' generic parameter.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-07 13:00:40 -08:00
Shalom Toledo 2557dca2b0 devlink: Add string to uint{8,16,32} conversion for generic parameters
Allow setting u{8,16,32} generic parameters as a well defined strings in
devlink user space tool.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-07 13:00:38 -08:00
Amritha Nambiar 8930840e67 tc: flower: Classify packets based port ranges
Added support for filtering based on port ranges.
UAPI changes have been accepted into net-next.

Example:
1. Match on a port range:
-------------------------
$ tc filter add dev enp4s0 protocol ip parent ffff:\
  prio 1 flower ip_proto tcp dst_port 20-30 skip_hw\
  action drop

$ tc -s filter show dev enp4s0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  ip_proto tcp
  dst_port 20-30
  skip_hw
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 85 sec used 3 sec
        Action statistics:
        Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

2. Match on IP address and port range:
--------------------------------------
$ tc filter add dev enp4s0 protocol ip parent ffff:\
  prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port 100-200\
  skip_hw action drop

$ tc -s filter show dev enp4s0 parent ffff:
filter protocol ip pref 1 flower chain 0 handle 0x2
  eth_type ipv4
  ip_proto tcp
  dst_ip 192.168.1.1
  dst_port 100-200
  skip_hw
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 2 ref 1 bind 1 installed 58 sec used 2 sec
        Action statistics:
        Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

v6:
Modified to change json output format as object for sport/dport.

 "dst_port":{
           "start":2000,
           "end":6000
 },
 "src_port":{
           "start":50,
           "end":60
 }

v5:
Simplified some code and used 'sscanf' for parsing. Removed
space in output format.

v4:
Added man updates explaining filtering based on port ranges.
Removed 'range' keyword.

v3:
Modified flower_port_range_attr_type calls.

v2:
Addressed Jiri's comment to sync output format with input

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-03 16:02:58 -08:00
David Ahern dd7d522a67 Revert "tc: flower: Classify packets based port ranges"
This reverts commit e20e50b0c1.

Inadvertently pushed v3 of this patch.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-03 16:01:07 -08:00
David Ahern fb417073a3 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-03 15:39:29 -08:00
Eric Dumazet 6d03d6f7d9 man: tc: update man page for fq packet scheduler
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-25 09:40:36 -08:00
Eric Dumazet 55e106c480 tc: fq: support ce_threshold attribute
Kernel commit 48872c11b772 ("net_sched: sch_fq: add dctcp-like marking")
added support for TCA_FQ_CE_THRESHOLD attribute.

This patch adds iproute2 support for it.

It also makes sure fq_print_xstats() can deal with smaller tc_fq_qd_stats
structures given by older kernels.

Usage :

FQATTRS="ce_threshold 4ms"
TXQS=8

for ETH in eth0
do
 tc qd del dev $ETH root 2>/dev/null
 tc qd add dev $ETH root handle 1: mq
 for i in `seq 1 $TXQS`
 do
  tc qd add dev $ETH parent 1:$i fq $FQATTRS
 done
done

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:30:24 -08:00
Stephen Hemminger ce5071eda6 drop support for IPX
IPX has been depracted then removed from upstream kernels.
Drop support from ip route as well.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:27:56 -08:00
David Ahern 27f4e51003 Merge branch 'tc-gred-json-perf-vq-config' into iproute2-next
Jakub Kicinski  says:

====================

This set brings GRED support up to date with recent kernel changes.
In particular the new netlink attributes for more fine-grained stats
and per-virtual queue flags.

To make GRED usable in modern deployments the patch set starts with
adding JSON output.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:17:30 -08:00
Jakub Kicinski f7a8749aff tc: gred: allow controlling and dumping per-DP RED flags
Kernel now support setting ECN and HARDDROP flags per-virtual
queue.  Allow users to tweak the settings, and print them on
dump.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:11:40 -08:00
Jakub Kicinski 2d7c564a1e tc: gred: support controlling RED flags
Kernel GRED qdisc supports ECN marking, and the harddrop flag
but setting and dumping this flag is not possible with iproute2.
Add the support.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:11:36 -08:00
Jakub Kicinski fdaff63c6a tc: gred: use extended stats if available
Use the extended attributes with extra and better stats, when
possible.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:11:19 -08:00
Jakub Kicinski c3e1cd28c1 tc: gred: separate out stats printing
Printing GRED statistics is long and deserves a function on its own.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:11:09 -08:00
Jakub Kicinski 6475e6a580 tc: gred: jsonify GRED output
Make GRED dump JSON-compatible.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:11:04 -08:00
Jakub Kicinski 33021752cd tc: move RED flag printing to helper
Number of qdiscs use the same set of flags to control shared RED
implementation.  Add a helper for printing those flags.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:10:58 -08:00
Jakub Kicinski b640e85d2d json: add %hhu helpers
Add helpers for printing char-size values.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:09:53 -08:00
Jakub Kicinski c8f201e3d2 tc: gred: remove unclear comment
The comment about providing a proper message seems similar to
the comment in the kernel which says:

    /* hack -- fix at some point with proper message
       This is how we indicate to tc that there is no VQ
       at this DP */

it's unclear what that message would be, and whether it's needed.
Remove the confusing comment.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:08:16 -08:00
David Ahern 6ae54b1326 Revert "rdma: make local functions static"
This reverts commit e99c4443ae.

Patch added to iproute2-master breaks builds of -next because of a
more recent patch in -next that relies on the exports. Revert the
offending patch. Unfortunately this leaves a window where builds
break.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:06:17 -08:00
David Ahern 0868c8ab07 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:06:11 -08:00
Amritha Nambiar e20e50b0c1 tc: flower: Classify packets based port ranges
Added support for filtering based on port ranges.
UAPI changes have been accepted into net-next.

Example:
1. Match on a port range:
-------------------------
$ tc filter add dev enp4s0 protocol ip parent ffff:\
  prio 1 flower ip_proto tcp dst_port range 20-30 skip_hw\
  action drop

$ tc -s filter show dev enp4s0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  ip_proto tcp
  dst_port range 20-30
  skip_hw
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 85 sec used 3 sec
        Action statistics:
        Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

2. Match on IP address and port range:
--------------------------------------
$ tc filter add dev enp4s0 protocol ip parent ffff:\
  prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port range 100-200\
  skip_hw action drop

$ tc -s filter show dev enp4s0 parent ffff:
filter protocol ip pref 1 flower chain 0 handle 0x2
  eth_type ipv4
  ip_proto tcp
  dst_ip 192.168.1.1
  dst_port range 100-200
  skip_hw
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 2 ref 1 bind 1 installed 58 sec used 2 sec
        Action statistics:
        Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

v3:
Modified flower_port_range_attr_type calls.

v2:
Addressed Jiri's comment to sync output format with input

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-20 14:34:56 -08:00
David Ahern 8d42678dfb Update kernel headers
Update kernel headers to
  b1a200484143 ("net-next/hinic: fix a bug in rx data flow")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-20 14:33:09 -08:00
Stefano Brivio 64dbd03ea1 iplink_geneve: Add DF configuration
Allow to set the DF bit behaviour for outgoing IPv4 packets: it can be
always on, inherited from the inner header, or, by default, always off,
which is the current behaviour.

v2:
- Indicate in the man page what DF refers to, using RFC 791 wording
  (David Ahern)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-09 08:51:47 -08:00
Stefano Brivio 3d98eba4fe iplink_vxlan: Add DF configuration
Allow to set the DF bit behaviour for outgoing IPv4 packets: it can be
always on, inherited from the inner header, or, by default, always off,
which is the current behaviour.

v2:
- Indicate in the man page what DF refers to, using RFC 791 wording
  (David Ahern)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-09 08:51:12 -08:00
David Ahern 3a7246dce4 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-09 08:50:50 -08:00
Leon Romanovsky e89feffae3 rdma: Document IB device renaming option
[leonro@server /]$ lspci |grep -i Ether
00:08.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:09.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
[leonro@server /]$ sudo rdma dev
1: mlx5_0: node_type ca fw 3.8.9999 node_guid 5254:00c0:fe12:3455
sys_image_guid 5254:00c0:fe12:3455
[leonro@server /]$ sudo rdma dev set mlx5_0 name hfi1_0
[leonro@server /]$ sudo rdma dev
1: hfi1_0: node_type ca fw 3.8.9999 node_guid 5254:00c0:fe12:3455
sys_image_guid 5254:00c0:fe12:3455

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-05 19:12:03 -08:00
David Ahern cde2f706aa Merge branch 'rdma-dev-rename' into iproute2-next
Leon Romanovsky  says:

====================

An example:

[leonro@server /]$ lspci |grep -i Ether
00:08.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:09.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
[leonro@server /]$ sudo rdma dev
1: mlx5_0: node_type ca fw 3.8.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455
[leonro@server /]$ sudo rdma dev set mlx5_0 name hfi1_0
[leonro@server /]$ sudo rdma dev
1: hfi1_0: node_type ca fw 3.8.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-02 09:41:12 -07:00
Leon Romanovsky 4443c9c6a0 rdma: Add an option to rename IB device interface
Enrich rdmatool with an option to rename IB devices,
the command interface follows Iproute2 convention:
"rdma dev set [OLD-DEVNAME] name NEW-DEVNAME"

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-02 09:38:56 -07:00
Leon Romanovsky a14ceed325 rdma: Introduce command execution helper with required device name
In contradiction to various show commands, the set command explicitly
requires to use device name as an argument. Provide new command
execution helper which enforces it.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-02 09:38:18 -07:00
Leon Romanovsky 3fb00075d9 rdma: Update kernel include file to support IB device renaming
Bring kernel header file changes upto commit 05d940d3a3ec
("RDMA/nldev: Allow IB device rename through RDMA netlink")

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-02 09:38:15 -07:00
David Ahern b2e8bf1584 ip rule: Add ipproto and port range to filter list
Allow ip rule dumps and flushes to filter based on ipproto, sport
and dport. Example:

$ ip ru ls ipproto udp
99:	from all to 8.8.8.8 ipproto udp dport 53 lookup 1001
$ ip ru ls dport 53
99:	from all to 8.8.8.8 ipproto udp dport 53 lookup 1001

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-02 09:37:14 -07:00
David Ahern 70bc1f6550 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-02 09:34:18 -07:00
517 changed files with 49763 additions and 11364 deletions

130
.clang-format Normal file
View File

@ -0,0 +1,130 @@
# SPDX-License-Identifier: GPL-2.0
#
# clang-format configuration file. Intended for clang-format >= 4.
#
# For more information, see:
#
# Documentation/process/clang-format.rst
# https://clang.llvm.org/docs/ClangFormat.html
# https://clang.llvm.org/docs/ClangFormatStyleOptions.html
#
---
AccessModifierOffset: -4
AlignAfterOpenBracket: Align
AlignConsecutiveAssignments: false
AlignConsecutiveDeclarations: false
#AlignEscapedNewlines: Left # Unknown to clang-format-4.0
AlignOperands: true
AlignTrailingComments: false
AllowAllParametersOfDeclarationOnNextLine: false
AllowShortBlocksOnASingleLine: false
AllowShortCaseLabelsOnASingleLine: false
AllowShortFunctionsOnASingleLine: None
AllowShortIfStatementsOnASingleLine: false
AllowShortLoopsOnASingleLine: false
AlwaysBreakAfterDefinitionReturnType: None
AlwaysBreakAfterReturnType: None
AlwaysBreakBeforeMultilineStrings: false
AlwaysBreakTemplateDeclarations: false
BinPackArguments: true
BinPackParameters: true
BraceWrapping:
AfterClass: false
AfterControlStatement: false
AfterEnum: false
AfterFunction: true
AfterNamespace: true
AfterObjCDeclaration: false
AfterStruct: false
AfterUnion: false
#AfterExternBlock: false # Unknown to clang-format-5.0
BeforeCatch: false
BeforeElse: false
IndentBraces: false
#SplitEmptyFunction: true # Unknown to clang-format-4.0
#SplitEmptyRecord: true # Unknown to clang-format-4.0
#SplitEmptyNamespace: true # Unknown to clang-format-4.0
BreakBeforeBinaryOperators: None
BreakBeforeBraces: Custom
#BreakBeforeInheritanceComma: false # Unknown to clang-format-4.0
BreakBeforeTernaryOperators: false
BreakConstructorInitializersBeforeComma: false
#BreakConstructorInitializers: BeforeComma # Unknown to clang-format-4.0
BreakAfterJavaFieldAnnotations: false
BreakStringLiterals: false
ColumnLimit: 80
CommentPragmas: '^ IWYU pragma:'
#CompactNamespaces: false # Unknown to clang-format-4.0
ConstructorInitializerAllOnOneLineOrOnePerLine: false
ConstructorInitializerIndentWidth: 8
ContinuationIndentWidth: 8
Cpp11BracedListStyle: false
DerivePointerAlignment: false
DisableFormat: false
ExperimentalAutoDetectBinPacking: false
#FixNamespaceComments: false # Unknown to clang-format-4.0
# Taken from:
# git grep -h '^#define [^[:space:]]*for_each[^[:space:]]*(' include/ \
# | sed "s,^#define \([^[:space:]]*for_each[^[:space:]]*\)(.*$, - '\1'," \
# | sort | uniq
ForEachMacros:
- 'list_for_each_entry'
- 'list_for_each_entry_safe'
- 'mnl_attr_for_each_nested'
- 'hlist_for_each'
- 'hlist_for_each_safe'
- 'hlist_for_each_entry'
#IncludeBlocks: Preserve # Unknown to clang-format-5.0
IncludeCategories:
- Regex: '.*'
Priority: 1
IncludeIsMainRegex: '(Test)?$'
IndentCaseLabels: false
#IndentPPDirectives: None # Unknown to clang-format-5.0
IndentWidth: 8
IndentWrappedFunctionNames: false
JavaScriptQuotes: Leave
JavaScriptWrapImports: true
KeepEmptyLinesAtTheStartOfBlocks: false
MacroBlockBegin: ''
MacroBlockEnd: ''
MaxEmptyLinesToKeep: 1
NamespaceIndentation: Inner
#ObjCBinPackProtocolList: Auto # Unknown to clang-format-5.0
ObjCBlockIndentWidth: 8
ObjCSpaceAfterProperty: true
ObjCSpaceBeforeProtocolList: true
# Taken from git's rules
#PenaltyBreakAssignment: 10 # Unknown to clang-format-4.0
PenaltyBreakBeforeFirstCallParameter: 30
PenaltyBreakComment: 10
PenaltyBreakFirstLessLess: 0
PenaltyBreakString: 10
PenaltyExcessCharacter: 100
PenaltyReturnTypeOnItsOwnLine: 60
PointerAlignment: Right
ReflowComments: false
SortIncludes: false
#SortUsingDeclarations: false # Unknown to clang-format-4.0
SpaceAfterCStyleCast: false
SpaceAfterTemplateKeyword: true
SpaceBeforeAssignmentOperators: true
#SpaceBeforeCtorInitializerColon: true # Unknown to clang-format-5.0
#SpaceBeforeInheritanceColon: true # Unknown to clang-format-5.0
SpaceBeforeParens: ControlStatements
#SpaceBeforeRangeBasedForLoopColon: true # Unknown to clang-format-5.0
SpaceInEmptyParentheses: false
SpacesBeforeTrailingComments: 1
SpacesInAngles: false
SpacesInContainerLiterals: false
SpacesInCStyleCastParentheses: false
SpacesInParentheses: false
SpacesInSquareBrackets: false
Standard: Cpp03
TabWidth: 8
UseTab: Always
...

22
.mailmap Normal file
View File

@ -0,0 +1,22 @@
#
# This list is used by git-shortlog to fix a few botched name translations
# in the git archive, either because the author's full name was messed up
# and/or not always written the same way, making contributions from the
# same person appearing not to be so or badly displayed.
#
# Format
# Full name <goodaddress> <badaddress>
Steve Wise <larrystevenwise@gmail.com> <swise@opengridcomputing.com>
Steve Wise <larrystevenwise@gmail.com> <swise@chelsio.com>
Stephen Hemminger <stephen@networkplumber.org> <sthemmin@microsoft.com>
Stephen Hemminger <stephen@networkplumber.org> <shemming@brocade.com>
Stephen Hemminger <stephen@networkplumber.org> <stephen.hemminger@vyatta.com>
Stephen Hemminger <stephen@networkplumber.org> <shemminger@vyatta.com>
Stephen Hemminger <stephen@networkplumber.org> <shemminger>
Stephen Hemminger <stephen@networkplumber.org> <shemminger@linux-foundation.org>
Stephen Hemminger <stephen@networkplumber.org> <shemminger@osdl.org>
Stephen Hemminger <stephen@networkplumber.org> <osdl.org!shemminger>
Stephen Hemminger <stephen@networkplumber.org> <osdl.net!shemminger>
David Ahern <dsahern@gmail.com> <dsa@cumulusnetworks.com>

View File

@ -1,6 +1,8 @@
# SPDX-License-Identifier: GPL-2.0
# Top level Makefile for iproute2
-include config.mk
ifeq ("$(origin V)", "command line")
VERBOSE = $(V)
endif
@ -13,7 +15,6 @@ MAKEFLAGS += --no-print-directory
endif
PREFIX?=/usr
LIBDIR?=$(PREFIX)/lib
SBINDIR?=/sbin
CONFDIR?=/etc/iproute2
NETNS_RUN_DIR?=/var/run/netns
@ -40,28 +41,31 @@ DEFINES+=-DCONFDIR=\"$(CONFDIR)\" \
-DNETNS_RUN_DIR=\"$(NETNS_RUN_DIR)\" \
-DNETNS_ETC_DIR=\"$(NETNS_ETC_DIR)\"
#options for decnet
ADDLIB+=dnet_ntop.o dnet_pton.o
#options for AX.25
ADDLIB+=ax25_ntop.o
#options for ipx
ADDLIB+=ipx_ntop.o ipx_pton.o
#options for AX.25
ADDLIB+=rose_ntop.o
#options for mpls
ADDLIB+=mpls_ntop.o mpls_pton.o
#options for NETROM
ADDLIB+=netrom_ntop.o
CC := gcc
HOSTCC ?= $(CC)
DEFINES += -D_GNU_SOURCE
# Turn on transparent support for LFS
DEFINES += -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE
CCOPTS = -O2
CCOPTS = -O2 -pipe
WFLAGS := -Wall -Wstrict-prototypes -Wmissing-prototypes
WFLAGS += -Wmissing-declarations -Wold-style-definition -Wformat=2
CFLAGS := $(WFLAGS) $(CCOPTS) -I../include -I../include/uapi $(DEFINES) $(CFLAGS)
YACCFLAGS = -d -t -v
SUBDIRS=lib ip tc bridge misc netem genl tipc devlink rdma man
SUBDIRS=lib ip tc bridge misc netem genl tipc devlink rdma dcb man vdpa
LIBNETLINK=../lib/libutil.a ../lib/libnetlink.a
LDLIBS += $(LIBNETLINK)
@ -69,7 +73,9 @@ LDLIBS += $(LIBNETLINK)
all: config.mk
@set -e; \
for i in $(SUBDIRS); \
do echo; echo $$i; $(MAKE) $(MFLAGS) -C $$i; done
do echo; echo $$i; $(MAKE) -C $$i; done
.PHONY: clean clobber distclean check cscope version
help:
@echo "Make Targets:"
@ -79,48 +85,52 @@ help:
@echo " install - install binaries on local machine"
@echo " check - run tests"
@echo " cscope - build cscope database"
@echo " snapshot - generate version number header"
@echo " version - update version"
@echo ""
@echo "Make Arguments:"
@echo " V=[0|1] - set build verbosity level"
config.mk:
sh configure $(KERNEL_INCLUDE)
@if [ ! -f config.mk -o configure -nt config.mk ]; then \
sh configure $(KERNEL_INCLUDE); \
fi
install: all
install -m 0755 -d $(DESTDIR)$(SBINDIR)
install -m 0755 -d $(DESTDIR)$(CONFDIR)
install -m 0755 -d $(DESTDIR)$(ARPDDIR)
install -m 0755 -d $(DESTDIR)$(HDRDIR)
install -m 0755 -d $(DESTDIR)$(DOCDIR)/examples
install -m 0755 -d $(DESTDIR)$(DOCDIR)/examples/diffserv
install -m 0644 README.iproute2+tc $(shell find examples -maxdepth 1 -type f) \
$(DESTDIR)$(DOCDIR)/examples
install -m 0644 $(shell find examples/diffserv -maxdepth 1 -type f) \
$(DESTDIR)$(DOCDIR)/examples/diffserv
@for i in $(SUBDIRS); do $(MAKE) -C $$i install; done
install -m 0644 $(shell find etc/iproute2 -maxdepth 1 -type f) $(DESTDIR)$(CONFDIR)
install -m 0755 -d $(DESTDIR)$(BASH_COMPDIR)
install -m 0644 bash-completion/tc $(DESTDIR)$(BASH_COMPDIR)
install -m 0644 bash-completion/devlink $(DESTDIR)$(BASH_COMPDIR)
install -m 0644 include/bpf_elf.h $(DESTDIR)$(HDRDIR)
snapshot:
echo "static const char SNAPSHOT[] = \""`date +%y%m%d`"\";" \
> include/SNAPSHOT.h
version:
echo "static const char version[] = \""`git describe --tags --long`"\";" \
> include/version.h
clean:
@for i in $(SUBDIRS) testsuite; \
do $(MAKE) $(MFLAGS) -C $$i clean; done
do $(MAKE) -C $$i clean; done
clobber:
touch config.mk
$(MAKE) $(MFLAGS) clean
$(MAKE) clean
rm -f config.mk cscope.*
distclean: clobber
check: all
cd testsuite && $(MAKE) && $(MAKE) alltests
$(MAKE) -C testsuite
$(MAKE) -C testsuite alltests
@if command -v man >/dev/null 2>&1; then \
echo "Checking manpages for syntax errors..."; \
$(MAKE) -C man check; \
else \
echo "man not installed, skipping checks for syntax errors."; \
fi
cscope:
cscope -b -q -R -Iinclude -sip -slib -smisc -snetem -stc

15
README
View File

@ -28,17 +28,12 @@ The makefile will automatically build a config.mk file which
contains definitions of libraries that may or may not be available
on the system such as: ATM, ELF, MNL, and SELINUX.
3. To make documentation, cd to doc/ directory , then
look at start of Makefile and set correct values for
PAGESIZE=a4 , ie: a4 , letter ... (string)
PAGESPERPAGE=2 , ie: 1 , 2 ... (numeric)
and make there. It assumes, that latex, dvips and psnup
are in your path.
3. include/uapi
4. This package includes matching sanitized kernel headers because
the build environment may not have up to date versions. See Makefile
if you have special requirements and need to point at different
kernel include files.
This package includes matching sanitized kernel headers because
the build environment may not have up to date versions. See Makefile
if you have special requirements and need to point at different
kernel include files.
Stephen Hemminger
stephen@networkplumber.org

View File

@ -1,33 +0,0 @@
Here are a few quick points about DECnet support...
o iproute2 is the tool of choice for configuring the DECnet support for
Linux. For many features, it is the only tool which can be used to
configure them.
o No name resolution is available as yet, all addresses must be
entered numerically.
o Remember to set the hardware address of the interface using:
ip link set ethX address xx:xx:xx:xx:xx:xx
(where xx:xx:xx:xx:xx:xx is the MAC address for your DECnet node
address)
if your Ethernet card won't listen to more than one unicast
mac address at once. If the Linux DECnet stack doesn't talk to
any other DECnet nodes, then check this with tcpdump and if its
a problem, change the mac address (but do this _before_ starting
any other network protocol on the interface)
o Whilst you can use ip addr add to add more than one DECnet address to an
interface, don't expect addresses which are not the same as the
kernels node address to work properly with 2.4 kernels. This should
be fine with 2.6 kernels as the routing code has been extensively
modified and improved.
o The DECnet support is currently self contained. It does not depend on
the libdnet library.
Steve Whitehouse <steve@chygwyn.com>

View File

@ -1,95 +0,0 @@
I. About the distribution tables
The table used for "synthesizing" the distribution is essentially a scaled,
translated, inverse to the cumulative distribution function.
Here's how to think about it: Let F() be the cumulative distribution
function for a probability distribution X. We'll assume we've scaled
things so that X has mean 0 and standard deviation 1, though that's not
so important here. Then:
F(x) = P(X <= x) = \int_{-inf}^x f
where f is the probability density function.
F is monotonically increasing, so has an inverse function G, with range
0 to 1. Here, G(t) = the x such that P(X <= x) = t. (In general, G may
have singularities if X has point masses, i.e., points x such that
P(X = x) > 0.)
Now we create a tabular representation of G as follows: Choose some table
size N, and for the ith entry, put in G(i/N). Let's call this table T.
The claim now is, I can create a (discrete) random variable Y whose
distribution has the same approximate "shape" as X, simply by letting
Y = T(U), where U is a discrete uniform random variable with range 1 to N.
To see this, it's enough to show that Y's cumulative distribution function,
(let's call it H), is a discrete approximation to F. But
H(x) = P(Y <= x)
= (# of entries in T <= x) / N -- as Y chosen uniformly from T
= i/N, where i is the largest integer such that G(i/N) <= x
= i/N, where i is the largest integer such that i/N <= F(x)
-- since G and F are inverse functions (and F is
increasing)
= floor(N*F(x))/N
as desired.
II. How to create distribution tables (in theory)
How can we create this table in practice? In some cases, F may have a
simple expression which allows evaluating its inverse directly. The
Pareto distribution is one example of this. In other cases, and
especially for matching an experimentally observed distribution, it's
easiest simply to create a table for F and "invert" it. Here, we give
a concrete example, namely how the new "experimental" distribution was
created.
1. Collect enough data points to characterize the distribution. Here, I
collected 25,000 "ping" roundtrip times to a "distant" point (time.nist.gov).
That's far more data than is really necessary, but it was fairly painless to
collect it, so...
2. Normalize the data so that it has mean 0 and standard deviation 1.
3. Determine the cumulative distribution. The code I wrote creates a table
covering the range -10 to +10, with granularity .00005. Obviously, this
is absurdly over-precise, but since it's a one-time only computation, I
figured it hardly mattered.
4. Invert the table: for each table entry F(x) = y, make the y*TABLESIZE
(here, 4096) entry be x*TABLEFACTOR (here, 8192). This creates a table
for the ("normalized") inverse of size TABLESIZE, covering its domain 0
to 1 with granularity 1/TABLESIZE. Note that even with the granularity
used in creating the table for F, it's possible not all the entries in
the table for G will be filled in. So, make a pass through the
inverse's table, filling in any missing entries by linear interpolation.
III. How to create distribution tables (in practice)
If you want to do all this yourself, I've provided several tools to help:
1. maketable does the steps 2-4 above, and then generates the appropriate
header file. So if you have your own time distribution, you can generate
the header simply by:
maketable < time.values > header.h
2. As explained in the other README file, the somewhat sleazy way I have
of generating correlated values needs correction. You can generate your
own correction tables by compiling makesigtable and makemutable with
your header file. Check the Makefile to see how this is done.
3. Warning: maketable, makesigtable and especially makemutable do
enormous amounts of floating point arithmetic. Don't try running
these on an old 486. (NIST Net itself will run fine on such a
system, since in operation, it just needs to do a few simple integral
calculations. But getting there takes some work.)
4. The tables produced are all normalized for mean 0 and standard
deviation 1. How do you know what values to use for real? Here, I've
provided a simple "stats" utility. Give it a series of floating point
values, and it will return their mean (mu), standard deviation (sigma),
and correlation coefficient (rho). You can then plug these values
directly into NIST Net.

View File

@ -1,123 +0,0 @@
iproute2+tc*
It's the first release of Linux traffic control engine.
NOTES.
* csz scheduler is inoperational at the moment, and probably
never will be repaired but replaced with h-pfq scheduler.
* To use "fw" classifier you will need ipfwchains patch.
* No manual available. Ask me, if you have problems (only try to guess
answer yourself at first 8)).
Micro-manual how to start it the first time
-------------------------------------------
A. Attach CBQ to eth1:
tc qdisc add dev eth1 root handle 1: cbq bandwidth 10Mbit allot 1514 cell 8 \
avpkt 1000 mpu 64
B. Add root class:
tc class add dev eth1 parent 1:0 classid 1:1 cbq bandwidth 10Mbit rate 10Mbit \
allot 1514 cell 8 weight 1Mbit prio 8 maxburst 20 avpkt 1000
C. Add default interactive class:
tc class add dev eth1 parent 1:1 classid 1:2 cbq bandwidth 10Mbit rate 1Mbit \
allot 1514 cell 8 weight 100Kbit prio 3 maxburst 20 avpkt 1000 split 1:0 \
defmap c0
D. Add default class:
tc class add dev eth1 parent 1:1 classid 1:3 cbq bandwidth 10Mbit rate 8Mbit \
allot 1514 cell 8 weight 800Kbit prio 7 maxburst 20 avpkt 1000 split 1:0 \
defmap 3f
etc. etc. etc. Well, it is enough to start 8) The rest can be guessed 8)
Look also at more elaborated example, ready to start rsvpd,
in rsvp/cbqinit.eth1.
Terminology and advices about setting CBQ parameters may be found in Sally Floyd
papers.
Pairs X:Y are class handles, X:0 are qdisc handles.
weight should be proportional to rate for leaf classes
(I repeated it ten times less, but it is not necessary)
defmap is bitmap of logical priorities served by this class.
E. Another qdiscs are simpler. F.e. let's join TBF on class 1:2
tc qdisc add dev eth1 parent 1:2 tbf rate 64Kbit buffer 5Kb/8 limit 10Kb
F. Look at all that we created:
tc qdisc ls dev eth1
tc class ls dev eth1
G. Install "route" classifier on root of cbq and map destination from realm
1 to class 1:2
tc filter add dev eth1 parent 1:0 protocol ip prio 100 route to 1 classid 1:2
H. Assign routes to 10.11.12.0/24 to realm 1
ip route add 10.11.12.0/24 dev eth1 via whatever realm 1
etc. The same thing can be made with rules.
I still did not test ipchains, but they should work too.
Setup and code example of BPF classifier and action can be found under
examples/bpf/, which should explain everything for getting started.
Setup of rsvp and u32 classifiers is more hairy.
If you read RSVP specs, you will understand how rsvp classifier
works easily. What's about u32... That's example:
#! /bin/sh
TC=/home/root/tc
# Setup classifier root on eth1 root (it is cbq)
$TC filter add dev eth1 parent 1:0 prio 5 protocol ip u32
# Create hash table of 256 slots with ID 1:
$TC filter add dev eth1 parent 1:0 prio 5 handle 1: u32 divisor 256
# Add to 6th slot of hash table rule to select tcp/telnet to 193.233.7.75
# direct it to class 1:4 and prescribe to fall to best effort,
# if traffic violate TBF (32kbit,5K)
$TC filter add dev eth1 parent 1:0 prio 5 u32 ht 1:6: \
match ip dst 193.233.7.75 \
match tcp dst 0x17 0xffff \
flowid 1:4 \
police rate 32kbit buffer 5kb/8 mpu 64 mtu 1514 index 1
# Add to 1th slot of hash table rule to select icmp to 193.233.7.75
# direct it to class 1:4 and prescribe to fall to best effort,
# if traffic violate TBF (10kbit,5K)
$TC filter add dev eth1 parent 1:0 prio 5 u32 ht 1:: \
sample ip protocol 1 0xff \
match ip dst 193.233.7.75 \
flowid 1:4 \
police rate 10kbit buffer 5kb/8 mpu 64 mtu 1514 index 2
# Lookup hash table, if it is not fragmented frame
# Use protocol as hash key
$TC filter add dev eth1 parent 1:0 prio 5 handle ::1 u32 ht 800:: \
match ip nofrag \
offset mask 0x0F00 shift 6 \
hashkey mask 0x00ff0000 at 8 \
link 1:
Alexey Kuznetsov
kuznet@ms2.inr.ac.ru

View File

@ -1,81 +0,0 @@
lnstat - linux networking statistics
(C) 2004 Harald Welte <laforge@gnumonks.org
======================================================================
This tool is a generalized and more feature-complete replacement for the old
'rtstat' program.
In addition to routing cache statistics, it supports any kind of statistics
the linux kernel exports via a file in /proc/net/stat. In a stock 2.6.9
kernel, this is
per-protocol neighbour cache statistics
(ipv4, ipv6, atm, decnet)
routing cache statistics
(ipv4)
connection tracking statistics
(ipv4)
Please note that lnstat will adopt to any additional statistics that might be
added to the kernel at some later point
I personally always like examples more than any reference documentation, so I
list the following examples. If somebody wants to do a manpage, feel free
to send me a patch :)
EXAMPLES:
In order to get a list of supported statistics files, you can run
lnstat -d
It will display something like
/proc/net/stat/arp_cache:
1: entries
2: allocs
3: destroys
[...]
/proc/net/stat/rt_cache:
1: entries
2: in_hit
3: in_slow_tot
You can now select the files/keys you are interested by something like
lnstat -k arp_cache:entries,rt_cache:in_hit,arp_cache:destroys
arp_cach|rt_cache|arp_cach|
entries| in_hit|destroys|
6| 6| 0|
6| 0| 0|
6| 2| 0|
You can specify the interval (e.g. 10 seconds) by:
lnstat -i 10
You can specify to only use one particular statistics file:
lnstat -f ip_conntrack
You can specify individual field widths
lnstat -k arp_cache:entries,rt_cache:entries -w 20,8
You can specify not to print a header at all
lnstat -s 0
You can specify to print a header only at start of the program
lnstat -s 1
You can specify to print a header at start and every 20 lines:
lnstat -s 20
You can specify the number of samples you want to take (e.g. 5):
lnstat -c 5

1006
bash-completion/devlink Normal file

File diff suppressed because it is too large Load Diff

View File

@ -3,8 +3,8 @@
# Copyright 2016 Quentin Monnet <quentin.monnet@6wind.com>
QDISC_KIND=' choke codel bfifo pfifo pfifo_head_drop fq fq_codel gred hhf \
mqprio multiq netem pfifo_fast pie red rr sfb sfq tbf atm cbq drr \
dsmark hfsc htb prio qfq '
mqprio multiq netem pfifo_fast pie fq_pie red rr sfb sfq tbf atm \
cbq drr dsmark hfsc htb prio qfq '
FILTER_KIND=' basic bpf cgroup flow flower fw route rsvp tcindex u32 matchall '
ACTION_KIND=' gact mirred bpf sample '
@ -302,7 +302,7 @@ _tc_qdisc_options()
;;
gred)
_tc_once_attr 'setup vqs default grio vq prio limit min max avpkt \
burst probability bandwidth'
burst probability bandwidth ecn harddrop'
return 0
;;
hhf)
@ -323,6 +323,15 @@ _tc_qdisc_options()
_tc_once_attr 'limit target tupdate alpha beta'
_tc_one_of_list 'bytemode nobytemode'
_tc_one_of_list 'ecn noecn'
_tc_one_of_list 'dq_rate_estimator no_dq_rate_estimator'
return 0
;;
fq_pie)
_tc_once_attr 'limit flows target tupdate \
alpha beta quantum memory_limit ecn_prob'
_tc_one_of_list 'ecn noecn'
_tc_one_of_list 'bytemode nobytemode'
_tc_one_of_list 'dq_rate_estimator no_dq_rate_estimator'
return 0
;;
red)

View File

@ -8,8 +8,13 @@
void print_vlan_info(struct rtattr *tb, int ifindex);
int print_linkinfo(struct nlmsghdr *n, void *arg);
int print_mdb_mon(struct nlmsghdr *n, void *arg);
int print_fdb(struct nlmsghdr *n, void *arg);
int print_mdb(struct nlmsghdr *n, void *arg);
void print_stp_state(__u8 state);
int parse_stp_state(const char *arg);
int print_vlan_rtm(struct nlmsghdr *n, void *arg, bool monitor,
bool global_only);
void br_print_router_port_stats(struct rtattr *pattr);
int do_fdb(int argc, char **argv);
int do_mdb(int argc, char **argv);

View File

@ -12,7 +12,7 @@
#include <string.h>
#include <errno.h>
#include "SNAPSHOT.h"
#include "version.h"
#include "utils.h"
#include "br_common.h"
#include "namespace.h"
@ -37,10 +37,10 @@ static void usage(void)
fprintf(stderr,
"Usage: bridge [ OPTIONS ] OBJECT { COMMAND | help }\n"
" bridge [ -force ] -batch filename\n"
"where OBJECT := { link | fdb | mdb | vlan | monitor }\n"
" OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] |\n"
" -o[neline] | -t[imestamp] | -n[etns] name |\n"
" -c[ompressvlans] -color -p[retty] -j[son] }\n");
"where OBJECT := { link | fdb | mdb | vlan | monitor }\n"
" OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] |\n"
" -o[neline] | -t[imestamp] | -n[etns] name |\n"
" -c[ompressvlans] -color -p[retty] -j[son] }\n");
exit(-1);
}
@ -77,45 +77,23 @@ static int do_cmd(const char *argv0, int argc, char **argv)
return -1;
}
static int br_batch_cmd(int argc, char *argv[], void *data)
{
return do_cmd(argv[0], argc, argv);
}
static int batch(const char *name)
{
char *line = NULL;
size_t len = 0;
int ret = EXIT_SUCCESS;
if (name && strcmp(name, "-") != 0) {
if (freopen(name, "r", stdin) == NULL) {
fprintf(stderr,
"Cannot open file \"%s\" for reading: %s\n",
name, strerror(errno));
return EXIT_FAILURE;
}
}
int ret;
if (rtnl_open(&rth, 0) < 0) {
fprintf(stderr, "Cannot open rtnetlink\n");
return EXIT_FAILURE;
}
cmdlineno = 0;
while (getcmdline(&line, &len, stdin) != -1) {
char *largv[100];
int largc;
rtnl_set_strict_dump(&rth);
largc = makeargs(line, largv, 100);
if (largc == 0)
continue; /* blank line */
if (do_cmd(largv[0], largc, largv)) {
fprintf(stderr, "Command failed %s:%d\n",
name, cmdlineno);
ret = EXIT_FAILURE;
if (!force)
break;
}
}
if (line)
free(line);
ret = do_batch(name, force, br_batch_cmd, NULL);
rtnl_close(&rth);
return ret;
@ -139,7 +117,7 @@ main(int argc, char **argv)
if (matches(opt, "-help") == 0) {
usage();
} else if (matches(opt, "-Version") == 0) {
printf("bridge utility, 0.0\n");
printf("bridge utility, %s\n", version);
exit(0);
} else if (matches(opt, "-stats") == 0 ||
matches(opt, "-statistics") == 0) {
@ -171,9 +149,9 @@ main(int argc, char **argv)
NEXT_ARG();
if (netns_switch(argv[1]))
exit(-1);
} else if (matches_color(opt, &color)) {
} else if (matches(opt, "-compressvlans") == 0) {
++compress_vlans;
} else if (matches_color(opt, &color)) {
} else if (matches(opt, "-force") == 0) {
++force;
} else if (matches(opt, "-json") == 0) {
@ -205,6 +183,8 @@ main(int argc, char **argv)
if (rtnl_open(&rth, 0) < 0)
exit(1);
rtnl_set_strict_dump(&rth);
if (argc > 1)
return do_cmd(argv[1], argc-1, argv+1);

View File

@ -30,16 +30,21 @@
#include "rt_names.h"
#include "utils.h"
static unsigned int filter_index, filter_vlan, filter_state;
static unsigned int filter_index, filter_dynamic, filter_master,
filter_state, filter_vlan;
static void usage(void)
{
fprintf(stderr,
"Usage: bridge fdb { add | append | del | replace } ADDR dev DEV\n"
" [ self ] [ master ] [ use ] [ router ] [ extern_learn ]\n"
" [ sticky ] [ local | static | dynamic ] [ dst IPADDR ]\n"
" [ vlan VID ] [ port PORT] [ vni VNI ] [ via DEV ]\n"
" bridge fdb [ show [ br BRDEV ] [ brport DEV ] [ vlan VID ] [ state STATE ] ]\n");
" [ sticky ] [ local | static | dynamic ] [ vlan VID ]\n"
" { [ dst IPADDR ] [ port PORT] [ vni VNI ] | [ nhid NHID ] }\n"
" [ via DEV ] [ src_vni VNI ]\n"
" bridge fdb [ show [ br BRDEV ] [ brport DEV ] [ vlan VID ]\n"
" [ state STATE ] [ dynamic ] ]\n"
" bridge fdb get [ to ] LLADDR [ br BRDEV ] { brport | dev } DEV\n"
" [ vlan VID ] [ vni VNI ] [ self ] [ master ] [ dynamic ]\n");
exit(-1);
}
@ -59,7 +64,10 @@ static const char *state_n2a(unsigned int s)
if (s & NUD_REACHABLE)
return "";
sprintf(buf, "state=%#x", s);
if (is_json_context())
sprintf(buf, "%#x", s);
else
sprintf(buf, "state=%#x", s);
return buf;
}
@ -164,6 +172,9 @@ int print_fdb(struct nlmsghdr *n, void *arg)
if (filter_vlan && filter_vlan != vid)
return 0;
if (filter_dynamic && (r->ndm_state & NUD_PERMANENT))
return 0;
open_json_object(NULL);
if (n->nlmsg_type == RTM_DELNEIGH)
print_bool(PRINT_ANY, "deleted", "Deleted ", true);
@ -181,10 +192,13 @@ int print_fdb(struct nlmsghdr *n, void *arg)
"mac", "%s ", lladdr);
}
if (!filter_index && r->ndm_ifindex)
if (!filter_index && r->ndm_ifindex) {
print_string(PRINT_FP, NULL, "dev ", NULL);
print_color_string(PRINT_ANY, COLOR_IFNAME,
"ifname", "dev %s ",
"ifname", "%s ",
ll_index_to_name(r->ndm_ifindex));
}
if (tb[NDA_DST]) {
int family = AF_INET;
@ -197,9 +211,11 @@ int print_fdb(struct nlmsghdr *n, void *arg)
RTA_PAYLOAD(tb[NDA_DST]),
RTA_DATA(tb[NDA_DST]));
print_string(PRINT_FP, NULL, "dst ", NULL);
print_color_string(PRINT_ANY,
ifa_family_color(family),
"dst", "dst %s ", dst);
"dst", "%s ", dst);
}
if (vid)
@ -234,6 +250,10 @@ int print_fdb(struct nlmsghdr *n, void *arg)
ll_index_to_name(ifindex));
}
if (tb[NDA_NH_ID])
print_uint(PRINT_ANY, "nhid", "nhid %u ",
rta_getattr_u32(tb[NDA_NH_ID]));
if (tb[NDA_LINK_NETNSID])
print_uint(PRINT_ANY,
"linkNetNsId", "link-netnsid %d ",
@ -256,20 +276,49 @@ int print_fdb(struct nlmsghdr *n, void *arg)
return 0;
}
static int fdb_linkdump_filter(struct nlmsghdr *nlh, int reqlen)
{
int err;
if (filter_index) {
struct ifinfomsg *ifm = NLMSG_DATA(nlh);
ifm->ifi_index = filter_index;
}
if (filter_master) {
err = addattr32(nlh, reqlen, IFLA_MASTER, filter_master);
if (err)
return err;
}
return 0;
}
static int fdb_dump_filter(struct nlmsghdr *nlh, int reqlen)
{
int err;
if (filter_index) {
struct ndmsg *ndm = NLMSG_DATA(nlh);
ndm->ndm_ifindex = filter_index;
}
if (filter_master) {
err = addattr32(nlh, reqlen, NDA_MASTER, filter_master);
if (err)
return err;
}
return 0;
}
static int fdb_show(int argc, char **argv)
{
struct {
struct nlmsghdr n;
struct ifinfomsg ifm;
char buf[256];
} req = {
.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
.ifm.ifi_family = PF_BRIDGE,
};
char *filter_dev = NULL;
char *br = NULL;
int msg_size = sizeof(struct ifinfomsg);
int rc;
while (argc > 0) {
if ((strcmp(*argv, "brport") == 0) || strcmp(*argv, "dev") == 0) {
@ -290,6 +339,8 @@ static int fdb_show(int argc, char **argv)
if (state_a2n(&state, *argv))
invarg("invalid state", *argv);
filter_state |= state;
} else if (strcmp(*argv, "dynamic") == 0) {
filter_dynamic = 1;
} else {
if (matches(*argv, "help") == 0)
usage();
@ -304,8 +355,7 @@ static int fdb_show(int argc, char **argv)
fprintf(stderr, "Cannot find bridge device \"%s\"\n", br);
return -1;
}
addattr32(&req.n, sizeof(req), IFLA_MASTER, br_ifindex);
msg_size += RTA_LENGTH(4);
filter_master = br_ifindex;
}
/*we'll keep around filter_dev for older kernels */
@ -313,10 +363,13 @@ static int fdb_show(int argc, char **argv)
filter_index = ll_name_to_index(filter_dev);
if (!filter_index)
return nodev(filter_dev);
req.ifm.ifi_index = filter_index;
}
if (rtnl_dump_request(&rth, RTM_GETNEIGH, &req.ifm, msg_size) < 0) {
if (rth.flags & RTNL_HANDLE_F_STRICT_CHK)
rc = rtnl_neighdump_req(&rth, PF_BRIDGE, fdb_dump_filter);
else
rc = rtnl_fdb_linkdump_req_filter_fn(&rth, fdb_linkdump_filter);
if (rc < 0) {
perror("Cannot send dump request");
exit(1);
}
@ -352,9 +405,11 @@ static int fdb_modify(int cmd, int flags, int argc, char **argv)
inet_prefix dst;
unsigned long port = 0;
unsigned long vni = ~0;
unsigned long src_vni = ~0;
unsigned int via = 0;
char *endptr;
short vid = -1;
__u32 nhid = 0;
while (argc > 0) {
if (strcmp(*argv, "dev") == 0) {
@ -366,6 +421,10 @@ static int fdb_modify(int cmd, int flags, int argc, char **argv)
duparg2("dst", *argv);
get_addr(&dst, *argv, preferred_family);
dst_ok = 1;
} else if (strcmp(*argv, "nhid") == 0) {
NEXT_ARG();
if (get_u32(&nhid, *argv, 0))
invarg("\"id\" value is invalid\n", *argv);
} else if (strcmp(*argv, "port") == 0) {
NEXT_ARG();
@ -385,6 +444,12 @@ static int fdb_modify(int cmd, int flags, int argc, char **argv)
if ((endptr && *endptr) ||
(vni >> 24) || vni == ULONG_MAX)
invarg("invalid VNI\n", *argv);
} else if (strcmp(*argv, "src_vni") == 0) {
NEXT_ARG();
src_vni = strtoul(*argv, &endptr, 0);
if ((endptr && *endptr) ||
(src_vni >> 24) || src_vni == ULONG_MAX)
invarg("invalid src VNI\n", *argv);
} else if (strcmp(*argv, "via") == 0) {
NEXT_ARG();
via = ll_name_to_index(*argv);
@ -434,6 +499,11 @@ static int fdb_modify(int cmd, int flags, int argc, char **argv)
return -1;
}
if (nhid && (dst_ok || port || vni != ~0)) {
fprintf(stderr, "dst, port, vni are mutually exclusive with nhid\n");
return -1;
}
/* Assume self */
if (!(req.ndm.ndm_flags&(NTF_SELF|NTF_MASTER)))
req.ndm.ndm_flags |= NTF_SELF;
@ -455,6 +525,8 @@ static int fdb_modify(int cmd, int flags, int argc, char **argv)
if (vid >= 0)
addattr16(&req.n, sizeof(req), NDA_VLAN, vid);
if (nhid > 0)
addattr32(&req.n, sizeof(req), NDA_NH_ID, nhid);
if (port) {
unsigned short dport;
@ -464,6 +536,8 @@ static int fdb_modify(int cmd, int flags, int argc, char **argv)
}
if (vni != ~0)
addattr32(&req.n, sizeof(req), NDA_VNI, vni);
if (src_vni != ~0)
addattr32(&req.n, sizeof(req), NDA_SRC_VNI, src_vni);
if (via)
addattr32(&req.n, sizeof(req), NDA_IFINDEX, via);
@ -477,6 +551,121 @@ static int fdb_modify(int cmd, int flags, int argc, char **argv)
return 0;
}
static int fdb_get(int argc, char **argv)
{
struct {
struct nlmsghdr n;
struct ndmsg ndm;
char buf[1024];
} req = {
.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg)),
.n.nlmsg_flags = NLM_F_REQUEST,
.n.nlmsg_type = RTM_GETNEIGH,
.ndm.ndm_family = AF_BRIDGE,
};
char *d = NULL, *br = NULL;
struct nlmsghdr *answer;
unsigned long vni = ~0;
char abuf[ETH_ALEN];
int br_ifindex = 0;
char *addr = NULL;
short vlan = -1;
char *endptr;
while (argc > 0) {
if ((strcmp(*argv, "brport") == 0) || strcmp(*argv, "dev") == 0) {
NEXT_ARG();
d = *argv;
} else if (strcmp(*argv, "br") == 0) {
NEXT_ARG();
br = *argv;
} else if (strcmp(*argv, "dev") == 0) {
NEXT_ARG();
d = *argv;
} else if (strcmp(*argv, "vni") == 0) {
NEXT_ARG();
vni = strtoul(*argv, &endptr, 0);
if ((endptr && *endptr) ||
(vni >> 24) || vni == ULONG_MAX)
invarg("invalid VNI\n", *argv);
} else if (strcmp(*argv, "self") == 0) {
req.ndm.ndm_flags |= NTF_SELF;
} else if (matches(*argv, "master") == 0) {
req.ndm.ndm_flags |= NTF_MASTER;
} else if (matches(*argv, "vlan") == 0) {
if (vlan >= 0)
duparg2("vlan", *argv);
NEXT_ARG();
vlan = atoi(*argv);
} else if (matches(*argv, "dynamic") == 0) {
filter_dynamic = 1;
} else {
if (strcmp(*argv, "to") == 0)
NEXT_ARG();
if (matches(*argv, "help") == 0)
usage();
if (addr)
duparg2("to", *argv);
addr = *argv;
}
argc--; argv++;
}
if ((d == NULL && br == NULL) || addr == NULL) {
fprintf(stderr, "Device or master and address are required arguments.\n");
return -1;
}
if (sscanf(addr, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
abuf, abuf+1, abuf+2,
abuf+3, abuf+4, abuf+5) != 6) {
fprintf(stderr, "Invalid mac address %s\n", addr);
return -1;
}
addattr_l(&req.n, sizeof(req), NDA_LLADDR, abuf, ETH_ALEN);
if (vlan >= 0)
addattr16(&req.n, sizeof(req), NDA_VLAN, vlan);
if (vni != ~0)
addattr32(&req.n, sizeof(req), NDA_VNI, vni);
if (d) {
req.ndm.ndm_ifindex = ll_name_to_index(d);
if (!req.ndm.ndm_ifindex) {
fprintf(stderr, "Cannot find device \"%s\"\n", d);
return -1;
}
}
if (br) {
br_ifindex = ll_name_to_index(br);
if (!br_ifindex) {
fprintf(stderr, "Cannot find bridge device \"%s\"\n", br);
return -1;
}
addattr32(&req.n, sizeof(req), NDA_MASTER, br_ifindex);
}
if (rtnl_talk(&rth, &req.n, &answer) < 0)
return -2;
/*
* Initialize a json_writer and open an array object
* if -json was specified.
*/
new_json_obj(json);
if (print_fdb(answer, stdout) < 0) {
fprintf(stderr, "An error :-)\n");
return -1;
}
delete_json_obj();
return 0;
}
int do_fdb(int argc, char **argv)
{
ll_init_map(&rth);
@ -490,6 +679,8 @@ int do_fdb(int argc, char **argv)
return fdb_modify(RTM_NEWNEIGH, NLM_F_CREATE|NLM_F_REPLACE, argc-1, argv+1);
if (matches(*argv, "delete") == 0)
return fdb_modify(RTM_DELNEIGH, 0, argc-1, argv+1);
if (matches(*argv, "get") == 0)
return fdb_get(argc-1, argv+1);
if (matches(*argv, "show") == 0 ||
matches(*argv, "lst") == 0 ||
matches(*argv, "list") == 0)

View File

@ -19,7 +19,7 @@
static unsigned int filter_index;
static const char *port_states[] = {
static const char *stp_states[] = {
[BR_STATE_DISABLED] = "disabled",
[BR_STATE_LISTENING] = "listening",
[BR_STATE_LEARNING] = "learning",
@ -68,22 +68,29 @@ static void print_link_flags(FILE *fp, unsigned int flags, unsigned int mdown)
close_json_array(PRINT_ANY, "> ");
}
static void print_portstate(__u8 state)
void print_stp_state(__u8 state)
{
if (state <= BR_STATE_BLOCKING)
print_string(PRINT_ANY, "state",
"state %s ", port_states[state]);
"state %s ", stp_states[state]);
else
print_uint(PRINT_ANY, "state",
"state (%d) ", state);
}
static void print_onoff(FILE *fp, const char *flag, __u8 val)
int parse_stp_state(const char *arg)
{
if (is_json_context())
print_bool(PRINT_JSON, flag, NULL, val);
else
fprintf(fp, "%s %s ", flag, val ? "on" : "off");
size_t nstates = ARRAY_SIZE(stp_states);
int state;
for (state = 0; state < nstates; state++)
if (strcmp(stp_states[state], arg) == 0)
break;
if (state == nstates)
state = -1;
return state;
}
static void print_hwmode(__u16 mode)
@ -104,7 +111,7 @@ static void print_protinfo(FILE *fp, struct rtattr *attr)
parse_rtattr_nested(prtb, IFLA_BRPORT_MAX, attr);
if (prtb[IFLA_BRPORT_STATE])
print_portstate(rta_getattr_u8(prtb[IFLA_BRPORT_STATE]));
print_stp_state(rta_getattr_u8(prtb[IFLA_BRPORT_STATE]));
if (prtb[IFLA_BRPORT_PRIORITY])
print_uint(PRINT_ANY, "priority",
@ -123,35 +130,38 @@ static void print_protinfo(FILE *fp, struct rtattr *attr)
fprintf(fp, "%s ", _SL_);
if (prtb[IFLA_BRPORT_MODE])
print_onoff(fp, "hairpin",
rta_getattr_u8(prtb[IFLA_BRPORT_MODE]));
print_on_off(PRINT_ANY, "hairpin", "hairpin %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_MODE]));
if (prtb[IFLA_BRPORT_GUARD])
print_onoff(fp, "guard",
rta_getattr_u8(prtb[IFLA_BRPORT_GUARD]));
print_on_off(PRINT_ANY, "guard", "guard %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_GUARD]));
if (prtb[IFLA_BRPORT_PROTECT])
print_onoff(fp, "root_block",
rta_getattr_u8(prtb[IFLA_BRPORT_PROTECT]));
print_on_off(PRINT_ANY, "root_block", "root_block %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_PROTECT]));
if (prtb[IFLA_BRPORT_FAST_LEAVE])
print_onoff(fp, "fastleave",
rta_getattr_u8(prtb[IFLA_BRPORT_FAST_LEAVE]));
print_on_off(PRINT_ANY, "fastleave", "fastleave %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_FAST_LEAVE]));
if (prtb[IFLA_BRPORT_LEARNING])
print_onoff(fp, "learning",
rta_getattr_u8(prtb[IFLA_BRPORT_LEARNING]));
print_on_off(PRINT_ANY, "learning", "learning %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_LEARNING]));
if (prtb[IFLA_BRPORT_LEARNING_SYNC])
print_onoff(fp, "learning_sync",
rta_getattr_u8(prtb[IFLA_BRPORT_LEARNING_SYNC]));
print_on_off(PRINT_ANY, "learning_sync", "learning_sync %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_LEARNING_SYNC]));
if (prtb[IFLA_BRPORT_UNICAST_FLOOD])
print_onoff(fp, "flood",
rta_getattr_u8(prtb[IFLA_BRPORT_UNICAST_FLOOD]));
print_on_off(PRINT_ANY, "flood", "flood %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_UNICAST_FLOOD]));
if (prtb[IFLA_BRPORT_MCAST_FLOOD])
print_onoff(fp, "mcast_flood",
rta_getattr_u8(prtb[IFLA_BRPORT_MCAST_FLOOD]));
print_on_off(PRINT_ANY, "mcast_flood", "mcast_flood %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_MCAST_FLOOD]));
if (prtb[IFLA_BRPORT_MCAST_TO_UCAST])
print_on_off(PRINT_ANY, "mcast_to_unicast", "mcast_to_unicast %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_MCAST_TO_UCAST]));
if (prtb[IFLA_BRPORT_NEIGH_SUPPRESS])
print_onoff(fp, "neigh_suppress",
rta_getattr_u8(prtb[IFLA_BRPORT_NEIGH_SUPPRESS]));
print_on_off(PRINT_ANY, "neigh_suppress", "neigh_suppress %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_NEIGH_SUPPRESS]));
if (prtb[IFLA_BRPORT_VLAN_TUNNEL])
print_onoff(fp, "vlan_tunnel",
rta_getattr_u8(prtb[IFLA_BRPORT_VLAN_TUNNEL]));
print_on_off(PRINT_ANY, "vlan_tunnel", "vlan_tunnel %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_VLAN_TUNNEL]));
if (prtb[IFLA_BRPORT_BACKUP_PORT]) {
int ifidx;
@ -163,10 +173,10 @@ static void print_protinfo(FILE *fp, struct rtattr *attr)
}
if (prtb[IFLA_BRPORT_ISOLATED])
print_onoff(fp, "isolated",
rta_getattr_u8(prtb[IFLA_BRPORT_ISOLATED]));
print_on_off(PRINT_ANY, "isolated", "isolated %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_ISOLATED]));
} else
print_portstate(rta_getattr_u8(attr));
print_stp_state(rta_getattr_u8(attr));
}
@ -251,41 +261,27 @@ int print_linkinfo(struct nlmsghdr *n, void *arg)
static void usage(void)
{
fprintf(stderr, "Usage: bridge link set dev DEV [ cost COST ] [ priority PRIO ] [ state STATE ]\n");
fprintf(stderr, " [ guard {on | off} ]\n");
fprintf(stderr, " [ hairpin {on | off} ]\n");
fprintf(stderr, " [ fastleave {on | off} ]\n");
fprintf(stderr, " [ root_block {on | off} ]\n");
fprintf(stderr, " [ learning {on | off} ]\n");
fprintf(stderr, " [ learning_sync {on | off} ]\n");
fprintf(stderr, " [ flood {on | off} ]\n");
fprintf(stderr, " [ mcast_flood {on | off} ]\n");
fprintf(stderr, " [ neigh_suppress {on | off} ]\n");
fprintf(stderr, " [ vlan_tunnel {on | off} ]\n");
fprintf(stderr, " [ isolated {on | off} ]\n");
fprintf(stderr, " [ hwmode {vepa | veb} ]\n");
fprintf(stderr, " [ backup_port DEVICE ] [ nobackup_port ]\n");
fprintf(stderr, " [ self ] [ master ]\n");
fprintf(stderr, " bridge link show [dev DEV]\n");
fprintf(stderr,
"Usage: bridge link set dev DEV [ cost COST ] [ priority PRIO ] [ state STATE ]\n"
" [ guard {on | off} ]\n"
" [ hairpin {on | off} ]\n"
" [ fastleave {on | off} ]\n"
" [ root_block {on | off} ]\n"
" [ learning {on | off} ]\n"
" [ learning_sync {on | off} ]\n"
" [ flood {on | off} ]\n"
" [ mcast_flood {on | off} ]\n"
" [ mcast_to_unicast {on | off} ]\n"
" [ neigh_suppress {on | off} ]\n"
" [ vlan_tunnel {on | off} ]\n"
" [ isolated {on | off} ]\n"
" [ hwmode {vepa | veb} ]\n"
" [ backup_port DEVICE ] [ nobackup_port ]\n"
" [ self ] [ master ]\n"
" bridge link show [dev DEV]\n");
exit(-1);
}
static bool on_off(char *arg, __s8 *attr, char *val)
{
if (strcmp(val, "on") == 0)
*attr = 1;
else if (strcmp(val, "off") == 0)
*attr = 0;
else {
fprintf(stderr,
"Error: argument of \"%s\" must be \"on\" or \"off\"\n",
arg);
return false;
}
return true;
}
static int brlink_modify(int argc, char **argv)
{
struct {
@ -306,6 +302,7 @@ static int brlink_modify(int argc, char **argv)
__s8 flood = -1;
__s8 vlan_tunnel = -1;
__s8 mcast_flood = -1;
__s8 mcast_to_unicast = -1;
__s8 isolated = -1;
__s8 hairpin = -1;
__s8 bpdu_guard = -1;
@ -317,6 +314,7 @@ static int brlink_modify(int argc, char **argv)
__s16 mode = -1;
__u16 flags = 0;
struct rtattr *nest;
int ret;
while (argc > 0) {
if (strcmp(*argv, "dev") == 0) {
@ -324,36 +322,49 @@ static int brlink_modify(int argc, char **argv)
d = *argv;
} else if (strcmp(*argv, "guard") == 0) {
NEXT_ARG();
if (!on_off("guard", &bpdu_guard, *argv))
return -1;
bpdu_guard = parse_on_off("guard", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "hairpin") == 0) {
NEXT_ARG();
if (!on_off("hairpin", &hairpin, *argv))
return -1;
hairpin = parse_on_off("hairpin", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "fastleave") == 0) {
NEXT_ARG();
if (!on_off("fastleave", &fast_leave, *argv))
return -1;
fast_leave = parse_on_off("fastleave", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "root_block") == 0) {
NEXT_ARG();
if (!on_off("root_block", &root_block, *argv))
return -1;
root_block = parse_on_off("root_block", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "learning") == 0) {
NEXT_ARG();
if (!on_off("learning", &learning, *argv))
return -1;
learning = parse_on_off("learning", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "learning_sync") == 0) {
NEXT_ARG();
if (!on_off("learning_sync", &learning_sync, *argv))
return -1;
learning_sync = parse_on_off("learning_sync", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "flood") == 0) {
NEXT_ARG();
if (!on_off("flood", &flood, *argv))
return -1;
flood = parse_on_off("flood", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "mcast_flood") == 0) {
NEXT_ARG();
if (!on_off("mcast_flood", &mcast_flood, *argv))
return -1;
mcast_flood = parse_on_off("mcast_flood", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "mcast_to_unicast") == 0) {
NEXT_ARG();
mcast_to_unicast = parse_on_off("mcast_to_unicast", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "cost") == 0) {
NEXT_ARG();
cost = atoi(*argv);
@ -363,14 +374,11 @@ static int brlink_modify(int argc, char **argv)
} else if (strcmp(*argv, "state") == 0) {
NEXT_ARG();
char *endptr;
size_t nstates = ARRAY_SIZE(port_states);
state = strtol(*argv, &endptr, 10);
if (!(**argv != '\0' && *endptr == '\0')) {
for (state = 0; state < nstates; state++)
if (strcmp(port_states[state], *argv) == 0)
break;
if (state == nstates) {
state = parse_stp_state(*argv);
if (state == -1) {
fprintf(stderr,
"Error: invalid STP port state\n");
return -1;
@ -394,18 +402,19 @@ static int brlink_modify(int argc, char **argv)
flags |= BRIDGE_FLAGS_MASTER;
} else if (strcmp(*argv, "neigh_suppress") == 0) {
NEXT_ARG();
if (!on_off("neigh_suppress", &neigh_suppress,
*argv))
return -1;
neigh_suppress = parse_on_off("neigh_suppress", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "vlan_tunnel") == 0) {
NEXT_ARG();
if (!on_off("vlan_tunnel", &vlan_tunnel,
*argv))
return -1;
vlan_tunnel = parse_on_off("vlan_tunnel", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "isolated") == 0) {
NEXT_ARG();
if (!on_off("isolated", &isolated, *argv))
return -1;
isolated = parse_on_off("isolated", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "backup_port") == 0) {
NEXT_ARG();
backup_port_idx = ll_name_to_index(*argv);
@ -453,6 +462,9 @@ static int brlink_modify(int argc, char **argv)
if (mcast_flood >= 0)
addattr8(&req.n, sizeof(req), IFLA_BRPORT_MCAST_FLOOD,
mcast_flood);
if (mcast_to_unicast >= 0)
addattr8(&req.n, sizeof(req), IFLA_BRPORT_MCAST_TO_UCAST,
mcast_to_unicast);
if (learning >= 0)
addattr8(&req.n, sizeof(req), IFLA_BRPORT_LEARNING, learning);
if (learning_sync >= 0)

View File

@ -16,9 +16,9 @@
#include <arpa/inet.h>
#include "libnetlink.h"
#include "utils.h"
#include "br_common.h"
#include "rt_names.h"
#include "utils.h"
#include "json_print.h"
#ifndef MDBA_RTA
@ -30,8 +30,9 @@ static unsigned int filter_index, filter_vlan;
static void usage(void)
{
fprintf(stderr, "Usage: bridge mdb { add | del } dev DEV port PORT grp GROUP [permanent | temp] [vid VID]\n");
fprintf(stderr, " bridge mdb {show} [ dev DEV ] [ vid VID ]\n");
fprintf(stderr,
"Usage: bridge mdb { add | del } dev DEV port PORT grp GROUP [src SOURCE] [permanent | temp] [vid VID]\n"
" bridge mdb {show} [ dev DEV ] [ vid VID ]\n");
exit(-1);
}
@ -40,20 +41,25 @@ static bool is_temp_mcast_rtr(__u8 type)
return type == MDB_RTR_TYPE_TEMP_QUERY || type == MDB_RTR_TYPE_TEMP;
}
static const char *format_timer(__u32 ticks)
static const char *format_timer(__u32 ticks, int align)
{
struct timeval tv;
static char tbuf[32];
__jiffies_to_tv(&tv, ticks);
snprintf(tbuf, sizeof(tbuf), "%4lu.%.2lu",
(unsigned long)tv.tv_sec,
(unsigned long)tv.tv_usec / 10000);
if (align)
snprintf(tbuf, sizeof(tbuf), "%4lu.%.2lu",
(unsigned long)tv.tv_sec,
(unsigned long)tv.tv_usec / 10000);
else
snprintf(tbuf, sizeof(tbuf), "%lu.%.2lu",
(unsigned long)tv.tv_sec,
(unsigned long)tv.tv_usec / 10000);
return tbuf;
}
static void __print_router_port_stats(FILE *f, struct rtattr *pattr)
void br_print_router_port_stats(struct rtattr *pattr)
{
struct rtattr *tb[MDBA_ROUTER_PATTR_MAX + 1];
@ -64,7 +70,7 @@ static void __print_router_port_stats(FILE *f, struct rtattr *pattr)
__u32 timer = rta_getattr_u32(tb[MDBA_ROUTER_PATTR_TIMER]);
print_string(PRINT_ANY, "timer", " %s",
format_timer(timer));
format_timer(timer, 1));
}
if (tb[MDBA_ROUTER_PATTR_TYPE]) {
@ -95,13 +101,13 @@ static void br_print_router_ports(FILE *f, struct rtattr *attr,
print_string(PRINT_JSON, "port", NULL, port_ifname);
if (show_stats)
__print_router_port_stats(f, i);
br_print_router_port_stats(i);
close_json_object();
} else if (show_stats) {
fprintf(f, "router ports on %s: %s",
brifname, port_ifname);
__print_router_port_stats(f, i);
br_print_router_port_stats(i);
fprintf(f, "\n");
} else {
fprintf(f, "%s ", port_ifname);
@ -114,42 +120,120 @@ static void br_print_router_ports(FILE *f, struct rtattr *attr,
close_json_array(PRINT_JSON, NULL);
}
static void print_src_entry(struct rtattr *src_attr, int af, const char *sep)
{
struct rtattr *stb[MDBA_MDB_SRCATTR_MAX + 1];
SPRINT_BUF(abuf);
const char *addr;
__u32 timer_val;
parse_rtattr_nested(stb, MDBA_MDB_SRCATTR_MAX, src_attr);
if (!stb[MDBA_MDB_SRCATTR_ADDRESS] || !stb[MDBA_MDB_SRCATTR_TIMER])
return;
addr = inet_ntop(af, RTA_DATA(stb[MDBA_MDB_SRCATTR_ADDRESS]), abuf,
sizeof(abuf));
if (!addr)
return;
timer_val = rta_getattr_u32(stb[MDBA_MDB_SRCATTR_TIMER]);
open_json_object(NULL);
print_string(PRINT_FP, NULL, "%s", sep);
print_color_string(PRINT_ANY, ifa_family_color(af),
"address", "%s", addr);
print_string(PRINT_ANY, "timer", "/%s", format_timer(timer_val, 0));
close_json_object();
}
static void print_mdb_entry(FILE *f, int ifindex, const struct br_mdb_entry *e,
struct nlmsghdr *n, struct rtattr **tb)
{
const void *grp, *src;
const char *addr;
SPRINT_BUF(abuf);
const char *dev;
const void *src;
int af;
if (filter_vlan && e->vid != filter_vlan)
return;
af = e->addr.proto == htons(ETH_P_IP) ? AF_INET : AF_INET6;
src = af == AF_INET ? (const void *)&e->addr.u.ip4 :
(const void *)&e->addr.u.ip6;
if (!e->addr.proto) {
af = AF_PACKET;
grp = &e->addr.u.mac_addr;
} else if (e->addr.proto == htons(ETH_P_IP)) {
af = AF_INET;
grp = &e->addr.u.ip4;
} else {
af = AF_INET6;
grp = &e->addr.u.ip6;
}
dev = ll_index_to_name(ifindex);
open_json_object(NULL);
if (n->nlmsg_type == RTM_DELMDB)
print_bool(PRINT_ANY, "deleted", "Deleted ", true);
print_int(PRINT_ANY, "index", "%u: ", ifindex);
print_color_string(PRINT_ANY, COLOR_IFNAME, "dev", "%s ", dev);
print_string(PRINT_ANY, "port", " %s ",
print_int(PRINT_JSON, "index", NULL, ifindex);
print_color_string(PRINT_ANY, COLOR_IFNAME, "dev", "dev %s", dev);
print_string(PRINT_ANY, "port", " port %s",
ll_index_to_name(e->ifindex));
print_color_string(PRINT_ANY, ifa_family_color(af),
"grp", " %s ",
inet_ntop(af, src, abuf, sizeof(abuf)));
/* The ETH_ALEN argument is ignored for all cases but AF_PACKET */
addr = rt_addr_n2a_r(af, ETH_ALEN, grp, abuf, sizeof(abuf));
if (!addr)
return;
print_string(PRINT_ANY, "state", " %s ",
print_color_string(PRINT_ANY, ifa_family_color(af),
"grp", " grp %s", addr);
if (tb && tb[MDBA_MDB_EATTR_SOURCE]) {
src = (const void *)RTA_DATA(tb[MDBA_MDB_EATTR_SOURCE]);
print_color_string(PRINT_ANY, ifa_family_color(af),
"src", " src %s",
inet_ntop(af, src, abuf, sizeof(abuf)));
}
print_string(PRINT_ANY, "state", " %s",
(e->state & MDB_PERMANENT) ? "permanent" : "temp");
if (show_details && tb) {
if (tb[MDBA_MDB_EATTR_GROUP_MODE]) {
__u8 mode = rta_getattr_u8(tb[MDBA_MDB_EATTR_GROUP_MODE]);
print_string(PRINT_ANY, "filter_mode", " filter_mode %s",
mode == MCAST_INCLUDE ? "include" :
"exclude");
}
if (tb[MDBA_MDB_EATTR_SRC_LIST]) {
struct rtattr *i, *attr = tb[MDBA_MDB_EATTR_SRC_LIST];
const char *sep = " ";
int rem;
open_json_array(PRINT_ANY, is_json_context() ?
"source_list" :
" source_list");
rem = RTA_PAYLOAD(attr);
for (i = RTA_DATA(attr); RTA_OK(i, rem);
i = RTA_NEXT(i, rem)) {
print_src_entry(i, af, sep);
sep = ",";
}
close_json_array(PRINT_JSON, NULL);
}
if (tb[MDBA_MDB_EATTR_RTPROT]) {
__u8 rtprot = rta_getattr_u8(tb[MDBA_MDB_EATTR_RTPROT]);
SPRINT_BUF(rtb);
print_string(PRINT_ANY, "protocol", " proto %s ",
rtnl_rtprot_n2a(rtprot, rtb, sizeof(rtb)));
}
}
open_json_array(PRINT_JSON, "flags");
if (e->flags & MDB_FLAGS_OFFLOAD)
print_string(PRINT_ANY, NULL, "%s ", "offload");
print_string(PRINT_ANY, NULL, " %s", "offload");
if (e->flags & MDB_FLAGS_FAST_LEAVE)
print_string(PRINT_ANY, NULL, " %s", "fast_leave");
if (e->flags & MDB_FLAGS_STAR_EXCL)
print_string(PRINT_ANY, NULL, " %s", "added_by_star_ex");
if (e->flags & MDB_FLAGS_BLOCKED)
print_string(PRINT_ANY, NULL, " %s", "blocked");
close_json_array(PRINT_JSON, NULL);
if (e->vid)
@ -159,7 +243,7 @@ static void print_mdb_entry(FILE *f, int ifindex, const struct br_mdb_entry *e,
__u32 timer = rta_getattr_u32(tb[MDBA_MDB_EATTR_TIMER]);
print_string(PRINT_ANY, "timer", " %s",
format_timer(timer));
format_timer(timer, 1));
}
print_nl();
@ -177,8 +261,9 @@ static void br_print_mdb_entry(FILE *f, int ifindex, struct rtattr *attr,
rem = RTA_PAYLOAD(attr);
for (i = RTA_DATA(attr); RTA_OK(i, rem); i = RTA_NEXT(i, rem)) {
e = RTA_DATA(i);
parse_rtattr(etb, MDBA_MDB_EATTR_MAX, MDB_RTA(RTA_DATA(i)),
RTA_PAYLOAD(i) - RTA_ALIGN(sizeof(*e)));
parse_rtattr_flags(etb, MDBA_MDB_EATTR_MAX, MDB_RTA(RTA_DATA(i)),
RTA_PAYLOAD(i) - RTA_ALIGN(sizeof(*e)),
NLA_F_NESTED);
print_mdb_entry(f, ifindex, e, n, etb);
}
}
@ -189,10 +274,8 @@ static void print_mdb_entries(FILE *fp, struct nlmsghdr *n,
int rem = RTA_PAYLOAD(mdb);
struct rtattr *i;
open_json_array(PRINT_JSON, "mdb");
for (i = RTA_DATA(mdb); RTA_OK(i, rem); i = RTA_NEXT(i, rem))
br_print_mdb_entry(fp, ifindex, i, n);
close_json_array(PRINT_JSON, NULL);
}
static void print_router_entries(FILE *fp, struct nlmsghdr *n,
@ -200,7 +283,6 @@ static void print_router_entries(FILE *fp, struct nlmsghdr *n,
{
const char *brifname = ll_index_to_name(ifindex);
open_json_array(PRINT_JSON, "router");
if (n->nlmsg_type == RTM_GETMDB) {
if (show_details)
br_print_router_ports(fp, router, brifname);
@ -222,15 +304,12 @@ static void print_router_entries(FILE *fp, struct nlmsghdr *n,
port_name, brifname);
}
}
close_json_array(PRINT_JSON, NULL);
}
int print_mdb(struct nlmsghdr *n, void *arg)
static int __parse_mdb_nlmsg(struct nlmsghdr *n, struct rtattr **tb)
{
FILE *fp = arg;
struct br_port_msg *r = NLMSG_DATA(n);
int len = n->nlmsg_len;
struct rtattr *tb[MDBA_MAX+1];
if (n->nlmsg_type != RTM_GETMDB &&
n->nlmsg_type != RTM_NEWMDB &&
@ -253,6 +332,54 @@ int print_mdb(struct nlmsghdr *n, void *arg)
parse_rtattr(tb, MDBA_MAX, MDBA_RTA(r), n->nlmsg_len - NLMSG_LENGTH(sizeof(*r)));
return 1;
}
static int print_mdbs(struct nlmsghdr *n, void *arg)
{
struct br_port_msg *r = NLMSG_DATA(n);
struct rtattr *tb[MDBA_MAX+1];
FILE *fp = arg;
int ret;
ret = __parse_mdb_nlmsg(n, tb);
if (ret != 1)
return ret;
if (tb[MDBA_MDB])
print_mdb_entries(fp, n, r->ifindex, tb[MDBA_MDB]);
return 0;
}
static int print_rtrs(struct nlmsghdr *n, void *arg)
{
struct br_port_msg *r = NLMSG_DATA(n);
struct rtattr *tb[MDBA_MAX+1];
FILE *fp = arg;
int ret;
ret = __parse_mdb_nlmsg(n, tb);
if (ret != 1)
return ret;
if (tb[MDBA_ROUTER])
print_router_entries(fp, n, r->ifindex, tb[MDBA_ROUTER]);
return 0;
}
int print_mdb_mon(struct nlmsghdr *n, void *arg)
{
struct br_port_msg *r = NLMSG_DATA(n);
struct rtattr *tb[MDBA_MAX+1];
FILE *fp = arg;
int ret;
ret = __parse_mdb_nlmsg(n, tb);
if (ret != 1)
return ret;
if (n->nlmsg_type == RTM_DELMDB)
print_bool(PRINT_ANY, "deleted", "Deleted ", true);
@ -291,24 +418,60 @@ static int mdb_show(int argc, char **argv)
}
new_json_obj(json);
open_json_object(NULL);
/* get mdb entries*/
/* get mdb entries */
if (rtnl_mdbdump_req(&rth, PF_BRIDGE) < 0) {
perror("Cannot send dump request");
return -1;
}
if (rtnl_dump_filter(&rth, print_mdb, stdout) < 0) {
open_json_array(PRINT_JSON, "mdb");
if (rtnl_dump_filter(&rth, print_mdbs, stdout) < 0) {
fprintf(stderr, "Dump terminated\n");
return -1;
}
close_json_array(PRINT_JSON, NULL);
/* get router ports */
if (rtnl_mdbdump_req(&rth, PF_BRIDGE) < 0) {
perror("Cannot send dump request");
return -1;
}
open_json_object("router");
if (rtnl_dump_filter(&rth, print_rtrs, stdout) < 0) {
fprintf(stderr, "Dump terminated\n");
return -1;
}
close_json_object();
close_json_object();
delete_json_obj();
fflush(stdout);
return 0;
}
static int mdb_parse_grp(const char *grp, struct br_mdb_entry *e)
{
if (inet_pton(AF_INET, grp, &e->addr.u.ip4)) {
e->addr.proto = htons(ETH_P_IP);
return 0;
}
if (inet_pton(AF_INET6, grp, &e->addr.u.ip6)) {
e->addr.proto = htons(ETH_P_IPV6);
return 0;
}
if (ll_addr_a2n((char *)e->addr.u.mac_addr, sizeof(e->addr.u.mac_addr),
grp) == ETH_ALEN) {
e->addr.proto = 0;
return 0;
}
return -1;
}
static int mdb_modify(int cmd, int flags, int argc, char **argv)
{
struct {
@ -321,8 +484,8 @@ static int mdb_modify(int cmd, int flags, int argc, char **argv)
.n.nlmsg_type = cmd,
.bpm.family = PF_BRIDGE,
};
char *d = NULL, *p = NULL, *grp = NULL, *src = NULL;
struct br_mdb_entry entry = {};
char *d = NULL, *p = NULL, *grp = NULL;
short vid = 0;
while (argc > 0) {
@ -343,6 +506,9 @@ static int mdb_modify(int cmd, int flags, int argc, char **argv)
} else if (strcmp(*argv, "vid") == 0) {
NEXT_ARG();
vid = atoi(*argv);
} else if (strcmp(*argv, "src") == 0) {
NEXT_ARG();
src = *argv;
} else {
if (matches(*argv, "help") == 0)
usage();
@ -363,17 +529,31 @@ static int mdb_modify(int cmd, int flags, int argc, char **argv)
if (!entry.ifindex)
return nodev(p);
if (!inet_pton(AF_INET, grp, &entry.addr.u.ip4)) {
if (!inet_pton(AF_INET6, grp, &entry.addr.u.ip6)) {
fprintf(stderr, "Invalid address \"%s\"\n", grp);
return -1;
} else
entry.addr.proto = htons(ETH_P_IPV6);
} else
entry.addr.proto = htons(ETH_P_IP);
if (mdb_parse_grp(grp, &entry)) {
fprintf(stderr, "Invalid address \"%s\"\n", grp);
return -1;
}
entry.vid = vid;
addattr_l(&req.n, sizeof(req), MDBA_SET_ENTRY, &entry, sizeof(entry));
if (src) {
struct rtattr *nest = addattr_nest(&req.n, sizeof(req),
MDBA_SET_ENTRY_ATTRS);
struct in6_addr src_ip6;
__be32 src_ip4;
nest->rta_type |= NLA_F_NESTED;
if (!inet_pton(AF_INET, src, &src_ip4)) {
if (!inet_pton(AF_INET6, src, &src_ip6)) {
fprintf(stderr, "Invalid source address \"%s\"\n", src);
return -1;
}
addattr_l(&req.n, sizeof(req), MDBE_ATTR_SOURCE, &src_ip6, sizeof(src_ip6));
} else {
addattr32(&req.n, sizeof(req), MDBE_ATTR_SOURCE, src_ip4);
}
addattr_nest_end(&req.n, nest);
}
if (rtnl_talk(&rth, &req.n, NULL) < 0)
return -1;

View File

@ -31,7 +31,7 @@ static int prefix_banner;
static void usage(void)
{
fprintf(stderr, "Usage: bridge monitor [file | link | fdb | mdb | all]\n");
fprintf(stderr, "Usage: bridge monitor [file | link | fdb | mdb | vlan | all]\n");
exit(-1);
}
@ -61,12 +61,18 @@ static int accept_msg(struct rtnl_ctrl_data *ctrl,
case RTM_DELMDB:
if (prefix_banner)
fprintf(fp, "[MDB]");
return print_mdb(n, arg);
return print_mdb_mon(n, arg);
case NLMSG_TSTAMP:
print_nlmsg_timestamp(fp, n);
return 0;
case RTM_NEWVLAN:
case RTM_DELVLAN:
if (prefix_banner)
fprintf(fp, "[VLAN]");
return print_vlan_rtm(n, arg, true, false);
default:
return 0;
}
@ -79,6 +85,7 @@ int do_monitor(int argc, char **argv)
int llink = 0;
int lneigh = 0;
int lmdb = 0;
int lvlan = 0;
rtnl_close(&rth);
@ -95,8 +102,12 @@ int do_monitor(int argc, char **argv)
} else if (matches(*argv, "mdb") == 0) {
lmdb = 1;
groups = 0;
} else if (matches(*argv, "vlan") == 0) {
lvlan = 1;
groups = 0;
} else if (strcmp(*argv, "all") == 0) {
groups = ~RTMGRP_TC;
lvlan = 1;
prefix_banner = 1;
} else if (matches(*argv, "help") == 0) {
usage();
@ -134,6 +145,12 @@ int do_monitor(int argc, char **argv)
if (rtnl_open(&rth, groups) < 0)
exit(1);
if (lvlan && rtnl_add_nl_group(&rth, RTNLGRP_BRVLAN) < 0) {
fprintf(stderr, "Failed to add bridge vlan group to list\n");
exit(1);
}
ll_init_map(&rth);
if (rtnl_listen(&rth, accept_msg, stdout) < 0)

File diff suppressed because it is too large Load Diff

228
configure vendored
View File

@ -1,8 +1,10 @@
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
# This is not an autoconf generated configure
#
INCLUDE=${1:-"$PWD/include"}
INCLUDE="$PWD/include"
PREFIX="/usr"
LIBDIR="\${prefix}/lib"
# Output file which is input to Makefile
CONFIG=config.mk
@ -16,9 +18,11 @@ check_toolchain()
: ${PKG_CONFIG:=pkg-config}
: ${AR=ar}
: ${CC=gcc}
: ${YACC=bison}
echo "PKG_CONFIG:=${PKG_CONFIG}" >>$CONFIG
echo "AR:=${AR}" >>$CONFIG
echo "CC:=${CC}" >>$CONFIG
echo "YACC:=${YACC}" >>$CONFIG
}
check_atm()
@ -115,7 +119,7 @@ EOF
check_xt_old_internal_h()
{
# bail if previous XT checks has already succeeded.
grep -q if grep -q TC_CONFIG_XT $CONFIG && return
grep -q TC_CONFIG_XT $CONFIG && return
#check if we need our own internal.h
cat >$TMPDIR/ipttest.c <<EOF
@ -146,6 +150,15 @@ EOF
rm -f $TMPDIR/ipttest.c $TMPDIR/ipttest
}
check_lib_dir()
{
LIBDIR=$(echo $LIBDIR | sed "s|\${prefix}|$PREFIX|")
echo -n "lib directory: "
echo "$LIBDIR"
echo "LIBDIR:=$LIBDIR" >> $CONFIG
}
check_ipt()
{
if ! grep TC_CONFIG_XT $CONFIG > /dev/null; then
@ -195,6 +208,31 @@ EOF
rm -f $TMPDIR/setnstest.c $TMPDIR/setnstest
}
check_name_to_handle_at()
{
cat >$TMPDIR/name_to_handle_at_test.c <<EOF
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int main(int argc, char **argv)
{
struct file_handle *fhp;
int mount_id, flags, dirfd;
char *pathname;
name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
return 0;
}
EOF
if $CC -I$INCLUDE -o $TMPDIR/name_to_handle_at_test $TMPDIR/name_to_handle_at_test.c >/dev/null 2>&1; then
echo "yes"
echo "CFLAGS += -DHAVE_HANDLE_AT" >>$CONFIG
else
echo "no"
fi
rm -f $TMPDIR/name_to_handle_at_test.c $TMPDIR/name_to_handle_at_test
}
check_ipset()
{
cat >$TMPDIR/ipsettest.c <<EOF
@ -206,7 +244,7 @@ typedef unsigned short ip_set_id_t;
#include <linux/netfilter/xt_set.h>
struct xt_set_info info;
#if IPSET_PROTOCOL == 6
#if IPSET_PROTOCOL == 6 || IPSET_PROTOCOL == 7
int main(void)
{
return IPSET_MAXNAMELEN;
@ -238,6 +276,111 @@ check_elf()
fi
}
have_libbpf_basic()
{
cat >$TMPDIR/libbpf_test.c <<EOF
#include <bpf/libbpf.h>
int main(int argc, char **argv) {
bpf_program__set_autoload(NULL, false);
bpf_map__ifindex(NULL);
bpf_map__set_pin_path(NULL, NULL);
bpf_object__open_file(NULL, NULL);
return 0;
}
EOF
$CC -o $TMPDIR/libbpf_test $TMPDIR/libbpf_test.c $LIBBPF_CFLAGS $LIBBPF_LDLIBS >/dev/null 2>&1
local ret=$?
rm -f $TMPDIR/libbpf_test.c $TMPDIR/libbpf_test
return $ret
}
have_libbpf_sec_name()
{
cat >$TMPDIR/libbpf_sec_test.c <<EOF
#include <bpf/libbpf.h>
int main(int argc, char **argv) {
void *ptr;
bpf_program__section_name(NULL);
return 0;
}
EOF
$CC -o $TMPDIR/libbpf_sec_test $TMPDIR/libbpf_sec_test.c $LIBBPF_CFLAGS $LIBBPF_LDLIBS >/dev/null 2>&1
local ret=$?
rm -f $TMPDIR/libbpf_sec_test.c $TMPDIR/libbpf_sec_test
return $ret
}
check_force_libbpf_on()
{
# if set LIBBPF_FORCE=on but no libbpf support, just exist the config
# process to make sure we don't build without libbpf.
if [ "$LIBBPF_FORCE" = on ]; then
echo " LIBBPF_FORCE=on set, but couldn't find a usable libbpf"
exit 1
fi
}
check_libbpf()
{
# if set LIBBPF_FORCE=off, disable libbpf entirely
if [ "$LIBBPF_FORCE" = off ]; then
echo "no"
return
fi
if ! ${PKG_CONFIG} libbpf --exists && [ -z "$LIBBPF_DIR" ] ; then
echo "no"
check_force_libbpf_on
return
fi
if [ $(uname -m) = x86_64 ]; then
local LIBBPF_LIBDIR="${LIBBPF_DIR}/usr/lib64"
else
local LIBBPF_LIBDIR="${LIBBPF_DIR}/usr/lib"
fi
if [ -n "$LIBBPF_DIR" ]; then
LIBBPF_CFLAGS="-I${LIBBPF_DIR}/usr/include"
LIBBPF_LDLIBS="${LIBBPF_LIBDIR}/libbpf.a -lz -lelf"
LIBBPF_VERSION=$(PKG_CONFIG_LIBDIR=${LIBBPF_LIBDIR}/pkgconfig ${PKG_CONFIG} libbpf --modversion)
else
LIBBPF_CFLAGS=$(${PKG_CONFIG} libbpf --cflags)
LIBBPF_LDLIBS=$(${PKG_CONFIG} libbpf --libs)
LIBBPF_VERSION=$(${PKG_CONFIG} libbpf --modversion)
fi
if ! have_libbpf_basic; then
echo "no"
echo " libbpf version $LIBBPF_VERSION is too low, please update it to at least 0.1.0"
check_force_libbpf_on
return
else
echo "HAVE_LIBBPF:=y" >> $CONFIG
echo 'CFLAGS += -DHAVE_LIBBPF ' $LIBBPF_CFLAGS >> $CONFIG
echo "CFLAGS += -DLIBBPF_VERSION=\\\"$LIBBPF_VERSION\\\"" >> $CONFIG
echo 'LDLIBS += ' $LIBBPF_LDLIBS >> $CONFIG
if [ -z "$LIBBPF_DIR" ]; then
echo "CFLAGS += -DLIBBPF_DYNAMIC" >> $CONFIG
fi
fi
# bpf_program__title() is deprecated since libbpf 0.2.0, use
# bpf_program__section_name() instead if we support
if have_libbpf_sec_name; then
echo "HAVE_LIBBPF_SECTION_NAME:=y" >> $CONFIG
echo 'CFLAGS += -DHAVE_LIBBPF_SECTION_NAME ' >> $CONFIG
fi
echo "yes"
echo " libbpf version $LIBBPF_VERSION"
}
check_selinux()
# SELinux is a compile time option in the ss utility
{
@ -349,6 +492,76 @@ endif
EOF
}
usage()
{
cat <<EOF
Usage: $0 [OPTIONS]
--include_dir <dir> Path to iproute2 include dir
--libdir <dir> Path to iproute2 lib dir
--libbpf_dir <dir> Path to libbpf DESTDIR
--libbpf_force <on|off> Enable/disable libbpf by force. Available options:
on: require link against libbpf, quit config if no libbpf support
off: disable libbpf probing
--prefix <dir> Path prefix of the lib files to install
-h | --help Show this usage info
EOF
exit $1
}
# Compat with the old INCLUDE path setting method.
if [ $# -eq 1 ] && [ "$(echo $1 | cut -c 1)" != '-' ]; then
INCLUDE="$1"
else
while [ "$#" -gt 0 ]; do
case "$1" in
--include_dir)
shift
INCLUDE="$1" ;;
--include_dir=*)
INCLUDE="${1#*=}" ;;
--libdir)
shift
LIBDIR="$1" ;;
--libdir=*)
LIBDIR="${1#*=}" ;;
--libbpf_dir)
shift
LIBBPF_DIR="$1" ;;
--libbpf_dir=*)
LIBBPF_DIR="${1#*=}" ;;
--libbpf_force)
shift
LIBBPF_FORCE="$1" ;;
--libbpf_force=*)
LIBBPF_FORCE="${1#*=}" ;;
--prefix)
shift
PREFIX="$1" ;;
--prefix=*)
PREFIX="${1#*=}" ;;
-h | --help)
usage 0 ;;
--*)
;;
*)
usage 1 ;;
esac
[ "$#" -gt 0 ] && shift
done
fi
[ -d "$INCLUDE" ] || usage 1
if [ "${LIBBPF_DIR-unused}" != "unused" ]; then
[ -d "$LIBBPF_DIR" ] || usage 1
fi
if [ "${LIBBPF_FORCE-unused}" != "unused" ]; then
if [ "$LIBBPF_FORCE" != 'on' ] && [ "$LIBBPF_FORCE" != 'off' ]; then
usage 1
fi
fi
[ -z "$PREFIX" ] && usage 1
[ -z "$LIBDIR" ] && usage 1
echo "# Generated config based on" $INCLUDE >$CONFIG
quiet_config >> $CONFIG
@ -372,6 +585,7 @@ if ! grep -q TC_CONFIG_NO_XT $CONFIG; then
fi
echo
check_lib_dir
if ! grep -q TC_CONFIG_NO_XT $CONFIG; then
echo -n "iptables modules directory: "
check_ipt_lib_dir
@ -380,9 +594,15 @@ fi
echo -n "libc has setns: "
check_setns
echo -n "libc has name_to_handle_at: "
check_name_to_handle_at
echo -n "SELinux support: "
check_selinux
echo -n "libbpf support: "
check_libbpf
echo -n "ELF support: "
check_elf

1
dcb/.gitignore vendored Normal file
View File

@ -0,0 +1 @@
dcb

31
dcb/Makefile Normal file
View File

@ -0,0 +1,31 @@
# SPDX-License-Identifier: GPL-2.0
include ../config.mk
TARGETS :=
ifeq ($(HAVE_MNL),y)
DCBOBJ = dcb.o \
dcb_app.o \
dcb_buffer.o \
dcb_dcbx.o \
dcb_ets.o \
dcb_maxrate.o \
dcb_pfc.o
TARGETS += dcb
LDLIBS += -lm
endif
all: $(TARGETS) $(LIBS)
dcb: $(DCBOBJ) $(LIBNETLINK)
$(QUIET_LINK)$(CC) $^ $(LDFLAGS) $(LDLIBS) -o $@
install: all
for i in $(TARGETS); \
do install -m 0755 $$i $(DESTDIR)$(SBINDIR); \
done
clean:
rm -f $(DCBOBJ) $(TARGETS)

611
dcb/dcb.c Normal file
View File

@ -0,0 +1,611 @@
// SPDX-License-Identifier: GPL-2.0+
#include <inttypes.h>
#include <stdio.h>
#include <linux/dcbnl.h>
#include <libmnl/libmnl.h>
#include <getopt.h>
#include "dcb.h"
#include "mnl_utils.h"
#include "namespace.h"
#include "utils.h"
#include "version.h"
static int dcb_init(struct dcb *dcb)
{
dcb->buf = malloc(MNL_SOCKET_BUFFER_SIZE);
if (dcb->buf == NULL) {
perror("Netlink buffer allocation");
return -1;
}
dcb->nl = mnlu_socket_open(NETLINK_ROUTE);
if (dcb->nl == NULL) {
perror("Open netlink socket");
goto err_socket_open;
}
new_json_obj_plain(dcb->json_output);
return 0;
err_socket_open:
free(dcb->buf);
return -1;
}
static void dcb_fini(struct dcb *dcb)
{
delete_json_obj_plain();
mnl_socket_close(dcb->nl);
free(dcb->buf);
}
static struct dcb *dcb_alloc(void)
{
struct dcb *dcb;
dcb = calloc(1, sizeof(*dcb));
if (!dcb)
return NULL;
return dcb;
}
static void dcb_free(struct dcb *dcb)
{
free(dcb);
}
struct dcb_get_attribute {
struct dcb *dcb;
int attr;
void *payload;
__u16 payload_len;
};
static int dcb_get_attribute_attr_ieee_cb(const struct nlattr *attr, void *data)
{
struct dcb_get_attribute *ga = data;
if (mnl_attr_get_type(attr) != ga->attr)
return MNL_CB_OK;
ga->payload = mnl_attr_get_payload(attr);
ga->payload_len = mnl_attr_get_payload_len(attr);
return MNL_CB_STOP;
}
static int dcb_get_attribute_attr_cb(const struct nlattr *attr, void *data)
{
if (mnl_attr_get_type(attr) != DCB_ATTR_IEEE)
return MNL_CB_OK;
return mnl_attr_parse_nested(attr, dcb_get_attribute_attr_ieee_cb, data);
}
static int dcb_get_attribute_cb(const struct nlmsghdr *nlh, void *data)
{
return mnl_attr_parse(nlh, sizeof(struct dcbmsg), dcb_get_attribute_attr_cb, data);
}
static int dcb_get_attribute_bare_cb(const struct nlmsghdr *nlh, void *data)
{
/* Bare attributes (e.g. DCB_ATTR_DCBX) are not wrapped inside an IEEE
* container, so this does not have to go through unpacking in
* dcb_get_attribute_attr_cb().
*/
return mnl_attr_parse(nlh, sizeof(struct dcbmsg),
dcb_get_attribute_attr_ieee_cb, data);
}
struct dcb_set_attribute_response {
int response_attr;
};
static int dcb_set_attribute_attr_cb(const struct nlattr *attr, void *data)
{
struct dcb_set_attribute_response *resp = data;
uint16_t len;
uint8_t err;
if (mnl_attr_get_type(attr) != resp->response_attr)
return MNL_CB_OK;
len = mnl_attr_get_payload_len(attr);
if (len != 1) {
fprintf(stderr, "Response attribute expected to have size 1, not %d\n", len);
return MNL_CB_ERROR;
}
err = mnl_attr_get_u8(attr);
if (err) {
fprintf(stderr, "Error when attempting to set attribute: %s\n",
strerror(err));
return MNL_CB_ERROR;
}
return MNL_CB_STOP;
}
static int dcb_set_attribute_cb(const struct nlmsghdr *nlh, void *data)
{
return mnl_attr_parse(nlh, sizeof(struct dcbmsg), dcb_set_attribute_attr_cb, data);
}
static int dcb_talk(struct dcb *dcb, struct nlmsghdr *nlh, mnl_cb_t cb, void *data)
{
int ret;
ret = mnl_socket_sendto(dcb->nl, nlh, nlh->nlmsg_len);
if (ret < 0) {
perror("mnl_socket_sendto");
return -1;
}
return mnlu_socket_recv_run(dcb->nl, nlh->nlmsg_seq, dcb->buf, MNL_SOCKET_BUFFER_SIZE,
cb, data);
}
static struct nlmsghdr *dcb_prepare(struct dcb *dcb, const char *dev,
uint32_t nlmsg_type, uint8_t dcb_cmd)
{
struct dcbmsg dcbm = {
.cmd = dcb_cmd,
};
struct nlmsghdr *nlh;
nlh = mnlu_msg_prepare(dcb->buf, nlmsg_type, NLM_F_REQUEST, &dcbm, sizeof(dcbm));
mnl_attr_put_strz(nlh, DCB_ATTR_IFNAME, dev);
return nlh;
}
static int __dcb_get_attribute(struct dcb *dcb, int command,
const char *dev, int attr,
void **payload_p, __u16 *payload_len_p,
int (*get_attribute_cb)(const struct nlmsghdr *nlh,
void *data))
{
struct dcb_get_attribute ga;
struct nlmsghdr *nlh;
int ret;
nlh = dcb_prepare(dcb, dev, RTM_GETDCB, command);
ga = (struct dcb_get_attribute) {
.dcb = dcb,
.attr = attr,
.payload = NULL,
};
ret = dcb_talk(dcb, nlh, get_attribute_cb, &ga);
if (ret) {
perror("Attribute read");
return ret;
}
if (ga.payload == NULL) {
perror("Attribute not found");
return -ENOENT;
}
*payload_p = ga.payload;
*payload_len_p = ga.payload_len;
return 0;
}
int dcb_get_attribute_va(struct dcb *dcb, const char *dev, int attr,
void **payload_p, __u16 *payload_len_p)
{
return __dcb_get_attribute(dcb, DCB_CMD_IEEE_GET, dev, attr,
payload_p, payload_len_p,
dcb_get_attribute_cb);
}
int dcb_get_attribute_bare(struct dcb *dcb, int cmd, const char *dev, int attr,
void **payload_p, __u16 *payload_len_p)
{
return __dcb_get_attribute(dcb, cmd, dev, attr,
payload_p, payload_len_p,
dcb_get_attribute_bare_cb);
}
int dcb_get_attribute(struct dcb *dcb, const char *dev, int attr, void *data, size_t data_len)
{
__u16 payload_len;
void *payload;
int ret;
ret = dcb_get_attribute_va(dcb, dev, attr, &payload, &payload_len);
if (ret)
return ret;
if (payload_len != data_len) {
fprintf(stderr, "Wrong len %d, expected %zd\n", payload_len, data_len);
return -EINVAL;
}
memcpy(data, payload, data_len);
return 0;
}
static int __dcb_set_attribute(struct dcb *dcb, int command, const char *dev,
int (*cb)(struct dcb *, struct nlmsghdr *, void *),
void *data, int response_attr)
{
struct dcb_set_attribute_response resp = {
.response_attr = response_attr,
};
struct nlmsghdr *nlh;
int ret;
nlh = dcb_prepare(dcb, dev, RTM_SETDCB, command);
ret = cb(dcb, nlh, data);
if (ret)
return ret;
ret = dcb_talk(dcb, nlh, dcb_set_attribute_cb, &resp);
if (ret) {
perror("Attribute write");
return ret;
}
return 0;
}
struct dcb_set_attribute_ieee_cb {
int (*cb)(struct dcb *dcb, struct nlmsghdr *nlh, void *data);
void *data;
};
static int dcb_set_attribute_ieee_cb(struct dcb *dcb, struct nlmsghdr *nlh, void *data)
{
struct dcb_set_attribute_ieee_cb *ieee_data = data;
struct nlattr *nest;
int ret;
nest = mnl_attr_nest_start(nlh, DCB_ATTR_IEEE);
ret = ieee_data->cb(dcb, nlh, ieee_data->data);
if (ret)
return ret;
mnl_attr_nest_end(nlh, nest);
return 0;
}
int dcb_set_attribute_va(struct dcb *dcb, int command, const char *dev,
int (*cb)(struct dcb *dcb, struct nlmsghdr *nlh, void *data),
void *data)
{
struct dcb_set_attribute_ieee_cb ieee_data = {
.cb = cb,
.data = data,
};
return __dcb_set_attribute(dcb, command, dev,
&dcb_set_attribute_ieee_cb, &ieee_data,
DCB_ATTR_IEEE);
}
struct dcb_set_attribute {
int attr;
const void *data;
size_t data_len;
};
static int dcb_set_attribute_put(struct dcb *dcb, struct nlmsghdr *nlh, void *data)
{
struct dcb_set_attribute *dsa = data;
mnl_attr_put(nlh, dsa->attr, dsa->data_len, dsa->data);
return 0;
}
int dcb_set_attribute(struct dcb *dcb, const char *dev, int attr, const void *data, size_t data_len)
{
struct dcb_set_attribute dsa = {
.attr = attr,
.data = data,
.data_len = data_len,
};
return dcb_set_attribute_va(dcb, DCB_CMD_IEEE_SET, dev,
&dcb_set_attribute_put, &dsa);
}
int dcb_set_attribute_bare(struct dcb *dcb, int command, const char *dev,
int attr, const void *data, size_t data_len,
int response_attr)
{
struct dcb_set_attribute dsa = {
.attr = attr,
.data = data,
.data_len = data_len,
};
return __dcb_set_attribute(dcb, command, dev,
&dcb_set_attribute_put, &dsa, response_attr);
}
void dcb_print_array_u8(const __u8 *array, size_t size)
{
SPRINT_BUF(b);
size_t i;
for (i = 0; i < size; i++) {
snprintf(b, sizeof(b), "%zd:%%d ", i);
print_uint(PRINT_ANY, NULL, b, array[i]);
}
}
void dcb_print_array_u64(const __u64 *array, size_t size)
{
SPRINT_BUF(b);
size_t i;
for (i = 0; i < size; i++) {
snprintf(b, sizeof(b), "%zd:%%" PRIu64 " ", i);
print_u64(PRINT_ANY, NULL, b, array[i]);
}
}
void dcb_print_array_on_off(const __u8 *array, size_t size)
{
SPRINT_BUF(b);
size_t i;
for (i = 0; i < size; i++) {
snprintf(b, sizeof(b), "%zd:%%s ", i);
print_on_off(PRINT_ANY, NULL, b, array[i]);
}
}
void dcb_print_array_kw(const __u8 *array, size_t array_size,
const char *const kw[], size_t kw_size)
{
SPRINT_BUF(b);
size_t i;
for (i = 0; i < array_size; i++) {
__u8 emt = array[i];
snprintf(b, sizeof(b), "%zd:%%s ", i);
if (emt < kw_size && kw[emt])
print_string(PRINT_ANY, NULL, b, kw[emt]);
else
print_string(PRINT_ANY, NULL, b, "???");
}
}
void dcb_print_named_array(const char *json_name, const char *fp_name,
const __u8 *array, size_t size,
void (*print_array)(const __u8 *, size_t))
{
open_json_array(PRINT_JSON, json_name);
print_string(PRINT_FP, NULL, "%s ", fp_name);
print_array(array, size);
close_json_array(PRINT_JSON, json_name);
}
int dcb_parse_mapping(const char *what_key, __u32 key, __u32 max_key,
const char *what_value, __u64 value, __u64 max_value,
void (*set_array)(__u32 index, __u64 value, void *data),
void *set_array_data)
{
bool is_all = key == (__u32) -1;
if (!is_all && key > max_key) {
fprintf(stderr, "In %s:%s mapping, %s is expected to be 0..%d\n",
what_key, what_value, what_key, max_key);
return -EINVAL;
}
if (value > max_value) {
fprintf(stderr, "In %s:%s mapping, %s is expected to be 0..%llu\n",
what_key, what_value, what_value, max_value);
return -EINVAL;
}
if (is_all) {
for (key = 0; key <= max_key; key++)
set_array(key, value, set_array_data);
} else {
set_array(key, value, set_array_data);
}
return 0;
}
void dcb_set_u8(__u32 key, __u64 value, void *data)
{
__u8 *array = data;
array[key] = value;
}
void dcb_set_u32(__u32 key, __u64 value, void *data)
{
__u32 *array = data;
array[key] = value;
}
void dcb_set_u64(__u32 key, __u64 value, void *data)
{
__u64 *array = data;
array[key] = value;
}
int dcb_cmd_parse_dev(struct dcb *dcb, int argc, char **argv,
int (*and_then)(struct dcb *dcb, const char *dev,
int argc, char **argv),
void (*help)(void))
{
const char *dev;
if (!argc || matches(*argv, "help") == 0) {
help();
return 0;
} else if (matches(*argv, "dev") == 0) {
NEXT_ARG();
dev = *argv;
if (check_ifname(dev)) {
invarg("not a valid ifname", *argv);
return -EINVAL;
}
NEXT_ARG_FWD();
return and_then(dcb, dev, argc, argv);
} else {
fprintf(stderr, "Expected `dev DEV', not `%s'", *argv);
help();
return -EINVAL;
}
}
static void dcb_help(void)
{
fprintf(stderr,
"Usage: dcb [ OPTIONS ] OBJECT { COMMAND | help }\n"
" dcb [ -f | --force ] { -b | --batch } filename [ -n | --netns ] netnsname\n"
"where OBJECT := { app | buffer | dcbx | ets | maxrate | pfc }\n"
" OPTIONS := [ -V | --Version | -i | --iec | -j | --json\n"
" | -N | --Numeric | -p | --pretty\n"
" | -s | --statistics | -v | --verbose]\n");
}
static int dcb_cmd(struct dcb *dcb, int argc, char **argv)
{
if (!argc || matches(*argv, "help") == 0) {
dcb_help();
return 0;
} else if (matches(*argv, "app") == 0) {
return dcb_cmd_app(dcb, argc - 1, argv + 1);
} else if (matches(*argv, "buffer") == 0) {
return dcb_cmd_buffer(dcb, argc - 1, argv + 1);
} else if (matches(*argv, "dcbx") == 0) {
return dcb_cmd_dcbx(dcb, argc - 1, argv + 1);
} else if (matches(*argv, "ets") == 0) {
return dcb_cmd_ets(dcb, argc - 1, argv + 1);
} else if (matches(*argv, "maxrate") == 0) {
return dcb_cmd_maxrate(dcb, argc - 1, argv + 1);
} else if (matches(*argv, "pfc") == 0) {
return dcb_cmd_pfc(dcb, argc - 1, argv + 1);
}
fprintf(stderr, "Object \"%s\" is unknown\n", *argv);
return -ENOENT;
}
static int dcb_batch_cmd(int argc, char *argv[], void *data)
{
struct dcb *dcb = data;
return dcb_cmd(dcb, argc, argv);
}
static int dcb_batch(struct dcb *dcb, const char *name, bool force)
{
return do_batch(name, force, dcb_batch_cmd, dcb);
}
int main(int argc, char **argv)
{
static const struct option long_options[] = {
{ "Version", no_argument, NULL, 'V' },
{ "force", no_argument, NULL, 'f' },
{ "batch", required_argument, NULL, 'b' },
{ "iec", no_argument, NULL, 'i' },
{ "json", no_argument, NULL, 'j' },
{ "Numeric", no_argument, NULL, 'N' },
{ "pretty", no_argument, NULL, 'p' },
{ "statistics", no_argument, NULL, 's' },
{ "netns", required_argument, NULL, 'n' },
{ "help", no_argument, NULL, 'h' },
{ NULL, 0, NULL, 0 }
};
const char *batch_file = NULL;
bool force = false;
struct dcb *dcb;
int opt;
int err;
int ret;
dcb = dcb_alloc();
if (!dcb) {
fprintf(stderr, "Failed to allocate memory for dcb\n");
return EXIT_FAILURE;
}
while ((opt = getopt_long(argc, argv, "b:fhijn:psvNV",
long_options, NULL)) >= 0) {
switch (opt) {
case 'V':
printf("dcb utility, iproute2-%s\n", version);
ret = EXIT_SUCCESS;
goto dcb_free;
case 'f':
force = true;
break;
case 'b':
batch_file = optarg;
break;
case 'j':
dcb->json_output = true;
break;
case 'N':
dcb->numeric = true;
break;
case 'p':
pretty = true;
break;
case 's':
dcb->stats = true;
break;
case 'n':
if (netns_switch(optarg)) {
ret = EXIT_FAILURE;
goto dcb_free;
}
break;
case 'i':
dcb->use_iec = true;
break;
case 'h':
dcb_help();
ret = EXIT_SUCCESS;
goto dcb_free;
default:
fprintf(stderr, "Unknown option.\n");
dcb_help();
ret = EXIT_FAILURE;
goto dcb_free;
}
}
argc -= optind;
argv += optind;
err = dcb_init(dcb);
if (err) {
ret = EXIT_FAILURE;
goto dcb_free;
}
if (batch_file)
err = dcb_batch(dcb, batch_file, force);
else
err = dcb_cmd(dcb, argc, argv);
if (err) {
ret = EXIT_FAILURE;
goto dcb_fini;
}
ret = EXIT_SUCCESS;
dcb_fini:
dcb_fini(dcb);
dcb_free:
dcb_free(dcb);
return ret;
}

81
dcb/dcb.h Normal file
View File

@ -0,0 +1,81 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __DCB_H__
#define __DCB_H__ 1
#include <libmnl/libmnl.h>
#include <stdbool.h>
#include <stddef.h>
/* dcb.c */
struct dcb {
char *buf;
struct mnl_socket *nl;
bool json_output;
bool stats;
bool use_iec;
bool numeric;
};
int dcb_parse_mapping(const char *what_key, __u32 key, __u32 max_key,
const char *what_value, __u64 value, __u64 max_value,
void (*set_array)(__u32 index, __u64 value, void *data),
void *set_array_data);
int dcb_cmd_parse_dev(struct dcb *dcb, int argc, char **argv,
int (*and_then)(struct dcb *dcb, const char *dev,
int argc, char **argv),
void (*help)(void));
void dcb_set_u8(__u32 key, __u64 value, void *data);
void dcb_set_u32(__u32 key, __u64 value, void *data);
void dcb_set_u64(__u32 key, __u64 value, void *data);
int dcb_get_attribute(struct dcb *dcb, const char *dev, int attr,
void *data, size_t data_len);
int dcb_set_attribute(struct dcb *dcb, const char *dev, int attr,
const void *data, size_t data_len);
int dcb_get_attribute_va(struct dcb *dcb, const char *dev, int attr,
void **payload_p, __u16 *payload_len_p);
int dcb_set_attribute_va(struct dcb *dcb, int command, const char *dev,
int (*cb)(struct dcb *dcb, struct nlmsghdr *nlh, void *data),
void *data);
int dcb_get_attribute_bare(struct dcb *dcb, int cmd, const char *dev, int attr,
void **payload_p, __u16 *payload_len_p);
int dcb_set_attribute_bare(struct dcb *dcb, int command, const char *dev,
int attr, const void *data, size_t data_len,
int response_attr);
void dcb_print_named_array(const char *json_name, const char *fp_name,
const __u8 *array, size_t size,
void (*print_array)(const __u8 *, size_t));
void dcb_print_array_u8(const __u8 *array, size_t size);
void dcb_print_array_u64(const __u64 *array, size_t size);
void dcb_print_array_on_off(const __u8 *array, size_t size);
void dcb_print_array_kw(const __u8 *array, size_t array_size,
const char *const kw[], size_t kw_size);
/* dcb_app.c */
int dcb_cmd_app(struct dcb *dcb, int argc, char **argv);
/* dcb_buffer.c */
int dcb_cmd_buffer(struct dcb *dcb, int argc, char **argv);
/* dcb_dcbx.c */
int dcb_cmd_dcbx(struct dcb *dcb, int argc, char **argv);
/* dcb_ets.c */
int dcb_cmd_ets(struct dcb *dcb, int argc, char **argv);
/* dcb_maxrate.c */
int dcb_cmd_maxrate(struct dcb *dcb, int argc, char **argv);
/* dcb_pfc.c */
int dcb_cmd_pfc(struct dcb *dcb, int argc, char **argv);
#endif /* __DCB_H__ */

795
dcb/dcb_app.c Normal file
View File

@ -0,0 +1,795 @@
// SPDX-License-Identifier: GPL-2.0+
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <libmnl/libmnl.h>
#include <linux/dcbnl.h>
#include "dcb.h"
#include "utils.h"
#include "rt_names.h"
static void dcb_app_help_add(void)
{
fprintf(stderr,
"Usage: dcb app { add | del | replace } dev STRING\n"
" [ default-prio PRIO ]\n"
" [ ethtype-prio ET:PRIO ]\n"
" [ stream-port-prio PORT:PRIO ]\n"
" [ dgram-port-prio PORT:PRIO ]\n"
" [ port-prio PORT:PRIO ]\n"
" [ dscp-prio INTEGER:PRIO ]\n"
"\n"
" where PRIO := { 0 .. 7 }\n"
" ET := { 0x600 .. 0xffff }\n"
" PORT := { 1 .. 65535 }\n"
" DSCP := { 0 .. 63 }\n"
"\n"
);
}
static void dcb_app_help_show_flush(void)
{
fprintf(stderr,
"Usage: dcb app { show | flush } dev STRING\n"
" [ default-prio ]\n"
" [ ethtype-prio ]\n"
" [ stream-port-prio ]\n"
" [ dgram-port-prio ]\n"
" [ port-prio ]\n"
" [ dscp-prio ]\n"
"\n"
);
}
static void dcb_app_help(void)
{
fprintf(stderr,
"Usage: dcb app help\n"
"\n"
);
dcb_app_help_show_flush();
dcb_app_help_add();
}
struct dcb_app_table {
struct dcb_app *apps;
size_t n_apps;
};
static void dcb_app_table_fini(struct dcb_app_table *tab)
{
free(tab->apps);
}
static int dcb_app_table_push(struct dcb_app_table *tab, struct dcb_app *app)
{
struct dcb_app *apps = realloc(tab->apps, (tab->n_apps + 1) * sizeof(*tab->apps));
if (apps == NULL) {
perror("Cannot allocate APP table");
return -ENOMEM;
}
tab->apps = apps;
tab->apps[tab->n_apps++] = *app;
return 0;
}
static void dcb_app_table_remove_existing(struct dcb_app_table *a,
const struct dcb_app_table *b)
{
size_t ia, ja;
size_t ib;
for (ia = 0, ja = 0; ia < a->n_apps; ia++) {
struct dcb_app *aa = &a->apps[ia];
bool found = false;
for (ib = 0; ib < b->n_apps; ib++) {
const struct dcb_app *ab = &b->apps[ib];
if (aa->selector == ab->selector &&
aa->protocol == ab->protocol &&
aa->priority == ab->priority) {
found = true;
break;
}
}
if (!found)
a->apps[ja++] = *aa;
}
a->n_apps = ja;
}
static void dcb_app_table_remove_replaced(struct dcb_app_table *a,
const struct dcb_app_table *b)
{
size_t ia, ja;
size_t ib;
for (ia = 0, ja = 0; ia < a->n_apps; ia++) {
struct dcb_app *aa = &a->apps[ia];
bool present = false;
bool found = false;
for (ib = 0; ib < b->n_apps; ib++) {
const struct dcb_app *ab = &b->apps[ib];
if (aa->selector == ab->selector &&
aa->protocol == ab->protocol)
present = true;
else
continue;
if (aa->priority == ab->priority) {
found = true;
break;
}
}
/* Entries that remain in A will be removed, so keep in the
* table only APP entries whose sel/pid is mentioned in B,
* but that do not have the full sel/pid/prio match.
*/
if (present && !found)
a->apps[ja++] = *aa;
}
a->n_apps = ja;
}
static int dcb_app_table_copy(struct dcb_app_table *a,
const struct dcb_app_table *b)
{
size_t i;
int ret;
for (i = 0; i < b->n_apps; i++) {
ret = dcb_app_table_push(a, &b->apps[i]);
if (ret != 0)
return ret;
}
return 0;
}
static int dcb_app_cmp(const struct dcb_app *a, const struct dcb_app *b)
{
if (a->protocol < b->protocol)
return -1;
if (a->protocol > b->protocol)
return 1;
return a->priority - b->priority;
}
static int dcb_app_cmp_cb(const void *a, const void *b)
{
return dcb_app_cmp(a, b);
}
static void dcb_app_table_sort(struct dcb_app_table *tab)
{
qsort(tab->apps, tab->n_apps, sizeof(*tab->apps), dcb_app_cmp_cb);
}
struct dcb_app_parse_mapping {
__u8 selector;
struct dcb_app_table *tab;
int err;
};
static void dcb_app_parse_mapping_cb(__u32 key, __u64 value, void *data)
{
struct dcb_app_parse_mapping *pm = data;
struct dcb_app app = {
.selector = pm->selector,
.priority = value,
.protocol = key,
};
if (pm->err)
return;
pm->err = dcb_app_table_push(pm->tab, &app);
}
static int dcb_app_parse_mapping_ethtype_prio(__u32 key, char *value, void *data)
{
__u8 prio;
if (key < 0x600) {
fprintf(stderr, "Protocol IDs < 0x600 are reserved for EtherType\n");
return -EINVAL;
}
if (get_u8(&prio, value, 0))
return -EINVAL;
return dcb_parse_mapping("ETHTYPE", key, 0xffff,
"PRIO", prio, IEEE_8021QAZ_MAX_TCS - 1,
dcb_app_parse_mapping_cb, data);
}
static int dcb_app_parse_dscp(__u32 *key, const char *arg)
{
if (parse_mapping_num_all(key, arg) == 0)
return 0;
if (rtnl_dsfield_a2n(key, arg) != 0)
return -1;
if (*key & 0x03) {
fprintf(stderr, "The values `%s' uses non-DSCP bits.\n", arg);
return -1;
}
/* Unshift the value to convert it from dsfield to DSCP. */
*key >>= 2;
return 0;
}
static int dcb_app_parse_mapping_dscp_prio(__u32 key, char *value, void *data)
{
__u8 prio;
if (get_u8(&prio, value, 0))
return -EINVAL;
return dcb_parse_mapping("DSCP", key, 63,
"PRIO", prio, IEEE_8021QAZ_MAX_TCS - 1,
dcb_app_parse_mapping_cb, data);
}
static int dcb_app_parse_mapping_port_prio(__u32 key, char *value, void *data)
{
__u8 prio;
if (key == 0) {
fprintf(stderr, "Port ID of 0 is invalid\n");
return -EINVAL;
}
if (get_u8(&prio, value, 0))
return -EINVAL;
return dcb_parse_mapping("PORT", key, 0xffff,
"PRIO", prio, IEEE_8021QAZ_MAX_TCS - 1,
dcb_app_parse_mapping_cb, data);
}
static int dcb_app_parse_default_prio(int *argcp, char ***argvp, struct dcb_app_table *tab)
{
int argc = *argcp;
char **argv = *argvp;
int ret = 0;
while (argc > 0) {
struct dcb_app app;
__u8 prio;
if (get_u8(&prio, *argv, 0)) {
ret = 1;
break;
}
app = (struct dcb_app){
.selector = IEEE_8021QAZ_APP_SEL_ETHERTYPE,
.protocol = 0,
.priority = prio,
};
ret = dcb_app_table_push(tab, &app);
if (ret != 0)
break;
argc--, argv++;
}
*argcp = argc;
*argvp = argv;
return ret;
}
static bool dcb_app_is_ethtype(const struct dcb_app *app)
{
return app->selector == IEEE_8021QAZ_APP_SEL_ETHERTYPE &&
app->protocol != 0;
}
static bool dcb_app_is_default(const struct dcb_app *app)
{
return app->selector == IEEE_8021QAZ_APP_SEL_ETHERTYPE &&
app->protocol == 0;
}
static bool dcb_app_is_dscp(const struct dcb_app *app)
{
return app->selector == IEEE_8021QAZ_APP_SEL_DSCP;
}
static bool dcb_app_is_stream_port(const struct dcb_app *app)
{
return app->selector == IEEE_8021QAZ_APP_SEL_STREAM;
}
static bool dcb_app_is_dgram_port(const struct dcb_app *app)
{
return app->selector == IEEE_8021QAZ_APP_SEL_DGRAM;
}
static bool dcb_app_is_port(const struct dcb_app *app)
{
return app->selector == IEEE_8021QAZ_APP_SEL_ANY;
}
static int dcb_app_print_key_dec(__u16 protocol)
{
return print_uint(PRINT_ANY, NULL, "%d:", protocol);
}
static int dcb_app_print_key_hex(__u16 protocol)
{
return print_uint(PRINT_ANY, NULL, "%x:", protocol);
}
static int dcb_app_print_key_dscp(__u16 protocol)
{
const char *name = rtnl_dsfield_get_name(protocol << 2);
if (!is_json_context() && name != NULL)
return print_string(PRINT_FP, NULL, "%s:", name);
return print_uint(PRINT_ANY, NULL, "%d:", protocol);
}
static void dcb_app_print_filtered(const struct dcb_app_table *tab,
bool (*filter)(const struct dcb_app *),
int (*print_key)(__u16 protocol),
const char *json_name,
const char *fp_name)
{
bool first = true;
size_t i;
for (i = 0; i < tab->n_apps; i++) {
struct dcb_app *app = &tab->apps[i];
if (!filter(app))
continue;
if (first) {
open_json_array(PRINT_JSON, json_name);
print_string(PRINT_FP, NULL, "%s ", fp_name);
first = false;
}
open_json_array(PRINT_JSON, NULL);
print_key(app->protocol);
print_uint(PRINT_ANY, NULL, "%d ", app->priority);
close_json_array(PRINT_JSON, NULL);
}
if (!first) {
close_json_array(PRINT_JSON, json_name);
print_nl();
}
}
static void dcb_app_print_ethtype_prio(const struct dcb_app_table *tab)
{
dcb_app_print_filtered(tab, dcb_app_is_ethtype, dcb_app_print_key_hex,
"ethtype_prio", "ethtype-prio");
}
static void dcb_app_print_dscp_prio(const struct dcb *dcb,
const struct dcb_app_table *tab)
{
dcb_app_print_filtered(tab, dcb_app_is_dscp,
dcb->numeric ? dcb_app_print_key_dec
: dcb_app_print_key_dscp,
"dscp_prio", "dscp-prio");
}
static void dcb_app_print_stream_port_prio(const struct dcb_app_table *tab)
{
dcb_app_print_filtered(tab, dcb_app_is_stream_port, dcb_app_print_key_dec,
"stream_port_prio", "stream-port-prio");
}
static void dcb_app_print_dgram_port_prio(const struct dcb_app_table *tab)
{
dcb_app_print_filtered(tab, dcb_app_is_dgram_port, dcb_app_print_key_dec,
"dgram_port_prio", "dgram-port-prio");
}
static void dcb_app_print_port_prio(const struct dcb_app_table *tab)
{
dcb_app_print_filtered(tab, dcb_app_is_port, dcb_app_print_key_dec,
"port_prio", "port-prio");
}
static void dcb_app_print_default_prio(const struct dcb_app_table *tab)
{
bool first = true;
size_t i;
for (i = 0; i < tab->n_apps; i++) {
if (!dcb_app_is_default(&tab->apps[i]))
continue;
if (first) {
open_json_array(PRINT_JSON, "default_prio");
print_string(PRINT_FP, NULL, "default-prio ", NULL);
first = false;
}
print_uint(PRINT_ANY, NULL, "%d ", tab->apps[i].priority);
}
if (!first) {
close_json_array(PRINT_JSON, "default_prio");
print_nl();
}
}
static void dcb_app_print(const struct dcb *dcb, const struct dcb_app_table *tab)
{
dcb_app_print_ethtype_prio(tab);
dcb_app_print_default_prio(tab);
dcb_app_print_dscp_prio(dcb, tab);
dcb_app_print_stream_port_prio(tab);
dcb_app_print_dgram_port_prio(tab);
dcb_app_print_port_prio(tab);
}
static int dcb_app_get_table_attr_cb(const struct nlattr *attr, void *data)
{
struct dcb_app_table *tab = data;
struct dcb_app *app;
int ret;
if (mnl_attr_get_type(attr) != DCB_ATTR_IEEE_APP) {
fprintf(stderr, "Unknown attribute in DCB_ATTR_IEEE_APP_TABLE: %d\n",
mnl_attr_get_type(attr));
return MNL_CB_OK;
}
if (mnl_attr_get_payload_len(attr) < sizeof(struct dcb_app)) {
fprintf(stderr, "DCB_ATTR_IEEE_APP payload expected to have size %zd, not %d\n",
sizeof(struct dcb_app), mnl_attr_get_payload_len(attr));
return MNL_CB_OK;
}
app = mnl_attr_get_payload(attr);
ret = dcb_app_table_push(tab, app);
if (ret != 0)
return MNL_CB_ERROR;
return MNL_CB_OK;
}
static int dcb_app_get(struct dcb *dcb, const char *dev, struct dcb_app_table *tab)
{
uint16_t payload_len;
void *payload;
int ret;
ret = dcb_get_attribute_va(dcb, dev, DCB_ATTR_IEEE_APP_TABLE, &payload, &payload_len);
if (ret != 0)
return ret;
ret = mnl_attr_parse_payload(payload, payload_len, dcb_app_get_table_attr_cb, tab);
if (ret != MNL_CB_OK)
return -EINVAL;
return 0;
}
struct dcb_app_add_del {
const struct dcb_app_table *tab;
bool (*filter)(const struct dcb_app *app);
};
static int dcb_app_add_del_cb(struct dcb *dcb, struct nlmsghdr *nlh, void *data)
{
struct dcb_app_add_del *add_del = data;
struct nlattr *nest;
size_t i;
nest = mnl_attr_nest_start(nlh, DCB_ATTR_IEEE_APP_TABLE);
for (i = 0; i < add_del->tab->n_apps; i++) {
const struct dcb_app *app = &add_del->tab->apps[i];
if (add_del->filter == NULL || add_del->filter(app))
mnl_attr_put(nlh, DCB_ATTR_IEEE_APP, sizeof(*app), app);
}
mnl_attr_nest_end(nlh, nest);
return 0;
}
static int dcb_app_add_del(struct dcb *dcb, const char *dev, int command,
const struct dcb_app_table *tab,
bool (*filter)(const struct dcb_app *))
{
struct dcb_app_add_del add_del = {
.tab = tab,
.filter = filter,
};
if (tab->n_apps == 0)
return 0;
return dcb_set_attribute_va(dcb, command, dev, dcb_app_add_del_cb, &add_del);
}
static int dcb_cmd_app_parse_add_del(struct dcb *dcb, const char *dev,
int argc, char **argv, struct dcb_app_table *tab)
{
struct dcb_app_parse_mapping pm = {
.tab = tab,
};
int ret;
if (!argc) {
dcb_app_help_add();
return 0;
}
do {
if (matches(*argv, "help") == 0) {
dcb_app_help_add();
return 0;
} else if (matches(*argv, "ethtype-prio") == 0) {
NEXT_ARG();
pm.selector = IEEE_8021QAZ_APP_SEL_ETHERTYPE;
ret = parse_mapping(&argc, &argv, false,
&dcb_app_parse_mapping_ethtype_prio,
&pm);
} else if (matches(*argv, "default-prio") == 0) {
NEXT_ARG();
ret = dcb_app_parse_default_prio(&argc, &argv, pm.tab);
if (ret != 0) {
fprintf(stderr, "Invalid default priority %s\n", *argv);
return ret;
}
} else if (matches(*argv, "dscp-prio") == 0) {
NEXT_ARG();
pm.selector = IEEE_8021QAZ_APP_SEL_DSCP;
ret = parse_mapping_gen(&argc, &argv,
&dcb_app_parse_dscp,
&dcb_app_parse_mapping_dscp_prio,
&pm);
} else if (matches(*argv, "stream-port-prio") == 0) {
NEXT_ARG();
pm.selector = IEEE_8021QAZ_APP_SEL_STREAM;
ret = parse_mapping(&argc, &argv, false,
&dcb_app_parse_mapping_port_prio,
&pm);
} else if (matches(*argv, "dgram-port-prio") == 0) {
NEXT_ARG();
pm.selector = IEEE_8021QAZ_APP_SEL_DGRAM;
ret = parse_mapping(&argc, &argv, false,
&dcb_app_parse_mapping_port_prio,
&pm);
} else if (matches(*argv, "port-prio") == 0) {
NEXT_ARG();
pm.selector = IEEE_8021QAZ_APP_SEL_ANY;
ret = parse_mapping(&argc, &argv, false,
&dcb_app_parse_mapping_port_prio,
&pm);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_app_help_add();
return -EINVAL;
}
if (ret != 0) {
fprintf(stderr, "Invalid mapping %s\n", *argv);
return ret;
}
if (pm.err)
return pm.err;
} while (argc > 0);
return 0;
}
static int dcb_cmd_app_add(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct dcb_app_table tab = {};
int ret;
ret = dcb_cmd_app_parse_add_del(dcb, dev, argc, argv, &tab);
if (ret != 0)
return ret;
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_SET, &tab, NULL);
dcb_app_table_fini(&tab);
return ret;
}
static int dcb_cmd_app_del(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct dcb_app_table tab = {};
int ret;
ret = dcb_cmd_app_parse_add_del(dcb, dev, argc, argv, &tab);
if (ret != 0)
return ret;
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_DEL, &tab, NULL);
dcb_app_table_fini(&tab);
return ret;
}
static int dcb_cmd_app_show(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct dcb_app_table tab = {};
int ret;
ret = dcb_app_get(dcb, dev, &tab);
if (ret != 0)
return ret;
dcb_app_table_sort(&tab);
open_json_object(NULL);
if (!argc) {
dcb_app_print(dcb, &tab);
goto out;
}
do {
if (matches(*argv, "help") == 0) {
dcb_app_help_show_flush();
goto out;
} else if (matches(*argv, "ethtype-prio") == 0) {
dcb_app_print_ethtype_prio(&tab);
} else if (matches(*argv, "dscp-prio") == 0) {
dcb_app_print_dscp_prio(dcb, &tab);
} else if (matches(*argv, "stream-port-prio") == 0) {
dcb_app_print_stream_port_prio(&tab);
} else if (matches(*argv, "dgram-port-prio") == 0) {
dcb_app_print_dgram_port_prio(&tab);
} else if (matches(*argv, "port-prio") == 0) {
dcb_app_print_port_prio(&tab);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_app_help_show_flush();
ret = -EINVAL;
goto out;
}
NEXT_ARG_FWD();
} while (argc > 0);
out:
close_json_object();
dcb_app_table_fini(&tab);
return ret;
}
static int dcb_cmd_app_flush(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct dcb_app_table tab = {};
int ret;
ret = dcb_app_get(dcb, dev, &tab);
if (ret != 0)
return ret;
if (!argc) {
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_DEL, &tab, NULL);
goto out;
}
do {
if (matches(*argv, "help") == 0) {
dcb_app_help_show_flush();
goto out;
} else if (matches(*argv, "ethtype-prio") == 0) {
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_DEL, &tab,
&dcb_app_is_ethtype);
if (ret != 0)
goto out;
} else if (matches(*argv, "default-prio") == 0) {
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_DEL, &tab,
&dcb_app_is_default);
if (ret != 0)
goto out;
} else if (matches(*argv, "dscp-prio") == 0) {
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_DEL, &tab,
&dcb_app_is_dscp);
if (ret != 0)
goto out;
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_app_help_show_flush();
ret = -EINVAL;
goto out;
}
NEXT_ARG_FWD();
} while (argc > 0);
out:
dcb_app_table_fini(&tab);
return ret;
}
static int dcb_cmd_app_replace(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct dcb_app_table orig = {};
struct dcb_app_table tab = {};
struct dcb_app_table new = {};
int ret;
ret = dcb_app_get(dcb, dev, &orig);
if (ret != 0)
return ret;
ret = dcb_cmd_app_parse_add_del(dcb, dev, argc, argv, &tab);
if (ret != 0)
goto out;
/* Attempts to add an existing entry would be rejected, so drop
* these entries from tab.
*/
ret = dcb_app_table_copy(&new, &tab);
if (ret != 0)
goto out;
dcb_app_table_remove_existing(&new, &orig);
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_SET, &new, NULL);
if (ret != 0) {
fprintf(stderr, "Could not add new APP entries\n");
goto out;
}
/* Remove the obsolete entries. */
dcb_app_table_remove_replaced(&orig, &tab);
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_DEL, &orig, NULL);
if (ret != 0) {
fprintf(stderr, "Could not remove replaced APP entries\n");
goto out;
}
out:
dcb_app_table_fini(&new);
dcb_app_table_fini(&tab);
dcb_app_table_fini(&orig);
return 0;
}
int dcb_cmd_app(struct dcb *dcb, int argc, char **argv)
{
if (!argc || matches(*argv, "help") == 0) {
dcb_app_help();
return 0;
} else if (matches(*argv, "show") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_app_show, dcb_app_help_show_flush);
} else if (matches(*argv, "flush") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_app_flush, dcb_app_help_show_flush);
} else if (matches(*argv, "add") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_app_add, dcb_app_help_add);
} else if (matches(*argv, "del") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_app_del, dcb_app_help_add);
} else if (matches(*argv, "replace") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_app_replace, dcb_app_help_add);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_app_help();
return -EINVAL;
}
}

235
dcb/dcb_buffer.c Normal file
View File

@ -0,0 +1,235 @@
// SPDX-License-Identifier: GPL-2.0+
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <linux/dcbnl.h>
#include "dcb.h"
#include "utils.h"
static void dcb_buffer_help_set(void)
{
fprintf(stderr,
"Usage: dcb buffer set dev STRING\n"
" [ prio-buffer PRIO-MAP ]\n"
" [ buffer-size SIZE-MAP ]\n"
"\n"
" where PRIO-MAP := [ PRIO-MAP ] PRIO-MAPPING\n"
" PRIO-MAPPING := { all | PRIO }:BUFFER\n"
" SIZE-MAP := [ SIZE-MAP ] SIZE-MAPPING\n"
" SIZE-MAPPING := { all | BUFFER }:INTEGER\n"
" PRIO := { 0 .. 7 }\n"
" BUFFER := { 0 .. 7 }\n"
"\n"
);
}
static void dcb_buffer_help_show(void)
{
fprintf(stderr,
"Usage: dcb buffer show dev STRING\n"
" [ prio-buffer ] [ buffer-size ] [ total-size ]\n"
"\n"
);
}
static void dcb_buffer_help(void)
{
fprintf(stderr,
"Usage: dcb buffer help\n"
"\n"
);
dcb_buffer_help_show();
dcb_buffer_help_set();
}
static int dcb_buffer_parse_mapping_prio_buffer(__u32 key, char *value, void *data)
{
struct dcbnl_buffer *buffer = data;
__u8 buf;
if (get_u8(&buf, value, 0))
return -EINVAL;
return dcb_parse_mapping("PRIO", key, IEEE_8021Q_MAX_PRIORITIES - 1,
"BUFFER", buf, DCBX_MAX_BUFFERS - 1,
dcb_set_u8, buffer->prio2buffer);
}
static int dcb_buffer_parse_mapping_buffer_size(__u32 key, char *value, void *data)
{
struct dcbnl_buffer *buffer = data;
unsigned int size;
if (get_size(&size, value)) {
fprintf(stderr, "%d:%s: Illegal value for buffer size\n", key, value);
return -EINVAL;
}
return dcb_parse_mapping("BUFFER", key, DCBX_MAX_BUFFERS - 1,
"INTEGER", size, -1,
dcb_set_u32, buffer->buffer_size);
}
static void dcb_buffer_print_total_size(const struct dcbnl_buffer *buffer)
{
print_size(PRINT_ANY, "total_size", "total-size %s ", buffer->total_size);
}
static void dcb_buffer_print_prio_buffer(const struct dcbnl_buffer *buffer)
{
dcb_print_named_array("prio_buffer", "prio-buffer",
buffer->prio2buffer, ARRAY_SIZE(buffer->prio2buffer),
dcb_print_array_u8);
}
static void dcb_buffer_print_buffer_size(const struct dcbnl_buffer *buffer)
{
size_t size = ARRAY_SIZE(buffer->buffer_size);
SPRINT_BUF(b);
size_t i;
open_json_array(PRINT_JSON, "buffer_size");
print_string(PRINT_FP, NULL, "buffer-size ", NULL);
for (i = 0; i < size; i++) {
snprintf(b, sizeof(b), "%zd:%%s ", i);
print_size(PRINT_ANY, NULL, b, buffer->buffer_size[i]);
}
close_json_array(PRINT_JSON, "buffer_size");
}
static void dcb_buffer_print(const struct dcbnl_buffer *buffer)
{
dcb_buffer_print_prio_buffer(buffer);
print_nl();
dcb_buffer_print_buffer_size(buffer);
print_nl();
dcb_buffer_print_total_size(buffer);
print_nl();
}
static int dcb_buffer_get(struct dcb *dcb, const char *dev, struct dcbnl_buffer *buffer)
{
return dcb_get_attribute(dcb, dev, DCB_ATTR_DCB_BUFFER, buffer, sizeof(*buffer));
}
static int dcb_buffer_set(struct dcb *dcb, const char *dev, const struct dcbnl_buffer *buffer)
{
return dcb_set_attribute(dcb, dev, DCB_ATTR_DCB_BUFFER, buffer, sizeof(*buffer));
}
static int dcb_cmd_buffer_set(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct dcbnl_buffer buffer;
int ret;
if (!argc) {
dcb_buffer_help_set();
return 0;
}
ret = dcb_buffer_get(dcb, dev, &buffer);
if (ret)
return ret;
do {
if (matches(*argv, "help") == 0) {
dcb_buffer_help_set();
return 0;
} else if (matches(*argv, "prio-buffer") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true,
&dcb_buffer_parse_mapping_prio_buffer, &buffer);
if (ret) {
fprintf(stderr, "Invalid priority mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "buffer-size") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true,
&dcb_buffer_parse_mapping_buffer_size, &buffer);
if (ret) {
fprintf(stderr, "Invalid buffer size mapping %s\n", *argv);
return ret;
}
continue;
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_buffer_help_set();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
return dcb_buffer_set(dcb, dev, &buffer);
}
static int dcb_cmd_buffer_show(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct dcbnl_buffer buffer;
int ret;
ret = dcb_buffer_get(dcb, dev, &buffer);
if (ret)
return ret;
open_json_object(NULL);
if (!argc) {
dcb_buffer_print(&buffer);
goto out;
}
do {
if (matches(*argv, "help") == 0) {
dcb_buffer_help_show();
return 0;
} else if (matches(*argv, "prio-buffer") == 0) {
dcb_buffer_print_prio_buffer(&buffer);
print_nl();
} else if (matches(*argv, "buffer-size") == 0) {
dcb_buffer_print_buffer_size(&buffer);
print_nl();
} else if (matches(*argv, "total-size") == 0) {
dcb_buffer_print_total_size(&buffer);
print_nl();
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_buffer_help_show();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
out:
close_json_object();
return 0;
}
int dcb_cmd_buffer(struct dcb *dcb, int argc, char **argv)
{
if (!argc || matches(*argv, "help") == 0) {
dcb_buffer_help();
return 0;
} else if (matches(*argv, "show") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_buffer_show, dcb_buffer_help_show);
} else if (matches(*argv, "set") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_buffer_set, dcb_buffer_help_set);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_buffer_help();
return -EINVAL;
}
}

192
dcb/dcb_dcbx.c Normal file
View File

@ -0,0 +1,192 @@
// SPDX-License-Identifier: GPL-2.0+
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <linux/dcbnl.h>
#include "dcb.h"
#include "utils.h"
static void dcb_dcbx_help_set(void)
{
fprintf(stderr,
"Usage: dcb dcbx set dev STRING\n"
" [ host | lld-managed ]\n"
" [ cee | ieee ] [ static ]\n"
"\n"
);
}
static void dcb_dcbx_help_show(void)
{
fprintf(stderr,
"Usage: dcb dcbx show dev STRING\n"
"\n"
);
}
static void dcb_dcbx_help(void)
{
fprintf(stderr,
"Usage: dcb dcbx help\n"
"\n"
);
dcb_dcbx_help_show();
dcb_dcbx_help_set();
}
struct dcb_dcbx_flag {
__u8 value;
const char *key_fp;
const char *key_json;
};
static struct dcb_dcbx_flag dcb_dcbx_flags[] = {
{DCB_CAP_DCBX_HOST, "host"},
{DCB_CAP_DCBX_LLD_MANAGED, "lld-managed", "lld_managed"},
{DCB_CAP_DCBX_VER_CEE, "cee"},
{DCB_CAP_DCBX_VER_IEEE, "ieee"},
{DCB_CAP_DCBX_STATIC, "static"},
};
static void dcb_dcbx_print(__u8 dcbx)
{
int bit;
int i;
while ((bit = ffs(dcbx))) {
bool found = false;
bit--;
for (i = 0; i < ARRAY_SIZE(dcb_dcbx_flags); i++) {
struct dcb_dcbx_flag *flag = &dcb_dcbx_flags[i];
if (flag->value == 1 << bit) {
print_bool(PRINT_JSON, flag->key_json ?: flag->key_fp,
NULL, true);
print_string(PRINT_FP, NULL, "%s ", flag->key_fp);
found = true;
break;
}
}
if (!found)
fprintf(stderr, "Unknown DCBX bit %#x.\n", 1 << bit);
dcbx &= ~(1 << bit);
}
print_nl();
}
static int dcb_dcbx_get(struct dcb *dcb, const char *dev, __u8 *dcbx)
{
__u16 payload_len;
void *payload;
int err;
err = dcb_get_attribute_bare(dcb, DCB_CMD_IEEE_GET, dev, DCB_ATTR_DCBX,
&payload, &payload_len);
if (err != 0)
return err;
if (payload_len != 1) {
fprintf(stderr, "DCB_ATTR_DCBX payload has size %d, expected 1.\n",
payload_len);
return -EINVAL;
}
*dcbx = *(__u8 *) payload;
return 0;
}
static int dcb_dcbx_set(struct dcb *dcb, const char *dev, __u8 dcbx)
{
return dcb_set_attribute_bare(dcb, DCB_CMD_SDCBX, dev, DCB_ATTR_DCBX,
&dcbx, 1, DCB_ATTR_DCBX);
}
static int dcb_cmd_dcbx_set(struct dcb *dcb, const char *dev, int argc, char **argv)
{
__u8 dcbx = 0;
__u8 i;
if (!argc) {
dcb_dcbx_help_set();
return 0;
}
do {
if (matches(*argv, "help") == 0) {
dcb_dcbx_help_set();
return 0;
}
for (i = 0; i < ARRAY_SIZE(dcb_dcbx_flags); i++) {
struct dcb_dcbx_flag *flag = &dcb_dcbx_flags[i];
if (matches(*argv, flag->key_fp) == 0) {
dcbx |= flag->value;
NEXT_ARG_FWD();
goto next;
}
}
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_dcbx_help_set();
return -EINVAL;
next:
;
} while (argc > 0);
return dcb_dcbx_set(dcb, dev, dcbx);
}
static int dcb_cmd_dcbx_show(struct dcb *dcb, const char *dev, int argc, char **argv)
{
__u8 dcbx;
int ret;
ret = dcb_dcbx_get(dcb, dev, &dcbx);
if (ret != 0)
return ret;
while (argc > 0) {
if (matches(*argv, "help") == 0) {
dcb_dcbx_help_show();
return 0;
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_dcbx_help_show();
return -EINVAL;
}
NEXT_ARG_FWD();
}
open_json_object(NULL);
dcb_dcbx_print(dcbx);
close_json_object();
return 0;
}
int dcb_cmd_dcbx(struct dcb *dcb, int argc, char **argv)
{
if (!argc || matches(*argv, "help") == 0) {
dcb_dcbx_help();
return 0;
} else if (matches(*argv, "show") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_dcbx_show, dcb_dcbx_help_show);
} else if (matches(*argv, "set") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_dcbx_set, dcb_dcbx_help_set);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_dcbx_help();
return -EINVAL;
}
}

435
dcb/dcb_ets.c Normal file
View File

@ -0,0 +1,435 @@
// SPDX-License-Identifier: GPL-2.0+
#include <errno.h>
#include <stdio.h>
#include <linux/dcbnl.h>
#include "dcb.h"
#include "utils.h"
static void dcb_ets_help_set(void)
{
fprintf(stderr,
"Usage: dcb ets set dev STRING\n"
" [ willing { on | off } ]\n"
" [ { tc-tsa | reco-tc-tsa } TSA-MAP ]\n"
" [ { pg-bw | tc-bw | reco-tc-bw } BW-MAP ]\n"
" [ { prio-tc | reco-prio-tc } PRIO-MAP ]\n"
"\n"
" where TSA-MAP := [ TSA-MAP ] TSA-MAPPING\n"
" TSA-MAPPING := { all | TC }:{ strict | cbs | ets | vendor }\n"
" BW-MAP := [ BW-MAP ] BW-MAPPING\n"
" BW-MAPPING := { all | TC }:INTEGER\n"
" PRIO-MAP := [ PRIO-MAP ] PRIO-MAPPING\n"
" PRIO-MAPPING := { all | PRIO }:TC\n"
" TC := { 0 .. 7 }\n"
" PRIO := { 0 .. 7 }\n"
"\n"
);
}
static void dcb_ets_help_show(void)
{
fprintf(stderr,
"Usage: dcb ets show dev STRING\n"
" [ willing ] [ ets-cap ] [ cbs ] [ tc-tsa ]\n"
" [ reco-tc-tsa ] [ pg-bw ] [ tc-bw ] [ reco-tc-bw ]\n"
" [ prio-tc ] [ reco-prio-tc ]\n"
"\n"
);
}
static void dcb_ets_help(void)
{
fprintf(stderr,
"Usage: dcb ets help\n"
"\n"
);
dcb_ets_help_show();
dcb_ets_help_set();
}
static const char *const tsa_names[] = {
[IEEE_8021QAZ_TSA_STRICT] = "strict",
[IEEE_8021QAZ_TSA_CB_SHAPER] = "cbs",
[IEEE_8021QAZ_TSA_ETS] = "ets",
[IEEE_8021QAZ_TSA_VENDOR] = "vendor",
};
static int dcb_ets_parse_mapping_tc_tsa(__u32 key, char *value, void *data)
{
__u8 tsa;
int ret;
tsa = parse_one_of("TSA", value, tsa_names, ARRAY_SIZE(tsa_names), &ret);
if (ret)
return ret;
return dcb_parse_mapping("TC", key, IEEE_8021QAZ_MAX_TCS - 1,
"TSA", tsa, -1U,
dcb_set_u8, data);
}
static int dcb_ets_parse_mapping_tc_bw(__u32 key, char *value, void *data)
{
__u8 bw;
if (get_u8(&bw, value, 0))
return -EINVAL;
return dcb_parse_mapping("TC", key, IEEE_8021QAZ_MAX_TCS - 1,
"BW", bw, 100,
dcb_set_u8, data);
}
static int dcb_ets_parse_mapping_prio_tc(unsigned int key, char *value, void *data)
{
__u8 tc;
if (get_u8(&tc, value, 0))
return -EINVAL;
return dcb_parse_mapping("PRIO", key, IEEE_8021QAZ_MAX_TCS - 1,
"TC", tc, IEEE_8021QAZ_MAX_TCS - 1,
dcb_set_u8, data);
}
static void dcb_print_array_tsa(const __u8 *array, size_t size)
{
dcb_print_array_kw(array, size, tsa_names, ARRAY_SIZE(tsa_names));
}
static void dcb_ets_print_willing(const struct ieee_ets *ets)
{
print_on_off(PRINT_ANY, "willing", "willing %s ", ets->willing);
}
static void dcb_ets_print_ets_cap(const struct ieee_ets *ets)
{
print_uint(PRINT_ANY, "ets_cap", "ets-cap %d ", ets->ets_cap);
}
static void dcb_ets_print_cbs(const struct ieee_ets *ets)
{
print_on_off(PRINT_ANY, "cbs", "cbs %s ", ets->cbs);
}
static void dcb_ets_print_tc_bw(const struct ieee_ets *ets)
{
dcb_print_named_array("tc_bw", "tc-bw",
ets->tc_tx_bw, ARRAY_SIZE(ets->tc_tx_bw),
dcb_print_array_u8);
}
static void dcb_ets_print_pg_bw(const struct ieee_ets *ets)
{
dcb_print_named_array("pg_bw", "pg-bw",
ets->tc_rx_bw, ARRAY_SIZE(ets->tc_rx_bw),
dcb_print_array_u8);
}
static void dcb_ets_print_tc_tsa(const struct ieee_ets *ets)
{
dcb_print_named_array("tc_tsa", "tc-tsa",
ets->tc_tsa, ARRAY_SIZE(ets->tc_tsa),
dcb_print_array_tsa);
}
static void dcb_ets_print_prio_tc(const struct ieee_ets *ets)
{
dcb_print_named_array("prio_tc", "prio-tc",
ets->prio_tc, ARRAY_SIZE(ets->prio_tc),
dcb_print_array_u8);
}
static void dcb_ets_print_reco_tc_bw(const struct ieee_ets *ets)
{
dcb_print_named_array("reco_tc_bw", "reco-tc-bw",
ets->tc_reco_bw, ARRAY_SIZE(ets->tc_reco_bw),
dcb_print_array_u8);
}
static void dcb_ets_print_reco_tc_tsa(const struct ieee_ets *ets)
{
dcb_print_named_array("reco_tc_tsa", "reco-tc-tsa",
ets->tc_reco_tsa, ARRAY_SIZE(ets->tc_reco_tsa),
dcb_print_array_tsa);
}
static void dcb_ets_print_reco_prio_tc(const struct ieee_ets *ets)
{
dcb_print_named_array("reco_prio_tc", "reco-prio-tc",
ets->reco_prio_tc, ARRAY_SIZE(ets->reco_prio_tc),
dcb_print_array_u8);
}
static void dcb_ets_print(const struct ieee_ets *ets)
{
dcb_ets_print_willing(ets);
dcb_ets_print_ets_cap(ets);
dcb_ets_print_cbs(ets);
print_nl();
dcb_ets_print_tc_bw(ets);
print_nl();
dcb_ets_print_pg_bw(ets);
print_nl();
dcb_ets_print_tc_tsa(ets);
print_nl();
dcb_ets_print_prio_tc(ets);
print_nl();
dcb_ets_print_reco_tc_bw(ets);
print_nl();
dcb_ets_print_reco_tc_tsa(ets);
print_nl();
dcb_ets_print_reco_prio_tc(ets);
print_nl();
}
static int dcb_ets_get(struct dcb *dcb, const char *dev, struct ieee_ets *ets)
{
return dcb_get_attribute(dcb, dev, DCB_ATTR_IEEE_ETS, ets, sizeof(*ets));
}
static int dcb_ets_validate_bw(const __u8 bw[], const __u8 tsa[], const char *what)
{
bool has_ets = false;
unsigned int total = 0;
unsigned int tc;
for (tc = 0; tc < IEEE_8021QAZ_MAX_TCS; tc++) {
if (tsa[tc] == IEEE_8021QAZ_TSA_ETS) {
has_ets = true;
break;
}
}
/* TC bandwidth is only intended for ETS, but 802.1Q-2018 only requires
* that the sum be 100, and individual entries 0..100. It explicitly
* notes that non-ETS TCs can have non-0 TC bandwidth during
* reconfiguration.
*/
for (tc = 0; tc < IEEE_8021QAZ_MAX_TCS; tc++) {
if (bw[tc] > 100) {
fprintf(stderr, "%d%% for TC %d of %s is not a valid bandwidth percentage, expected 0..100%%\n",
bw[tc], tc, what);
return -EINVAL;
}
total += bw[tc];
}
/* This is what 802.1Q-2018 requires. */
if (total == 100)
return 0;
/* But this requirement does not make sense for all-strict
* configurations. Anything else than 0 does not make sense: either BW
* has not been reconfigured for the all-strict allocation yet, at which
* point we expect sum of 100. Or it has already been reconfigured, at
* which point accept 0.
*/
if (!has_ets && total == 0)
return 0;
fprintf(stderr, "Bandwidth percentages in %s sum to %d%%, expected %d%%\n",
what, total, has_ets ? 100 : 0);
return -EINVAL;
}
static int dcb_ets_set(struct dcb *dcb, const char *dev, const struct ieee_ets *ets)
{
/* Do not validate pg-bw, which is not standard and has unclear
* meaning.
*/
if (dcb_ets_validate_bw(ets->tc_tx_bw, ets->tc_tsa, "tc-bw") ||
dcb_ets_validate_bw(ets->tc_reco_bw, ets->tc_reco_tsa, "reco-tc-bw"))
return -EINVAL;
return dcb_set_attribute(dcb, dev, DCB_ATTR_IEEE_ETS, ets, sizeof(*ets));
}
static int dcb_cmd_ets_set(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct ieee_ets ets;
int ret;
if (!argc) {
dcb_ets_help_set();
return 1;
}
ret = dcb_ets_get(dcb, dev, &ets);
if (ret)
return ret;
do {
if (matches(*argv, "help") == 0) {
dcb_ets_help_set();
return 0;
} else if (matches(*argv, "willing") == 0) {
NEXT_ARG();
ets.willing = parse_on_off("willing", *argv, &ret);
if (ret)
return ret;
} else if (matches(*argv, "tc-tsa") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true, &dcb_ets_parse_mapping_tc_tsa,
ets.tc_tsa);
if (ret) {
fprintf(stderr, "Invalid tc-tsa mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "reco-tc-tsa") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true, &dcb_ets_parse_mapping_tc_tsa,
ets.tc_reco_tsa);
if (ret) {
fprintf(stderr, "Invalid reco-tc-tsa mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "tc-bw") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true, &dcb_ets_parse_mapping_tc_bw,
ets.tc_tx_bw);
if (ret) {
fprintf(stderr, "Invalid tc-bw mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "pg-bw") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true, &dcb_ets_parse_mapping_tc_bw,
ets.tc_rx_bw);
if (ret) {
fprintf(stderr, "Invalid pg-bw mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "reco-tc-bw") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true, &dcb_ets_parse_mapping_tc_bw,
ets.tc_reco_bw);
if (ret) {
fprintf(stderr, "Invalid reco-tc-bw mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "prio-tc") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true, &dcb_ets_parse_mapping_prio_tc,
ets.prio_tc);
if (ret) {
fprintf(stderr, "Invalid prio-tc mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "reco-prio-tc") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true, &dcb_ets_parse_mapping_prio_tc,
ets.reco_prio_tc);
if (ret) {
fprintf(stderr, "Invalid reco-prio-tc mapping %s\n", *argv);
return ret;
}
continue;
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_ets_help_set();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
return dcb_ets_set(dcb, dev, &ets);
}
static int dcb_cmd_ets_show(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct ieee_ets ets;
int ret;
ret = dcb_ets_get(dcb, dev, &ets);
if (ret)
return ret;
open_json_object(NULL);
if (!argc) {
dcb_ets_print(&ets);
goto out;
}
do {
if (matches(*argv, "help") == 0) {
dcb_ets_help_show();
return 0;
} else if (matches(*argv, "willing") == 0) {
dcb_ets_print_willing(&ets);
print_nl();
} else if (matches(*argv, "ets-cap") == 0) {
dcb_ets_print_ets_cap(&ets);
print_nl();
} else if (matches(*argv, "cbs") == 0) {
dcb_ets_print_cbs(&ets);
print_nl();
} else if (matches(*argv, "tc-tsa") == 0) {
dcb_ets_print_tc_tsa(&ets);
print_nl();
} else if (matches(*argv, "reco-tc-tsa") == 0) {
dcb_ets_print_reco_tc_tsa(&ets);
print_nl();
} else if (matches(*argv, "tc-bw") == 0) {
dcb_ets_print_tc_bw(&ets);
print_nl();
} else if (matches(*argv, "pg-bw") == 0) {
dcb_ets_print_pg_bw(&ets);
print_nl();
} else if (matches(*argv, "reco-tc-bw") == 0) {
dcb_ets_print_reco_tc_bw(&ets);
print_nl();
} else if (matches(*argv, "prio-tc") == 0) {
dcb_ets_print_prio_tc(&ets);
print_nl();
} else if (matches(*argv, "reco-prio-tc") == 0) {
dcb_ets_print_reco_prio_tc(&ets);
print_nl();
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_ets_help_show();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
out:
close_json_object();
return 0;
}
int dcb_cmd_ets(struct dcb *dcb, int argc, char **argv)
{
if (!argc || matches(*argv, "help") == 0) {
dcb_ets_help();
return 0;
} else if (matches(*argv, "show") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv, dcb_cmd_ets_show, dcb_ets_help_show);
} else if (matches(*argv, "set") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv, dcb_cmd_ets_set, dcb_ets_help_set);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_ets_help();
return -EINVAL;
}
}

182
dcb/dcb_maxrate.c Normal file
View File

@ -0,0 +1,182 @@
// SPDX-License-Identifier: GPL-2.0+
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <linux/dcbnl.h>
#include "dcb.h"
#include "utils.h"
static void dcb_maxrate_help_set(void)
{
fprintf(stderr,
"Usage: dcb maxrate set dev STRING\n"
" [ tc-maxrate RATE-MAP ]\n"
"\n"
" where RATE-MAP := [ RATE-MAP ] RATE-MAPPING\n"
" RATE-MAPPING := { all | TC }:RATE\n"
" TC := { 0 .. 7 }\n"
"\n"
);
}
static void dcb_maxrate_help_show(void)
{
fprintf(stderr,
"Usage: dcb [ -i ] maxrate show dev STRING\n"
" [ tc-maxrate ]\n"
"\n"
);
}
static void dcb_maxrate_help(void)
{
fprintf(stderr,
"Usage: dcb maxrate help\n"
"\n"
);
dcb_maxrate_help_show();
dcb_maxrate_help_set();
}
static int dcb_maxrate_parse_mapping_tc_maxrate(__u32 key, char *value, void *data)
{
__u64 rate;
if (get_rate64(&rate, value))
return -EINVAL;
return dcb_parse_mapping("TC", key, IEEE_8021QAZ_MAX_TCS - 1,
"RATE", rate, -1,
dcb_set_u64, data);
}
static void dcb_maxrate_print_tc_maxrate(struct dcb *dcb, const struct ieee_maxrate *maxrate)
{
size_t size = ARRAY_SIZE(maxrate->tc_maxrate);
SPRINT_BUF(b);
size_t i;
open_json_array(PRINT_JSON, "tc_maxrate");
print_string(PRINT_FP, NULL, "tc-maxrate ", NULL);
for (i = 0; i < size; i++) {
snprintf(b, sizeof(b), "%zd:%%s ", i);
print_rate(dcb->use_iec, PRINT_ANY, NULL, b, maxrate->tc_maxrate[i]);
}
close_json_array(PRINT_JSON, "tc_maxrate");
}
static void dcb_maxrate_print(struct dcb *dcb, const struct ieee_maxrate *maxrate)
{
dcb_maxrate_print_tc_maxrate(dcb, maxrate);
print_nl();
}
static int dcb_maxrate_get(struct dcb *dcb, const char *dev, struct ieee_maxrate *maxrate)
{
return dcb_get_attribute(dcb, dev, DCB_ATTR_IEEE_MAXRATE, maxrate, sizeof(*maxrate));
}
static int dcb_maxrate_set(struct dcb *dcb, const char *dev, const struct ieee_maxrate *maxrate)
{
return dcb_set_attribute(dcb, dev, DCB_ATTR_IEEE_MAXRATE, maxrate, sizeof(*maxrate));
}
static int dcb_cmd_maxrate_set(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct ieee_maxrate maxrate;
int ret;
if (!argc) {
dcb_maxrate_help_set();
return 0;
}
ret = dcb_maxrate_get(dcb, dev, &maxrate);
if (ret)
return ret;
do {
if (matches(*argv, "help") == 0) {
dcb_maxrate_help_set();
return 0;
} else if (matches(*argv, "tc-maxrate") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true,
&dcb_maxrate_parse_mapping_tc_maxrate, &maxrate);
if (ret) {
fprintf(stderr, "Invalid mapping %s\n", *argv);
return ret;
}
continue;
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_maxrate_help_set();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
return dcb_maxrate_set(dcb, dev, &maxrate);
}
static int dcb_cmd_maxrate_show(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct ieee_maxrate maxrate;
int ret;
ret = dcb_maxrate_get(dcb, dev, &maxrate);
if (ret)
return ret;
open_json_object(NULL);
if (!argc) {
dcb_maxrate_print(dcb, &maxrate);
goto out;
}
do {
if (matches(*argv, "help") == 0) {
dcb_maxrate_help_show();
return 0;
} else if (matches(*argv, "tc-maxrate") == 0) {
dcb_maxrate_print_tc_maxrate(dcb, &maxrate);
print_nl();
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_maxrate_help_show();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
out:
close_json_object();
return 0;
}
int dcb_cmd_maxrate(struct dcb *dcb, int argc, char **argv)
{
if (!argc || matches(*argv, "help") == 0) {
dcb_maxrate_help();
return 0;
} else if (matches(*argv, "show") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_maxrate_show, dcb_maxrate_help_show);
} else if (matches(*argv, "set") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_maxrate_set, dcb_maxrate_help_set);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_maxrate_help();
return -EINVAL;
}
}

286
dcb/dcb_pfc.c Normal file
View File

@ -0,0 +1,286 @@
// SPDX-License-Identifier: GPL-2.0+
#include <errno.h>
#include <stdio.h>
#include <linux/dcbnl.h>
#include "dcb.h"
#include "utils.h"
static void dcb_pfc_help_set(void)
{
fprintf(stderr,
"Usage: dcb pfc set dev STRING\n"
" [ prio-pfc PFC-MAP ]\n"
" [ macsec-bypass { on | off } ]\n"
" [ delay INTEGER ]\n"
"\n"
" where PFC-MAP := [ PFC-MAP ] PFC-MAPPING\n"
" PFC-MAPPING := { all | TC }:PFC\n"
" TC := { 0 .. 7 }\n"
" PFC := { on | off }\n"
"\n"
);
}
static void dcb_pfc_help_show(void)
{
fprintf(stderr,
"Usage: dcb [ -s ] pfc show dev STRING\n"
" [ pfc-cap ] [ prio-pfc ] [ macsec-bypass ]\n"
" [ delay ] [ requests ] [ indications ]\n"
"\n"
);
}
static void dcb_pfc_help(void)
{
fprintf(stderr,
"Usage: dcb pfc help\n"
"\n"
);
dcb_pfc_help_show();
dcb_pfc_help_set();
}
static void dcb_pfc_to_array(__u8 array[IEEE_8021QAZ_MAX_TCS], __u8 pfc_en)
{
int i;
for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++)
array[i] = !!(pfc_en & (1 << i));
}
static void dcb_pfc_from_array(__u8 array[IEEE_8021QAZ_MAX_TCS], __u8 *pfc_en_p)
{
__u8 pfc_en = 0;
int i;
for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
if (array[i])
pfc_en |= 1 << i;
}
*pfc_en_p = pfc_en;
}
static int dcb_pfc_parse_mapping_prio_pfc(__u32 key, char *value, void *data)
{
struct ieee_pfc *pfc = data;
__u8 pfc_en[IEEE_8021QAZ_MAX_TCS];
bool enabled;
int ret;
dcb_pfc_to_array(pfc_en, pfc->pfc_en);
enabled = parse_on_off("PFC", value, &ret);
if (ret)
return ret;
ret = dcb_parse_mapping("PRIO", key, IEEE_8021QAZ_MAX_TCS - 1,
"PFC", enabled, -1,
dcb_set_u8, pfc_en);
if (ret)
return ret;
dcb_pfc_from_array(pfc_en, &pfc->pfc_en);
return 0;
}
static void dcb_pfc_print_pfc_cap(const struct ieee_pfc *pfc)
{
print_uint(PRINT_ANY, "pfc_cap", "pfc-cap %d ", pfc->pfc_cap);
}
static void dcb_pfc_print_macsec_bypass(const struct ieee_pfc *pfc)
{
print_on_off(PRINT_ANY, "macsec_bypass", "macsec-bypass %s ", pfc->mbc);
}
static void dcb_pfc_print_delay(const struct ieee_pfc *pfc)
{
print_uint(PRINT_ANY, "delay", "delay %d ", pfc->delay);
}
static void dcb_pfc_print_prio_pfc(const struct ieee_pfc *pfc)
{
__u8 pfc_en[IEEE_8021QAZ_MAX_TCS];
dcb_pfc_to_array(pfc_en, pfc->pfc_en);
dcb_print_named_array("prio_pfc", "prio-pfc",
pfc_en, ARRAY_SIZE(pfc_en), &dcb_print_array_on_off);
}
static void dcb_pfc_print_requests(const struct ieee_pfc *pfc)
{
open_json_array(PRINT_JSON, "requests");
print_string(PRINT_FP, NULL, "requests ", NULL);
dcb_print_array_u64(pfc->requests, ARRAY_SIZE(pfc->requests));
close_json_array(PRINT_JSON, "requests");
}
static void dcb_pfc_print_indications(const struct ieee_pfc *pfc)
{
open_json_array(PRINT_JSON, "indications");
print_string(PRINT_FP, NULL, "indications ", NULL);
dcb_print_array_u64(pfc->indications, ARRAY_SIZE(pfc->indications));
close_json_array(PRINT_JSON, "indications");
}
static void dcb_pfc_print(const struct dcb *dcb, const struct ieee_pfc *pfc)
{
dcb_pfc_print_pfc_cap(pfc);
dcb_pfc_print_macsec_bypass(pfc);
dcb_pfc_print_delay(pfc);
print_nl();
dcb_pfc_print_prio_pfc(pfc);
print_nl();
if (dcb->stats) {
dcb_pfc_print_requests(pfc);
print_nl();
dcb_pfc_print_indications(pfc);
print_nl();
}
}
static int dcb_pfc_get(struct dcb *dcb, const char *dev, struct ieee_pfc *pfc)
{
return dcb_get_attribute(dcb, dev, DCB_ATTR_IEEE_PFC, pfc, sizeof(*pfc));
}
static int dcb_pfc_set(struct dcb *dcb, const char *dev, const struct ieee_pfc *pfc)
{
return dcb_set_attribute(dcb, dev, DCB_ATTR_IEEE_PFC, pfc, sizeof(*pfc));
}
static int dcb_cmd_pfc_set(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct ieee_pfc pfc;
int ret;
if (!argc) {
dcb_pfc_help_set();
return 0;
}
ret = dcb_pfc_get(dcb, dev, &pfc);
if (ret)
return ret;
do {
if (matches(*argv, "help") == 0) {
dcb_pfc_help_set();
return 0;
} else if (matches(*argv, "prio-pfc") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true,
&dcb_pfc_parse_mapping_prio_pfc, &pfc);
if (ret) {
fprintf(stderr, "Invalid pfc mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "macsec-bypass") == 0) {
NEXT_ARG();
pfc.mbc = parse_on_off("macsec-bypass", *argv, &ret);
if (ret)
return ret;
} else if (matches(*argv, "delay") == 0) {
NEXT_ARG();
/* Do not support the size notations for delay.
* Delay is specified in "bit times", not bits, so
* it is not applicable. At the same time it would
* be confusing that 10Kbit does not mean 10240,
* but 1280.
*/
if (get_u16(&pfc.delay, *argv, 0)) {
fprintf(stderr, "Invalid delay `%s', expected an integer 0..65535\n",
*argv);
return -EINVAL;
}
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_pfc_help_set();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
return dcb_pfc_set(dcb, dev, &pfc);
}
static int dcb_cmd_pfc_show(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct ieee_pfc pfc;
int ret;
ret = dcb_pfc_get(dcb, dev, &pfc);
if (ret)
return ret;
open_json_object(NULL);
if (!argc) {
dcb_pfc_print(dcb, &pfc);
goto out;
}
do {
if (matches(*argv, "help") == 0) {
dcb_pfc_help_show();
return 0;
} else if (matches(*argv, "prio-pfc") == 0) {
dcb_pfc_print_prio_pfc(&pfc);
print_nl();
} else if (matches(*argv, "pfc-cap") == 0) {
dcb_pfc_print_pfc_cap(&pfc);
print_nl();
} else if (matches(*argv, "macsec-bypass") == 0) {
dcb_pfc_print_macsec_bypass(&pfc);
print_nl();
} else if (matches(*argv, "delay") == 0) {
dcb_pfc_print_delay(&pfc);
print_nl();
} else if (matches(*argv, "requests") == 0) {
dcb_pfc_print_requests(&pfc);
print_nl();
} else if (matches(*argv, "indications") == 0) {
dcb_pfc_print_indications(&pfc);
print_nl();
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_pfc_help_show();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
out:
close_json_object();
return 0;
}
int dcb_cmd_pfc(struct dcb *dcb, int argc, char **argv)
{
if (!argc || matches(*argv, "help") == 0) {
dcb_pfc_help();
return 0;
} else if (matches(*argv, "show") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_pfc_show, dcb_pfc_help_show);
} else if (matches(*argv, "set") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_pfc_set, dcb_pfc_help_set);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_pfc_help();
return -EINVAL;
}
}

View File

@ -7,12 +7,13 @@ ifeq ($(HAVE_MNL),y)
DEVLINKOBJ = devlink.o mnlg.o
TARGETS += devlink
LDLIBS += -lm
endif
all: $(TARGETS) $(LIBS)
devlink: $(DEVLINKOBJ)
devlink: $(DEVLINKOBJ) $(LIBNETLINK)
$(QUIET_LINK)$(CC) $^ $(LDFLAGS) $(LDLIBS) -o $@
install: all

File diff suppressed because it is too large Load Diff

View File

@ -14,11 +14,11 @@
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <time.h>
#include <libmnl/libmnl.h>
#include <linux/genetlink.h>
#include "libnetlink.h"
#include "mnl_utils.h"
#include "utils.h"
#include "mnlg.h"
@ -28,90 +28,13 @@ struct mnlg_socket {
uint32_t id;
uint8_t version;
unsigned int seq;
unsigned int portid;
};
static struct nlmsghdr *__mnlg_msg_prepare(struct mnlg_socket *nlg, uint8_t cmd,
uint16_t flags, uint32_t id,
uint8_t version)
{
struct nlmsghdr *nlh;
struct genlmsghdr *genl;
nlh = mnl_nlmsg_put_header(nlg->buf);
nlh->nlmsg_type = id;
nlh->nlmsg_flags = flags;
nlg->seq = time(NULL);
nlh->nlmsg_seq = nlg->seq;
genl = mnl_nlmsg_put_extra_header(nlh, sizeof(struct genlmsghdr));
genl->cmd = cmd;
genl->version = version;
return nlh;
}
struct nlmsghdr *mnlg_msg_prepare(struct mnlg_socket *nlg, uint8_t cmd,
uint16_t flags)
{
return __mnlg_msg_prepare(nlg, cmd, flags, nlg->id, nlg->version);
}
int mnlg_socket_send(struct mnlg_socket *nlg, const struct nlmsghdr *nlh)
int mnlg_socket_send(struct mnlu_gen_socket *nlg, const struct nlmsghdr *nlh)
{
return mnl_socket_sendto(nlg->nl, nlh, nlh->nlmsg_len);
}
static int mnlg_cb_noop(const struct nlmsghdr *nlh, void *data)
{
return MNL_CB_OK;
}
static int mnlg_cb_error(const struct nlmsghdr *nlh, void *data)
{
const struct nlmsgerr *err = mnl_nlmsg_get_payload(nlh);
/* Netlink subsystems returns the errno value with different signess */
if (err->error < 0)
errno = -err->error;
else
errno = err->error;
if (nl_dump_ext_ack(nlh, NULL))
return MNL_CB_ERROR;
return err->error == 0 ? MNL_CB_STOP : MNL_CB_ERROR;
}
static int mnlg_cb_stop(const struct nlmsghdr *nlh, void *data)
{
return MNL_CB_STOP;
}
static mnl_cb_t mnlg_cb_array[NLMSG_MIN_TYPE] = {
[NLMSG_NOOP] = mnlg_cb_noop,
[NLMSG_ERROR] = mnlg_cb_error,
[NLMSG_DONE] = mnlg_cb_stop,
[NLMSG_OVERRUN] = mnlg_cb_noop,
};
int mnlg_socket_recv_run(struct mnlg_socket *nlg, mnl_cb_t data_cb, void *data)
{
int err;
do {
err = mnl_socket_recvfrom(nlg->nl, nlg->buf,
MNL_SOCKET_BUFFER_SIZE);
if (err <= 0)
break;
err = mnl_cb_run2(nlg->buf, err, nlg->seq, nlg->portid,
data_cb, data, mnlg_cb_array,
ARRAY_SIZE(mnlg_cb_array));
} while (err > 0);
return err;
}
struct group_info {
bool found;
uint32_t id;
@ -191,15 +114,17 @@ static int get_group_id_cb(const struct nlmsghdr *nlh, void *data)
return MNL_CB_OK;
}
int mnlg_socket_group_add(struct mnlg_socket *nlg, const char *group_name)
int mnlg_socket_group_add(struct mnlu_gen_socket *nlg, const char *group_name)
{
struct nlmsghdr *nlh;
struct group_info group_info;
int err;
nlh = __mnlg_msg_prepare(nlg, CTRL_CMD_GETFAMILY,
NLM_F_REQUEST | NLM_F_ACK, GENL_ID_CTRL, 1);
mnl_attr_put_u16(nlh, CTRL_ATTR_FAMILY_ID, nlg->id);
nlh = _mnlu_gen_socket_cmd_prepare(nlg, CTRL_CMD_GETFAMILY,
NLM_F_REQUEST | NLM_F_ACK,
GENL_ID_CTRL, 1);
mnl_attr_put_u16(nlh, CTRL_ATTR_FAMILY_ID, nlg->family);
err = mnlg_socket_send(nlg, nlh);
if (err < 0)
@ -207,7 +132,7 @@ int mnlg_socket_group_add(struct mnlg_socket *nlg, const char *group_name)
group_info.found = false;
group_info.name = group_name;
err = mnlg_socket_recv_run(nlg, get_group_id_cb, &group_info);
err = mnlu_gen_socket_recv_run(nlg, get_group_id_cb, &group_info);
if (err < 0)
return err;
@ -224,92 +149,7 @@ int mnlg_socket_group_add(struct mnlg_socket *nlg, const char *group_name)
return 0;
}
static int get_family_id_attr_cb(const struct nlattr *attr, void *data)
int mnlg_socket_get_fd(struct mnlu_gen_socket *nlg)
{
const struct nlattr **tb = data;
int type = mnl_attr_get_type(attr);
if (mnl_attr_type_valid(attr, CTRL_ATTR_MAX) < 0)
return MNL_CB_ERROR;
if (type == CTRL_ATTR_FAMILY_ID &&
mnl_attr_validate(attr, MNL_TYPE_U16) < 0)
return MNL_CB_ERROR;
tb[type] = attr;
return MNL_CB_OK;
}
static int get_family_id_cb(const struct nlmsghdr *nlh, void *data)
{
uint32_t *p_id = data;
struct nlattr *tb[CTRL_ATTR_MAX + 1] = {};
struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
mnl_attr_parse(nlh, sizeof(*genl), get_family_id_attr_cb, tb);
if (!tb[CTRL_ATTR_FAMILY_ID])
return MNL_CB_ERROR;
*p_id = mnl_attr_get_u16(tb[CTRL_ATTR_FAMILY_ID]);
return MNL_CB_OK;
}
struct mnlg_socket *mnlg_socket_open(const char *family_name, uint8_t version)
{
struct mnlg_socket *nlg;
struct nlmsghdr *nlh;
int one = 1;
int err;
nlg = malloc(sizeof(*nlg));
if (!nlg)
return NULL;
nlg->buf = malloc(MNL_SOCKET_BUFFER_SIZE);
if (!nlg->buf)
goto err_buf_alloc;
nlg->nl = mnl_socket_open(NETLINK_GENERIC);
if (!nlg->nl)
goto err_mnl_socket_open;
/* Older kernels may no support capped/extended ACK reporting */
mnl_socket_setsockopt(nlg->nl, NETLINK_CAP_ACK, &one, sizeof(one));
mnl_socket_setsockopt(nlg->nl, NETLINK_EXT_ACK, &one, sizeof(one));
err = mnl_socket_bind(nlg->nl, 0, MNL_SOCKET_AUTOPID);
if (err < 0)
goto err_mnl_socket_bind;
nlg->portid = mnl_socket_get_portid(nlg->nl);
nlh = __mnlg_msg_prepare(nlg, CTRL_CMD_GETFAMILY,
NLM_F_REQUEST | NLM_F_ACK, GENL_ID_CTRL, 1);
mnl_attr_put_strz(nlh, CTRL_ATTR_FAMILY_NAME, family_name);
err = mnlg_socket_send(nlg, nlh);
if (err < 0)
goto err_mnlg_socket_send;
err = mnlg_socket_recv_run(nlg, get_family_id_cb, &nlg->id);
if (err < 0)
goto err_mnlg_socket_recv_run;
nlg->version = version;
return nlg;
err_mnlg_socket_recv_run:
err_mnlg_socket_send:
err_mnl_socket_bind:
mnl_socket_close(nlg->nl);
err_mnl_socket_open:
free(nlg->buf);
err_buf_alloc:
free(nlg);
return NULL;
}
void mnlg_socket_close(struct mnlg_socket *nlg)
{
mnl_socket_close(nlg->nl);
free(nlg->buf);
free(nlg);
return mnl_socket_get_fd(nlg->nl);
}

View File

@ -14,14 +14,10 @@
#include <libmnl/libmnl.h>
struct mnlg_socket;
struct mnlu_gen_socket;
struct nlmsghdr *mnlg_msg_prepare(struct mnlg_socket *nlg, uint8_t cmd,
uint16_t flags);
int mnlg_socket_send(struct mnlg_socket *nlg, const struct nlmsghdr *nlh);
int mnlg_socket_recv_run(struct mnlg_socket *nlg, mnl_cb_t data_cb, void *data);
int mnlg_socket_group_add(struct mnlg_socket *nlg, const char *group_name);
struct mnlg_socket *mnlg_socket_open(const char *family_name, uint8_t version);
void mnlg_socket_close(struct mnlg_socket *nlg);
int mnlg_socket_send(struct mnlu_gen_socket *nlg, const struct nlmsghdr *nlh);
int mnlg_socket_group_add(struct mnlu_gen_socket *nlg, const char *group_name);
int mnlg_socket_get_fd(struct mnlu_gen_socket *nlg);
#endif /* _MNLG_H_ */

View File

@ -6,8 +6,8 @@ What is it?
-----------
An extension to the filtering/classification architecture of Linux Traffic
Control.
Up to 2.6.8 the only action that could be "attached" to a filter was policing.
Control.
Up to 2.6.8 the only action that could be "attached" to a filter was policing.
i.e you could say something like:
-----
@ -17,7 +17,7 @@ tc filter add dev lo parent ffff: protocol ip prio 10 u32 match ip src \
which implies "if a packet is seen on the ingress of the lo device with
a source IP address of 127.0.0.1/32 we give it a classification id of 1:1 and
we execute a policing action which rate limits its bandwidth utilization
we execute a policing action which rate limits its bandwidth utilization
to 1.5Mbps".
The new extensions allow for more than just policing actions to be added.
@ -29,9 +29,9 @@ syntax which will work fine. Of course to get the required effect you need
both newer tc and kernel. If you are reading this you have the
right tc ;->
A side effect is that we can now get stateless firewalling to work with tc.
A side effect is that we can now get stateless firewalling to work with tc.
Essentially this is now an alternative to iptables.
I won't go into details of my dislike for iptables at times, but
I won't go into details of my dislike for iptables at times, but
scalability is one of the main issues; however, if you need stateful
classification - use netfilter (for now).
@ -61,7 +61,7 @@ tc filter add dev lo parent 1:0 protocol ip prio 10 u32 \
match ip src 127.0.0.1/32 flowid 1:1 \
action police mtu 4000 rate 1500kbit burst 90k
" generic Actions" (gact) at the moment are:
" generic Actions" (gact) at the moment are:
{ drop, pass, reclassify, continue}
(If you have others, no listed here give me a reason and we will add them)
+drop says to drop the packet
@ -93,43 +93,43 @@ decimal 12, then use flowid 1:c.
3) A feature i call pipe
The motivation is derived from Unix pipe mechanism but applied to packets.
Essentially take a matching packet and pass it through
Essentially take a matching packet and pass it through
action1 | action2 | action3 etc.
You could do something similar to this with the tc policer and the "continue"
operator but this rather restricts it to just the policer and requires
multiple rules (and lookups, hence quiet inefficient);
operator but this rather restricts it to just the policer and requires
multiple rules (and lookups, hence quiet inefficient);
as an example -- and please note that this is just an example _not_ The
as an example -- and please note that this is just an example _not_ The
Word Youve Been Waiting For (yes i have had problems giving examples
which ended becoming dogma in documents and people modifying them a little
to look clever);
to look clever);
i selected the metering rates to be small so that i can show better how
i selected the metering rates to be small so that i can show better how
things work.
The script below does the following:
- an incoming packet from 10.0.0.21 is first given a firewall mark of 1.
- It is then metered to make sure it does not exceed its allocated rate of
The script below does the following:
- an incoming packet from 10.0.0.21 is first given a firewall mark of 1.
- It is then metered to make sure it does not exceed its allocated rate of
1Kbps. If it doesn't exceed rate, this is where we terminate action execution.
- If it does exceed its rate, its "color" changes to a mark of 2 and it is
- If it does exceed its rate, its "color" changes to a mark of 2 and it is
then passed through a second meter.
-The second meter is shared across all flows on that device [i am surpised
that this seems to be not a well know feature of the policer; Bert was telling
-The second meter is shared across all flows on that device [i am surpised
that this seems to be not a well know feature of the policer; Bert was telling
me that someone was writing a qdisc just to do sharing across multiple devices;
it must be the summer heat again; weve had someone doing that every year around
summer -- the key to sharing is to use a operator "index" in your policer
rules (example "index 20"). All your rules have to use the same index to
summer -- the key to sharing is to use a operator "index" in your policer
rules (example "index 20"). All your rules have to use the same index to
share.]
-If the second meter is exceeded the color of the flow changes further to 3.
-We then pass the packet to another meter which is shared across all devices
in the system. If this meter is exceeded we drop the packet.
Note the mark can be used further up the system to do things like policy
Note the mark can be used further up the system to do things like policy
or more interesting things on the egress.
------------------ cut here -------------------------------
@ -161,31 +161,31 @@ action ipt -j mark --set-mark 3 \
# and then attempt to borrow from a meter used by all devices in the
# system. Should this be exceeded, drop the packet on the floor.
action police index 20 mtu 5000 rate 1kbit burst 90k drop
---------------------------------
---------------------------------
Now lets see the actions installed with
Now lets see the actions installed with
"tc filter show parent ffff: dev eth0"
-------- output -----------
jroot# tc filter show parent ffff: dev eth0
filter protocol ip pref 1 u32
filter protocol ip pref 1 u32 fh 800: ht divisor 1
filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:15
filter protocol ip pref 1 u32
filter protocol ip pref 1 u32 fh 800: ht divisor 1
filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:15
action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING
action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x1 index 2
action order 2: police 1 action pipe rate 1Kbit burst 9Kb mtu 2Kb
action order 2: police 1 action pipe rate 1Kbit burst 9Kb mtu 2Kb
action order 3: tablename: mangle hook: NF_IP_PRE_ROUTING
action order 3: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x2 index 1
action order 4: police 30 action pipe rate 1Kbit burst 10Kb mtu 5000b
action order 4: police 30 action pipe rate 1Kbit burst 10Kb mtu 5000b
action order 5: tablename: mangle hook: NF_IP_PRE_ROUTING
action order 5: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x3 index 3
action order 6: police 20 action drop rate 1Kbit burst 90Kb mtu 5000b
action order 6: police 20 action drop rate 1Kbit burst 90Kb mtu 5000b
match 0a000015/ffffffff at 12
-------------------------------
@ -209,31 +209,31 @@ Now lets take a look at the stats with "tc -s filter show parent ffff: dev eth0"
--------------
jroot# tc -s filter show parent ffff: dev eth0
filter protocol ip pref 1 u32
filter protocol ip pref 1 u32 fh 800: ht divisor 1
filter protocol ip pref 1 u32
filter protocol ip pref 1 u32 fh 800: ht divisor 1
filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1
5
5
action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING
action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x1 index 2
Sent 188832 bytes 2248 pkts (dropped 0, overlimits 0)
Sent 188832 bytes 2248 pkts (dropped 0, overlimits 0)
action order 2: police 1 action pipe rate 1Kbit burst 9Kb mtu 2Kb
Sent 188832 bytes 2248 pkts (dropped 0, overlimits 2122)
action order 2: police 1 action pipe rate 1Kbit burst 9Kb mtu 2Kb
Sent 188832 bytes 2248 pkts (dropped 0, overlimits 2122)
action order 3: tablename: mangle hook: NF_IP_PRE_ROUTING
action order 3: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x2 index 1
Sent 178248 bytes 2122 pkts (dropped 0, overlimits 0)
Sent 178248 bytes 2122 pkts (dropped 0, overlimits 0)
action order 4: police 30 action pipe rate 1Kbit burst 10Kb mtu 5000b
Sent 178248 bytes 2122 pkts (dropped 0, overlimits 1945)
action order 4: police 30 action pipe rate 1Kbit burst 10Kb mtu 5000b
Sent 178248 bytes 2122 pkts (dropped 0, overlimits 1945)
action order 5: tablename: mangle hook: NF_IP_PRE_ROUTING
action order 5: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x3 index 3
Sent 163380 bytes 1945 pkts (dropped 0, overlimits 0)
Sent 163380 bytes 1945 pkts (dropped 0, overlimits 0)
action order 6: police 20 action drop rate 1Kbit burst 90Kb mtu 5000b
Sent 163380 bytes 1945 pkts (dropped 0, overlimits 437)
action order 6: police 20 action drop rate 1Kbit burst 90Kb mtu 5000b
Sent 163380 bytes 1945 pkts (dropped 0, overlimits 437)
match 0a000015/ffffffff at 12
-------------------------------
@ -254,4 +254,3 @@ At the moment the focus has been on getting the architecture in place.
Expect new things in the spurious time i have to work on this
(particularly around end of year when i have typically get time off
from work).

View File

@ -1,16 +1,16 @@
gact <ACTION> [RAND] [INDEX]
Where:
ACTION := reclassify | drop | continue | pass | ok
Where:
ACTION := reclassify | drop | continue | pass | ok
RAND := random <RANDTYPE> <ACTION> <VAL>
RANDTYPE := netrand | determ
VAL : = value not exceeding 10000
INDEX := index value used
ACTION semantics
- pass and ok are equivalent to accept
- continue allows to restart classification lookup
- continue allows one to restart classification lookup
- drop drops packets
- reclassify implies continue classification where we left off
@ -42,14 +42,14 @@ filter u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:16 (rule hit 32 suc
random type none pass val 0
index 1 ref 1 bind 1 installed 59 sec used 35 sec
Sent 1680 bytes 20 pkts (dropped 20, overlimits 0 )
----
# example 2
#allow 1 out 10 randomly using the netrand generator
tc filter add dev eth0 parent ffff: protocol ip prio 6 u32 match ip src \
10.0.0.9/32 flowid 1:16 action drop random netrand ok 10
ping -c 20 10.0.0.9
----
@ -59,14 +59,14 @@ filter protocol ip pref 6 u32 filter protocol ip pref 6 u32 fh 800: ht divisor 1
random type netrand pass val 10
index 5 ref 1 bind 1 installed 49 sec used 25 sec
Sent 1680 bytes 20 pkts (dropped 16, overlimits 0 )
--------
#alternative: deterministically accept every second packet
tc filter add dev eth0 parent ffff: protocol ip prio 6 u32 match ip src \
10.0.0.9/32 flowid 1:16 action drop random determ ok 2
ping -c 20 10.0.0.9
tc -s filter show parent ffff: dev eth0
-----
filter protocol ip pref 6 u32 filter protocol ip pref 6 u32 fh 800: ht divisor 1filter protocol ip pref 6 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:16 (rule hit 20 success 20)
@ -76,4 +76,3 @@ filter protocol ip pref 6 u32 filter protocol ip pref 6 u32 fh 800: ht divisor 1
index 4 ref 1 bind 1 installed 118 sec used 82 sec
Sent 1680 bytes 20 pkts (dropped 10, overlimits 0 )
-----

View File

@ -6,18 +6,18 @@ with a _lot_ less code.
Known IMQ/IFB USES
------------------
As far as i know the reasons listed below is why people use IMQ.
As far as i know the reasons listed below is why people use IMQ.
It would be nice to know of anything else that i missed.
1) qdiscs/policies that are per device as opposed to system wide.
IFB allows for sharing.
2) Allows for queueing incoming traffic for shaping instead of
dropping. I am not aware of any study that shows policing is
dropping. I am not aware of any study that shows policing is
worse than shaping in achieving the end goal of rate control.
I would be interested if anyone is experimenting.
3) Very interesting use: if you are serving p2p you may want to give
3) Very interesting use: if you are serving p2p you may want to give
preference to your own locally originated traffic (when responses come back)
vs someone using your system to do bittorent. So QoSing based on state
comes in as the solution. What people did to achieve this was stick
@ -25,17 +25,17 @@ the IMQ somewhere prelocal hook.
I think this is a pretty neat feature to have in Linux in general.
(i.e not just for IMQ).
But i won't go back to putting netfilter hooks in the device to satisfy
this. I also don't think its worth it hacking ifb some more to be
this. I also don't think its worth it hacking ifb some more to be
aware of say L3 info and play ip rule tricks to achieve this.
--> Instead the plan is to have a conntrack related action. This action will
selectively either query/create conntrack state on incoming packets.
Packets could then be redirected to ifb based on what happens -> eg
on incoming packets; if we find they are of known state we could send to
selectively either query/create conntrack state on incoming packets.
Packets could then be redirected to ifb based on what happens -> eg
on incoming packets; if we find they are of known state we could send to
a different queue than one which didn't have existing state. This
all however is dependent on whatever rules the admin enters.
At the moment this 3rd function does not exist yet. I have decided that
instead of sitting on the patch for another year, to release it and then
instead of sitting on the patch for another year, to release it and then
if there is pressure i will add this feature.
An example, to provide functionality that most people use IMQ for below:
@ -43,10 +43,10 @@ An example, to provide functionality that most people use IMQ for below:
--------
export TC="/sbin/tc"
$TC qdisc add dev ifb0 root handle 1: prio
$TC qdisc add dev ifb0 root handle 1: prio
$TC qdisc add dev ifb0 parent 1:1 handle 10: sfq
$TC qdisc add dev ifb0 parent 1:2 handle 20: tbf rate 20kbit buffer 1600 limit 3000
$TC qdisc add dev ifb0 parent 1:3 handle 30: sfq
$TC qdisc add dev ifb0 parent 1:3 handle 30: sfq
$TC filter add dev ifb0 protocol ip pref 1 parent 1: handle 1 fw classid 1:1
$TC filter add dev ifb0 protocol ip pref 2 parent 1: handle 2 fw classid 1:2
@ -54,7 +54,7 @@ ifconfig ifb0 up
$TC qdisc add dev eth0 ingress
# redirect all IP packets arriving in eth0 to ifb0
# redirect all IP packets arriving in eth0 to ifb0
# use mark 1 --> puts them onto class 1:1
$TC filter add dev eth0 parent ffff: protocol ip prio 10 u32 \
match u32 0 0 flowid 1:1 \
@ -77,44 +77,44 @@ PING 10.22 (10.0.0.22): 56 data bytes
--- 10.22 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.6/1.3/2.8 ms
[root@jzny action-tests]#
[root@jzny action-tests]#
-----
Now look at some stats:
---
[root@jmandrake]:~# $TC -s filter show parent ffff: dev eth0
filter protocol ip pref 10 u32
filter protocol ip pref 10 u32 fh 800: ht divisor 1
filter protocol ip pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1
filter protocol ip pref 10 u32
filter protocol ip pref 10 u32 fh 800: ht divisor 1
filter protocol ip pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1
match 00000000/00000000 at 0
action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x1
index 1 ref 1 bind 1 installed 4195sec used 27sec
Sent 252 bytes 3 pkts (dropped 0, overlimits 0)
action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x1
index 1 ref 1 bind 1 installed 4195sec used 27sec
Sent 252 bytes 3 pkts (dropped 0, overlimits 0)
action order 2: mirred (Egress Redirect to device ifb0) stolen
index 1 ref 1 bind 1 installed 165 sec used 27 sec
Sent 252 bytes 3 pkts (dropped 0, overlimits 0)
Sent 252 bytes 3 pkts (dropped 0, overlimits 0)
[root@jmandrake]:~# $TC -s qdisc
qdisc sfq 30: dev ifb0 limit 128p quantum 1514b
Sent 0 bytes 0 pkts (dropped 0, overlimits 0)
qdisc tbf 20: dev ifb0 rate 20Kbit burst 1575b lat 2147.5s
Sent 210 bytes 3 pkts (dropped 0, overlimits 0)
qdisc sfq 10: dev ifb0 limit 128p quantum 1514b
Sent 294 bytes 3 pkts (dropped 0, overlimits 0)
qdisc sfq 30: dev ifb0 limit 128p quantum 1514b
Sent 0 bytes 0 pkts (dropped 0, overlimits 0)
qdisc tbf 20: dev ifb0 rate 20Kbit burst 1575b lat 2147.5s
Sent 210 bytes 3 pkts (dropped 0, overlimits 0)
qdisc sfq 10: dev ifb0 limit 128p quantum 1514b
Sent 294 bytes 3 pkts (dropped 0, overlimits 0)
qdisc prio 1: dev ifb0 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 504 bytes 6 pkts (dropped 0, overlimits 0)
qdisc ingress ffff: dev eth0 ----------------
Sent 308 bytes 5 pkts (dropped 0, overlimits 0)
Sent 504 bytes 6 pkts (dropped 0, overlimits 0)
qdisc ingress ffff: dev eth0 ----------------
Sent 308 bytes 5 pkts (dropped 0, overlimits 0)
[root@jmandrake]:~# ifconfig ifb0
ifb0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
ifb0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
UP BROADCAST RUNNING NOARP MTU:1500 Metric:1
RX packets:6 errors:0 dropped:3 overruns:0 frame:0
TX packets:3 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:32
collisions:0 txqueuelen:32
RX bytes:504 (504.0 b) TX bytes:252 (252.0 b)
-----

View File

@ -7,10 +7,10 @@ flow to be mirrored. High end switches typically can select based
on more than just a port (eg a 5 tuple classifier). They may also be
capable of redirecting.
Usage:
Usage:
mirred <DIRECTION> <ACTION> [index INDEX] <dev DEVICENAME>
where:
mirred <DIRECTION> <ACTION> [index INDEX] <dev DEVICENAME>
where:
DIRECTION := <ingress | egress>
ACTION := <mirror | redirect>
INDEX is the specific policy instance id
@ -18,7 +18,7 @@ DEVICENAME is the devicename
Direction:
- Ingress is not supported at the moment. It will be in the
future as well as mirror/redirecting to a socket.
future as well as mirror/redirecting to a socket.
Action:
- Mirror takes a copy of the packet and sends it to specified
@ -29,14 +29,14 @@ steals the packet and redirects to specified destination dev.
What NOT to do if you don't want your machine to crash:
------------------------------------------------------
Do not create loops!
Do not create loops!
Loops are not hard to create in the egress qdiscs.
Here are simple rules to follow if you don't want to get
hurt:
A) Do not have the same packet go to same netdevice twice
in a single graph of policies. Your machine will just hang!
This is design intent _not a bug_ to teach you some lessons.
This is design intent _not a bug_ to teach you some lessons.
In the future if there are easy ways to do this in the kernel
without affecting other packets not interested in this feature
@ -51,7 +51,7 @@ B) Do not redirect from one IFB device to another.
Remember that IFB is a very specialized case of packet redirecting
device. Instead of redirecting it puts packets at the exact spot
on the stack it found them from.
Redirecting from ifbX->ifbY will actually not crash your machine but your
Redirecting from ifbX->ifbY will actually not crash your machine but your
packets will all be dropped (this is much simpler to detect
and resolve and is only affecting users of ifb as opposed to the
whole stack).
@ -64,7 +64,7 @@ Some examples:
1) Mirror all packets arriving on eth0 to be sent out on eth1.
You may have a sniffer or some accounting box hooked up on eth1.
---
tc qdisc add dev eth0 ingress
tc filter add dev eth0 parent ffff: protocol ip prio 10 u32 \
@ -100,7 +100,7 @@ stack (i.e ping would work).
3) Even more funky example:
#
#allow 1 out 10 packets on ingress of lo to randomly make it to the
#allow 1 out 10 packets on ingress of lo to randomly make it to the
# host A (Randomness uses the netrand generator)
#
---
@ -111,9 +111,9 @@ action mirred egress mirror dev eth0
---
4)
# for packets from 10.0.0.9 going out on eth0 (could be local
# IP or something # we are forwarding) -
# if exceeding a 100Kbps rate, then redirect to eth1
# for packets from 10.0.0.9 going out on eth0 (could be local
# IP or something # we are forwarding) -
# if exceeding a 100Kbps rate, then redirect to eth1
#
---
@ -158,7 +158,7 @@ Essentially a good debugging/logging interface (sort of like
BSDs speacialized log device does without needing one).
If you replace mirror with redirect, those packets will be
blackholed and will never make it out.
blackholed and will never make it out.
cheers,
jamal

View File

@ -14,8 +14,10 @@
13 dnrouted
14 xorp
15 ntk
16 dhcp
16 dhcp
18 keepalived
42 babel
99 openr
186 bgp
187 isis
188 ospf

View File

@ -1,122 +0,0 @@
# CHANGES
# -------
# v0.3a2- fixed bug in "if" operator. Thanks kad@dgtu.donetsk.ua.
# v0.3a- added TIME parameter. Example:
# TIME=00:00-19:00;64Kbit/6Kbit
# So, between 00:00 and 19:00 RATE will be 64Kbit.
# Just start "cbq.init timecheck" periodically from cron (every 10
# minutes for example).
# !!! Anyway you MUST start "cbq.init start" for CBQ initialize.
# v0.2 - Some cosmetique changes. Now it more compatible with
# old bash version. Thanks to Stanislav V. Voronyi
# <stas@cnti.uanet.kharkov.ua>.
# v0.1 - First public release
#
# README
# ------
#
# First of all - this is just a SIMPLE EXAMPLE of CBQ power.
# Don't ask me "why" and "how" :)
#
# This is an example of using CBQ (Class Based Queueing) and policy-based
# filter for building smart ethernet shapers. All CBQ parameters are
# correct only for ETHERNET (eth0,1,2..) linux interfaces. It works for
# ARCNET too (just set bandwidth parameter to 2Mbit). It was tested
# on 2.1.125-2.1.129 linux kernels (KSI linux, Nostromo version) and
# ip-route utility by A.Kuznetsov (iproute2-ss981101 version).
# You can download ip-route from ftp://ftp.inr.ac.ru/ip-routing or
# get iproute2*.rpm (compiled with glibc) from ftp.ksi-linux.com.
#
#
# HOW IT WORKS
#
# Each shaper must be described by config file in $CBQ_PATH
# (/etc/sysconfig/cbq/) directory - one config file for each CBQ shaper.
#
# Some words about config file name:
# Each shaper has its personal ID - two byte HEX number. Really ID is
# CBQ class.
# So, filename looks like:
#
# cbq-1280.My_first_shaper
# ^^^ ^^^ ^^^^^^^^^^^^^
# | | |______ Shaper name - any word
# | |___________________ ID (0000-FFFF), let ID looks like shaper's rate
# |______________________ Filename must begin from "cbq-"
#
#
# Config file describes shaper parameters and source[destination]
# address[port].
# For example let's prepare /etc/sysconfig/cbq/cbq-1280.My_first_shaper:
#
# ----------8<---------------------
# DEVICE=eth0,10Mbit,1Mbit
# RATE=128Kbit
# WEIGHT=10Kbit
# PRIO=5
# RULE=192.168.1.0/24
# ----------8<---------------------
#
# This is minimal configuration, where:
# DEVICE: eth0 - device where we do control our traffic
# 10Mbit - REAL ethernet card bandwidth
# 1Mbit - "weight" of :1 class (parent for all shapers for eth0),
# as a rule of thumb weight=batdwidth/10.
# 100Mbit adapter's example: DEVICE=eth0,100Mbit,10Mbit
# *** If you want to build more than one shaper per device it's
# enough to describe bandwidth and weight once - cbq.init
# is smart :) You can put only 'DEVICE=eth0' into cbq-*
# config file for eth0.
#
# RATE: Shaper's speed - Kbit,Mbit or bps (bytes per second)
#
# WEIGHT: "weight" of shaper (CBQ class). Like for DEVICE - approx. RATE/10
#
# PRIO: shaper's priority from 1 to 8 where 1 is the highest one.
# I do always use "5" for all my shapers.
#
# RULE: [source addr][:source port],[dest addr][:dest port]
# Some examples:
# RULE=10.1.1.0/24:80 - all traffic for network 10.1.1.0 to port 80
# will be shaped.
# RULE=10.2.2.5 - shaper works only for IP address 10.2.2.5
# RULE=:25,10.2.2.128/25:5000 - all traffic from any address and port 25 to
# address 10.2.2.128 - 10.2.2.255 and port 5000
# will be shaped.
# RULE=10.5.5.5:80, - shaper active only for traffic from port 80 of
# address 10.5.5.5
# Multiple RULE fields per one config file are allowed. For example:
# RULE=10.1.1.2:80
# RULE=10.1.1.2:25
# RULE=10.1.1.2:110
#
# *** ATTENTION!!!
# All shapers do work only for outgoing traffic!
# So, if you want to build bidirectional shaper you must set it up for
# both ethernet card. For example let's build shaper for our linux box like:
#
# --------- 192.168.1.1
# BACKBONE -----eth0-| linux |-eth1------*[our client]
# ---------
#
# Let all traffic from backbone to client will be shaped at 28Kbit and
# traffic from client to backbone - at 128Kbit. We need two config files:
#
# ---8<-----/etc/sysconfig/cbq/cbq-28.client-out----
# DEVICE=eth1,10Mbit,1Mbit
# RATE=28Kbit
# WEIGHT=2Kbit
# PRIO=5
# RULE=192.168.1.1
# ---8<---------------------------------------------
#
# ---8<-----/etc/sysconfig/cbq/cbq-128.client-in----
# DEVICE=eth0,10Mbit,1Mbit
# RATE=128Kbit
# WEIGHT=10Kbit
# PRIO=5
# RULE=192.168.1.1,
# ---8<---------------------------------------------
# ^pay attention to "," - this is source address!
#
# Enjoy.

View File

@ -1,49 +0,0 @@
#! /bin/sh -x
#
# sample script on using the ingress capabilities
# this script shows how one can rate limit incoming SYNs
# Useful for TCP-SYN attack protection. You can use
# IPchains to have more powerful additions to the SYN (eg
# in addition the subnet)
#
#path to various utilities;
#change to reflect yours.
#
IPROUTE=/root/DS-6-beta/iproute2-990530-dsing
TC=$IPROUTE/tc/tc
IP=$IPROUTE/ip/ip
IPCHAINS=/root/DS-6-beta/ipchains-1.3.9/ipchains
INDEV=eth2
#
# tag all incoming SYN packets through $INDEV as mark value 1
############################################################
$IPCHAINS -A input -i $INDEV -y -m 1
############################################################
#
# install the ingress qdisc on the ingress interface
############################################################
$TC qdisc add dev $INDEV handle ffff: ingress
############################################################
#
#
# SYN packets are 40 bytes (320 bits) so three SYNs equals
# 960 bits (approximately 1kbit); so we rate limit below
# the incoming SYNs to 3/sec (not very sueful really; but
#serves to show the point - JHS
############################################################
$TC filter add dev $INDEV parent ffff: protocol ip prio 50 handle 1 fw \
police rate 1kbit burst 40 mtu 9k drop flowid :1
############################################################
#
echo "---- qdisc parameters Ingress ----------"
$TC qdisc ls dev $INDEV
echo "---- Class parameters Ingress ----------"
$TC class ls dev $INDEV
echo "---- filter parameters Ingress ----------"
$TC filter ls dev $INDEV parent ffff:
#deleting the ingress qdisc
#$TC qdisc del $INDEV ingress

View File

@ -1,8 +1,18 @@
eBPF toy code examples (running in kernel) to familiarize yourself
with syntax and features:
- bpf_shared.c -> Ingress/egress map sharing example
- bpf_tailcall.c -> Using tail call chains
- bpf_cyclic.c -> Simple cycle as tail calls
- BTF defined map examples
- bpf_graft.c -> Demo on altering runtime behaviour
- bpf_map_in_map.c -> Using map in map example
- bpf_shared.c -> Ingress/egress map sharing example
- bpf_map_in_map.c -> Using map in map example
- legacy struct bpf_elf_map defined map examples
- legacy/bpf_shared.c -> Ingress/egress map sharing example
- legacy/bpf_tailcall.c -> Using tail call chains
- legacy/bpf_cyclic.c -> Simple cycle as tail calls
- legacy/bpf_graft.c -> Demo on altering runtime behaviour
- legacy/bpf_map_in_map.c -> Using map in map example
Note: Users should use new BTF way to defined the maps, the examples
in legacy folder which is using struct bpf_elf_map defined maps is not
recommanded.

View File

@ -33,13 +33,13 @@
* [...]
*/
struct bpf_elf_map __section_maps jmp_tc = {
.type = BPF_MAP_TYPE_PROG_ARRAY,
.size_key = sizeof(uint32_t),
.size_value = sizeof(uint32_t),
.pinning = PIN_GLOBAL_NS,
.max_elem = 1,
};
struct {
__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
__uint(key_size, sizeof(uint32_t));
__uint(value_size, sizeof(uint32_t));
__uint(max_entries, 1);
__uint(pinning, LIBBPF_PIN_BY_NAME);
} jmp_tc __section(".maps");
__section("aaa")
int cls_aaa(struct __sk_buff *skb)

View File

@ -1,24 +1,23 @@
#include "../../include/bpf_api.h"
#define MAP_INNER_ID 42
struct inner_map {
__uint(type, BPF_MAP_TYPE_ARRAY);
__uint(key_size, sizeof(uint32_t));
__uint(value_size, sizeof(uint32_t));
__uint(max_entries, 1);
} map_inner __section(".maps");
struct bpf_elf_map __section_maps map_inner = {
.type = BPF_MAP_TYPE_ARRAY,
.size_key = sizeof(uint32_t),
.size_value = sizeof(uint32_t),
.id = MAP_INNER_ID,
.inner_idx = 0,
.pinning = PIN_GLOBAL_NS,
.max_elem = 1,
};
struct bpf_elf_map __section_maps map_outer = {
.type = BPF_MAP_TYPE_ARRAY_OF_MAPS,
.size_key = sizeof(uint32_t),
.size_value = sizeof(uint32_t),
.inner_id = MAP_INNER_ID,
.pinning = PIN_GLOBAL_NS,
.max_elem = 1,
struct {
__uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
__uint(key_size, sizeof(uint32_t));
__uint(value_size, sizeof(uint32_t));
__uint(max_entries, 1);
__uint(pinning, LIBBPF_PIN_BY_NAME);
__array(values, struct inner_map);
} map_outer __section(".maps") = {
.values = {
[0] = &map_inner,
},
};
__section("egress")

View File

@ -18,13 +18,13 @@
* instance is being created.
*/
struct bpf_elf_map __section_maps map_sh = {
.type = BPF_MAP_TYPE_ARRAY,
.size_key = sizeof(uint32_t),
.size_value = sizeof(uint32_t),
.pinning = PIN_OBJECT_NS, /* or PIN_GLOBAL_NS, or PIN_NONE */
.max_elem = 1,
};
struct {
__uint(type, BPF_MAP_TYPE_ARRAY);
__uint(key_size, sizeof(uint32_t));
__uint(value_size, sizeof(uint32_t));
__uint(max_entries, 1);
__uint(pinning, LIBBPF_PIN_BY_NAME); /* or LIBBPF_PIN_NONE */
} map_sh __section(".maps");
__section("egress")
int emain(struct __sk_buff *skb)

View File

@ -1,4 +1,4 @@
#include "../../include/bpf_api.h"
#include "../../../include/bpf_api.h"
/* Cyclic dependency example to test the kernel's runtime upper
* bound on loops. Also demonstrates on how to use direct-actions,

View File

@ -0,0 +1,66 @@
#include "../../../include/bpf_api.h"
/* This example demonstrates how classifier run-time behaviour
* can be altered with tail calls. We start out with an empty
* jmp_tc array, then add section aaa to the array slot 0, and
* later on atomically replace it with section bbb. Note that
* as shown in other examples, the tc loader can prepopulate
* tail called sections, here we start out with an empty one
* on purpose to show it can also be done this way.
*
* tc filter add dev foo parent ffff: bpf obj graft.o
* tc exec bpf dbg
* [...]
* Socket Thread-20229 [001] ..s. 138993.003923: : fallthrough
* <idle>-0 [001] ..s. 138993.202265: : fallthrough
* Socket Thread-20229 [001] ..s. 138994.004149: : fallthrough
* [...]
*
* tc exec bpf graft m:globals/jmp_tc key 0 obj graft.o sec aaa
* tc exec bpf dbg
* [...]
* Socket Thread-19818 [002] ..s. 139012.053587: : aaa
* <idle>-0 [002] ..s. 139012.172359: : aaa
* Socket Thread-19818 [001] ..s. 139012.173556: : aaa
* [...]
*
* tc exec bpf graft m:globals/jmp_tc key 0 obj graft.o sec bbb
* tc exec bpf dbg
* [...]
* Socket Thread-19818 [002] ..s. 139022.102967: : bbb
* <idle>-0 [002] ..s. 139022.155640: : bbb
* Socket Thread-19818 [001] ..s. 139022.156730: : bbb
* [...]
*/
struct bpf_elf_map __section_maps jmp_tc = {
.type = BPF_MAP_TYPE_PROG_ARRAY,
.size_key = sizeof(uint32_t),
.size_value = sizeof(uint32_t),
.pinning = PIN_GLOBAL_NS,
.max_elem = 1,
};
__section("aaa")
int cls_aaa(struct __sk_buff *skb)
{
printt("aaa\n");
return TC_H_MAKE(1, 42);
}
__section("bbb")
int cls_bbb(struct __sk_buff *skb)
{
printt("bbb\n");
return TC_H_MAKE(1, 43);
}
__section_cls_entry
int cls_entry(struct __sk_buff *skb)
{
tail_call(skb, &jmp_tc, 0);
printt("fallthrough\n");
return BPF_H_DEFAULT;
}
BPF_LICENSE("GPL");

View File

@ -0,0 +1,56 @@
#include "../../../include/bpf_api.h"
#define MAP_INNER_ID 42
struct bpf_elf_map __section_maps map_inner = {
.type = BPF_MAP_TYPE_ARRAY,
.size_key = sizeof(uint32_t),
.size_value = sizeof(uint32_t),
.id = MAP_INNER_ID,
.inner_idx = 0,
.pinning = PIN_GLOBAL_NS,
.max_elem = 1,
};
struct bpf_elf_map __section_maps map_outer = {
.type = BPF_MAP_TYPE_ARRAY_OF_MAPS,
.size_key = sizeof(uint32_t),
.size_value = sizeof(uint32_t),
.inner_id = MAP_INNER_ID,
.pinning = PIN_GLOBAL_NS,
.max_elem = 1,
};
__section("egress")
int emain(struct __sk_buff *skb)
{
struct bpf_elf_map *map_inner;
int key = 0, *val;
map_inner = map_lookup_elem(&map_outer, &key);
if (map_inner) {
val = map_lookup_elem(map_inner, &key);
if (val)
lock_xadd(val, 1);
}
return BPF_H_DEFAULT;
}
__section("ingress")
int imain(struct __sk_buff *skb)
{
struct bpf_elf_map *map_inner;
int key = 0, *val;
map_inner = map_lookup_elem(&map_outer, &key);
if (map_inner) {
val = map_lookup_elem(map_inner, &key);
if (val)
printt("map val: %d\n", *val);
}
return BPF_H_DEFAULT;
}
BPF_LICENSE("GPL");

View File

@ -0,0 +1,53 @@
#include "../../../include/bpf_api.h"
/* Minimal, stand-alone toy map pinning example:
*
* clang -target bpf -O2 [...] -o bpf_shared.o -c bpf_shared.c
* tc filter add dev foo parent 1: bpf obj bpf_shared.o sec egress
* tc filter add dev foo parent ffff: bpf obj bpf_shared.o sec ingress
*
* Both classifier will share the very same map instance in this example,
* so map content can be accessed from ingress *and* egress side!
*
* This example has a pinning of PIN_OBJECT_NS, so it's private and
* thus shared among various program sections within the object.
*
* A setting of PIN_GLOBAL_NS would place it into a global namespace,
* so that it can be shared among different object files. A setting
* of PIN_NONE (= 0) means no sharing, so each tc invocation a new map
* instance is being created.
*/
struct bpf_elf_map __section_maps map_sh = {
.type = BPF_MAP_TYPE_ARRAY,
.size_key = sizeof(uint32_t),
.size_value = sizeof(uint32_t),
.pinning = PIN_OBJECT_NS, /* or PIN_GLOBAL_NS, or PIN_NONE */
.max_elem = 1,
};
__section("egress")
int emain(struct __sk_buff *skb)
{
int key = 0, *val;
val = map_lookup_elem(&map_sh, &key);
if (val)
lock_xadd(val, 1);
return BPF_H_DEFAULT;
}
__section("ingress")
int imain(struct __sk_buff *skb)
{
int key = 0, *val;
val = map_lookup_elem(&map_sh, &key);
if (val)
printt("map val: %d\n", *val);
return BPF_H_DEFAULT;
}
BPF_LICENSE("GPL");

View File

@ -1,5 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
#include "../../include/bpf_api.h"
#include "../../../include/bpf_api.h"
#define ENTRY_INIT 3
#define ENTRY_0 0

View File

@ -1,76 +0,0 @@
#! /bin/sh
TC=/home/root/tc
IP=/home/root/ip
DEVICE=eth1
BANDWIDTH="bandwidth 10Mbit"
# Attach CBQ on $DEVICE. It will have handle 1:.
# $BANDWIDTH is real $DEVICE bandwidth (10Mbit).
# avpkt is average packet size.
# mpu is minimal packet size.
$TC qdisc add dev $DEVICE root handle 1: cbq \
$BANDWIDTH avpkt 1000 mpu 64
# Create root class with classid 1:1. This step is not necessary.
# bandwidth is the same as on CBQ itself.
# rate == all the bandwidth
# allot is MTU + MAC header
# maxburst measure allowed class burstiness (please,read S.Floyd and VJ papers)
# est 1sec 8sec means, that kernel will evaluate average rate
# on this class with period 1sec and time constant 8sec.
# This rate is viewed with "tc -s class ls dev $DEVICE"
$TC class add dev $DEVICE parent 1:0 classid :1 est 1sec 8sec cbq \
$BANDWIDTH rate 10Mbit allot 1514 maxburst 50 avpkt 1000
# Bulk.
# New parameters are:
# weight, which is set to be proportional to
# "rate". It is not necessary, weight=1 will work as well.
# defmap and split say that best effort ttraffic, not classfied
# by another means will fall to this class.
$TC class add dev $DEVICE parent 1:1 classid :2 est 1sec 8sec cbq \
$BANDWIDTH rate 4Mbit allot 1514 weight 500Kbit \
prio 6 maxburst 50 avpkt 1000 split 1:0 defmap ff3d
# OPTIONAL.
# Attach "sfq" qdisc to this class, quantum is MTU, perturb
# gives period of hash function perturbation in seconds.
#
$TC qdisc add dev $DEVICE parent 1:2 sfq quantum 1514b perturb 15
# Interactive-burst class
$TC class add dev $DEVICE parent 1:1 classid :3 est 2sec 16sec cbq \
$BANDWIDTH rate 1Mbit allot 1514 weight 100Kbit \
prio 2 maxburst 100 avpkt 1000 split 1:0 defmap c0
$TC qdisc add dev $DEVICE parent 1:3 sfq quantum 1514b perturb 15
# Background.
$TC class add dev $DEVICE parent 1:1 classid :4 est 1sec 8sec cbq \
$BANDWIDTH rate 100Kbit allot 1514 weight 10Mbit \
prio 7 maxburst 10 avpkt 1000 split 1:0 defmap 2
$TC qdisc add dev $DEVICE parent 1:4 sfq quantum 1514b perturb 15
# Realtime class for RSVP
$TC class add dev $DEVICE parent 1:1 classid 1:7FFE cbq \
rate 5Mbit $BANDWIDTH allot 1514b avpkt 1000 \
maxburst 20
# Reclassified realtime traffic
#
# New element: split is not 1:0, but 1:7FFE. It means,
# that only real-time packets, which violated policing filters
# or exceeded reshaping buffers will fall to it.
$TC class add dev $DEVICE parent 1:7FFE classid 1:7FFF est 4sec 32sec cbq \
rate 1Mbit $BANDWIDTH allot 1514b avpkt 1000 weight 10Kbit \
prio 6 maxburst 10 split 1:7FFE defmap ffff

View File

@ -1,68 +0,0 @@
#! /bin/sh -x
#
# sample script on using the ingress capabilities
# This script just tags on the ingress interfac using Ipchains
# the result is used for fast classification and re-marking
# on the egress interface
#
#path to various utilities;
#change to reflect yours.
#
IPROUTE=/root/DS-6-beta/iproute2-990530-dsing
TC=$IPROUTE/tc/tc
IP=$IPROUTE/ip/ip
IPCHAINS=/root/DS-6-beta/ipchains-1.3.9/ipchains
INDEV=eth2
EGDEV="dev eth1"
#
# tag all incoming packets from host 10.2.0.24 to value 1
# tag all incoming packets from host 10.2.0.3 to value 2
# tag the rest of incoming packets from subnet 10.2.0.0/24 to value 3
#These values are used in the egress
#
############################################################
$IPCHAINS -A input -s 10.2.0.4/24 -m 3
$IPCHAINS -A input -i $INDEV -s 10.2.0.24 -m 1
$IPCHAINS -A input -i $INDEV -s 10.2.0.3 -m 2
######################## Egress side ########################
# attach a dsmarker
#
$TC qdisc add $EGDEV handle 1:0 root dsmark indices 64 set_tc_index
#
# values of the DSCP to change depending on the class
#
#becomes EF
$TC class change $EGDEV classid 1:1 dsmark mask 0x3 \
value 0xb8
#becomes AF11
$TC class change $EGDEV classid 1:2 dsmark mask 0x3 \
value 0x28
#becomes AF21
$TC class change $EGDEV classid 1:3 dsmark mask 0x3 \
value 0x48
#
#
# The class mapping
#
$TC filter add $EGDEV parent 1:0 protocol ip prio 4 handle 1 fw classid 1:1
$TC filter add $EGDEV parent 1:0 protocol ip prio 4 handle 2 fw classid 1:2
$TC filter add $EGDEV parent 1:0 protocol ip prio 4 handle 3 fw classid 1:3
#
#
echo "---- qdisc parameters Ingress ----------"
$TC qdisc ls dev $INDEV
echo "---- Class parameters Ingress ----------"
$TC class ls dev $INDEV
echo "---- filter parameters Ingress ----------"
$TC filter ls dev $INDEV parent 1:0
echo "---- qdisc parameters Egress ----------"
$TC qdisc ls $EGDEV
echo "---- Class parameters Egress ----------"
$TC class ls $EGDEV
echo "---- filter parameters Egress ----------"
$TC filter ls $EGDEV parent 1:0

View File

@ -1,87 +0,0 @@
#! /bin/sh -x
#
# sample script on using the ingress capabilities
# This script tags the fwmark on the ingress interface using IPchains
# the result is used first for policing on the Ingress interface then
# for fast classification and re-marking
# on the egress interface
#
#path to various utilities;
#change to reflect yours.
#
IPROUTE=/root/DS-6-beta/iproute2-990530-dsing
TC=$IPROUTE/tc/tc
IP=$IPROUTE/ip/ip
IPCHAINS=/root/DS-6-beta/ipchains-1.3.9/ipchains
INDEV=eth2
EGDEV="dev eth1"
#
# tag all incoming packets from host 10.2.0.24 to value 1
# tag all incoming packets from host 10.2.0.3 to value 2
# tag the rest of incoming packets from subnet 10.2.0.0/24 to value 3
#These values are used in the egress
############################################################
$IPCHAINS -A input -s 10.2.0.0/24 -m 3
$IPCHAINS -A input -i $INDEV -s 10.2.0.24 -m 1
$IPCHAINS -A input -i $INDEV -s 10.2.0.3 -m 2
############################################################
#
# install the ingress qdisc on the ingress interface
############################################################
$TC qdisc add dev $INDEV handle ffff: ingress
############################################################
#
# attach a fw classifier to the ingress which polices anything marked
# by ipchains to tag value 3 (The rest of the subnet packets -- not
# tag 1 or 2) to not go beyond 1.5Mbps
# Allow up to at least 60 packets to burst (assuming maximum packet
# size of # 1.5 KB) in the long run and up to about 6 packets in the
# shot run
############################################################
$TC filter add dev $INDEV parent ffff: protocol ip prio 50 handle 3 fw \
police rate 1500kbit burst 90k mtu 9k drop flowid :1
############################################################
######################## Egress side ########################
# attach a dsmarker
#
$TC qdisc add $EGDEV handle 1:0 root dsmark indices 64
#
# values of the DSCP to change depending on the class
#
$TC class change $EGDEV classid 1:1 dsmark mask 0x3 \
value 0xb8
$TC class change $EGDEV classid 1:2 dsmark mask 0x3 \
value 0x28
$TC class change $EGDEV classid 1:3 dsmark mask 0x3 \
value 0x48
#
#
# The class mapping
#
$TC filter add $EGDEV parent 1:0 protocol ip prio 4 handle 1 fw classid 1:1
$TC filter add $EGDEV parent 1:0 protocol ip prio 4 handle 2 fw classid 1:2
$TC filter add $EGDEV parent 1:0 protocol ip prio 4 handle 3 fw classid 1:3
#
#
echo "---- qdisc parameters Ingress ----------"
$TC qdisc ls dev $INDEV
echo "---- Class parameters Ingress ----------"
$TC class ls dev $INDEV
echo "---- filter parameters Ingress ----------"
$TC filter ls dev $INDEV parent ffff:
echo "---- qdisc parameters Egress ----------"
$TC qdisc ls $EGDEV
echo "---- Class parameters Egress ----------"
$TC class ls $EGDEV
echo "---- filter parameters Egress ----------"
$TC filter ls $EGDEV parent 1:0
#
#deleting the ingress qdisc
#$TC qdisc del $DEV ingress

View File

@ -1,170 +0,0 @@
#! /bin/sh -x
#
# sample script on using the ingress capabilities using u32 classifier
# This script tags tcindex based on metering on the ingress
# interface the result is used for fast classification and re-marking
# on the egress interface
# This is an example of a color aware mode marker with PIR configured
# based on draft-wahjak-mcm-00.txt (section 3.1)
#
# The colors are defined using the Diffserv Fields
#path to various utilities;
#change to reflect yours.
#
IPROUTE=/usr/src/iproute2-current
TC=$IPROUTE/tc/tc
IP=$IPROUTE/ip/ip
INDEV=eth0
EGDEV="dev eth1"
CIR1=1500kbit
CIR2=1000kbit
#The CBS is about 60 MTU sized packets
CBS1=90k
CBS2=90k
############################################################
#
# install the ingress qdisc on the ingress interface
$TC qdisc add dev $INDEV handle ffff: ingress
############################################################
#
# Create u32 filters
$TC filter add dev $INDEV parent ffff: protocol ip prio 4 handle 1: u32 \
divisor 1
############################################################
# The meters: Note that we have shared meters in this case as identified
# by the index parameter
meter1=" police index 1 rate $CIR1 burst $CBS1 "
meter2=" police index 2 rate $CIR2 burst $CBS1 "
meter3=" police index 3 rate $CIR2 burst $CBS2 "
meter4=" police index 4 rate $CIR1 burst $CBS2 "
meter5=" police index 5 rate $CIR1 burst $CBS2 "
# All packets are marked with a tcindex value which is used on the egress
# tcindex 1 maps to AF41, 2->AF42, 3->AF43, 4->BE
# *********************** AF41 ***************************
#AF41 (DSCP 0x22) is passed on with a tcindex value 1
#if it doesn't exceed its CIR/CBS
#policer 1 is used.
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 4 u32 \
match ip tos 0x88 0xfc \
$meter1 \
continue flowid :1
#
# if it exceeds the above but not the extra rate/burst below, it gets a
# tcindex value of 2
# policer 2 is used
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 5 u32 \
match ip tos 0x88 0xfc \
$meter2 \
continue flowid :2
#
# if it exceeds the above but not the rule below, it gets a tcindex value
# of 3 (policer 3)
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 6 u32 \
match ip tos 0x88 0xfc \
$meter3 \
drop flowid :3
#
# *********************** AF42 ***************************
#AF42 (DSCP 0x24) from is passed on with a tcindex value 2
#if it doesn't exceed its CIR/CBS
#policer 2 is used. Note that this is shared with the AF41
#
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 5 u32 \
match ip tos 0x90 0xfc \
$meter2 \
continue flowid :2
#
# if it exceeds the above but not the rule below, it gets a tcindex value
# of 3 (policer 3)
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 6 u32 \
match ip tos 0x90 0xfc \
$meter3 \
drop flowid :3
#
# *********************** AF43 ***************************
#
#AF43 (DSCP 0x26) from is passed on with a tcindex value 3
#if it doesn't exceed its CIR/CBS
#policer 3 is used. Note that this is shared with the AF41 and AF42
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 6 u32 \
match ip tos 0x98 0xfc \
$meter3 \
drop flowid :3
#
# *********************** BE ***************************
#
# Anything else (not from the AF4*) gets discarded if it
# exceeds 1Mbps and by default goes to BE if it doesn't
# Note that the BE class is also used by the AF4* in the worst
# case
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 7 u32 \
match ip src 0/0\
$meter4 \
drop flowid :4
######################## Egress side ########################
# attach a dsmarker
#
$TC qdisc add $EGDEV handle 1:0 root dsmark indices 64
#
# values of the DSCP to change depending on the class
#note that the ECN bits are masked out
#
#AF41 (0x88 is 0x22 shifted to the right by two bits)
#
$TC class change $EGDEV classid 1:1 dsmark mask 0x3 \
value 0x88
#AF42
$TC class change $EGDEV classid 1:2 dsmark mask 0x3 \
value 0x90
#AF43
$TC class change $EGDEV classid 1:3 dsmark mask 0x3 \
value 0x98
#BE
$TC class change $EGDEV classid 1:3 dsmark mask 0x3 \
value 0x0
#
#
# The class mapping
#
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 1 tcindex classid 1:1
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 2 tcindex classid 1:2
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 3 tcindex classid 1:3
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 4 tcindex classid 1:4
#
#
echo "---- qdisc parameters Ingress ----------"
$TC qdisc ls dev $INDEV
echo "---- Class parameters Ingress ----------"
$TC class ls dev $INDEV
echo "---- filter parameters Ingress ----------"
$TC filter ls dev $INDEV parent ffff:
echo "---- qdisc parameters Egress ----------"
$TC qdisc ls $EGDEV
echo "---- Class parameters Egress ----------"
$TC class ls $EGDEV
echo "---- filter parameters Egress ----------"
$TC filter ls $EGDEV parent 1:0
#
#deleting the ingress qdisc
#$TC qdisc del $INDEV ingress

View File

@ -1,132 +0,0 @@
#! /bin/sh -x
#
# sample script on using the ingress capabilities
# This script fwmark tags(IPchains) based on metering on the ingress
# interface the result is used for fast classification and re-marking
# on the egress interface
# This is an example of a color blind mode marker with no PIR configured
# based on draft-wahjak-mcm-00.txt (section 3.1)
#
#path to various utilities;
#change to reflect yours.
#
IPROUTE=/root/DS-6-beta/iproute2-990530-dsing
TC=$IPROUTE/tc/tc
IP=$IPROUTE/ip/ip
IPCHAINS=/root/DS-6-beta/ipchains-1.3.9/ipchains
INDEV=eth2
EGDEV="dev eth1"
CIR1=1500kbit
CIR2=1000kbit
#The CBS is about 60 MTU sized packets
CBS1=90k
CBS2=90k
meter1="police rate $CIR1 burst $CBS1 "
meter2="police rate $CIR1 burst $CBS2 "
meter3="police rate $CIR2 burst $CBS1 "
meter4="police rate $CIR2 burst $CBS2 "
meter5="police rate $CIR2 burst $CBS2 "
#
# tag the rest of incoming packets from subnet 10.2.0.0/24 to fw value 1
# tag all incoming packets from any other subnet to fw tag 2
############################################################
$IPCHAINS -A input -i $INDEV -s 0/0 -m 2
$IPCHAINS -A input -i $INDEV -s 10.2.0.0/24 -m 1
#
############################################################
# install the ingress qdisc on the ingress interface
$TC qdisc add dev $INDEV handle ffff: ingress
#
############################################################
# All packets are marked with a tcindex value which is used on the egress
# tcindex 1 maps to AF41, 2->AF42, 3->AF43, 4->BE
#
############################################################
#
# anything with fw tag of 1 is passed on with a tcindex value 1
#if it doesn't exceed its allocated rate (CIR/CBS)
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 4 handle 1 fw \
$meter1 \
continue flowid 4:1
#
# if it exceeds the above but not the extra rate/burst below, it gets a
#tcindex value of 2
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 5 handle 1 fw \
$meter2 \
continue flowid 4:2
#
# if it exceeds the above but not the rule below, it gets a tcindex value
# of 3
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 6 handle 1 fw \
$meter3 \
drop flowid 4:3
#
# Anything else (not from the subnet 10.2.0.24/24) gets discarded if it
# exceeds 1Mbps and by default goes to BE if it doesn't
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 6 handle 2 fw \
$meter5 \
drop flowid 4:4
######################## Egress side ########################
# attach a dsmarker
#
$TC qdisc add $EGDEV handle 1:0 root dsmark indices 64
#
# values of the DSCP to change depending on the class
#note that the ECN bits are masked out
#
#AF41 (0x88 is 0x22 shifted to the right by two bits)
#
$TC class change $EGDEV classid 1:1 dsmark mask 0x3 \
value 0x88
#AF42
$TC class change $EGDEV classid 1:2 dsmark mask 0x3 \
value 0x90
#AF43
$TC class change $EGDEV classid 1:3 dsmark mask 0x3 \
value 0x98
#BE
$TC class change $EGDEV classid 1:4 dsmark mask 0x3 \
value 0x0
#
#
# The class mapping (using tcindex; could easily have
# replaced it with the fw classifier instead)
#
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 1 tcindex classid 1:1
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 2 tcindex classid 1:2
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 3 tcindex classid 1:3
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 4 tcindex classid 1:4
#
#
echo "---- qdisc parameters Ingress ----------"
$TC qdisc ls dev $INDEV
echo "---- Class parameters Ingress ----------"
$TC class ls dev $INDEV
echo "---- filter parameters Ingress ----------"
$TC filter ls dev $INDEV parent ffff:
echo "---- qdisc parameters Egress ----------"
$TC qdisc ls $EGDEV
echo "---- Class parameters Egress ----------"
$TC class ls $EGDEV
echo "---- filter parameters Egress ----------"
$TC filter ls $EGDEV parent 1:0
#
#deleting the ingress qdisc
#$TC qdisc del $INDEV ingress

View File

@ -1,198 +0,0 @@
#! /bin/sh -x
#
# sample script on using the ingress capabilities using u32 classifier
# This script tags tcindex based on metering on the ingress
# interface the result is used for fast classification and re-marking
# on the egress interface
# This is an example of a color aware mode marker with PIR configured
# based on draft-wahjak-mcm-00.txt (section 3.2)
#
# The colors are defined using the Diffserv Fields
#path to various utilities;
#change to reflect yours.
#
IPROUTE=/root/DS-6-beta/iproute2-990530-dsing
TC=$IPROUTE/tc/tc
IP=$IPROUTE/ip/ip
IPCHAINS=/root/DS-6-beta/ipchains-1.3.9/ipchains
INDEV=eth2
EGDEV="dev eth1"
CIR1=1000kbit
CIR2=500kbit
# the PIR is what is in excess of the CIR
PIR1=1000kbit
PIR2=500kbit
#The CBS is about 60 MTU sized packets
CBS1=90k
CBS2=90k
#the EBS is about 20 max sized packets
EBS1=30k
EBS2=30k
# The meters: Note that we have shared meters in this case as identified
# by the index parameter
meter1=" police index 1 rate $CIR1 burst $CBS1 "
meter1a=" police index 2 rate $PIR1 burst $EBS1 "
meter2=" police index 3 rate $CIR2 burst $CBS1 "
meter2a=" police index 4 rate $PIR2 burst $EBS1 "
meter3=" police index 5 rate $CIR2 burst $CBS2 "
meter3a=" police index 6 rate $PIR2 burst $EBS2 "
meter4=" police index 7 rate $CIR1 burst $CBS2 "
############################################################
#
# install the ingress qdisc on the ingress interface
$TC qdisc add dev $INDEV handle ffff: ingress
############################################################
#
# All packets are marked with a tcindex value which is used on the egress
# tcindex 1 maps to AF41, 2->AF42, 3->AF43, 4->BE
#
# *********************** AF41 ***************************
#AF41 (DSCP 0x22) from is passed on with a tcindex value 1
#if it doesn't exceed its CIR/CBS + PIR/EBS
#policer 1 is used.
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 1 u32 \
match ip tos 0x88 0xfc \
$meter1 \
continue flowid :1
$TC filter add dev $INDEV parent ffff: protocol ip prio 2 u32 \
match ip tos 0x88 0xfc \
$meter1a \
continue flowid :1
#
# if it exceeds the above but not the extra rate/burst below, it gets a
# tcindex value of 2
# policer 2 is used
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 3 u32 \
match ip tos 0x88 0xfc \
$meter2 \
continue flowid :2
$TC filter add dev $INDEV parent ffff: protocol ip prio 4 u32 \
match ip tos 0x88 0xfc \
$meter2a \
continue flowid :2
#
# if it exceeds the above but not the rule below, it gets a tcindex value
# of 3 (policer 3)
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 5 u32 \
match ip tos 0x88 0xfc \
$meter3 \
continue flowid :3
$TC filter add dev $INDEV parent ffff: protocol ip prio 6 u32 \
match ip tos 0x88 0xfc \
$meter3a \
drop flowid :3
#
# *********************** AF42 ***************************
#AF42 (DSCP 0x24) from is passed on with a tcindex value 2
#if it doesn't exceed its CIR/CBS + PIR/EBS
#policer 2 is used. Note that this is shared with the AF41
#
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 8 u32 \
match ip tos 0x90 0xfc \
$meter2 \
continue flowid :2
$TC filter add dev $INDEV parent ffff: protocol ip prio 9 u32 \
match ip tos 0x90 0xfc \
$meter2a \
continue flowid :2
#
# if it exceeds the above but not the rule below, it gets a tcindex value
# of 3 (policer 3)
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 10 u32 \
match ip tos 0x90 0xfc \
$meter3 \
continue flowid :3
$TC filter add dev $INDEV parent ffff: protocol ip prio 11 u32 \
match ip tos 0x90 0xfc \
$meter3a \
drop flowid :3
#
# *********************** AF43 ***************************
#
#AF43 (DSCP 0x26) from is passed on with a tcindex value 3
#if it doesn't exceed its CIR/CBS + PIR/EBS
#policer 3 is used. Note that this is shared with the AF41 and AF42
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 13 u32 \
match ip tos 0x98 0xfc \
$meter3 \
continue flowid :3
$TC filter add dev $INDEV parent ffff: protocol ip prio 14 u32 \
match ip tos 0x98 0xfc \
$meter3a \
drop flowid :3
#
## *********************** BE ***************************
##
## Anything else (not from the AF4*) gets discarded if it
## exceeds 1Mbps and by default goes to BE if it doesn't
## Note that the BE class is also used by the AF4* in the worst
## case
##
$TC filter add dev $INDEV parent ffff: protocol ip prio 16 u32 \
match ip src 0/0\
$meter4 \
drop flowid :4
######################## Egress side ########################
# attach a dsmarker
#
$TC qdisc add $EGDEV handle 1:0 root dsmark indices 64
#
# values of the DSCP to change depending on the class
#note that the ECN bits are masked out
#
#AF41 (0x88 is 0x22 shifted to the right by two bits)
#
$TC class change $EGDEV classid 1:1 dsmark mask 0x3 \
value 0x88
#AF42
$TC class change $EGDEV classid 1:2 dsmark mask 0x3 \
value 0x90
#AF43
$TC class change $EGDEV classid 1:3 dsmark mask 0x3 \
value 0x98
#BE
$TC class change $EGDEV classid 1:3 dsmark mask 0x3 \
value 0x0
#
#
# The class mapping
#
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 1 tcindex classid 1:1
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 2 tcindex classid 1:2
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 3 tcindex classid 1:3
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 4 tcindex classid 1:4
#
#
echo "---- qdisc parameters Ingress ----------"
$TC qdisc ls dev $INDEV
echo "---- Class parameters Ingress ----------"
$TC class ls dev $INDEV
echo "---- filter parameters Ingress ----------"
$TC filter ls dev $INDEV parent ffff:
echo "---- qdisc parameters Egress ----------"
$TC qdisc ls $EGDEV
echo "---- Class parameters Egress ----------"
$TC class ls $EGDEV
echo "---- filter parameters Egress ----------"
$TC filter ls $EGDEV parent 1:0
#
#deleting the ingress qdisc
#$TC qdisc del $INDEV ingress

View File

@ -1,144 +0,0 @@
#! /bin/sh -x
#
# sample script on using the ingress capabilities
# This script fwmark tags(IPchains) based on metering on the ingress
# interface the result is used for fast classification and re-marking
# on the egress interface
# This is an example of a color blind mode marker with no PIR configured
# based on draft-wahjak-mcm-00.txt (section 3.1)
#
#path to various utilities;
#change to reflect yours.
#
IPROUTE=/root/DS-6-beta/iproute2-990530-dsing
TC=$IPROUTE/tc/tc
IP=$IPROUTE/ip/ip
IPCHAINS=/root/DS-6-beta/ipchains-1.3.9/ipchains
INDEV=eth2
EGDEV="dev eth1"
CIR1=1500kbit
CIR2=500kbit
#The CBS is about 60 MTU sized packets
CBS1=90k
CBS2=90k
meter1="police rate $CIR1 burst $CBS1 "
meter1a="police rate $CIR2 burst $CBS1 "
meter2="police rate $CIR1 burst $CBS2 "
meter2a="police rate $CIR2 burst $CBS2 "
meter3="police rate $CIR2 burst $CBS1 "
meter3a="police rate $CIR2 burst $CBS1 "
meter4="police rate $CIR2 burst $CBS2 "
meter5="police rate $CIR1 burst $CBS2 "
#
# tag the rest of incoming packets from subnet 10.2.0.0/24 to fw value 1
# tag all incoming packets from any other subnet to fw tag 2
############################################################
$IPCHAINS -A input -i $INDEV -s 0/0 -m 2
$IPCHAINS -A input -i $INDEV -s 10.2.0.0/24 -m 1
#
############################################################
# install the ingress qdisc on the ingress interface
$TC qdisc add dev $INDEV handle ffff: ingress
#
############################################################
# All packets are marked with a tcindex value which is used on the egress
# tcindex 1 maps to AF41, 2->AF42, 3->AF43, 4->BE
#
############################################################
#
# anything with fw tag of 1 is passed on with a tcindex value 1
#if it doesn't exceed its allocated rate (CIR/CBS)
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 1 handle 1 fw \
$meter1 \
continue flowid 4:1
$TC filter add dev $INDEV parent ffff: protocol ip prio 2 handle 1 fw \
$meter1a \
continue flowid 4:1
#
# if it exceeds the above but not the extra rate/burst below, it gets a
#tcindex value of 2
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 3 handle 1 fw \
$meter2 \
continue flowid 4:2
$TC filter add dev $INDEV parent ffff: protocol ip prio 4 handle 1 fw \
$meter2a \
continue flowid 4:2
#
# if it exceeds the above but not the rule below, it gets a tcindex value
# of 3
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 5 handle 1 fw \
$meter3 \
continue flowid 4:3
$TC filter add dev $INDEV parent ffff: protocol ip prio 6 handle 1 fw \
$meter3a \
drop flowid 4:3
#
# Anything else (not from the subnet 10.2.0.24/24) gets discarded if it
# exceeds 1Mbps and by default goes to BE if it doesn't
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 7 handle 2 fw \
$meter5 \
drop flowid 4:4
######################## Egress side ########################
# attach a dsmarker
#
$TC qdisc add $EGDEV handle 1:0 root dsmark indices 64
#
# values of the DSCP to change depending on the class
#note that the ECN bits are masked out
#
#AF41 (0x88 is 0x22 shifted to the right by two bits)
#
$TC class change $EGDEV classid 1:1 dsmark mask 0x3 \
value 0x88
#AF42
$TC class change $EGDEV classid 1:2 dsmark mask 0x3 \
value 0x90
#AF43
$TC class change $EGDEV classid 1:3 dsmark mask 0x3 \
value 0x98
#BE
$TC class change $EGDEV classid 1:4 dsmark mask 0x3 \
value 0x0
#
#
# The class mapping (using tcindex; could easily have
# replaced it with the fw classifier instead)
#
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 1 tcindex classid 1:1
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 2 tcindex classid 1:2
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 3 tcindex classid 1:3
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 4 tcindex classid 1:4
#
#
echo "---- qdisc parameters Ingress ----------"
$TC qdisc ls dev $INDEV
echo "---- Class parameters Ingress ----------"
$TC class ls dev $INDEV
echo "---- filter parameters Ingress ----------"
$TC filter ls dev $INDEV parent ffff:
echo "---- qdisc parameters Egress ----------"
$TC qdisc ls $EGDEV
echo "---- Class parameters Egress ----------"
$TC class ls $EGDEV
echo "---- filter parameters Egress ----------"
$TC filter ls $EGDEV parent 1:0
#
#deleting the ingress qdisc
#$TC qdisc del $INDEV ingress

View File

@ -1,145 +0,0 @@
#! /bin/sh
#
# sample script on using the ingress capabilities using u32 classifier
# This script tags tcindex based on metering on the ingress
# interface the result is used for fast classification and re-marking
# on the egress interface
# This is an example of a color blind mode marker with PIR configured
# based on draft-wahjak-mcm-00.txt (section 3.2)
#
#path to various utilities;
#change to reflect yours.
#
IPROUTE=/root/DS-6-beta/iproute2-990530-dsing
TC=$IPROUTE/tc/tc
IP=$IPROUTE/ip/ip
INDEV=eth2
EGDEV="dev eth1"
CIR1=1000kbit
CIR2=1000kbit
# The PIR is the excess (in addition to the CIR i.e if always
# going to the PIR --> average rate is CIR+PIR)
PIR1=1000kbit
PIR2=500kbit
#The CBS is about 60 MTU sized packets
CBS1=90k
CBS2=90k
#the EBS is about 10 max sized packets
EBS1=15k
EBS2=15k
# The meters
meter1=" police rate $CIR1 burst $CBS1 "
meter1a=" police rate $PIR1 burst $EBS1 "
meter2=" police rate $CIR2 burst $CBS1 "
meter2a="police rate $PIR2 burst $CBS1 "
meter3=" police rate $CIR2 burst $CBS2 "
meter3a=" police rate $PIR2 burst $EBS2 "
meter4=" police rate $CIR1 burst $CBS2 "
meter5=" police rate $CIR1 burst $CBS2 "
# install the ingress qdisc on the ingress interface
############################################################
$TC qdisc add dev $INDEV handle ffff: ingress
############################################################
#
############################################################
# All packets are marked with a tcindex value which is used on the egress
# NOTE: tcindex 1 maps to AF41, 2->AF42, 3->AF43, 4->BE
#
#anything from subnet 10.2.0.2/24 is passed on with a tcindex value 1
#if it doesn't exceed its CIR/CBS + PIR/EBS
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 1 u32 \
match ip src 10.2.0.0/24 $meter1 \
continue flowid :1
$TC filter add dev $INDEV parent ffff: protocol ip prio 2 u32 \
match ip src 10.2.0.0/24 $meter1a \
continue flowid :1
#
# if it exceeds the above but not the extra rate/burst below, it gets a
#tcindex value of 2
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 3 u32 \
match ip src 10.2.0.0/24 $meter2 \
continue flowid :2
$TC filter add dev $INDEV parent ffff: protocol ip prio 4 u32 \
match ip src 10.2.0.0/24 $meter2a \
continue flowid :2
#
# if it exceeds the above but not the rule below, it gets a tcindex value
# of 3
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 5 u32 \
match ip src 10.2.0.0/24 $meter3 \
continue flowid :3
$TC filter add dev $INDEV parent ffff: protocol ip prio 6 u32 \
match ip src 10.2.0.0/24 $meter3a \
drop flowid :3
#
#
# Anything else (not from the subnet 10.2.0.24/24) gets discarded if it
# exceeds 1Mbps and by default goes to BE if it doesn't
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 7 u32 \
match ip src 0/0 $meter5 \
drop flowid :4
######################## Egress side ########################
# attach a dsmarker
#
$TC qdisc add $EGDEV handle 1:0 root dsmark indices 64
#
# values of the DSCP to change depending on the class
#note that the ECN bits are masked out
#
#AF41 (0x88 is 0x22 shifted to the right by two bits)
#
$TC class change $EGDEV classid 1:1 dsmark mask 0x3 \
value 0x88
#AF42
$TC class change $EGDEV classid 1:2 dsmark mask 0x3 \
value 0x90
#AF43
$TC class change $EGDEV classid 1:3 dsmark mask 0x3 \
value 0x98
#BE
$TC class change $EGDEV classid 1:3 dsmark mask 0x3 \
value 0x0
#
#
# The class mapping
#
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 1 tcindex classid 1:1
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 2 tcindex classid 1:2
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 3 tcindex classid 1:3
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 4 tcindex classid 1:4
#
#
echo "---- qdisc parameters Ingress ----------"
$TC qdisc ls dev $INDEV
echo "---- Class parameters Ingress ----------"
$TC class ls dev $INDEV
echo "---- filter parameters Ingress ----------"
$TC filter ls dev $INDEV parent ffff:
echo "---- qdisc parameters Egress ----------"
$TC qdisc ls $EGDEV
echo "---- Class parameters Egress ----------"
$TC class ls $EGDEV
echo "---- filter parameters Egress ----------"
$TC filter ls $EGDEV parent 1:0
#
#deleting the ingress qdisc
#$TC qdisc del $INDEV ingress

View File

@ -1,98 +0,0 @@
Note all these are mere examples which can be customized to your needs
AFCBQ
-----
AF PHB built using CBQ, DSMARK,GRED (default in GRIO mode) ,RED for BE
and the tcindex classifier with some algorithmic mapping
EFCBQ
-----
EF PHB built using CBQ (for rate control and prioritization),
DSMARK( to remark DSCPs), tcindex classifier and RED for the BE
traffic.
EFPRIO
------
EF PHB using the PRIO scheduler, Token Bucket to rate control EF,
tcindex classifier, DSMARK to remark, and RED for the BE traffic
EDGE scripts
==============
CB-3(1|2)-(u32/chains)
======================
The major differences are that the classifier is u32 on -u32 extension
and IPchains on the chains extension. CB stands for color Blind
and 31 is for the mode where only a CIR and CBS are defined whereas
32 stands for a mode where a CIR/CBS + PIR/EBS are defined.
Color Blind (CB)
==========-----=
We look at one special subnet that we are interested in for simplicty
reasons to demonstrate the capability. We send the packets from that
subnet to AF4*, BE or end up dropping depending on the metering results.
The algorithm overview is as follows:
*classify:
**case: subnet X
----------------
if !exceed meter1 tag as AF41
else
if !exceed meter2 tag as AF42
else
if !exceed meter 3 tag as AF43
else
drop
default case: Any other subnet
-------------------------------
if !exceed meter 5 tag as AF43
else
drop
One Egress side change the DSCPs of the packets to reflect AF4* and BE
based on the tags from the ingress.
-------------------------------------------------------------
Color Aware
===========
Define some meters with + policing and give them IDs eg
meter1=police index 1 rate $CIR1 burst $CBS1
meter2=police index 2 rate $CIR2 burst $CBS2 etc
General overview:
classify based on the DSCPs and use the policer ids to decide tagging
*classify on ingress:
switch (dscp) {
case AF41: /* tos&0xfc == 0x88 */
if (!exceed meter1) break;
case AF42: /* tos&0xfc == 0x90 */
if (!exceed meter2) {
tag as AF42;
break;
}
case AF43: /* tos&0xfc == 0x98 */
if (!exceed meter3) {
tag as AF43;
break;
} else
drop;
default:
if (!exceed meter4) tag as BE;
else drop;
}
On the Egress side mark the proper AF tags

View File

@ -1,105 +0,0 @@
#!/usr/bin/perl
#
#
# AF using CBQ for a single interface eth0
# 4 AF classes using GRED and one BE using RED
# Things you might want to change:
# - the device bandwidth (set at 10Mbits)
# - the bandwidth allocated for each AF class and the BE class
# - the drop probability associated with each AF virtual queue
#
# AF DSCP values used (based on AF draft 04)
# -----------------------------------------
# AF DSCP values
# AF1 1. 0x0a 2. 0x0c 3. 0x0e
# AF2 1. 0x12 2. 0x14 3. 0x16
# AF3 1. 0x1a 2. 0x1c 3. 0x1e
# AF4 1. 0x22 2. 0x24 3. 0x26
#
#
# A simple DSCP-class relationship formula used to generate
# values in the for loop of this script; $drop stands for the
# DP
# $dscp = ($class*8+$drop*2)
#
# if you use GRIO buffer sharing, then GRED priority is set as follows:
# $gprio=$drop+1;
#
$TC = "/usr/src/iproute2-current/tc/tc";
$DEV = "dev lo";
$DEV = "dev eth1";
$DEV = "dev eth0";
# the BE-class number
$beclass = "5";
#GRIO buffer sharing on or off?
$GRIO = "";
$GRIO = "grio";
# The bandwidth of your device
$linerate="10Mbit";
# The BE and AF rates
%rate_table=();
$berate="1500Kbit";
$rate_table{"AF1rate"}="1500Kbit";
$rate_table{"AF2rate"}="1500Kbit";
$rate_table{"AF3rate"}="1500Kbit";
$rate_table{"AF4rate"}="1500Kbit";
#
#
#
print "\n# --- General setup ---\n";
print "$TC qdisc add $DEV handle 1:0 root dsmark indices 64 set_tc_index\n";
print "$TC filter add $DEV parent 1:0 protocol ip prio 1 tcindex mask 0xfc " .
"shift 2 pass_on\n";
#"shift 2\n";
print "$TC qdisc add $DEV parent 1:0 handle 2:0 cbq bandwidth $linerate ".
"cell 8 avpkt 1000 mpu 64\n";
print "$TC filter add $DEV parent 2:0 protocol ip prio 1 tcindex ".
"mask 0xf0 shift 4 pass_on\n";
for $class (1..4) {
print "\n# --- AF Class $class specific setup---\n";
$AFrate=sprintf("AF%drate",$class);
print "$TC class add $DEV parent 2:0 classid 2:$class cbq ".
"bandwidth $linerate rate $rate_table{$AFrate} avpkt 1000 prio ".
(6-$class)." bounded allot 1514 weight 1 maxburst 21\n";
print "$TC filter add $DEV parent 2:0 protocol ip prio 1 handle $class ".
"tcindex classid 2:$class\n";
print "$TC qdisc add $DEV parent 2:$class gred setup DPs 3 default 2 ".
"$GRIO\n";
#
# per DP setup
#
for $drop (1..3) {
print "\n# --- AF Class $class DP $drop---\n";
$dscp = $class*8+$drop*2;
$tcindex = sprintf("1%x%x",$class,$drop);
print "$TC filter add $DEV parent 1:0 protocol ip prio 1 ".
"handle $dscp tcindex classid 1:$tcindex\n";
$prob = $drop*0.02;
if ($GRIO) {
$gprio = $drop+1;
print "$TC qdisc change $DEV parent 2:$class gred limit 60KB min 15KB ".
"max 45KB burst 20 avpkt 1000 bandwidth $linerate DP $drop ".
"probability $prob ".
"prio $gprio\n";
} else {
print "$TC qdisc change $DEV parent 2:$class gred limit 60KB min 15KB ".
"max 45KB burst 20 avpkt 1000 bandwidth $linerate DP $drop ".
"probability $prob \n";
}
}
}
#
#
print "\n#------BE Queue setup------\n";
print "$TC filter add $DEV parent 1:0 protocol ip prio 2 ".
"handle 0 tcindex mask 0 classid 1:1\n";
print "$TC class add $DEV parent 2:0 classid 2:$beclass cbq ".
"bandwidth $linerate rate $berate avpkt 1000 prio 6 " .
"bounded allot 1514 weight 1 maxburst 21 \n";
print "$TC filter add $DEV parent 2:0 protocol ip prio 1 handle 0 tcindex ".
"classid 2:5\n";
print "$TC qdisc add $DEV parent 2:5 red limit 60KB min 15KB max 45KB ".
"burst 20 avpkt 1000 bandwidth $linerate probability 0.4\n";

View File

@ -1,25 +0,0 @@
#!/usr/bin/perl
$TC = "/root/DS-6-beta/iproute2-990530-dsing/tc/tc";
$DEV = "dev eth1";
$efrate="1.5Mbit";
$MTU="1.5kB";
print "$TC qdisc add $DEV handle 1:0 root dsmark indices 64 set_tc_index\n";
print "$TC filter add $DEV parent 1:0 protocol ip prio 1 tcindex ".
"mask 0xfc shift 2\n";
print "$TC qdisc add $DEV parent 1:0 handle 2:0 prio\n";
#
# EF class: Maximum about one MTU sized packet allowed on the queue
#
print "$TC qdisc add $DEV parent 2:1 tbf rate $efrate burst $MTU limit 1.6kB\n";
print "$TC filter add $DEV parent 2:0 protocol ip prio 1 ".
"handle 0x2e tcindex classid 2:1 pass_on\n";
#
# BE class
#
print "#BE class(2:2) \n";
print "$TC qdisc add $DEV parent 2:2 red limit 60KB ".
"min 15KB max 45KB burst 20 avpkt 1000 bandwidth 10Mbit ".
"probability 0.4\n";
#
print "$TC filter add $DEV parent 2:0 protocol ip prio 2 ".
"handle 0 tcindex mask 0 classid 2:2 pass_on\n";

View File

@ -1,31 +0,0 @@
#!/usr/bin/perl
#
$TC = "/root/DS-6-beta/iproute2-990530-dsing/tc/tc";
$DEV = "dev eth1";
print "$TC qdisc add $DEV handle 1:0 root dsmark indices 64 set_tc_index\n";
print "$TC filter add $DEV parent 1:0 protocol ip prio 1 tcindex ".
"mask 0xfc shift 2\n";
print "$TC qdisc add $DEV parent 1:0 handle 2:0 cbq bandwidth ".
"10Mbit cell 8 avpkt 1000 mpu 64\n";
#
# EF class
#
print "$TC class add $DEV parent 2:0 classid 2:1 cbq bandwidth ".
"10Mbit rate 1500Kbit avpkt 1000 prio 1 bounded isolated ".
"allot 1514 weight 1 maxburst 10 \n";
# packet fifo for EF?
print "$TC qdisc add $DEV parent 2:1 pfifo limit 5\n";
print "$TC filter add $DEV parent 2:0 protocol ip prio 1 ".
"handle 0x2e tcindex classid 2:1 pass_on\n";
#
# BE class
#
print "#BE class(2:2) \n";
print "$TC class add $DEV parent 2:0 classid 2:2 cbq bandwidth ".
"10Mbit rate 5Mbit avpkt 1000 prio 7 allot 1514 weight 1 ".
"maxburst 21 borrow split 2:0 defmap 0xffff \n";
print "$TC qdisc add $DEV parent 2:2 red limit 60KB ".
"min 15KB max 45KB burst 20 avpkt 1000 bandwidth 10Mbit ".
"probability 0.4\n";
print "$TC filter add $DEV parent 2:0 protocol ip prio 2 ".
"handle 0 tcindex mask 0 classid 2:2 pass_on\n";

View File

@ -1,125 +0,0 @@
These were the tests done to validate the Diffserv scripts.
This document will be updated continuously. If you do more
thorough validation testing please post the details to the
diffserv mailing list.
Nevertheless, these tests should serve for basic validation.
AFCBQ, EFCBQ, EFPRIO
----------------------
generate all possible DSCPs and observe that they
get sent to the proper classes. In the case of AF also
to the correct Virtual Queues.
Edge1
-----
generate TOS values 0x0,0x10,0xbb each with IP addresses
10.2.0.24 (mark 1), 10.2.0.3 (mark2) and 10.2.0.30 (mark 3)
and observe that they get marked as expected.
Edge2
-----
-Repeat the tests in Edge1
-ftp with data direction from 10.2.0.2
*observe that the metering/policing works correctly (and the marking
as well). In this case the mark used will be 3
Edge31-cb-chains
----------------
-ftp with data direction from 10.2.0.2
*observe that the metering/policing works correctly (and the marking
as well). In this case the mark used will be 1.
Metering: The data throughput should not exceed 2*CIR1 + 2*CIR2
which is roughly: 5mbps
Marking: the should be a variation of marked packets:
AF41(TOS=0x88) AF42(0x90) AF43(0x98) and BE (0x0)
More tests required to see the interaction of several sources (other
than subnet 10.2.0.0/24).
Edge31-ca-u32
--------------
Generate data using modified tcpblast from 10.2.0.2 (behind eth2) to the
discard port of 10.1.0.2 (behind eth1)
1) generate with src tos = 0x88
Metering: Allocated throughput should not exceed 2*CIR1 + 2*CIR2
approximately 5mbps
Marking: Should vary between 0x88,0x90,0x98 and 0x0
2) generate with src tos = 0x90
Metering: Allocated throughput should not exceed CIR1 + 2*CIR2
approximately 3.5mbps
Marking: Should vary between 0x90,0x98 and 0x0
3) generate with src tos = 0x98
Metering: Allocated throughput should not exceed CIR1 + CIR2
approximately 2.5mbps
Marking: Should vary between 0x98 and 0x0
4) generate with src tos any other than the above
Metering: Allocated throughput should not exceed CIR1
approximately 1.5mbps
Marking: Should be consistent at 0x0
TODO: Testing on how each color shares when all 4 types of packets
are going through the edge device
Edge32-cb-u32, Edge32-cb-chains
-------------------------------
-ftp with data direction from 10.2.0.2
*observe that the metering/policing works correctly (and the marking
as well).
Metering:
The data throughput should not exceed 2*CIR1 + 2*CIR2
+ 2*PIR2 + PIR1 for u32 which is roughly: 6mbps
The data throughput should not exceed 2*CIR1 + 5*CIR2
for chains which is roughly: 6mbps
Marking: the should be a variation of marked packets:
AF41(TOS=0x88) AF42(0x90) AF43(0x98) and BE (0x0)
TODO:
-More tests required to see the interaction of several sources (other
than subnet 10.2.0.0/24).
-More tests needed to capture stats on how many times the CIR was exceeded
but the data was not remarked etc.
Edge32-ca-u32
--------------
Generate data using modified tcpblast from 10.2.0.2 (behind eth2) to the
discard port of 10.1.0.2 (behind eth1)
1) generate with src tos = 0x88
Metering: Allocated throughput should not exceed 2*CIR1 + 2*CIR2
+PIR1 -- approximately 4mbps
Marking: Should vary between 0x88,0x90,0x98 and 0x0
2) generate with src tos = 0x90
Metering: Allocated throughput should not exceed CIR1 + 2*CIR2
+ 2* PIR2 approximately 3mbps
Marking: Should vary between 0x90,0x98 and 0x0
3) generate with src tos = 0x98
Metering: Allocated throughput should not exceed PIR1+ CIR1 + CIR2
approximately 2.5mbps
Marking: Should vary between 0x98 and 0x0
4) generate with src tos any other than the above
Metering: Allocated throughput should not exceed CIR1
approximately 1mbps
Marking: Should be consistent at 0x0
TODO: Testing on how each color shares when all 4 types of packets
are going through the edge device

View File

@ -1,134 +0,0 @@
#!/bin/sh
#
# Setup address label from /etc/gai.conf
#
# Written by YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>, 2010.
#
IP=ip
DEFAULT_GAICONF=/etc/gai.conf
verbose=
debug=
function run ()
{
if [ x"$verbose" != x"" ]; then
echo "$@"
fi
if [ x"$debug" = x"" ]; then
"$@"
fi
}
function do_load_config ()
{
file=$1; shift
flush=1
cat $file | while read command prefix label; do
if [ x"$command" = x"#label" ]; then
if [ ${flush} = 1 ]; then
run ${IP} -6 addrlabel flush
flush=0
fi
run ${IP} -6 addrlabel add prefix $prefix label $label
fi
done
}
function do_list_config ()
{
${IP} -6 addrlabel list | while read p pfx l lbl; do
echo label ${pfx} ${lbl}
done
}
function help ()
{
echo "Usage: $0 [-v] {--list | --config [ ${DEFAULT_GAICONF} ] | --default}"
exit 1
}
TEMP=`getopt -o c::dlv -l config::,default,list,verbose -n gaiconf -- "$@"`
if [ $? != 0 ]; then
echo "Terminating..." >&2
exit 1
fi
TEMPFILE=`mktemp`
eval set -- "$TEMP"
while true ; do
case "$1" in
-c|--config)
if [ x"$cmd" != x"" ]; then
help
fi
case "$2" in
"") gai_conf="${DEFAULT_GAICONF}"
shift 2
;;
*) gai_conf="$2"
shift 2
esac
cmd=config
;;
-d|--default)
if [ x"$cmd" != x"" ]; then
help
fi
gai_conf=${TEMPFILE}
cmd=config
;;
-l|--list)
if [ x"$cmd" != x"" ]; then
help
fi
cmd=list
shift
;;
-v)
verbose=1
shift
;;
--)
shift;
break
;;
*)
echo "Internal error!" >&2
exit 1
;;
esac
done
case "$cmd" in
config)
if [ x"$gai_conf" = x"${TEMPFILE}" ]; then
sed -e 's/^[[:space:]]*//' <<END_OF_DEFAULT >${TEMPFILE}
label ::1/128 0
label ::/0 1
label 2002::/16 2
label ::/96 3
label ::ffff:0:0/96 4
label fec0::/10 5
label fc00::/7 6
label 2001:0::/32 7
END_OF_DEFAULT
fi
do_load_config "$gai_conf"
;;
list)
do_list_config
;;
*)
help
;;
esac
rm -f "${TEMPFILE}"
exit 0

View File

@ -28,13 +28,15 @@
static int usage(void)
{
fprintf(stderr,"Usage: ctrl <CMD>\n" \
"CMD := get <PARMS> | list | monitor\n" \
"CMD := get <PARMS> | list | monitor | policy <PARMS>\n" \
"PARMS := name <name> | id <id>\n" \
"Examples:\n" \
"\tctrl ls\n" \
"\tctrl monitor\n" \
"\tctrl get name foobar\n" \
"\tctrl get id 0xF\n");
"\tctrl get id 0xF\n"
"\tctrl policy name foobar\n"
"\tctrl policy id 0xF\n");
return -1;
}
@ -123,7 +125,8 @@ static int print_ctrl(struct rtnl_ctrl_data *ctrl,
ghdr->cmd != CTRL_CMD_DELFAMILY &&
ghdr->cmd != CTRL_CMD_NEWFAMILY &&
ghdr->cmd != CTRL_CMD_NEWMCAST_GRP &&
ghdr->cmd != CTRL_CMD_DELMCAST_GRP) {
ghdr->cmd != CTRL_CMD_DELMCAST_GRP &&
ghdr->cmd != CTRL_CMD_GETPOLICY) {
fprintf(stderr, "Unknown controller command %d\n", ghdr->cmd);
return 0;
}
@ -136,7 +139,7 @@ static int print_ctrl(struct rtnl_ctrl_data *ctrl,
}
attrs = (struct rtattr *) ((char *) ghdr + GENL_HDRLEN);
parse_rtattr(tb, CTRL_ATTR_MAX, attrs, len);
parse_rtattr_flags(tb, CTRL_ATTR_MAX, attrs, len, NLA_F_NESTED);
if (tb[CTRL_ATTR_FAMILY_NAME]) {
char *name = RTA_DATA(tb[CTRL_ATTR_FAMILY_NAME]);
@ -159,6 +162,36 @@ static int print_ctrl(struct rtnl_ctrl_data *ctrl,
__u32 *ma = RTA_DATA(tb[CTRL_ATTR_MAXATTR]);
fprintf(fp, " max attribs: %d ",*ma);
}
if (tb[CTRL_ATTR_OP_POLICY]) {
const struct rtattr *pos;
rtattr_for_each_nested(pos, tb[CTRL_ATTR_OP_POLICY]) {
struct rtattr *ptb[CTRL_ATTR_POLICY_DUMP_MAX + 1];
struct rtattr *pattrs = RTA_DATA(pos);
int plen = RTA_PAYLOAD(pos);
parse_rtattr_flags(ptb, CTRL_ATTR_POLICY_DUMP_MAX,
pattrs, plen, NLA_F_NESTED);
fprintf(fp, " op %d policies:",
pos->rta_type & ~NLA_F_NESTED);
if (ptb[CTRL_ATTR_POLICY_DO]) {
__u32 *v = RTA_DATA(ptb[CTRL_ATTR_POLICY_DO]);
fprintf(fp, " do=%d", *v);
}
if (ptb[CTRL_ATTR_POLICY_DUMP]) {
__u32 *v = RTA_DATA(ptb[CTRL_ATTR_POLICY_DUMP]);
fprintf(fp, " dump=%d", *v);
}
}
}
if (tb[CTRL_ATTR_POLICY])
nl_print_policy(tb[CTRL_ATTR_POLICY], fp);
/* end of family definitions .. */
fprintf(fp,"\n");
if (tb[CTRL_ATTR_OPS]) {
@ -235,7 +268,9 @@ static int ctrl_list(int cmd, int argc, char **argv)
exit(1);
}
if (cmd == CTRL_CMD_GETFAMILY) {
if (cmd == CTRL_CMD_GETFAMILY || cmd == CTRL_CMD_GETPOLICY) {
req.g.cmd = cmd;
if (argc != 2) {
fprintf(stderr, "Wrong number of params\n");
return -1;
@ -260,7 +295,9 @@ static int ctrl_list(int cmd, int argc, char **argv)
fprintf(stderr, "Wrong params\n");
goto ctrl_done;
}
}
if (cmd == CTRL_CMD_GETFAMILY) {
if (rtnl_talk(&rth, nlh, &answer) < 0) {
fprintf(stderr, "Error talking to the kernel\n");
goto ctrl_done;
@ -273,7 +310,7 @@ static int ctrl_list(int cmd, int argc, char **argv)
}
if (cmd == CTRL_CMD_UNSPEC) {
if (cmd == CTRL_CMD_UNSPEC || cmd == CTRL_CMD_GETPOLICY) {
nlh->nlmsg_flags = NLM_F_ROOT|NLM_F_MATCH|NLM_F_REQUEST;
nlh->nlmsg_seq = rth.dump = ++rth.seq;
@ -324,6 +361,8 @@ static int parse_ctrl(struct genl_util *a, int argc, char **argv)
matches(*argv, "show") == 0 ||
matches(*argv, "lst") == 0)
return ctrl_list(CTRL_CMD_UNSPEC, argc-1, argv+1);
if (matches(*argv, "policy") == 0)
return ctrl_list(CTRL_CMD_GETPOLICY, argc-1, argv+1);
if (matches(*argv, "help") == 0)
return usage();

View File

@ -22,7 +22,7 @@
#include <errno.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h> /* until we put our own header */
#include "SNAPSHOT.h"
#include "version.h"
#include "utils.h"
#include "genl_utils.h"
@ -118,7 +118,7 @@ int main(int argc, char **argv)
} else if (matches(argv[1], "-raw") == 0) {
++show_raw;
} else if (matches(argv[1], "-Version") == 0) {
printf("genl utility, iproute2-ss%s\n", SNAPSHOT);
printf("genl utility, iproute2-%s\n", version);
exit(0);
} else if (matches(argv[1], "-help") == 0) {
usage();

View File

@ -2,11 +2,10 @@
#ifndef _TC_UTIL_H_
#define _TC_UTIL_H_ 1
#include <linux/genetlink.h>
#include "utils.h"
#include "linux/genetlink.h"
struct genl_util
{
struct genl_util {
struct genl_util *next;
char name[16];
int (*parse_genlopt)(struct genl_util *fu, int argc, char **argv);

View File

@ -1 +0,0 @@
static const char SNAPSHOT[] = "190107";

View File

@ -19,6 +19,19 @@
#include "bpf_elf.h"
/** libbpf pin type. */
enum libbpf_pin_type {
LIBBPF_PIN_NONE,
/* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
LIBBPF_PIN_BY_NAME,
};
/** Type helper macros. */
#define __uint(name, val) int (*name)[val]
#define __type(name, val) typeof(val) *name
#define __array(name, val) typeof(val) *name[]
/** Misc macros. */
#ifndef __stringify

View File

@ -272,14 +272,18 @@ const char *bpf_prog_to_default_section(enum bpf_prog_type type);
int bpf_graft_map(const char *map_path, uint32_t *key, int argc, char **argv);
int bpf_trace_pipe(void);
void bpf_print_ops(FILE *f, struct rtattr *bpf_ops, __u16 len);
void bpf_print_ops(struct rtattr *bpf_ops, __u16 len);
int bpf_prog_load(enum bpf_prog_type type, const struct bpf_insn *insns,
size_t size_insns, const char *license, char *log,
size_t size_log);
int bpf_prog_load_dev(enum bpf_prog_type type, const struct bpf_insn *insns,
size_t size_insns, const char *license, __u32 ifindex,
char *log, size_t size_log);
int bpf_program_load(enum bpf_prog_type type, const struct bpf_insn *insns,
size_t size_insns, const char *license, char *log,
size_t size_log);
int bpf_prog_attach_fd(int prog_fd, int target_fd, enum bpf_attach_type type);
int bpf_prog_detach_fd(int target_fd, enum bpf_attach_type type);
int bpf_program_attach(int prog_fd, int target_fd, enum bpf_attach_type type);
int bpf_dump_prog_info(FILE *f, uint32_t id);
@ -287,6 +291,16 @@ int bpf_dump_prog_info(FILE *f, uint32_t id);
int bpf_send_map_fds(const char *path, const char *obj);
int bpf_recv_map_fds(const char *path, int *fds, struct bpf_map_aux *aux,
unsigned int entries);
#ifdef HAVE_LIBBPF
int iproute2_bpf_elf_ctx_init(struct bpf_cfg_in *cfg);
int iproute2_bpf_fetch_ancillary(void);
int iproute2_get_root_path(char *root_path, size_t len);
bool iproute2_is_pin_map(const char *libbpf_map_name, char *pathname);
bool iproute2_is_map_in_map(const char *libbpf_map_name, struct bpf_elf_map *imap,
struct bpf_elf_map *omap, char *omap_name);
int iproute2_find_map_name_by_id(unsigned int map_id, char *name);
int iproute2_load_libbpf(struct bpf_cfg_in *cfg);
#endif /* HAVE_LIBBPF */
#else
static inline int bpf_send_map_fds(const char *path, const char *obj)
{
@ -299,5 +313,15 @@ static inline int bpf_recv_map_fds(const char *path, int *fds,
{
return -1;
}
#ifdef HAVE_LIBBPF
static inline int iproute2_load_libbpf(struct bpf_cfg_in *cfg)
{
fprintf(stderr, "No ELF library support compiled in.\n");
return -1;
}
#endif /* HAVE_LIBBPF */
#endif /* HAVE_ELF */
const char *get_libbpf_version(void);
#endif /* __BPF_UTIL__ */

6
include/cg_map.h Normal file
View File

@ -0,0 +1,6 @@
#ifndef __CG_MAP_H__
#define __CG_MAP_H__
const char *cg_id_to_path(__u64 id);
#endif /* __CG_MAP_H__ */

View File

@ -12,7 +12,7 @@ extern int do_command4(int argc, char *argv[], char **table,
struct xtc_handle **handle, bool restore);
extern int delete_chain4(const xt_chainlabel chain, int verbose,
struct xtc_handle *handle);
extern int flush_entries4(const xt_chainlabel chain, int verbose,
extern int flush_entries4(const xt_chainlabel chain, int verbose,
struct xtc_handle *handle);
extern int for_each_chain4(int (*fn)(const xt_chainlabel, int, struct xtc_handle *),
int verbose, int builtinstoo, struct xtc_handle *handle);

View File

@ -15,6 +15,9 @@
#include "json_writer.h"
#include "color.h"
#define _IS_JSON_CONTEXT(type) (is_json_context() && (type & PRINT_JSON || type & PRINT_ANY))
#define _IS_FP_CONTEXT(type) (!is_json_context() && (type & PRINT_FP || type & PRINT_ANY))
json_writer_t *get_json_writer(void);
/*
@ -31,11 +34,11 @@ enum output_type {
void new_json_obj(int json);
void delete_json_obj(void);
void new_json_obj_plain(int json);
void delete_json_obj_plain(void);
bool is_json_context(void);
void fflush_fp(void);
void open_json_object(const char *str);
void close_json_object(void);
void open_json_array(enum output_type type, const char *delim);
@ -44,32 +47,61 @@ void close_json_array(enum output_type type, const char *delim);
void print_nl(void);
#define _PRINT_FUNC(type_name, type) \
void print_color_##type_name(enum output_type t, \
enum color_attr color, \
const char *key, \
const char *fmt, \
type value); \
int print_color_##type_name(enum output_type t, \
enum color_attr color, \
const char *key, \
const char *fmt, \
type value); \
\
static inline void print_##type_name(enum output_type t, \
const char *key, \
const char *fmt, \
type value) \
static inline int print_##type_name(enum output_type t, \
const char *key, \
const char *fmt, \
type value) \
{ \
print_color_##type_name(t, COLOR_NONE, key, fmt, value); \
return print_color_##type_name(t, COLOR_NONE, key, fmt, \
value); \
}
_PRINT_FUNC(int, int);
_PRINT_FUNC(s64, int64_t);
_PRINT_FUNC(bool, bool);
_PRINT_FUNC(null, const char*);
_PRINT_FUNC(string, const char*);
_PRINT_FUNC(uint, unsigned int);
_PRINT_FUNC(u64, uint64_t);
_PRINT_FUNC(hu, unsigned short);
_PRINT_FUNC(hex, unsigned int);
_PRINT_FUNC(0xhex, unsigned long long);
_PRINT_FUNC(luint, unsigned long);
_PRINT_FUNC(lluint, unsigned long long);
_PRINT_FUNC(float, double);
/* These functions return 0 if printing to a JSON context, number of
* characters printed otherwise (as calculated by printf(3)).
*/
_PRINT_FUNC(int, int)
_PRINT_FUNC(s64, int64_t)
_PRINT_FUNC(bool, bool)
_PRINT_FUNC(on_off, bool)
_PRINT_FUNC(null, const char*)
_PRINT_FUNC(string, const char*)
_PRINT_FUNC(uint, unsigned int)
_PRINT_FUNC(size, __u32)
_PRINT_FUNC(u64, uint64_t)
_PRINT_FUNC(hhu, unsigned char)
_PRINT_FUNC(hu, unsigned short)
_PRINT_FUNC(hex, unsigned int)
_PRINT_FUNC(0xhex, unsigned long long)
_PRINT_FUNC(luint, unsigned long)
_PRINT_FUNC(lluint, unsigned long long)
_PRINT_FUNC(float, double)
_PRINT_FUNC(tv, const struct timeval *)
#undef _PRINT_FUNC
#define _PRINT_NAME_VALUE_FUNC(type_name, type, format_char) \
void print_##type_name##_name_value(const char *name, type value) \
_PRINT_NAME_VALUE_FUNC(uint, unsigned int, u);
_PRINT_NAME_VALUE_FUNC(string, const char*, s);
#undef _PRINT_NAME_VALUE_FUNC
int print_color_rate(bool use_iec, enum output_type t, enum color_attr color,
const char *key, const char *fmt, unsigned long long rate);
static inline int print_rate(bool use_iec, enum output_type t,
const char *key, const char *fmt,
unsigned long long rate)
{
return print_color_rate(use_iec, t, COLOR_NONE, key, fmt, rate);
}
/* A backdoor to the size formatter. Please use print_size() instead. */
char *sprint_size(__u32 sz, char *buf);
#endif /* _JSON_PRINT_H_ */

View File

@ -38,6 +38,7 @@ void jsonw_float_fmt(json_writer_t *self, const char *fmt, double num);
void jsonw_uint(json_writer_t *self, unsigned int number);
void jsonw_u64(json_writer_t *self, uint64_t number);
void jsonw_xint(json_writer_t *self, uint64_t number);
void jsonw_hhu(json_writer_t *self, unsigned char num);
void jsonw_hu(json_writer_t *self, unsigned short number);
void jsonw_int(json_writer_t *self, int number);
void jsonw_s64(json_writer_t *self, int64_t number);
@ -52,6 +53,7 @@ void jsonw_float_field(json_writer_t *self, const char *prop, double num);
void jsonw_uint_field(json_writer_t *self, const char *prop, unsigned int num);
void jsonw_u64_field(json_writer_t *self, const char *prop, uint64_t num);
void jsonw_xint_field(json_writer_t *self, const char *prop, uint64_t num);
void jsonw_hhu_field(json_writer_t *self, const char *prop, unsigned char num);
void jsonw_hu_field(json_writer_t *self, const char *prop, unsigned short num);
void jsonw_int_field(json_writer_t *self, const char *prop, int num);
void jsonw_s64_field(json_writer_t *self, const char *prop, int64_t num);

View File

@ -21,6 +21,7 @@ struct { \
}, \
}
int genl_add_mcast_grp(struct rtnl_handle *grth, __u16 genl_family, const char *group);
int genl_resolve_family(struct rtnl_handle *grth, const char *family);
int genl_init_handle(struct rtnl_handle *grth, const char *family,
int *genl_family);

View File

@ -23,6 +23,7 @@ struct rtnl_handle {
FILE *dump_fp;
#define RTNL_HANDLE_F_LISTEN_ALL_NSID 0x01
#define RTNL_HANDLE_F_SUPPRESS_NLERR 0x02
#define RTNL_HANDLE_F_STRICT_CHK 0x04
int flags;
};
@ -44,26 +45,33 @@ int rtnl_open(struct rtnl_handle *rth, unsigned int subscriptions)
int rtnl_open_byproto(struct rtnl_handle *rth, unsigned int subscriptions,
int protocol)
__attribute__((warn_unused_result));
int rtnl_add_nl_group(struct rtnl_handle *rth, unsigned int group)
__attribute__((warn_unused_result));
void rtnl_close(struct rtnl_handle *rth);
void rtnl_set_strict_dump(struct rtnl_handle *rth);
int rtnl_addrdump_req(struct rtnl_handle *rth, int family)
typedef int (*req_filter_fn_t)(struct nlmsghdr *nlh, int reqlen);
int rtnl_addrdump_req(struct rtnl_handle *rth, int family,
req_filter_fn_t filter_fn)
__attribute__((warn_unused_result));
int rtnl_addrlbldump_req(struct rtnl_handle *rth, int family)
__attribute__((warn_unused_result));
int rtnl_routedump_req(struct rtnl_handle *rth, int family)
int rtnl_routedump_req(struct rtnl_handle *rth, int family,
req_filter_fn_t filter_fn)
__attribute__((warn_unused_result));
int rtnl_ruledump_req(struct rtnl_handle *rth, int family)
__attribute__((warn_unused_result));
int rtnl_neighdump_req(struct rtnl_handle *rth, int family)
int rtnl_neighdump_req(struct rtnl_handle *rth, int family,
req_filter_fn_t filter_fn)
__attribute__((warn_unused_result));
int rtnl_neightbldump_req(struct rtnl_handle *rth, int family)
__attribute__((warn_unused_result));
int rtnl_mdbdump_req(struct rtnl_handle *rth, int family)
__attribute__((warn_unused_result));
int rtnl_netconfdump_req(struct rtnl_handle *rth, int family)
int rtnl_brvlandump_req(struct rtnl_handle *rth, int family, __u32 dump_flags)
__attribute__((warn_unused_result));
int rtnl_nsiddump_req(struct rtnl_handle *rth, int family)
int rtnl_netconfdump_req(struct rtnl_handle *rth, int family)
__attribute__((warn_unused_result));
int rtnl_linkdump_req(struct rtnl_handle *rth, int fam)
@ -71,11 +79,15 @@ int rtnl_linkdump_req(struct rtnl_handle *rth, int fam)
int rtnl_linkdump_req_filter(struct rtnl_handle *rth, int fam, __u32 filt_mask)
__attribute__((warn_unused_result));
typedef int (*req_filter_fn_t)(struct nlmsghdr *nlh, int reqlen);
int rtnl_linkdump_req_filter_fn(struct rtnl_handle *rth, int fam,
req_filter_fn_t fn)
__attribute__((warn_unused_result));
int rtnl_fdb_linkdump_req_filter_fn(struct rtnl_handle *rth,
req_filter_fn_t filter_fn)
__attribute__((warn_unused_result));
int rtnl_nsiddump_req_filter_fn(struct rtnl_handle *rth, int family,
req_filter_fn_t filter_fn)
__attribute__((warn_unused_result));
int rtnl_statsdump_req_filter(struct rtnl_handle *rth, int fam, __u32 filt_mask)
__attribute__((warn_unused_result));
int rtnl_dump_request(struct rtnl_handle *rth, int type, void *req,
@ -84,12 +96,40 @@ int rtnl_dump_request(struct rtnl_handle *rth, int type, void *req,
int rtnl_dump_request_n(struct rtnl_handle *rth, struct nlmsghdr *n)
__attribute__((warn_unused_result));
int rtnl_nexthopdump_req(struct rtnl_handle *rth, int family,
req_filter_fn_t filter_fn)
__attribute__((warn_unused_result));
int rtnl_nexthop_bucket_dump_req(struct rtnl_handle *rth, int family,
req_filter_fn_t filter_fn)
__attribute__((warn_unused_result));
struct rtnl_ctrl_data {
int nsid;
};
typedef int (*rtnl_filter_t)(struct nlmsghdr *n, void *);
/**
* rtnl error handler called from
* rtnl_dump_done()
* rtnl_dump_error()
*
* Return value is a bitmask of the following values:
* RTNL_LET_NLERR
* error handled as usual
* RTNL_SUPPRESS_NLMSG_DONE_NLERR
* error in nlmsg_type == NLMSG_DONE will be suppressed
* RTNL_SUPPRESS_NLMSG_ERROR_NLERR
* error in nlmsg_type == NLMSG_ERROR will be suppressed
* and nlmsg will be skipped
* RTNL_SUPPRESS_NLERR - suppress error in both previous cases
*/
#define RTNL_LET_NLERR 0x01
#define RTNL_SUPPRESS_NLMSG_DONE_NLERR 0x02
#define RTNL_SUPPRESS_NLMSG_ERROR_NLERR 0x04
#define RTNL_SUPPRESS_NLERR 0x06
typedef int (*rtnl_err_hndlr_t)(struct nlmsghdr *n, void *);
typedef int (*rtnl_listen_filter_t)(struct rtnl_ctrl_data *,
struct nlmsghdr *n, void *);
@ -99,6 +139,8 @@ typedef int (*nl_ext_ack_fn_t)(const char *errmsg, uint32_t off,
struct rtnl_dump_filter_arg {
rtnl_filter_t filter;
void *arg1;
rtnl_err_hndlr_t errhndlr;
void *arg2;
__u16 nc_flags;
};
@ -107,6 +149,15 @@ int rtnl_dump_filter_nc(struct rtnl_handle *rth,
void *arg, __u16 nc_flags);
#define rtnl_dump_filter(rth, filter, arg) \
rtnl_dump_filter_nc(rth, filter, arg, 0)
int rtnl_dump_filter_errhndlr_nc(struct rtnl_handle *rth,
rtnl_filter_t filter,
void *arg1,
rtnl_err_hndlr_t errhndlr,
void *arg2,
__u16 nc_flags);
#define rtnl_dump_filter_errhndlr(rth, filter, farg, errhndlr, earg) \
rtnl_dump_filter_errhndlr_nc(rth, filter, farg, errhndlr, earg, 0)
int rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n,
struct nlmsghdr **answer)
__attribute__((warn_unused_result));
@ -121,6 +172,7 @@ int rtnl_send(struct rtnl_handle *rth, const void *buf, int)
int rtnl_send_check(struct rtnl_handle *rth, const void *buf, int)
__attribute__((warn_unused_result));
int nl_dump_ext_ack(const struct nlmsghdr *nlh, nl_ext_ack_fn_t errfn);
int nl_dump_ext_ack_done(const struct nlmsghdr *nlh, int error);
int addattr(struct nlmsghdr *n, int maxlen, int type);
int addattr8(struct nlmsghdr *n, int maxlen, int type, __u8 data);
@ -158,7 +210,8 @@ int rta_nest_end(struct rtattr *rta, struct rtattr *nest);
RTA_ALIGN((rta)->rta_len)))
#define parse_rtattr_nested(tb, max, rta) \
(parse_rtattr((tb), (max), RTA_DATA(rta), RTA_PAYLOAD(rta)))
(parse_rtattr_flags((tb), (max), RTA_DATA(rta), RTA_PAYLOAD(rta), \
NLA_F_NESTED))
#define parse_rtattr_one_nested(type, rta) \
(parse_rtattr_one(type, RTA_DATA(rta), RTA_PAYLOAD(rta)))
@ -264,8 +317,20 @@ int rtnl_from_file(FILE *, rtnl_listen_filter_t handler,
((struct rtattr *)(((char *)(r)) + NLMSG_ALIGN(sizeof(struct if_stats_msg))))
#endif
#ifndef BRVLAN_RTA
#define BRVLAN_RTA(r) \
((struct rtattr *)(((char *)(r)) + NLMSG_ALIGN(sizeof(struct br_vlan_msg))))
#endif
/* User defined nlmsg_type which is used mostly for logging netlink
* messages from dump file */
#define NLMSG_TSTAMP 15
#define rtattr_for_each_nested(attr, nest) \
for ((attr) = (void *)RTA_DATA(nest); \
RTA_OK(attr, RTA_PAYLOAD(nest) - ((char *)(attr) - (char *)RTA_DATA((nest)))); \
(attr) = RTA_TAIL((attr)))
void nl_print_policy(const struct rtattr *attr, FILE *fp);
#endif /* __LIBNETLINK_H__ */

View File

@ -9,6 +9,7 @@ unsigned ll_name_to_index(const char *name);
const char *ll_index_to_name(unsigned idx);
int ll_index_to_type(unsigned idx);
int ll_index_to_flags(unsigned idx);
void ll_drop_by_index(unsigned index);
unsigned namehash(const char *str);
const char *ll_idx_n2a(unsigned int idx);

33
include/mnl_utils.h Normal file
View File

@ -0,0 +1,33 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __MNL_UTILS_H__
#define __MNL_UTILS_H__ 1
struct mnlu_gen_socket {
struct mnl_socket *nl;
char *buf;
uint32_t family;
unsigned int seq;
uint8_t version;
};
int mnlu_gen_socket_open(struct mnlu_gen_socket *nlg, const char *family_name,
uint8_t version);
void mnlu_gen_socket_close(struct mnlu_gen_socket *nlg);
struct nlmsghdr *
_mnlu_gen_socket_cmd_prepare(struct mnlu_gen_socket *nlg,
uint8_t cmd, uint16_t flags,
uint32_t id, uint8_t version);
struct nlmsghdr *mnlu_gen_socket_cmd_prepare(struct mnlu_gen_socket *nlg,
uint8_t cmd, uint16_t flags);
int mnlu_gen_socket_sndrcv(struct mnlu_gen_socket *nlg, const struct nlmsghdr *nlh,
mnl_cb_t data_cb, void *data);
struct mnl_socket *mnlu_socket_open(int bus);
struct nlmsghdr *mnlu_msg_prepare(void *buf, uint32_t nlmsg_type, uint16_t flags,
void *extra_header, size_t extra_header_size);
int mnlu_socket_recv_run(struct mnl_socket *nl, unsigned int seq, void *buf, size_t buf_size,
mnl_cb_t cb, void *data);
int mnlu_gen_socket_recv_run(struct mnlu_gen_socket *nlg, mnl_cb_t cb,
void *data);
#endif /* __MNL_UTILS_H__ */

View File

@ -9,6 +9,7 @@ const char *rtnl_rtscope_n2a(int id, char *buf, int len);
const char *rtnl_rttable_n2a(__u32 id, char *buf, int len);
const char *rtnl_rtrealm_n2a(int id, char *buf, int len);
const char *rtnl_dsfield_n2a(int id, char *buf, int len);
const char *rtnl_dsfield_get_name(int id);
const char *rtnl_group_n2a(int id, char *buf, int len);
int rtnl_rtprot_a2n(__u32 *id, const char *arg);
@ -33,4 +34,9 @@ int ll_proto_a2n(unsigned short *id, const char *buf);
const char *nl_proto_n2a(int id, char *buf, int len);
int nl_proto_a2n(__u32 *id, const char *arg);
int protodown_reason_a2n(__u32 *id, const char *arg);
int protodown_reason_n2a(int id, char *buf, int len);
extern int numeric;
#endif

View File

@ -0,0 +1,14 @@
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
#ifndef __ASM_GENERIC_SOCKIOS_H
#define __ASM_GENERIC_SOCKIOS_H
/* Socket-level I/O control calls. */
#define FIOSETOWN 0x8901
#define SIOCSPGRP 0x8902
#define FIOGETOWN 0x8903
#define SIOCGPGRP 0x8904
#define SIOCATMARK 0x8905
#define SIOCGSTAMP_OLD 0x8906 /* Get stamp (timeval) */
#define SIOCGSTAMPNS_OLD 0x8907 /* Get stamp (timespec) */
#endif /* __ASM_GENERIC_SOCKIOS_H */

62
include/uapi/linux/amt.h Normal file
View File

@ -0,0 +1,62 @@
/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
/*
* Copyright (c) 2021 Taehee Yoo <ap420073@gmail.com>
*/
#ifndef _AMT_H_
#define _AMT_H_
enum ifla_amt_mode {
/* AMT interface works as Gateway mode.
* The Gateway mode encapsulates IGMP/MLD traffic and decapsulates
* multicast traffic.
*/
AMT_MODE_GATEWAY = 0,
/* AMT interface works as Relay mode.
* The Relay mode encapsulates multicast traffic and decapsulates
* IGMP/MLD traffic.
*/
AMT_MODE_RELAY,
__AMT_MODE_MAX,
};
#define AMT_MODE_MAX (__AMT_MODE_MAX - 1)
enum {
IFLA_AMT_UNSPEC,
/* This attribute specify mode etier Gateway or Relay. */
IFLA_AMT_MODE,
/* This attribute specify Relay port.
* AMT interface is created as Gateway mode, this attribute is used
* to specify relay(remote) port.
* AMT interface is created as Relay mode, this attribute is used
* as local port.
*/
IFLA_AMT_RELAY_PORT,
/* This attribute specify Gateway port.
* AMT interface is created as Gateway mode, this attribute is used
* as local port.
* AMT interface is created as Relay mode, this attribute is not used.
*/
IFLA_AMT_GATEWAY_PORT,
/* This attribute specify physical device */
IFLA_AMT_LINK,
/* This attribute specify local ip address */
IFLA_AMT_LOCAL_IP,
/* This attribute specify Relay ip address.
* So, this is not used by Relay.
*/
IFLA_AMT_REMOTE_IP,
/* This attribute specify Discovery ip address.
* When Gateway get started, it send discovery message to find the
* Relay's ip address.
* So, this is not used by Relay.
*/
IFLA_AMT_DISCOVERY_IP,
/* This attribute specify number of maximum tunnel. */
IFLA_AMT_MAX_TUNNELS,
__IFLA_AMT_MAX,
};
#define IFLA_AMT_MAX (__IFLA_AMT_MAX - 1)
#endif /* _AMT_H_ */

View File

@ -5,7 +5,7 @@
/*
* See http://icawww1.epfl.ch/linux-atm/magic.html for the complete list of
* See https://icawww1.epfl.ch/linux-atm/magic.html for the complete list of
* "magic" ioctl numbers.
*/

File diff suppressed because it is too large Load Diff

View File

@ -22,9 +22,9 @@ struct btf_header {
};
/* Max # of type identifier */
#define BTF_MAX_TYPE 0x0000ffff
#define BTF_MAX_TYPE 0x000fffff
/* Max offset into the string section */
#define BTF_MAX_NAME_OFFSET 0x0000ffff
#define BTF_MAX_NAME_OFFSET 0x00ffffff
/* Max # of struct/union/enum members or func args */
#define BTF_MAX_VLEN 0xffff
@ -34,13 +34,16 @@ struct btf_type {
* bits 0-15: vlen (e.g. # of struct's members)
* bits 16-23: unused
* bits 24-27: kind (e.g. int, ptr, array...etc)
* bits 28-31: unused
* bits 28-30: unused
* bit 31: kind_flag, currently used by
* struct, union and fwd
*/
__u32 info;
/* "size" is used by INT, ENUM, STRUCT and UNION.
/* "size" is used by INT, ENUM, STRUCT, UNION and DATASEC.
* "size" tells the size of the type it is describing.
*
* "type" is used by PTR, TYPEDEF, VOLATILE, CONST and RESTRICT.
* "type" is used by PTR, TYPEDEF, VOLATILE, CONST, RESTRICT,
* FUNC, FUNC_PROTO, VAR and DECL_TAG.
* "type" is a type_id referring to another type.
*/
union {
@ -49,23 +52,33 @@ struct btf_type {
};
};
#define BTF_INFO_KIND(info) (((info) >> 24) & 0x0f)
#define BTF_INFO_KIND(info) (((info) >> 24) & 0x1f)
#define BTF_INFO_VLEN(info) ((info) & 0xffff)
#define BTF_INFO_KFLAG(info) ((info) >> 31)
#define BTF_KIND_UNKN 0 /* Unknown */
#define BTF_KIND_INT 1 /* Integer */
#define BTF_KIND_PTR 2 /* Pointer */
#define BTF_KIND_ARRAY 3 /* Array */
#define BTF_KIND_STRUCT 4 /* Struct */
#define BTF_KIND_UNION 5 /* Union */
#define BTF_KIND_ENUM 6 /* Enumeration */
#define BTF_KIND_FWD 7 /* Forward */
#define BTF_KIND_TYPEDEF 8 /* Typedef */
#define BTF_KIND_VOLATILE 9 /* Volatile */
#define BTF_KIND_CONST 10 /* Const */
#define BTF_KIND_RESTRICT 11 /* Restrict */
#define BTF_KIND_MAX 11
#define NR_BTF_KINDS 12
enum {
BTF_KIND_UNKN = 0, /* Unknown */
BTF_KIND_INT = 1, /* Integer */
BTF_KIND_PTR = 2, /* Pointer */
BTF_KIND_ARRAY = 3, /* Array */
BTF_KIND_STRUCT = 4, /* Struct */
BTF_KIND_UNION = 5, /* Union */
BTF_KIND_ENUM = 6, /* Enumeration */
BTF_KIND_FWD = 7, /* Forward */
BTF_KIND_TYPEDEF = 8, /* Typedef */
BTF_KIND_VOLATILE = 9, /* Volatile */
BTF_KIND_CONST = 10, /* Const */
BTF_KIND_RESTRICT = 11, /* Restrict */
BTF_KIND_FUNC = 12, /* Function */
BTF_KIND_FUNC_PROTO = 13, /* Function Proto */
BTF_KIND_VAR = 14, /* Variable */
BTF_KIND_DATASEC = 15, /* Section */
BTF_KIND_FLOAT = 16, /* Floating point */
BTF_KIND_DECL_TAG = 17, /* Decl Tag */
NR_BTF_KINDS,
BTF_KIND_MAX = NR_BTF_KINDS - 1,
};
/* For some specific BTF_KIND, "struct btf_type" is immediately
* followed by extra data.
@ -75,7 +88,7 @@ struct btf_type {
* is the 32 bits arrangement:
*/
#define BTF_INT_ENCODING(VAL) (((VAL) & 0x0f000000) >> 24)
#define BTF_INT_OFFSET(VAL) (((VAL & 0x00ff0000)) >> 16)
#define BTF_INT_OFFSET(VAL) (((VAL) & 0x00ff0000) >> 16)
#define BTF_INT_BITS(VAL) ((VAL) & 0x000000ff)
/* Attributes stored in the BTF_INT_ENCODING */
@ -107,7 +120,69 @@ struct btf_array {
struct btf_member {
__u32 name_off;
__u32 type;
__u32 offset; /* offset in bits */
/* If the type info kind_flag is set, the btf_member offset
* contains both member bitfield size and bit offset. The
* bitfield size is set for bitfield members. If the type
* info kind_flag is not set, the offset contains only bit
* offset.
*/
__u32 offset;
};
/* If the struct/union type info kind_flag is set, the
* following two macros are used to access bitfield_size
* and bit_offset from btf_member.offset.
*/
#define BTF_MEMBER_BITFIELD_SIZE(val) ((val) >> 24)
#define BTF_MEMBER_BIT_OFFSET(val) ((val) & 0xffffff)
/* BTF_KIND_FUNC_PROTO is followed by multiple "struct btf_param".
* The exact number of btf_param is stored in the vlen (of the
* info in "struct btf_type").
*/
struct btf_param {
__u32 name_off;
__u32 type;
};
enum {
BTF_VAR_STATIC = 0,
BTF_VAR_GLOBAL_ALLOCATED = 1,
BTF_VAR_GLOBAL_EXTERN = 2,
};
enum btf_func_linkage {
BTF_FUNC_STATIC = 0,
BTF_FUNC_GLOBAL = 1,
BTF_FUNC_EXTERN = 2,
};
/* BTF_KIND_VAR is followed by a single "struct btf_var" to describe
* additional information related to the variable such as its linkage.
*/
struct btf_var {
__u32 linkage;
};
/* BTF_KIND_DATASEC is followed by multiple "struct btf_var_secinfo"
* to describe all BTF_KIND_VAR types it contains along with it's
* in-section offset as well as size.
*/
struct btf_var_secinfo {
__u32 type;
__u32 offset;
__u32 size;
};
/* BTF_KIND_DECL_TAG is followed by a single "struct btf_decl_tag" to describe
* additional information related to the tag applied location.
* If component_idx == -1, the tag is applied to a struct, union,
* variable or function. Otherwise, it is applied to a struct/union
* member or a func argument, and component_idx indicates which member
* or argument (0 ... vlen-1).
*/
struct btf_decl_tag {
__s32 component_idx;
};
#endif /* __LINUX_BTF_H__ */

View File

@ -1,4 +1,4 @@
/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */
/* SPDX-License-Identifier: ((GPL-2.0-only WITH Linux-syscall-note) OR BSD-3-Clause) */
/*
* linux/can.h
*
@ -84,6 +84,7 @@ typedef __u32 can_err_mask_t;
/* CAN payload length and DLC definitions according to ISO 11898-1 */
#define CAN_MAX_DLC 8
#define CAN_MAX_RAW_DLC 15
#define CAN_MAX_DLEN 8
/* CAN FD payload length and DLC definitions according to ISO 11898-7 */
@ -91,30 +92,39 @@ typedef __u32 can_err_mask_t;
#define CANFD_MAX_DLEN 64
/**
* struct can_frame - basic CAN frame structure
* @can_id: CAN ID of the frame and CAN_*_FLAG flags, see canid_t definition
* @can_dlc: frame payload length in byte (0 .. 8) aka data length code
* N.B. the DLC field from ISO 11898-1 Chapter 8.4.2.3 has a 1:1
* mapping of the 'data length code' to the real payload length
* @__pad: padding
* @__res0: reserved / padding
* @__res1: reserved / padding
* @data: CAN frame payload (up to 8 byte)
* struct can_frame - Classical CAN frame structure (aka CAN 2.0B)
* @can_id: CAN ID of the frame and CAN_*_FLAG flags, see canid_t definition
* @len: CAN frame payload length in byte (0 .. 8)
* @can_dlc: deprecated name for CAN frame payload length in byte (0 .. 8)
* @__pad: padding
* @__res0: reserved / padding
* @len8_dlc: optional DLC value (9 .. 15) at 8 byte payload length
* len8_dlc contains values from 9 .. 15 when the payload length is
* 8 bytes but the DLC value (see ISO 11898-1) is greater then 8.
* CAN_CTRLMODE_CC_LEN8_DLC flag has to be enabled in CAN driver.
* @data: CAN frame payload (up to 8 byte)
*/
struct can_frame {
canid_t can_id; /* 32 bit CAN_ID + EFF/RTR/ERR flags */
__u8 can_dlc; /* frame payload length in byte (0 .. CAN_MAX_DLEN) */
__u8 __pad; /* padding */
__u8 __res0; /* reserved / padding */
__u8 __res1; /* reserved / padding */
__u8 data[CAN_MAX_DLEN] __attribute__((aligned(8)));
union {
/* CAN frame payload length in byte (0 .. CAN_MAX_DLEN)
* was previously named can_dlc so we need to carry that
* name for legacy support
*/
__u8 len;
__u8 can_dlc; /* deprecated */
} __attribute__((packed)); /* disable padding added in some ABIs */
__u8 __pad; /* padding */
__u8 __res0; /* reserved / padding */
__u8 len8_dlc; /* optional DLC for 8 byte payload length (9 .. 15) */
__u8 data[CAN_MAX_DLEN] __attribute__((aligned(8)));
};
/*
* defined bits for canfd_frame.flags
*
* The use of struct canfd_frame implies the Extended Data Length (EDL) bit to
* be set in the CAN frame bitstream on the wire. The EDL bit switch turns
* The use of struct canfd_frame implies the FD Frame (FDF) bit to
* be set in the CAN frame bitstream on the wire. The FDF bit switch turns
* the CAN controllers bitstream processor into the CAN FD mode which creates
* two new options within the CAN FD frame specification:
*
@ -125,9 +135,18 @@ struct can_frame {
* controller only the CANFD_BRS bit is relevant for real CAN controllers when
* building a CAN FD frame for transmission. Setting the CANFD_ESI bit can make
* sense for virtual CAN interfaces to test applications with echoed frames.
*
* The struct can_frame and struct canfd_frame intentionally share the same
* layout to be able to write CAN frame content into a CAN FD frame structure.
* When this is done the former differentiation via CAN_MTU / CANFD_MTU gets
* lost. CANFD_FDF allows programmers to mark CAN FD frames in the case of
* using struct canfd_frame for mixed CAN / CAN FD content (dual use).
* N.B. the Kernel APIs do NOT provide mixed CAN / CAN FD content inside of
* struct canfd_frame therefore the CANFD_FDF flag is disregarded by Linux.
*/
#define CANFD_BRS 0x01 /* bit rate switch (second bitrate for payload data) */
#define CANFD_ESI 0x02 /* error state indicator of the transmitting node */
#define CANFD_FDF 0x04 /* mark CAN FD for dual use of struct canfd_frame */
/**
* struct canfd_frame - CAN flexible data rate frame structure
@ -157,7 +176,8 @@ struct canfd_frame {
#define CAN_TP20 4 /* VAG Transport Protocol v2.0 */
#define CAN_MCNET 5 /* Bosch MCNet */
#define CAN_ISOTP 6 /* ISO 15765-2 Transport Protocol */
#define CAN_NPROTO 7
#define CAN_J1939 7 /* SAE J1939 */
#define CAN_NPROTO 8
#define SOL_CAN_BASE 100
@ -174,6 +194,23 @@ struct sockaddr_can {
/* transport protocol class address information (e.g. ISOTP) */
struct { canid_t rx_id, tx_id; } tp;
/* J1939 address information */
struct {
/* 8 byte name when using dynamic addressing */
__u64 name;
/* pgn:
* 8 bit: PS in PDU2 case, else 0
* 8 bit: PF
* 1 bit: DP
* 1 bit: reserved
*/
__u32 pgn;
/* 1 byte address */
__u8 addr;
} j1939;
/* reserved for future CAN protocols address information */
} can_addr;
};

View File

@ -1,4 +1,4 @@
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
/*
* linux/can/netlink.h
*
@ -40,15 +40,15 @@ struct can_bittiming {
};
/*
* CAN harware-dependent bit-timing constant
* CAN hardware-dependent bit-timing constant
*
* Used for calculating and checking bit-timing parameters
*/
struct can_bittiming_const {
char name[16]; /* Name of the CAN controller hardware */
__u32 tseg1_min; /* Time segement 1 = prop_seg + phase_seg1 */
__u32 tseg1_min; /* Time segment 1 = prop_seg + phase_seg1 */
__u32 tseg1_max;
__u32 tseg2_min; /* Time segement 2 = phase_seg2 */
__u32 tseg2_min; /* Time segment 2 = phase_seg2 */
__u32 tseg2_max;
__u32 sjw_max; /* Synchronisation jump width */
__u32 brp_min; /* Bit-rate prescaler */
@ -100,6 +100,9 @@ struct can_ctrlmode {
#define CAN_CTRLMODE_FD 0x20 /* CAN FD mode */
#define CAN_CTRLMODE_PRESUME_ACK 0x40 /* Ignore missing CAN ACKs */
#define CAN_CTRLMODE_FD_NON_ISO 0x80 /* CAN FD in non-ISO mode */
#define CAN_CTRLMODE_CC_LEN8_DLC 0x100 /* Classic CAN DLC option */
#define CAN_CTRLMODE_TDC_AUTO 0x200 /* CAN transiver automatically calculates TDCV */
#define CAN_CTRLMODE_TDC_MANUAL 0x400 /* TDCV is manually set up by user */
/*
* CAN device statistics
@ -133,10 +136,35 @@ enum {
IFLA_CAN_BITRATE_CONST,
IFLA_CAN_DATA_BITRATE_CONST,
IFLA_CAN_BITRATE_MAX,
__IFLA_CAN_MAX
IFLA_CAN_TDC,
/* add new constants above here */
__IFLA_CAN_MAX,
IFLA_CAN_MAX = __IFLA_CAN_MAX - 1
};
#define IFLA_CAN_MAX (__IFLA_CAN_MAX - 1)
/*
* CAN FD Transmitter Delay Compensation (TDC)
*
* Please refer to struct can_tdc_const and can_tdc in
* include/linux/can/bittiming.h for further details.
*/
enum {
IFLA_CAN_TDC_UNSPEC,
IFLA_CAN_TDC_TDCV_MIN, /* u32 */
IFLA_CAN_TDC_TDCV_MAX, /* u32 */
IFLA_CAN_TDC_TDCO_MIN, /* u32 */
IFLA_CAN_TDC_TDCO_MAX, /* u32 */
IFLA_CAN_TDC_TDCF_MIN, /* u32 */
IFLA_CAN_TDC_TDCF_MAX, /* u32 */
IFLA_CAN_TDC_TDCV, /* u32 */
IFLA_CAN_TDC_TDCO, /* u32 */
IFLA_CAN_TDC_TDCF, /* u32 */
/* add new constants above here */
__IFLA_CAN_TDC,
IFLA_CAN_TDC_MAX = __IFLA_CAN_TDC - 1
};
/* u16 termination range: 1..65535 Ohms */
#define CAN_TERMINATION_DISABLED 0

View File

@ -1,4 +1,4 @@
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
#ifndef _CAN_VXCAN_H
#define _CAN_VXCAN_H

View File

@ -0,0 +1,36 @@
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
/* const.h: Macros for dealing with constants. */
#ifndef _LINUX_CONST_H
#define _LINUX_CONST_H
/* Some constant macros are used in both assembler and
* C code. Therefore we cannot annotate them always with
* 'UL' and other type specifiers unilaterally. We
* use the following macros to deal with this.
*
* Similarly, _AT() will cast an expression with a type in C, but
* leave it unchanged in asm.
*/
#ifdef __ASSEMBLY__
#define _AC(X,Y) X
#define _AT(T,X) X
#else
#define __AC(X,Y) (X##Y)
#define _AC(X,Y) __AC(X,Y)
#define _AT(T,X) ((T)(X))
#endif
#define _UL(x) (_AC(x, UL))
#define _ULL(x) (_AC(x, ULL))
#define _BITUL(x) (_UL(1) << (x))
#define _BITULL(x) (_ULL(1) << (x))
#define __ALIGN_KERNEL(x, a) __ALIGN_KERNEL_MASK(x, (typeof(x))(a) - 1)
#define __ALIGN_KERNEL_MASK(x, mask) (((x) + (mask)) & ~(mask))
#define __KERNEL_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#endif /* _LINUX_CONST_H */

769
include/uapi/linux/dcbnl.h Normal file
View File

@ -0,0 +1,769 @@
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
/*
* Copyright (c) 2008-2011, Intel Corporation.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc., 59 Temple
* Place - Suite 330, Boston, MA 02111-1307 USA.
*
* Author: Lucy Liu <lucy.liu@intel.com>
*/
#ifndef __LINUX_DCBNL_H__
#define __LINUX_DCBNL_H__
#include <linux/types.h>
/* IEEE 802.1Qaz std supported values */
#define IEEE_8021QAZ_MAX_TCS 8
#define IEEE_8021QAZ_TSA_STRICT 0
#define IEEE_8021QAZ_TSA_CB_SHAPER 1
#define IEEE_8021QAZ_TSA_ETS 2
#define IEEE_8021QAZ_TSA_VENDOR 255
/* This structure contains the IEEE 802.1Qaz ETS managed object
*
* @willing: willing bit in ETS configuration TLV
* @ets_cap: indicates supported capacity of ets feature
* @cbs: credit based shaper ets algorithm supported
* @tc_tx_bw: tc tx bandwidth indexed by traffic class
* @tc_rx_bw: tc rx bandwidth indexed by traffic class
* @tc_tsa: TSA Assignment table, indexed by traffic class
* @prio_tc: priority assignment table mapping 8021Qp to traffic class
* @tc_reco_bw: recommended tc bandwidth indexed by traffic class for TLV
* @tc_reco_tsa: recommended tc bandwidth indexed by traffic class for TLV
* @reco_prio_tc: recommended tc tx bandwidth indexed by traffic class for TLV
*
* Recommended values are used to set fields in the ETS recommendation TLV
* with hardware offloaded LLDP.
*
* ----
* TSA Assignment 8 bit identifiers
* 0 strict priority
* 1 credit-based shaper
* 2 enhanced transmission selection
* 3-254 reserved
* 255 vendor specific
*/
struct ieee_ets {
__u8 willing;
__u8 ets_cap;
__u8 cbs;
__u8 tc_tx_bw[IEEE_8021QAZ_MAX_TCS];
__u8 tc_rx_bw[IEEE_8021QAZ_MAX_TCS];
__u8 tc_tsa[IEEE_8021QAZ_MAX_TCS];
__u8 prio_tc[IEEE_8021QAZ_MAX_TCS];
__u8 tc_reco_bw[IEEE_8021QAZ_MAX_TCS];
__u8 tc_reco_tsa[IEEE_8021QAZ_MAX_TCS];
__u8 reco_prio_tc[IEEE_8021QAZ_MAX_TCS];
};
/* This structure contains rate limit extension to the IEEE 802.1Qaz ETS
* managed object.
* Values are 64 bits long and specified in Kbps to enable usage over both
* slow and very fast networks.
*
* @tc_maxrate: maximal tc tx bandwidth indexed by traffic class
*/
struct ieee_maxrate {
__u64 tc_maxrate[IEEE_8021QAZ_MAX_TCS];
};
enum dcbnl_cndd_states {
DCB_CNDD_RESET = 0,
DCB_CNDD_EDGE,
DCB_CNDD_INTERIOR,
DCB_CNDD_INTERIOR_READY,
};
/* This structure contains the IEEE 802.1Qau QCN managed object.
*
*@rpg_enable: enable QCN RP
*@rppp_max_rps: maximum number of RPs allowed for this CNPV on this port
*@rpg_time_reset: time between rate increases if no CNMs received.
* given in u-seconds
*@rpg_byte_reset: transmitted data between rate increases if no CNMs received.
* given in Bytes
*@rpg_threshold: The number of times rpByteStage or rpTimeStage can count
* before RP rate control state machine advances states
*@rpg_max_rate: the maxinun rate, in Mbits per second,
* at which an RP can transmit
*@rpg_ai_rate: The rate, in Mbits per second,
* used to increase rpTargetRate in the RPR_ACTIVE_INCREASE
*@rpg_hai_rate: The rate, in Mbits per second,
* used to increase rpTargetRate in the RPR_HYPER_INCREASE state
*@rpg_gd: Upon CNM receive, flow rate is limited to (Fb/Gd)*CurrentRate.
* rpgGd is given as log2(Gd), where Gd may only be powers of 2
*@rpg_min_dec_fac: The minimum factor by which the current transmit rate
* can be changed by reception of a CNM.
* value is given as percentage (1-100)
*@rpg_min_rate: The minimum value, in bits per second, for rate to limit
*@cndd_state_machine: The state of the congestion notification domain
* defense state machine, as defined by IEEE 802.3Qau
* section 32.1.1. In the interior ready state,
* the QCN capable hardware may add CN-TAG TLV to the
* outgoing traffic, to specifically identify outgoing
* flows.
*/
struct ieee_qcn {
__u8 rpg_enable[IEEE_8021QAZ_MAX_TCS];
__u32 rppp_max_rps[IEEE_8021QAZ_MAX_TCS];
__u32 rpg_time_reset[IEEE_8021QAZ_MAX_TCS];
__u32 rpg_byte_reset[IEEE_8021QAZ_MAX_TCS];
__u32 rpg_threshold[IEEE_8021QAZ_MAX_TCS];
__u32 rpg_max_rate[IEEE_8021QAZ_MAX_TCS];
__u32 rpg_ai_rate[IEEE_8021QAZ_MAX_TCS];
__u32 rpg_hai_rate[IEEE_8021QAZ_MAX_TCS];
__u32 rpg_gd[IEEE_8021QAZ_MAX_TCS];
__u32 rpg_min_dec_fac[IEEE_8021QAZ_MAX_TCS];
__u32 rpg_min_rate[IEEE_8021QAZ_MAX_TCS];
__u32 cndd_state_machine[IEEE_8021QAZ_MAX_TCS];
};
/* This structure contains the IEEE 802.1Qau QCN statistics.
*
*@rppp_rp_centiseconds: the number of RP-centiseconds accumulated
* by RPs at this priority level on this Port
*@rppp_created_rps: number of active RPs(flows) that react to CNMs
*/
struct ieee_qcn_stats {
__u64 rppp_rp_centiseconds[IEEE_8021QAZ_MAX_TCS];
__u32 rppp_created_rps[IEEE_8021QAZ_MAX_TCS];
};
/* This structure contains the IEEE 802.1Qaz PFC managed object
*
* @pfc_cap: Indicates the number of traffic classes on the local device
* that may simultaneously have PFC enabled.
* @pfc_en: bitmap indicating pfc enabled traffic classes
* @mbc: enable macsec bypass capability
* @delay: the allowance made for a round-trip propagation delay of the
* link in bits.
* @requests: count of the sent pfc frames
* @indications: count of the received pfc frames
*/
struct ieee_pfc {
__u8 pfc_cap;
__u8 pfc_en;
__u8 mbc;
__u16 delay;
__u64 requests[IEEE_8021QAZ_MAX_TCS];
__u64 indications[IEEE_8021QAZ_MAX_TCS];
};
#define IEEE_8021Q_MAX_PRIORITIES 8
#define DCBX_MAX_BUFFERS 8
struct dcbnl_buffer {
/* priority to buffer mapping */
__u8 prio2buffer[IEEE_8021Q_MAX_PRIORITIES];
/* buffer size in Bytes */
__u32 buffer_size[DCBX_MAX_BUFFERS];
__u32 total_size;
};
/* CEE DCBX std supported values */
#define CEE_DCBX_MAX_PGS 8
#define CEE_DCBX_MAX_PRIO 8
/**
* struct cee_pg - CEE Priority-Group managed object
*
* @willing: willing bit in the PG tlv
* @error: error bit in the PG tlv
* @pg_en: enable bit of the PG feature
* @tcs_supported: number of traffic classes supported
* @pg_bw: bandwidth percentage for each priority group
* @prio_pg: priority to PG mapping indexed by priority
*/
struct cee_pg {
__u8 willing;
__u8 error;
__u8 pg_en;
__u8 tcs_supported;
__u8 pg_bw[CEE_DCBX_MAX_PGS];
__u8 prio_pg[CEE_DCBX_MAX_PGS];
};
/**
* struct cee_pfc - CEE PFC managed object
*
* @willing: willing bit in the PFC tlv
* @error: error bit in the PFC tlv
* @pfc_en: bitmap indicating pfc enabled traffic classes
* @tcs_supported: number of traffic classes supported
*/
struct cee_pfc {
__u8 willing;
__u8 error;
__u8 pfc_en;
__u8 tcs_supported;
};
/* IEEE 802.1Qaz std supported values */
#define IEEE_8021QAZ_APP_SEL_ETHERTYPE 1
#define IEEE_8021QAZ_APP_SEL_STREAM 2
#define IEEE_8021QAZ_APP_SEL_DGRAM 3
#define IEEE_8021QAZ_APP_SEL_ANY 4
#define IEEE_8021QAZ_APP_SEL_DSCP 5
/* This structure contains the IEEE 802.1Qaz APP managed object. This
* object is also used for the CEE std as well.
*
* @selector: protocol identifier type
* @protocol: protocol of type indicated
* @priority: 3-bit unsigned integer indicating priority for IEEE
* 8-bit 802.1p user priority bitmap for CEE
*
* ----
* Selector field values for IEEE 802.1Qaz
* 0 Reserved
* 1 Ethertype
* 2 Well known port number over TCP or SCTP
* 3 Well known port number over UDP or DCCP
* 4 Well known port number over TCP, SCTP, UDP, or DCCP
* 5 Differentiated Services Code Point (DSCP) value
* 6-7 Reserved
*
* Selector field values for CEE
* 0 Ethertype
* 1 Well known port number over TCP or UDP
* 2-3 Reserved
*/
struct dcb_app {
__u8 selector;
__u8 priority;
__u16 protocol;
};
/**
* struct dcb_peer_app_info - APP feature information sent by the peer
*
* @willing: willing bit in the peer APP tlv
* @error: error bit in the peer APP tlv
*
* In addition to this information the full peer APP tlv also contains
* a table of 'app_count' APP objects defined above.
*/
struct dcb_peer_app_info {
__u8 willing;
__u8 error;
};
struct dcbmsg {
__u8 dcb_family;
__u8 cmd;
__u16 dcb_pad;
};
/**
* enum dcbnl_commands - supported DCB commands
*
* @DCB_CMD_UNDEFINED: unspecified command to catch errors
* @DCB_CMD_GSTATE: request the state of DCB in the device
* @DCB_CMD_SSTATE: set the state of DCB in the device
* @DCB_CMD_PGTX_GCFG: request the priority group configuration for Tx
* @DCB_CMD_PGTX_SCFG: set the priority group configuration for Tx
* @DCB_CMD_PGRX_GCFG: request the priority group configuration for Rx
* @DCB_CMD_PGRX_SCFG: set the priority group configuration for Rx
* @DCB_CMD_PFC_GCFG: request the priority flow control configuration
* @DCB_CMD_PFC_SCFG: set the priority flow control configuration
* @DCB_CMD_SET_ALL: apply all changes to the underlying device
* @DCB_CMD_GPERM_HWADDR: get the permanent MAC address of the underlying
* device. Only useful when using bonding.
* @DCB_CMD_GCAP: request the DCB capabilities of the device
* @DCB_CMD_GNUMTCS: get the number of traffic classes currently supported
* @DCB_CMD_SNUMTCS: set the number of traffic classes
* @DCB_CMD_GBCN: set backward congestion notification configuration
* @DCB_CMD_SBCN: get backward congestion notification configuration.
* @DCB_CMD_GAPP: get application protocol configuration
* @DCB_CMD_SAPP: set application protocol configuration
* @DCB_CMD_IEEE_SET: set IEEE 802.1Qaz configuration
* @DCB_CMD_IEEE_GET: get IEEE 802.1Qaz configuration
* @DCB_CMD_GDCBX: get DCBX engine configuration
* @DCB_CMD_SDCBX: set DCBX engine configuration
* @DCB_CMD_GFEATCFG: get DCBX features flags
* @DCB_CMD_SFEATCFG: set DCBX features negotiation flags
* @DCB_CMD_CEE_GET: get CEE aggregated configuration
* @DCB_CMD_IEEE_DEL: delete IEEE 802.1Qaz configuration
*/
enum dcbnl_commands {
DCB_CMD_UNDEFINED,
DCB_CMD_GSTATE,
DCB_CMD_SSTATE,
DCB_CMD_PGTX_GCFG,
DCB_CMD_PGTX_SCFG,
DCB_CMD_PGRX_GCFG,
DCB_CMD_PGRX_SCFG,
DCB_CMD_PFC_GCFG,
DCB_CMD_PFC_SCFG,
DCB_CMD_SET_ALL,
DCB_CMD_GPERM_HWADDR,
DCB_CMD_GCAP,
DCB_CMD_GNUMTCS,
DCB_CMD_SNUMTCS,
DCB_CMD_PFC_GSTATE,
DCB_CMD_PFC_SSTATE,
DCB_CMD_BCN_GCFG,
DCB_CMD_BCN_SCFG,
DCB_CMD_GAPP,
DCB_CMD_SAPP,
DCB_CMD_IEEE_SET,
DCB_CMD_IEEE_GET,
DCB_CMD_GDCBX,
DCB_CMD_SDCBX,
DCB_CMD_GFEATCFG,
DCB_CMD_SFEATCFG,
DCB_CMD_CEE_GET,
DCB_CMD_IEEE_DEL,
__DCB_CMD_ENUM_MAX,
DCB_CMD_MAX = __DCB_CMD_ENUM_MAX - 1,
};
/**
* enum dcbnl_attrs - DCB top-level netlink attributes
*
* @DCB_ATTR_UNDEFINED: unspecified attribute to catch errors
* @DCB_ATTR_IFNAME: interface name of the underlying device (NLA_STRING)
* @DCB_ATTR_STATE: enable state of DCB in the device (NLA_U8)
* @DCB_ATTR_PFC_STATE: enable state of PFC in the device (NLA_U8)
* @DCB_ATTR_PFC_CFG: priority flow control configuration (NLA_NESTED)
* @DCB_ATTR_NUM_TC: number of traffic classes supported in the device (NLA_U8)
* @DCB_ATTR_PG_CFG: priority group configuration (NLA_NESTED)
* @DCB_ATTR_SET_ALL: bool to commit changes to hardware or not (NLA_U8)
* @DCB_ATTR_PERM_HWADDR: MAC address of the physical device (NLA_NESTED)
* @DCB_ATTR_CAP: DCB capabilities of the device (NLA_NESTED)
* @DCB_ATTR_NUMTCS: number of traffic classes supported (NLA_NESTED)
* @DCB_ATTR_BCN: backward congestion notification configuration (NLA_NESTED)
* @DCB_ATTR_IEEE: IEEE 802.1Qaz supported attributes (NLA_NESTED)
* @DCB_ATTR_DCBX: DCBX engine configuration in the device (NLA_U8)
* @DCB_ATTR_FEATCFG: DCBX features flags (NLA_NESTED)
* @DCB_ATTR_CEE: CEE std supported attributes (NLA_NESTED)
*/
enum dcbnl_attrs {
DCB_ATTR_UNDEFINED,
DCB_ATTR_IFNAME,
DCB_ATTR_STATE,
DCB_ATTR_PFC_STATE,
DCB_ATTR_PFC_CFG,
DCB_ATTR_NUM_TC,
DCB_ATTR_PG_CFG,
DCB_ATTR_SET_ALL,
DCB_ATTR_PERM_HWADDR,
DCB_ATTR_CAP,
DCB_ATTR_NUMTCS,
DCB_ATTR_BCN,
DCB_ATTR_APP,
/* IEEE std attributes */
DCB_ATTR_IEEE,
DCB_ATTR_DCBX,
DCB_ATTR_FEATCFG,
/* CEE nested attributes */
DCB_ATTR_CEE,
__DCB_ATTR_ENUM_MAX,
DCB_ATTR_MAX = __DCB_ATTR_ENUM_MAX - 1,
};
/**
* enum ieee_attrs - IEEE 802.1Qaz get/set attributes
*
* @DCB_ATTR_IEEE_UNSPEC: unspecified
* @DCB_ATTR_IEEE_ETS: negotiated ETS configuration
* @DCB_ATTR_IEEE_PFC: negotiated PFC configuration
* @DCB_ATTR_IEEE_APP_TABLE: negotiated APP configuration
* @DCB_ATTR_IEEE_PEER_ETS: peer ETS configuration - get only
* @DCB_ATTR_IEEE_PEER_PFC: peer PFC configuration - get only
* @DCB_ATTR_IEEE_PEER_APP: peer APP tlv - get only
*/
enum ieee_attrs {
DCB_ATTR_IEEE_UNSPEC,
DCB_ATTR_IEEE_ETS,
DCB_ATTR_IEEE_PFC,
DCB_ATTR_IEEE_APP_TABLE,
DCB_ATTR_IEEE_PEER_ETS,
DCB_ATTR_IEEE_PEER_PFC,
DCB_ATTR_IEEE_PEER_APP,
DCB_ATTR_IEEE_MAXRATE,
DCB_ATTR_IEEE_QCN,
DCB_ATTR_IEEE_QCN_STATS,
DCB_ATTR_DCB_BUFFER,
__DCB_ATTR_IEEE_MAX
};
#define DCB_ATTR_IEEE_MAX (__DCB_ATTR_IEEE_MAX - 1)
enum ieee_attrs_app {
DCB_ATTR_IEEE_APP_UNSPEC,
DCB_ATTR_IEEE_APP,
__DCB_ATTR_IEEE_APP_MAX
};
#define DCB_ATTR_IEEE_APP_MAX (__DCB_ATTR_IEEE_APP_MAX - 1)
/**
* enum cee_attrs - CEE DCBX get attributes.
*
* @DCB_ATTR_CEE_UNSPEC: unspecified
* @DCB_ATTR_CEE_PEER_PG: peer PG configuration - get only
* @DCB_ATTR_CEE_PEER_PFC: peer PFC configuration - get only
* @DCB_ATTR_CEE_PEER_APP_TABLE: peer APP tlv - get only
* @DCB_ATTR_CEE_TX_PG: TX PG configuration (DCB_CMD_PGTX_GCFG)
* @DCB_ATTR_CEE_RX_PG: RX PG configuration (DCB_CMD_PGRX_GCFG)
* @DCB_ATTR_CEE_PFC: PFC configuration (DCB_CMD_PFC_GCFG)
* @DCB_ATTR_CEE_APP_TABLE: APP configuration (multi DCB_CMD_GAPP)
* @DCB_ATTR_CEE_FEAT: DCBX features flags (DCB_CMD_GFEATCFG)
*
* An aggregated collection of the cee std negotiated parameters.
*/
enum cee_attrs {
DCB_ATTR_CEE_UNSPEC,
DCB_ATTR_CEE_PEER_PG,
DCB_ATTR_CEE_PEER_PFC,
DCB_ATTR_CEE_PEER_APP_TABLE,
DCB_ATTR_CEE_TX_PG,
DCB_ATTR_CEE_RX_PG,
DCB_ATTR_CEE_PFC,
DCB_ATTR_CEE_APP_TABLE,
DCB_ATTR_CEE_FEAT,
__DCB_ATTR_CEE_MAX
};
#define DCB_ATTR_CEE_MAX (__DCB_ATTR_CEE_MAX - 1)
enum peer_app_attr {
DCB_ATTR_CEE_PEER_APP_UNSPEC,
DCB_ATTR_CEE_PEER_APP_INFO,
DCB_ATTR_CEE_PEER_APP,
__DCB_ATTR_CEE_PEER_APP_MAX
};
#define DCB_ATTR_CEE_PEER_APP_MAX (__DCB_ATTR_CEE_PEER_APP_MAX - 1)
enum cee_attrs_app {
DCB_ATTR_CEE_APP_UNSPEC,
DCB_ATTR_CEE_APP,
__DCB_ATTR_CEE_APP_MAX
};
#define DCB_ATTR_CEE_APP_MAX (__DCB_ATTR_CEE_APP_MAX - 1)
/**
* enum dcbnl_pfc_attrs - DCB Priority Flow Control user priority nested attrs
*
* @DCB_PFC_UP_ATTR_UNDEFINED: unspecified attribute to catch errors
* @DCB_PFC_UP_ATTR_0: Priority Flow Control value for User Priority 0 (NLA_U8)
* @DCB_PFC_UP_ATTR_1: Priority Flow Control value for User Priority 1 (NLA_U8)
* @DCB_PFC_UP_ATTR_2: Priority Flow Control value for User Priority 2 (NLA_U8)
* @DCB_PFC_UP_ATTR_3: Priority Flow Control value for User Priority 3 (NLA_U8)
* @DCB_PFC_UP_ATTR_4: Priority Flow Control value for User Priority 4 (NLA_U8)
* @DCB_PFC_UP_ATTR_5: Priority Flow Control value for User Priority 5 (NLA_U8)
* @DCB_PFC_UP_ATTR_6: Priority Flow Control value for User Priority 6 (NLA_U8)
* @DCB_PFC_UP_ATTR_7: Priority Flow Control value for User Priority 7 (NLA_U8)
* @DCB_PFC_UP_ATTR_MAX: highest attribute number currently defined
* @DCB_PFC_UP_ATTR_ALL: apply to all priority flow control attrs (NLA_FLAG)
*
*/
enum dcbnl_pfc_up_attrs {
DCB_PFC_UP_ATTR_UNDEFINED,
DCB_PFC_UP_ATTR_0,
DCB_PFC_UP_ATTR_1,
DCB_PFC_UP_ATTR_2,
DCB_PFC_UP_ATTR_3,
DCB_PFC_UP_ATTR_4,
DCB_PFC_UP_ATTR_5,
DCB_PFC_UP_ATTR_6,
DCB_PFC_UP_ATTR_7,
DCB_PFC_UP_ATTR_ALL,
__DCB_PFC_UP_ATTR_ENUM_MAX,
DCB_PFC_UP_ATTR_MAX = __DCB_PFC_UP_ATTR_ENUM_MAX - 1,
};
/**
* enum dcbnl_pg_attrs - DCB Priority Group attributes
*
* @DCB_PG_ATTR_UNDEFINED: unspecified attribute to catch errors
* @DCB_PG_ATTR_TC_0: Priority Group Traffic Class 0 configuration (NLA_NESTED)
* @DCB_PG_ATTR_TC_1: Priority Group Traffic Class 1 configuration (NLA_NESTED)
* @DCB_PG_ATTR_TC_2: Priority Group Traffic Class 2 configuration (NLA_NESTED)
* @DCB_PG_ATTR_TC_3: Priority Group Traffic Class 3 configuration (NLA_NESTED)
* @DCB_PG_ATTR_TC_4: Priority Group Traffic Class 4 configuration (NLA_NESTED)
* @DCB_PG_ATTR_TC_5: Priority Group Traffic Class 5 configuration (NLA_NESTED)
* @DCB_PG_ATTR_TC_6: Priority Group Traffic Class 6 configuration (NLA_NESTED)
* @DCB_PG_ATTR_TC_7: Priority Group Traffic Class 7 configuration (NLA_NESTED)
* @DCB_PG_ATTR_TC_MAX: highest attribute number currently defined
* @DCB_PG_ATTR_TC_ALL: apply to all traffic classes (NLA_NESTED)
* @DCB_PG_ATTR_BW_ID_0: Percent of link bandwidth for Priority Group 0 (NLA_U8)
* @DCB_PG_ATTR_BW_ID_1: Percent of link bandwidth for Priority Group 1 (NLA_U8)
* @DCB_PG_ATTR_BW_ID_2: Percent of link bandwidth for Priority Group 2 (NLA_U8)
* @DCB_PG_ATTR_BW_ID_3: Percent of link bandwidth for Priority Group 3 (NLA_U8)
* @DCB_PG_ATTR_BW_ID_4: Percent of link bandwidth for Priority Group 4 (NLA_U8)
* @DCB_PG_ATTR_BW_ID_5: Percent of link bandwidth for Priority Group 5 (NLA_U8)
* @DCB_PG_ATTR_BW_ID_6: Percent of link bandwidth for Priority Group 6 (NLA_U8)
* @DCB_PG_ATTR_BW_ID_7: Percent of link bandwidth for Priority Group 7 (NLA_U8)
* @DCB_PG_ATTR_BW_ID_MAX: highest attribute number currently defined
* @DCB_PG_ATTR_BW_ID_ALL: apply to all priority groups (NLA_FLAG)
*
*/
enum dcbnl_pg_attrs {
DCB_PG_ATTR_UNDEFINED,
DCB_PG_ATTR_TC_0,
DCB_PG_ATTR_TC_1,
DCB_PG_ATTR_TC_2,
DCB_PG_ATTR_TC_3,
DCB_PG_ATTR_TC_4,
DCB_PG_ATTR_TC_5,
DCB_PG_ATTR_TC_6,
DCB_PG_ATTR_TC_7,
DCB_PG_ATTR_TC_MAX,
DCB_PG_ATTR_TC_ALL,
DCB_PG_ATTR_BW_ID_0,
DCB_PG_ATTR_BW_ID_1,
DCB_PG_ATTR_BW_ID_2,
DCB_PG_ATTR_BW_ID_3,
DCB_PG_ATTR_BW_ID_4,
DCB_PG_ATTR_BW_ID_5,
DCB_PG_ATTR_BW_ID_6,
DCB_PG_ATTR_BW_ID_7,
DCB_PG_ATTR_BW_ID_MAX,
DCB_PG_ATTR_BW_ID_ALL,
__DCB_PG_ATTR_ENUM_MAX,
DCB_PG_ATTR_MAX = __DCB_PG_ATTR_ENUM_MAX - 1,
};
/**
* enum dcbnl_tc_attrs - DCB Traffic Class attributes
*
* @DCB_TC_ATTR_PARAM_UNDEFINED: unspecified attribute to catch errors
* @DCB_TC_ATTR_PARAM_PGID: (NLA_U8) Priority group the traffic class belongs to
* Valid values are: 0-7
* @DCB_TC_ATTR_PARAM_UP_MAPPING: (NLA_U8) Traffic class to user priority map
* Some devices may not support changing the
* user priority map of a TC.
* @DCB_TC_ATTR_PARAM_STRICT_PRIO: (NLA_U8) Strict priority setting
* 0 - none
* 1 - group strict
* 2 - link strict
* @DCB_TC_ATTR_PARAM_BW_PCT: optional - (NLA_U8) If supported by the device and
* not configured to use link strict priority,
* this is the percentage of bandwidth of the
* priority group this traffic class belongs to
* @DCB_TC_ATTR_PARAM_ALL: (NLA_FLAG) all traffic class parameters
*
*/
enum dcbnl_tc_attrs {
DCB_TC_ATTR_PARAM_UNDEFINED,
DCB_TC_ATTR_PARAM_PGID,
DCB_TC_ATTR_PARAM_UP_MAPPING,
DCB_TC_ATTR_PARAM_STRICT_PRIO,
DCB_TC_ATTR_PARAM_BW_PCT,
DCB_TC_ATTR_PARAM_ALL,
__DCB_TC_ATTR_PARAM_ENUM_MAX,
DCB_TC_ATTR_PARAM_MAX = __DCB_TC_ATTR_PARAM_ENUM_MAX - 1,
};
/**
* enum dcbnl_cap_attrs - DCB Capability attributes
*
* @DCB_CAP_ATTR_UNDEFINED: unspecified attribute to catch errors
* @DCB_CAP_ATTR_ALL: (NLA_FLAG) all capability parameters
* @DCB_CAP_ATTR_PG: (NLA_U8) device supports Priority Groups
* @DCB_CAP_ATTR_PFC: (NLA_U8) device supports Priority Flow Control
* @DCB_CAP_ATTR_UP2TC: (NLA_U8) device supports user priority to
* traffic class mapping
* @DCB_CAP_ATTR_PG_TCS: (NLA_U8) bitmap where each bit represents a
* number of traffic classes the device
* can be configured to use for Priority Groups
* @DCB_CAP_ATTR_PFC_TCS: (NLA_U8) bitmap where each bit represents a
* number of traffic classes the device can be
* configured to use for Priority Flow Control
* @DCB_CAP_ATTR_GSP: (NLA_U8) device supports group strict priority
* @DCB_CAP_ATTR_BCN: (NLA_U8) device supports Backwards Congestion
* Notification
* @DCB_CAP_ATTR_DCBX: (NLA_U8) device supports DCBX engine
*
*/
enum dcbnl_cap_attrs {
DCB_CAP_ATTR_UNDEFINED,
DCB_CAP_ATTR_ALL,
DCB_CAP_ATTR_PG,
DCB_CAP_ATTR_PFC,
DCB_CAP_ATTR_UP2TC,
DCB_CAP_ATTR_PG_TCS,
DCB_CAP_ATTR_PFC_TCS,
DCB_CAP_ATTR_GSP,
DCB_CAP_ATTR_BCN,
DCB_CAP_ATTR_DCBX,
__DCB_CAP_ATTR_ENUM_MAX,
DCB_CAP_ATTR_MAX = __DCB_CAP_ATTR_ENUM_MAX - 1,
};
/**
* DCBX capability flags
*
* @DCB_CAP_DCBX_HOST: DCBX negotiation is performed by the host LLDP agent.
* 'set' routines are used to configure the device with
* the negotiated parameters
*
* @DCB_CAP_DCBX_LLD_MANAGED: DCBX negotiation is not performed in the host but
* by another entity
* 'get' routines are used to retrieve the
* negotiated parameters
* 'set' routines can be used to set the initial
* negotiation configuration
*
* @DCB_CAP_DCBX_VER_CEE: for a non-host DCBX engine, indicates the engine
* supports the CEE protocol flavor
*
* @DCB_CAP_DCBX_VER_IEEE: for a non-host DCBX engine, indicates the engine
* supports the IEEE protocol flavor
*
* @DCB_CAP_DCBX_STATIC: for a non-host DCBX engine, indicates the engine
* supports static configuration (i.e no actual
* negotiation is performed negotiated parameters equal
* the initial configuration)
*
*/
#define DCB_CAP_DCBX_HOST 0x01
#define DCB_CAP_DCBX_LLD_MANAGED 0x02
#define DCB_CAP_DCBX_VER_CEE 0x04
#define DCB_CAP_DCBX_VER_IEEE 0x08
#define DCB_CAP_DCBX_STATIC 0x10
/**
* enum dcbnl_numtcs_attrs - number of traffic classes
*
* @DCB_NUMTCS_ATTR_UNDEFINED: unspecified attribute to catch errors
* @DCB_NUMTCS_ATTR_ALL: (NLA_FLAG) all traffic class attributes
* @DCB_NUMTCS_ATTR_PG: (NLA_U8) number of traffic classes used for
* priority groups
* @DCB_NUMTCS_ATTR_PFC: (NLA_U8) number of traffic classes which can
* support priority flow control
*/
enum dcbnl_numtcs_attrs {
DCB_NUMTCS_ATTR_UNDEFINED,
DCB_NUMTCS_ATTR_ALL,
DCB_NUMTCS_ATTR_PG,
DCB_NUMTCS_ATTR_PFC,
__DCB_NUMTCS_ATTR_ENUM_MAX,
DCB_NUMTCS_ATTR_MAX = __DCB_NUMTCS_ATTR_ENUM_MAX - 1,
};
enum dcbnl_bcn_attrs{
DCB_BCN_ATTR_UNDEFINED = 0,
DCB_BCN_ATTR_RP_0,
DCB_BCN_ATTR_RP_1,
DCB_BCN_ATTR_RP_2,
DCB_BCN_ATTR_RP_3,
DCB_BCN_ATTR_RP_4,
DCB_BCN_ATTR_RP_5,
DCB_BCN_ATTR_RP_6,
DCB_BCN_ATTR_RP_7,
DCB_BCN_ATTR_RP_ALL,
DCB_BCN_ATTR_BCNA_0,
DCB_BCN_ATTR_BCNA_1,
DCB_BCN_ATTR_ALPHA,
DCB_BCN_ATTR_BETA,
DCB_BCN_ATTR_GD,
DCB_BCN_ATTR_GI,
DCB_BCN_ATTR_TMAX,
DCB_BCN_ATTR_TD,
DCB_BCN_ATTR_RMIN,
DCB_BCN_ATTR_W,
DCB_BCN_ATTR_RD,
DCB_BCN_ATTR_RU,
DCB_BCN_ATTR_WRTT,
DCB_BCN_ATTR_RI,
DCB_BCN_ATTR_C,
DCB_BCN_ATTR_ALL,
__DCB_BCN_ATTR_ENUM_MAX,
DCB_BCN_ATTR_MAX = __DCB_BCN_ATTR_ENUM_MAX - 1,
};
/**
* enum dcb_general_attr_values - general DCB attribute values
*
* @DCB_ATTR_UNDEFINED: value used to indicate an attribute is not supported
*
*/
enum dcb_general_attr_values {
DCB_ATTR_VALUE_UNDEFINED = 0xff
};
#define DCB_APP_IDTYPE_ETHTYPE 0x00
#define DCB_APP_IDTYPE_PORTNUM 0x01
enum dcbnl_app_attrs {
DCB_APP_ATTR_UNDEFINED,
DCB_APP_ATTR_IDTYPE,
DCB_APP_ATTR_ID,
DCB_APP_ATTR_PRIORITY,
__DCB_APP_ATTR_ENUM_MAX,
DCB_APP_ATTR_MAX = __DCB_APP_ATTR_ENUM_MAX - 1,
};
/**
* enum dcbnl_featcfg_attrs - features conifiguration flags
*
* @DCB_FEATCFG_ATTR_UNDEFINED: unspecified attribute to catch errors
* @DCB_FEATCFG_ATTR_ALL: (NLA_FLAG) all features configuration attributes
* @DCB_FEATCFG_ATTR_PG: (NLA_U8) configuration flags for priority groups
* @DCB_FEATCFG_ATTR_PFC: (NLA_U8) configuration flags for priority
* flow control
* @DCB_FEATCFG_ATTR_APP: (NLA_U8) configuration flags for application TLV
*
*/
#define DCB_FEATCFG_ERROR 0x01 /* error in feature resolution */
#define DCB_FEATCFG_ENABLE 0x02 /* enable feature */
#define DCB_FEATCFG_WILLING 0x04 /* feature is willing */
#define DCB_FEATCFG_ADVERTISE 0x08 /* advertise feature */
enum dcbnl_featcfg_attrs {
DCB_FEATCFG_ATTR_UNDEFINED,
DCB_FEATCFG_ATTR_ALL,
DCB_FEATCFG_ATTR_PG,
DCB_FEATCFG_ATTR_PFC,
DCB_FEATCFG_ATTR_APP,
__DCB_FEATCFG_ATTR_ENUM_MAX,
DCB_FEATCFG_ATTR_MAX = __DCB_FEATCFG_ATTR_ENUM_MAX - 1,
};
#endif /* __LINUX_DCBNL_H__ */

View File

@ -13,6 +13,8 @@
#ifndef _LINUX_DEVLINK_H_
#define _LINUX_DEVLINK_H_
#include <linux/const.h>
#define DEVLINK_GENL_NAME "devlink"
#define DEVLINK_GENL_VERSION 0x1
#define DEVLINK_GENL_MCGRP_CONFIG_NAME "config"
@ -89,6 +91,46 @@ enum devlink_command {
DEVLINK_CMD_REGION_DEL,
DEVLINK_CMD_REGION_READ,
DEVLINK_CMD_PORT_PARAM_GET, /* can dump */
DEVLINK_CMD_PORT_PARAM_SET,
DEVLINK_CMD_PORT_PARAM_NEW,
DEVLINK_CMD_PORT_PARAM_DEL,
DEVLINK_CMD_INFO_GET, /* can dump */
DEVLINK_CMD_HEALTH_REPORTER_GET,
DEVLINK_CMD_HEALTH_REPORTER_SET,
DEVLINK_CMD_HEALTH_REPORTER_RECOVER,
DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE,
DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET,
DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR,
DEVLINK_CMD_FLASH_UPDATE,
DEVLINK_CMD_FLASH_UPDATE_END, /* notification only */
DEVLINK_CMD_FLASH_UPDATE_STATUS, /* notification only */
DEVLINK_CMD_TRAP_GET, /* can dump */
DEVLINK_CMD_TRAP_SET,
DEVLINK_CMD_TRAP_NEW,
DEVLINK_CMD_TRAP_DEL,
DEVLINK_CMD_TRAP_GROUP_GET, /* can dump */
DEVLINK_CMD_TRAP_GROUP_SET,
DEVLINK_CMD_TRAP_GROUP_NEW,
DEVLINK_CMD_TRAP_GROUP_DEL,
DEVLINK_CMD_TRAP_POLICER_GET, /* can dump */
DEVLINK_CMD_TRAP_POLICER_SET,
DEVLINK_CMD_TRAP_POLICER_NEW,
DEVLINK_CMD_TRAP_POLICER_DEL,
DEVLINK_CMD_HEALTH_REPORTER_TEST,
DEVLINK_CMD_RATE_GET, /* can dump */
DEVLINK_CMD_RATE_SET,
DEVLINK_CMD_RATE_NEW,
DEVLINK_CMD_RATE_DEL,
/* add new commands above here */
__DEVLINK_CMD_MAX,
DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
@ -151,6 +193,27 @@ enum devlink_port_flavour {
DEVLINK_PORT_FLAVOUR_DSA, /* Distributed switch architecture
* interconnect port.
*/
DEVLINK_PORT_FLAVOUR_PCI_PF, /* Represents eswitch port for
* the PCI PF. It is an internal
* port that faces the PCI PF.
*/
DEVLINK_PORT_FLAVOUR_PCI_VF, /* Represents eswitch port
* for the PCI VF. It is an internal
* port that faces the PCI VF.
*/
DEVLINK_PORT_FLAVOUR_VIRTUAL, /* Any virtual port facing the user. */
DEVLINK_PORT_FLAVOUR_UNUSED, /* Port which exists in the switch, but
* is not used in any way.
*/
DEVLINK_PORT_FLAVOUR_PCI_SF, /* Represents eswitch port
* for the PCI SF. It is an internal
* port that faces the PCI SF.
*/
};
enum devlink_rate_type {
DEVLINK_RATE_TYPE_LEAF,
DEVLINK_RATE_TYPE_NODE,
};
enum devlink_param_cmode {
@ -163,6 +226,118 @@ enum devlink_param_cmode {
DEVLINK_PARAM_CMODE_MAX = __DEVLINK_PARAM_CMODE_MAX - 1
};
enum devlink_param_fw_load_policy_value {
DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_DRIVER,
DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_FLASH,
DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_DISK,
DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_UNKNOWN,
};
enum devlink_param_reset_dev_on_drv_probe_value {
DEVLINK_PARAM_RESET_DEV_ON_DRV_PROBE_VALUE_UNKNOWN,
DEVLINK_PARAM_RESET_DEV_ON_DRV_PROBE_VALUE_ALWAYS,
DEVLINK_PARAM_RESET_DEV_ON_DRV_PROBE_VALUE_NEVER,
DEVLINK_PARAM_RESET_DEV_ON_DRV_PROBE_VALUE_DISK,
};
enum {
DEVLINK_ATTR_STATS_RX_PACKETS, /* u64 */
DEVLINK_ATTR_STATS_RX_BYTES, /* u64 */
DEVLINK_ATTR_STATS_RX_DROPPED, /* u64 */
__DEVLINK_ATTR_STATS_MAX,
DEVLINK_ATTR_STATS_MAX = __DEVLINK_ATTR_STATS_MAX - 1
};
/* Specify what sections of a flash component can be overwritten when
* performing an update. Overwriting of firmware binary sections is always
* implicitly assumed to be allowed.
*
* Each section must be documented in
* Documentation/networking/devlink/devlink-flash.rst
*
*/
enum {
DEVLINK_FLASH_OVERWRITE_SETTINGS_BIT,
DEVLINK_FLASH_OVERWRITE_IDENTIFIERS_BIT,
__DEVLINK_FLASH_OVERWRITE_MAX_BIT,
DEVLINK_FLASH_OVERWRITE_MAX_BIT = __DEVLINK_FLASH_OVERWRITE_MAX_BIT - 1
};
#define DEVLINK_FLASH_OVERWRITE_SETTINGS _BITUL(DEVLINK_FLASH_OVERWRITE_SETTINGS_BIT)
#define DEVLINK_FLASH_OVERWRITE_IDENTIFIERS _BITUL(DEVLINK_FLASH_OVERWRITE_IDENTIFIERS_BIT)
#define DEVLINK_SUPPORTED_FLASH_OVERWRITE_SECTIONS \
(_BITUL(__DEVLINK_FLASH_OVERWRITE_MAX_BIT) - 1)
/**
* enum devlink_trap_action - Packet trap action.
* @DEVLINK_TRAP_ACTION_DROP: Packet is dropped by the device and a copy is not
* sent to the CPU.
* @DEVLINK_TRAP_ACTION_TRAP: The sole copy of the packet is sent to the CPU.
* @DEVLINK_TRAP_ACTION_MIRROR: Packet is forwarded by the device and a copy is
* sent to the CPU.
*/
enum devlink_trap_action {
DEVLINK_TRAP_ACTION_DROP,
DEVLINK_TRAP_ACTION_TRAP,
DEVLINK_TRAP_ACTION_MIRROR,
};
/**
* enum devlink_trap_type - Packet trap type.
* @DEVLINK_TRAP_TYPE_DROP: Trap reason is a drop. Trapped packets are only
* processed by devlink and not injected to the
* kernel's Rx path.
* @DEVLINK_TRAP_TYPE_EXCEPTION: Trap reason is an exception. Packet was not
* forwarded as intended due to an exception
* (e.g., missing neighbour entry) and trapped to
* control plane for resolution. Trapped packets
* are processed by devlink and injected to
* the kernel's Rx path.
* @DEVLINK_TRAP_TYPE_CONTROL: Packet was trapped because it is required for
* the correct functioning of the control plane.
* For example, an ARP request packet. Trapped
* packets are injected to the kernel's Rx path,
* but not reported to drop monitor.
*/
enum devlink_trap_type {
DEVLINK_TRAP_TYPE_DROP,
DEVLINK_TRAP_TYPE_EXCEPTION,
DEVLINK_TRAP_TYPE_CONTROL,
};
enum {
/* Trap can report input port as metadata */
DEVLINK_ATTR_TRAP_METADATA_TYPE_IN_PORT,
/* Trap can report flow action cookie as metadata */
DEVLINK_ATTR_TRAP_METADATA_TYPE_FA_COOKIE,
};
enum devlink_reload_action {
DEVLINK_RELOAD_ACTION_UNSPEC,
DEVLINK_RELOAD_ACTION_DRIVER_REINIT, /* Driver entities re-instantiation */
DEVLINK_RELOAD_ACTION_FW_ACTIVATE, /* FW activate */
/* Add new reload actions above */
__DEVLINK_RELOAD_ACTION_MAX,
DEVLINK_RELOAD_ACTION_MAX = __DEVLINK_RELOAD_ACTION_MAX - 1
};
enum devlink_reload_limit {
DEVLINK_RELOAD_LIMIT_UNSPEC, /* unspecified, no constraints */
DEVLINK_RELOAD_LIMIT_NO_RESET, /* No reset allowed, no down time allowed,
* no link flap and no configuration is lost.
*/
/* Add new reload limit above */
__DEVLINK_RELOAD_LIMIT_MAX,
DEVLINK_RELOAD_LIMIT_MAX = __DEVLINK_RELOAD_LIMIT_MAX - 1
};
#define DEVLINK_RELOAD_LIMITS_VALID_MASK (_BITUL(__DEVLINK_RELOAD_LIMIT_MAX) - 1)
enum devlink_attr {
/* don't change the order or add anything between, this is ABI! */
DEVLINK_ATTR_UNSPEC,
@ -280,6 +455,104 @@ enum devlink_attr {
DEVLINK_ATTR_REGION_CHUNK_ADDR, /* u64 */
DEVLINK_ATTR_REGION_CHUNK_LEN, /* u64 */
DEVLINK_ATTR_INFO_DRIVER_NAME, /* string */
DEVLINK_ATTR_INFO_SERIAL_NUMBER, /* string */
DEVLINK_ATTR_INFO_VERSION_FIXED, /* nested */
DEVLINK_ATTR_INFO_VERSION_RUNNING, /* nested */
DEVLINK_ATTR_INFO_VERSION_STORED, /* nested */
DEVLINK_ATTR_INFO_VERSION_NAME, /* string */
DEVLINK_ATTR_INFO_VERSION_VALUE, /* string */
DEVLINK_ATTR_SB_POOL_CELL_SIZE, /* u32 */
DEVLINK_ATTR_FMSG, /* nested */
DEVLINK_ATTR_FMSG_OBJ_NEST_START, /* flag */
DEVLINK_ATTR_FMSG_PAIR_NEST_START, /* flag */
DEVLINK_ATTR_FMSG_ARR_NEST_START, /* flag */
DEVLINK_ATTR_FMSG_NEST_END, /* flag */
DEVLINK_ATTR_FMSG_OBJ_NAME, /* string */
DEVLINK_ATTR_FMSG_OBJ_VALUE_TYPE, /* u8 */
DEVLINK_ATTR_FMSG_OBJ_VALUE_DATA, /* dynamic */
DEVLINK_ATTR_HEALTH_REPORTER, /* nested */
DEVLINK_ATTR_HEALTH_REPORTER_NAME, /* string */
DEVLINK_ATTR_HEALTH_REPORTER_STATE, /* u8 */
DEVLINK_ATTR_HEALTH_REPORTER_ERR_COUNT, /* u64 */
DEVLINK_ATTR_HEALTH_REPORTER_RECOVER_COUNT, /* u64 */
DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS, /* u64 */
DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD, /* u64 */
DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER, /* u8 */
DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME, /* string */
DEVLINK_ATTR_FLASH_UPDATE_COMPONENT, /* string */
DEVLINK_ATTR_FLASH_UPDATE_STATUS_MSG, /* string */
DEVLINK_ATTR_FLASH_UPDATE_STATUS_DONE, /* u64 */
DEVLINK_ATTR_FLASH_UPDATE_STATUS_TOTAL, /* u64 */
DEVLINK_ATTR_PORT_PCI_PF_NUMBER, /* u16 */
DEVLINK_ATTR_PORT_PCI_VF_NUMBER, /* u16 */
DEVLINK_ATTR_STATS, /* nested */
DEVLINK_ATTR_TRAP_NAME, /* string */
/* enum devlink_trap_action */
DEVLINK_ATTR_TRAP_ACTION, /* u8 */
/* enum devlink_trap_type */
DEVLINK_ATTR_TRAP_TYPE, /* u8 */
DEVLINK_ATTR_TRAP_GENERIC, /* flag */
DEVLINK_ATTR_TRAP_METADATA, /* nested */
DEVLINK_ATTR_TRAP_GROUP_NAME, /* string */
DEVLINK_ATTR_RELOAD_FAILED, /* u8 0 or 1 */
DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS_NS, /* u64 */
DEVLINK_ATTR_NETNS_FD, /* u32 */
DEVLINK_ATTR_NETNS_PID, /* u32 */
DEVLINK_ATTR_NETNS_ID, /* u32 */
DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP, /* u8 */
DEVLINK_ATTR_TRAP_POLICER_ID, /* u32 */
DEVLINK_ATTR_TRAP_POLICER_RATE, /* u64 */
DEVLINK_ATTR_TRAP_POLICER_BURST, /* u64 */
DEVLINK_ATTR_PORT_FUNCTION, /* nested */
DEVLINK_ATTR_INFO_BOARD_SERIAL_NUMBER, /* string */
DEVLINK_ATTR_PORT_LANES, /* u32 */
DEVLINK_ATTR_PORT_SPLITTABLE, /* u8 */
DEVLINK_ATTR_PORT_EXTERNAL, /* u8 */
DEVLINK_ATTR_PORT_CONTROLLER_NUMBER, /* u32 */
DEVLINK_ATTR_FLASH_UPDATE_STATUS_TIMEOUT, /* u64 */
DEVLINK_ATTR_FLASH_UPDATE_OVERWRITE_MASK, /* bitfield32 */
DEVLINK_ATTR_RELOAD_ACTION, /* u8 */
DEVLINK_ATTR_RELOAD_ACTIONS_PERFORMED, /* bitfield32 */
DEVLINK_ATTR_RELOAD_LIMITS, /* bitfield32 */
DEVLINK_ATTR_DEV_STATS, /* nested */
DEVLINK_ATTR_RELOAD_STATS, /* nested */
DEVLINK_ATTR_RELOAD_STATS_ENTRY, /* nested */
DEVLINK_ATTR_RELOAD_STATS_LIMIT, /* u8 */
DEVLINK_ATTR_RELOAD_STATS_VALUE, /* u32 */
DEVLINK_ATTR_REMOTE_RELOAD_STATS, /* nested */
DEVLINK_ATTR_RELOAD_ACTION_INFO, /* nested */
DEVLINK_ATTR_RELOAD_ACTION_STATS, /* nested */
DEVLINK_ATTR_PORT_PCI_SF_NUMBER, /* u32 */
DEVLINK_ATTR_RATE_TYPE, /* u16 */
DEVLINK_ATTR_RATE_TX_SHARE, /* u64 */
DEVLINK_ATTR_RATE_TX_MAX, /* u64 */
DEVLINK_ATTR_RATE_NODE_NAME, /* string */
DEVLINK_ATTR_RATE_PARENT_NODE_NAME, /* string */
DEVLINK_ATTR_REGION_MAX_SNAPSHOTS, /* u32 */
/* add new attributes above here, update the policy in devlink.c */
__DEVLINK_ATTR_MAX,
@ -326,4 +599,32 @@ enum devlink_resource_unit {
DEVLINK_RESOURCE_UNIT_ENTRY,
};
enum devlink_port_function_attr {
DEVLINK_PORT_FUNCTION_ATTR_UNSPEC,
DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR, /* binary */
DEVLINK_PORT_FN_ATTR_STATE, /* u8 */
DEVLINK_PORT_FN_ATTR_OPSTATE, /* u8 */
__DEVLINK_PORT_FUNCTION_ATTR_MAX,
DEVLINK_PORT_FUNCTION_ATTR_MAX = __DEVLINK_PORT_FUNCTION_ATTR_MAX - 1
};
enum devlink_port_fn_state {
DEVLINK_PORT_FN_STATE_INACTIVE,
DEVLINK_PORT_FN_STATE_ACTIVE,
};
/**
* enum devlink_port_fn_opstate - indicates operational state of the function
* @DEVLINK_PORT_FN_OPSTATE_ATTACHED: Driver is attached to the function.
* For graceful tear down of the function, after inactivation of the
* function, user should wait for operational state to turn DETACHED.
* @DEVLINK_PORT_FN_OPSTATE_DETACHED: Driver is detached from the function.
* It is safe to delete the port.
*/
enum devlink_port_fn_opstate {
DEVLINK_PORT_FN_OPSTATE_DETACHED,
DEVLINK_PORT_FN_OPSTATE_ATTACHED,
};
#endif /* _LINUX_DEVLINK_H_ */

View File

@ -34,15 +34,23 @@
#define EM_M32R 88 /* Renesas M32R */
#define EM_MN10300 89 /* Panasonic/MEI MN10300, AM33 */
#define EM_OPENRISC 92 /* OpenRISC 32-bit embedded processor */
#define EM_ARCOMPACT 93 /* ARCompact processor */
#define EM_XTENSA 94 /* Tensilica Xtensa Architecture */
#define EM_BLACKFIN 106 /* ADI Blackfin Processor */
#define EM_UNICORE 110 /* UniCore-32 */
#define EM_ALTERA_NIOS2 113 /* Altera Nios II soft-core processor */
#define EM_TI_C6000 140 /* TI C6X DSPs */
#define EM_HEXAGON 164 /* QUALCOMM Hexagon */
#define EM_NDS32 167 /* Andes Technology compact code size
embedded RISC processor family */
#define EM_AARCH64 183 /* ARM 64 bit */
#define EM_TILEPRO 188 /* Tilera TILEPro */
#define EM_MICROBLAZE 189 /* Xilinx MicroBlaze */
#define EM_TILEGX 191 /* Tilera TILE-Gx */
#define EM_ARCV2 195 /* ARCv2 Cores */
#define EM_RISCV 243 /* RISC-V */
#define EM_BPF 247 /* Linux BPF - in-kernel virtual machine */
#define EM_CSKY 252 /* C-SKY */
#define EM_FRV 0x5441 /* Fujitsu FR-V */
/*

View File

@ -16,6 +16,12 @@ enum {
FOU_ATTR_IPPROTO, /* u8 */
FOU_ATTR_TYPE, /* u8 */
FOU_ATTR_REMCSUM_NOPARTIAL, /* flag */
FOU_ATTR_LOCAL_V4, /* u32 */
FOU_ATTR_LOCAL_V6, /* in6_addr */
FOU_ATTR_PEER_V4, /* u32 */
FOU_ATTR_PEER_V6, /* in6_addr */
FOU_ATTR_PEER_PORT, /* u16 */
FOU_ATTR_IFINDEX, /* s32 */
__FOU_ATTR_MAX,
};

View File

@ -13,6 +13,7 @@ enum {
TCA_STATS_RATE_EST64,
TCA_STATS_PAD,
TCA_STATS_BASIC_HW,
TCA_STATS_PKT64,
__TCA_STATS_MAX,
};
#define TCA_STATS_MAX (__TCA_STATS_MAX - 1)
@ -26,10 +27,6 @@ struct gnet_stats_basic {
__u64 bytes;
__u32 packets;
};
struct gnet_stats_basic_packed {
__u64 bytes;
__u32 packets;
} __attribute__ ((packed));
/**
* struct gnet_stats_rate_est - rate estimator

View File

@ -48,6 +48,7 @@ enum {
CTRL_CMD_NEWMCAST_GRP,
CTRL_CMD_DELMCAST_GRP,
CTRL_CMD_GETMCAST_GRP, /* unused */
CTRL_CMD_GETPOLICY,
__CTRL_CMD_MAX,
};
@ -62,6 +63,9 @@ enum {
CTRL_ATTR_MAXATTR,
CTRL_ATTR_OPS,
CTRL_ATTR_MCAST_GROUPS,
CTRL_ATTR_POLICY,
CTRL_ATTR_OP_POLICY,
CTRL_ATTR_OP,
__CTRL_ATTR_MAX,
};
@ -83,6 +87,15 @@ enum {
__CTRL_ATTR_MCAST_GRP_MAX,
};
enum {
CTRL_ATTR_POLICY_UNSPEC,
CTRL_ATTR_POLICY_DO,
CTRL_ATTR_POLICY_DUMP,
__CTRL_ATTR_POLICY_DUMP_MAX,
CTRL_ATTR_POLICY_DUMP_MAX = __CTRL_ATTR_POLICY_DUMP_MAX - 1
};
#define CTRL_ATTR_MCAST_GRP_MAX (__CTRL_ATTR_MCAST_GRP_MAX - 1)

View File

@ -79,6 +79,15 @@ typedef struct {
unsigned int timeout;
} cisco_proto;
typedef struct {
unsigned short dce; /* 1 for DCE (network side) operation */
unsigned int modulo; /* modulo (8 = basic / 128 = extended) */
unsigned int window; /* frame window size */
unsigned int t1; /* timeout t1 */
unsigned int t2; /* timeout t2 */
unsigned int n2; /* frame retry counter */
} x25_hdlc_proto;
/* PPP doesn't need any info now - supply length = 0 to ioctl */
#endif /* __ASSEMBLY__ */

View File

@ -68,6 +68,7 @@ struct icmp6hdr {
#define icmp6_mtu icmp6_dataun.un_data32[0]
#define icmp6_unused icmp6_dataun.un_data32[0]
#define icmp6_maxdelay icmp6_dataun.un_data16[0]
#define icmp6_datagram_len icmp6_dataun.un_data8[0]
#define icmp6_router icmp6_dataun.u_nd_advt.router
#define icmp6_solicited icmp6_dataun.u_nd_advt.solicited
#define icmp6_override icmp6_dataun.u_nd_advt.override
@ -90,6 +91,8 @@ struct icmp6hdr {
#define ICMPV6_TIME_EXCEED 3
#define ICMPV6_PARAMPROB 4
#define ICMPV6_ERRMSG_MAX 127
#define ICMPV6_INFOMSG_MASK 0x80
#define ICMPV6_ECHO_REQUEST 128
@ -108,6 +111,10 @@ struct icmp6hdr {
#define ICMPV6_MOBILE_PREFIX_SOL 146
#define ICMPV6_MOBILE_PREFIX_ADV 147
#define ICMPV6_MRDISC_ADV 151
#define ICMPV6_MSG_MAX 255
/*
* Codes for Destination Unreachable
*/
@ -131,7 +138,11 @@ struct icmp6hdr {
#define ICMPV6_HDR_FIELD 0
#define ICMPV6_UNK_NEXTHDR 1
#define ICMPV6_UNK_OPTION 2
#define ICMPV6_HDR_INCOMP 3
/* Codes for EXT_ECHO (PROBE) */
#define ICMPV6_EXT_ECHO_REQUEST 160
#define ICMPV6_EXT_ECHO_REPLY 161
/*
* constants for (set|get)sockopt
*/

View File

@ -31,6 +31,7 @@
#define IFNAMSIZ 16
#endif /* __UAPI_DEF_IF_IFNAMSIZ */
#define IFALIASZ 256
#define ALTIFNAMSIZ 128
#include <linux/hdlc/ioctl.h>
/* For glibc compatibility. An empty enum does not compile. */
@ -175,6 +176,7 @@ enum {
enum {
IF_LINK_MODE_DEFAULT,
IF_LINK_MODE_DORMANT, /* limit upward transition to dormant */
IF_LINK_MODE_TESTING, /* limit upward transition to testing */
};
/*
@ -210,6 +212,7 @@ struct if_settings {
fr_proto *fr;
fr_proto_pvc *fr_pvc;
fr_proto_pvc_info *fr_pvc_info;
x25_hdlc_proto *x25;
/* interface settings */
sync_serial_settings *sync;

View File

@ -24,6 +24,22 @@ struct sockaddr_alg {
__u8 salg_name[64];
};
/*
* Linux v4.12 and later removed the 64-byte limit on salg_name[]; it's now an
* arbitrary-length field. We had to keep the original struct above for source
* compatibility with existing userspace programs, though. Use the new struct
* below if support for very long algorithm names is needed. To do this,
* allocate 'sizeof(struct sockaddr_alg_new) + strlen(algname) + 1' bytes, and
* copy algname (including the null terminator) into salg_name.
*/
struct sockaddr_alg_new {
__u16 salg_family;
__u8 salg_type[14];
__u32 salg_feat;
__u32 salg_mask;
__u8 salg_name[];
};
struct af_alg_iv {
__u32 ivlen;
__u8 iv[0];
@ -35,6 +51,7 @@ struct af_alg_iv {
#define ALG_SET_OP 3
#define ALG_SET_AEAD_ASSOCLEN 4
#define ALG_SET_AEAD_AUTHSIZE 5
#define ALG_SET_DRBG_ENTROPY 6
/* Operations */
#define ALG_OP_DECRYPT 0

View File

@ -54,6 +54,7 @@
#define ARPHRD_X25 271 /* CCITT X.25 */
#define ARPHRD_HWX25 272 /* Boards with X.25 in firmware */
#define ARPHRD_CAN 280 /* Controller Area Network */
#define ARPHRD_MCTP 290
#define ARPHRD_PPP 512
#define ARPHRD_CISCO 513 /* Cisco HDLC */
#define ARPHRD_HDLC ARPHRD_CISCO

View File

@ -94,6 +94,17 @@
#define BOND_XMIT_POLICY_LAYER23 2 /* layer 2+3 (IP ^ MAC) */
#define BOND_XMIT_POLICY_ENCAP23 3 /* encapsulated layer 2+3 */
#define BOND_XMIT_POLICY_ENCAP34 4 /* encapsulated layer 3+4 */
#define BOND_XMIT_POLICY_VLAN_SRCMAC 5 /* vlan + source MAC */
/* 802.3ad port state definitions (43.4.2.2 in the 802.3ad standard) */
#define LACP_STATE_LACP_ACTIVITY 0x1
#define LACP_STATE_LACP_TIMEOUT 0x2
#define LACP_STATE_AGGREGATION 0x4
#define LACP_STATE_SYNCHRONIZATION 0x8
#define LACP_STATE_COLLECTING 0x10
#define LACP_STATE_DISTRIBUTING 0x20
#define LACP_STATE_DEFAULTED 0x40
#define LACP_STATE_EXPIRED 0x80
typedef struct ifbond {
__s32 bond_mode;
@ -117,15 +128,28 @@ struct ad_info {
__u8 partner_system[ETH_ALEN];
};
/* Embedded inside LINK_XSTATS_TYPE_BOND */
enum {
BOND_XSTATS_UNSPEC,
BOND_XSTATS_3AD,
__BOND_XSTATS_MAX
};
#define BOND_XSTATS_MAX (__BOND_XSTATS_MAX - 1)
/* Embedded inside BOND_XSTATS_3AD */
enum {
BOND_3AD_STAT_LACPDU_RX,
BOND_3AD_STAT_LACPDU_TX,
BOND_3AD_STAT_LACPDU_UNKNOWN_RX,
BOND_3AD_STAT_LACPDU_ILLEGAL_RX,
BOND_3AD_STAT_MARKER_RX,
BOND_3AD_STAT_MARKER_TX,
BOND_3AD_STAT_MARKER_RESP_RX,
BOND_3AD_STAT_MARKER_RESP_TX,
BOND_3AD_STAT_MARKER_UNKNOWN_RX,
BOND_3AD_STAT_PAD,
__BOND_3AD_STAT_MAX
};
#define BOND_3AD_STAT_MAX (__BOND_3AD_STAT_MAX - 1)
#endif /* _LINUX_IF_BONDING_H */
/*
* Local variables:
* version-control: t
* kept-new-versions: 5
* c-indent-level: 8
* c-basic-offset: 8
* tab-width: 8
* End:
*/

View File

@ -120,6 +120,8 @@ enum {
IFLA_BRIDGE_MODE,
IFLA_BRIDGE_VLAN_INFO,
IFLA_BRIDGE_VLAN_TUNNEL_INFO,
IFLA_BRIDGE_MRP,
IFLA_BRIDGE_CFM,
__IFLA_BRIDGE_MAX,
};
#define IFLA_BRIDGE_MAX (__IFLA_BRIDGE_MAX - 1)
@ -130,6 +132,7 @@ enum {
#define BRIDGE_VLAN_INFO_RANGE_BEGIN (1<<3) /* VLAN is start of vlan range */
#define BRIDGE_VLAN_INFO_RANGE_END (1<<4) /* VLAN is end of vlan range */
#define BRIDGE_VLAN_INFO_BRENTRY (1<<5) /* Global bridge VLAN entry */
#define BRIDGE_VLAN_INFO_ONLY_OPTS (1<<6) /* Skip create/delete/flags */
struct bridge_vlan_info {
__u16 flags;
@ -156,6 +159,415 @@ struct bridge_vlan_xstats {
__u32 pad2;
};
enum {
IFLA_BRIDGE_MRP_UNSPEC,
IFLA_BRIDGE_MRP_INSTANCE,
IFLA_BRIDGE_MRP_PORT_STATE,
IFLA_BRIDGE_MRP_PORT_ROLE,
IFLA_BRIDGE_MRP_RING_STATE,
IFLA_BRIDGE_MRP_RING_ROLE,
IFLA_BRIDGE_MRP_START_TEST,
IFLA_BRIDGE_MRP_INFO,
IFLA_BRIDGE_MRP_IN_ROLE,
IFLA_BRIDGE_MRP_IN_STATE,
IFLA_BRIDGE_MRP_START_IN_TEST,
__IFLA_BRIDGE_MRP_MAX,
};
#define IFLA_BRIDGE_MRP_MAX (__IFLA_BRIDGE_MRP_MAX - 1)
enum {
IFLA_BRIDGE_MRP_INSTANCE_UNSPEC,
IFLA_BRIDGE_MRP_INSTANCE_RING_ID,
IFLA_BRIDGE_MRP_INSTANCE_P_IFINDEX,
IFLA_BRIDGE_MRP_INSTANCE_S_IFINDEX,
IFLA_BRIDGE_MRP_INSTANCE_PRIO,
__IFLA_BRIDGE_MRP_INSTANCE_MAX,
};
#define IFLA_BRIDGE_MRP_INSTANCE_MAX (__IFLA_BRIDGE_MRP_INSTANCE_MAX - 1)
enum {
IFLA_BRIDGE_MRP_PORT_STATE_UNSPEC,
IFLA_BRIDGE_MRP_PORT_STATE_STATE,
__IFLA_BRIDGE_MRP_PORT_STATE_MAX,
};
#define IFLA_BRIDGE_MRP_PORT_STATE_MAX (__IFLA_BRIDGE_MRP_PORT_STATE_MAX - 1)
enum {
IFLA_BRIDGE_MRP_PORT_ROLE_UNSPEC,
IFLA_BRIDGE_MRP_PORT_ROLE_ROLE,
__IFLA_BRIDGE_MRP_PORT_ROLE_MAX,
};
#define IFLA_BRIDGE_MRP_PORT_ROLE_MAX (__IFLA_BRIDGE_MRP_PORT_ROLE_MAX - 1)
enum {
IFLA_BRIDGE_MRP_RING_STATE_UNSPEC,
IFLA_BRIDGE_MRP_RING_STATE_RING_ID,
IFLA_BRIDGE_MRP_RING_STATE_STATE,
__IFLA_BRIDGE_MRP_RING_STATE_MAX,
};
#define IFLA_BRIDGE_MRP_RING_STATE_MAX (__IFLA_BRIDGE_MRP_RING_STATE_MAX - 1)
enum {
IFLA_BRIDGE_MRP_RING_ROLE_UNSPEC,
IFLA_BRIDGE_MRP_RING_ROLE_RING_ID,
IFLA_BRIDGE_MRP_RING_ROLE_ROLE,
__IFLA_BRIDGE_MRP_RING_ROLE_MAX,
};
#define IFLA_BRIDGE_MRP_RING_ROLE_MAX (__IFLA_BRIDGE_MRP_RING_ROLE_MAX - 1)
enum {
IFLA_BRIDGE_MRP_START_TEST_UNSPEC,
IFLA_BRIDGE_MRP_START_TEST_RING_ID,
IFLA_BRIDGE_MRP_START_TEST_INTERVAL,
IFLA_BRIDGE_MRP_START_TEST_MAX_MISS,
IFLA_BRIDGE_MRP_START_TEST_PERIOD,
IFLA_BRIDGE_MRP_START_TEST_MONITOR,
__IFLA_BRIDGE_MRP_START_TEST_MAX,
};
#define IFLA_BRIDGE_MRP_START_TEST_MAX (__IFLA_BRIDGE_MRP_START_TEST_MAX - 1)
enum {
IFLA_BRIDGE_MRP_INFO_UNSPEC,
IFLA_BRIDGE_MRP_INFO_RING_ID,
IFLA_BRIDGE_MRP_INFO_P_IFINDEX,
IFLA_BRIDGE_MRP_INFO_S_IFINDEX,
IFLA_BRIDGE_MRP_INFO_PRIO,
IFLA_BRIDGE_MRP_INFO_RING_STATE,
IFLA_BRIDGE_MRP_INFO_RING_ROLE,
IFLA_BRIDGE_MRP_INFO_TEST_INTERVAL,
IFLA_BRIDGE_MRP_INFO_TEST_MAX_MISS,
IFLA_BRIDGE_MRP_INFO_TEST_MONITOR,
IFLA_BRIDGE_MRP_INFO_I_IFINDEX,
IFLA_BRIDGE_MRP_INFO_IN_STATE,
IFLA_BRIDGE_MRP_INFO_IN_ROLE,
IFLA_BRIDGE_MRP_INFO_IN_TEST_INTERVAL,
IFLA_BRIDGE_MRP_INFO_IN_TEST_MAX_MISS,
__IFLA_BRIDGE_MRP_INFO_MAX,
};
#define IFLA_BRIDGE_MRP_INFO_MAX (__IFLA_BRIDGE_MRP_INFO_MAX - 1)
enum {
IFLA_BRIDGE_MRP_IN_STATE_UNSPEC,
IFLA_BRIDGE_MRP_IN_STATE_IN_ID,
IFLA_BRIDGE_MRP_IN_STATE_STATE,
__IFLA_BRIDGE_MRP_IN_STATE_MAX,
};
#define IFLA_BRIDGE_MRP_IN_STATE_MAX (__IFLA_BRIDGE_MRP_IN_STATE_MAX - 1)
enum {
IFLA_BRIDGE_MRP_IN_ROLE_UNSPEC,
IFLA_BRIDGE_MRP_IN_ROLE_RING_ID,
IFLA_BRIDGE_MRP_IN_ROLE_IN_ID,
IFLA_BRIDGE_MRP_IN_ROLE_ROLE,
IFLA_BRIDGE_MRP_IN_ROLE_I_IFINDEX,
__IFLA_BRIDGE_MRP_IN_ROLE_MAX,
};
#define IFLA_BRIDGE_MRP_IN_ROLE_MAX (__IFLA_BRIDGE_MRP_IN_ROLE_MAX - 1)
enum {
IFLA_BRIDGE_MRP_START_IN_TEST_UNSPEC,
IFLA_BRIDGE_MRP_START_IN_TEST_IN_ID,
IFLA_BRIDGE_MRP_START_IN_TEST_INTERVAL,
IFLA_BRIDGE_MRP_START_IN_TEST_MAX_MISS,
IFLA_BRIDGE_MRP_START_IN_TEST_PERIOD,
__IFLA_BRIDGE_MRP_START_IN_TEST_MAX,
};
#define IFLA_BRIDGE_MRP_START_IN_TEST_MAX (__IFLA_BRIDGE_MRP_START_IN_TEST_MAX - 1)
struct br_mrp_instance {
__u32 ring_id;
__u32 p_ifindex;
__u32 s_ifindex;
__u16 prio;
};
struct br_mrp_ring_state {
__u32 ring_id;
__u32 ring_state;
};
struct br_mrp_ring_role {
__u32 ring_id;
__u32 ring_role;
};
struct br_mrp_start_test {
__u32 ring_id;
__u32 interval;
__u32 max_miss;
__u32 period;
__u32 monitor;
};
struct br_mrp_in_state {
__u32 in_state;
__u16 in_id;
};
struct br_mrp_in_role {
__u32 ring_id;
__u32 in_role;
__u32 i_ifindex;
__u16 in_id;
};
struct br_mrp_start_in_test {
__u32 interval;
__u32 max_miss;
__u32 period;
__u16 in_id;
};
enum {
IFLA_BRIDGE_CFM_UNSPEC,
IFLA_BRIDGE_CFM_MEP_CREATE,
IFLA_BRIDGE_CFM_MEP_DELETE,
IFLA_BRIDGE_CFM_MEP_CONFIG,
IFLA_BRIDGE_CFM_CC_CONFIG,
IFLA_BRIDGE_CFM_CC_PEER_MEP_ADD,
IFLA_BRIDGE_CFM_CC_PEER_MEP_REMOVE,
IFLA_BRIDGE_CFM_CC_RDI,
IFLA_BRIDGE_CFM_CC_CCM_TX,
IFLA_BRIDGE_CFM_MEP_CREATE_INFO,
IFLA_BRIDGE_CFM_MEP_CONFIG_INFO,
IFLA_BRIDGE_CFM_CC_CONFIG_INFO,
IFLA_BRIDGE_CFM_CC_RDI_INFO,
IFLA_BRIDGE_CFM_CC_CCM_TX_INFO,
IFLA_BRIDGE_CFM_CC_PEER_MEP_INFO,
IFLA_BRIDGE_CFM_MEP_STATUS_INFO,
IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO,
__IFLA_BRIDGE_CFM_MAX,
};
#define IFLA_BRIDGE_CFM_MAX (__IFLA_BRIDGE_CFM_MAX - 1)
enum {
IFLA_BRIDGE_CFM_MEP_CREATE_UNSPEC,
IFLA_BRIDGE_CFM_MEP_CREATE_INSTANCE,
IFLA_BRIDGE_CFM_MEP_CREATE_DOMAIN,
IFLA_BRIDGE_CFM_MEP_CREATE_DIRECTION,
IFLA_BRIDGE_CFM_MEP_CREATE_IFINDEX,
__IFLA_BRIDGE_CFM_MEP_CREATE_MAX,
};
#define IFLA_BRIDGE_CFM_MEP_CREATE_MAX (__IFLA_BRIDGE_CFM_MEP_CREATE_MAX - 1)
enum {
IFLA_BRIDGE_CFM_MEP_DELETE_UNSPEC,
IFLA_BRIDGE_CFM_MEP_DELETE_INSTANCE,
__IFLA_BRIDGE_CFM_MEP_DELETE_MAX,
};
#define IFLA_BRIDGE_CFM_MEP_DELETE_MAX (__IFLA_BRIDGE_CFM_MEP_DELETE_MAX - 1)
enum {
IFLA_BRIDGE_CFM_MEP_CONFIG_UNSPEC,
IFLA_BRIDGE_CFM_MEP_CONFIG_INSTANCE,
IFLA_BRIDGE_CFM_MEP_CONFIG_UNICAST_MAC,
IFLA_BRIDGE_CFM_MEP_CONFIG_MDLEVEL,
IFLA_BRIDGE_CFM_MEP_CONFIG_MEPID,
__IFLA_BRIDGE_CFM_MEP_CONFIG_MAX,
};
#define IFLA_BRIDGE_CFM_MEP_CONFIG_MAX (__IFLA_BRIDGE_CFM_MEP_CONFIG_MAX - 1)
enum {
IFLA_BRIDGE_CFM_CC_CONFIG_UNSPEC,
IFLA_BRIDGE_CFM_CC_CONFIG_INSTANCE,
IFLA_BRIDGE_CFM_CC_CONFIG_ENABLE,
IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL,
IFLA_BRIDGE_CFM_CC_CONFIG_EXP_MAID,
__IFLA_BRIDGE_CFM_CC_CONFIG_MAX,
};
#define IFLA_BRIDGE_CFM_CC_CONFIG_MAX (__IFLA_BRIDGE_CFM_CC_CONFIG_MAX - 1)
enum {
IFLA_BRIDGE_CFM_CC_PEER_MEP_UNSPEC,
IFLA_BRIDGE_CFM_CC_PEER_MEP_INSTANCE,
IFLA_BRIDGE_CFM_CC_PEER_MEPID,
__IFLA_BRIDGE_CFM_CC_PEER_MEP_MAX,
};
#define IFLA_BRIDGE_CFM_CC_PEER_MEP_MAX (__IFLA_BRIDGE_CFM_CC_PEER_MEP_MAX - 1)
enum {
IFLA_BRIDGE_CFM_CC_RDI_UNSPEC,
IFLA_BRIDGE_CFM_CC_RDI_INSTANCE,
IFLA_BRIDGE_CFM_CC_RDI_RDI,
__IFLA_BRIDGE_CFM_CC_RDI_MAX,
};
#define IFLA_BRIDGE_CFM_CC_RDI_MAX (__IFLA_BRIDGE_CFM_CC_RDI_MAX - 1)
enum {
IFLA_BRIDGE_CFM_CC_CCM_TX_UNSPEC,
IFLA_BRIDGE_CFM_CC_CCM_TX_INSTANCE,
IFLA_BRIDGE_CFM_CC_CCM_TX_DMAC,
IFLA_BRIDGE_CFM_CC_CCM_TX_SEQ_NO_UPDATE,
IFLA_BRIDGE_CFM_CC_CCM_TX_PERIOD,
IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV,
IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV_VALUE,
IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV,
IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV_VALUE,
__IFLA_BRIDGE_CFM_CC_CCM_TX_MAX,
};
#define IFLA_BRIDGE_CFM_CC_CCM_TX_MAX (__IFLA_BRIDGE_CFM_CC_CCM_TX_MAX - 1)
enum {
IFLA_BRIDGE_CFM_MEP_STATUS_UNSPEC,
IFLA_BRIDGE_CFM_MEP_STATUS_INSTANCE,
IFLA_BRIDGE_CFM_MEP_STATUS_OPCODE_UNEXP_SEEN,
IFLA_BRIDGE_CFM_MEP_STATUS_VERSION_UNEXP_SEEN,
IFLA_BRIDGE_CFM_MEP_STATUS_RX_LEVEL_LOW_SEEN,
__IFLA_BRIDGE_CFM_MEP_STATUS_MAX,
};
#define IFLA_BRIDGE_CFM_MEP_STATUS_MAX (__IFLA_BRIDGE_CFM_MEP_STATUS_MAX - 1)
enum {
IFLA_BRIDGE_CFM_CC_PEER_STATUS_UNSPEC,
IFLA_BRIDGE_CFM_CC_PEER_STATUS_INSTANCE,
IFLA_BRIDGE_CFM_CC_PEER_STATUS_PEER_MEPID,
IFLA_BRIDGE_CFM_CC_PEER_STATUS_CCM_DEFECT,
IFLA_BRIDGE_CFM_CC_PEER_STATUS_RDI,
IFLA_BRIDGE_CFM_CC_PEER_STATUS_PORT_TLV_VALUE,
IFLA_BRIDGE_CFM_CC_PEER_STATUS_IF_TLV_VALUE,
IFLA_BRIDGE_CFM_CC_PEER_STATUS_SEEN,
IFLA_BRIDGE_CFM_CC_PEER_STATUS_TLV_SEEN,
IFLA_BRIDGE_CFM_CC_PEER_STATUS_SEQ_UNEXP_SEEN,
__IFLA_BRIDGE_CFM_CC_PEER_STATUS_MAX,
};
#define IFLA_BRIDGE_CFM_CC_PEER_STATUS_MAX (__IFLA_BRIDGE_CFM_CC_PEER_STATUS_MAX - 1)
struct bridge_stp_xstats {
__u64 transition_blk;
__u64 transition_fwd;
__u64 rx_bpdu;
__u64 tx_bpdu;
__u64 rx_tcn;
__u64 tx_tcn;
};
/* Bridge vlan RTM header */
struct br_vlan_msg {
__u8 family;
__u8 reserved1;
__u16 reserved2;
__u32 ifindex;
};
enum {
BRIDGE_VLANDB_DUMP_UNSPEC,
BRIDGE_VLANDB_DUMP_FLAGS,
__BRIDGE_VLANDB_DUMP_MAX,
};
#define BRIDGE_VLANDB_DUMP_MAX (__BRIDGE_VLANDB_DUMP_MAX - 1)
/* flags used in BRIDGE_VLANDB_DUMP_FLAGS attribute to affect dumps */
#define BRIDGE_VLANDB_DUMPF_STATS (1 << 0) /* Include stats in the dump */
#define BRIDGE_VLANDB_DUMPF_GLOBAL (1 << 1) /* Dump global vlan options only */
/* Bridge vlan RTM attributes
* [BRIDGE_VLANDB_ENTRY] = {
* [BRIDGE_VLANDB_ENTRY_INFO]
* ...
* }
* [BRIDGE_VLANDB_GLOBAL_OPTIONS] = {
* [BRIDGE_VLANDB_GOPTS_ID]
* ...
* }
*/
enum {
BRIDGE_VLANDB_UNSPEC,
BRIDGE_VLANDB_ENTRY,
BRIDGE_VLANDB_GLOBAL_OPTIONS,
__BRIDGE_VLANDB_MAX,
};
#define BRIDGE_VLANDB_MAX (__BRIDGE_VLANDB_MAX - 1)
enum {
BRIDGE_VLANDB_ENTRY_UNSPEC,
BRIDGE_VLANDB_ENTRY_INFO,
BRIDGE_VLANDB_ENTRY_RANGE,
BRIDGE_VLANDB_ENTRY_STATE,
BRIDGE_VLANDB_ENTRY_TUNNEL_INFO,
BRIDGE_VLANDB_ENTRY_STATS,
BRIDGE_VLANDB_ENTRY_MCAST_ROUTER,
__BRIDGE_VLANDB_ENTRY_MAX,
};
#define BRIDGE_VLANDB_ENTRY_MAX (__BRIDGE_VLANDB_ENTRY_MAX - 1)
/* [BRIDGE_VLANDB_ENTRY] = {
* [BRIDGE_VLANDB_ENTRY_TUNNEL_INFO] = {
* [BRIDGE_VLANDB_TINFO_ID]
* ...
* }
* }
*/
enum {
BRIDGE_VLANDB_TINFO_UNSPEC,
BRIDGE_VLANDB_TINFO_ID,
BRIDGE_VLANDB_TINFO_CMD,
__BRIDGE_VLANDB_TINFO_MAX,
};
#define BRIDGE_VLANDB_TINFO_MAX (__BRIDGE_VLANDB_TINFO_MAX - 1)
/* [BRIDGE_VLANDB_ENTRY] = {
* [BRIDGE_VLANDB_ENTRY_STATS] = {
* [BRIDGE_VLANDB_STATS_RX_BYTES]
* ...
* }
* ...
* }
*/
enum {
BRIDGE_VLANDB_STATS_UNSPEC,
BRIDGE_VLANDB_STATS_RX_BYTES,
BRIDGE_VLANDB_STATS_RX_PACKETS,
BRIDGE_VLANDB_STATS_TX_BYTES,
BRIDGE_VLANDB_STATS_TX_PACKETS,
BRIDGE_VLANDB_STATS_PAD,
__BRIDGE_VLANDB_STATS_MAX,
};
#define BRIDGE_VLANDB_STATS_MAX (__BRIDGE_VLANDB_STATS_MAX - 1)
enum {
BRIDGE_VLANDB_GOPTS_UNSPEC,
BRIDGE_VLANDB_GOPTS_ID,
BRIDGE_VLANDB_GOPTS_RANGE,
BRIDGE_VLANDB_GOPTS_MCAST_SNOOPING,
BRIDGE_VLANDB_GOPTS_MCAST_IGMP_VERSION,
BRIDGE_VLANDB_GOPTS_MCAST_MLD_VERSION,
BRIDGE_VLANDB_GOPTS_MCAST_LAST_MEMBER_CNT,
BRIDGE_VLANDB_GOPTS_MCAST_STARTUP_QUERY_CNT,
BRIDGE_VLANDB_GOPTS_MCAST_LAST_MEMBER_INTVL,
BRIDGE_VLANDB_GOPTS_PAD,
BRIDGE_VLANDB_GOPTS_MCAST_MEMBERSHIP_INTVL,
BRIDGE_VLANDB_GOPTS_MCAST_QUERIER_INTVL,
BRIDGE_VLANDB_GOPTS_MCAST_QUERY_INTVL,
BRIDGE_VLANDB_GOPTS_MCAST_QUERY_RESPONSE_INTVL,
BRIDGE_VLANDB_GOPTS_MCAST_STARTUP_QUERY_INTVL,
BRIDGE_VLANDB_GOPTS_MCAST_QUERIER,
BRIDGE_VLANDB_GOPTS_MCAST_ROUTER_PORTS,
BRIDGE_VLANDB_GOPTS_MCAST_QUERIER_STATE,
__BRIDGE_VLANDB_GOPTS_MAX
};
#define BRIDGE_VLANDB_GOPTS_MAX (__BRIDGE_VLANDB_GOPTS_MAX - 1)
/* Bridge multicast database attributes
* [MDBA_MDB] = {
* [MDBA_MDB_ENTRY] = {
@ -198,10 +610,33 @@ enum {
enum {
MDBA_MDB_EATTR_UNSPEC,
MDBA_MDB_EATTR_TIMER,
MDBA_MDB_EATTR_SRC_LIST,
MDBA_MDB_EATTR_GROUP_MODE,
MDBA_MDB_EATTR_SOURCE,
MDBA_MDB_EATTR_RTPROT,
__MDBA_MDB_EATTR_MAX
};
#define MDBA_MDB_EATTR_MAX (__MDBA_MDB_EATTR_MAX - 1)
/* per mdb entry source */
enum {
MDBA_MDB_SRCLIST_UNSPEC,
MDBA_MDB_SRCLIST_ENTRY,
__MDBA_MDB_SRCLIST_MAX
};
#define MDBA_MDB_SRCLIST_MAX (__MDBA_MDB_SRCLIST_MAX - 1)
/* per mdb entry per source attributes
* these are embedded in MDBA_MDB_SRCLIST_ENTRY
*/
enum {
MDBA_MDB_SRCATTR_UNSPEC,
MDBA_MDB_SRCATTR_ADDRESS,
MDBA_MDB_SRCATTR_TIMER,
__MDBA_MDB_SRCATTR_MAX
};
#define MDBA_MDB_SRCATTR_MAX (__MDBA_MDB_SRCATTR_MAX - 1)
/* multicast router types */
enum {
MDB_RTR_TYPE_DISABLED,
@ -222,6 +657,9 @@ enum {
MDBA_ROUTER_PATTR_UNSPEC,
MDBA_ROUTER_PATTR_TIMER,
MDBA_ROUTER_PATTR_TYPE,
MDBA_ROUTER_PATTR_INET_TIMER,
MDBA_ROUTER_PATTR_INET6_TIMER,
MDBA_ROUTER_PATTR_VID,
__MDBA_ROUTER_PATTR_MAX
};
#define MDBA_ROUTER_PATTR_MAX (__MDBA_ROUTER_PATTR_MAX - 1)
@ -237,12 +675,16 @@ struct br_mdb_entry {
#define MDB_PERMANENT 1
__u8 state;
#define MDB_FLAGS_OFFLOAD (1 << 0)
#define MDB_FLAGS_FAST_LEAVE (1 << 1)
#define MDB_FLAGS_STAR_EXCL (1 << 2)
#define MDB_FLAGS_BLOCKED (1 << 3)
__u8 flags;
__u16 vid;
struct {
union {
__be32 ip4;
struct in6_addr ip6;
unsigned char mac_addr[ETH_ALEN];
} u;
__be16 proto;
} addr;
@ -251,16 +693,30 @@ struct br_mdb_entry {
enum {
MDBA_SET_ENTRY_UNSPEC,
MDBA_SET_ENTRY,
MDBA_SET_ENTRY_ATTRS,
__MDBA_SET_ENTRY_MAX,
};
#define MDBA_SET_ENTRY_MAX (__MDBA_SET_ENTRY_MAX - 1)
/* [MDBA_SET_ENTRY_ATTRS] = {
* [MDBE_ATTR_xxx]
* ...
* }
*/
enum {
MDBE_ATTR_UNSPEC,
MDBE_ATTR_SOURCE,
__MDBE_ATTR_MAX,
};
#define MDBE_ATTR_MAX (__MDBE_ATTR_MAX - 1)
/* Embedded inside LINK_XSTATS_TYPE_BRIDGE */
enum {
BRIDGE_XSTATS_UNSPEC,
BRIDGE_XSTATS_VLAN,
BRIDGE_XSTATS_MCAST,
BRIDGE_XSTATS_PAD,
BRIDGE_XSTATS_STP,
__BRIDGE_XSTATS_MAX
};
#define BRIDGE_XSTATS_MAX (__BRIDGE_XSTATS_MAX - 1)
@ -292,4 +748,40 @@ struct br_mcast_stats {
__u64 mcast_bytes[BR_MCAST_DIR_SIZE];
__u64 mcast_packets[BR_MCAST_DIR_SIZE];
};
/* bridge boolean options
* BR_BOOLOPT_NO_LL_LEARN - disable learning from link-local packets
* BR_BOOLOPT_MCAST_VLAN_SNOOPING - control vlan multicast snooping
*
* IMPORTANT: if adding a new option do not forget to handle
* it in br_boolopt_toggle/get and bridge sysfs
*/
enum br_boolopt_id {
BR_BOOLOPT_NO_LL_LEARN,
BR_BOOLOPT_MCAST_VLAN_SNOOPING,
BR_BOOLOPT_MAX
};
/* struct br_boolopt_multi - change multiple bridge boolean options
*
* @optval: new option values (bit per option)
* @optmask: options to change (bit per option)
*/
struct br_boolopt_multi {
__u32 optval;
__u32 optmask;
};
enum {
BRIDGE_QUERIER_UNSPEC,
BRIDGE_QUERIER_IP_ADDRESS,
BRIDGE_QUERIER_IP_PORT,
BRIDGE_QUERIER_IP_OTHER_TIMER,
BRIDGE_QUERIER_PAD,
BRIDGE_QUERIER_IPV6_ADDRESS,
BRIDGE_QUERIER_IPV6_PORT,
BRIDGE_QUERIER_IPV6_OTHER_TIMER,
__BRIDGE_QUERIER_MAX
};
#define BRIDGE_QUERIER_MAX (__BRIDGE_QUERIER_MAX - 1)
#endif /* _LINUX_IF_BRIDGE_H */

View File

@ -86,17 +86,21 @@
* over Ethernet
*/
#define ETH_P_PAE 0x888E /* Port Access Entity (IEEE 802.1X) */
#define ETH_P_REALTEK 0x8899 /* Multiple proprietary protocols */
#define ETH_P_AOE 0x88A2 /* ATA over Ethernet */
#define ETH_P_8021AD 0x88A8 /* 802.1ad Service VLAN */
#define ETH_P_802_EX1 0x88B5 /* 802.1 Local Experimental 1. */
#define ETH_P_PREAUTH 0x88C7 /* 802.11 Preauthentication */
#define ETH_P_TIPC 0x88CA /* TIPC */
#define ETH_P_LLDP 0x88CC /* Link Layer Discovery Protocol */
#define ETH_P_MRP 0x88E3 /* Media Redundancy Protocol */
#define ETH_P_MACSEC 0x88E5 /* 802.1ae MACsec */
#define ETH_P_8021AH 0x88E7 /* 802.1ah Backbone Service Tag */
#define ETH_P_MVRP 0x88F5 /* 802.1Q MVRP */
#define ETH_P_1588 0x88F7 /* IEEE 1588 Timesync */
#define ETH_P_NCSI 0x88F8 /* NCSI protocol */
#define ETH_P_PRP 0x88FB /* IEC 62439-3 PRP/HSRv0 */
#define ETH_P_CFM 0x8902 /* Connectivity Fault Management */
#define ETH_P_FCOE 0x8906 /* Fibre Channel over Ethernet */
#define ETH_P_IBOE 0x8915 /* Infiniband over Ethernet */
#define ETH_P_TDLS 0x890D /* TDLS */
@ -109,10 +113,11 @@
#define ETH_P_QINQ2 0x9200 /* deprecated QinQ VLAN [ NOT AN OFFICIALLY REGISTERED ID ] */
#define ETH_P_QINQ3 0x9300 /* deprecated QinQ VLAN [ NOT AN OFFICIALLY REGISTERED ID ] */
#define ETH_P_EDSA 0xDADA /* Ethertype DSA [ NOT AN OFFICIALLY REGISTERED ID ] */
#define ETH_P_DSA_8021Q 0xDADB /* Fake VLAN Header for DSA [ NOT AN OFFICIALLY REGISTERED ID ] */
#define ETH_P_IFE 0xED3E /* ForCES inter-FE LFB type */
#define ETH_P_AF_IUCV 0xFBFB /* IBM af_iucv [ NOT AN OFFICIALLY REGISTERED ID ] */
#define ETH_P_802_3_MIN 0x0600 /* If the value in the ethernet type is less than this value
#define ETH_P_802_3_MIN 0x0600 /* If the value in the ethernet type is more than this value
* then the frame is Ethernet II. Else it is 802.3 */
/*
@ -147,6 +152,9 @@
#define ETH_P_MAP 0x00F9 /* Qualcomm multiplexing and
* aggregation protocol
*/
#define ETH_P_MCTP 0x00FA /* Management component transport
* protocol packets
*/
/*
* This is an Ethernet frame header.

Some files were not shown because too many files have changed in this diff Show More