Compare commits

...

2915 Commits
v4.7.0 ... main

Author SHA1 Message Date
Paul Blakey 73590d9573 tc: flower: Fix buffer overflow on large labels
Buffer is 64bytes, but label printing can take 66bytes printing
in hex, and will overflow when setting the string delimiter ('\0').

Fix that by increasing the print buffer size.

Example of overflowing ct_label:
ct_label 11111111111111111111111111111111/11111111111111111111111111111111

Fixes: 2fffb1c030 ("tc: flower: Add matching on conntrack info")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-12-06 13:44:50 -08:00
Stephen Hemminger 3f77bc6253 uapi: update to if_ether.h
Merged from 5.16-rc3

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-12-03 12:20:02 -08:00
Maxim Petrov 5f8bb902e1 ip/ipnexthop: fix unsigned overflow in parse_nh_group_type_res()
0UL has type 'unsigned long' which is likely to be 64bit on modern machines. At
the same time, the '{idle,unbalanced}_timer' variables are declared as u32, so
these variables cannot be greater than '~0UL / 100' when 'unsigned long' is 64
bits. In such condition it is still possible to pass the check but get the
overflow later when the timers are multiplied by 100 in 'addattr32'.

Fix the possible overflow by changing '~0UL' to 'UINT32_MAX'.

Fixes: 9167671822 ("nexthop: Add support for resilient nexthop groups")
Signed-off-by: Maxim Petrov <mmrmaximuzz@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-18 15:01:48 -08:00
Maxim Petrov 3184de3797 lib/bpf_legacy: remove always-true check
The 'name' field of the 'struct bpf_prog_info' is a plain C array. Thus, the
logical condition in bpf_dump_prog_info() is useless as the array address is
always true, so just remove it.

Signed-off-by: Maxim Petrov <mmrmaximuzz@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-18 15:01:04 -08:00
Stephen Hemminger 79026c1262 rdma: update uapi headers
Update the RDMA uapi headers from 5.16.0-rc1

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-18 10:00:19 -08:00
Stephen Hemminger fa58de9b0c vdpa: align uapi headers
Update vdpa headers based on 5.16.0-rc1 and remove redundant
copy.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-18 09:56:57 -08:00
[200~jiangheng be31c26484 lnstat: fix buffer overflow in header output
Running lnstat will cause core dump from reading past end of array.

Segmentation fault (core dumped)

The maximum  value of th.num_lines is HDR_LINES(10),  h should not be equal to th.num_lines, array th.hdr may be out of bounds.

Signed-off-by jiangheng <jiangheng12@huawei.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-17 13:41:10 -08:00
Maxim Petrov 0e94972590 tc/m_vlan: fix print_vlan() conditional on TCA_VLAN_ACT_PUSH_ETH
Fix the wild bracket in the if clause leading to the error in the condition.

Fixes: d61167dd88 ("m_vlan: add pop_eth and push_eth actions")
Signed-off-by: Maxim Petrov <mmrmaximuzz@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-17 11:13:12 -08:00
Davide Caratti 9bd5ab0f09 mptcp: fix JSON output when dumping endpoints by id
iproute ignores '-j' command line argument when dumping endpoints by id:

 [dcaratti@dcaratti iproute2]$ ./ip/ip -j mptcp endpoint show
 [{"address":"1.2.3.4","id":42,"signal":true,"backup":true}]
 [dcaratti@dcaratti iproute2]$ ./ip/ip -j mptcp endpoint show id 42
 1.2.3.4 id 42 signal backup

fix mptcp_addr_show() to use the proper JSON helpers.

Fixes: 7e0767cd86 ("add support for mptcp netlink interface")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-11 10:07:26 -08:00
Anssi Hannula a787d9ae10 man: tc-u32: Fix page to match new firstfrag behavior
Commit 690b11f4a6 ("tc: u32: Fix firstfrag filter.") applied in 2012
changed the "ip firstfrag" selector to not match non-fragmented packets
anymore.

However, the documentation added in f15a23966f ("tc: add a man page
for u32 filter") in 2015 includes an example that relies on the previous
behavior (non-fragmented packet counted as first fragment).

Due to this, the example does not work correctly and does not actually
classify regular SSH packets.

Modify the example to use a raw u16 selector on the fragment offset to
make it work, and also make the firstfrag description more clear about
the current behavior.

Fixes: f15a23966f ("tc: add a man page for u32 filter")
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Cc: Phil Sutter <phil@nwl.cc>
Cc: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-09 10:46:17 -08:00
Luca Boccassi af96c7b5dd Fix some typos detected by Lintian in manpages
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-09 10:45:44 -08:00
Stephen Hemminger 35c81b18c4 uapi: update vdpa.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-09 10:40:40 -08:00
David Ahern 50b668bdbf Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-04 09:45:31 -06:00
David Ahern 9c56d693f6 Merge branch 'can-tdc-plus-cleanups' into next
Vincent Mailhol  says:

====================

The main purpose is to add commandline support for Transmitter Delay
Compensation (TDC) in iproute. Other issues found during the
development of this feature also get addressed.

This patch series contains four patches which respectively:

  1. Correct the bittiming ranges in the print_usage function and add
  the units to give more clarity: some parameters are in milliseconds,
  some in nano seconds, some in time quantum and the newly TDC
  parameters introduced in this series would be in clock period.

  2. Do some code refactoring on function print_ctrlmode().

  3. factorize the many print_*(PRINT_JSON, ...) and fprintf
  occurrences in a single print_*(PRINT_ANY, ...) call and fix the
  signedness while doing that.

  4. report the value of the bitrate prescalers (brp and dbrp).

  5. adds command line support for the TDC in iproute and goes together
  with below series in the kernel:
  https://lore.kernel.org/linux-can/20210814091750.73931-1-mailhol.vincent@wanadoo.fr/T/#t

** Changelog **

>From RFC v5 to v6:
  * Dropped the RFC tag because the related patch series on the kernel
    side were pulled into net-next.
  * Remove the changes in include/uapi/linux/can/netlink.h because
    these should be pulled separately.
  * Add another patch (the second of this series) to do some cleanup
    on function print_ctrlmode().
  * Minor fixes in the patch comments (grammar, rephrasing).

>From RFC v4 to RFC v5:
  * Add the unit (bps, tq, ns or ms) in print_usage()
  * Rewrote void can_print_timing_min_max() to better factorize the
    code.
  * Rewrote the commit message of the two last patches (those related
    to TDC) to either add clarification of fix inacurracies.

>From v3 to RFC v4:
  * Reflect the changes made on the kernel side.

>From RFC v2 to v3:
  * Dropped the RFC tag. Now that the kernel patch reach the testing
    branch, I am finaly ready.
  * Regression fix: configuring a link with only nominal bittiming
    returned -EOPNOTSUPP
  * Added two more patches to the series:
      - iplink_can: fix configuration ranges in print_usage()
      - iplink_can: print brp and dbrp bittiming variables
  * Other small fixes on formatting.

>From RFC v1 to RFC v2:
  * Add an additional patch to the series to fix the issues reported
    by Stephen Hemminger
    Ref: https://lore.kernel.org/linux-can/20210506112007.1666738-1-mailhol.vincent@wanadoo.fr/T/#t

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-04 09:44:56 -06:00
Vincent Mailhol 0c263d7c36 iplink_can: add new CAN FD bittiming parameters: Transmitter Delay Compensation (TDC)
At high bit rates, the propagation delay from the TX pin to the RX pin
of the transceiver causes measurement errors: the sample point on the
RX pin might occur on the previous bit.

This issue is addressed in ISO 11898-1 section 11.3.3 "Transmitter
delay compensation" (TDC).

This patch brings command line support to nine TDC parameters which
were recently added to the kernel's CAN netlink interface in order to
implement TDC:
  - IFLA_CAN_TDC_TDCV_MIN: Transmitter Delay Compensation Value
    minimum value
  - IFLA_CAN_TDC_TDCV_MAX: Transmitter Delay Compensation Value
    maximum value
  - IFLA_CAN_TDC_TDCO_MIN: Transmitter Delay Compensation Offset
    minimum value
  - IFLA_CAN_TDC_TDCO_MAX: Transmitter Delay Compensation Offset
    maximum value
  - IFLA_CAN_TDC_TDCF_MIN: Transmitter Delay Compensation Filter
    window minimum value
  - IFLA_CAN_TDC_TDCF_MAX: Transmitter Delay Compensation Filter
    window maximum value
  - IFLA_CAN_TDC_TDCV: Transmitter Delay Compensation Value
  - IFLA_CAN_TDC_TDCO: Transmitter Delay Compensation Offset
  - IFLA_CAN_TDC_TDCF: Transmitter Delay Compensation Filter window

All those new parameters are nested together into the attribute
IFLA_CAN_TDC.

The TDC parameters extend the FD parameters. As such, the TDC
parameters must be specified together the "fd on" flag.

When "fd on" flag is provided, a tdc-mode parameter allows to specify
how to operate.  Valid options for tdc-mode are:

  * auto: the transmitter dynamically measures TDCV for each of the
    transmitted frames. As such, TDCV can not be manually provided. In
    this mode, the user must specify TDCO and may also specify TDCF if
    supported.

  * manual: use a static TDCV provided by the user. In this mode, the
    user must specify both TDCV and TDCO and may also specify TDCF if
    supported.

  * off: TDC is explicitly disabled.

  * tdc-mode parameter omitted (default mode): the kernel decides
    whether TDC should be enabled or not and if so, it calculates the
    TDC values. TDC parameters are an expert option and the average
    user is not expected to provide those, thus the presence of this
    "default mode".

If the fd flag is omitted, all the FD values (including TDC values)
remain unchanged.

If "fd off" flag is specified, all FD values (including TDC values)
are zeroed.

TDCV is always reported in manual mode. In auto mode, TDCV is reported
only if the value is available. Especially, the TDCV might not be
available if the controller has no feature to report it or if the
value in not yet available (i.e. no data sent yet and measurement did
not occur).

TDCF is reported only if tdcf_max is not zero (i.e. if supported by
the controller).

For reference, here are a few samples of how the output looks like:

| $ ip link set can0 type can bitrate 1000000 dbitrate 8000000 fd on tdco 7 tdcf 8 tdc-mode auto

| $ ip --details link show can0
| 1:  can0: <NOARP,ECHO> mtu 72 qdisc noop state DOWN mode DEFAULT group default qlen 10
|     link/can  promiscuity 0 minmtu 0 maxmtu 0
|     can <FD,TDC-AUTO> state STOPPED (berr-counter tx 0 rx 0) restart-ms 0
| 	  bitrate 1000000 sample-point 0.750
| 	  tq 12 prop-seg 29 phase-seg1 30 phase-seg2 20 sjw 1 brp 1
| 	  ES582.1/ES584.1: tseg1 2..256 tseg2 2..128 sjw 1..128 brp 1..512 brp_inc 1
| 	  dbitrate 8000000 dsample-point 0.700
| 	  dtq 12 dprop-seg 3 dphase-seg1 3 dphase-seg2 3 dsjw 1 dbrp 1
| 	  tdco 7 tdcf 8
| 	  ES582.1/ES584.1: dtseg1 2..32 dtseg2 1..16 dsjw 1..8 dbrp 1..32 dbrp_inc 1
| 	  tdco 0..127 tdcf 0..127
| 	  clock 80000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

| $ ip --details --json --pretty link show can0
| [ {
|         "ifindex": 1,
|         "ifname": "can0",
|         "flags": [ "NOARP","ECHO" ],
|         "mtu": 72,
|         "qdisc": "noop",
|         "operstate": "DOWN",
|         "linkmode": "DEFAULT",
|         "group": "default",
|         "txqlen": 10,
|         "link_type": "can",
|         "promiscuity": 0,
|         "min_mtu": 0,
|         "max_mtu": 0,
|         "linkinfo": {
|             "info_kind": "can",
|             "info_data": {
|                 "ctrlmode": [ "FD","TDC-AUTO" ],
|                 "state": "STOPPED",
|                 "berr_counter": {
|                     "tx": 0,
|                     "rx": 0
|                 },
|                 "restart_ms": 0,
|                 "bittiming": {
|                     "bitrate": 1000000,
|                     "sample_point": "0.750",
|                     "tq": 12,
|                     "prop_seg": 29,
|                     "phase_seg1": 30,
|                     "phase_seg2": 20,
|                     "sjw": 1,
|                     "brp": 1
|                 },
|                 "bittiming_const": {
|                     "name": "ES582.1/ES584.1",
|                     "tseg1": {
|                         "min": 2,
|                         "max": 256
|                     },
|                     "tseg2": {
|                         "min": 2,
|                         "max": 128
|                     },
|                     "sjw": {
|                         "min": 1,
|                         "max": 128
|                     },
|                     "brp": {
|                         "min": 1,
|                         "max": 512
|                     },
|                     "brp_inc": 1
|                 },
|                 "data_bittiming": {
|                     "bitrate": 8000000,
|                     "sample_point": "0.700",
|                     "tq": 12,
|                     "prop_seg": 3,
|                     "phase_seg1": 3,
|                     "phase_seg2": 3,
|                     "sjw": 1,
|                     "brp": 1,
|                     "tdc": {
|                         "tdco": 7,
|                         "tdcf": 8
|                     }
|                 },
|                 "data_bittiming_const": {
|                     "name": "ES582.1/ES584.1",
|                     "tseg1": {
|                         "min": 2,
|                         "max": 32
|                     },
|                     "tseg2": {
|                         "min": 1,
|                         "max": 16
|                     },
|                     "sjw": {
|                         "min": 1,
|                         "max": 8
|                     },
|                     "brp": {
|                         "min": 1,
|                         "max": 32
|                     },
|                     "brp_inc": 1,
|                     "tdc": {
|                         "tdco": {
|                             "min": 0,
|                             "max": 127
|                         },
|                         "tdcf": {
|                             "min": 0,
|                             "max": 127
|                         }
|                     }
|                 },
|                 "clock": 80000000
|             }
|         },
|         "num_tx_queues": 1,
|         "num_rx_queues": 1,
|         "gso_max_size": 65536,
|         "gso_max_segs": 65535
|     } ]

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-04 09:43:10 -06:00
Vincent Mailhol 0f7bb8d842 iplink_can: print brp and dbrp bittiming variables
Report the value of the bit-rate prescaler (brp) for both the nominal
and the data bittiming.

Currently, only the constant brp values (brp_{min,max,inc}) are being
reported. Also, brp is the only member of struct can_bittiming not
being reported.

Noticeably, brp could be calculated by hand from the other bittiming
parameters with below formula:

        brp = clock * tq / 1000000000

with clock in hertz and tq in nano second (thus the need of a 1
billion factor to convert it back to second).

But because above formula is not so trivial to remember and is
subjected to rounding errors, it makes sense to directly output
{d,}bpr.

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-04 09:42:54 -06:00
Vincent Mailhol 67f3c7a5cc iplink_can: use PRINT_ANY to factorize code and fix signedness
Current implementation heavily relies on some "if (is_json_context())"
switches to decide the context and then does some print_*(PRINT_JSON,
...) when in json context and some fprintf(...) else.

Furthermore, current implementation uses either print_int() or the
conversion specifier %d to print unsigned integers.

This patch factorizes each pairs of print_*(PRINT_JSON, ...) and
fprintf() into a single print_*(PRINT_ANY, ...) call. While doing this
replacement, it uses proper unsigned function print_uint() as well as
the conversion specifier %u when the parameter is an unsigned integer.

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-04 09:42:50 -06:00
Vincent Mailhol fd5e958c49 iplink_can: code refactoring of print_ctrlmode()
This patch only does cleanup and do not introduce any functional
changes.

We do some code refactoring of print_ctrlmode() in prevision of the
upcoming patch:

  - remove the first argument of print_ctrlmode(). It is a pointer to
    FILE and is never used.

  - add a new function argument: enum output_type t in order to
    specify the output type (i.e. PRINT_{FP,JSON,ANY}).

  - add a new function argument: const char *key in order to specify
    the name of the json array (e.g. "ctrlmode").

  - replace the _PF() macro with the print_flag() function to increase
    readability.

  - directly return if none of the flags are set (previously, this
    check was done before calling the function).

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-04 09:42:44 -06:00
Vincent Mailhol 8316df6e6d iplink_can: fix configuration ranges in print_usage() and add unit
The configuration ranges in print_usage() are taken from "Table 8 -
Time segments' minimum configuration ranges" in section 11.3.1.2
"Configuration of the bit time parameters" of ISO 11898-1.

The standard clearly specifies that "implementations may allow time
segments that exceed the minimum required configuration ranges
specified in Table 8".

Because no maximum ranges are given in the standard, all given ranges
{ a..b } are simply replaced with { NUMBER }.

The actual ranges are specific to each device and can be confirmed
doing:

$ ip --details link show can0
1: can0: <NOARP,ECHO> mtu 16 qdisc noop state DOWN mode DEFAULT group default qlen 10
    link/can  promiscuity 0 minmtu 0 maxmtu 0
    can state STOPPED restart-ms 0
	  ES582.1/ES584.1: tseg1 2..256 tseg2 2..128 sjw 1..128 brp 1..512 brp-inc 1
	  ES582.1/ES584.1: dtseg1 2..32 dtseg2 1..16 dsjw 1..8 dbrp 1..32 dbrp-inc 1
	  clock 80000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

Finally, the unit (bps, tq, ns or ms) are given. The rationale to add
the units is that the TDC parameters (that will be introduced in the
upcoming patches) are measured in a different unit than the other
bittiming parameters: clock period (a.k.a. minimum time quantum)
instead of time quantum. Adding the units disambiguates things.

For reference, before the change:
$ ip link set can0 type can help
Usage: ip link set DEVICE type can
	[ bitrate BITRATE [ sample-point SAMPLE-POINT] ] |
	[ tq TQ prop-seg PROP_SEG phase-seg1 PHASE-SEG1
 	  phase-seg2 PHASE-SEG2 [ sjw SJW ] ]

	[ dbitrate BITRATE [ dsample-point SAMPLE-POINT] ] |
	[ dtq TQ dprop-seg PROP_SEG dphase-seg1 PHASE-SEG1
 	  dphase-seg2 PHASE-SEG2 [ dsjw SJW ] ]

	[ loopback { on | off } ]
	[ listen-only { on | off } ]
	[ triple-sampling { on | off } ]
	[ one-shot { on | off } ]
	[ berr-reporting { on | off } ]
	[ fd { on | off } ]
	[ fd-non-iso { on | off } ]
	[ presume-ack { on | off } ]

	[ restart-ms TIME-MS ]
	[ restart ]

	[ termination { 0..65535 } ]

	Where: BITRATE	:= { 1..1000000 }
		  SAMPLE-POINT	:= { 0.000..0.999 }
		  TQ		:= { NUMBER }
		  PROP-SEG	:= { 1..8 }
		  PHASE-SEG1	:= { 1..8 }
		  PHASE-SEG2	:= { 1..8 }
		  SJW		:= { 1..4 }
		  RESTART-MS	:= { 0 | NUMBER }

...and after it:
$ ip link set can0 type can help
Usage: ip link set DEVICE type can
	[ bitrate BITRATE [ sample-point SAMPLE-POINT] ] |
	[ tq TQ prop-seg PROP_SEG phase-seg1 PHASE-SEG1
 	  phase-seg2 PHASE-SEG2 [ sjw SJW ] ]

	[ dbitrate BITRATE [ dsample-point SAMPLE-POINT] ] |
	[ dtq TQ dprop-seg PROP_SEG dphase-seg1 PHASE-SEG1
 	  dphase-seg2 PHASE-SEG2 [ dsjw SJW ] ]

	[ loopback { on | off } ]
	[ listen-only { on | off } ]
	[ triple-sampling { on | off } ]
	[ one-shot { on | off } ]
	[ berr-reporting { on | off } ]
	[ fd { on | off } ]
	[ fd-non-iso { on | off } ]
	[ presume-ack { on | off } ]
	[ cc-len8-dlc { on | off } ]

	[ restart-ms TIME-MS ]
	[ restart ]

	[ termination { 0..65535 } ]

	Where: BITRATE	:= { NUMBER in bps }
		  SAMPLE-POINT	:= { 0.000..0.999 }
		  TQ		:= { NUMBER in ns }
		  PROP-SEG	:= { NUMBER in tq }
		  PHASE-SEG1	:= { NUMBER in tq }
		  PHASE-SEG2	:= { NUMBER in tq }
		  SJW		:= { NUMBER in tq }
		  RESTART-MS	:= { 0 | NUMBER in ms }

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-04 09:42:23 -06:00
Taehee Yoo 6e15d27aae ip: add AMT support
Add basic support for Automatic Multicast Tunneling (AMT) network devices.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
2021-11-03 13:24:13 -06:00
David Ahern 9cae1de564 Import amt.h
Impor amt.h uapi from last kernel sync point

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-03 13:23:38 -06:00
David Ahern 258e350ca9 Update kernel headers
Update kernel headers to commit:
    cc0356d6a02e ("Merge tag 'x86_core_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-03 13:22:15 -06:00
Moshe Shemesh 047e9ae516 devlink: Fix cmd_dev_param_set() to check configuration mode
This patch is fixing a bug, when param set user command includes
configuration mode which is not supported, the tool may not respond
with error if the requested value is 0. In such case
cmd_dev_param_set_cb() won't find the requested configuration mode and
returns ctx->value as initialized (equal 0). Then cmd_dev_param_set()
may find that requested value equals current value and returns success.

Fixing the bug by adding a flag cmode_found which is set only if
cmd_dev_param_set_cb() finds the requested configuration mode.

Fixes: 13925ae9eb ("devlink: Add param command support")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-02 08:34:33 -07:00
Stephen Hemminger 7a8b7573a4 v5.15.0 2021-11-01 16:41:02 -07:00
Neta Ostrovsky ad3a118f88 rdma: Fix SRQ resource tracking information json
Fix the json output for the QPs that are associated with the SRQ -
The qpn are now displayed in a json array.

Sample output before the fix:
$ rdma res show srq lqpn 126-141 -j -p
[ {
        "ifindex":0,
	"ifname":"ibp8s0f0",
	"srqn":4,
	"type":"BASIC",
	"lqpn":["126-128,130-140"],
	"pdn":9,
	"pid":3581,
	"comm":"ibv_srq_pingpon"
    },{
	"ifindex":0,
	"ifname":"ibp8s0f0",
	"srqn":5,
	"type":"BASIC",
	"lqpn":["141"],
	"pdn":10,
	"pid":3584,
	"comm":"ibv_srq_pingpon"
    } ]

Sample output after the fix:
$ rdma res show srq lqpn 126-141 -j -p
[ {
        "ifindex":0,
	"ifname":"ibp8s0f0",
	"srqn":4,
	"type":"BASIC",
	"lqpn":["126-128","130-140"],
	"pdn":9,
	"pid":3581,
	"comm":"ibv_srq_pingpon"
    },{
	"ifindex":0,
	"ifname":"ibp8s0f0",
	"srqn":5,
	"type":"BASIC",
	"lqpn":["141"],
	"pdn":10,
	"pid":3584,
	"comm":"ibv_srq_pingpon"
    } ]

Fixes: 9b272e138d ("rdma: Add SRQ resource tracking information")
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-29 15:04:45 -07:00
Antoine Tenart 7a235a101b man: devlink-port: fix pfnum for devlink port add
When configuring a devlink PCI port, the pfnumber can be specified
using 'pfnum' and not 'pcipf' as stated in the man page. Fix this.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-29 15:03:44 -07:00
David Ahern e2947f6fd8 Merge branch 'managed-neighbor' into next
Daniel Borkmann  says:

====================

iproute2 patches to add support for managed neighbor entries as per recent
net-next commits:

  2ed08b5ead3c ("Merge branch 'Managed-Neighbor-Entries'")
  c47fedba94bc ("Merge branch 'minor-managed-neighbor-follow-ups'")

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-28 09:00:26 -06:00
Daniel Borkmann 9e009e78e7 ip, neigh: Add NTF_EXT_MANAGED support
Currently, ip neigh does not support the NTF_EXT_MANAGED flag. Add cmdline
support.

Usage example:

  # ./ip/ip n replace 192.168.178.30 dev enp5s0 managed extern_learn
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a managed extern_learn REACHABLE
  [...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-28 08:59:03 -06:00
Daniel Borkmann 040e52526c ip, neigh: Add missing NTF_USE support
Currently, ip neigh does not support the NTF_USE flag. Similar to other flags
such as extern_learn, add cmdline support. The flag dump support is explicitly
missing here, since the kernel does not propagate the flag back to user space.

Usage example:

  # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a extern_learn REACHABLE
  [...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-28 08:58:55 -06:00
Daniel Borkmann c76a3849ec ip, neigh: Fix up spacing in netlink dump
Fix up spacing to consistently add a single ' ' after an attribute has
been printed. Currently, it is a bit of a mix of before and after which
can lead to double spacing to be printed.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-28 08:58:50 -06:00
Nicolas Dichtel 76b30805f9 xfrm: enable to manage default policies
Two new commands to manage default policies:
 - ip xfrm policy setdefault
 - ip xfrm policy getdefault

And the corresponding part in 'ip xfrm monitor'.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-28 08:58:28 -06:00
David Ahern 2be7d99960 Merge branch 'rdma-optional-stats' into next
Mark Zhang  says:

====================

This is supplementary part of kernel series [1], which provides an
extension to the rdma statistics tool that allows to set or list
optional counters dynamically, using netlink.

Thanks

[1] https://www.spinics.net/lists/linux-rdma/msg106283.html

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-16 12:52:02 -06:00
Stephen Hemminger 229eaba507 uapi: pickup fix for xfrm ABI breakage
See kernel
Commit 844f7eaaed9 ("include/uapi/linux/xfrm.h: Fix XFRM_MSG_MAPPING ABI breakage")

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-15 17:40:30 -07:00
Nicolas Dichtel 95cd2a6204 iplink: enable to specify index when changing netns
When an interface is moved to another netns, it's possible to specify a
new ifindex. Let's add this support.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=eeb85a14ee34
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 18:05:09 -06:00
David Ahern a936a73fc2 Merge branch 'config-libdir' into next
Andrea Claudi  says:

====================

This series add support for the libdir parameter in iproute2 configure
script. The idea is to make use of the fact that packaging systems may
assume that 'configure' comes from autotools allowing a syntax similar
to the autotools one, and using it to tell iproute2 where the distro
expects to find its lib files.

Patches 1-2 fix a parsing issue on current configure options, that may
trigger an endless loop when no value is provided with some options;

Patch 3 fixes a parsing issue bailing out when more than one value is
provided for a single option;

Patch 4 simplifies options parsing, moving semantic checks out of the
while loop processing options;

Patch 5 introduces support for the --opt=value style on current options,
for uniformity;

Patch 6 adds the --prefix option, that may be used by some packaging
systems when calling the configure script;

Patch 7 finally adds the --libdir option, and also drops the static
LIBDIR var from the Makefile.

Changelog:
----------
v4 -> v5
  - bail out when multiple values are provided with a single option
  - simplify option parsing and reduce code duplication, as suggested
    by Phil Sutter
  - remove a nasty eval on libdir option processing

v3 -> v4
  - fix parsing issue on '--include_dir' and '--libbpf_dir'
  - split '--opt value' and '--opt=value' use cases, avoid code
    duplication moving semantic checks on value to dedicated functions

v2 -> v3
  - fix parsing error on prefix and libdir options.

v1 -> v2
  - consolidate '--opt value' and '--opt=value' use cases, as suggested
    by David Ahern.
  - added patch 2 to manage the --prefix option, used by the Debian
    packaging system, as reported by Luca Boccassi, and use it when
    setting lib directory.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:59:33 -06:00
Andrea Claudi cee0cf84bd configure: add the --libdir option
This commit allows users/packagers to choose a lib directory to store
iproute2 lib files.

At the moment iproute2 ship lib files in /usr/lib and offers no way to
modify this setting. However, according to the FHS, distros may choose
"one or more variants of the /lib directory on systems which support
more than one binary format" (e.g. /usr/lib64 on Fedora).

As Luca states in commit a3272b9372 ("configure: restore backward
compatibility"), packaging systems may assume that 'configure' is from
autotools, and try to pass it some parameters.

Allowing the '--libdir=/path/to/libdir' syntax, we can use this to our
advantage, and let the lib directory to be chosen by the distro
packaging system.

Note that LIBDIR uses "\${prefix}/lib" as default value because autoconf
allows this to be expanded to the --prefix value at configure runtime.
"\${prefix}" is replaced with the PREFIX value in check_lib_dir().

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:57:20 -06:00
Andrea Claudi 0ee1950b5c configure: add the --prefix option
This commit add the '--prefix' option to the iproute2 configure script.

This mimics the '--prefix' option that autotools configure provides, and
will be used later to allow users or packagers to set the lib directory.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:57:17 -06:00
Andrea Claudi 4b8bca5f9e configure: support --param=value style
This commit makes it possible to specify values for configure params
using the common autotools configure syntax '--param=value'.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:57:05 -06:00
Andrea Claudi 99245d1741 configure: simplify options parsing
This commit simplifies options parsing moving all the code not related to
parsing out of the case statement.

- The conditional shift after the assignments is moved right after the
  case, reducing code duplication.
- The semantic checks on the LIBBPF_FORCE value is moved after the loop
  like we already did for INCLUDE and LIBBPF_DIR.
- Finally, the loop condition is changed to check remaining arguments, thus
  making it possible to get rid of the null string case break.

As a bonus, now the help message states that on or off should follow
--libbpf_force

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:57:02 -06:00
Andrea Claudi c330d09794 configure: fix parsing issue with more than one value per option
With commit a9c3d70d90 ("configure: add options ability") users are no
more able to provide wrong command lines like:

$ ./configure --include_dir foo bar

The script simply bails out when user provides more than one value for a
single option. However, in doing so, it breaks backward compatibility with
some packaging system, which expects unknown options to be ignored.

Commit a3272b9372 ("configure: restore backward compatibility") fix this
issue, but makes it possible again for users to provide wrong command lines
such as the one above.

This fixes the issue simply ignoring autoconf-like options such as
'--opt=value'.

Fixes: a3272b9372 ("configure: restore backward compatibility")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:56:57 -06:00
Andrea Claudi 48c379bc2a configure: fix parsing issue on libbpf_dir option
configure is stuck in an endless loop if '--libbpf_dir' option is used
without a value:

$ ./configure --libbpf_dir
./configure: line 515: shift: 2: shift count out of range
./configure: line 515: shift: 2: shift count out of range
[...]

Fix it splitting 'shift 2' into two consecutive shifts, and making the
second one conditional to the number of remaining arguments.

A check is also provided after the while loop to verify the libbpf dir
exists; also, as LIBBPF_DIR does not have a default value, configure bails
out if the user does not specify a value after --libbpf_dir, thus avoiding
to produce an erroneous configuration.

Fixes: 7ae2585b86 ("configure: convert LIBBPF environment variables to command-line options")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:56:53 -06:00
Andrea Claudi 1d819dcc74 configure: fix parsing issue on include_dir option
configure is stuck in an endless loop if '--include_dir' option is used
without a value:

$ ./configure --include_dir
./configure: line 506: shift: 2: shift count out of range
./configure: line 506: shift: 2: shift count out of range
[...]

Fix it splitting 'shift 2' into two consecutive shifts, and making the
second one conditional to the number of remaining arguments.

A check is also provided after the while loop to verify the include dir
exists; this avoid to produce an erroneous configuration.

Fixes: a9c3d70d90 ("configure: add options ability")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:56:48 -06:00
Neta Ostrovsky 19ba785f16 rdma: Add optional-counters set/unset support
This patch provides an extension to the rdma statistics tool
that allows to set/unset optional counters set dynamically,
using new netlink commands.
Note that the optional counter statistic implementation is
driver-specific and may impact the performance.

Examples:
To enable a set of optional counters on link rocep8s0f0/1:
    $ sudo rdma statistic set link rocep8s0f0/1 optional-counters cc_rx_ce_pkts,cc_rx_cnp_pkts
To disable all optional counters on link rocep8s0f0/1:
    $ sudo rdma statistic unset link rocep8s0f0/1 optional-counters

Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:52:57 -06:00
Neta Ostrovsky 7d5cb70e94 rdma: Add stat "mode" support
This patch introduces the "mode" command, which presents the enabled or
supported (when the "supported" argument is available) optional
counters.

An optional counter is a vendor-specific counter that may be
dynamically enabled/disabled. This enhancement of hwcounters allows
exposing of counters which are for example mutual exclusive and cannot
be enabled at the same time, counters that might degrades performance,
optional debug counters, etc.

Examples:
To present currently enabled optional counters on link rocep8s0f0/1:
    $ rdma statistic mode link rocep8s0f0/1
    link rocep8s0f0/1 optional-counters cc_rx_ce_pkts

To present supported optional counters on link rocep8s0f0/1:
    $ rdma statistic mode supported link rocep8s0f0/1
    link rocep8s0f0/1 supported optional-counters cc_rx_ce_pkts,cc_rx_cnp_pkts,cc_tx_cnp_pkts

Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:52:53 -06:00
Neta Ostrovsky d480cb71f5 rdma: Update uapi headers
Update rdma_netlink.h file upto kernel commit 7301d0a9834c
("RDMA/nldev: Add support to get status of all counters")

Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:52:47 -06:00
David Ahern e4ca6a4965 Update kernel headers
Update kernel headers to commit:
    295711fa8fec ("Merge branch 'dpaa2-irq-coalescing'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:49:19 -06:00
Stephen Hemminger a31e7b7967 mptcp: cleanup include section.
David reported ipmptcp breaks hard the build when updating the
relevant kernel headers.

We should be more careful in the header section, explicitly
including all the required dependencies respecting the usual order
between systems and local headers.

Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:48:36 -06:00
Paul Chaignon a500c5ac87 lib/bpf: fix map-in-map creation without prepopulation
When creating map-in-maps, the outer map can be prepopulated using the
inner_idx field of inner maps. That field defines the index of the inner
map in the outer map. It is ignored if set to -1.

Commit 6d61a2b557 ("lib: add libbpf support") however started using
that field to identify inner maps. While iterating over all maps looking
for inner maps, maps with inner_idx set to -1 are erroneously skipped.
As a result, trying to create a map-in-map with prepopulation disabled
fails because the inner_id of the outer map is not correctly set.

This bug can be observed with strace -ebpf (notice the zero inner_map_fd
for the outer map creation):

    bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_ARRAY, key_size=4, value_size=130996, max_entries=1, map_flags=0, inner_map_fd=0, map_name="maglev_inner", map_ifindex=0, btf_fd=0, btf_key_type_id=0, btf_value_type_id=0}, 128) = 32
    bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH_OF_MAPS, key_size=2, value_size=4, max_entries=65536, map_flags=BPF_F_NO_PREALLOC, inner_map_fd=0, map_name="maglev_outer", map_ifindex=0, btf_fd=0, btf_key_type_id=0, btf_value_type_id=0}, 128) = -1 EINVAL (Invalid argument)

Fixes: 6d61a2b557 ("lib: add libbpf support")
Signed-off-by: Paul Chaignon <paul@isovalent.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-14 14:37:51 -07:00
Antoine Tenart 7c032cac10 man: devlink-port: remove extra .br
br. were added between options of the same command. That is not needed
and makes the output to be one 3 lines for no particular reason.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-11 19:27:12 -07:00
Antoine Tenart 04ee8e6f06 man: devlink-port: fix style
Values should be .I, square brackets should be used for optional values,
curly brackets for lists. Follow this in the devlink-port man page.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-11 19:27:12 -07:00
Antoine Tenart 14802d84d3 man: devlink-port: fix the devlink port add synopsis
When configuring a devlink PCI SF port, the sfnumber can be specified
using 'sfnum' and not 'pcisf' as stated in the man page. Fix this.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-11 19:27:12 -07:00
David Ahern 8cd517a805 Merge branch 'main' into next
Conflicts:
	ip/ipneigh.c

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-09 17:47:47 -06:00
David Ahern 763fd793fe Merge branch 'ioam-encap-modes' into next
Justin Iurman  says:

====================

Following the series applied to net-next (see [1]), here are the corresponding
changes to iproute2.

In the current implementation, IOAM can only be inserted directly (i.e., only
inside packets generated locally) by default, to be compliant with RFC8200.

This patch adds support for in-transit packets and provides the ip6ip6
encapsulation of IOAM (RFC8200 compliant). Therefore, three ioam6 encap modes
are defined:

 - inline: directly inserts IOAM inside packets (by default).

 - encap:  ip6ip6 encapsulation of IOAM inside packets.

 - auto:   either inline mode for packets generated locally or encap mode for
           in-transit packets.

With current iproute2 implementation, it is configured this way:

$ ip -6 r [...] encap ioam6 trace prealloc [...]

The old syntax does not change (for backwards compatibility) and implicitly uses
the inline mode. With the new syntax, an encap mode can be specified:

(inline mode)
$ ip -6 r [...] encap ioam6 mode inline trace prealloc [...]

(encap mode)
$ ip -6 r [...] encap ioam6 mode encap tundst fc00::2 trace prealloc [...]

(auto mode)
$ ip -6 r [...] encap ioam6 mode auto tundst fc00::2 trace prealloc [...]

A tunnel destination address must be configured when using the encap mode or the
auto mode.

  [1] https://lore.kernel.org/netdev/163335001045.30570.12527451523558030753.git-patchwork-notify@kernel.org/T/#m3b428d4142ee3a414ec803466c211dfdec6e0c09

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-09 17:37:12 -06:00
Justin Iurman 41020eb0fd Update documentation
This patch updates the IOAM documentation (ip-route man page) to reflect the
three encap modes that were introduced.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-09 17:35:54 -06:00
Justin Iurman 8fb522cde3 Add support for IOAM encap modes
This patch adds support for the three IOAM encap modes that were introduced:
inline, encap and auto.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-09 17:35:29 -06:00
Frank Villaro-Dixon 897772a735 cmd: use spaces instead of tabs for usage indentation
Fix rogue "tab after spaces" used for indentation of the documentation.
This causes rendering issues on terminals using a non-standard tab width.

Signed-off-by: Frank Villaro-Dixon <frank.villaro@infomaniak.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-06 10:00:49 -07:00
Nikolay Aleksandrov b840c620fe ip: nexthop: keep cache netlink socket open
Since we use the cache netlink socket for each nexthop we can keep it open
instead of opening and closing it on every add call. The socket is opened
once, on the first add call and then reused for the rest.

Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-05 08:34:29 -06:00
Jacob Keller b90174354d devlink: print maximum number of snapshots if available
Recently the kernel gained ability to report the maximum number of
snapshots a region can have. Print this value out if it was reported.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-05 08:31:30 -06:00
David Ahern 6448ed373c Update kernel headers
Update kernel headers to commit:
    49ed8dde3715 ("net: usb: use eth_hw_addr_set() for dev->addr_len cases")

Update to linux/mptcp.h is removed because it breaks compilation
of ipmptcp.c in a nontrivial way.

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-05 08:28:28 -06:00
Davide Caratti e7a98a96f0 mptcp: unbreak JSON endpoint list
the following command:

 # ip -j mptcp endpoint show

prints a JSON array that misses the terminating bracket. Fix this calling
delete_json_obj() to balance the call to new_json_obj().

Fixes: 7e0767cd86 ("add support for mptcp netlink interface")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Andrea Claudi <aclaudi@redhat.com>
2021-10-04 14:07:09 -07:00
David Ahern ec703e0629 Merge branch 'nexthop-cache' into next
Nikolay Aleksandrov  says:

====================

This set tries to help with an old ask that we've had for some time
which is to print nexthop information while monitoring or dumping routes.
The core problem is that people cannot follow nexthop changes while
monitoring route changes, by the time they check the nexthop it could be
deleted or updated to something else. In order to help them out I've
added a nexthop cache which is populated (only used if -d / show_details
is specified) while decoding routes and kept up to date while monitoring.
The nexthop information is printed on its own line starting with the
"nh_info" attribute and its embedded inside it if printing JSON. To
cache the nexthop entries I parse them into structures, in order to
reuse most of the code the print helpers have been altered so they rely
on prepared structures. Nexthops are now always parsed into a structure,
even if they won't be cached, that structure is later used to print the
nexthop and destroyed if not going to be cached. New nexthops (not found
in the cache) are retrieved from the kernel using a private netlink
socket so they don't disrupt an ongoing dump, similar to how interfaces
are retrieved and cached.

I have tested the set with the kernel forwarding selftests and also by
stressing it with nexthop create/update/delete in loops while monitoring.

Comments are very welcome as usual. :)

Changes since RFC:
 - reordered parse/print splits, in order to do that I have to parse
   resilient groups first, then add nh entry parsing so code has been
   reordered as well and patch order has changed, but there have been
   no functional changes (as before refactoring of old code is done in
   the first 8 patches and then patches 9-12 add the new cache and use it)
 - re-run all tests above

Patch breakdown:
Patches 1-2: update current route helpers to take parsed arguments so we
             can directly pass them from the nh_entry structure later
Patch     3: adds new nha_res_grp structure which describes a resilient
             nexhtop group
Patch     4: splits print_nh_res_group into a parse and print parts
             which use the new nha_res_grp structure
Patch     5: adds new nh_entry structure which describes a nexthop
Patch     6: factors out print_nexthop's attribute parsing into nh_entry
             structure used before printing
Patch     7: factors out print_nexthop's nh_entry structure printing
Patch     8: factors out ipnh_get's rtnl talk part and allows to use a
             different rt handle for the communication
Patch     9: adds nexthop cache and helpers to manage it, it uses the
             new __ipnh_get to retrieve nexthops
Patch    10: adds a new helper print_cache_nexthop_id that prints nexthop
             information from its id, if the nexthop is not found in the
             cache it fetches it
Patch    11: the new print_cache_nexthop_id helper is used when printing
             routes with show_details (-d) to output detailed nexthop
             information, the format after nh_info is the same as
             ip nexthop show
Patch    12: changes print_nexthop into print_cache_nexthop which always
             outputs the nexthop information and can also update the cache
             (based on process_cache argument), it's used to keep the
             cache up to date while monitoring

Example outputs (monitor):
[NEXTHOP]id 101 via 169.254.2.22 dev veth2 scope link proto unspec
[NEXTHOP]id 102 via 169.254.3.23 dev veth4 scope link proto unspec
[NEXTHOP]id 103 group 101/102 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec
[ROUTE]unicast 192.0.2.0/24 nhid 203 table 4 proto boot scope global
	nh_info id 203 group 201/202 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec
	nexthop via 169.254.2.12 dev veth3 weight 1
	nexthop via 169.254.3.13 dev veth5 weight 1

[NEXTHOP]id 204 via fe80:2::12 dev veth3 scope link proto unspec
[NEXTHOP]id 205 via fe80:3::13 dev veth5 scope link proto unspec
[NEXTHOP]id 206 group 204/205 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec
[ROUTE]unicast 2001:db8:1::/64 nhid 206 table 4 proto boot scope global metric 1024 pref medium
	nh_info id 206 group 204/205 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec
	nexthop via fe80:2::12 dev veth3 weight 1
	nexthop via fe80:3::13 dev veth5 weight 1

[NEXTHOP]id 2  encap mpls  200/300 via 10.1.1.1 dev ens20 scope link proto unspec onlink
[ROUTE]unicast 2.3.4.10 nhid 2 table main proto boot scope global
	nh_info id 2  encap mpls  200/300 via 10.1.1.1 dev ens20 scope link proto unspec onlink

JSON:
 {
        "type": "unicast",
        "dst": "198.51.100.0/24",
        "nhid": 103,
        "table": "3",
        "protocol": "boot",
        "scope": "global",
        "flags": [ ],
        "nh_info": {
            "id": 103,
            "group": [ {
                    "id": 101,
                    "weight": 11
                },{
                    "id": 102,
                    "weight": 45
                } ],
            "type": "resilient",
            "resilient_args": {
                "buckets": 512,
                "idle_timer": 0,
                "unbalanced_timer": 0,
                "unbalanced_time": 0
            },
            "scope": "global",
            "protocol": "unspec",
            "flags": [ ]
        },
        "nexthops": [ {
                "gateway": "169.254.2.22",
                "dev": "veth2",
                "weight": 11,
                "flags": [ ]
            },{
                "gateway": "169.254.3.23",
                "dev": "veth4",
                "weight": 45,
                "flags": [ ]
            } ]
  }

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:31:44 -06:00
Nikolay Aleksandrov 7ca868a7aa ip: nexthop: add print_cache_nexthop which prints and manages the nh cache
Add a new helper print_cache_nexthop replacing print_nexthop which can
update the nexthop cache if the process_cache argument is true. It is
used when monitoring netlink messages to keep the nexthop cache up to
date with nexthop changes happening. For the old callers and anyone
who's just dumping nexthops its _nocache version is used which is a
wrapper for print_cache_nexthop.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:59 -06:00
Nikolay Aleksandrov 5d5dc549ce ip: route: print and cache detailed nexthop information when requested
If -d (show_details) is used when printing/monitoring routes then print
detailed nexthop information in the field "nh_info". The nexthop is also
cached for future searches.

Output looks like:
 unicast 198.51.100.0/24 nhid 103 table 3 proto boot scope global
	 nh_info id 103 group 101/102 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec
	 nexthop via 169.254.2.22 dev veth2 weight 1
	 nexthop via 169.254.3.23 dev veth4 weight 1

The nh_info field has the same format as ip -d nexthop show would've had
for the same nexthop id.

For completeness the JSON version looks like:
 {
        "type": "unicast",
        "dst": "198.51.100.0/24",
        "nhid": 103,
        "table": "3",
        "protocol": "boot",
        "scope": "global",
        "flags": [ ],
        "nh_info": {
            "id": 103,
            "group": [ {
                    "id": 101
                },{
                    "id": 102
                } ],
            "type": "resilient",
            "resilient_args": {
                "buckets": 512,
                "idle_timer": 0,
                "unbalanced_timer": 0,
                "unbalanced_time": 0
            },
            "scope": "global",
            "protocol": "unspec",
            "flags": [ ]
        },
        "nexthops": [ {
                "gateway": "169.254.2.22",
                "dev": "veth2",
                "weight": 1,
                "flags": [ ]
            },{
                "gateway": "169.254.3.23",
                "dev": "veth4",
                "weight": 1,
                "flags": [ ]
            } ]
 }

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:55 -06:00
Nikolay Aleksandrov cb3d18c29e ip: nexthop: add a helper which retrieves and prints cached nh entry
Add a helper which looks for a nexthop in the cache and if not found
reads the entry from the kernel and caches it. Finally the entry is
printed.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:51 -06:00
Nikolay Aleksandrov 60a9703032 ip: nexthop: add cache helpers
Add a static nexthop cache in a hash with 1024 buckets and helpers to
manage it (link, unlink, find, add nexthop, del nexthop). Adding new
nexthops is done by creating a new rtnl handle and using it to retrieve
the nexthop so the helper is safe to use while already reading a
response (i.e. using the global rth).

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:41 -06:00
Nikolay Aleksandrov 53d7c43bd3 ip: nexthop: factor out ipnh_get_id rtnl talk into a helper
Factor out ipnh_get_id's rtnl talk portion into a separate helper which
will be reused later to retrieve nexthops for caching.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:36 -06:00
Nikolay Aleksandrov a2ca431215 ip: nexthop: factor out print_nexthop's nh entry printing
Factor out nexthop entry structure printing from print_nexthop,
effectively splitting it into parse and print parts.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:32 -06:00
Nikolay Aleksandrov 945c26db68 ip: nexthop: parse attributes into nh entry structure before printing
Factor out the nexthop attribute parsing and parse attributes into a
nexthop entry structure which is then used to print.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:28 -06:00
Nikolay Aleksandrov 7ec1cee630 ip: nexthop: add nh entry structure
Add a structure which describes a nexthop, it will be later used to
parse, print and cache nexthops.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:24 -06:00
Nikolay Aleksandrov 60a7515b89 ip: nexthop: split print_nh_res_group into parse and print parts
Now that we have resilient group structure split print_nh_res_group into
a parse and print functions, print_nexthop calls the parse function
first to parse the attributes into the structure and then uses the print
function to print the parsed structure.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:19 -06:00
Nikolay Aleksandrov cfb0a8729e ip: nexthop: add resilient group structure
Add a structure which describes a resilient nexthop group. It will be
later used for parsing.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:11 -06:00
Nikolay Aleksandrov 371e889da7 ip: export print_rta_gateway version which outputs prepared gateway string
Export a new __print_rta_gateway that takes a prepared gateway string to
print which is also used by print_rta_gateway for consistent format.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:06 -06:00
Nikolay Aleksandrov f72789965e ip: print_rta_if takes ifindex as device argument instead of attribute
We need print_rta_if() to take ifindex directly so later we can use it
with cached converted nexthop objects.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:23:31 -06:00
David Ahern c8c9111a4c Merge branch 'ax.25-netrom-rose' into next
Ralf Baechle  says:

====================

net-tools contain support for these three protocol but are deprecated and
no longer installed by default by many distributions.  Iproute2 otoh has
no support at all and will dump the addresses of these protocols which
actually are pretty human readable as hex numbers:

 # ip link show dev bpq0
3: bpq0: <UP,LOWER_UP> mtu 256 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ax25 88:98:60:a0:92:40:02 brd a2:a6:a8:40:40:40:00
 # ip link show dev nr0
4: nr0: <NOARP,UP,LOWER_UP> mtu 236 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/netrom 88:98:60:a0:92:40:0a brd 00:00:00:00:00:00:00
 # ip link show dev rose0
8: rose0: <NOARP,UP,LOWER_UP> mtu 249 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/rose 65:09:33:30:00 brd 00:00:00:00:00

This series adds basic support for the three protocols to print addresses:

 # ip link show dev bpq0
3: bpq0: <UP,LOWER_UP> mtu 256 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ax25 DL0PI-1 brd QST-0
 # ip link show dev nr0
4: nr0: <NOARP,UP,LOWER_UP> mtu 236 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/netrom DL0PI-5 brd *
 # ip link show dev rose0
8: rose0: <NOARP,UP,LOWER_UP> mtu 249 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/rose 6509333000 brd 0000000000

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-23 20:03:11 -06:00
Ralf Baechle e2cc9840ea ROSE: Print decoded addresses rather than hex numbers.
NETROM is a OSI layer 3 protocol sitting on top of AX.25.  It uses BCD-
encoded 10 digit telephone numbers as addresses.  Without this ip will
print a ROSE addresses like

  link/rose 12:34:56:78:90 brd 00:00:00:00:00

which is readable but ugly.  With this applied it ROSE addresses will be
printed as

  link/rose 1234567890 brd 0000000000

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-23 20:02:51 -06:00
Ralf Baechle 26c5782fab ROSE: Add rose_ntop implementation.
ROSE addresses are ten digit numbers, basically like North American
telephone numbers.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-23 20:02:45 -06:00
Ralf Baechle fd4c1c8168 NETROM: Print decoded addresses rather than hex numbers.
NETROM is an OSI layer 3 protocol sitting on top of AX.25.  It also uses
AX.25 addresses.  Without this commit ip will print NETROM address like

  link/generic 98:92:9c:aa:b0:40:02 brd 00:00:00:00:00:00:00

while with this commit the decoded result

  link/generic LINUX-1 brd *

is much more eye friendly.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-23 20:02:40 -06:00
Ralf Baechle c63b769ad4 NETROM: Add netrom_ntop implementation.
NETROM uses AX.25 addresses so this is a simple wrapper around ax25_ntop1.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-23 20:02:37 -06:00
Ralf Baechle 399ae00af5 AX.25: Print decoded addresses rather than hex numbers.
Before this, ip would have printed the AX.25 address configured for an
AX.25 interface's default addresses as:

  link/ax25 98:92:9c:aa:b0:40:02 brd a2:a6:a8:40:40:40:00

which is pretty unreadable.  With this commit ip will decode AX.25
addresses like

  link/ax25 LINUX-1 brd QST-0

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-23 20:02:34 -06:00
Ralf Baechle 3a92669b3a AX.25: Add ax25_ntop implementation.
AX.25 addresses are based on Amateur radio callsigns followed by an SSID
like XXXXXX-SS where the callsign is up to 6 characters which are either
letters or digits and the SSID is a decimal number in the range 0..15.
Amateur radio callsigns are assigned by a country's relevant authorities
and are 3..6 characters though a few countries have assigned callsigns
longer than that.  AX.25 is not able to handle such longer callsigns.

Being based on HDLC AX.25 encodes addresses by shifting them one bit left
thus zeroing bit 0, the HDLC extension bit for all but the last bit of
a packet's address field but for our purposes here we're not considering
the HDLC extension bit that is it will always be zero.

Linux' internal representation of AX.25 addresses in Linux is very similar
to this on the on-air or on-the-wire format.  The callsign is padded to
6 octets by adding spaces, followed by the SSID octet then all 7 octets
are left-shifted by one byte.

This for example turns "LINUX-1" where the callsign is LINUX and SSID is 1
into 98:92:9c:aa:b0:40:02.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-23 20:02:30 -06:00
Andrea Claudi 2f5825cb38 lib: bpf_legacy: fix bpffs mount when /sys/fs/bpf exists
bpf selftests using iproute2 fails with:

$ ip link set dev veth0 xdp object ../bpf/xdp_dummy.o section xdp_dummy
Continuing without mounted eBPF fs. Too old kernel?
mkdir (null)/globals failed: No such file or directory
Unable to load program

This happens when the /sys/fs/bpf directory exists. In this case, mkdir
in bpf_mnt_check_target() fails with errno == EEXIST, and the function
returns -1. Thus bpf_get_work_dir() does not call bpf_mnt_fs() and the
bpffs is not mounted.

Fix this in bpf_mnt_check_target(), returning 0 when the mountpoint
exists.

Fixes: d4fcdbbec9 ("lib/bpf: Fix and simplify bpf_mnt_check_target()")
Reported-by: Mingyu Shi <mshi@redhat.com>
Reported-by: Jiri Benc <jbenc@redhat.com>
Suggested-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-22 17:30:52 -07:00
Puneet Sharma d756c08a3d tc/f_flower: fix port range parsing
Provided port range in tc rule are parsed incorrectly.
Even though range is passed as min-max. It throws an error.

$ tc filter add dev eth0 ingress handle 100 priority 10000 protocol ipv4 flower ip_proto tcp dst_port 10368-61000 action pass
max value should be greater than min value
Illegal "dst_port"

Fixes: 8930840e67 ("tc: flower: Classify packets based port ranges")
Signed-off-by: Puneet Sharma <pusharma@akamai.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-22 17:28:48 -07:00
Gokul Sivakumar ebbb701714 lib: bpf_legacy: add prog name, load time, uid and btf id in prog info dump
The BPF program name is included when dumping the BPF program info and the
kernel only stores the first (BPF_PROG_NAME_LEN - 1) bytes for the program
name.

$ sudo ip link show dev docker0
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:4c:df:a4:54 brd ff:ff:ff:ff:ff:ff
    prog/xdp id 789 name xdp_drop_func tag 57cd311f2e27366b jited

The BPF program load time (ns since boottime), UID of the user who loaded
the program and the BTF ID are also included when dumping the BPF program
information when the user expects a detailed ip link info output.

$ sudo ip -details link show dev docker0
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:4c:df:a4:54 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
    bridge forward_delay 1500 hello_time 200 max_age 2000 ageing_time 30000 stp_state 0 priority 32768 vlan_filt
ering 0 vlan_protocol 802.1Q bridge_id 8000.2:42:4c:df:a4:54 designated_root 8000.2:42:4c:df:a4:54 root_port 0 r
oot_path_cost 0 topology_change 0 topology_change_detected 0 hello_timer    0.00 tcn_timer    0.00 topology_chan
ge_timer    0.00 gc_timer  265.36 vlan_default_pvid 1 vlan_stats_enabled 0 vlan_stats_per_port 0 group_fwd_mask
0 group_address 01:80:c2:00:00:00 mcast_snooping 1 mcast_router 1 mcast_query_use_ifaddr 0 mcast_querier 0 mcast
_hash_elasticity 16 mcast_hash_max 4096 mcast_last_member_count 2 mcast_startup_query_count 2 mcast_last_member_
interval 100 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query
_response_interval 1000 mcast_startup_query_interval 3124 mcast_stats_enabled 0 mcast_igmp_version 2 mcast_mld_v
ersion 1 nf_call_iptables 0 nf_call_ip6tables 0 nf_call_arptables 0 addrgenmode eui64 numtxqueues 1 numrxqueues
1 gso_max_size 65536 gso_max_segs 65535
    prog/xdp id 789 name xdp_drop_func tag 57cd311f2e27366b jited load_time 2676682607316255 created_by_uid 0 btf_id 708

Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-21 09:16:32 -06:00
David Ahern 75c5054e7a Merge branch 'main' into next
Conflicts:
	include/uapi/linux/virtio_ids.h

Signed-off-by: David Ahern <dsahern@gmail.com>
2021-09-14 10:46:48 -06:00
Stephen Hemminger 92e32f7791 uapi: updates from 5.15-rc1
Small changes to virtio etc.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-13 15:07:58 -07:00
Lahav Schlesinger 0431e1e724 ip: Support filter links/neighs with no master
Commit d3432bf10f17 ("net: Support filtering interfaces on no master")
in the kernel added support for filtering interfaces/neighbours that
have no master interface.

This patch completes it and adds this support to iproute2:
1. ip link show nomaster
2. ip address show nomaster
3. ip neighbour {show | flush} nomaster

Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2021-09-12 11:17:18 -06:00
Lennert Buytenhek 12b3d6a2ad man: ip-macsec: fix gcm-aes-256 formatting issue
The 'ip link add' invocation template at the top of the ip-macsec man
page formats with a pair of extra double quotes:

   ip  link  add  link DEVICE name NAME type macsec [ [ address <lladdr> ]
   port PORT | sci <u64> ]  [  cipher  {  default  |  gcm-aes-128  |  gcm-
   aes-256"}][" icvlen ICVLEN ] [ encrypt { on | off } ] [ send_sci { on |

This is due to missing whitespace around the gcm-aes-256 identifier
in the source file.

Fixes: b16f525323 ("Add support for configuring MACsec gcm-aes-256 cipher type.")
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2021-09-12 11:13:26 -06:00
David Ahern 917d913b2e Merge branch 'main' into next
Conflicts:
	include/uapi/linux/virtio_ids.h

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-08 15:13:49 -06:00
David Ahern d0cba0d1f6 Merge branch 'bridge-mcast_router' into next
Nikolay Aleksandrov  says:

====================

This set adds support for vlan port/bridge multicast router option. It is
similar to the already existing bridge-wide mcast_router control. Patch 01
moves attribute adding and parsing together for vlan option setting,
similar to global vlan option setting. It simplifies adding new options
because we can avoid reserved values and additional checks. Patch 02
adds the new mcast_router option and updates the related man page.

Example:
 # mark port ens16 as a permanent mcast router for vlan 100
 $ bridge vlan set dev ens16 vid 100 mcast_router 2
 # disable mcast router for port ens16 and vlan 200
 $ bridge vlan set dev ens16 vid 200 mcast_router 0
 $ bridge -d vlan show
 port              vlan-id
 ens16             1 PVID Egress Untagged
                     state forwarding mcast_router 1
                   100
                     state forwarding mcast_router 2
                   200
                     state forwarding mcast_router 0

Note that this set depends on the latest kernel uapi headers.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 17:03:58 -06:00
Nikolay Aleksandrov ae895504c6 bridge: vlan: add support for mcast_router option
Add support for setting and dumping per-vlan/interface mcast_router
option. It controls the mcast router mode of a vlan/interface pair.
For bridge devices only modes 0 - 2 are allowed. The possible modes
are:
 0 - disabled
 1 - automatic router presence detection (default)
 2 - permanent router
 3 - temporary router (available only for ports)

Example:
 # mark port ens16 as a permanent mcast router for vlan 100
 $ bridge vlan set dev ens16 vid 100 mcast_router 2
 # disable mcast router for port ens16 and vlan 200
 $ bridge vlan set dev ens16 vid 200 mcast_router 0
 $ bridge -d vlan show
 port              vlan-id
 ens16             1 PVID Egress Untagged
                     state forwarding mcast_router 1
                   100
                     state forwarding mcast_router 2
                   200
                     state forwarding mcast_router 0

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 17:00:31 -06:00
Nikolay Aleksandrov 12fbe3e4eb bridge: vlan: set vlan option attributes while parsing
Set vlan option attributes immediately while parsing to simplify the
checks, avoid having reserved values (e.g. -1 for unset var) and have
more limited scope for the variables. This is also similar to how global
vlan options are set. The attribute setting and checks are moved with
option parsing, no functional changes intended.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 17:00:31 -06:00
David Ahern db28c944d8 Update kernel headers
Update kernel headers to commit:
    27151f177827 ("Merge tag 'perf-tools-for-v5.15-2021-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 16:59:38 -06:00
Stephen Hemminger 6d676ad934 ip: rewrite routel in python
Not sure if anyone uses the routel script. The script was
a combination of ip route, shell and awk doing command scraping.
It is now possible to do this much better using the JSON
output formats and python.

Rewriting also fixes the bug where the old script could not parse
the current output format.  At the end was getting:
/usr/bin/routel: 48: shift: can't shift that many

The new script also has IPv6 as option.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 16:31:24 -06:00
Stephen Hemminger 1eaebad2c5 ip: remove routef script
This script is old and limited to IPv4.
Using ip route command directly is better option.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 16:31:23 -06:00
Stephen Hemminger adddf30cd8 ip: remove ifcfg script
This script was from olden days of ifcfg.
I don't see any distribution using it and it is time to put
it out to pasture.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 16:31:19 -06:00
Stephen Hemminger 2c8110881b ip: remove old rtpr script
This script was a one off hack for a special case.
Now that ip commands have better formatting, there is no
real reason for it.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 16:30:36 -06:00
David Marchand e7e0e2ce65 iptuntap: fix multi-queue flag display
When creating a tap with multi_queue flag, this flag is not displayed
when dumping:

$ ip tuntap add tap23 mode tap multi_queue
$ ip tuntap
tap23: tap persist0x100

While at it, add a space between known flags and hexdump of unknown
ones.

Fixes: c41e038f48 ("iptuntap: allow creation of multi-queue tun/tap device")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-02 08:41:17 -07:00
Nikolay Aleksandrov deef844b1e man: ip-link: remove double of
Remove double "of".

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-02 08:40:36 -07:00
Luca Boccassi a3272b9372 configure: restore backward compatibility
Commit a9c3d70d90 broke backward compatibility
by making 'configure' error out if parameters are passed, instead of
ignoring them.
Sometimes packaging systems detect 'configure' and assume it's from
autotools, and pass a bunch of options. Eg:

 dh_auto_configure
	./configure --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-option-checking --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking

Ignore unknown options again instead of erroring out.

Fixes: a9c3d70d90 ("configure: add options ability")

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-02 08:39:48 -07:00
Luca Boccassi ceba59308d tree-wide: fix some typos found by Lintian
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-02 08:39:48 -07:00
Stephen Hemminger 7a70524270 ip: remove leftovers from IPX and DECnet
Iproute2 has not supported DECnet or IPX since version 5.0.
There were some leftover support in the ip options flags
and parsing, remove these.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-01 14:03:53 -07:00
Stephen Hemminger 8ab1834e56 uapi: update headers from 5.15 merge
New headers from 5.15 early merge.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-01 14:02:50 -07:00
Hangbin Liu 6d0d35bab9 ip/bond: add lacp active support
lacp_active specifies whether to send LACPDU frames periodically.
If set on, the LACPDU frames are sent along with the configured lacp_rate
setting. If set off, the LACPDU frames acts as "speak when spoken to".

v2: use strcmp instead of match for new options.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2021-09-01 12:51:44 -07:00
David Ahern 926ad64104 Update kernel headers
Update kernel headers to commit:
    88be32634905 ("Merge branch 'dsa-tagger-helpers'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
Ilya Dmitrichenko c730bd0b11 ip/tunnel: always print all known attributes
Presently, if a Geneve or VXLAN interface was created with 'external',
it's not possible for a user to determine e.g. the value of 'dstport'
after creation. This change fixes that by avoiding early returns.

This change partly reverts commit 00ff4b8e31 ("ip/tunnel: Be consistent
when printing tunnel collect metadata").

Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
Justin Iurman df8912ede2 ipioam6: use print_nl instead of print_null
This patch addresses Stephen's comment:

"""
> +        print_null(PRINT_ANY, "", "\n", NULL);

Use print_nl() since it handles the case of oneline output.
Plus in JSON the newline is meaningless.
"""

It also removes two useless print_null's.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
Peilin Ye 7e7270bb1f tc/skbmod: Introduce SKBMOD_F_ECN option
Recently we added SKBMOD_F_ECN option support to the kernel; support it in
the tc-skbmod(8) front end, and update its man page accordingly.

The 2 least significant bits of the Traffic Class field in IPv4 and IPv6
headers are used to represent different ECN states [1]:

	0b00: "Non ECN-Capable Transport", Non-ECT
	0b10: "ECN Capable Transport", ECT(0)
	0b01: "ECN Capable Transport", ECT(1)
	0b11: "Congestion Encountered", CE

This new option, "ecn", marks ECT(0) and ECT(1) IPv{4,6} packets as CE,
which is useful for ECN-based rate limiting.  For example:

	$ tc filter add dev eth0 parent 1: protocol ip prio 10 \
		u32 match ip protocol 1 0xff flowid 1:2 \
		action skbmod \
		ecn

The updated tc-skbmod SYNOPSIS looks like the following:

	tc ... action skbmod { set SETTABLE | swap SWAPPABLE | ecn } ...

Only one of "set", "swap" or "ecn" shall be used in a single tc-skbmod
command.  Trying to use more than one of them at a time is considered
undefined behavior; pipe multiple tc-skbmod commands together instead.
"set" and "swap" only affect Ethernet packets, while "ecn" only affects
IP packets.

Depends on kernel patch "net/sched: act_skbmod: Add SKBMOD_F_ECN option
support", as well as iproute2 patch "tc/skbmod: Remove misinformation
about the swap action".

[1] https://en.wikipedia.org/wiki/Explicit_Congestion_Notification

Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
Justin Iurman 86c596ed91 IOAM man8
This patch provides man8 documentation for IOAM inside ip, ip-ioam and ip-route.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
Justin Iurman 2d83c71082 New IOAM6 encap type for routes
This patch provides a new encap type for routes to insert an IOAM pre-allocated
trace:

$ ip -6 ro ad fc00::1/128 encap ioam6 trace prealloc type 0x800000 ns 1 size 12 dev eth0

where:
 - "trace" and "prealloc" may appear as useless but just anticipate for future
   implementations of other ioam option types.
 - "type" is a bitfield (=u32) defining the IOAM pre-allocated trace type (see
   the corresponding uapi).
 - "ns" is an IOAM namespace ID attached to the pre-allocated trace.
 - "size" is the trace pre-allocated size in bytes; must be a 4-octet multiple;
   limited size (see IOAM6_TRACE_DATA_SIZE_MAX).

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
Justin Iurman f0b3808afa Add, show, link, remove IOAM namespaces and schemas
This patch provides support for adding, listing and removing IOAM namespaces
and schemas with iproute2. When adding an IOAM namespace, both "data" (=u32)
and "wide" (=u64) are optional. Therefore, you can either have none, one of
them, or both at the same time. When adding an IOAM schema, there is no
restriction on "DATA" except its size (see IOAM6_MAX_SCHEMA_DATA_LEN). By
default, an IOAM namespace has no active IOAM schema (meaning an IOAM namespace
is not linked to an IOAM schema), and an IOAM schema is not considered
as "active" (meaning an IOAM schema is not linked to an IOAM namespace). It is
possible to link an IOAM namespace with an IOAM schema, thanks to the last
command below (meaning the IOAM schema will be considered as "active" for the
specific IOAM namespace).

$ ip ioam
Usage:	ip ioam { COMMAND | help }
	ip ioam namespace show
	ip ioam namespace add ID [ data DATA32 ] [ wide DATA64 ]
	ip ioam namespace del ID
	ip ioam schema show
	ip ioam schema add ID DATA
	ip ioam schema del ID
	ip ioam namespace set ID schema { ID | none }

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
David Ahern acbdef9386 Import ioam6 uapi headers
Import ioam6 uapi headers from kernel headers at last sync commit.

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
David Ahern 2d6fa30bb8 Update kernel headers
Update kernel headers to commit:
    1187c8c4642d ("net: phy: mscc: make some arrays static const, makes object smaller")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
Gokul Sivakumar 508ad89c82 ipneigh: add support to print brief output of neigh cache in tabular format
Make use of the already available brief flag and print the basic details of
the IPv4 or IPv6 neighbour cache in a tabular format for better readability
when the brief output is expected.

$ ip -br neigh
172.16.12.100                           bridge0          b0:fc:36:2f:07:43
172.16.12.174                           bridge0          8c:16:45:2f:bc:1c
172.16.12.250                           bridge0          04:d9:f5:c1:0c:74
fe80::267b:9f70:745e:d54d               bridge0          b0:fc:36:2f:07:43
fd16:a115:6a62:0:8744:efa1:9933:2c4c    bridge0          8c:16:45:2f:bc:1c
fe80::6d9:f5ff:fec1:c74                 bridge0          04:d9:f5:c1:0c:74

And add "ip neigh show" to the list of ip sub commands mentioned in the man
page that support the brief output in tabular format.

Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
David Ahern fb843668fb Merge branch 'bridge-vlan-global-mcast' into next
Nikolay Aleksandrov  says:

====================

This set adds support for vlan multicast options. The feature is
globally controlled by a new bridge option called mcast_vlan_snooping
which is added by patch 01. Then patches 2-5 add support for dumping
global vlan options and filtering on vlan id. Patch 06 adds support for
setting global vlan options and then patches 07-18 add all the new
global vlan options, finally patch 19 adds support for dumping vlan
multicast router ports. These options are identical in meaning, names and
functionality as the bridge-wide ones.

All the new vlan global commands are under the global keyword:
 $ bridge vlan global show [ vid VID dev DEVICE ]
 $ bridge vlan global set vid VID dev DEVICE ...

I've added command examples in each commit message. The patch-set is a
bit bigger but the global options follow the same pattern so I don't see
a point in breaking them. All man page descriptions have been taken from
the same current bridge-wide mcast options. The only additional iproute2
change which is left to do is the per-vlan mcast router control which
I'll send separately. Note to properly use this set you'll need the
updated kernel headers where mcast router was moved from a global option
to per-vlan/per-device one (changed uapi enum which was in net-next).

Example:
 # enable vlan mcast snooping globally
 $ ip link set dev bridge type bridge mcast_vlan_snooping 1
 # enable mcast querier on vlan 100
 $ bridge vlan global set dev bridge vid 100 mcast_querier 1
 # show vlan 100's global options
 $ bridge -s vlan global show vid 100
port              vlan-id
bridge            100
                    mcast_snooping 1 mcast_querier 1 mcast_igmp_version 2 mcast_mld_version 1 mcast_last_member_count 2 mcast_last_member_interval 100 mcast_startup_query_count 2 mcast_startup_query_interval 3125 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query_response_interval 1000

A following kernel patch-set will add selftests which use these commands.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:32:31 -06:00
Nikolay Aleksandrov 72222cd467 bridge: vlan: add support for dumping router ports
Add dump support for vlan multicast router ports and their details if
requested. If details are requested we print 1 entry per line, otherwise
we print all router ports on a single line similar to how mdb prints
them.

Looks like:
$ bridge vlan global show vid 100
 port              vlan-id
 bridge            100
                     mcast_snooping 1 mcast_querier 0 mcast_igmp_version 2 mcast_mld_version 1 mcast_last_member_count 2 mcast_last_member_interval 100 mcast_startup_query_count 2 mcast_startup_query_interval 3125 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query_response_interval 1000
                     router ports: ens20 ens16

Looks like (with -s):
 $ bridge -s vlan global show vid 100
 port              vlan-id
 bridge            100
                     mcast_snooping 1 mcast_querier 0 mcast_igmp_version 2 mcast_mld_version 1 mcast_last_member_count 2 mcast_last_member_interval 100 mcast_startup_query_count 2 mcast_startup_query_interval 3125 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query_response_interval 1000
                     router ports: ens20   187.57 temp
                                   ens16   118.27 temp

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:29:31 -06:00
Nikolay Aleksandrov 7ad5505bb5 bridge: vlan: add global mcast_querier option
Add control and dump support for the global mcast_querier option which
controls if the bridge will act as a multicast querier for that vlan.
Syntax: $ bridge vlan global set dev bridge vid 1 mcast_querier 1

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:29:26 -06:00
Nikolay Aleksandrov 061da2e222 bridge: vlan: add global mcast_startup_query_interval option
Add control and dump support for the global mcast_startup_query_interval
option which controls the interval between queries in the startup phase.
To be consistent with the same bridge-wide option the value is reported
with USER_HZ granularity and the same granularity is expected when setting
it.
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_startup_query_interval 15000

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:29:12 -06:00
Nikolay Aleksandrov 60dcd5c318 bridge: vlan: add global mcast_query_response_interval option
Add control and dump support for the global mcast_query_response_interval
option which sets the Max Response Time/Maximum Response Delay for IGMP/MLD
queries sent by the bridge. To be consistent with the same bridge-wide
option the value is reported with USER_HZ granularity and the same
granularity is expected when setting it.
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_query_response_interval 13000

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:28:47 -06:00
Nikolay Aleksandrov 0e4cfa0370 bridge: vlan: add global mcast_query_interval option
Add control and dump support for the global mcast_query_interval
option which controls the interval between queries sent by the bridge
after the end of the startup phase. To be consistent with the same
bridge-wide option the value is reported with USER_HZ granularity and
the same granularity is expected when setting it.
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_query_interval 13000

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:28:29 -06:00
Nikolay Aleksandrov ebcee09ca1 bridge: vlan: add global mcast_querier_interval option
Add control and dump support for the global mcast_querier_interval
option which controls the interval after which if no other router
queries are seen the bridge will start sending its own queries.
To be consistent with the same bridge-wide option the value is reported
with USER_HZ granularity and the same granularity is expected when
setting it.
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_querier_interval 13000

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:28:12 -06:00
Nikolay Aleksandrov 3ae784f589 bridge: vlan: add global mcast_membership_interval option
Add control and dump support for the global mcast_membership_interval
option which controls the interval after which the bridge will leave a
group if no reports have been received for it. To be consistent with the
same bridge-wide option the value is reported with USER_HZ granularity and
the same granularity is expected when setting it.
The default is 26000 (260 seconds).
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_membership_interval 13000

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:27:56 -06:00
Nikolay Aleksandrov 2b6cc38d52 bridge: vlan: add global mcast_last_member_interval option
Add control and dump support for the global mcast_last_member_interval
option which controls the interval between queries to find remaining
members of a group after a leave message. To be consistent with the same
bridge-wide option the value is reported with USER_HZ granularity and
the same granularity is expected when setting it.
The default is 100 (1 second).
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_last_member_interval 200

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:27:45 -06:00
Nikolay Aleksandrov 7cc7dbf447 bridge: vlan: add global mcast_startup_query_count option
Add control and dump support for the global mcast_startup_query_count
option which controls the number of queries the bridge will send on the
vlan during startup phase (default 2).
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_startup_query_count 5

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:27:28 -06:00
Nikolay Aleksandrov 3399c0759f bridge: vlan: add global mcast_last_member_count option
Add control and dump support for the global mcast_last_member_count option
which controls the number of queries the bridge will send on the vlan after
a leave is received (default 2).
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_last_member_count 10

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:26:07 -06:00
Nikolay Aleksandrov a8d7212a4f bridge: vlan: add global mcast_mld_version option
Add control and dump support for the global mcast_mld_version option
which controls the MLD version on the vlan (default 1).
Syntax: $ bridge vlan global set dev bridge vid 1 mcast_mld_version 2

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:25:17 -06:00
Nikolay Aleksandrov 29fada0f41 bridge: vlan: add global mcast_igmp_version option
Add control and dump support for the global mcast_igmp_version option
which controls the IGMP version on the vlan (default 2).
Syntax: $ bridge vlan global set dev bridge vid 1 mcast_igmp_version 3

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:24:09 -06:00
Nikolay Aleksandrov 1f608d590c bridge: vlan: add global mcast_snooping option
Add control and dump support for the global mcast_snooping option which
controls if multicast snooping is enabled or disabled for a single vlan.
Syntax: $ bridge vlan global set dev bridge vid 1 mcast_snooping 1

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:23:26 -06:00
Nikolay Aleksandrov dee5eb05e5 bridge: vlan: add support to set global vlan options
Add support to change global vlan options via a new vlan global
set subcommand similar to the current vlan set subcommand. The man page
and help are updated accordingly. The command works only with bridge
devices. It doesn't support any options yet.

Syntax: $ bridge vlan global set vid VID dev DEV

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:21:13 -06:00
Nikolay Aleksandrov ecf6d8b4a1 bridge: vlan: add support for vlan filtering when dumping options
In order to allow vlan filtering when dumping options we need to move
all print operations into the option dumping functions and add the
filtering after we've parsed the nested attributes so we can extract the
start and end vlan ids.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:21:09 -06:00
Nikolay Aleksandrov 720f8613bd bridge: vlan: add support to show global vlan options
Add support for new bridge vlan command grouping called global which
operates on global options. The first command it supports is "show".
To do that we update print_vlan_rtm to recognize the global vlan options
attribute and parse it properly.
Man page and help are also updated with the new command.

Syntax is: $ bridge vlan global show [ vid VID ] [ dev DEV ]

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:21:04 -06:00
Nikolay Aleksandrov d3a961a9b1 bridge: vlan: skip unknown attributes when printing options
Skip unknown attributes when printing vlan options in print_vlan_rtm.
Make sure print_vlan_opts doesn't accept attributes it doesn't understand.
Currently we print only one type, later global vlan options support will
be added.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:21:00 -06:00
Nikolay Aleksandrov 312e22fe79 bridge: vlan: factor out vlan option printing
Factor out the code which prints current per-vlan options from
print_vlan_rtm without any changes, later we'll filter based on the vlan
attribute and add support for global vlan option printing.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:20:53 -06:00
Nikolay Aleksandrov d2eecb9d1d ip: bridge: add support for mcast_vlan_snooping
Add support for mcast_vlan_snooping option which controls per-vlan
multicast snooping, also update the man page.
Syntax: $ ip link set dev bridge type bridge mcast_vlan_snooping 0/1

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:20:03 -06:00
Stephen Hemminger 169f36a0c9 v5.14.0 2021-08-31 11:57:59 -07:00
Jakub Kicinski 85b0e73c77 ss: fix fallback to procfs for raw sockets
Jonas reports that ss -awp does not display any RAW sockets
on a Knoppix 4.4 kernel.

sockdiag_send() diverts to tcpdiag_send() to try the older
netlink interface. tcpdiag_send() works for TCP and DCCP
but not other protocols. Instead of rejecting unsupported
protocols (and missing RAW and SCTP) match on supported ones.

Link: https://lore.kernel.org/netdev/20210815231738.7b42bad4@mmluhan/
Reported-and-tested-by: Jonas Bechtel <post@jbechtel.de>
Fixes: 41fe6c34de ("ss: Add inet raw sockets information gathering via netlink diag interface")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-18 15:03:46 -07:00
Stephen Hemminger 1afde09498 uapi: update neighbour.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-18 14:09:34 -07:00
Gokul Sivakumar 10ecd12690 man: bridge: fix the typo to change "-c[lor]" into "-c[olor]" in man page
Fixes: 3a1ca9a5b ("bridge: update man page for new color and json changes")
Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-18 14:04:53 -07:00
Gokul Sivakumar 057d3c6d37 bridge: fdb: don't colorize the "dev" & "dst" keywords in "bridge -c fdb"
To be consistent with the colorized output of "ip" command and to increase
readability, stop highlighting the "dev" & "dst" keywords in the colorized
output of "bridge -c fdb" cmd.

Example: in the following "bridge -c fdb" entry, only "00:00:00:00:00:00",
"vxlan100" and "2001:db8:2::1" fields should be highlighted in color.

00:00:00:00:00:00 dev vxlan100 dst 2001:db8:2::1 self permanent

Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-18 14:04:53 -07:00
Gokul Sivakumar 82149efee9 bridge: reorder cmd line arg parsing to let "-c" detected as "color" option
As per the man/man8/bridge.8 page, the shorthand cmd line arg "-c" can be
used to colorize the bridge cmd output. But while parsing the args in while
loop, matches() detects "-c" as "-compressedvlans" instead of "-color", so
fix this by doing the check for "-color" option first before checking for
"-compressedvlans".

Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-18 14:04:53 -07:00
Hangbin Liu 3a09567f7d ip/bond: add arp_validate filter support
Add arp_validate filter support based on kernel commit 896149ff1b2c
("bonding: extend arp_validate to be able to receive unvalidated arp-only traffic")

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-18 14:02:44 -07:00
Parav Pandit 355c49ffa5 devlink: Show port state values in man page and in the help command
Port function state can have either of the two values - active or
inactive. Update the documentation and help command for these two
values to tell user about it.

With the introduction of state, hw_addr and state are optional.
Hence mark them as optional in man page that also aligns with the help
command output.

Fixes: bdfb9f1bd6 ("devlink: Support set of port function state")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-11 15:02:30 -07:00
Hangbin Liu ebaa603b30 ip/bond: add lacp active support
lacp_active specifies whether to send LACPDU frames periodically.
If set on, the LACPDU frames are sent along with the configured lacp_rate
setting. If set off, the LACPDU frames acts as "speak when spoken to".

v2: use strcmp instead of match for new options.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2021-08-11 12:26:20 -06:00
David Ahern 8d6134b204 Update kernel headers
Update kernel headers to commit:
    88be32634905 ("Merge branch 'dsa-tagger-helpers'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-11 12:23:33 -06:00
Ilya Dmitrichenko 51d8fc708c ip/tunnel: always print all known attributes
Presently, if a Geneve or VXLAN interface was created with 'external',
it's not possible for a user to determine e.g. the value of 'dstport'
after creation. This change fixes that by avoiding early returns.

This change partly reverts commit 00ff4b8e31 ("ip/tunnel: Be consistent
when printing tunnel collect metadata").

Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-11 12:17:52 -06:00
Justin Iurman 71ba9c18e0 ipioam6: use print_nl instead of print_null
This patch addresses Stephen's comment:

"""
> +        print_null(PRINT_ANY, "", "\n", NULL);

Use print_nl() since it handles the case of oneline output.
Plus in JSON the newline is meaningless.
"""

It also removes two useless print_null's.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-11 12:16:09 -06:00
Phil Sutter 9b7ea92b9e tc: u32: Fix key folding in sample option
In between Linux kernel 2.4 and 2.6, key folding for hash tables changed
in kernel space. When iproute2 dropped support for the older algorithm,
the wrong code was removed and kernel 2.4 folding method remained in
place. To get things functional for recent kernels again, restoring the
old code alone was not sufficient - additional byteorder fixes were
needed.

While being at it, make use of ffs() and thereby align the code with how
kernel determines the shift width.

Fixes: 267480f553 ("Backout the 2.4 utsname hash patch.")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-10 20:02:43 -07:00
Andrea Claudi d1eacf12b5 lib: bpf_glue: remove useless assignment
The value of s used inside the cycle is the result of strstr(), so this
assignment is useless.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-10 20:01:54 -07:00
Andrea Claudi 50a4127022 lib: bpf_legacy: fix potential NULL-pointer dereference
If bpf_map_fetch_name() returns NULL, strlen() hits a NULL-pointer
dereference on outer_map_name.

Fix this checking outer_map_name value, and returning false when NULL,
as already done for inner_map_name before.

Fixes: 6d61a2b557 ("lib: add libbpf support")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-10 19:55:12 -07:00
Jacob Keller 954a0077c8 devlink: fix infinite loop on flash update for drivers without status
When processing device flash update, cmd_dev_flash function waits until
the flash process has completed. This requires the following two
conditions to both be true:

a) we've received an exit status from the child process
b) we've received the DEVLINK_CMD_FLASH_UPDATE_END *or*
   we haven't received any status notifications from the driver.

The original devlink flash status monitoring code in 9b13cddfe2
("devlink: implement flash status monitoring") was written assuming that
a driver will either send no status updates, or it will send at least
one DEVLINK_CMD_FLASH_UPDATE_STATUS before DEVLINK_CMD_FLASH_UPDATE_END.

Newer versions of the kernel since commit 52cc5f3a166a ("devlink: move flash
end and begin to core devlink") in v5.10 moved handling of the
DEVLINK_CMD_FLASH_UPDATE_END into the core stack, and will send this
regardless of whether or not the driver sends any of its own status
notifications.

The handling of DEVLINK_CMD_FLASH_UPDATE_END in cmd_dev_flash_status_cb
has an additional condition that it must not be the first message.
Otherwise, it falls back to treating it like
a DEVLINK_CMD_FLASH_UPDATE_STATUS.

This is wrong because it can lead to an infinite loop if a driver does
not send any status updates.

In this case, the kernel will send DEVLINK_CMD_FLASH_UPDATE_END without
any DEVLINK_CMD_FLASH_UPDATE_STATUS. The devlink application will see
that ctx->not_first is false, and will treat this like any other status
message. Thus, ctx->not_first will be set to 1.

The loop condition to exit flash update will thus never be true, since
we will wait forever, because ctx->not_first is true, and
ctx->received_end is false.

This leads to the application appearing to process the flash update, but
it will never exit.

Fix this by simply always treating DEVLINK_CMD_FLASH_UPDATE_END the same
regardless of whether its the first message or not.

This is obviously the correct thing to do: once we've received the
DEVLINK_CMD_FLASH_UPDATE_END the flash update must be finished. For new
kernels this is always true, because we send this message in the core
stack after the driver flash update routine finishes.

For older kernels, some drivers may not have sent any
DEVLINK_CMD_FLASH_UPDATE_STATUS or DEVLINK_CMD_FLASH_UPDATE_END. This is
handled by the while loop conditional that exits if we get a return
value from the child process without having received any status
notifications.

An argument could be made that we should exit immediately when we get
either the DEVLINK_CMD_FLASH_UPDATE_END or an exit code from the child
process. However, at a minimum it makes no sense to ever process
DEVLINK_CMD_FLASH_UPDATE_END as if it were a DEVLINK_CMD_FLASH_UPDATE_STATUS.

This is easy to test as it is triggered by the selftests for the
netdevsim driver, which has a test case for both with and without status
notifications.

Fixes: 9b13cddfe2 ("devlink: implement flash status monitoring")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-10 19:54:39 -07:00
Feng Zhou be99929d60 lib/bpf: Fix btf_load error lead to enable debug log
Use tc with no verbose, when bpf_btf_attach fail,
the conditions:
"if (fd < 0 && (errno == ENOSPC || !ctx->log_size))"
will make ctx->log_size != 0. And then, bpf_prog_attach,
ctx->log_size != 0. so enable debug log.
The verifier log sometimes is so chatty on larger programs.
bpf_prog_attach is failed.
"Log buffer too small to dump verifier log 16777215 bytes (9 tries)!"

BTF load failure does not affect prog load. prog still work.
So when BTF/PROG load fail, enlarge log_size and re-fail with
having verbose.

Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-10 19:53:54 -07:00
Peilin Ye e78411948d tc/skbmod: Introduce SKBMOD_F_ECN option
Recently we added SKBMOD_F_ECN option support to the kernel; support it in
the tc-skbmod(8) front end, and update its man page accordingly.

The 2 least significant bits of the Traffic Class field in IPv4 and IPv6
headers are used to represent different ECN states [1]:

	0b00: "Non ECN-Capable Transport", Non-ECT
	0b10: "ECN Capable Transport", ECT(0)
	0b01: "ECN Capable Transport", ECT(1)
	0b11: "Congestion Encountered", CE

This new option, "ecn", marks ECT(0) and ECT(1) IPv{4,6} packets as CE,
which is useful for ECN-based rate limiting.  For example:

	$ tc filter add dev eth0 parent 1: protocol ip prio 10 \
		u32 match ip protocol 1 0xff flowid 1:2 \
		action skbmod \
		ecn

The updated tc-skbmod SYNOPSIS looks like the following:

	tc ... action skbmod { set SETTABLE | swap SWAPPABLE | ecn } ...

Only one of "set", "swap" or "ecn" shall be used in a single tc-skbmod
command.  Trying to use more than one of them at a time is considered
undefined behavior; pipe multiple tc-skbmod commands together instead.
"set" and "swap" only affect Ethernet packets, while "ecn" only affects
IP packets.

Depends on kernel patch "net/sched: act_skbmod: Add SKBMOD_F_ECN option
support", as well as iproute2 patch "tc/skbmod: Remove misinformation
about the swap action".

[1] https://en.wikipedia.org/wiki/Explicit_Congestion_Notification

Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-08 11:56:55 -06:00
David Ahern 09d8ce3db1 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-04 09:24:12 -06:00
David Ahern e8763fc9ab Merge branch 'ipv6-oam' into next
Justin Iurman says:

====================

The IOAM patchset was merged recently (see net-next commits [1,2,3,4,5,6]).
Therefore, this patchset provides support for IOAM inside iproute2, as well as
manpage documentation. Here is a summary of added features inside iproute2.

(1) configure IOAM namespaces and schemas:

$ ip ioam
Usage:  ip ioam { COMMAND | help }
        ip ioam namespace show
        ip ioam namespace add ID [ data DATA32 ] [ wide DATA64 ]
        ip ioam namespace del ID
        ip ioam schema show
        ip ioam schema add ID DATA
        ip ioam schema del ID
        ip ioam namespace set ID schema { ID | none }

(2) provide a new encap type to insert the IOAM pre-allocated trace:

$ ip -6 ro ad fc00::1/128 encap ioam6 trace prealloc type 0x800000 ns 1 size 12 dev eth0

  [1] db67f219fc9365a0c456666ed7c134d43ab0be8a
  [2] 9ee11f0fff205b4b3df9750bff5e94f97c71b6a0
  [3] 8c6f6fa6772696be0c047a711858084b38763728
  [4] 3edede08ff37c6a9370510508d5eeb54890baf47
  [5] de8e80a54c96d2b75377e0e5319a64d32c88c690
  [6] 968691c777af78d2daa2ee87cfaeeae825255a58

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-02 11:34:09 -06:00
Justin Iurman 78832863ef IOAM man8
This patch provides man8 documentation for IOAM inside ip, ip-ioam and ip-route.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-02 11:33:35 -06:00
Justin Iurman 32f4969d44 New IOAM6 encap type for routes
This patch provides a new encap type for routes to insert an IOAM pre-allocated
trace:

$ ip -6 ro ad fc00::1/128 encap ioam6 trace prealloc type 0x800000 ns 1 size 12 dev eth0

where:
 - "trace" and "prealloc" may appear as useless but just anticipate for future
   implementations of other ioam option types.
 - "type" is a bitfield (=u32) defining the IOAM pre-allocated trace type (see
   the corresponding uapi).
 - "ns" is an IOAM namespace ID attached to the pre-allocated trace.
 - "size" is the trace pre-allocated size in bytes; must be a 4-octet multiple;
   limited size (see IOAM6_TRACE_DATA_SIZE_MAX).

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-02 11:33:31 -06:00
Justin Iurman 2909812583 Add, show, link, remove IOAM namespaces and schemas
This patch provides support for adding, listing and removing IOAM namespaces
and schemas with iproute2. When adding an IOAM namespace, both "data" (=u32)
and "wide" (=u64) are optional. Therefore, you can either have none, one of
them, or both at the same time. When adding an IOAM schema, there is no
restriction on "DATA" except its size (see IOAM6_MAX_SCHEMA_DATA_LEN). By
default, an IOAM namespace has no active IOAM schema (meaning an IOAM namespace
is not linked to an IOAM schema), and an IOAM schema is not considered
as "active" (meaning an IOAM schema is not linked to an IOAM namespace). It is
possible to link an IOAM namespace with an IOAM schema, thanks to the last
command below (meaning the IOAM schema will be considered as "active" for the
specific IOAM namespace).

$ ip ioam
Usage:	ip ioam { COMMAND | help }
	ip ioam namespace show
	ip ioam namespace add ID [ data DATA32 ] [ wide DATA64 ]
	ip ioam namespace del ID
	ip ioam schema show
	ip ioam schema add ID DATA
	ip ioam schema del ID
	ip ioam namespace set ID schema { ID | none }

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-02 11:33:05 -06:00
David Ahern e53f4cd504 Import ioam6 uapi headers
Import ioam6 uapi headers from kernel headers at last sync commit.

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-02 11:32:26 -06:00
David Ahern 236696e52c Update kernel headers
Update kernel headers to commit:
    1187c8c4642d ("net: phy: mscc: make some arrays static const, makes object smaller")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-02 10:25:09 -06:00
Gokul Sivakumar cf866f0a5a ipneigh: add support to print brief output of neigh cache in tabular format
Make use of the already available brief flag and print the basic details of
the IPv4 or IPv6 neighbour cache in a tabular format for better readability
when the brief output is expected.

$ ip -br neigh
172.16.12.100                           bridge0          b0:fc:36:2f:07:43
172.16.12.174                           bridge0          8c:16:45:2f:bc:1c
172.16.12.250                           bridge0          04:d9:f5:c1:0c:74
fe80::267b:9f70:745e:d54d               bridge0          b0:fc:36:2f:07:43
fd16:a115:6a62:0:8744:efa1:9933:2c4c    bridge0          8c:16:45:2f:bc:1c
fe80::6d9:f5ff:fec1:c74                 bridge0          04:d9:f5:c1:0c:74

And add "ip neigh show" to the list of ip sub commands mentioned in the man
page that support the brief output in tabular format.

Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-02 10:14:50 -06:00
Peilin Ye c06d313d86 tc/skbmod: Remove misinformation about the swap action
Currently man 8 tc-skbmod says that "...the swap action will occur after
any smac/dmac substitutions are executed, if they are present."

This is false.  In fact, trying to "set" and "swap" in a single skbmod
command causes the "set" part to be completely ignored.  As an example:

	$ tc filter add dev eth0 parent 1: protocol ip prio 10 \
		matchall action skbmod \
        	set dmac AA:AA:AA:AA:AA:AA smac BB:BB:BB:BB:BB:BB \
        	swap mac

The above command simply does a "swap", without setting DMAC or SMAC to
AA's or BB's.  The root cause of this is in the kernel, see
net/sched/act_skbmod.c:tcf_skbmod_init():

	parm = nla_data(tb[TCA_SKBMOD_PARMS]);
	index = parm->index;
	if (parm->flags & SKBMOD_F_SWAPMAC)
		lflags = SKBMOD_F_SWAPMAC;
		^^^^^^^^^^^^^^^^^^^^^^^^^^

Doing a "=" instead of "|=" clears all other "set" flags when doing a
"swap".  Discourage using "set" and "swap" in the same command by
documenting it as undefined behavior, and update the "SYNOPSIS" section
as well as tc -help text accordingly.

If one really needs to e.g. "set" DMAC to all AA's then "swap" DMAC and
SMAC, one should do two separate commands and "pipe" them together.

Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-22 15:14:29 -07:00
Roi Dayan 71d36000dc police: Fix normal output back to what it was
With the json support fix the normal output was
changed. set it back to what it was.
Print overhead with print_size().
Print newline before ref.

Fixes: 0d5cf51e0d ("police: Add support for json output")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-17 11:14:30 -07:00
Lahav Schlesinger f760bff328 ipmonitor: Fix recvmsg with ancillary data
A successful call to recvmsg() causes msg.msg_controllen to contain the length
of the received ancillary data. However, the current code in the 'ip' utility
doesn't reset this value after each recvmsg().

This means that if a call to recvmsg() doesn't have ancillary data, then
'msg.msg_controllen' will be set to 0, causing future recvmsg() which do
contain ancillary data to get MSG_CTRUNC set in msg.msg_flags.

This fixes 'ip monitor' running with the all-nsid option - With this option the
kernel passes the nsid as ancillary data. If while 'ip monitor' is running an
even on the current netns is received, then no ancillary data will be sent,
causing 'msg.msg_controllen' to be set to 0, which causes 'ip monitor' to
indefinitely print "[nsid current]" instead of the real nsid.

Fixes: 449b824ad1 ("ipmonitor: allows to monitor in several netns")
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-17 11:13:36 -07:00
Stephen Hemminger 7a7e9ed98f uapi: headers update
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-17 11:12:47 -07:00
Christian Schürmann 1f2c908d53 man8/ip-tunnel.8: fix typo, 'encaplim' is not a valid option
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-15 09:31:51 -07:00
Alexander Mikhalitsyn 115e987035 libnetlink: check error handler is present before a call
Fix nullptr dereference of errhndlr from rtnl_dump_filter_arg
struct in rtnl_dump_done and rtnl_dump_error functions.

Fixes: 459ce6e3d7 ("ip route: ignore ENOENT during save if RT_TABLE_MAIN is being dumped")
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Roi Dayan <roid@nvidia.com>
Cc: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Reported-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-11 10:33:44 -07:00
Stephen Hemminger 0015ada629 libnetlink: cosmetic changes
Don't initialize arguments that are NULL, and format initialization
in a more logical way.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-07 07:39:07 -07:00
Alexander Mikhalitsyn 459ce6e3d7 ip route: ignore ENOENT during save if RT_TABLE_MAIN is being dumped
We started to use in-kernel filtering feature which allows to get only
needed tables (see iproute_dump_filter()). From the kernel side it's
implemented in net/ipv4/fib_frontend.c (inet_dump_fib), net/ipv6/ip6_fib.c
(inet6_dump_fib). The problem here is that behaviour of "ip route save"
was changed after
c7e6371bc ("ip route: Add protocol, table id and device to dump request").
If filters are used, then kernel returns ENOENT error if requested table
is absent, but in newly created net namespace even RT_TABLE_MAIN table
doesn't exist. It is really allocated, for instance, after issuing
"ip l set lo up".

Reproducer is fairly simple:
$ unshare -n ip route save > dump
Error: ipv4: FIB table does not exist.
Dump terminated

Expected result here is to get empty dump file (as it was before this
change).

v2: reworked, so, now it takes into account NLMSGERR_ATTR_MSG
(see nl_dump_ext_ack_done() function). We want to suppress error messages
in stderr about absent FIB table from kernel too.

v3: reworked to make code clearer. Introduced rtnl_suppressed_errors(),
rtnl_suppress_error() helpers. User may suppress up to 3 errors (may be
easily extended by changing SUPPRESS_ERRORS_INIT macro).

v4: reworked, rtnl_dump_filter_errhndlr() was introduced. Thanks
to Stephen Hemminger for comments and suggestions

v5: space fixes, commit message reformat, empty initializers

Fixes: c7e6371bc ("ip route: Add protocol, table id and device to dump request")
Cc: David Ahern <dsahern@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-07 07:32:56 -07:00
Stephen Hemminger 8f85d085fe uapi: update kernel headers from 5.14-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-06 17:07:24 -07:00
Martynas Pumputis 83d4d61bc9 libbpf: fix attach of prog with multiple sections
When BPF programs which consists of multiple executable sections via
iproute2+libbpf (configured with LIBBPF_FORCE=on), we noticed that a
wrong section can be attached to a device. E.g.:

    # tc qdisc replace dev lxc_health clsact
    # tc filter replace dev lxc_health ingress prio 1 \
        handle 1 bpf da obj bpf_lxc.o sec from-container
    # tc filter show dev lxc_health ingress filter protocol all
        pref 1 bpf chain 0 filter protocol all pref 1 bpf chain 0
        handle 0x1 bpf_lxc.o:[__send_drop_notify] <-- WRONG SECTION
        direct-action not_in_hw id 38 tag 7d891814eda6809e jited

After taking a closer look into load_bpf_object() in lib/bpf_libbpf.c,
we noticed that the filter used in the program iterator does not check
whether a program section name matches a requested section name
(cfg->section). This can lead to a wrong prog FD being used to attach
the program.

Fixes: 6d61a2b557 ("lib: add libbpf support")
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Acked-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-06 16:59:39 -07:00
David Ahern 02c06ffc13 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-07-01 14:29:42 +00:00
Stephen Hemminger fc3511962d lib: remove blank line at eof
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-29 13:20:44 -07:00
Stephen Hemminger 0e7ea3e8fe v5.13.0 2021-06-29 11:24:17 -07:00
Ben Hutchings 33cf9306c8 devlink: Fix printf() type mismatches on 32-bit architectures
devlink currently uses "%lu" to format values of type uint64_t,
but on 32-bit architectures uint64_t is defined as unsigned
long long and this does not work correctly.

Fix this by using the standard macro PRIu64 instead.

Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-29 11:10:14 -07:00
Ben Hutchings 4ac0383a59 utils: Fix BIT() to support up to 64 bits on all architectures
devlink and vdpa use BIT() together with 64-bit flag fields.  devlink
is already using bit numbers greater than 31 and so does not work
correctly on 32-bit architectures.

Fix this by making BIT() use uint64_t instead of unsigned long.

Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-29 11:10:14 -07:00
Stephen Hemminger c73fb66070 uapi: update headers to 5.13
Final 5.13 header update

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-28 10:19:08 -07:00
Roi Dayan 6f15c21719 devlink: Fix link errors on some systems
On some systems we fail to link because of missing math lib.
add -lm to devlink.

    LINK     devlink
../lib/libutil.a(utils_math.o): In function `get_rate':
utils_math.c:(.text+0xcc): undefined reference to `floor'
../lib/libutil.a(utils_math.o): In function `get_size':
utils_math.c:(.text+0x384): undefined reference to `floor'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:16: devlink] Error 1
make: *** [Makefile:64: all] Error 2

Fixes: 6c70aca76e ("devlink: Add port func rate support")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-26 14:57:27 -07:00
Asbjørn Sloth Tønnesen 2ff4761db4 tc: pedit: add decrement operation
Implement a decrement operation for ttl and hoplimit.

Since this is just syntactic sugar, it goes that:

  tc filter add ... action pedit ex munge ip ttl dec ...
  tc filter add ... action pedit ex munge ip6 hoplimit dec ...

is just a more readable version of this:

  tc filter add ... action pedit ex munge ip ttl add 0xff ...
  tc filter add ... action pedit ex munge ip6 hoplimit add 0xff ...

This feature was suggested by some pseudo tc examples in Mellanox's
documentation[1], but wasn't present in neither their mlnx-iproute2
nor iproute2.

Tested with skip_sw on Mellanox ConnectX-6 Dx.

[1] https://docs.mellanox.com/pages/viewpage.action?pageId=47033989

v3:
   - Use dedicated flags argument in parse_cmd() (David Ahern)
   - Minor rewording of the man page

v2:
   - Fix whitespace issue (Stephen Hemminger)
   - Add to usage info in explain()

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-26 04:45:19 +00:00
Asbjørn Sloth Tønnesen bc5e8473aa tc: pedit: parse_cmd: add flags argument
This patch just prepares the flags argument, so it's
available to the next patch.

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-26 04:44:35 +00:00
Sergey Ryazanov 6acccd52a2 iplink: support for WWAN devices
The WWAN subsystem has been extended to generalize the per data channel
network interfaces management. This change implements support for WWAN
links handling. And actively uses the earlier introduced ip-link
capability to specify the parent by its device name.

The WWAN interface for a new data channel should be created with a
command like this:

ip link add dev wwan0-2 parentdev wwan0 type wwan linkid 2

Where: wwan0 is the modem HW device name (should be taken from
/sys/class/wwan) and linkid is an identifier of the opened data
channel.

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-26 04:40:57 +00:00
Sergey Ryazanov 362da458a4 iplink: add support for parent device
Add support for specifying a parent device (struct device) by its name
during the link creation and printing parent name in the links list.
This option will be used to create WWAN links and possibly by other
device classes that do not have a "natural parent netdev".

Add the parent device bus name printing for links list info
completeness. But do not add a corresponding command line argument, as
we do not have a use case for this attribute.

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-26 04:40:22 +00:00
David Ahern 083e2706e1 Import wwan.h uapi file
Import wwan.h uapi file at version from last kernel headers sync.

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-26 04:39:47 +00:00
Stephen Hemminger 8316825a52 man: fix syntax for ip link property
The ip link property add/delete requires a device; but the
device argument was not show on the man page.
It is correct in the usage message.

Fixes: 3aa0e51be6 ("ip: add support for alternative name addition/deletion/list")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-24 11:54:04 -07:00
Paolo Lungaroni 3e26254f31 seg6: add support for SRv6 End.DT46 Behavior
We introduce the new "End.DT46" action for supporting the SRv6 End.DT46
Behavior in iproute2.
The SRv6 End.DT46 Behavior, defined in RFC 8986 [1] section 4.8, can be
used to implement L3 VPNs based on Segment Routing over IPv6 networks in
multi-tenants environments and it is capable of handling both IPv4 and
IPv6 tenant traffic at the same time.
The SRv6 End.DT46 Behavior decapsulates the received packets and it
performs the IPv4 or IPv6 routing lookup in the routing table of the
tenant.

As for the End.DT4 and for the End.DT6 in VRF mode, the SRv6 End.DT46
Behavior leverages a VRF device in order to force the routing lookup into
the associated routing table using the "vrftable" attribute.

To make the End.DT46 work properly, it must be guaranteed that the
routing table used for routing lookup operations is bound to one and
only one VRF during the tunnel creation. Such constraint has to be
enforced by enabling the VRF strict_mode sysctl parameter, i.e.:

 $ sysctl -wq net.vrf.strict_mode=1

Note that the same approach is used for the End.DT4 Behavior and for the
End.DT6 Behavior in VRF mode.

An SRv6 End.DT46 Behavior instance can be created as follows:

 $ ip -6 route add 2001:db8::1 encap seg6local action End.DT46 vrftable 100 dev vrf100

Standard Output:
 $ ip -6 route show 2001:db8::1
 2001:db8::1  encap seg6local action End.DT46 vrftable 100 dev vrf100 metric 1024 pref medium

JSON Output:
$ ip -6 -j -p route show 2001:db8::1
[ {
        "dst": "2001:db8::1",
        "encap": "seg6local",
        "action": "End.DT46",
        "vrftable": 100,
        "dev": "vrf100",
        "metric": 1024,
        "flags": [ ],
        "pref": "medium"
} ]

This patch updates the route.8 man page and the ip route help with the
information related to End.DT46.
Considering that the same information was missing for the SRv6 End.DT4 and
the End.DT6 Behaviors, we have also added it.

[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-enddt46-decapsulation-and-s

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Paolo Lungaroni <paolo.lungaroni@uniroma2.it>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-22 15:36:17 +00:00
David Ahern 1d11326a57 Update kernel headers
Update kernel headers to commit:
    ef2c3ddaa4ed ("ibmvnic: Use strscpy() instead of strncpy()")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-22 15:33:45 +00:00
Guillaume Nault f8879e85f0 utils: bump max args number to 512 for batch files
Large tc filters can have many arguments. For example the following
filter matches the first 7 MPLS LSEs, pops all of them, then updates
the Ethernet header and redirects the resulting packet to eth1.

filter add dev eth0 ingress handle 44 priority 100 \
  protocol mpls_uc flower mpls                     \
    lse depth 1 label 1040076 tc 4 bos 0 ttl 175   \
    lse depth 2 label 89648 tc 2 bos 0 ttl 9       \
    lse depth 3 label 63417 tc 5 bos 0 ttl 185     \
    lse depth 4 label 593135 tc 5 bos 0 ttl 67     \
    lse depth 5 label 857021 tc 0 bos 0 ttl 181    \
    lse depth 6 label 239239 tc 1 bos 0 ttl 254    \
    lse depth 7 label 30 tc 7 bos 1 ttl 237        \
  action mpls pop protocol mpls_uc pipe            \
  action mpls pop protocol mpls_uc pipe            \
  action mpls pop protocol mpls_uc pipe            \
  action mpls pop protocol mpls_uc pipe            \
  action mpls pop protocol mpls_uc pipe            \
  action mpls pop protocol mpls_uc pipe            \
  action mpls pop protocol ipv6 pipe               \
  action vlan pop_eth pipe                         \
  action vlan push_eth                             \
    dst_mac 00:00:5e:00:53:7e                      \
    src_mac 00:00:5e:00:53:03 pipe                 \
  action mirred egress redirect dev eth1

This filter has 149 arguments, so it can't be used with tc -batch
which is limited to a 100.

Let's bump the limit to 512. That should leave a lot of room for big
batch commands.

v2:
   -Define the limit in utils.h (Stephen Hemminger)
   -Bump the limit even higher (256 -> 512) (Stephen Hemminger)

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-18 02:57:05 +00:00
Stephen Hemminger e1d3ac755d uapi: update kernel headers to 5.13-rc6
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-17 15:54:05 -07:00
David Ahern d8b3b9d32d Merge branch 'devlink-rate-support' into next
Dmytro Linkin says:

====================

Series implements devlink rate commands, which are:
- Dump particular or all rate objects (JSON or non-JSON)
- Add/Delete node rate object
- Set tx rate share/max values for rate object
- Set/Unset parent rate object for other rate object

Examples:

Display all rate objects:

    # devlink port function rate show
    pci/0000:03:00.0/1 type leaf parent some_group
    pci/0000:03:00.0/2 type leaf tx_share 12Mbit
    pci/0000:03:00.0/some_group type node tx_share 1Gbps tx_max 5Gbps

Display leaf rate object bound to the 1st devlink port of the
pci/0000:03:00.0 device:

    # devlink port function rate show pci/0000:03:00.0/1
    pci/0000:03:00.0/1 type leaf

Display node rate object with name some_group of the pci/0000:03:00.0
device:

    # devlink port function rate show pci/0000:03:00.0/some_group
    pci/0000:03:00.0/some_group type node

Display leaf rate object rate values using IEC units:

    # devlink -i port function rate show pci/0000:03:00.0/2
    pci/0000:03:00.0/2 type leaf 11718Kibit

Display pci/0000:03:00.0/2 leaf rate object as pretty JSON output:

    # devlink -jp port function rate show pci/0000:03:00.0/2
    {
        "rate": {
            "pci/0000:03:00.0/2": {
                "type": "leaf",
                "tx_share": 1500000
            }
        }
    }

Create node rate object with name "1st_group" on pci/0000:03:00.0 device:

    # devlink port function rate add pci/0000:03:00.0/1st_group

Create node rate object with specified parameters:

    # devlink port function rate add pci/0000:03:00.0/2nd_group \
        tx_share 10Mbit tx_max 30Mbit parent 1st_group

Set parameters to the specified leaf rate object:

    # devlink port function rate set pci/0000:03:00.0/1 \
        tx_share 2Mbit tx_max 10Mbit

Set leaf's parent to "1st_group":

    # devlink port function rate set pci/0000:03:00.0/1 parent 1st_group

Unset leaf's parent:

    # devlink port function rate set pci/0000:03:00.0/1 noparent

Delete node rate object:

    # devlink port function rate del pci/0000:03:00.0/2nd_group

Rate values can be specified in bits or bytes per second (bit|bps), with
any SI (k, m, g, t) or IEC (ki, mi, gi, ti) prefix. Bare number means
bits per second. Units also printed in "show" command output, but not
necessarily the same which were specified with "set" or "add" command.
-i/--iec switch force output in IEC units. JSON output always print
values as bytes per sec.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-12 04:38:34 +00:00
Dmytro Linkin dedf895184 devlink: Add ISO/IEC switch
Add -i/--iec switch to print rate values using binary prefixes.
Update devlink(8) and devlink-rate(8) pages.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-12 04:38:13 +00:00
Dmytro Linkin 6c70aca76e devlink: Add port func rate support
Implement user commands to manage devlink port func rate objects.
List all rate commands:

    $ devlink port func rate help

or just

    $ devlink port func rate

To list all OR particular rate object:

    $ devlink port func rate show
    pci/0000:03:00.0/some_group: type node
    pci/0000:03:00.0/0: type leaf
    pci/0000:03:00.0/1: type leaf

    $ devlink prot func rate show pci/0000:03:00.0/1
    pci/0000:03:00.0/0: type leaf

    $ devlink prot func rate show pci/0000:03:00.0/some_group
    pci/0000:03:00.0/some_group: type node

Rate object of type "leaf" created by it's driver where name is the name
of corresponding devlink port. Rate object of type "node" represents
rate group created by the user using commands:

    $ devlink port func rate add pci/0000:03:00.0/some_group

or with defining tx rate limits

    $ devlink port func rate add pci/0000:03:00.0/some_group \
        tx_shara 10kbit tx_max 100mbit

NOTE: node name cannot be a decimal value because it conflicts with
devlink port indexes.

To delete node object:

    $ devlink port func rate del pci/0000:03:00.0/some_group

Set rate limits of existing rate object:

    $ devlink prot func rate set pci/0000:03:00.0/0 \
        tx_share 5MBps tx_max 25GBps

    $ devlink prot func rate set pci/0000:03:00.0/some_group \
        tx_share 0

Both SET and ADD commands accept any units of rates defined in IEC
60027-2 standard.

NOTE: rate value 0 means that rate is unlimited. Such value is also
ommited in show command output.

NOTE: In SHOW command output rate values will be printed with suffixes
as well, but in JSON output they are always units of Bps.

Set or unset parent of existing rate object:

    $ devlink prot func rate set pci/0000:03:00.0/0 parent some_group

    $ devlink port func rate set pci/0000:03:00.0/0 noparent

NOTE: Setting parent to empty ("") name due to kernel logic means unset
parent and shouldn't be used to avoid unexpected parent unsets.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-12 04:38:06 +00:00
Dmytro Linkin 95339955c5 devlink: Add helper function to validate object handler
Every handler argument validated in two steps, first of which, form
checking, expects identifier is few words separated by slashes.
For device and region handlers just checked if identifier have expected
number of slashes.
Add generic function to do that and make code cleaner & consistent.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-12 04:37:21 +00:00
David Ahern 85903c9a29 Update kernel headers
Update kernel headers to commit:
    76cf404c40ae ("Merge branch 'ipa-mem-2'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-11 02:38:23 +00:00
Parav Pandit fbd4b581cb devlink: Add optional controller user input
A user optionally provides the external controller number when user
wants to create devlink port for the external controller.

An example on eswitch system:
$ devlink dev eswitch set pci/0033:01:00.0 mode switchdev

$ devlink port show
pci/0033:01:00.0/196607: type eth netdev enP51p1s0f0np0 flavour physical port 0 splittable false
pci/0033:01:00.0/131072: type eth netdev eth0 flavour pcipf controller 1 pfnum 0 external true splittable false
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port add pci/0033:01:00.0 flavour pcisf pfnum 0 sfnum 77 controller 1
pci/0033:01:00.0/163840: type eth netdev eth1 flavour pcisf controller 1 pfnum 0 sfnum 77 external true splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-11 02:28:49 +00:00
Roi Dayan 0d5cf51e0d police: Add support for json output
Change to use the print wrappers instead of fprintf().

This is example output of the options part before this commit:

        "options": {
            "handle": 1,
            "in_hw": true,
            "actions": [ {
                    "order": 1 police 0x2 ,
                    "control_action": {
                        "type": "drop"
                    },
                    "control_action": {
                        "type": "continue"
                    }overhead 0b linklayer unspec
        ref 1 bind 1
,
                    "used_hw_stats": [ "delayed" ]
                } ]
        }

This is the output of the same dump with this commit:

        "options": {
            "handle": 1,
            "in_hw": true,
            "actions": [ {
                    "order": 1,
                    "kind": "police",
                    "index": 2,
                    "control_action": {
                        "type": "drop"
                    },
                    "control_action": {
                        "type": "continue"
                    },
                    "overhead": 0,
                    "linklayer": "unspec",
                    "ref": 1,
                    "bind": 1,
                    "used_hw_stats": [ "delayed" ]
                } ]
        }

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-11 02:28:36 +00:00
Eric Dumazet 52f136f640 tc: fq: add horizon attributes
Commit 39d010504e6b ("net_sched: sch_fq: add horizon attribute")
added kernel support for horizon attributes in linux-5.8

$ tc -s -d qd sh dev wlp2s0
qdisc fq 8006: root refcnt 2 limit 10000p flow_limit 100p buckets 1024 orphan_mask 1023 quantum 3028b initial_quantum 15140b low_rate_threshold 550Kbit refill_delay 40ms timer_slack 10us horizon 10s horizon_drop
 Sent 690924 bytes 3234 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  flows 112 (inactive 104 throttled 0)
  gc 0 highprio 0 throttled 2 latency 8.25us

$ tc qd change dev wlp2s0 root fq horizon 500ms horizon_cap

$ tc -s -d qd sh dev wlp2s0
qdisc fq 8006: root refcnt 2 limit 10000p flow_limit 100p buckets 1024 orphan_mask 1023 quantum 3028b initial_quantum 15140b low_rate_threshold 550Kbit refill_delay 40ms timer_slack 10us horizon 500ms horizon_cap
 Sent 831220 bytes 3844 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  flows 122 (inactive 120 throttled 0)
  gc 0 highprio 0 throttled 2 latency 8.25us

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-07 02:56:01 +00:00
Hangbin Liu 7ae2585b86 configure: convert LIBBPF environment variables to command-line options
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-03 03:25:59 +00:00
Hangbin Liu a9c3d70d90 configure: add options ability
There are more and more global environment variables that land everywhere
in configure, which is making user hard to know which one does what.
Using command-line options would make it easier for users to learn or
remember the config options.

This patch converts the INCLUDE variable to command option first. Check
if the first variable has '-' to compile with the old INCLUDE path
setting method.

Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-03 03:25:11 +00:00
Roman Mashak 9d9b1a84a5 ss: update ss man page
'-b' option allows to request BPF filter opcodes, however
currently the kernel returns only classic BPF filter, so
reflect this in man page.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-01 15:55:06 -07:00
Ariel Levkovich 825bd5dacb tc: f_flower: Add missing ct_state flags to usage description
Add ct_state flags rpl and inv to the commands usage
description

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-27 14:40:05 +00:00
Ariel Levkovich 7fda6c588a tc: f_flower: Add option to match on related ct state
Add support for matching on ct_state flag related.
The related state indicates a packet is associated with an existing
connection.

Example:
$ tc filter add dev ens1f0_0 ingress prio 1 chain 1 proto ip flower \
  ct_state -est-rel+trk \
  action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_0 ingress prio 1 chain 1 proto ip flower \
  ct_state +rel+trk \
  action mirred egress redirect dev ens1f0_1

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-27 14:39:14 +00:00
Florian Westphal d3740fdc26 libgenl: make genl_add_mcast_grp set errno on error
genl_add_mcast_grp doesn't set errno in all cases.

On kernels that support mptcp but lack event support (all kernels <= 5.11)
MPTCP_PM_EV_GRP_NAME won't be found and ip will exit with

    "can't subscribe to mptcp events: Success"

Set errno to a meaningful value (ENOENT) when the group name isn't found
and also cover other spots where it returns nonzero with errno unset.

Fixes: ff619e4fd3 ("mptcp: add support for event monitoring")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-05-17 11:59:37 -07:00
Heiko Thiery c5b72cc56b lib/fs: fix issue when {name,open}_to_handle_at() is not implemented
With commit d5e6ee0dac the usage of functions name_to_handle_at() and
open_by_handle_at() are introduced. But these function are not available
e.g. in uclibc-ng < 1.0.35. To have a backward compatibility check for the
availability in the configure script and in case of absence do a direct
syscall.

Fixes: d5e6ee0dac ("ss: introduce cgroup2 cache and helper functions")
Cc: Dmitry Yakunin <zeil@yandex-team.ru>
Cc: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Heiko Thiery <heiko.thiery@gmail.com>
Reviewed-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-17 02:31:29 +00:00
David Ahern 62c88ed940 config.mk: Rerun configure when it is newer than config.mk
config.mk needs to be re-generated any time configure is changed.
Rename the existing make target and add a check that the config.mk
file needs to exist and must be newer than configure script.

Signed-off-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Petr Vorel <petr.vorel@gmail.com>
Tested-by: Petr Vorel <petr.vorel@gmail.com>
2021-05-17 02:13:56 +00:00
Jakub Kicinski 49437375b6 ip: dynamically size columns when printing stats
This change makes ip -s -s output size the columns
automatically. I often find myself using json
output because the normal output is unreadable.
Even on a laptop after 2 days of uptime byte
and packet counters almost overflow their columns,
let alone a busy server.

For max readability switch to right align.

Before:

    RX: bytes  packets  errors  dropped missed  mcast
    8227918473 8617683  0       0       0       0
    RX errors: length   crc     frame   fifo    overrun
               0        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    691937917  4727223  0       0       0       0
    TX errors: aborted  fifo   window heartbeat transns
               0        0       0       0       10

After:

    RX:  bytes packets errors dropped  missed   mcast
    8228633710 8618408      0       0       0       0
    RX errors:  length    crc   frame    fifo overrun
                     0      0       0       0       0
    TX:  bytes packets errors dropped carrier collsns
     692006303 4727740      0       0       0       0
    TX errors: aborted   fifo  window heartbt transns
                     0      0       0       0      10

More importantly, with large values before:

    RX: bytes  packets  errors  dropped overrun mcast
    126570234447969 15016149200 0       0       0       0
    RX errors: length   crc     frame   fifo    missed
               0        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    126570234447969 15016149200 0       0       0       0
    TX errors: aborted  fifo   window heartbeat transns
               0        0       0       0       10

Note that in this case we have full shift by a column,
e.g. the value under "dropped" is actually for "errors" etc.

After:

    RX:       bytes     packets errors dropped  missed   mcast
    126570234447969 15016149200      0       0       0       0
    RX errors:           length    crc   frame    fifo overrun
                              0      0       0       0       0
    TX:       bytes     packets errors dropped carrier collsns
    126570234447969 15016149200      0       0       0       0
    TX errors:          aborted   fifo  window heartbt transns
                              0      0       0       0      10

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-09 22:51:59 +00:00
Paolo Lungaroni 02ca3aabe9 seg6: add counters support for SRv6 Behaviors
We introduce the "count" optional attribute for supporting counters in SRv6
Behaviors as defined in [1], section 6. For each SRv6 Behavior instance,
counters defined in [1] are:

 - the total number of packets that have been correctly processed;
 - the total amount of traffic in bytes of all packets that have been
   correctly processed;

In addition, we introduce a new counter that counts the number of packets
that have NOT been properly processed (i.e. errors) by an SRv6 Behavior
instance.

Each SRv6 Behavior instance can be configured, at the time of its creation,
to make use of counters specifing the "count" attribute as follows:

 $ ip -6 route add 2001:db8::1 encap seg6local action End count dev eth0

per-behavior counters can be shown by adding "-s" to the iproute2 command
line, i.e.:

 $ ip -s -6 route show 2001:db8::1
 2001:db8::1 encap seg6local action End packets 0 bytes 0 errors 0 dev eth0

[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-counters

v2:
 - add help and route.8 man page updates

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Paolo Lungaroni <paolo.lungaroni@uniroma2.it>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-09 22:20:59 +00:00
Andrea Claudi e44786b269 tc: htb: improve burst error messages
When a wrong value is provided for "burst" or "cburst" parameters, the
resulting error message is unclear and can be misleading:

$ tc class add dev dummy0 parent 1: classid 1:1 htb rate 100KBps burst errtrigger
Illegal "buffer"

The message claims an illegal "buffer" is provided, but neither the
inline help nor the man page list "buffer" among the htb parameters, and
the only way to know that "burst", "maxburst" and "buffer" are synonyms
is to look into tc/q_htb.c.

This commit tries to improve this simply changing the error string to
the parameter name provided in the user-given command, clearly pointing
out where the wrong value is.

$ tc class add dev dummy0 parent 1: classid 1:1 htb rate 100KBps burst errtrigger
Illegal "burst"

$ tc class add dev dummy0 parent 1: classid 1:1 htb rate 100Kbps maxburst errtrigger
Illegal "maxburst"

Reported-by: Sebastian Mitterle <smitterl@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-09 22:13:22 +00:00
Andrea Claudi 28ee49e515 tipc: bail out if key is abnormally long
tipc segfaults when called with an abnormally long key:

$ tipc node set key 0123456789abcdef0123456789abcdef0123456789abcdef
*** buffer overflow detected ***: terminated

Fix this returning an error if key length is longer than
TIPC_AEAD_KEYLEN_MAX.

Fixes: 24bee3bf97 ("tipc: add new commands to set TIPC AEAD key")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-09 22:08:47 +00:00
Andrea Claudi 93c267bfb4 tipc: bail out if algname is abnormally long
tipc segfaults when called with an abnormally long algname:

$ tipc node set key 0x1234 algname supercalifragilistichespiralidososupercalifragilistichespiralidoso
*** buffer overflow detected ***: terminated

Fix this returning an error if provided algname is longer than
TIPC_AEAD_ALG_NAME.

Fixes: 24bee3bf97 ("tipc: add new commands to set TIPC AEAD key")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-09 22:08:47 +00:00
Hoang Le 459f280813 tipc: call a sub-routine in separate socket
When receiving a result from first query to netlink, we may exec
a another query inside the callback. If calling this sub-routine
in the same socket, it will be discarded the result from previous
exection.
To avoid this we perform a nested query in separate socket.

Fixes: 2021028306 ("tipc: use the libmnl functions in lib/mnl_utils.c")
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-09 22:08:47 +00:00
Tyson Moore 0d95472a4b tc-cake: update docs to include LE diffserv
Linux kernel commit b8392808eb3fc28e ("sch_cake: add RFC 8622 LE PHB
support to CAKE diffserv handling") added packets with LE diffserv to
the Bulk priority tin. Update the documentation to reflect this change.

Signed-off-by: Tyson Moore <tyson@tyson.me>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-06 14:59:52 +00:00
Andrea Claudi 2d212aae55 dcb: fix memory leak
main() dinamically allocates dcb, but when dcb_help() is called it
returns without freeing it.

Fix this using a goto, as it is already done in the same function.

Fixes: 67033d1c1c ("Add skeleton of a new tool, dcb")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-06 14:48:02 +00:00
Andrea Claudi cfd89a6f8b dcb: fix return value on dcb_cmd_app_show
dcb_cmd_app_show() is supposed to return EINVAL if an incorrect argument
is provided.

Fixes: 8e9bed1493 ("dcb: Add a subtool for the DCB APP object")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-06 14:47:57 +00:00
Andrea Claudi 3296d4fe77 lib: bpf_legacy: avoid to pass invalid argument to close()
In function bpf_obj_open, if bpf_fetch_prog_arg() return an error, we
end up in the out: path with a negative value for fd, and pass it to
close.

Avoid this checking for fd to be positive.

Fixes: 32e93fb7f6 ("{f,m}_bpf: allow for sharing maps")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-06 14:43:54 +00:00
Andrea Claudi a2f1f66075 tc: q_ets: drop dead code from argument parsing
Checking for nbands to be at least 1 at this point is useless. Indeed:
- ets requires "bands", "quanta" or "strict" to be specified
- if "bands" is specified, nbands cannot be negative, see parse_nbands()
- if "strict" is specified, nstrict cannot be negative, see
  parse_nbands()
- if "quantum" is specified, nquanta cannot be negative, see
  parse_quantum()
- if "bands" is not specified, nbands is set to nstrict+nquanta
- the previous if statement takes care of the case when none of them are
  specified and nbands is 0, terminating execution.

Thus nbands cannot be < 1 at this point and this code cannot be executed.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-06 14:42:44 +00:00
Jakub Kicinski 570d2cf0ec ip: align the name of the 'nohandler' stat
Before:

    RX: bytes  packets  errors  dropped missed  mcast
    8848233056 8548168  0       0       0       0
    RX errors: length   crc     frame   fifo    overrun   nohandler
               0        0       0       0       0       101
    TX: bytes  packets  errors  dropped carrier collsns compressed
    1142925945 4683483  0       0       0       0       101
    TX errors: aborted  fifo   window heartbeat transns
               0        0       0       0       14

After:

    RX: bytes  packets  errors  dropped missed  mcast
    8848297833 8548461  0       0       0       0
    RX errors: length   crc     frame   fifo    overrun nohandler
               0        0       0       0       0       101
    TX: bytes  packets  errors  dropped carrier collsns compressed
    1143049820 4683865  0       0       0       0       101
    TX errors: aborted  fifo   window heartbeat transns
               0        0       0       0       14

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-06 14:41:19 +00:00
David Ahern c3f852754f Update kernel headers
Update kernel headers to commit:
    8621436671f3 ("smc: disallow TCP_ULP in smc_setsockopt()")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-06 14:16:04 +00:00
David Ahern c79fcefaaf Merge branch 'rdma-copy-on-fork' into next
Gal Pressman  says:

====================

This is the userspace part for the new copy-on-fork attribute added to
the get sys netlink command.

The new attribute indicates that the kernel copies DMA pages on fork,
hence fork support through madvise and MADV_DONTFORK is not needed.

Kernel series was merged:
https://lore.kernel.org/linux-rdma/20210418121025.66849-1-galpress@amazon.com/

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-03 14:45:19 +00:00
Gal Pressman bce4247869 rdma: Add copy-on-fork to get sys command
The new attribute indicates that the kernel copies DMA pages on fork,
hence fork support through madvise and MADV_DONTFORK is not needed.

If the attribute is not reported (expected on older kernels),
copy-on-fork is disabled.

Example:
$ rdma sys
netns shared copy-on-fork on

Signed-off-by: Gal Pressman <galpress@amazon.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-03 14:43:13 +00:00
Gal Pressman 212e2c1d0c rdma: update uapi headers
Update rdma_netlink.h file upto kernel commit
6cc9e215eb27 ("RDMA/nldev: Add copy-on-fork attribute to get sys command")

Signed-off-by: Gal Pressman <galpress@amazon.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-03 14:43:06 +00:00
Jianguo Wu 7f1d58d1a1 mptcp: make sure flag signal is set when add addr with port
When add address with port, it is mean to send an ADD_ADDR to remote,
so it must have flag signal set.

Fixes: 42fbca91cd ("mptcp: add support for port based endpoint")
Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-30 14:30:24 +00:00
David Ahern e1e089d1f2 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-28 15:48:28 +00:00
Jethro Beekman d56dcd3549 ip: Add nodst option to macvlan type source
The default behavior for source MACVLAN is to duplicate packets to
appropriate type source devices, and then do the normal destination MACVLAN
flow. This patch adds an option to skip destination MACVLAN processing if
any matching source MACVLAN device has the option set.

This allows setting up a "catch all" device for source MACVLAN: create one
or more devices with type source nodst, and one device with e.g. type vepa,
and incoming traffic will be received on exactly one device.

Signed-off-by: Jethro Beekman <kernel@jbeekman.nl>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-28 15:45:59 +00:00
David Ahern e5f1505e53 Merge branch 'rdma-resource-tracking' into next
Leon Romanovsky  says:

====================

This is the user space part of already accepted to the kernel series
that extends RDMA netlink interface to return uverbs context and SRQ
information.

The accepted kernel series can be seen here:
https://lore.kernel.org/linux-rdma/20210422133459.GA2390260@nvidia.com/

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-28 15:37:32 +00:00
Neta Ostrovsky 9b272e138d rdma: Add SRQ resource tracking information
Sample output:

$ rdma res show srq
dev ibp8s0f0 srqn 0 type BASIC pdn 3 comm [ib_ipoib]
dev ibp8s0f0 srqn 4 type BASIC lqpn 125-128,130-140 pdn 9 pid 3581 comm ibv_srq_pingpon
dev ibp8s0f0 srqn 5 type BASIC lqpn 141-156 pdn 10 pid 3584 comm ibv_srq_pingpon
dev ibp8s0f0 srqn 6 type BASIC lqpn 157-172 pdn 11 pid 3590 comm ibv_srq_pingpon
dev ibp8s0f1 srqn 0 type BASIC pdn 3 comm [ib_ipoib]
dev ibp8s0f1 srqn 1 type BASIC lqpn 329-344 pdn 4 pid 3586 comm ibv_srq_pingpon

$ rdma res show srq lqpn 126-141
dev ibp8s0f0 srqn 4 type BASIC lqpn 126-128,130-140 pdn 9 pid 3581 comm ibv_srq_pingpon
dev ibp8s0f0 srqn 5 type BASIC lqpn 141 pdn 10 pid 3584 comm ibv_srq_pingpon

$ rdma res show srq lqpn 127
dev ibp8s0f0 srqn 4 type BASIC lqpn 127 pdn 9 pid 3581 comm ibv_srq_pingpon

Reviewed-by: Ido Kalir <idok@nvidia.com>
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-28 15:37:16 +00:00
Neta Ostrovsky 4278941285 rdma: Add context resource tracking information
Sample output:

$ rdma res show ctx
dev ibp8s0f0 ctxn 0 pid 980 comm ibv_rc_pingpong
dev ibp8s0f0 ctxn 1 pid 981 comm ibv_rc_pingpong
dev ibp8s0f0 ctxn 2 pid 992 comm ibv_rc_pingpong
dev ibp8s0f1 ctxn 0 pid 984 comm ibv_rc_pingpong
dev ibp8s0f1 ctxn 1 pid 987 comm ibv_rc_pingpong

$ rdma res show ctx dev ibp8s0f1
dev ibp8s0f1 ctxn 0 pid 984 comm ibv_rc_pingpong
dev ibp8s0f1 ctxn 1 pid 987 comm ibv_rc_pingpong

Reviewed-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Ido Kalir <idok@nvidia.com>
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-28 15:36:59 +00:00
Neta Ostrovsky 4c61b5b9df rdma: Update uapi headers
Update rdma_netlink.h file upto kernel commit c6c11ad3ab9f
("RDMA/nldev: Add QP numbers to SRQ information")

Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-28 15:36:21 +00:00
David Ahern a5ea744ca2 Update kernel headers
Update kernel headers to commit:
    99ba0ea616aa ("sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-28 15:35:30 +00:00
Stephen Hemminger 2363bc99f9 Merge git://git.kernel.org/pub/scm/network/iproute2/iproute2-next
Required manual fix of devlink/devlink.c

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-27 19:39:39 -07:00
Stephen Hemminger 1fdea28051 v5.12.0 2021-04-27 11:59:09 -07:00
Stephen Hemminger a3fb3fcb7d remove trailing whitespace
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-27 11:55:53 -07:00
Andrea Claudi e1ad689545 lib: bpf_legacy: fix missing socket close when connect() fails
In functions bpf_{send,recv}_map_fds(), when connect fails after a
socket is successfully opened, we return with error missing a close on
the socket.

Fix this closing the socket if opened and using a single return point
for both the functions.

Fixes: 6256f8c9e4 ("tc, bpf: finalize eBPF support for cls and act front-end")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-26 21:05:19 -07:00
Andrea Claudi 92af24c907 lib: bpf_legacy: treat 0 as a valid file descriptor
As stated in the man page(), open returns a non-negative integer as a
file descriptor. Hence, when checking for its return value to be ok, we
should include 0 as a valid value.

This fixes a covscan warning about a missing close() in this function.

Fixes: ecb05c0f99 ("bpf: improve error reporting around tail calls")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-26 21:05:19 -07:00
Andrea Claudi 932fe3453f tc: e_bpf: fix memory leak in parse_bpf()
envp_run is dinamically allocated with a malloc, and not freed in the
out: return path. This commit fix it.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-26 21:05:19 -07:00
Andrea Claudi 38ef5bb7b4 ip: netns: fix missing netns close on some error paths
In functions netns_pids() and netns_identify_pid(), the netns file is
not closed on some error paths.

Fix this using a conditional close and a single return point on both
functions.

Fixes: 44b563269e ("ip-nexthop: support flush by id")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-26 21:04:02 -07:00
Nikolay Aleksandrov c72de3713d bridge: vlan: dump port only if there are any vlans
When I added support for new vlan rtm dumping, I made a mistake in the
output format when there are no vlans on the port. This patch fixes it by
not printing ports without vlan entries (similar to current situation).

Example (no vlans):
$ bridge -d vlan show
port              vlan-id

Fixes: e5f87c8341 ("bridge: vlan: add support for the new rtm dump call")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-26 02:32:46 +00:00
Tony Ambardar e705b19d48 ip: drop 2-char command assumption
The 'ip' utility hardcodes the assumption of being a 2-char command, where
any follow-on characters are passed as an argument:

  $ ./ip-full help
  Object "-full" is unknown, try "ip help".

This confusing behaviour isn't seen with 'tc' for example, and was added in
a 2005 commit without documentation. It was noticed during testing of 'ip'
variants built/packaged with different feature sets (e.g. w/o BPF support).

Mitigate the problem by redoing the command without the 2-char assumption
if the follow-on characters fail to parse as a valid command.

Fixes: 351efcde4e ("Update header files to 2.6.14")
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-26 02:29:42 +00:00
Stephen Hemminger b5a6ed9cc9 uapi: add missing virtio related headers
The build of iproute2 relies on having correct copy of santized
kernel headers. The vdpa utility introduced a dependency on
the vdpa related headers, but these headers were not present
in iproute2 repo.

Fixes: c2ecc82b9d ("vdpa: Add vdpa tool")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-23 10:36:17 -07:00
Andrea Claudi 81bfd01a4c lib: move get_task_name() from rdma
The function get_task_name() is used to get the name of a process from
its pid, and its implementation is similar to ip/iptuntap.c:pid_name().

Move it to lib/fs.c to use a single implementation and make it easily
reusable.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:22:16 +00:00
David Ahern 75a35d50a3 Merge branch 'bridge-vlan' into next
Nikolay Aleksandrov  says:

====================

From: Nikolay Aleksandrov <nikolay@nvidia.com>

This set extends the bridge vlan code to use the new vlan RTM calls
which allow to dump detailed per-port, per-vlan information and also to
manipulate the per-vlan options. It also allows to monitor any vlan
changes (add/del/option change). The rtm vlan dumps have an extensible
format which allows us to add new options and attributes easily, and
also to request the kernel to filter on different vlan information when
dumping. The new kernel dump code tries to use compressed vlan format as
much as possible (it includes netlink attributes for vlan start and
end) to reduce the number of generated messages and netlink traffic.
The iproute2 support is activated by using the "-d" flag when showing
vlan information, that will cause it to use the new rtm dump call and
get all the detailed information, if "-s" is also specified it will dump
per-vlan statistics as well. Obviously in that case the vlans cannot be
compressed. To change per-vlan options (currently only STP state is
supported) a new vlan command is added - "set". It can be used to set
options of bridge or port vlans and vlan ranges can be used, all of the
new vlan option code uses extack to show more understandable errors.
The set adds the first supported per-vlan option - STP state.
Man pages and usage information are updated accordingly.

Example:
 $ bridge -d vlan show
 port              vlan-id
 ens13             1 PVID Egress Untagged
                     state forwarding
 bridge            1 PVID Egress Untagged
                     state forwarding

 $ bridge vlan set vid 1 dev ens13 state blocking
 $ bridge -d vlan show
 port              vlan-id
 ens13             1 PVID Egress Untagged
                     state blocking
 bridge            1 PVID Egress Untagged
                     state forwarding

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:20:13 +00:00
Nikolay Aleksandrov c311404780 bridge: monitor: add support for vlan monitoring
Add support for vlan activity monitoring, we display vlan notifications on
vlan add/del/options change. The man page and help are also updated
accordingly.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:13:39 +00:00
Nikolay Aleksandrov e5f87c8341 bridge: vlan: add support for the new rtm dump call
Use the new bridge vlan rtm dump helper to dump all of the available
vlan information when -details (-d) is used with vlan show. It is also
capable of dumping vlan stats if -statistics (-s) is added.
Currently this is the only interface capable of dumping per-vlan
options. The vlan dump format is compatible with current vlan show, it
uses the same helpers to dump vlan information. The new addition is one
line which will contain the per-vlan options (similar to ip -d link show
for ports). Currently only the vlan STP state is printed.
The call uses compressed vlan format by default.

Example:
$ bridge -s -d vlan show
port              vlan-id
virbr1            1 PVID Egress Untagged
                    state forwarding

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:13:34 +00:00
Nikolay Aleksandrov 34c14bea22 libnetlink: add bridge vlan dump request helper
Add rtnl bridge vlan dump request helper which will be used to retrieve
bridge vlan information and options.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:13:29 +00:00
Nikolay Aleksandrov 04e2783d5e bridge: vlan: add option set command and state option
Add a new per-vlan option set command. It allows to manipulate vlan
options, those can be bridge-wide or per-port depending on what device
is specified. The first option that can be set is the vlan STP state,
it is identical to the bridge port STP state. The man page is also
updated accordingly.

Example:
 $ bridge vlan set vid 10 dev br0 state learning
or a range:
 $ bridge vlan set vid 10-20 dev swp1 state blocking

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:13:24 +00:00
Nikolay Aleksandrov f2f52fcabe bridge: add parse_stp_state helper
Add a helper which parses an STP state string to its numeric value.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:13:20 +00:00
Nikolay Aleksandrov f07516c3b0 bridge: rename and export print_portstate
Rename print_portstate to print_stp_state in preparation for use by vlan
code as well (per-vlan state), and export it. To be in line with the new
naming rename also port_states to stp_states as they'll be used for
vlans, too.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:13:09 +00:00
Florian Westphal ff619e4fd3 mptcp: add support for event monitoring
This adds iproute2 support for mptcp event monitoring, e.g. creation,
establishment, address announcements from the peer, subflow establishment
and so on.

While the kernel-generated events are primarily aimed at mptcpd (e.g. for
subflow management), this is also useful for debugging.

This adds print support for the existing events.

Sample output of 'ip mptcp monitor':
[       CREATED] token=83f3a692 remid=0 locid=0 saddr4=10.0.1.2 daddr4=10.0.1.1 sport=58710 dport=10011
[   ESTABLISHED] token=83f3a692 remid=0 locid=0 saddr4=10.0.1.2 daddr4=10.0.1.1 sport=58710 dport=10011
[SF_ESTABLISHED] token=83f3a692 remid=0 locid=1 saddr4=10.0.2.2 daddr4=10.0.1.1 sport=40195 dport=10011 backup=0
[        CLOSED] token=83f3a692

Signed-off-by: Florian Westphal <fw@strlen.de>
2021-04-22 05:10:25 +00:00
David Ahern 98040c2dc1 Update kernel headers
Update kernel headers to commit:
    5d869070569a ("net: phy: marvell: don't use empty switch default case")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:09:39 +00:00
Andrea Claudi c8216fabe8 rdma: stat: fix return code
libmnl defines MNL_CB_OK as 1 and MNL_CB_ERROR as -1. rdma uses these
return codes, and stat_qp_show_parse_cb() should do the same.

Fixes: 16ce4d2366 ("rdma: stat: initialize ret in stat_qp_show_parse_cb()")
Reported-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-20 18:08:38 -07:00
Andrea Claudi 16ce4d2366 rdma: stat: initialize ret in stat_qp_show_parse_cb()
In the unlikely case in which the mnl_attr_for_each_nested() cycle is
not executed, this function return an uninitialized value.

Fix this initializing ret to 0.

Fixes: 5937552b42 ("rdma: Add "stat qp show" support")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-13 19:16:55 -07:00
Andrea Claudi 6a2c51da99 nexthop: fix memory leak in add_nh_group_attr()
grps is dinamically allocated with a calloc, and not freed in a return
path in the for cycle. This commit fix it.

While at it, make the function use a single return point.

Fixes: 63df8e8543 ("Add support for nexthop objects")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-13 19:16:55 -07:00
Andrea Claudi 6801ae8273 q_cake: remove useless check on argv
In cake_parse_opt(), *argv is checked not to be null when parsing for
overhead and mpu parameters. However this is useless, since *argv
matches right before for "overhead" or "mpu".

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-13 19:16:55 -07:00
Andrea Claudi 6b8fa2ea2d devlink: always check strslashrsplit() return value
strslashrsplit() return value is not checked in __dl_argv_handle(),
despite the fact that it can return EINVAL.

This commit fix it and make __dl_argv_handle() return error if
strslashrsplit() return an error code.

Fixes: 2f85a9c535 ("devlink: allow to parse both devlink and port handle in the same time")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-13 19:16:55 -07:00
Stephen Hemminger cc718c191b uapi: update can.h
Upstream commit to force packing on ARM OABI

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-13 19:14:34 -07:00
Stephen Hemminger 06d0bbf1ee erspan: fix JSON output
The format for erspan/erspan6 output is not valid JSON, as on version 2 a
valueless key was presented. The direction should be value and erspan_dir
should be the key.

Fixes: 2897636267 ("erspan: add erspan version II support")
Cc: u9012063@gmail.com
Reported-by: Christian Pössinger <christian@poessinger.com>
Signed-off-by: Christian Pössinger <christian@poessinger.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-10 09:52:48 -07:00
Chunmei Xu 44b563269e ip-nexthop: support flush by id
since id is unique for nexthop, it is heavy to dump all nexthops.
use existing delete_nexthop to support flush by id

Signed-off-by: Chunmei Xu <xuchunmei@linux.alibaba.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-08 15:38:58 +00:00
Hoang Le 2021028306 tipc: use the libmnl functions in lib/mnl_utils.c
To avoid code duplication, tipc should be converted to use the helper
functions for working with libmnl in lib/mnl_utils.c

Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-03 01:10:54 +00:00
Stephen Hemminger e77a0d3dc9 uapi: bpf.h update from upstream
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-30 16:38:05 -07:00
Baowen Zheng cf9ae1bd31 police: add support for packet-per-second rate limiting
Allow a policer action to enforce a rate-limit based on packets-per-second,
configurable using a packet-per-second rate and burst parameters.

e.g.
 # $TC actions add action police pkts_rate 1000 pkts_burst 200 index 1
 # $TC actions ls action police
 total acts 1

	action order 0:  police 0x1 rate 0bit burst 0b mtu 4096Mb pkts_rate 1000 pkts_burst 200
	ref 1 bind 0

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-30 03:04:50 +00:00
Cooper Lees 16430e9afd Add Open/R to rt_protos
- Open Routing is using ID 99 for it's installed routes
- https://github.com/facebook/openr
- Kernel has accepted 99 in `rtnetlink.h`

Signed-of-by: Cooper Lees <me@cooperlees.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-30 03:04:09 +00:00
Petr Machata 7384c15e0e ip: Fix batch processing
After the comment cited below, batch mode neglects to set the global
variable batch_mode to a non-zero value. Netns and VRF commands use this
variable, and break in batch mode. Fix by setting the value again.

Fixes: 1d9a81b8c9 ("Unify batch processing across tools")
Reported-by: Tim Rice <trice@posteo.net>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-22 16:30:21 -07:00
David Ahern 76bfc185f2 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-21 17:16:01 +00:00
Sabrina Dubroca 3c75135835 ip: xfrm: add support for tfcpad
This patch adds support for setting and displaying the Traffic Flow
Confidentiality attribute for an XFRM state, which allows padding ESP
packets to a specified length.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-21 17:15:07 +00:00
Stephen Hemminger 872689d431 uapi: minor header update for l2tp
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-20 09:36:07 -07:00
Stephen Hemminger 87d6d395d1 README: remove doc instructions
The out of date documentation was removed in 2017, but the instructions
in the README were not removed.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-20 09:29:02 -07:00
David Ahern fa505da84b Merge branch 'nexthop-resilient-hash' into next
Petr Machata  says:

====================

Support for resilient next-hop groups was recently accepted to Linux
kernel[1]. Resilient next-hop groups add a layer of indirection between the
SKB hash and the next hop. Thus the hash is used to reference a hash table
bucket, which is then used to reference a particular next hop. This allows
the system more flexibility when assigning SKB hash space to next hops.
Previously, each next hop had to be assigned a continuous range of SKB hash
space. With a hash table as an intermediate layer, it is possible to
reassign next hops with a hash table bucket granularity. In turn, this
mends issues with traffic flow redirection resulting from next hop removal
or adjustments in next-hop weights.

In this patch set, introduce support for resilient next-hop groups to
iproute2.

- Patch #1 brings include/uapi/linux/nexthop.h and /rtnetlink.h up to date.

- Patches #2 and #3 add new helpers that will be useful later.

- Patch #4 extends the ip/nexthop sub-tool to accept group type as a
  command line argument, and to dispatch based on the specified type.

- Patch #5 adds the support for resilient next-hop groups.

- Patch #6 adds the support for resilient next-hop group bucket interface.

To illustrate the usage, consider the following commands:

 # ip nexthop add id 1 via 192.0.2.2 dev dummy1
 # ip nexthop add id 2 via 192.0.2.3 dev dummy1
 # ip nexthop add id 10 group 1/2 type resilient \
	buckets 8 idle_timer 60 unbalanced_timer 300

The last command creates a resilient next-hop group. It will have 8
buckets, each bucket will be considered idle when no traffic hits it for at
least 60 seconds, and if the table remains out of balance for 300 seconds,
it will be forcefully brought into balance.

And this is how the next-hop group bucket interface looks:

 # ip nexthop bucket show id 10
 id 10 index 0 idle_time 5.59 nhid 1
 id 10 index 1 idle_time 5.59 nhid 1
 id 10 index 2 idle_time 8.74 nhid 2
 id 10 index 3 idle_time 8.74 nhid 2
 id 10 index 4 idle_time 8.74 nhid 1
 id 10 index 5 idle_time 8.74 nhid 1
 id 10 index 6 idle_time 8.74 nhid 1
 id 10 index 7 idle_time 8.74 nhid 1

[1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=2a0186a37700b0d5b8cc40be202a62af44f02fa2

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-19 15:05:29 +00:00
Ido Schimmel 2be6d18b30 nexthop: Add support for nexthop buckets
Add ability to dump multiple nexthop buckets and get a specific one.
Example:

 # ip nexthop add id 10 group 1/2 type resilient buckets 8
 # ip nexthop
 id 1 via 192.0.2.2 dev dummy10 scope link
 id 2 via 192.0.2.19 dev dummy20 scope link
 id 10 group 1/2 type resilient buckets 8 idle_timer 120 unbalanced_timer 0 unbalanced_time 0
 # ip nexthop bucket
 id 10 index 0 idle_time 28.1 nhid 2
 id 10 index 1 idle_time 28.1 nhid 2
 id 10 index 2 idle_time 28.1 nhid 2
 id 10 index 3 idle_time 28.1 nhid 2
 id 10 index 4 idle_time 28.1 nhid 1
 id 10 index 5 idle_time 28.1 nhid 1
 id 10 index 6 idle_time 28.1 nhid 1
 id 10 index 7 idle_time 28.1 nhid 1
 # ip nexthop bucket show nhid 1
 id 10 index 4 idle_time 53.59 nhid 1
 id 10 index 5 idle_time 53.59 nhid 1
 id 10 index 6 idle_time 53.59 nhid 1
 id 10 index 7 idle_time 53.59 nhid 1
 # ip nexthop bucket get id 10 index 5
 id 10 index 5 idle_time 81 nhid 1
 # ip -j -p nexthop bucket get id 10 index 5
 [ {
         "id": 10,
         "bucket": {
             "index": 5,
             "idle_time": 104.89,
             "nhid": 1
         },
         "flags": [ ]
     } ]

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-19 15:01:25 +00:00
Ido Schimmel 9167671822 nexthop: Add support for resilient nexthop groups
Add ability to configure resilient nexthop groups and show their current
configuration. Example:

 # ip nexthop add id 10 group 1/2 type resilient buckets 8
 # ip nexthop show id 10
 id 10 group 1/2 type resilient buckets 8 idle_timer 120 unbalanced_timer 0
 # ip -j -p nexthop show id 10
 [ {
         "id": 10,
         "group": [ {
                 "id": 1
             },{
                 "id": 2
             } ],
         "type": "resilient",
         "resilient_args": {
             "buckets": 8,
             "idle_timer": 120,
             "unbalanced_timer": 0
         },
         "flags": [ ]
     } ]

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-19 15:01:18 +00:00
Ido Schimmel b82d6b81fa nexthop: Add ability to specify group type
Next patches are going to add a 'resilient' nexthop group type, so allow
users to specify the type using the 'type' argument. Currently, only
'mpath' type is supported.

These two commands are equivalent:

 # ip nexthop add id 10 group 1/2/3
 # ip nexthop add id 10 group 1/2/3 type mpath

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-19 15:00:49 +00:00
Petr Machata 28fb925d8b nexthop: Extract a helper to parse a NH ID
NH ID extraction is a common operation, and will become more common still
with the resilient NH groups support. Add a helper that does what it
usually done and returns the parsed NH ID.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-19 15:00:43 +00:00
Petr Machata e757f741e9 json_print: Add print_tv()
Add a helper to dump a timeval. Print by first converting to double and
then dispatching to print_color_float().

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-19 15:00:08 +00:00
David Ahern a5b355c08c Update kernel headers
Update kernel headers to commit:
    38cb57602369 ("selftests: net: forwarding: Fix a typo")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-19 14:59:17 +00:00
Stephen Hemminger 6639fce430 ip: cleanup help message text
Wrap help message text at 80 characters, and put list of things
in alpha order.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-18 11:24:06 -07:00
Tony Ambardar 06bee37c1c lib/bpf: add missing limits.h includes
Several functions in bpf_glue.c and bpf_libbpf.c rely on PATH_MAX, which is
normally included from <limits.h> in other iproute2 source files.

It fixes errors seen using gcc 10.2.0, binutils 2.35.1 and musl 1.1.24:

bpf_glue.c: In function 'get_libbpf_version':
bpf_glue.c:46:11: error: 'PATH_MAX' undeclared (first use in this function);
did you mean 'AF_MAX'?
   46 |  char buf[PATH_MAX], *s;
      |           ^~~~~~~~
      |           AF_MAX

Reported-by: Rui Salvaterra <rsalvaterra@gmail.com>
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-16 22:53:53 -07:00
Sabrina Dubroca 6050055387 ip: xfrm: limit the length of the security context name when printing
Security context names are not guaranteed to be NUL-terminated by the
kernel, so we can't just print them using %s directly. The length of
the string is determined by sctx->ctx_len, so we can use that to limit
what fprintf outputs.

While at it, factor that out to a separate function, since the exact
same code is used to print the security context for both policies and
states.

Fixes: b2bb289a57 ("xfrm security context support")
Reported-by: Paul Wouters <pwouters@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-16 22:53:28 -07:00
David Ahern 27ca8989c1 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-15 15:08:01 +00:00
Toke Høiland-Jørgensen 60204c81e4 q_cake: Fix incorrect printing of signed values in class statistics
The deficit returned from the kernel is signed, but was printed with a %u
specifier in the format string, leading to negative values to be printed as
high unsigned values instead. In addition, we passed a negative value to
sprint_time() even though that expects an unsigned value. Fix this by
changing the format specifier and reversing the sign of negative time
values.

Fixes: 714444c0cb ("Add support for CAKE qdisc")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-08 19:05:19 -08:00
Roi Dayan 9f366536ed dcb: Fix compilation warning about reallocarray
In older distros we need bsd/stdlib.h but newer distro doesn't
need it. Also old distro will need libbsd-devel installed and newer
doesn't. To remove a possible dependency on libbsd-devel replace usage
of reallocarray to realloc.

dcb_app.c: In function ‘dcb_app_table_push’:
dcb_app.c:68:25: warning: implicit declaration of function ‘reallocarray’; did you mean ‘realloc’?

Fixes: 8e9bed1493 ("dcb: Add a subtool for the DCB APP object")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-03 18:56:39 -08:00
Luca Boccassi 6739068fb0 iproute: fix printing resolved localhost
format_host_rta_r might return a cached hostname
via its return value and not use the input buffer.

Before:

$ ip -resolve -6 route
 dev lo proto kernel metric 256 pref medium

After:

$ ip/ip -resolve -6 route
localhost dev lo proto kernel metric 256 pref medium

Bug-Debian: https://bugs.debian.org/983591

Reported-by: Axel Scheepers <axel.scheepers76@gmail.com>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-03 18:54:16 -08:00
Parav Pandit c54e7bd605 devlink: Add error print when unknown values specified
When user specifies either unknown flavour or unknown state during
devlink port commands, return appropriate error message.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-03 04:00:16 +00:00
Parav Pandit 62ff25e51b devlink: Use generic socket helpers from library
User generic socket helpers from library for netlink generic socket
access.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-03 04:00:10 +00:00
Parav Pandit e3a4067e52 utils: Introduce helper routines for generic socket recv
Introduce helper for generic socket receive helper and introduce helper
to build command with custom family and version.

Use API in subsequent devlink patch.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-03 04:00:04 +00:00
Parav Pandit 03662000e4 devlink: Use library provided string processing APIs
User helper routines provided by library for counting slash and
splitting string on delimiter.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-03 03:59:58 +00:00
Paolo Abeni 42fbca91cd mptcp: add support for port based endpoint
The feature is supported by the kernel since 5.11-net-next,
let's allow user-space to use it.

Just parse and dump an additional, per endpoint, u16 attribute

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-01 00:15:10 +00:00
David Ahern 455c9f5361 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-01 00:07:57 +00:00
Stephen Hemminger 9d00602f82 vdpa: add .gitignore
Ignore the resulting binary vdpa.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-23 23:12:14 -08:00
Stephen Hemminger 5e0e73c347 Update kernel headers from 5.12-pre rc
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-23 23:11:12 -08:00
Stephen Hemminger 52c5f3f043 Merge git://git.kernel.org/pub/scm/network/iproute2/iproute2-next 2021-02-23 23:03:42 -08:00
Stephen Hemminger bbddfcec6c v5.11.0 2021-02-23 09:34:11 -08:00
Andrea Claudi b2d44b9a95 lib/fs: Fix single return points for get_cgroup2_*
Functions get_cgroup2_id() and get_cgroup2_path() may call close() with
a negative argument.
Avoid that making the calls conditional on the file descriptors.

get_cgroup2_path() may also return NULL leaking a file descriptor.
Ensure this does not happen using a single return point.

Fixes: d5e6ee0dac ("ss: introduce cgroup2 cache and helper functions")
Fixes: 8f1cd119b3 ("lib: fix checking of returned file handle size for cgroup")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 18:20:44 -08:00
Andrea Claudi 1de363b180 lib/fs: avoid double call to mkdir on make_path()
make_path() function calls mkdir two times in a row. The first one it
stores mkdir return code, and then it calls it again to check for errno.

This seems unnecessary, as we can use the return code from the first
call and check for errno if not 0.

Fixes: ac3415f5c1 ("lib/fs: Fix and simplify make_path()")
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 18:20:44 -08:00
Andrea Claudi d4fcdbbec9 lib/bpf: Fix and simplify bpf_mnt_check_target()
As stated in commit ac3415f5c1 ("lib/fs: Fix and simplify make_path()"),
calling stat() before mkdir() is racey, because the entry might change in
between.

As the call to stat() seems to only check for target existence, we can
simply call mkdir() unconditionally and catch all errors but EEXIST.

Fixes: 95ae9a4870 ("bpf: fix mnt path when from env")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
2021-02-22 18:19:01 -08:00
Andrea Claudi 1e25de9a92 lib/namespace: fix ip -all netns return code
When ip -all netns {del,exec} are called and no netns is present, ip
exit with status 0. However this does not happen if no netns has been
created since boot time: in that case, indeed, the NETNS_RUN_DIR is not
present and netns_foreach() exit with code 1.

$ ls /var/run/netns
ls: cannot access '/var/run/netns': No such file or directory
$ ip -all netns exec ip link show
$ echo $?
1
$ ip -all netns del
$ echo $?
1
$ ip netns add test
$ ip netns del test
$ ip -all netns del
$ echo $?
0
$ ls -a /var/run/netns
.  ..

This leaves us in the unpleasant situation where the same command, when
no netns is present, does the same stuff (in this case, nothing), but
exit with two different statuses.

Fix this treating ENOENT in a different way from other errors, similarly
to what we already do in ipnetns.c netns_identify_pid()

Fixes: e998e118dd ("lib: Exec func on each netns")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 18:17:56 -08:00
Andrea Claudi e833dbe140 ip: lwtunnel: seg6: bail out if table ids are invalid
When table and vrftable are used in SRv6, ip should bail out if table
ids are not valid, and return a proper error message to the user.

Achieve this simply checking rtnl_rttable_a2n return value, as we
already do in the rest of iproute.

Fixes: 0486388a87 ("add support for table name in SRv6 End.DT* behaviors")
Fixes: 69629b4e43 ("seg6: add support for vrftable attribute in SRv6 End.DT4/DT6 behaviors")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 18:11:48 -08:00
Andrea Claudi 546f738220 tc: m_gate: use SPRINT_BUF when needed
sprint_time64() uses SPRINT_BSIZE-1 as a constant buffer lenght in its
implementation, however m_gate uses shorter buffers when calling it.

Fix this using SPRINT_BUF macro to get the buffer, thus getting a
SPRINT_BSIZE-long buffer.

Fixes: 07d5ee70b5 ("iproute2-next:tc:action: add a gate control action")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 18:11:03 -08:00
Vladimir Oltean e1d79d49ed man8/bridge.8: be explicit that "flood" is an egress setting
Talking to varios people, it became apparent that there is a certain
ambiguity in the description of these flags. They refer to egress
flooding, which should perhaps be stated more clearly.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 11:19:38 -08:00
Vladimir Oltean 14f528a556 man8/bridge.8: explain self vs master for "bridge fdb add"
The "usually hardware" and "usually software" distinctions make no
sense, try to clarify what these do based on the actual kernel behavior.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 11:19:38 -08:00
Vladimir Oltean b64ceb687d man8/bridge.8: fix which one of self/master is default for "bridge fdb"
The bridge program does:

fdb_modify:
	/* Assume self */
	if (!(req.ndm.ndm_flags&(NTF_SELF|NTF_MASTER)))
		req.ndm.ndm_flags |= NTF_SELF;

which is clearly against the documented behavior. The only thing we can
do, sadly, is update the documentation.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 11:19:38 -08:00
Vladimir Oltean 10130bfafe man8/bridge.8: explain what a local FDB entry is
Explaining the "local" flag by saying that it is "a local permanent fdb
entry" is not very helpful, be more specific.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 11:19:38 -08:00
Vladimir Oltean ae3cb3d34d man8/bridge.8: document that "local" is default for "bridge fdb add"
The bridge does this:

fdb_modify:
	/* Assume permanent */
	if (!(req.ndm.ndm_state&(NUD_PERMANENT|NUD_REACHABLE)))
		req.ndm.ndm_state |= NUD_PERMANENT;

So let's make the user aware of the fact that if they don't want local
entries, they need to specify some other flag like "static".

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 11:19:38 -08:00
Vladimir Oltean 1261459c64 man8/bridge.8: document the "permanent" flag for "bridge fdb add"
The bridge program parses "local" and "permanent" in just the same way,
so it makes sense to tell that to users:

fdb_modify:
		} else if (matches(*argv, "local") == 0 ||
			   matches(*argv, "permanent") == 0) {
			req.ndm.ndm_state |= NUD_PERMANENT;

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 11:19:38 -08:00
Ido Kalir 675e2df632 rdma: Fix statistics bind/unbing argument handling
The dump isn't supported for the statistics bind/unbind commands
because they operate on specific QP counters. This is different
from query commands that can operate on many objects at the same
time.

Let's check the user input and ensure that arguments are valid.

Fixes: a6d0773ebe ("rdma: Add stat manual mode support")
Signed-off-by: Ido Kalir <idok@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 10:52:39 -08:00
Thayne McCombs c7897ec2a6 ss: Make leading ":" always optional for sport and dport
The sport and dport conditions in expressions were inconsistent on
whether there should be a ":" at the beginning of the port when only a
port was provided depending on the family. The link and netlink
families required a ":" to work. The vsock family required the ":"
to be absent. The inet and inet6 families work with or without a leading
":".

This makes the leading ":" optional in all cases, so if sport or dport
are used, then it works with a leading ":" or without one, as inet and
inet6 did.

Signed-off-by: Thayne McCombs <astrothayne@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-14 22:09:37 -07:00
Amit Cohen 33e2471e8f ip route: Print "rt_offload_failed" indication
The kernel signals when offload fails using the 'RTM_F_OFFLOAD_FAILED'
flag. Print it to help users understand the offload state of the route.
The "rt_" prefix is used in order to distinguish it from the offload state
of nexthops, similar to "rt_offload" and "rt_trap".

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-13 17:50:15 -07:00
David Ahern 34de4b26bf Update kernel headers
Update kernel headers to commit:
    c4762993129f ("Merge branch 'skbuff-introduce-skbuff_heads-bulking-and-reusing'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-13 17:48:05 -07:00
Oleksandr Mazur c946f5d3e4 devlink: add support for port params get/set
Add implementation for the port parameters
getting/setting.
Add bash completion for port param.
Add man description for port param.

Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-11 09:21:24 -07:00
David Ahern 143610383d Merge branch 'vdpa' into next
Parav Pandit  says:

====================

Linux vdpa interface allows vdpa device management functionality.
This includes adding, removing, querying vdpa devices.

vdpa interface also includes showing supported management devices
which support such operations.

This patchset includes kernel uapi headers and a vdpa tool.

examples:

$ vdpa mgmtdev show
vdpasim:
  supported_classes net

$ vdpa mgmtdev show -jp
{
    "show": {
        "vdpasim": {
            "supported_classes": [ "net" ]
        }
    }
}

Create a vdpa device of type networking named as "foo2" from
the management device vdpasim_net:

$ vdpa dev add mgmtdev vdpasim_net name foo2

Show the newly created vdpa device by its name:
$ vdpa dev show foo2
foo2: type network mgmtdev vdpasim_net vendor_id 0 max_vqs 2 max_vq_size 25=
6

$ vdpa dev show foo2 -jp
{
    "dev": {
        "foo2": {
            "type": "network",
            "mgmtdev": "vdpasim_net",
            "vendor_id": 0,
            "max_vqs": 2,
            "max_vq_size": 256
        }
    }
}

Delete the vdpa device after its use:
$ vdpa dev del foo2

An example of PCI PF, VF and SF management device:
pci/0000:03.00:0
  supported_classes
    net
pci/0000:03.00:4
  supported_classes
    net
auxiliary/mlx5_core.sf.8
  supported_classes
    net

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-11 09:16:49 -07:00
Parav Pandit c2ecc82b9d vdpa: Add vdpa tool
vdpa tool is created to create, delete and query vdpa devices.
examples:
Show vdpa management device that supports creating, deleting vdpa devices.

$ vdpa mgmtdev show
vdpasim:
  supported_classes net

$ vdpa mgmtdev show -jp
{
    "show": {
        "vdpasim": {
            "supported_classes": [ "net" ]
        }
    }
}

Create a vdpa device of type networking named as "foo2" from
the management device vdpasim_net:

$ vdpa dev add mgmtdev vdpasim_net name foo2

Show the newly created vdpa device by its name:
$ vdpa dev show foo2
foo2: type network mgmtdev vdpasim_net vendor_id 0 max_vqs 2 max_vq_size 256

$ vdpa dev show foo2 -jp
{
    "dev": {
        "foo2": {
            "type": "network",
            "mgmtdev": "vdpasim_net",
            "vendor_id": 0,
            "max_vqs": 2,
            "max_vq_size": 256
        }
    }
}

Delete the vdpa device after its use:
$ vdpa dev del foo2

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-11 09:09:15 -07:00
Parav Pandit 6c76994982 utils: Add helper to map string to unsigned int
In subsequent patch need to map a string to a unsigned int.
Hence, add an API to map a string to unsigned int.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-11 09:09:10 -07:00
Parav Pandit b822275ad8 utils: Add generic socket helpers
Subsequent patch needs to
(a) query and use socket family
(b) send/receive messages using this family

Hence add helper routines to open, close, query family and to perform
send receive operations.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-11 09:09:07 -07:00
Parav Pandit bd3709c3a7 utils: Add helper routines for indent handling
Subsequent patch needs to use 2 char indentation for nested objects.
Hence introduce a generic helpers to allocate, deallocate, increment,
decrement and to print indent block.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-11 09:08:13 -07:00
Parav Pandit 5a6bf92a95 Add kernel headers
Add kernel headers to commit from kernel tree [1].
  6acba4951632 ("vdpa_sim_net: Add support for user supported devices")

[1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-11 09:07:47 -07:00
Paul Blakey 049708a002 tc: flower: Add support for ct_state reply flag
Matches on conntrack rpl ct_state.

Example:
$ tc filter add dev ens1f0_0 ingress prio 1 chain 1 proto ip flower \
  ct_state +trk+est+rpl \
  action mirred egress redirect dev ens1f0_1
$ tc filter add dev ens1f0_1 ingress prio 1 chain 1 proto ip flower \
  ct_state +trk+est-rpl \
  action mirred egress redirect dev ens1f0_0

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-04 21:54:28 -07:00
Maxim Mikityanskiy b8b8b6d4c9 tc/htb: Hierarchical QoS hardware offload
This commit adds support for configuring HTB in offload mode. HTB
offload eliminates the single qdisc lock in the datapath and offloads
the algorithm to the NIC. The new 'offload' parameter is added to
enable this mode:

    # tc qdisc replace dev eth0 root handle 1: htb offload

Classes are created as usual, but filters should be moved to clsact for
lock-free classification (filters attached to HTB itself are not
supported in the offload mode):

    # tc filter add dev eth0 egress protocol ip flower dst_port 80
    action skbedit priority 1:10

tc qdisc show and tc class show will indicate whether the offload is
enabled. Example output:

$ tc qdisc show dev eth1
qdisc htb 1: root offloaded r2q 10 default 0 direct_packets_stat 0 direct_qlen 1000 offload
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
$ tc class show dev eth1
class htb 1:101 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:1 root rate 100Gbit ceil 100Gbit burst 0b cburst 0b  offload
class htb 1:103 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:102 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:105 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:104 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:107 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:106 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:108 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
$ tc -j qdisc show dev eth1
[{"kind":"htb","handle":"1:","root":true,"offloaded":true,"options":{"r2q":10,"default":"0","direct_packets_stat":0,"direct_qlen":1000,"offload":null}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}}]

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-04 21:54:13 -07:00
Thayne McCombs b7e5002456 ss: always prefer family as part of host condition to default family
ss accepts an address family both with the -f option and as part of a
host condition. However, if the family in the host condition is
different than the the last -f option, then which family is actually
used depends on the order that different families are checked.

This changes parse_hostcond to check all family prefixes before parsing
the rest of the address, so that the host condition's family always has
a higher priority than the "preferred" family.

Signed-off-by: Thayne McCombs <astrothayne@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-04 21:48:16 -07:00
Stephen Hemminger 2741208502 uapi: pick up rpl.h fix
Upstream change to fix byte order issues.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-03 08:16:16 -08:00
Luca Boccassi 5a37254b71 iproute: force rtm_dst_len to 32/128
Since NETLINK_GET_STRICT_CHK was enabled, the kernel rejects commands
that pass a prefix length, eg:

 ip route get `1.0.0.0/1
  Error: ipv4: Invalid values in header for route get request.
 ip route get 0.0.0.0/0
  Error: ipv4: rtm_src_len and rtm_dst_len must be 32 for IPv4

Since there's no point in setting a rtm_dst_len that we know is going
to be rejected, just force it to the right value if it's passed on
the command line. Print a warning to stderr to notify users.

Bug-Debian: https://bugs.debian.org/944730
Reported-By: Clément 'wxcafé' Hertling <wxcafe@wxcafe.net>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-02 14:32:47 -08:00
Thayne McCombs 38957a2f6c ss: Add clarification about host conditions with multiple familes to man
In creating documentation for expressions I ran into an interesting case
where if you use two different familie types in the expression, such as
in `ss 'sport inet:ssh or src unix:/run/*'`, then you would only get the
results for one address family (in this case unix sockets).

The reason is that in parse_hostcond if the family is specified we
remove any previously added families from filter->families, and
preserve the "states" if any states are set. I tried changing this to
not reset the families, but ran into some issues with Invalid Argument
errors in inet_show_netlink, I think related to the state.

I can dig into that more if supporting this is useful, but I'm not sure
if these types of expressions would actually be useful in practice. Or
perhaps an error should be given if an expression contains conditions
with multiple families (besides inet and inet6)?

Anyway, for now, this patch just notes the limitation in the man page.

Signed-off-by: Thayne McCombs <astrothayne@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-02 14:30:40 -08:00
Thayne McCombs df361a27c2 Add documentation of ss filter to man page
This adds some documentation of the syntax for the FILTER argument to
the ss command to the ss (8) man page.

Signed-off-by: Thayne McCombs <astrothayne@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-02 14:24:03 -08:00
Edwin Peer 9764761888 iplink: print warning for missing VF data
The kernel might truncate VF info in IFLA_VFINFO_LIST. Compare the
expected number of VFs in IFLA_NUM_VF to how many were found in the
list and warn accordingly.

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-02 14:18:42 -08:00
Paolo Abeni 3d6d9e6e67 ss: do not emit warn while dumping MPTCP on old kernels
Prior to this commit, running 'ss' on a kernel older than v5.9
bumps an error message:

RTNETLINK answers: Invalid argument

When asked to dump protocol number > 255 - that is: MPTCP - 'ss'
adds an INET_DIAG_REQ_PROTOCOL attribute, unsupported by the older
kernel.

Avoid the warning ignoring filter issues when INET_DIAG_REQ_PROTOCOL
is used.

Additionally older kernel end-up invoking tcpdiag_send(), which
in turn will try to dump DCCP socks. Bail early in such function,
as the kernel does not implement an MPTCPDIAG_GET request.

Reported-by: "Rantala, Tommi T. (Nokia - FI/Espoo)" <tommi.t.rantala@nokia.com>
Fixes: 9c3be2c0ee ("ss: mptcp: add msk diag interface support")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-02 14:17:14 -08:00
Vladimir Oltean 4712a46174 man: tc-taprio.8: document the full offload feature
Since this feature's introduction in commit 9c66d1564676 ("taprio: Add
support for hardware offloading") from kernel v5.4, it never got
documented in the man pages. Due to this reason, we see customer reports
of seemingly contradictory information: the community manpages claim
there is no support for full offload, nonetheless many silicon vendors
have already implemented it.

This patch documents the full offload feature (enabled by specifying
"flags 2" to the taprio qdisc) and gives one more example that tries to
illustrate some of the finer points related to the usage.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-02 14:12:27 -08:00
Guillaume Nault 86d9660dc1 iplink_bareudp: cleanup help message and man page
* Fix PROTO description in help message (mpls isn't a valid argument).

 * Remove SRCPORTMIN description from help message since it doesn't
   appear in the syntax string.

 * Use same keywords in help message and in man page.

 * Use the "ethertype" option name (.B ethertype) rather than the
   option value (.I ETHERTYPE) in the man page description of
   [no]multiproto.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-02 14:11:32 -08:00
David Ahern d10f2a4bd8 Merge branch 'devlink-port-mgmt' into next
Parav Pandit  says:

====================

This patchset implements devlink port add, delete and function state
management commands.

An example sequence for a PCI SF:

Set the device in switchdev mode:
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

View ports in switchdev mode:
$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 s=
plittable false

Add a subfunction port for PCI PF 0 with sfnumber 88:
$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:08:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfn=
um 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

Show a newly added port:
$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf contro=
ller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

Set the function state to active:
$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:8=
8 state active

Show the port in JSON format:
$ devlink port show pci/0000:06:00.0/32768 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "active",
                "opstate": "attached"
            }
        }
    }
}

Set the function state to active:
$ devlink port function set pci/0000:06:00.0/32768 state inactive

Delete the port after use:
$ devlink port del pci/0000:06:00.0/32768

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-02 02:45:49 +00:00
Parav Pandit bdfb9f1bd6 devlink: Support set of port function state
Support set operation of the devlink port function state.

Example of a PCI SF port function which supports the state:

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:08:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88 state active

$ devlink port show pci/0000:06:00.0/32768 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "active",
                "opstate": "attached"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-02 02:06:48 +00:00
Parav Pandit 249465d3bf devlink: Support get port function state
Print port function state and operational state whenever reported by
kernel.

Example of a PCI SF port function which supports the state:

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:08:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88

$ devlink port show pci/0000:06:00.0/32768 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "inactive",
                "opstate": "detached"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-02 02:06:41 +00:00
Parav Pandit 331bf89ad0 devlink: Supporting add and delete of devlink port
Enable user to add and delete the devlink port.

Examples for adding and deleting one SF port:

Examples of add, show and delete commands:
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

Add devlink port of flavour 'pcipf' for PF number 0 SF number 88:

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

Delete newly added devlink port
$ devlink port del pci/0000:06:00.0/32768

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-02 02:06:36 +00:00
Parav Pandit 836a1365b7 devlink: Introduce PCI SF port flavour and attribute
Introduce PCI SF port flavour and port attributes such as PF
number and SF number.

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:08:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88 state active

$ devlink port show pci/0000:06:00.0/32768 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "active",
                "opstate": "attached"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-02 02:06:30 +00:00
Parav Pandit a9642c5fa6 devlink: Introduce and use string to number mapper
Instead of using static mapping in code, introduce a helper routine to
map a value to string.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-02 02:01:53 +00:00
David Ahern 1e61902180 Update kernel headers
Update kernel headers to commit:
    14e8e0f60088 ("tcp: shrink inet_connection_sock icsk_mtup enabled and probe_size")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-02 01:58:51 +00:00
Oliver Hartkopp 2ce313d1bb iplink_can: add Classical CAN frame LEN8_DLC support
The len8_dlc element is filled by the CAN interface driver and used for CAN
frame creation by the CAN driver when the CAN_CTRLMODE_CC_LEN8_DLC flag is
supported by the driver and enabled via netlink configuration interface.

Add the command line support for cc-len8-dlc for Linux 5.11+

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-29 15:49:23 +00:00
Jarod Wilson 7887500008 bond: support xmit_hash_policy=vlan+srcmac
There's a new transmit hash policy being added to the bonding driver that
is a simple XOR of vlan ID and source MAC, xmit_hash_policy vlan+srcmac.
This trivial patch makes it configurable and queryable via iproute2.

$ sudo modprobe bonding mode=2 max_bonds=1 xmit_hash_policy=0

$ sudo ip link set bond0 type bond xmit_hash_policy vlan+srcmac

$ ip -d link show bond0
11: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ce:85:5e:24:ce:90 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
    bond mode balance-xor miimon 0 updelay 0 downdelay 0 peer_notify_delay 0 use_carrier 1 arp_interval 0 arp_validate none arp_all_targets any
primary_reselect always fail_over_mac none xmit_hash_policy vlan+srcmac resend_igmp 1 num_grat_arp 1 all_slaves_active 0 min_links 0 lp_interval 1
packets_per_slave 1 lacp_rate slow ad_select stable tlb_dynamic_lb 1 addrgenmode eui64 numtxqueues 16 numrxqueues 16 gso_max_size 65536 gso_max_segs
65535

$ grep Hash /proc/net/bonding/bond0
Transmit Hash Policy: vlan+srcmac (5)

$ sudo ip link add test type bond help
Usage: ... bond [ mode BONDMODE ] [ active_slave SLAVE_DEV ]
                [ clear_active_slave ] [ miimon MIIMON ]
                [ updelay UPDELAY ] [ downdelay DOWNDELAY ]
                [ peer_notify_delay DELAY ]
                [ use_carrier USE_CARRIER ]
                [ arp_interval ARP_INTERVAL ]
                [ arp_validate ARP_VALIDATE ]
                [ arp_all_targets ARP_ALL_TARGETS ]
                [ arp_ip_target [ ARP_IP_TARGET, ... ] ]
                [ primary SLAVE_DEV ]
                [ primary_reselect PRIMARY_RESELECT ]
                [ fail_over_mac FAIL_OVER_MAC ]
                [ xmit_hash_policy XMIT_HASH_POLICY ]
                [ resend_igmp RESEND_IGMP ]
                [ num_grat_arp|num_unsol_na NUM_GRAT_ARP|NUM_UNSOL_NA ]
                [ all_slaves_active ALL_SLAVES_ACTIVE ]
                [ min_links MIN_LINKS ]
                [ lp_interval LP_INTERVAL ]
                [ packets_per_slave PACKETS_PER_SLAVE ]
                [ tlb_dynamic_lb TLB_DYNAMIC_LB ]
                [ lacp_rate LACP_RATE ]
                [ ad_select AD_SELECT ]
                [ ad_user_port_key PORTKEY ]
                [ ad_actor_sys_prio SYSPRIO ]
                [ ad_actor_system LLADDR ]

BONDMODE := balance-rr|active-backup|balance-xor|broadcast|802.3ad|balance-tlb|balance-alb
ARP_VALIDATE := none|active|backup|all
ARP_ALL_TARGETS := any|all
PRIMARY_RESELECT := always|better|failure
FAIL_OVER_MAC := none|active|follow
XMIT_HASH_POLICY := layer2|layer2+3|layer3+4|encap2+3|encap3+4|vlan+srcmac
LACP_RATE := slow|fast
AD_SELECT := stable|bandwidth|count

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-23 18:33:15 +00:00
wenxu c94fd71b34 tc: flower: add tc conntrack inv ct_state support
Matches on conntrack inv ct_state.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-23 18:16:35 +00:00
David Ahern c81a173f6b Update kernel headers
Update kernel headers to commit:
    59a49d9617e2 ("Merge branch 'mlxsw-expose-number-of-physical-ports'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-23 18:15:57 +00:00
Luca Boccassi 8498ca92d7 vrf: fix ip vrf exec with libbpf
The size of bpf_insn is passed to bpf_load_program instead of the number
of elements as it expects, so ip vrf exec fails with:

$ sudo ip link add vrf-blue type vrf table 10
$ sudo ip link set dev vrf-blue up
$ sudo ip/ip vrf exec vrf-blue ls
Failed to load BPF prog: 'Invalid argument'
last insn is not an exit or jmp
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
Kernel compiled with CGROUP_BPF enabled?

https://bugs.debian.org/980046

Reported-by: Emmanuel DECAEN <Emmanuel.Decaen@xsalto.com>

Signed-off-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-18 12:32:17 -08:00
Luca Boccassi 8dca565b17 vrf: print BPF log buffer if bpf_program_load fails
Necessary to understand what is going on when bpf_program_load fails

Signed-off-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-18 12:32:11 -08:00
Roi Dayan 1a22ad2721 build: Fix link errors on some systems
Since moving get_rate() and get_size() from tc to lib, on some
systems we fail to link because of missing math lib.
Move the functions that require math lib to their own c file
and add -lm to dcb that now use those functions.

../lib/libutil.a(utils.o): In function `get_rate':
utils.c:(.text+0x10dc): undefined reference to `floor'
../lib/libutil.a(utils.o): In function `get_size':
utils.c:(.text+0x1394): undefined reference to `floor'
../lib/libutil.a(json_print.o): In function `sprint_size':
json_print.c:(.text+0x14c0): undefined reference to `rint'
json_print.c:(.text+0x14f4): undefined reference to `rint'
json_print.c:(.text+0x157c): undefined reference to `rint'

Fixes: f3be0e6366 ("lib: Move get_rate(), get_rate64() from tc here")
Fixes: 44396bdfcc ("lib: Move get_size() from tc here")
Fixes: adbe5de966 ("lib: Move sprint_size() from tc here, add print_size()")

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Tested-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-18 12:28:47 -08:00
David Ahern b553cffa9f Merge branch 'dcb-app-dcbx' into next
Petr Machata  says:

====================

Add support to the dcb tool for the following two DCB objects:

- APP, which allows configuration of traffic prioritization rules based on
  several possible packet headers.

- DCBX, which is a 1-byte bitfield of flags that configure whether the DCBX
  protocol is implemented in the device or in the host, and which version
  of the protocol should be used.

Patch #1 adds a new helper for finding a name of a given dsfield value.
This is useful for APP DSCP-to-priority rules, which can use human-readable
DSCP names.

Patches #2, #3 and #4 extend existing interfaces for, respectively, parsing
of the X:Y mappings, for setting a DCB object, and for getting a DCB
object.

In patch #5, support for the command line argument -N / --Numeric is
added. The APP tool later uses it to decide whether to format DSCP values
as human-readable strings or as plain numbers.

Patches #6 and #7 add the subtools themselves and their man pages.

v2:
- Two patches dropped and sent to iproute2 branch as "dcb: Fixes".
  This patch set now depends on that one.
- Patch #5:
    - Make it -N / --Numeric instead of -n / --no-nice-names
    - Rename the flag from no_nice_names to numeric as well
- Patch #6:
    - Adjust to s/no_nice_names/numeric/ from another patch.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:10:27 +00:00
Petr Machata 89d11ea596 dcb: Add a subtool for the DCBX object
The Linux DCBX object is a 1-byte bitfield of flags that configure whether
the DCBX protocol is implemented in the device or in the host, and which
version of the protocol should be used. Add a tool to access the per-port
Linux DCBX object.

For example:

	# dcb dcbx set dev eni1np1 host ieee
	# dcb dcbx show dev eni1np1
	host ieee

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:09:29 +00:00
Petr Machata 8e9bed1493 dcb: Add a subtool for the DCB APP object
DCB APP interfaces are standardized in 802.1q-2018, and allow configuration
of traffic prioritization rules based on several possible headers.

Add a dcb subtool for maintenance and display of the APP table. For
example:

    # dcb app add dev eni1np1 dscp-prio 0:0 CS3:3 CS6:6
    # dcb app show dev eni1np1
    dscp-prio 0:0 CS3:3 CS6:6
    # dcb app add dev eni1np1 dscp-prio CS3:4
    # dcb app show dev eni1np1
    dscp-prio 0:0 CS3:3 CS3:4 CS6:6
    # dcb app replace dev eni1np1 dscp-prio CS3:5
    # dcb app show dev eni1np1
    dscp-prio 0:0 CS3:5 CS6:6

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:09:29 +00:00
Petr Machata 0aebd32b82 dcb: Support -N to suppress translation to human-readable names
Some DSCP values can be translated to symbolic names. That may not be
always desirable. Introduce a command-line option similar to other tools,
-N or --Numeric, to suppress this translation.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:09:29 +00:00
Petr Machata e59876ff55 dcb: Generalize dcb_get_attribute()
The function dcb_get_attribute() assumes that the caller knows the exact
size of the looked-for payload. It also assumes that the response comes
wrapped in an DCB_ATTR_IEEE nest. The former assumption does not hold for
the IEEE APP table, which has variable size. The latter one does not hold
for DCBX, which is not IEEE-nested, and also for any CEE attributes, which
would come CEE-nested.

Factor out the payload extractor from the current dcb_get_attribute() code,
and put into a helper. Then rewrite dcb_get_attribute() compatibly in terms
of the new function. Introduce dcb_get_attribute_va() as a thin wrapper for
IEEE-nested access, and dcb_get_attribute_bare() for access to attributes
that are not nested.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:09:29 +00:00
Petr Machata 69290c32dc dcb: Generalize dcb_set_attribute()
The function dcb_set_attribute() takes a fully-formed payload as an
argument. For callers that need to build a nested attribute, such as is the
case for DCB APP table, this is not great, because with libmnl, they would
need to construct a separate netlink message just to pluck out the payload
and hand it over to this function.

Currently, dcb_set_attribute() also always wraps the payload in an
DCB_ATTR_IEEE container, because that is what all the dcb subtools so far
needed. But that is not appropriate for DCBX in particular, and in fact a
handful other attributes, as well as any CEE payloads.

Instead, generalize this code by adding parameters for constructing a
custom payload and for fetching the response from a custom response
attribute. Then add dcb_set_attribute_va(), which takes a callback to
invoke in the right place for the nest to be built, and
dcb_set_attribute_bare(), which is similar to dcb_set_attribute(), but does
not encapsulate the payload in an IEEE container. Rewrite
dcb_set_attribute() compatibly in terms of the new functions.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:09:29 +00:00
Petr Machata c13216f7a6 lib: Generalize parse_mapping()
The function parse_mapping() assumes the key is a number, with a single
configurable exception, which is using "all" to mean "all possible keys".
If a caller wishes to use symbolic names instead of numbers, they cannot
reuse this function.

To facilitate reuse in these situations, convert parse_mapping() into a
helper, parse_mapping_gen(), which instead of an allow-all boolean takes a
generic key-parsing callback. Rewrite parse_mapping() in terms of this
newly-added helper and add a pair of key parsers, one for just numbers,
another for numbers and the keyword "all". Publish the latter as well.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:09:29 +00:00
Petr Machata bf244ee677 lib: rt_names: Add rtnl_dsfield_get_name()
For formatting DSCP (not full dsfield), it would be handy to be able to
just get the name from the name table, and not get any of the remaining
cruft related to formatting. Add a new entry point to just fetch the
name table string uninterpreted. Use it from rtnl_dsfield_n2a().

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:09:29 +00:00
David Ahern fa2881b664 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 03:57:29 +00:00
Guillaume Nault 676a1a708f tc: flower: fix json output with mpls lse
The json output of the TCA_FLOWER_KEY_MPLS_OPTS attribute was invalid.

Example:

  $ tc filter add dev eth0 ingress protocol mpls_uc flower mpls \
      lse depth 1 label 100                                     \
      lse depth 2 label 200

  $ tc -json filter show dev eth0 ingress
    ...{"eth_type":"8847",
        "  mpls":["    lse":["depth":1,"label":100],
                  "    lse":["depth":2,"label":200]]}...

This is invalid as the arrays, introduced by "[", can't contain raw
string:value pairs. Those must be enclosed into "{}" to form valid json
ojects. Also, there are spurious whitespaces before the mpls and lse
strings because of the indentation used for normal output.

Fix this by putting all LSE parameters (depth, label, tc, bos and ttl)
into the same json object. The "mpls" key now directly contains a list
of such objects.

Also, handle strings differently for normal and json output, so that
json strings don't get spurious indentation whitespaces.

Normal output isn't modified.
The json output now looks like:

  $ tc -json filter show dev eth0 ingress
    ...{"eth_type":"8847",
        "mpls":[{"depth":1,"label":100},
                {"depth":2,"label":200}]}...

Fixes: eb09a15c12 ("tc: flower: support multiple MPLS LSE match")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-16 09:13:36 -08:00
Petr Machata 934919b991 dcb: Change --Netns/-N to --netns/-n
This to keep compatible with the major tools, ip and tc. Also
document the option in the man page, which was neglected.

Fixes: 67033d1c1c ("Add skeleton of a new tool, dcb")
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-16 09:12:15 -08:00
Petr Machata b4c0cad06e dcb: Plug a leaking DCB socket buffer
DCB socket buffer is allocated in dcb_init(), but never freed(). Free it
in dcb_fini().

Fixes: 67033d1c1c ("Add skeleton of a new tool, dcb")
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-16 09:12:15 -08:00
Petr Machata 2e99c28161 dcb: Set values with RTM_SETDCB type
dcb currently sends all netlink messages with a type RTM_GETDCB, even the
set ones. Change to the appropriate type.

Fixes: 67033d1c1c ("Add skeleton of a new tool, dcb")
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-16 09:12:15 -08:00
Stephen Hemminger 8b4b132261 uapi: update if_link.h from upstream
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-16 09:09:35 -08:00
Petr Machata ffe58c9185 include: uapi: Carry dcbnl.h
To allow building a new suite of DCB tools on an older kernel, carry a copy
of dcbnl.h.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-16 09:09:28 -08:00
Patrisious Haddad 537995c6d5 rdma: Add support for the netlink extack
Add support in rdma for extack errors to be received
in userspace when sent from kernel, so now netlink extack
error messages sent from kernel would be printed for the
user.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-10 17:18:42 +00:00
Ido Schimmel 9bd498bfcd ipmonitor: Mention "nexthop" object in help and man page
Before:

 # ip monitor help
 Usage: ip monitor [ all | LISTofOBJECTS ] [ FILE ] [ label ] [all-nsid] [dev DEVICE]
 LISTofOBJECTS := link | address | route | mroute | prefix |
                  neigh | netconf | rule | nsid
 FILE := file FILENAME

After:

 # ip monitor help
 Usage: ip monitor [ all | LISTofOBJECTS ] [ FILE ] [ label ] [all-nsid] [dev DEVICE]
 LISTofOBJECTS := link | address | route | mroute | prefix |
                  neigh | netconf | rule | nsid | nexthop
 FILE := file FILENAME

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-10 17:17:32 +00:00
Ido Schimmel 043e03a369 nexthop: Fix usage output
Before:

 # ip nexthop help
 Usage: ip nexthop { list | flush } [ protocol ID ] SELECTOR
        ip nexthop { add | replace } id ID NH [ protocol ID ]
        ip nexthop { get| del } id ID
 SELECTOR := [ id ID ] [ dev DEV ] [ vrf NAME ] [ master DEV ]
             [ groups ] [ fdb ]
 NH := { blackhole | [ via ADDRESS ] [ dev DEV ] [ onlink ]
       [ encap ENCAPTYPE ENCAPHDR ] | group GROUP ] }
 GROUP := [ id[,weight]>/<id[,weight]>/... ]
 ENCAPTYPE := [ mpls ]
 ENCAPHDR := [ MPLSLABEL ]

After:

 # ip nexthop help
 Usage: ip nexthop { list | flush } [ protocol ID ] SELECTOR
        ip nexthop { add | replace } id ID NH [ protocol ID ]
        ip nexthop { get | del } id ID
 SELECTOR := [ id ID ] [ dev DEV ] [ vrf NAME ] [ master DEV ]
             [ groups ] [ fdb ]
 NH := { blackhole | [ via ADDRESS ] [ dev DEV ] [ onlink ]
         [ encap ENCAPTYPE ENCAPHDR ] | group GROUP [ fdb ] }
 GROUP := [ <id[,weight]>/<id[,weight]>/... ]
 ENCAPTYPE := [ mpls ]
 ENCAPHDR := [ MPLSLABEL ]

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-10 17:14:08 +00:00
Stephen Hemminger 2953235e61 uapi: update kernel headers to 5.11 pre rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-24 19:38:35 -08:00
Stephen Hemminger 2639bdc176 Merge git://git.kernel.org/pub/scm/network/iproute2/iproute2-next into main 2020-12-24 19:29:15 -08:00
Stephen Hemminger c9c64b8d1e 5.10.0 2020-12-21 10:28:53 -08:00
Guillaume Nault cb0debfe2d testsuite: Add mpls packet matching tests for tc flower
Match all MPLS fields using smallest and highest possible values.
Test the two ways of specifying MPLS header matching:

  * with the basic mpls_{label,tc,bos,ttl} keywords (match only on the
    first LSE),

  * with the more generic "lse" keyword (allows matching at different
    depth of the MPLS label stack).

This test file allows to find problems like the one fixed by
Linux commit 7fdd375e3830 ("net: sched: Fix dump of MPLS_OPT_LSE_LABEL
attribute in cls_flower").

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-16 04:14:26 +00:00
David Ahern c01dec8475 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-16 04:06:06 +00:00
Thomas Karlsson 42f5642a40 iplink:macvlan: Added bcqueuelen parameter
This patch allows the user to set and retrieve the
IFLA_MACVLAN_BC_QUEUE_LEN parameter via the bcqueuelen
command line argument

This parameter controls the requested size of the queue for
broadcast and multicast packages in the macvlan driver.

If not specified, the driver default (1000) will be used.

Note: The request is per macvlan but the actually used queue
length per port is the maximum of any request to any macvlan
connected to the same port.

For this reason, the used queue length IFLA_MACVLAN_BC_QUEUE_LEN_USED
is also retrieved and displayed in order to aid in the understanding
of the setting. However, it can of course not be directly set.

Signed-off-by: Thomas Karlsson <thomas.karlsson@paneda.se>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-16 04:02:07 +00:00
Andrea Claudi c8faeca5ad ss: mptcp: fix add_addr_accepted stat print
add_addr_accepted value is not printed if add_addr_signal value is 0.
Fix this properly looking for add_addr_accepted value, instead.

Fixes: 9c3be2c0ee ("ss: mptcp: add msk diag interface support")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-15 13:59:13 -08:00
Andrea Claudi 0d78e8eabf tc: pedit: fix memory leak in print_pedit
keys_ex is dinamically allocated with calloc on line 770, but
is not freed in case of error at line 823.

Fixes: 081d6c310d ("tc: pedit: Support JSON dumping")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-14 09:24:08 -08:00
Andrea Claudi ec1346acbe devlink: fix memory leak in cmd_dev_flash()
nlg_ntf is dinamically allocated in mnlg_socket_open(), and is freed on
the out: return path. However, some error paths do not free it,
resulting in memory leak.

This commit fix this using mnlg_socket_close(), and reporting the
correct error number when required.

Fixes: 9b13cddfe2 ("devlink: implement flash status monitoring")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-14 09:23:24 -08:00
Andrea Claudi 309e6027e5 man: tc-flower: fix manpage
Commit 924c43778a ("man: tc-ct.8: Add manual page for ct tc action")
add man page for tc-ct, but it brings with it a bogus block of text
in the benning of tc-flower man page.

This commit simply removes it.

Fixes: 924c43778a ("man: tc-ct.8: Add manual page for ct tc action")
Reported-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-14 09:22:53 -08:00
David Ahern ee50fd58dc Merge branch 'dcb-pfc-buffer-maxrate' into next
Petr Machata  says:
====================

Add support to the dcb tool for the following three DCB objects:

- PFC, for "Priority-based Flow Control", allows configuration of priority
  lossiness, and related toggles.

- DCBNL buffer interfaces are an extension to the 802.1q DCB interfaces and
  allow configuration of port headroom buffers.

- DCBNL maxrate interfaces are an extension to the 802.1q DCB interfaces
  and allow configuration of rate with which traffic in a given traffic
  class is sent.

Patches #1-#4 fix small issues in the current DCB code and man pages.

Patch #5 adds new helpers to the DCB dispatcher.

Patches #6 and #7 add support for command line arguments -s and -i. These
enable, respectively, display of statistical counters, and ISO/IEC mode of
rate units.

Patches #8-#10 add the subtools themselves and their man pages.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:48:38 +00:00
Petr Machata 117939d9bd dcb: Add a subtool for the DCB maxrate object
DCBNL maxrate interfaces are an extension to the 802.1q DCB interfaces and
allow configuration of rate with which traffic in a given traffic class is
sent.

Add a dcb subtool to allow showing and tweaking of this per-TC maximum
rate. For example:

    # dcb maxrate show dev eni1np1
    tc-maxrate 0:25Gbit 1:25Gbit 2:25Gbit 3:25Gbit 4:25Gbit 5:25Gbit 6:100Gbit 7:25Gbit

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:42:07 +00:00
Petr Machata 2e36f91000 dcb: Add a subtool for the DCB buffer object
DCBNL buffer interfaces are an extension to the 802.1q DCB interfaces and
allow configuration of port headroom buffers.

Add a dcb subtool to allow showing and tweaking of buffer priority mapping
and buffer sizes. For example:

    # dcb buf show dev eni1np1
    prio-buffer 0:0 1:0 2:0 3:3 4:0 5:0 6:6 7:0
    buffer-size 0:10000 1:0 2:0 3:70000 4:0 5:0 6:10000 7:0
    total-size 221072

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:42:03 +00:00
Petr Machata 6567cb588b dcb: Add a subtool for the DCB PFC object
PFC, for "Priority-based Flow Control", allows configuration of priority
lossiness, and related toggles.

Add a dcb subtool to allow showing and tweaking of individual PFC
configuration options, and querying statistics. For example:

    # dcb pfc show dev eni1np1
    pfc-cap 8 macsec-bypass on delay 0
    pg-pfc 0:off 1:on 2:off 3:off 4:off 5:off 6:off 7:on
    requests 0:0 1:217 2:0 3:0 4:0 5:0 6:0 7:28
    indications 0:0 1:179 2:0 3:0 4:0 5:0 6:0 7:18

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:41:58 +00:00
Petr Machata 808dd741fc dcb: Add -i to enable IEC mode
Allow switching "dcb" into the ISO/IEC mode of units by passing -i.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:41:54 +00:00
Petr Machata 6e9687db04 dcb: Add -s to enable statistics
Allow selective display of statistical counters by passing -s.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:41:50 +00:00
Petr Machata 11a72186a0 dcb: Add dcb_set_u32(), dcb_set_u64()
The DCB buffer object has a settable array of 32-bit quantities, and the
maxrate object of 64-bit ones. Adjust dcb_parse_mapping() and related
helpers to support 64-bit values in mappings, and add appropriate helpers.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:41:45 +00:00
Petr Machata 7e94711c71 man: dcb-ets: Remove an unnecessary empty line
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:41:40 +00:00
Petr Machata a7c2eaac39 dcb: ets: Change the way show parameters are given in synopsis
None, one, or many parameters can be given on the command line, but
the current synopsis allows only none or one. Fix it.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:41:22 +00:00
Petr Machata 12d41d0184 dcb: ets: Fix help display for "show" subcommand
"dcb ets show dev X help" currently shows full "ets" help instead of just
help for the show command. Fix it.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:41:19 +00:00
Petr Machata 7fe954ee34 dcb: Remove unsupported command line arguments from getopt_long()
getopt_long() currently includes "c" and "n" in the short option string.
These probably slipped in as a cut'n'paste, and are not actually accepted.
Remove them.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:40:32 +00:00
Stephen Hemminger 376367d917 uapi: merge in change to bpf.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-14 08:07:06 -08:00
David Ahern 6e9bfdcdde Merge branch 'devlink-reload' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:43:41 +00:00
Moshe Shemesh c2d7c45c32 devlink: Add reload stats to dev show
Show reload statistics through devlink dev show using devlink stats
flag. The reload statistics show the history per reload action type and
limit. Add remote reload statistics to show the history of actions
performed due devlink reload commands initiated by remote host.

Output examples:
$ devlink dev show -s
pci/0000:82:00.0:
  stats:
      reload:
          driver_reinit:
            unspecified 2
          fw_activate:
            unspecified 1 no_reset 0
      remote_reload:
          driver_reinit:
            unspecified 0
          fw_activate:
            unspecified 0 no_reset 0
pci/0000:82:00.1:
  stats:
      reload:
          driver_reinit:
            unspecified 0
          fw_activate:
            unspecified 0 no_reset 0
      remote_reload:
          driver_reinit:
            unspecified 1
          fw_activate:
            unspecified 1 no_reset 0

$ devlink dev show -s -jp
{
    "dev": {
        "pci/0000:82:00.0": {
            "stats": {
                "reload": {
                    "driver_reinit": {
                        "unspecified": 2
                    },
                    "fw_activate": {
                        "unspecified": 1,
                        "no_reset": 0
                    }
                },
                "remote_reload": {
                    "driver_reinit": {
                        "unspecified": 0
                    },
                    "fw_activate": {
                        "unspecified": 0,
                        "no_reset": 0
                    }
                }
            }
        },
        "pci/0000:82:00.1": {
            "stats": {
                "reload": {
                    "driver_reinit": {
                        "unspecified": 0
                    },
                    "fw_activate": {
                        "unspecified": 0,
                        "no_reset": 0
                    }
                },
                "remote_reload": {
                    "driver_reinit": {
                        "unspecified": 1
                    },
                    "fw_activate": {
                        "unspecified": 1,
                        "no_reset": 0
                    }
                }
            }
        }
    }
}

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:42:15 +00:00
Moshe Shemesh 0c0023ad71 devlink: Add pr_out_dev() helper function
Add pr_out_dev() helper function and use it both by cmd_dev_show_cb()
and by cmd_mon_show_cb().

Dev stats will be added on the next patch to dev context, so
cmd_mon_show_cb() should print the whole dev context and not just dev
handle.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:42:09 +00:00
Moshe Shemesh f28c910274 devlink: Add devlink reload action and limit options
Add reload action and reload limit to devlink reload command to enable
the user to select the reload action required and constrains limits on
these actions that he may want to ensure.

The following reload actions are supported:
  driver_reinit: driver entities re-initialization, applying
                 devlink-param and devlink-resource values.
  fw_activate: firmware activate.

The uAPI is backward compatible, if the reload action option is omitted
from the reload command, the driver reinit action will be used.
Note that when required to do firmware activation some drivers may need
to reload the driver. On the other hand some drivers may need to reset
the firmware to reinitialize the driver entities. Therefore, the devlink
reload command returns the actions which were actually performed.

By default reload actions are not limited and driver implementation may
include reset or downtime as needed to perform the actions. However, if
reload limit is selected, the driver should perform only if it can do it
while keeping the limit constraints.

Reload limit added:
  no_reset: No reset allowed, no down time allowed, no link flap and no
            configuration is lost.

Command examples:
$devlink dev reload pci/0000:82:00.0 action driver_reinit
reload_actions_performed:
  driver_reinit

$devlink dev reload pci/0000:82:00.0 action fw_activate
reload_actions_performed:
  driver_reinit fw_activate

devlink dev reload pci/0000:82:00.1 action driver_reinit -jp
{
    "reload": {
        "reload_actions_performed": [ "driver_reinit" ]
    }
}

devlink dev reload pci/0000:82:00.0 action fw_activate -jp
{
    "reload": {
        "reload_actions_performed": [ "driver_reinit","fw_activate" ]
    }
}

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:40:00 +00:00
David Ahern 120cdeb1b7 Merge branch 'rate-size-parsing-output' into next
Petr Machata says:
==================

The DCB tool will have commands that deal with buffer sizes and traffic
rates. TC is another tool that has a number of such commands, and functions
to support them: get_size(), get_rate/64(), s/print_size() and
s/print_rate(). In this patchset, these functions are moved from TC to lib/
for possible reuse and modernized.

s/print_rate() has a hidden parameter of a global variable use_iec, which
made the conversion non-trivial. The parameter was made explicit,
print_rate() converted to a mostly json_print-like function, and
sprint_rate() retired in favor of the new print_rate. Patches #1 and #2
deal with this.

The intention was to treat s/print_size() similarly, but unfortunately two
use cases of sprint_size() cannot be converted to a json_print-like
print_size(), and the function sprint_size() had to remain as a discouraged
backdoor to print_size(). This is done in patch #3.

Patch #4 then improves the code of sprint_size() a little bit.

Patch #5 fixes a buglet in formatting small rates in IEC mode.

Patches #6 and #7 handle a routine movement of, respectively,
get_rate/64() and get_size() from tc to lib.

This patchset does not actually add any new uses of these functions. A
follow-up patchset will add subtools for management of DCB buffer and DCB
maxrate objects that will make use of them.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:32:17 +00:00
Petr Machata 44396bdfcc lib: Move get_size() from tc here
The function get_size() serves for parsing of sizes using a handly notation
that supports units and their prefixes, such as 10Kbit. This will be useful
for the DCB buffer size parsing. Move the function from TC to the general
library, so that it can be reused.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:30:50 +00:00
Petr Machata f3be0e6366 lib: Move get_rate(), get_rate64() from tc here
The functions get_rate() and get_rate64() are useful for parsing rate-like
values. The DCB tool will find these useful in the maxrate subtool.
Move them over to lib so that they can be easily reused.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:30:44 +00:00
Petr Machata aaeda2a768 lib: print_color_rate(): Fix formatting small rates in IEC mode
ISO/IEC units are distinguished from the decadic ones by using a prefixes
like "Ki", "Mi" instead of "K" and "M". The current code inserts the letter
"i" after the decadic unit when in IEC mode. However it does so even when
the prefix is an empty string, formatting 1Kbit in IEC mode as "1000ibit".
Fix by omitting the letter if there is no prefix.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:30:41 +00:00
Petr Machata a0a4b6618c lib: sprint_size(): Uncrustify the code a bit
Ideally this and the rate printing would both be converted to a common
helper, but unfortunately the two format differently and this would break
tests and scripts out there. So just make the code look less like a wad of
hay.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:30:36 +00:00
Petr Machata adbe5de966 lib: Move sprint_size() from tc here, add print_size()
When displaying sizes of various sorts, tc commonly uses the function
sprint_size() to format the size into a buffer as a human-readable string.
This string is then displayed either using print_string(), or in some code
even fprintf(). As a result, a typical sequence of code when formatting a
size is something like the following:

	SPRINT_BUF(b);
	print_uint(PRINT_JSON, "foo", NULL, foo);
	print_string(PRINT_FP, NULL, "foo %s ", sprint_size(foo, b));

For a concept as broadly useful as size, it would be better to have a
dedicated function in json_print.

To that end, move sprint_size() from tc_util to json_print. Add helpers
print_size() and print_color_size() that wrap arount sprint_size() and
provide the JSON dispatch as appropriate.

Since print_size() should be the preferred interface, convert vast majority
of uses of sprint_size() to print_size(). Two notable exceptions are:

- q_tbf, which does not show the size as such, but uses the string
  "$human_readable_size/$cell_size" even in JSON. There is simply no way to
  have print_size() emit the same text, because print_size() in JSON mode
  should of course just use the raw number, without human-readable frills.

- q_cake, which relies on the existence of sprint_size() in its macro-based
  formatting helpers. There might be ways to convert this particular case,
  but given q_tbf simply cannot be converted, leave it as is.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:30:25 +00:00
Petr Machata 60265cc226 lib: Move print_rate() from tc here; modernize
The functions print_rate() and sprint_rate() are useful for formatting
rate-like values. The DCB tool would find these useful in the maxrate
subtool. However, the current interface to these functions uses a global
variable use_iec as a flag indicating whether 1024- or 1000-based powers
should be used when formatting the rate value. For general use, a global
variable is not a great way of passing arguments to a function. Besides, it
is unlike most other printing functions in that it deals in buffers and
ignores JSON.

Therefore make the interface to print_rate() explicit by converting use_iec
to an ordinary parameter. Since the interface changes anyway, convert it to
follow the pattern of other json_print functions (except for the
now-explicit use_iec parameter). Move to json_print.c.

Add a wrapper to tc, so that all the call sites do not need to repeat the
use_iec global variable argument, and convert all call sites.

In q_cake.c, the conversion is not straightforward due to usage of a macro
that is shared across numerous data types. Simply hand-roll the
corresponding code, which seems better than making an extra helper for one
call site.

Drop sprint_rate() now that everybody just uses print_rate().

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:30:15 +00:00
Petr Machata cdd9425315 Move the use_iec declaration to the tools
The tools "ip" and "tc" use a flag "use_iec", which indicates whether, when
formatting rate values, the prefixes "K", "M", etc. should refer to powers
of 1024, or powers of 1000. The flag is currently kept as a global variable
in "ip" and "tc", but is nonetheless declared in util.h.

Instead, move the declaration to tool-specific headers ip/ip_common.h and
tc/tc_common.h.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:28:43 +00:00
Paolo Lungaroni 69629b4e43 seg6: add support for vrftable attribute in SRv6 End.DT4/DT6 behaviors
We introduce the "vrftable" attribute for supporting the SRv6 End.DT4 and
End.DT6 behaviors in iproute2.
The "vrftable" attribute indicates the routing table associated with
the VRF device used by SRv6 End.DT4/DT6 for routing IPv4/IPv6 packets.

The SRv6 End.DT4/DT6 is used to implement IPv4/IPv6 L3 VPNs based on Segment
Routing over IPv6 networks in multi-tenants environments.
It decapsulates the received packets and it performs the IPv4/IPv6 routing
lookup in the routing table of the tenant.

The SRv6 End.DT4/DT6 leverages a VRF device in order to force the routing
lookup into the associated routing table using the "vrftable" attribute.

Some examples:
 $ ip -6 route add 2001:db8::1 encap seg6local action End.DT4 vrftable 100 dev eth0
 $ ip -6 route add 2001:db8::2 encap seg6local action End.DT6 vrftable 200 dev eth0

Standard Output:
 $ ip -6 route show 2001:db8::1
 2001:db8::1  encap seg6local action End.DT4 vrftable 100 dev eth0 metric 1024 pref medium

JSON Output:
$ ip -6 -j -p route show 2001:db8::2
[ {
        "dst": "2001:db8::2",
        "encap": "seg6local",
        "action": "End.DT6",
        "vrftable": 200,
        "dev": "eth0",
        "metric": 1024,
        "flags": [ ],
        "pref": "medium"
} ]

v2:
 - no changes made: resubmit after pulling out this patch from the kernel
   patchset.

v1:
 - mixing this patch with the kernel patchset confused patckwork.

Signed-off-by: Paolo Lungaroni <paolo.lungaroni@cnit.it>
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:27:42 +00:00
David Ahern cfad32569f Update kernel headers
Update kernel headers to commit:
    afae3cc2da10 ("net: atheros: simplify the return expression of atl2_phy_setup_autoneg_adv()")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:25:34 +00:00
David Ahern 8065d28218 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-04 16:25:12 +00:00
David Ahern b3c4a55064 Only compile mnl_utils when HAVE_MNL is defined
New lib/mnl_utils.c fails to compile if libmnl is not installed:

  mnl_utils.c:9:10: fatal error: libmnl/libmnl.h: No such file or directory
      9 | #include <libmnl/libmnl.h>

Make it dependent on HAVE_MNL.

Fixes: 72858c7b77 ("lib: Extract from devlink/mnlg a helper, mnlu_socket_open()")
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-04 16:19:05 +00:00
Stephen Hemminger 2e80ae89ca Merge branch 'gcc-10' into main 2020-12-03 08:33:06 -08:00
Luca Boccassi 755b1c584e tc/mqprio: json-ify output
As reported by a Debian user, mqprio output in json mode is
invalid:

{
     "kind": "mqprio",
     "handle": "8021:",
     "dev": "enp1s0f0",
     "root": true,
     "options": { tc 2 map 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0
          queues:(0:3) (4:7)
          mode:channel
          shaper:dcb}
}

json-ify it, while trying to maintain the same formatting
for standard output.

New output:

{
    "kind": "mqprio",
    "handle": "8001:",
    "root": true,
    "options": {
        "tc": 2,
        "map": [ 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ],
        "queues": [ [ 0, 3 ], [ 4, 7 ] ],
        "mode": "channel",
        "shaper": "dcb"
    }
}

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=972784

Reported-by: Roméo GINON <romeo.ginon@ilexia.com>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-03 08:32:42 -08:00
Luca Boccassi 975c4944e8 ip/netns: use flock when setting up /run/netns
If multiple ip processes are ran at the same time to set up
separate network namespaces, and it is the first time so /run/netns
has to be set up first, and they end up doing it at the same time,
the processes might enter a recursive loop creating thousands of
mount points, which might crash the system depending on resources
available.

Try to take a flock on /run/netns before doing the mount() dance, to
ensure this cannot happen. But do not try too hard, and if it fails
continue after printing a warning, to avoid introducing regressions.

First reported on Debian: https://bugs.debian.org/949235

To reproduce (WARNING: run in a VM to avoid system lockups):

for i in {0..9}
do
        strace -e trace=mount -e inject=mount:delay_exit=1000000 ip \
 netns add "testnetns$i" 2>&1 | tee "$i.log" &
done
wait

The strace is to ensure the problem always reproduces, to add an
artificial synchronization point after the first mount().

Reported-by: Etienne Dechamps <etienne@edechamps.fr>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-03 08:31:23 -08:00
Vlad Buslov ea130da81e tc: implement support for action terse dump
Implement support for action terse dump using new TCA_ACT_FLAG_TERSE_DUMP
value of TCA_ROOT_FLAGS tlv. Set the flag when user requested it with
following example CLI (-br for 'brief'):

$ tc -s -br actions ls action tunnel_key
total acts 2

        action order 0: tunnel_key       index 1
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

        action order 1: tunnel_key       index 2
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

In terse mode dump only outputs essential data needed to identify the
action (kind, index) and stats, if requested by the user.

Signed-off-by: Vlad Buslov <vlad@buslov.dev>
Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-03 03:51:06 +00:00
Vlad Buslov 00fffb2d79 tc: use TCA_ACT_ prefix for action flags
Use TCA_ACT_FLAG_LARGE_DUMP_ON alias according to new preferred naming for
action flags.

Signed-off-by: Vlad Buslov <vlad@buslov.dev>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-03 03:49:14 +00:00
David Ahern 23683dec32 Update kernel headers
Update kernel headers to commit:
    cec85994c6b4 ("bareudp: constify device_type declaration")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-03 03:47:07 +00:00
Sergey Ryazanov d7190d4ced ip: add IP_LIB_DIR environment variable
Do not hardcode /usr/lib/ip as a path and allow libraries path
configuration in run-time.

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-02 16:37:07 +00:00
Stephen Hemminger fb054cb336 uapi: update devlink.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-29 21:17:22 -08:00
Stephen Hemminger c95d63e4fb uapi: update devlink.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-29 21:16:50 -08:00
Stephen Hemminger cae2e9291a f_u32: fix compiler gcc-10 compiler warning
With gcc-10 it complains about array subscript error.

f_u32.c: In function ‘u32_parse_opt’:
f_u32.c:1113:24: warning: array subscript 0 is outside the bounds of an interior zero-length array ‘struct tc_u32_key[0]’ [-Wzero-length-bounds]
 1113 |    hash = sel2.sel.keys[0].val & sel2.sel.keys[0].mask;
      |           ~~~~~~~~~~~~~^~~
In file included from tc_util.h:11,
                 from f_u32.c:26:
../include/uapi/linux/pkt_cls.h:253:20: note: while referencing ‘keys’
  253 |  struct tc_u32_key keys[0];
      |

This is because the keys are actually allocated in the second element
of the parent structure.

Simplest way to address the warning is to assign directly to the keys
in the containing structure.

This has always been in iproute2 (pre-git) so no Fixes.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-29 16:20:33 -08:00
Stephen Hemminger c014983921 misc: fix compiler warning in ifstat and nstat
The code here was doing strncpy() in a way that causes gcc 10
warning about possible string overflow. Just use strlcpy() which
will null terminate and bound the string as expected.

This has existed since start of git era so no Fixes tag.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-29 16:20:31 -08:00
Stephen Hemminger 2319db9052 tc: fix compiler warnings in ip6 pedit
Gcc-10 complains about referencing a zero size array.
This occurs because the array of keys is actually in the following
structure which is part of the overall selector.

The original code was safe, but better to just use the key
array directly.

Fixes: 2d9a8dc439 ("tc: p_ip6: Support pedit of IPv6 dsfield")
Cc: petrm@mellanox.com
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-29 16:20:23 -08:00
Stephen Hemminger 5bdc4e9151 bridge: fix string length warning
Gcc-10 complains about possible string length overflow.
This can't happen Ethernet address format is always limited to
18 characters or less. Just resize the temp buffer.

Fixes: 70dfb0b883 ("iplink: bridge: export bridge_id and designated_root")
Cc: nikolay@cumulusnetworks.com
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-29 16:20:16 -08:00
Stephen Hemminger f817699939 devlink: fix uninitialized warning
GCC-10 complains about uninitialized variable.

devlink.c: In function ‘cmd_dev’:
devlink.c:2803:12: warning: ‘val_u32’ may be used uninitialized in this function [-Wmaybe-uninitialized]
 2803 |    val_u16 = val_u32;
      |    ~~~~~~~~^~~~~~~~~
devlink.c:2747:11: note: ‘val_u32’ was declared here
 2747 |  uint32_t val_u32;
      |           ^~~~~~~

This is a false positive because it can't figure out the control flow
when the parse returns error.

Fixes: 2557dca2b0 ("devlink: Add string to uint{8,16,32} conversion for generic parameters")
Cc: shalomt@mellanox.com
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-29 16:19:36 -08:00
Vladimir Oltean c29f65db34 bridge: add support for L2 multicast groups
Extend the 'bridge mdb' command for the following syntax:
bridge mdb add dev br0 port swp0 grp 01:02:03:04:05:06 permanent

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-29 20:54:02 +00:00
Luca Boccassi f5c1246e6a Add dcb/.gitignore
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-29 20:39:47 +00:00
David Ahern f98ce50046 Merge branch 'libbpf' into next
Hangbin Liu  says:

====================

This series converts iproute2 to use libbpf for loading and attaching
BPF programs when it is available. This means that iproute2 will
correctly process BTF information and support the new-style BTF-defined
maps, while keeping compatibility with the old internal map definition
syntax.

This is achieved by checking for libbpf at './configure' time, and using
it if available. By default the system libbpf will be used, but static
linking against a custom libbpf version can be achieved by passing
LIBBPF_DIR to configure. LIBBPF_FORCE can be set to on to force configure
abort if no suitable libbpf is found (useful for automatic packaging
that wants to enforce the dependency), or set off to disable libbpf check
and build iproute2 with legacy bpf.

The old iproute2 bpf code is kept and will be used if no suitable libbpf
is available. When using libbpf, wrapper code ensures that iproute2 will
still understand the old map definition format, including populating
map-in-map and tail call maps before load.

The examples in bpf/examples are kept, and a separate set of examples
are added with BTF-based map definitions for those examples where this
is possible (libbpf doesn't currently support declaratively populating
tail call maps).

At last, Thanks a lot for Toke's help on this patch set.

v6:
a) print runtime libbpf version in ip -V and tc -V

v5:
a) Fix LIBBPF_DIR typo and description, use libbpf DESTDIR as LIBBPF_DIR
   dest.
b) Fix bpf_prog_load_dev typo.
c) rebase to latest iproute2-next.

v4:
a) Make variable LIBBPF_FORCE able to control whether build iproute2
   with libbpf or not.
b) Add new file bpf_glue.c to for libbpf/legacy mixed bpf calls.
c) Fix some build issues and shell compatibility error.

v3:
a) Update configure to Check function bpf_program__section_name() separately
b) Add a new function get_bpf_program__section_name() to choose whether to
use bpf_program__title() or not.
c) Test build the patch on Fedora 33 with libbpf-0.1.0-1.fc33 and
   libbpf-devel-0.1.0-1.fc33

v2:
a) Remove self defined IS_ERR_OR_NULL and use libbpf_get_error() instead.
b) Add ipvrf with libbpf support.

Here are the test results with patched iproute2:
== Show libbpf version
$ ip -V
ip utility, iproute2-5.9.0, libbpf 0.1.0
$ tc -V
tc utility, iproute2-5.9.0, libbpf 0.1.0

== setup env
$ clang -O2 -Wall -g -target bpf -c bpf_graft.c -o btf_graft.o
$ clang -O2 -Wall -g -target bpf -c bpf_map_in_map.c -o btf_map_in_map.o
$ clang -O2 -Wall -g -target bpf -c bpf_shared.c -o btf_shared.o
$ clang -O2 -Wall -g -target bpf -c legacy/bpf_cyclic.c -o bpf_cyclic.o
$ clang -O2 -Wall -g -target bpf -c legacy/bpf_graft.c -o bpf_graft.o
$ clang -O2 -Wall -g -target bpf -c legacy/bpf_map_in_map.c -o bpf_map_in_map.o
$ clang -O2 -Wall -g -target bpf -c legacy/bpf_shared.c -o bpf_shared.o
$ clang -O2 -Wall -g -target bpf -c legacy/bpf_tailcall.c -o bpf_tailcall.o
$ rm -rf /sys/fs/bpf/xdp/globals
$ /root/iproute2/ip/ip link add type veth
$ /root/iproute2/ip/ip link set veth0 up
$ /root/iproute2/ip/ip link set veth1 up

== Load objs
$ /root/iproute2/ip/ip link set veth0 xdp obj bpf_graft.o sec aaa
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 4 tag 3056d2382e53f27c jited
$ ls /sys/fs/bpf/xdp/globals
jmp_tc
$ bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
4: xdp  name cls_aaa  tag 3056d2382e53f27c  gpl
        loaded_at 2020-10-22T08:04:21-0400  uid 0
        xlated 80B  jited 71B  memlock 4096B
        btf_id 5
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj bpf_map_in_map.o sec ingress
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 8 tag 4420e72b2a601ed7 jited
$ ls /sys/fs/bpf/xdp/globals
jmp_tc  map_inner  map_outer
$ bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
8: xdp  name imain  tag 4420e72b2a601ed7  gpl
        loaded_at 2020-10-22T08:04:23-0400  uid 0
        xlated 336B  jited 193B  memlock 4096B  map_ids 3
        btf_id 10
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj bpf_shared.o sec ingress
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 12 tag 9cbab549c3af3eab jited
$ ls /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef:
map_sh

/sys/fs/bpf/xdp/globals:
jmp_tc  map_inner  map_outer
$ bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
4: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
12: xdp  name imain  tag 9cbab549c3af3eab  gpl
        loaded_at 2020-10-22T08:04:25-0400  uid 0
        xlated 224B  jited 139B  memlock 4096B  map_ids 4
        btf_id 15
$ /root/iproute2/ip/ip link set veth0 xdp off

== Load objs again to make sure maps could be reused
$ /root/iproute2/ip/ip link set veth0 xdp obj bpf_graft.o sec aaa
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 16 tag 3056d2382e53f27c jited
$ ls /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef:
map_sh

/sys/fs/bpf/xdp/globals:
jmp_tc  map_inner  map_outer
$ bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
4: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
16: xdp  name cls_aaa  tag 3056d2382e53f27c  gpl
        loaded_at 2020-10-22T08:04:27-0400  uid 0
        xlated 80B  jited 71B  memlock 4096B
        btf_id 20
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj bpf_map_in_map.o sec ingress
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 20 tag 4420e72b2a601ed7 jited
$ ls /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef:
map_sh

/sys/fs/bpf/xdp/globals:
jmp_tc  map_inner  map_outer
$ bpftool map show                                                                                                                                                                   [236/4518]
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
4: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
20: xdp  name imain  tag 4420e72b2a601ed7  gpl
        loaded_at 2020-10-22T08:04:29-0400  uid 0
        xlated 336B  jited 193B  memlock 4096B  map_ids 3
        btf_id 25
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj bpf_shared.o sec ingress
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 24 tag 9cbab549c3af3eab jited
$ ls /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef:
map_sh

/sys/fs/bpf/xdp/globals:
jmp_tc  map_inner  map_outer
$ bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
4: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
24: xdp  name imain  tag 9cbab549c3af3eab  gpl
        loaded_at 2020-10-22T08:04:31-0400  uid 0
        xlated 224B  jited 139B  memlock 4096B  map_ids 4
        btf_id 30
$ /root/iproute2/ip/ip link set veth0 xdp off
$ rm -rf /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals

== Testing if we can load new-style objects (using xdp-filter as an example)
$ /root/iproute2/ip/ip link set veth0 xdp obj /usr/lib64/bpf/xdpfilt_alw_all.o sec xdp_filter
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 28 tag e29eeda1489a6520 jited
$ ls /sys/fs/bpf/xdp/globals
filter_ethernet  filter_ipv4  filter_ipv6  filter_ports  xdp_stats_map
$ bpftool map show
5: percpu_array  name xdp_stats_map  flags 0x0
        key 4B  value 16B  max_entries 5  memlock 4096B
        btf_id 35
6: percpu_array  name filter_ports  flags 0x0
        key 4B  value 8B  max_entries 65536  memlock 1576960B
        btf_id 35
7: percpu_hash  name filter_ipv4  flags 0x0
        key 4B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
8: percpu_hash  name filter_ipv6  flags 0x0
        key 16B  value 8B  max_entries 10000  memlock 1142784B
        btf_id 35
9: percpu_hash  name filter_ethernet  flags 0x0
        key 6B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
$ bpftool prog show
28: xdp  name xdpfilt_alw_all  tag e29eeda1489a6520  gpl
        loaded_at 2020-10-22T08:04:33-0400  uid 0
        xlated 2408B  jited 1405B  memlock 4096B  map_ids 9,5,7,8,6
        btf_id 35
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj /usr/lib64/bpf/xdpfilt_alw_ip.o sec xdp_filter
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 32 tag 2f2b9dbfb786a5a2 jited
$ ls /sys/fs/bpf/xdp/globals
filter_ethernet  filter_ipv4  filter_ipv6  filter_ports  xdp_stats_map
$ bpftool map show
5: percpu_array  name xdp_stats_map  flags 0x0
        key 4B  value 16B  max_entries 5  memlock 4096B
        btf_id 35
6: percpu_array  name filter_ports  flags 0x0
        key 4B  value 8B  max_entries 65536  memlock 1576960B
        btf_id 35
7: percpu_hash  name filter_ipv4  flags 0x0
        key 4B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
8: percpu_hash  name filter_ipv6  flags 0x0
        key 16B  value 8B  max_entries 10000  memlock 1142784B
        btf_id 35
9: percpu_hash  name filter_ethernet  flags 0x0
        key 6B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
$ bpftool prog show
32: xdp  name xdpfilt_alw_ip  tag 2f2b9dbfb786a5a2  gpl
        loaded_at 2020-10-22T08:04:35-0400  uid 0
        xlated 1336B  jited 778B  memlock 4096B  map_ids 7,8,5
        btf_id 40
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj /usr/lib64/bpf/xdpfilt_alw_tcp.o sec xdp_filter
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 36 tag 18c1bb25084030bc jited
$ ls /sys/fs/bpf/xdp/globals
filter_ethernet  filter_ipv4  filter_ipv6  filter_ports  xdp_stats_map
$ bpftool map show
5: percpu_array  name xdp_stats_map  flags 0x0
        key 4B  value 16B  max_entries 5  memlock 4096B
        btf_id 35
6: percpu_array  name filter_ports  flags 0x0
        key 4B  value 8B  max_entries 65536  memlock 1576960B
        btf_id 35
7: percpu_hash  name filter_ipv4  flags 0x0
        key 4B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
8: percpu_hash  name filter_ipv6  flags 0x0
        key 16B  value 8B  max_entries 10000  memlock 1142784B
        btf_id 35
9: percpu_hash  name filter_ethernet  flags 0x0
        key 6B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
$ bpftool prog show
36: xdp  name xdpfilt_alw_tcp  tag 18c1bb25084030bc  gpl
        loaded_at 2020-10-22T08:04:37-0400  uid 0
        xlated 1128B  jited 690B  memlock 4096B  map_ids 6,5
        btf_id 45
$ /root/iproute2/ip/ip link set veth0 xdp off
$ rm -rf /sys/fs/bpf/xdp/globals

== Load new btf defined maps
$ /root/iproute2/ip/ip link set veth0 xdp obj btf_graft.o sec aaa
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 40 tag 3056d2382e53f27c jited
$ ls /sys/fs/bpf/xdp/globals
jmp_tc
$ bpftool map show
10: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
40: xdp  name cls_aaa  tag 3056d2382e53f27c  gpl
        loaded_at 2020-10-22T08:04:39-0400  uid 0
        xlated 80B  jited 71B  memlock 4096B
        btf_id 50
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj btf_map_in_map.o sec ingress
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 44 tag 4420e72b2a601ed7 jited
$ ls /sys/fs/bpf/xdp/globals
jmp_tc  map_outer
$ bpftool map show
10: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
11: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
13: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
44: xdp  name imain  tag 4420e72b2a601ed7  gpl
        loaded_at 2020-10-22T08:04:41-0400  uid 0
        xlated 336B  jited 193B  memlock 4096B  map_ids 13
        btf_id 55
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj btf_shared.o sec ingress
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 48 tag 9cbab549c3af3eab jited
$ ls /sys/fs/bpf/xdp/globals
jmp_tc  map_outer  map_sh
$ bpftool map show
10: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
11: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
13: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
14: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
48: xdp  name imain  tag 9cbab549c3af3eab  gpl
        loaded_at 2020-10-22T08:04:43-0400  uid 0
        xlated 224B  jited 139B  memlock 4096B  map_ids 14
        btf_id 60
$ /root/iproute2/ip/ip link set veth0 xdp off
$ rm -rf /sys/fs/bpf/xdp/globals

== Test load objs by tc
$ /root/iproute2/tc/tc qdisc add dev veth0 ingress
$ /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_cyclic.o sec 0xabccba/0
$ /root/iproute2/tc/tc filter add dev veth0 parent ffff: bpf obj bpf_graft.o
$ /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_tailcall.o sec 42/0
$ /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_tailcall.o sec 42/1
$ /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_tailcall.o sec 43/0
$ /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_tailcall.o sec classifier
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
$ ls /sys/fs/bpf/xdp/37e88cb3b9646b2ea5f99ab31069ad88db06e73d /sys/fs/bpf/xdp/fc68fe3e96378a0cba284ea6acbe17e898d8b11f /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/37e88cb3b9646b2ea5f99ab31069ad88db06e73d:
jmp_tc

/sys/fs/bpf/xdp/fc68fe3e96378a0cba284ea6acbe17e898d8b11f:
jmp_ex  jmp_tc  map_sh

/sys/fs/bpf/xdp/globals:
jmp_tc
$ bpftool map show
15: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
        owner_prog_type sched_cls  owner jited
16: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
        owner_prog_type sched_cls  owner jited
17: prog_array  name jmp_ex  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
        owner_prog_type sched_cls  owner jited
18: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 2  memlock 4096B
        owner_prog_type sched_cls  owner jited
19: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
52: sched_cls  name cls_loop  tag 3e98a40b04099d36  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 168B  jited 133B  memlock 4096B  map_ids 15
        btf_id 65
56: sched_cls  name cls_entry  tag 0fbb4d9310a6ee26  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 144B  jited 121B  memlock 4096B  map_ids 16
        btf_id 70
60: sched_cls  name cls_case1  tag e06a3bd62293d65d  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 328B  jited 216B  memlock 4096B  map_ids 19,17
        btf_id 75
66: sched_cls  name cls_case1  tag e06a3bd62293d65d  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 328B  jited 216B  memlock 4096B  map_ids 19,17
        btf_id 80
72: sched_cls  name cls_case1  tag e06a3bd62293d65d  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 328B  jited 216B  memlock 4096B  map_ids 19,17
        btf_id 85
78: sched_cls  name cls_case1  tag e06a3bd62293d65d  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 328B  jited 216B  memlock 4096B  map_ids 19,17
        btf_id 90
79: sched_cls  name cls_case2  tag ee218ff893dca823  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 336B  jited 218B  memlock 4096B  map_ids 19,18
        btf_id 90
80: sched_cls  name cls_exit  tag e78a58140deed387  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 288B  jited 177B  memlock 4096B  map_ids 19
        btf_id 90

I also run the following upstream kselftest with patches iproute2 and
all passed.

test_lwt_ip_encap.sh
test_xdp_redirect.sh
test_tc_redirect.sh
test_xdp_meta.sh
test_xdp_veth.sh
test_xdp_vlan.sh

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 22:24:15 -07:00
Hangbin Liu 71c7c1fb4f examples/bpf: add bpf examples with BTF defined maps
Users should try use the new BTF defined maps instead of struct
bpf_elf_map defined maps. The tail call examples are not added yet
as libbpf doesn't currently support declaratively populating tail call
maps.

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 22:14:08 -07:00
Hangbin Liu 1ac8285a69 examples/bpf: move struct bpf_elf_map defined maps to legacy folder
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 22:14:06 -07:00
Hangbin Liu 6d61a2b557 lib: add libbpf support
This patch converts iproute2 to use libbpf for loading and attaching
BPF programs when it is available, which is started by Toke's
implementation[1]. With libbpf iproute2 could correctly process BTF
information and support the new-style BTF-defined maps, while keeping
compatibility with the old internal map definition syntax.

The old iproute2 bpf code is kept and will be used if no suitable libbpf
is available. When using libbpf, wrapper code in bpf_legacy.c ensures that
iproute2 will still understand the old map definition format, including
populating map-in-map and tail call maps before load.

In bpf_libbpf.c, we init iproute2 ctx and elf info first to check the
legacy bytes. When handling the legacy maps, for map-in-maps, we create
them manually and re-use the fd as they are associated with id/inner_id.
For pin maps, we only set the pin path and let libbp load to handle it.
For tail calls, we find it first and update the element after prog load.

Other maps/progs will be loaded by libbpf directly.

[1] https://lore.kernel.org/bpf/20190820114706.18546-1-toke@redhat.com/

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 22:14:05 -07:00
Hangbin Liu dc800a4ed4 lib: make ipvrf able to use libbpf and fix function name conflicts
There are directly calls in libbpf for bpf program load/attach.
So we could just use two wrapper functions for ipvrf and convert
them with libbpf support.

Function bpf_prog_load() is removed as it's conflict with libbpf
function name.

bpf.c is moved to bpf_legacy.c for later main libbpf support in
iproute2.

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 22:14:04 -07:00
Hangbin Liu 503e9229b0 iproute2: add check_libbpf() and get_libbpf_version()
This patch aim to add basic checking functions for later iproute2
libbpf support.

First we add check_libbpf() in configure to see if we have bpf library
support. By default the system libbpf will be used, but static linking
against a custom libbpf version can be achieved by passing libbpf DESTDIR
to variable LIBBPF_DIR for configure.

Another variable LIBBPF_FORCE is used to control whether to build iproute2
with libbpf. If set to on, then force to build with libbpf and exit if
not available. If set to off, then force to not build with libbpf.

When dynamically linking against libbpf, we can't be sure that the
version we discovered at compile time is actually the one we are
using at runtime. This can lead to hard-to-debug errors. So we add
a new file lib/bpf_glue.c and a helper function get_libbpf_version()
to get correct libbpf version at runtime.

Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 22:14:02 -07:00
David Ahern ee5d4b24e3 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 22:04:48 -07:00
Roi Dayan ed40b7e2ae tc flower: fix parsing vlan_id and vlan_prio
When protocol is vlan then eth_type is set to the vlan eth type.
So when parsing vlan_id and vlan_prio need to check tc_proto
is vlan and not eth_type.

Fixes: 4c551369e0 ("tc flower: use right ethertype in icmp/arp parsing")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:45:20 -07:00
Petr Machata ca5ec9a17a ip: iptuntap: Convert to use print_on_off()
Instead of rolling a custom on-off printer, use the one added to utils.c.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:43:41 -07:00
Petr Machata 66e574c4c5 ip: ipnetconf: Convert to use print_on_off()
Instead of rolling a custom on-off printer, use the one added to utils.c.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:43:34 -07:00
Petr Machata 07d82b4a79 ip: iplink_bridge_slave: Convert to use print_on_off()
Instead of rolling a custom on-off printer, use the one added to utils.c.
Note that _print_onoff() has an extra parameter for a JSON-specific flag
name. However that argument is not used, and never was. Therefore when
moving over to print_on_off(), drop this argument.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:43:30 -07:00
Petr Machata 3e0d2a73ba ip: iplink_bridge_slave: Port over to parse_on_off()
Invoke parse_on_off() from bridge_slave_parse_on_off() instead of
hand-rolling one. Exit on failure, because the invarg that was ivoked here
before would.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:43:27 -07:00
Petr Machata 5f685d064b ip: iplink: Convert to use parse_on_off()
Invoke parse_on_off() instead of rolling a custom function.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:43:23 -07:00
Petr Machata 94d12fd796 bridge: link: Convert to use print_on_off()
Instead of rolling a custom on-off printer, use the one added to utils.c.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:43:19 -07:00
Petr Machata 9262ccc3ed bridge: link: Port over to parse_on_off()
Convert bridge/link.c from a custom on_off parser to the new global one.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:43:14 -07:00
David Ahern e1ae6efbb8 Merge branch 'nexthop-flags' into next
Ido Schimmel  says:

====================

From: Ido Schimmel <idosch@nvidia.com>

Patch #1 prints the recently added 'RTNH_F_TRAP' flag.

Patch #2 makes sure that nexthop flags are always printed for nexthop
objects. Even when the nexthop does not have a device, such as a
blackhole nexthop or a group.

Example output with netdevsim:

$ ip nexthop
id 1 via 192.0.2.2 dev eth0 scope link trap
id 2 blackhole trap
id 3 group 2 trap

Example output with mlxsw:

$ ip nexthop
id 1 via 192.0.2.2 dev swp3 scope link offload
id 2 blackhole offload
id 3 group 2 offload

Tested with fib_nexthops.sh that uses "ip nexthop" output:

Tests passed: 164
Tests failed:   0

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-22 12:46:30 -07:00
Ido Schimmel 0788678991 nexthop: Always print nexthop flags
Currently, the nexthop flags are only printed when the nexthop has a
nexthop device. The offload / trap indication is therefore not printed
for nexthop groups.

Instead, always print the nexthop flags, regardless if the nexthop has a
nexthop device or not.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-22 12:43:56 -07:00
Ido Schimmel 3de35f41be ip route: Print "trap" nexthop indication
The kernel can now signal that a nexthop is trapping packets instead of
forwarding them. Print the flag to help users understand the offload
state of each nexthop.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-22 12:42:20 -07:00
David Ahern db8b149b16 Update kernel headers
Update kernel headers to commit:
    f9e425e99b07 ("octeontx2-af: Add support for RSS hashing based on Transport protocol field")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-22 12:41:23 -07:00
Stephen Hemminger 7a49ff9d79 bridge: report correct version
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-15 08:58:52 -08:00
Zahari Doychev 4c551369e0 tc flower: use right ethertype in icmp/arp parsing
Currently the icmp and arp parsing functions are called with incorrect
ethtype in case of vlan or cvlan filter options. In this case either
cvlan_ethtype or vlan_ethtype has to be used. The ethtype is now updated
each time a vlan ethtype is matched during parsing.

Signed-off-by: Zahari Doychev <zahari.doychev@linux.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 20:07:38 -07:00
David Ahern 1ed00380b0 Merge branch 'dcb-tool' into next
Petr Machata  says:
====================

The Linux DCB interface allows configuration of a broad range of
hardware-specific attributes, such as TC scheduling, flow control, per-port
buffer configuration, TC rate, etc.

Currently a common libre tool for configuration of DCB is OpenLLDP. This
suite contains a daemon that uses Linux DCB interface to configure HW
according to the DCB TLVs exchanged over an interface. The daemon can also
be controlled by a client, through which the user can adjust and view the
configuration. The downside of using OpenLLDP is that it is somewhat
heavyweight and difficult to use in scripts, and does not support
extensions such as buffer and rate commands.

For access to many HW features, one would be perfectly fine with a
fire-and-forget tool along the lines of "ip" or "tc". For scripting in
particular, this would be ideal. This author is aware of one such tool,
mlnx_qos from Mellanox OFED scripts collection[1].

The downside here is that the tool is very verbose, the command line
language is awkward to use, it is not packaged in Linux distros, and
generally has the appearance of a very vendor-specific tool, despite not
being one.

This patchset addresses the above issues by providing a seed of a clean,
well-documented, easily usable, extensible fire-and-forget tool for DCB
configuration:

    # dcb ets set dev eni1np1 \
                  tc-tsa all:strict 0:ets 1:ets 2:ets \
		  tc-bw all:0 0:33 1:33 2:34

    # dcb ets show dev eni1np1 tc-tsa tc-bw
    tc-tsa 0:ets 1:ets 2:ets 3:strict 4:strict 5:strict 6:strict 7:strict
    tc-bw 0:33 1:33 2:34 3:0 4:0 5:0 6:0 7:0

    # dcb ets set dev eni1np1 tc-bw 1:30 2:37

    # dcb -j ets show dev eni1np1 | jq '.tc_bw[2]'
    37

The patchset proceeds as follows:

- Many tools in iproute2 have an option to work in batch mode, where the
  commands to run are given in a file. The code to handle batching is
  largely the same independent of the tool in question. In patch #1, add a
  helper to handle the batching, and migrate individual tools to use it.

- A number of configuration options come in a form of an on-off switch.
  This in turn can be considered a special case of parsing one of a given
  set of strings. In patch #2, extract helpers to parse one of a number of
  strings, on top of which build an on-off parser.

  Currently each tool open-codes the logic to parse the on-off toggle. A
  future patch set will migrate instances of this code over to the new
  helpers.

- The on/off toggles from previous list item sometimes need to be dumped.
  While in the FP output, one typically wishes to maintain consistency with
  the command line and show actual strings, "on" and "off", in JSON output
  one would rather use booleans. This logic is somewhat annoying to have to
  open-code time and again. Therefore in patch #3, add a helper to do just
  that.

- The DCB tool is built on top of libmnl. Several routines will be
  basically the same in DCB as they are currently in devlink. In patches
  #4-#6, extract them to a new module, mnl_utils, for easy reuse.

- Much of DCB is built around arrays. A syntax similar to the iplink_vlan's
  ingress-qos-map / egress-qos-map is very handy for describing changes
  done to such arrays. Therefore in patch #7, extract a helper,
  parse_mapping(), which manages parsing of key-value arrays. In patch #8,
  fix a buglet in the helper, and in patch #9, extend it to allow setting
  of all array elements in one go.

- In patch #10, add a skeleton of "dcb", which contains common helpers and
  dispatches to subtools for handling of individual objects. The skeleton
  is empty as of this patch.

  In patch #11, add "dcb_ets", a module for handling of specifically DCB
  ETS objects.

  The intention is to gradually add handlers for at least PFC, APP, peer
  configuration, buffers and rates.

[1] https://github.com/Mellanox/mlnx-tools/tree/master/ofed_scripts

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:48:52 -07:00
Petr Machata ef15b07601 dcb: Add a subtool for the DCB ETS object
ETS, for "Enhanced Transmission Selection", is a set of configurations that
permit configuration of mapping of priorities to traffic classes, traffic
selection algorithm to use per traffic class, bandwidth allocation, etc.

Add a dcb subtool to allow showing and tweaking of individual ETS
configuration options. For example:

    # dcb ets show dev eni1np1
    willing on ets_cap 8 cbs off
    tc-bw 0:0 1:0 2:0 3:0 4:100 5:0 6:0 7:0
    pg-bw 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0
    tc-tsa 0:strict 1:strict 2:strict 3:strict 4:ets 5:strict 6:strict 7:strict
    prio-tc 0:1 1:3 2:5 3:0 4:0 5:0 6:0 7:0
    reco-tc-bw 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0
    reco-tc-tsa 0:strict 1:strict 2:strict 3:strict 4:strict 5:strict 6:strict 7:strict
    reco-prio-tc 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:19 -07:00
Petr Machata 67033d1c1c Add skeleton of a new tool, dcb
The Linux DCB interface allows configuration of a broad range of
hardware-specific attributes, such as TC scheduling, flow control, per-port
buffer configuration, TC rate, etc. Add a new tool to show that
configuration and tweak it.

DCB allows configuration of several objects, and possibly could expand to
pre-standard CEE interfaces. Therefore the tool itself is a lean shell that
dispatches to subtools each dedicated to one of the objects.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:19 -07:00
Petr Machata 66a2d71487 lib: parse_mapping: Recognize a keyword "all"
The DCB tool will have to provide an interface to a number of fixed-size
arrays. Unlike the egress- and ingress-qos-map, it makes good sense to have
an interface to set all members to the same value. For example to set
strict priority on all TCs besides select few, or to reset allocated
bandwidth to all zeroes, again besides several explicitly-given ones.

To support this usage, extend the parse_mapping() with a boolean that
determines whether this special use is supported. If "all" is given and
recognized, mapping_cb is called with the key of -1.

Have iplink_vlan pass false for allow_all.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata bc3523ae70 lib: parse_mapping: Update argc, argv on error
Currently argc and argv are not updated unless parsing of all of the
mapping was successful. However in that case, "ip link" will point at the
wrong argument when complaining:

    # ip link add name eth0.100 link eth0 type vlan id 100 egress 1:1 2:foo
    Error: argument "1" is wrong: invalid egress-qos-map

Update argc and argv even in the case of parsing error, so that the right
element is indicated.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata 28e663ee65 lib: Extract from iplink_vlan a helper to parse key:value arrays
VLAN netdevices have two similar attributes: ingress-qos-map and
egress-qos-map. These attributes can be configured with a series of
802.1-priority-to-skb-priority (and vice versa) mappings. A reusable helper
along those lines will be handy for configuration of various
priority-to-tc, tc-to-algorithm, and other arrays in DCB.

Therefore extract the logic to a function parse_mapping(), move to utils.c,
and dispatch to utils.c from iplink_vlan.c. That necessitates extraction of
a VLAN-specific parse_qos_mapping(). Do that, and propagate addattr_l()
return value up, unlike the original.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata 6dd778e837 lib: Extract from devlink/mnlg a helper, mnlu_socket_recv_run()
Receiving a message in libmnl is a somewhat involved operation. Devlink's
mnlg library has an implementation that is going to be handy for other
tools as well. Extract it into a new helper.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata dd78dfc7be lib: Extract from devlink/mnlg a helper, mnlu_msg_prepare()
Allocation of a new netlink message with the two usual headers is reusable
with other netlink netlink message types. Extract it into a helper,
mnlu_msg_prepare(). Take the second header as an argument, instead of
passing in parameters to initialize it, and copy it in.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata 72858c7b77 lib: Extract from devlink/mnlg a helper, mnlu_socket_open()
This little dance of mnl_socket_open(), option setting, and bind, is the
same regardless of tool. Extract into a new module that should hold helpers
for working with libmnl, mnl_util.c.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata 9091ff0251 lib: json_print: Add print_on_off()
The value of a number of booleans is shown as "on" and "off" in the plain
output, and as an actual boolean in JSON mode. Add a function that does
that.

RDMA tool already uses a function named print_on_off(). This function
always shows "on" and "off", even in JSON mode. Since there are probably
very few if any consumers of this interface at this point, migrate it to
the new central print_on_off() as well.

Signed-off-by: Petr Machata <me@pmachata.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata 82604d2852 lib: Add parse_one_of(), parse_on_off()
Take from the macsec code parse_one_of() and adapt so that it passes the
primary result as the main return value, and error result through a
pointer. That is the simplest way to make the code reusable across data
types without introducing extra magic.

Also from macsec take the specialization of parse_one_of() for parsing
specifically the strings "off" and "on".

Convert the macsec code to the new helpers.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata 1d9a81b8c9 Unify batch processing across tools
The code for handling batches is largely the same across iproute2 tools.
Extract a helper to handle the batch, and adjust the tools to dispatch to
this helper. Sandwitch the invocation between prologue / epilogue code
specific for each tool.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Guillaume Nault 8682f588bf tc-mpls: fix manpage example and help message string
Manpage:
 * Remove the extra "and to ip packets" part from command description
   to make it more understandable.

 * Redirect packets to eth1, instead of eth0, as told in the
   description.

Help string:
 * "mpls pop" can be followed by a CONTROL keyword.

 * "mpls modify" can also set the MPLS_BOS field.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-08 10:49:28 -08:00
Guillaume Nault 7c7a0fe0c8 tc-vlan: fix help and error message strings
* "vlan pop" can be followed by a CONTROL keyword.

 * Add missing space in error message.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-08 10:49:18 -08:00
Stephen Hemminger 72f88bd42a uapi: update kernel headers from 5.10-rc2
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-08 10:47:27 -08:00
Stephen Hemminger b90c39be33 rdma: fix spelling error in comment
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-08 10:44:19 -08:00
Stephen Hemminger c8424b73e1 man: fix spelling errors
Lots of little typo errors on man pages.
Found by running codespell

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-08 10:40:30 -08:00
Stephen Hemminger cbf6481797 tc/m_gate: fix spelling errors
Fix spelling errors in error messages.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-08 10:34:23 -08:00
Stephen Hemminger 14b189f066 uapi: updates from 5.10-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-03 08:29:53 -08:00
David Ahern 51f28eb928 Merge branch 'tc-terse-dump' into next
Vlad Buslov  says:

====================

Implement support for terse dump mode which provides only essential
classifier/action info (handle, stats, cookie, etc.). Use new
TCA_DUMP_FLAGS_TERSE flag to prevent copying of unnecessary data from
kernel.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-31 09:18:43 -06:00
Vlad Buslov 477ca0dfb4 tc: implement support for terse dump
Implement support for classifier/action terse dump using new TCA_DUMP_FLAGS
tlv with only available flag value TCA_DUMP_FLAGS_TERSE. Set the flag when
user requested it with following example CLI (-br for 'brief'):

$ tc -s -br filter show dev ens1f0 ingress
filter protocol ip pref 49151 flower chain 0
filter protocol ip pref 49151 flower chain 0 handle 0x1
  not_in_hw
        action order 1: gact    Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

filter protocol ip pref 49152 flower chain 0
filter protocol ip pref 49152 flower chain 0 handle 0x1
  not_in_hw
        action order 1: gact    Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

In terse mode dump only outputs essential data needed to identify the
filter and action (handle, cookie, etc.) and stats, if requested by the
user. The intention is to significantly improve rule dump rate by omitting
all static data that do not change after rule is created.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-31 09:15:15 -06:00
Vlad Buslov a99ebeeef2 tc: skip actions that don't have options attribute when printing
Modify implementations that return error from action_until->print_aopt()
callback to silently skip actions that don't have their corresponding
TCA_ACT_OPTIONS attribute set (some actions already behave like this).
Print action kind before returning from action_until->print_aopt()
callbacks. This is necessary to support terse dump mode in following patch
in the series.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-31 09:14:01 -06:00
Johannes Berg 9fc5bf734f libnetlink: define __aligned conditionally
On some systems (e.g. current Debian/stable) the inclusion
of utils.h pulled in some other things that may end up
defining __aligned, in a possibly different way than what
we had here.

Use our own definition only if there isn't one already.

Fixes: d5acae244f ("libnetlink: add nl_print_policy() helper")
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-28 10:24:02 -07:00
David Ahern eb12cc9ae1 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-25 15:08:12 -06:00
Guillaume Nault f1298d7660 m_mpls: test the 'mac_push' action after 'modify'
Commit 02a261b5ba ("m_mpls: add mac_push action") added a matches()
test for the "mac_push" string before the test for "modify".
This changes the previous behaviour as 'action m' used to match
"modify" while it now matches "mac_push".

Revert to the original behaviour by moving the "mac_push" test after
"modify".

Fixes: 02a261b5ba ("m_mpls: add mac_push action")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-25 15:07:13 -06:00
David Ahern 2b7a768408 Merge branch 'tipc-encryption' into next
Tuong Lien  says:

====================

This series adds two new options in the 'iproute2/tipc' command, enabling users
to use the new TIPC encryption features, i.e. the master key and rekeying which
have been recently merged in kernel.

The help menu of the "tipc node set key" command is also updated accordingly:

 # tipc node set key --help
Usage: tipc node set key KEY [algname ALGNAME] [PROPERTIES]
       tipc node set key rekeying REKEYING

KEY
  Symmetric KEY & SALT as a composite ASCII or hex string (0x...) in form:
  [KEY: 16, 24 or 32 octets][SALT: 4 octets]

ALGNAME
  Cipher algorithm [default: "gcm(aes)"]

PROPERTIES
  master                - Set KEY as a cluster master key
  <empty>               - Set KEY as a cluster key
  nodeid NODEID         - Set KEY as a per-node key for own or peer

REKEYING
  INTERVAL              - Set rekeying interval (in minutes) [0: disable]
  now                   - Trigger one (first) rekeying immediately

EXAMPLES
  tipc node set key this_is_a_master_key master
  tipc node set key 0x746869735F69735F615F6B657931365F73616C74
  tipc node set key this_is_a_key16_salt algname "gcm(aes)" nodeid 1001002
  tipc node set key rekeying 600

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-20 09:05:40 -06:00
Tuong Lien 2bf1ba5a5c tipc: add option to set rekeying for encryption
As supported in kernel, the TIPC encryption rekeying can be tuned using
the netlink attribute - 'TIPC_NLA_NODE_REKEYING'. Now we add the
'rekeying' option correspondingly to the 'tipc node set key' command so
that user will be able to perform that tuning:

tipc node set key rekeying REKEYING

where the 'REKEYING' value can be:

INTERVAL              - Set rekeying interval (in minutes) [0: disable]
now                   - Trigger one (first) rekeying immediately

For example:
$ tipc node set key rekeying 60
$ tipc node set key rekeying now

The command's help menu is also updated with these descriptions for the
new command option.

Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-20 09:04:45 -06:00
Tuong Lien 5fb3681885 tipc: add option to set master key for encryption
In addition to the support of master key in kernel, we add the 'master'
option to the 'tipc node set key' command for user to be able to
specify a key as master key during the key setting. This is carried out
by turning on the new netlink flag - 'TIPC_NLA_NODE_KEY_MASTER'.
For example:

$ tipc node set key "this_is_a_master_key" master

The command's help menu is also updated to give a better description of
all the available options.

Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-20 09:04:37 -06:00
David Ahern b4edd6a8a6 Merge branch 'tc-mpls-l2-vpn' into next
Guillaume Nault  says:

====================

This patch series adds the possibility for TC to tunnel Ethernet frames
over MPLS.

Patch 1 allows adding or removing the Ethernet header.
Patch 2 allows pushing an MPLS LSE before the MAC header.

By combining these actions, it becomes possible to encapsulate an
entire Ethernet frame into MPLS, then add an outer Ethernet header
and send the resulting frame to the next hop.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-20 08:57:47 -06:00
Guillaume Nault 02a261b5ba m_mpls: add mac_push action
Add support for the new TCA_MPLS_ACT_MAC_PUSH action (kernel commit
a45294af9e96 ("net/sched: act_mpls: Add action to push MPLS LSE before
Ethernet header")). This action let TC push an MPLS header before the
MAC header of a frame.

Example (encapsulate all outgoing frames with label 20, then add an
outer Ethernet header):
 # tc filter add dev ethX matchall \
       action mpls mac_push label 20 ttl 64 \
       action vlan push_eth dst_mac 0a:00:00:00:00:02 \
                            src_mac 0a:00:00:00:00:01

This patch also adds an alias for ETH_P_TEB, since it is useful when
decapsulating MPLS packets that contain an Ethernet frame.

With MAC_PUSH, there's no previous Ethertype to modify. However, the
"protocol" option is still needed, because the kernel uses it to set
skb->protocol. So rename can_modify_ethtype() to can_set_ethtype().

Also add a test suite for m_mpls, which covers the new action and the
pre-existing ones.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-20 08:57:08 -06:00
Guillaume Nault d61167dd88 m_vlan: add pop_eth and push_eth actions
Add support for the new TCA_VLAN_ACT_POP_ETH and TCA_VLAN_ACT_PUSH_ETH
actions (kernel commit 19fbcb36a39e ("net/sched: act_vlan:
Add {POP,PUSH}_ETH actions"). These action let TC remove or add the
Ethernet at the head of a frame.

Drop an Ethernet header:
 # tc filter add dev ethX matchall action vlan pop_eth

Push an Ethernet header (the original frame must have no MAC header):
 # tc filter add dev ethX matchall action vlan \
       push_eth dst_mac 0a:00:00:00:00:02 src_mac 0a:00:00:00:00:01

Also add a test suite for m_vlan, which covers these new actions and
the pre-existing ones.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-20 08:36:38 -06:00
Jacob Keller 3342688a66 devlink: display elapsed time during flash update
For some devices, updating the flash can take significant time during
operations where no status can meaningfully be reported. This can be
somewhat confusing to a user who sees devlink appear to hang on the
terminal waiting for the device to update.

Recent changes to the kernel interface allow such long running commands
to provide a timeout value indicating some upper bound on how long the
relevant action could take.

Provide a ticking counter of the time elapsed since the previous status
message in order to make it clear that the program is not simply stuck.

Display this message whenever the status message from the kernel
indicates a timeout value. Additionally also display the message if
we've received no status for more than couple of seconds. If we elapse
more than the timeout provided by the status message, replace the
timeout display with "timeout reached".

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-17 09:30:06 -06:00
Stephen Hemminger cb7ce51cc1 v5.9.0 2020-10-15 15:18:35 -07:00
zhangkaiheb@126.com 78ace1c211 tc: fq: clarify the length of orphan_mask.
Signed-off-by: kai zhang <zhangkaiheb@126.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-15 15:16:52 -07:00
Jan Engelhardt 0ca1312c20 ip: add error reporting when RTM_GETNSID failed
`ip addr` when run under qemu-user-riscv64, fails. This likely is due
to qemu-5.1 not doing translation of RTM_GETNSID calls. Aborting ip
completely is not helpful for the user however. This patch reworks
the error handling.

Before:

rtest:/ # ip a
2: host0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
request send failed: Operation not supported
    link/ether 46:3f:2d:88:3d:db brd ff:ff:ff:ff:ff:ffrtest:/ #

Afterwards:

rtest:/ # ip a
2: host0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
rtnl_send(RTM_GETNSID): Operation not supported. Continuing anyway.
    link/ether 46:3f:2d:88:3d:db brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.72.147/28 brd 192.168.72.159 scope global host0
       valid_lft forever preferred_lft forever
    inet6 fe80::443f:2dff:fe88:3ddb/64 scope link
       valid_lft forever preferred_lft forever

Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-12 08:10:25 -07:00
Dmitry Yakunin 58c3c55f38 lib: ignore invalid mounts in cg_init_map
In case of bad entries in /proc/mounts just skip cgroup cache initialization.
Cgroups in output will be shown as "unreachable:cgroup_id".

Fixes: d5e6ee0dac ("ss: introduce cgroup2 cache and helper functions")
Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Reported-by: Donald Sharp <sharpd@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-11 23:02:35 -07:00
Stephen Hemminger 003b9af516 uapi: add new SNMP entry
Update to snmp.h from 5.9

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-11 22:50:22 -07:00
David Ahern b5a583fb32 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:11:09 -06:00
Johannes Berg 7812012849 genl: ctrl: print op -> policy idx mapping
Newer kernels can dump per-op policies, so print out the new
mapping attribute to indicate which op has which policy.

v2:
 * print out both do/dump policy idx
v3:
 * fix userspace API which renumbered after patch rebasing

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:10:09 -06:00
David Ahern 91c54917cd Merge branch 'bridge-igmpv3-mldv2' into next
Nikolay Aleksandrov  says:

====================
This set adds support for IGMPv3/MLDv2 attributes, they're mostly
read-only at the moment. The only new "set" option is the source address
for S,G entries. It is added in patch 01 (see the patch commit message for
an example). Patch 02 shows a missing flag (fast_leave) for
completeness, then patch 03 shows the new IGMPv3/MLDv2 flags:
added_by_star_ex and blocked. Patches 04-06 show the new extra
information about the entry's state when IGMPv3/MLDv2 are enabled. That
includes its filter mode (include/exclude), source list with timers and
origin protocol (currently only static/kernel), in order to show the new
information the user must use "-d"/show_details.
Here's the output of a few IGMPv3 entries:
 dev bridge port ens12 grp 239.0.0.1 src 20.21.22.23 temp filter_mode include proto kernel  blocked    0.00
 dev bridge port ens12 grp 239.0.0.1 src 8.9.10.11 temp filter_mode include proto kernel  blocked    0.00
 dev bridge port ens12 grp 239.0.0.1 src 1.2.3.1 temp filter_mode include proto kernel  blocked    0.00
 dev bridge port ens12 grp 239.0.0.1 temp filter_mode exclude source_list 20.21.22.23/0.00,8.9.10.11/0.00,1.2.3.1/0.00 proto kernel    26.65

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:09:14 -06:00
Nikolay Aleksandrov 86588450c5 bridge: mdb: print protocol when available
Print the mdb entry's protocol (i.e. who added it)  when it's available if
the user requested to show details (-d). Currently the only possible
values are RTPROT_STATIC (user-space added) or RTPROT_KERNEL
(automatically added by kernel). The value is kernel controlled.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:07:50 -06:00
Nikolay Aleksandrov 2de81d1eff bridge: mdb: print source list when available
Print the mdb entry's source list when it's available if the user
requested to show details (-d). Each source has an associated timer
which controls if traffic should be forwarded to that S,G entry (if the
timer is non-zero traffic is forwarded, otherwise it's not).
Currently the source list is kernel controlled and can't be changed by
user-space.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:07:45 -06:00
Nikolay Aleksandrov 1d28c48046 bridge: mdb: print filter mode when available
Print the mdb entry's filter mode when it's available if the user
requested to show details (-d). It can be either include or exclude.
Currently it's kernel controlled and can't be changed by user-space.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:07:39 -06:00
Nikolay Aleksandrov e331677ea2 bridge: mdb: show igmpv3/mldv2 flags
With IGMPv3/MLDv2 support we have 2 new flags:
 - added_by_star_ex: set when the S,G entry was automatically created
                     because of a *,G entry in EXCLUDE mode
 - blocked: set when traffic for the S,G entry for that port has to be
            blocked
Both flags are used only on the new S,G entries and are currently kernel
managed, i.e. similar to other flags which can't be set from user-space.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:07:34 -06:00
Nikolay Aleksandrov f94e8b0749 bridge: mdb: print fast_leave flag
We're not showing the fast_leave flag when it's set. Currently that can
be only when an mdb entry is being deleted due to fast leave, so it will
only affect mdb monitor.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:07:30 -06:00
Nikolay Aleksandrov 547b319762 bridge: mdb: add support for source address
This patch adds the user-space control and dump of mdb entry source
address. When setting the new MDBA_SET_ENTRY_ATTRS nested attribute is
used and inside is added MDBE_ATTR_SOURCE based on the address family.
When dumping we look for MDBA_MDB_EATTR_SOURCE and if present we add the
"src x.x.x.x" output. The source address will be always shown as it's
needed to match the entry to modify it from user-space.

Example:
 $ bridge mdb add dev bridge port ens13 grp 239.0.0.1 src 1.2.3.4 permanent vid 100
 $ bridge mdb show
 dev bridge port ens13 grp 239.0.0.1 src 1.2.3.4 permanent vid 100

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:07:25 -06:00
David Ahern f905191a48 Update kernel headers
Update kernel headers to commit:
    bc081a693a56 ("Merge branch 'Offload-tc-vlan-mangle-to-mscc_ocelot-switch'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:04:57 -06:00
Antony Antony 4322b13c8d ip xfrm: support setting XFRMA_SET_MARK_MASK attribute in states
The XFRMA_SET_MARK_MASK attribute can be set in states (4.19+)
It is optional and the kernel default is 0xffffffff
It is the mask of XFRMA_SET_MARK(a.k.a. XFRMA_OUTPUT_MARK in 4.18)

e.g.
./ip/ip xfrm state add output-mark 0x6 mask 0xab proto esp \
 auth digest_null 0 enc cipher_null ''
ip xfrm state
src 0.0.0.0 dst 0.0.0.0
	proto esp spi 0x00000000 reqid 0 mode transport
	replay-window 0
	output-mark 0x6/0xab
	auth-trunc digest_null 0x30 0
	enc ecb(cipher_null)
	anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
	sel src 0.0.0.0/0 dst 0.0.0.0/0

Signed-off-by: Antony Antony <antony@phenome.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-07 00:10:47 -06:00
Jiri Pirko 8dc1db80e4 devlink: Add health reporter test command support
Add health reporter test command and allow user to trigger a test event.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-07 00:08:53 -06:00
Jacob Keller 012164718b devlink: support setting the overwrite mask attribute
The recently added DEVLINK_ATTR_FLASH_UPDATE_OVERWRITE_MASK allows
userspace to indicate how a device should handle subsections of a flash
component when updating. For example, a flash component might contain
vital data such as PCIe serial number or configuration fields such as
settings that control device bootup.

The overwrite mask allows specifying whether the device should overwrite
these subsections when updating from the provided image. If nothing is
specified, then the update is expected to preserve all vital fields and
configuration.

Add support for specifying the overwrite mask using the new "overwrite"
option to the flash command line.

By specifying "overwrite identifiers", the user request that the flash
update should overwrite any settings in the updated flash component with
settings from the provided flash image

  $devlink dev flash pci/0000:af:00.0 file flash_image.bin overwrite identifiers

By specifying "overwrite settings" the user requests that the flash update
should overwrite any settings in the updated flash component with setting
values from the provided flash image.

  $devlink dev flash pci/0000:af:00.0 file flash_image.bin overwrite settings

These options may be combined, in which case both subsections will be sent
in the overwrite mask, resulting in a request to overwrite all settings and
identifiers stored in the updated flash components.

  $devlink dev flash pci/0000:af:00.0 file flash_image.bin overwrite settings overwrite identifiers

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-07 00:02:16 -06:00
David Ahern 34be2d2619 Update kernel headers
Update kernel headers to commit:
    9faebeb2d800 ("Merge branch 'ethtool-allow-dumping-policies-to-user-space'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-07 00:01:26 -06:00
Stephen Hemminger be1bea8432 addr: Fix noprefixroute and autojoin for IPv4
These were reported as IPv6-only and ignored:

     # ip address add 192.0.2.2/24 dev dummy5 noprefixroute
     Warning: noprefixroute option can be set only for IPv6 addresses
     # ip address add 224.1.1.10/24 dev dummy5 autojoin
     Warning: autojoin option can be set only for IPv6 addresses

This enables them back for IPv4.

Fixes: 9d59c86e57 ("iproute2: ip addr: Organize flag properties structurally")
Signed-off-by: Adel Belhouane <bugs.a.b@free.fr>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-06 15:15:56 -07:00
Eyal Birger e410c963e3 ipntable: add missing ndts_table_fulls ntable stat
Used for tracking neighbour table overflows.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-06 15:07:10 -07:00
Kamal Heib 10414de9e6 ip: iplink_ipoib.c: Remove extra spaces
Remove the extra space between the reported ipoib attrs - use only one
space instead of two.

Fixes: de0389935f ("iplink: Added support for the kernel IPoIB RTNL ops")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-09-30 22:29:05 -07:00
Ciara Loftus d2be31d9b6 ss: add support for xdp statistics
The patch exposes statistics for XDP sockets which can be useful for
debugging purposes.

The stats exposed are:
    rx dropped
    rx invalid
    rx queue full
    rx fill ring empty
    tx invalid
    tx ring empty

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-29 09:21:24 -06:00
David Ahern f481515c89 Update kernel headers
Update kernel headers to commit:
    280095713ce2 ("Merge branch 'ibmvnic-refactor-some-send-handle-functions'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-29 09:13:21 -06:00
Stephen Hemminger 03fb6fa1d8 uapi: update headers from 5.9-rc7
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-09-28 13:50:36 -07:00
Jan Engelhardt fece144abc build: avoid make jobserver warnings
I observe:

	» make -j8 CCOPTS=-ggdb3
	lib
	make[1]: warning: -j8 forced in submake: resetting jobserver mode.
	make[1]: Nothing to be done for 'all'.
	ip
	make[1]: warning: -j8 forced in submake: resetting jobserver mode.
	    CC       ipntable.o

MFLAGS is a historic variable of some kind; removing it fixes the
jobserver issue.

Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-09-28 13:49:24 -07:00
Jakub Kicinski b8663da049 ip: promote missed packets to the -s row
missed_packet_errors are much more commonly reported:

linux$ git grep -c '[.>]rx_missed_errors ' -- drivers/ | wc -l
64
linux$ git grep -c '[.>]rx_over_errors ' -- drivers/ | wc -l
37

Plus those drivers are generally more modern than those
using rx_over_errors.

Since recently merged kernel documentation makes this
preference official, let's make ip -s output more informative
and let rx_missed_errors take the place of rx_over_errors.

Before:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:0a:f7:c1:4d:38 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    6.04T      4.67G    0       0       0       67.7M
    RX errors: length   crc     frame   fifo    missed
               0        0       0       0       7
    TX: bytes  packets  errors  dropped carrier collsns
    3.13T      2.76G    0       0       0       0
    TX errors: aborted  fifo   window heartbeat transns
               0        0       0       0       6

After:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:0a:f7:c1:4d:38 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped missed  mcast
    6.04T      4.67G    0       0       7       67.7M
    RX errors: length   crc     frame   fifo    overrun
               0        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    3.13T      2.76G    0       0       0       0
    TX errors: aborted  fifo   window heartbeat transns
               0        0       0       0       6

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-22 20:23:29 -06:00
David Ahern cec67df974 Merge branch 'devlink-controller-external-info' into next
Parav Pandit  says:

====================

For certain devlink port flavours controller number and optionally external=
 attributes are reported by the kernel.

(a) controller number indicates that a given port belong to which local or =
external controller.
(b) external port attribute indicates that if a given port is for external =
or local controller.

This short series shows this attributes to user.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-22 20:17:48 -06:00
Parav Pandit 748cbad33b devlink: Show controller number of a devlink port
Show the controller number of the devlink port whenever kernel reports
it.

Example of a PCI VF port for an external controller number 1:

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev ens2f0c1pf0vf1 flavour pcivf controller 1 pfnum 0 vfnum 1 external true splittable false
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port show pci/0000:06:00.0/2 -jp
{
    "port": {
        "pci/0000:06:00.0/2": {
            "type": "eth",
            "netdev": "ens2f0c1pf0vf1",
            "flavour": "pcivf",
            "controller": 1,
            "pfnum": 0,
            "vfnum": 1,
            "external": true,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-22 20:13:09 -06:00
Parav Pandit 8fadd01101 devlink: Show external port attribute
If a port is for an external controller, port's external attribute is
set. Show such external attribute.

An example of an external controller port for PCI VF:

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev ens2f0c1pf0vf1 flavour pcivf pfnum 0 vfnum 1 external true splittable false
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port show pci/0000:06:00.0/2 -jp
{
    "port": {
        "pci/0000:06:00.0/2": {
            "type": "eth",
            "netdev": "ens2f0c1pf0vf1",
            "flavour": "pcivf",
            "pfnum": 0,
            "vfnum": 1,
            "external": true,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-22 20:13:04 -06:00
David Ahern 454429e8b4 Update kernel headers
Update kernel headers to commit:
    748d1c8a425e ("Merge branch 'devlink-Use-nla_policy-to-validate-range'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-22 20:10:43 -06:00
Roman Mashak aba44dc2ea ip: updated ip-link man page
Added description of link flags allmulticast, promisc and trailers.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-09-14 20:42:04 -07:00
Wei Wang ad34d5fadb iproute2: ss: add support to expose various inet sockopts
This commit adds support to expose the following inet socket options:
-- recverr
-- is_icsk
-- freebind
-- hdrincl
-- mc_loop
-- transparent
-- mc_all
-- nodefrag
-- bind_address_no_port
-- recverr_rfc4884
-- defer_connect
with the option --inet-sockopt. The individual option is only shown
when set.

Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-08 20:36:06 -06:00
David Ahern c8eb4b52c1 Update kernel headers
Update kernel headers to commit:
4349abdb409b ("net: dsa: don't print non-fatal MTU error if not supported")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-08 20:35:28 -06:00
Hoang Le abee772ff1 tipc: support 128bit node identity for peer removing
Problem:
In kernel upstream, we add the support to set node identity with
128bit. However, we are still using legacy format in command tipc
peer removing. Then, we got a problem when trying to remove
offline node i.e:

$ tipc node list
Node Identity                    Hash     State
d6babc1c1c6d                     1cbcd7ca down

$ tipc peer remove address d6babc1c1c6d
invalid network address, syntax: Z.C.N
error: No such device or address

Solution:
We add the support to remove a specific node down with 128bit
node identifier, as an alternative to legacy 32-bit node address.

Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-01 20:01:39 -06:00
Roopa Prabhu 6fd53b2a1c iplink: add support for protodown reason
This patch adds support for recently
added link IFLA_PROTO_DOWN_REASON attribute.
IFLA_PROTO_DOWN_REASON enumerates reasons
for the already existing IFLA_PROTO_DOWN link
attribute.

$ cat /etc/iproute2/protodown_reasons.d/r.conf
0 mlag
1 evpn
2 vrrp
3 psecurity

$ ip link set dev vx10 protodown on protodown_reason vrrp on
$ip link show dev vx10
14: vx10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
DEFAULT group default qlen 1000
    link/ether f2:32:28:b8:35:ff brd ff:ff:ff:ff:ff:ff protodown on
protodown_reason <vrrp>
$ip -p -j link show dev vx10
[ {
	<snip>
        "proto_down": true,
        "proto_down_reason": [ "vrrp" ]
} ]
$ip link set dev vx10 protodown_reason mlag on
$ip link show dev vx10
14: vx10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
DEFAULT group default qlen 1000
    link/ether f2:32:28:b8:35:ff brd ff:ff:ff:ff:ff:ff protodown on
protodown_reason <mlag,vrrp>
$ip -p -j link show dev vx10
[ {
	<snip>
        "proto_down": true,
        "protodown_reason": [ "mlag","vrrp" ]
} ]

$ip -p -j link show dev vx10
$ip link set dev vx10 protodown off protodown_reason vrrp off
Error: Cannot clear protodown, active reasons.
$ip link set dev vx10 protodown off protodown_reason mlag off
$

Note: for somereason the json and non-json key for protodown
are different (protodown and proto_down). I have kept the
same for protodown reason for consistency (protodown_reason and
proto_down_reason).

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-01 19:52:13 -06:00
Antony Antony af27494d2e ip xfrm: support printing XFRMA_SET_MARK_MASK attribute in states
The XFRMA_SET_MARK_MASK attribute is set in states (4.19+).
It is the mask of XFRMA_SET_MARK(a.k.a. XFRMA_OUTPUT_MARK in 4.18)

sample output: note the output-mark mask
ip xfrm state
	src 192.1.2.23 dst 192.1.3.33
	proto esp spi 0xSPISPI reqid REQID mode tunnel
	replay-window 32 flag af-unspec
	output-mark 0x3/0xffffff
	aead rfc4106(gcm(aes)) 0xENCAUTHKEY 128
	if_id 0x1

Signed-off-by: Antony Antony <antony@phenome.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-01 19:49:29 -06:00
David Ahern 275eed9be5 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-01 19:46:20 -06:00
Phil Sutter 23203b750e ip link: Fix indenting in help text
Indenting of 'ip link set' options below 'link-netns' was wrong, they
should be on the same level as the above.

While being at it, fix closing brackets in vf-specific options. Also
write node/port_guid parameters in upper-case without curly braces: They
are supposed to be replaced by values, not put literally.

Fixes: 8589eb4efd ("treewide: refactor help messages")
Fixes: 5a3ec4ba64 ("iplink: Update usage in help message")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-31 12:32:26 -07:00
Johannes Berg cc889b8241 genl: ctrl: support dumping netlink policy
Support dumping the netlink policy of a given generic netlink
family, the policy (with any sub-policies if appropriate) is
exported by the kernel in a general fashion.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-08-24 21:35:14 -06:00
Johannes Berg d5acae244f libnetlink: add nl_print_policy() helper
This prints out the data from the given nested attribute
to the given FILE pointer, interpreting the firmware that
the kernel has for showing netlink policies.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-08-24 21:35:07 -06:00
Johannes Berg 784fa9f62f libnetlink: add rtattr_for_each_nested() iteration macro
This is useful for iterating elements in a nested attribute,
if they're not parsed with a strict length limit or such.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-08-24 21:34:29 -06:00
Murali Karicheri ea6aeeb90c ip: iplink: prp: update man page for new parameter
PRP support requires a proto parameter which is 0 for hsr and 1 for
prp. Default is hsr and is backward compatible.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-22 21:14:12 -07:00
Murali Karicheri 68f027724b iplink: hsr: add support for creating PRP device similar to HSR
This patch enhances the iplink command to add a proto parameters to
create PRP device/interface similar to HSR. Both protocols are
quite similar and requires a pair of Ethernet interfaces. So re-use
the existing HSR iplink command to create PRP device/interface as
well. Use proto parameter to differentiate the two protocols.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-22 21:14:12 -07:00
Amit Cohen 8e6bce735a devlink: Add fflush() in cmd_mon_show_cb()
Similar to other print functions we need to flush buffered data
in order to work with pipes and output redirects.

Without it, stdout output is buffered and not written to the disk.

This is useful when writing scripts that rely on devlink-monitor output.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-22 21:13:11 -07:00
Sascha Hauer 7e7a1d107b iproute2: ip maddress: Check multiaddr length
ip maddress add|del takes a MAC address as argument, so insist on
getting a length of ETH_ALEN bytes. This makes sure the passed argument
is actually a MAC address and especially not an IPv4 address which
was previously accepted and silently taken as a MAC address.

While at it, do not print *argv in the error path as this has been
modified by ll_addr_a2n() and doesn't contain the full string anymore,
which can lead to misleading error messages.

Also while at it, replace the hardcoded buffer size with the actual
buffer size using sizeof().

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-22 21:12:30 -07:00
Stephen Hemminger bf538de59d uapi: update bpf.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-16 16:09:52 -07:00
Leon Romanovsky db6d6becb0 rdma: Properly print device and link names in CLI output
The citied commit broke the CLI output and printed ifindex/ifname
instead of dev/link.

Before:
[leonro@vm ~]$ rdma res show qp
link mlx5_0/lqpn 1 type GSI state RTS sq-psn 0 comm ib_core
[leonro@vm ~]$ rdma res show cq
ifindex 0 ifname rocep0s9 cqn 0 cqe 1023 users 2 poll-ctx WORKQUEUE adaptive-moderation on comm ib_core

After:
[leonro@vm ~]$ rdma res show qp
link mlx5_0/- lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core]
[leonro@vm ~]$ rdma res show cq
dev rocep0s9 cqn 0 cqe 1023 users 2 poll-ctx WORKQUEUE adaptive-moderation on comm [ib_core]

It was missed because rdmatool mostly used in JSON mode.

Fixes: b0a688a542 ("rdma: Rewrite custom JSON and prints logic to use common API")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-16 15:50:02 -07:00
Leon Romanovsky 7ded3c97b9 rdma: Fix owner name for the kernel resources
Owner of kernel resources is printed in different format than user
resources to easy with the reader by simply looking on the name.
The kernel owner will have "[ ]" around the name.

Before this change:
[leonro@vm ~]$ rdma res show qp
link rocep0s9/1 lqpn 1 type GSI state RTS sq-psn 58 comm ib_core

After this change:
[leonro@vm ~]$ rdma res show qp
link rocep0s9/1 lqpn 1 type GSI state RTS sq-psn 58 comm [ib_core]

Fixes: b0a688a542 ("rdma: Rewrite custom JSON and prints logic to use common API")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-16 15:50:02 -07:00
Stephen Hemminger 52d767aff8 uapi: update kernel headers
pre-rc1 version of Linux kernel headers.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-11 13:18:41 -07:00
Mark Zhang e8e8f16ed1 rdma: Document the new "pid" criteria for auto mode
Document the new supported criteria of auto mode. Examples:
$ rdma statistic qp set link mlx5_2/1 auto pid on
$ rdma statistic qp set link mlx5_2/1 auto pid,type on

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Ido Kalir <idok@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-06 16:26:12 +00:00
Mark Zhang e28133316d rdma: Add "PID" criteria support for statistic counter auto mode
With this new criteria, QPs have different PIDs will be bound to
different counters in auto mode. This can be used in combination with
other criteria like "type". Examples:

$ rdma statistic qp set link mlx5_2/1 auto pid on
$ rdma statistic qp set link mlx5_2/1 auto type,pid on
$ rdma statistic qp set link mlx5_2/1 auto off
$ rdma statistic qp show link mlx5_0 qp-type UD

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Ido Kalir <idok@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-06 16:26:04 +00:00
Mark Zhang cb69794736 rdma: update uapi headers
Update rdma_netlink.h file upto kernel commit 76251e15ea73
("RDMA/counter: Add PID category support in auto mode")

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Ido Kalir <idok@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-06 16:25:04 +00:00
David Ahern e572e3af0d Merge branch 'main' into next
Conflicts:
	bridge/fdb.c
	man/man8/bridge.8

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-06 16:21:35 +00:00
Stephen Hemminger 53159d8115 v5.8.0 2020-08-03 10:03:42 -07:00
Stephen Hemminger d530608d33 lnstat: use same version as iproute2
Lnstat was trying to be different and have its own version.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-03 10:02:47 -07:00
Stephen Hemminger fbef655568 replace SNAPSHOT with auto-generated version string
Replace the iproute2 snapshot with a version string which is
autogenerated as part of the build process using git describe.

This will also allow seeing if the version of the command
is built from the same sources is as upstream.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-03 10:02:47 -07:00
Vasundhara Volam 7332b188a6 devlink: Add board.serial_number to info subcommand.
Add support for reading board serial_number to devlink info
subcommand. Example:

$ devlink dev info pci/0000:af:00.0 -jp
{
    "info": {
        "pci/0000:af:00.0": {
            "driver": "bnxt_en",
            "serial_number": "00-10-18-FF-FE-AD-1A-00",
            "board.serial_number": "433551F+172300000",
            "versions": {
                "fixed": {
                    "board.id": "7339763 Rev 0.",
                    "asic.id": "16D7",
                    "asic.rev": "1"
                },
                "running": {
                    "fw": "216.1.216.0",
                    "fw.psid": "0.0.0",
                    "fw.mgmt": "216.1.192.0",
                    "fw.mgmt.api": "1.10.1",
                    "fw.ncsi": "0.0.0.0",
                    "fw.roce": "216.1.16.0"
                }
            }
        }
    }
}

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-03 15:35:56 +00:00
Petr Vaněk a7f1974f6e ip-xfrm: add support for oseq-may-wrap extra flag
This flag allows to create SA where sequence number can cycle in
outbound packets if set.

Signed-off-by: Petr Vaněk <pv@excello.cz>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-03 14:57:25 +00:00
David Ahern 91922a4121 Update kernel headers
Update kernel headers to commit:
    bd0b33b24897 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-03 14:56:28 +00:00
Danielle Ratson e5f4165a9e devlink: Expose port split ability
Add a new attribute that indicates the port split ability to devlink port.

Expose the attribute to user space as RO value, for example:

$devlink port show swp1
pci/0000:03:00.0/61: type eth netdev swp1 flavour physical port 1
splittable false lanes 1

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-03 14:37:14 +00:00
Danielle Ratson fcbc6c1c71 devlink: Expose number of port lanes
Add a new attribute that indicates the port's number of lanes to devlink port.

Expose the attribute to user space as RO value, for example:

$devlink port show swp1
pci/0000:03:00.0/61: type eth netdev swp1 flavour physical port 1 lanes 1

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-03 14:36:52 +00:00
Julien Fortin cb17e0cc57 bridge: fdb show: fix fdb entry state output for json context
bridge json fdb show is printing an incorrect / non-machine readable
value, when using -j (json output) we are expecting machine readable
data that shouldn't require special handling/parsing.

$ bridge -j fdb show | \
python -c \
'import sys,json;print(json.dumps(json.loads(sys.stdin.read()),indent=4))'
[
    {
	"master": "br0",
	"mac": "56:23:28:4f:4f:e5",
	"flags": [],
	"ifname": "vx0",
	"state": "state=0x80"  <<<<<<<<< with the patch: "state": "0x80"
    }
]

Fixes: c7c1a1ef51 ("bridge: colorize output and use JSON print library")

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-29 18:08:46 -07:00
Briana Oursler 41a9e38469 tc: Add space after format specifier
Add space after format specifier in print_string call. Fixes broken
qdisc tests within tdc testing suite. Per suggestion from Petr Machata,
remove a space and change spacing in tc/q_event.c to complete the fix.

Tested fix in tdc using:
./tdc.py -c qdisc

All qdisc RED tests return ok.

Fixes: d0e450438571("tc: q_red: Add support for qevents "mark" and "early_drop")
Signed-off-by: Briana Oursler <briana.oursler@gmail.com>
Tested-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-29 17:03:46 +00:00
Anton Danilov 65c0c4d21b bridge: fdb: the 'dynamic' option in the show/get commands
In most of cases a user wants to see only the dynamic mac addresses
in the fdb output. But currently the 'fdb show' displays tons of
various self entries, those only waste the output without any useful
goal.

New option 'dynamic' for 'show' and 'get' commands forces display
only relevant records.

Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-27 16:41:39 -07:00
Matthieu Baerts 3a53ff7e58 mptcp: show all endpoints when no ID is specified
According to 'ip mptcp help', 'endpoint show' can accept no argument:

  ip mptcp endpoint show [ id ID ]

It makes sense to print all endpoints when no filter is used.

So here if the following command is used, all endpoints are printed:

  ip mptcp endpoint show

Same as:

  ip mptcp endpoint

Fixes: 7e0767cd ("add support for mptcp netlink interface")
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-27 16:39:58 -07:00
David Ahern 1ca65af1c5 Merge branch 'devlink-port-health' into next
Moshe Shemesh  says:

====================

Implement commands for interaction with per-port devlink health
reporters. To do this, adapt devlink-health for usage of port handles
with any existing devlink-health subcommands. Add devlink-port health
subcommand as an alias for devlink-health.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-23 00:34:07 +00:00
Vladyslav Tarasiuk 1fe8c44bd9 devlink: Update devlink-health and devlink-port manpages
Describe support for per-port reporters in devlink-health and
devlink-port commands.

Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-23 00:32:37 +00:00
Vladyslav Tarasiuk 211c8d6ca9 devlink: Add devlink port health command
Add devlink port health show subcommand which displays information about
specified port reporter or all present port reporters as in the example.
Device and port reporters can be distinguished by a handle being used.

Make other devlink-health subcommands be aliased by devlink port health.
Refactor devlink-health commands for usage of port handles in order to
interact with port reporters.

Change devlink health show output to dump information about both device
and port reporters with correct handles.

Example:
$ devlink health show
pci/0000:00:0b.0:
  reporter fw
    state healthy error 0 recover 0 auto_dump true
  reporter fw_fatal
    state healthy error 0 recover 0 grace_period 1200000 auto_recover true auto_dump true
pci/0000:00:0b.0/1:
  reporter tx
    state healthy error 0 recover 0 grace_period 10000 auto_recover true auto_dump true
  reporter rx
    state healthy error 0 recover 0 grace_period 10000 auto_recover true auto_dump true

$ devlink health show pci/0000:00:0b.0/1 reporter rx
Which is equivalent to:
$ devlink port health show pci/0000:00:0b.0/1 reporter rx
pci/0000:00:0b.0/1:
  reporter rx
    state healthy error 0 recover 0 grace_period 10000 auto_recover true auto_dump true

$ devlink port health show pci/0000:00:0b.0/1 reporter rx -j --pretty
{
    "health": {
         "pci/0000:00:0b.0/1": [ {
                 "reporter": "rx",
                 "state": "healthy",
                 "error": 0,
                 "recover": 0,
                 "grace_period": 500,
                 "auto_recover": true,
                 "auto_dump": true
              } ]
    }
}

$ devlink health set pci/0000:00:0b.0/1 reporter rx grace_period 5000
Which is equivalent to:
$ devlink port health set pci/0000:00:0b.0/1 reporter rx grace_period 5000

$ devlink port health show pci/0000:00:0b.0/1 reporter rx
pci/0000:00:0b.0/1:
  reporter rx
    state healthy error 0 recover 0 grace_period 5000 auto_recover true auto_dump true

Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-23 00:32:32 +00:00
Vladyslav Tarasiuk e533faa72e devlink: Add a possibility to print arrays of devlink port handles
Add a capability of printing port handles for arrays in non-JSON format
in devlink-health manner.

Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-23 00:32:26 +00:00
Stephen Hemminger 848b1b8e04 uapi: update bpf.h
Upstrean 5.8-rc6 changes.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-21 09:18:15 -07:00
Guillaume Nault 4735df15a2 testsuite: Add tests for bareudp tunnels
Test the plain MPLS (unicast and multicast) and IP (v4 and v6) modes.
Also test the multiproto option for MPLS and for IP.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-20 13:30:55 -07:00
Anton Danilov 8f5a602f7a misc: make the pattern matching case-insensitive
To improve the usability better use case-insensitive pattern-matching
in ifstat, nstat and ss tools.

Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-20 13:29:55 -07:00
Jamie Gloudon 66702fb9ba tc/m_estimator: Print proper value for estimator interval in raw.
While looking at the estimator code, I noticed an incorrect interval
number printed in raw for the handles. This patch fixes the formatting.

Before patch:

root@bytecenter.fr:~# tc -r filter add dev eth0 ingress estimator
250ms 999ms matchall action police avrate 12mbit conform-exceed drop
[estimator i=4294967294 e=2]

After patch:

root@bytecenter.fr:~# tc -r filter add dev eth0 ingress estimator
250ms 999ms matchall action police avrate 12mbit conform-exceed drop
[estimator i=-2 e=2]

Signed-off-by: Jamie Gloudon <jamie.gloudon@gmx.fr>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-20 13:25:56 -07:00
David Ahern 7b6361bf61 Merge branch 'tc-qevent-block' into next
Petr Machata  says:

====================

When a list of filters at a given block is requested, tc first validates
that the block exists before doing the filter query. Currently the
validation routine checks ingress and egress blocks. But now that blocks
can be bound to qevents as well, qevent blocks should be looked for as
well:

    # ip link add up type dummy
    # tc qdisc add dev dummy1 root handle 1: \
         red min 30000 max 60000 avpkt 1000 qevent early_drop block 100
    # tc filter add block 100 pref 1234 handle 102 matchall action drop
    # tc filter show block 100
    Cannot find block "100"

This patchset fixes this issue:

    # tc filter show block 100
    filter protocol all pref 1234 matchall chain 0
    filter protocol all pref 1234 matchall chain 0 handle 0x66
      not_in_hw
            action order 1: gact action drop
             random type none pass val 0
             index 2 ref 1 bind 1

In patch #1, the helpers and necessary infrastructure is introduced,
including a new qdisc_util callback that implements sniffing out bound
blocks in a given qdisc.

In patch #2, RED implements the new callback.

v3:
- Patch #1:
    - Do not pass &ctx->found directly to has_block. Do it through a
      helper variable, so that the callee does not overwrite the result
      already stored in ctx->found.

v2:
- Patch #1:
    - In tc_qdisc_block_exists_cb(), do not initialize 'q'.
    - Propagate upwards errors from q->has_block.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-20 16:36:41 +00:00
Petr Machata 02dce2fdce tc: q_red: Implement has_block for RED
In order for "tc filter show block X" to find a given block, implement the
has_block callback.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-20 16:34:49 +00:00
Petr Machata af0e036c09 tc: Look for blocks in qevents
When a list of filters at a given block is requested, tc first validates
that the block exists before doing the filter query. Currently the
validation routine checks ingress and egress blocks. But now that blocks
can be bound to qevents as well, qevent blocks should be looked for as
well.

In order to support that, extend struct qdisc_util with a new callback,
has_block. That should report whether, give the attributes in TCA_OPTIONS,
a blocks with a given number is bound to a qevent. In
tc_qdisc_block_exists_cb(), invoke that callback when set.

Add a helper to the tc_qevent module that walks the list of qevents and
looks for a given block. This is meant to be used by the individual qdiscs.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-20 16:34:02 +00:00
Paolo Abeni 9c3be2c0ee ss: mptcp: add msk diag interface support
This implement support for MPTCP sockets type, comprising
extended socket info. Note that we need to add an extended
attribute carrying the actual protocol number to the diag
request.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-14 23:57:36 +00:00
David Ahern beaf281cff Update kernel headers
Update kernel headers to commit:
    81adcd65b685 ("ksz884x: switch from 'pci_' to 'dma_' API")

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-14 23:56:53 +00:00
David Ahern b78c480532 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-14 23:52:43 +00:00
Eyal Birger f33a871b80 ip xfrm: policy: support policies with IF_ID in get/delete/deleteall
The XFRMA_IF_ID attribute is set in policies for them to be
associated with an XFRM interface (4.19+).

Add support for getting/deleting policies with this attribute.

For supporting 'deleteall' the XFRMA_IF_ID attribute needs to be
explicitly copied.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-13 08:51:37 -07:00
Eyal Birger ee93c1107f ip xfrm: update man page on setting/printing XFRMA_IF_ID in states/policies
In commit aed63ae1ac ("ip xfrm: support setting/printing XFRMA_IF_ID attribute in states/policies")
I added the ability to set/print the xfrm interface ID without updating
the man page.

Fixes: aed63ae1ac ("ip xfrm: support setting/printing XFRMA_IF_ID attribute in states/policies")
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-13 08:51:37 -07:00
Hoang Huu Le ca75a86337 tipc: fixed a compile warning in tipc/link.c
Fixes: 5027f233e3 ("tipc: add link broadcast get")
Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-13 08:43:32 -07:00
Julien Fortin 8fc09aff8d bridge: fdb get: add missing json init (new_json_obj)
'bridge fdb get' has json support but the json object is never initialized

before patch:

$ bridge -j fdb get 56:23:28:4f:4f:e5 dev vx0
56:23:28:4f:4f:e5 dev vx0 master br0 permanent
$

after patch:

$ bridge -j fdb get 56:23:28:4f:4f:e5 dev vx0 | \
python -c \
'import sys,json;print(json.dumps(json.loads(sys.stdin.read()),indent=4))'
[
    {
        "master": "br0",
        "mac": "56:23:28:4f:4f:e5",
        "flags": [],
        "ifname": "vx0",
        "state": "permanent"
    }
]
$

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-13 08:41:42 -07:00
Tony Ambardar 650591a7a7 configure: support ipset version 7 with kernel version 5
The configure script checks for ipset v6 availability but doesn't test
for v7, which is backward compatible and used on kernel v5.x systems.
Update the script to test for both ipset versions. Without this change,
the tc ematch function em_ipset will be disabled.

Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-08 08:48:02 -07:00
Andrea Claudi a8d6f51c84 ip address: remove useless include
utils.h is included two times in ipaddress.c, there is no need for that.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-08 08:47:28 -07:00
Stephen Hemminger 0689785782 genl: use <> for system includes
Be consistent about local versus system headers.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-08 08:41:24 -07:00
Stephen Hemminger a12b203c78 rtacct: drop unused header 2020-07-08 08:40:20 -07:00
Stephen Hemminger d44bcd2fbf iplink_bareudp: use common include syntax
Follow the precedent of other parts of iproute2 follow the example of:
  Standard libc headers
  Linux headers

  Iproute2 support headers

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-08 08:38:58 -07:00
Louis Peens 7c8d7848c7 devlink: add 'disk' to 'fw_load_policy' string validation
The 'fw_load_policy' devlink parameter supports the 'disk' value
since kernel v5.4, seems like there was some oversight in adding
this to iproute, fixed by this patch.

Signed-off-by: Louis Peens <louis.peens@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:14:57 -07:00
Ido Schimmel 2d4c3f65e2 devlink: Document zero policer identifier
When setting a policer to a trap group, a value of "0" will unbind the
currently bound policer from the group.

The behavior is intentional and tested in kernel selftests, so document
it.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Suggested-by: Alex Kushnarov <alexanderk@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:14:24 -07:00
Guillaume Nault eb09a15c12 tc: flower: support multiple MPLS LSE match
Add the new "mpls" keyword that can be used to match MPLS fields in
arbitrary Label Stack Entries.
LSEs are introduced by the "lse" keyword and followed by LSE options:
"depth", "label", "tc", "bos" and "ttl". The depth is manadtory, the
other options are optionals.

For example, the following filter drops MPLS packets having two labels,
where the first label is 21 and has TTL 64 and the second label is 22:

$ tc filter add dev ethX ingress proto mpls_uc flower mpls \
    lse depth 1 label 21 ttl 64 \
    lse depth 2 label 22 bos 1 \
    action drop

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:12:43 -07:00
Guillaume Nault a6c5c952ab ip link: initial support for bareudp devices
Bareudp devices provide a generic L3 encapsulation for tunnelling
different protocols like MPLS, IP, NSH, etc. inside a UDP tunnel.

This patch is based on original work from Martin Varghese:
https://lore.kernel.org/netdev/1570532361-15163-1-git-send-email-martinvarghesenokia@gmail.com/

Examples:

  - ip link add dev bareudp0 type bareudp dstport 6635 ethertype mpls_uc

This creates a bareudp tunnel device which tunnels L3 traffic with
ethertype 0x8847 (unicast MPLS traffic). The destination port of the
UDP header will be set to 6635. The device will listen on UDP port 6635
to receive traffic.

  - ip link add dev bareudp0 type bareudp dstport 6635 ethertype ipv4 multiproto

Same as the MPLS example, but for IPv4. The "multiproto" keyword allows
the device to also tunnel IPv6 traffic.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:11:05 -07:00
Dmitry Yakunin 8f1cd119b3 lib: fix checking of returned file handle size for cgroup
Before this patch check is happened only in case when we try to find
cgroup at cgroup2 mount point.

v2:
  - add Fixes line before Signed-off-by (David Ahern)

Fixes: d5e6ee0dac ("ss: introduce cgroup2 cache and helper functions")
Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:05:54 -07:00
Sorah Fukumori 9e5d246877 ip fou: respect preferred_family for IPv6
ip(8) accepts -family ipv6 (-6) option at the toplevel. It is
straightforward to support the existing option for modifying listener
on IPv6 addresses.

Maintain the backward compatibility by leaving ip fou -6 flag
implemented, while it's removed from the usage message.

Signed-off-by: Sorah Fukumori <her@sorah.jp>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:03:09 -07:00
Anton Danilov d80a05b795 tc: improve the qdisc show command
Before can be possible show only all qeueue disciplines on an interface.
There wasn't a way to get the qdisc info by handle or parent, only full
dump of the disciplines with a following grep/sed usage.

Now new and old options work as expected to filter a qdisc by handle or
parent.

Full syntax of the qdisc show command:

tc qdisc { show | list } [ dev STRING ] [ QDISC_ID ] [ invisible ]
  QDISC_ID := { root | ingress | handle QHANDLE | parent CLASSID }

This change doesn't require any changes in the kernel.

Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:00:51 -07:00
Stephen Hemminger 085622b1f5 uapi: update bpf.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:00:51 -07:00
Bjarni Ingi Gislason 860a5d12d5 devlint-health.8: use a single-font macro for a single argument
Use a single font macro for a single argument.

  Remove unnecessary quotes for a single-font macro.

  Join two lines into one.

  The output of "nroff" and "groff" is unchanged.

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:00:47 -07:00
Bjarni Ingi Gislason f9bc806c9d devlink-dev.8: use a single-font macro for one argument
Use a single-font macro for one argument.

  Remove unnecessary quotes for a single font macro.

  Join some lines into one.

  The output of "nroff" and "groff" is unchanged, except for a font
change in two lines.

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:00:38 -07:00
Bjarni Ingi Gislason 472fb39d55 devlink.8: Use a single-font macro for a single argument
Use a single-font macro for a single argument

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:00:34 -07:00
Bjarni Ingi Gislason 57cfcc62af man8/bridge.8: fix misuse of two-fonts macros
Use a single-font macro for a single argument.

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:00:28 -07:00
Bjarni Ingi Gislason 2df0dc2437 libnetlink.3: display section numbers in roman font, not boldface
Typeset section numbers in roman font, see man-pages(7).

###

  Details:

Output is from: test-groff -b -mandoc -T utf8 -rF0 -t -w w -z

  [ "test-groff" is a developmental version of "groff" ]

<./man/man3/libnetlink.3>:53 (macro BR): only 1 argument, but more are expected
<./man/man3/libnetlink.3>:132 (macro BR): only 1 argument, but more are expected
<./man/man3/libnetlink.3>:134 (macro BR): only 1 argument, but more are expected
<./man/man3/libnetlink.3>:197 (macro BR): only 1 argument, but more are expected
<./man/man3/libnetlink.3>:198 (macro BR): only 1 argument, but more are expected

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
2020-07-06 10:46:23 -07:00
David Ahern a5c9d01c5c Merge branch 'rdma-raw-format-dumps' into next
Leon Romanovsky  says:

====================

The following series adds support to get the RDMA resource data in RAW
format. The main motivation for doing this is to enable vendors to
return the entire QP/CQ/MR data without a need from the vendor to set
each field separately.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 18:11:49 +00:00
Maor Gottlieb e2bbf737e6 rdma: Add support to get MR in raw format
Add the required support to print MR data in raw format.
Example:

$rdma res show mr dev mlx5_1 mrn 2 -r -j
[{"ifindex":7,"ifname":"mlx5_1",
"data":[0,4,255,254,0,0,0,0,0,0,0,0,16,28,0,216,...]}]

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 18:11:37 +00:00
Maor Gottlieb 94323e9611 rdma: Add support to get CQ in raw format
Add the required support to print CQ data in raw format.
Example:

$rdma res show cq dev mlx5_2 cqn 1 -r -j
[{"ifindex":8,"ifname":"mlx5_2",
"data":[0,4,255,254,0,0,0,0,0,0,0,0,16,28,...]}]

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 18:11:33 +00:00
Maor Gottlieb 7c01e0fc9c rdma: Add support to get QP in raw format
Add 'raw' argument to get the resource in raw format.
When RDMA_NLDEV_ATTR_RES_RAW is set in the netlink message,
then the resource fields are in raw format, print it as byte array.

Example:
$rdma res show qp link rocep0s12f0/1 lqpn 1137 -j -r
[{"ifindex":7,"ifname":"mlx5_1","port":1,
"data":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...]}]

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 18:11:00 +00:00
Maor Gottlieb 8f23492823 rdma: update uapi headers
Update rdma_netlink.h file upto kernel commit 65959522f806
("RDMA: Add support to dump resource tracker in RAW format")

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 18:10:50 +00:00
David Ahern 79ea01927c Merge branch 'tc-qevents' into next
Petr Machata  says:

====================

To allow configuring user-defined actions as a result of inner workings of
a qdisc, a concept of qevents was recently introduced to the kernel.
Qevents are attach points for TC blocks, where filters can be put that are
executed as the packet hits well-defined points in the qdisc algorithms.
The attached blocks can be shared, in a manner similar to clsact ingress
and egress blocks, arbitrary classifiers with arbitrary actions can be put
on them, etc.

For example:

 # tc qdisc add dev eth0 root handle 1: \
	red limit 500K avpkt 1K qevent early_drop block 10
 # tc filter add block 10 \
	matchall action mirred egress mirror dev eth1

This patch set introduces the corresponding iproute2 support. Patch #1 adds
the new netlink attribute enumerators. Patch #2 adds a set of helpers to
implement qevents, and #3 adds a generic documentation to tc.8. Patch #4
then adds two new qevents to the RED qdisc: mark and early_drop.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:45:48 +00:00
Petr Machata d0e4504385 tc: q_red: Add support for qevents "mark" and "early_drop"
The "early_drop" qevent matches packets that have been early-dropped. The
"mark" qevent matches packets that have been ECN-marked.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:37:49 +00:00
Petr Machata 3cf51fb3c8 man: tc: Describe qevents
Add some general remarks about qevents.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:37:45 +00:00
Petr Machata 01bb0bcd00 tc: Add helpers to support qevent handling
Introduce a set of helpers to make it easy to add support for qevents into
qdisc.

The idea behind this is that qevent types will be generally reused between
qdiscs, rather than each having a completely idiosyncratic set of qevents.
The qevent module holds functions for parsing, dumping and formatting of
these common qevent types, and for dispatch to the appropriate set of
handlers based on the qevent name.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:37:27 +00:00
Po Liu bc4d9f982f action police: make 'mtu' could be set independently in police action
Current police action must set 'rate' and 'burst'. 'mtu' parameter
set the max frame size and could be set alone without 'rate' and 'burst'
in some situation. Offloading to hardware for example, 'mtu' could limit
the flow max frame size.

Signed-off-by: Po Liu <po.liu@nxp.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:34:04 +00:00
Po Liu 3c5570706b action police: change the print message quotes style
Change the double quotes to single quotes in fprintf message to make it
more readable.

Signed-off-by: Po Liu <po.liu@nxp.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:33:59 +00:00
Alexandre Cassen 30f3beea0d add support to keepalived rtm_protocol
Following inclusion in net-next, extend rtnl_rtprot_tab and rt_protos
to support Keepalived.

Signed-off-by: Alexandre Cassen <acassen@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:03:45 +00:00
David Ahern 482e463d6c Merge branch 'devlink-port-mac-addr' into next
Parav Pandit  says:

====================

Currently ip link set dev <pfndev> vf <vf_num> <param> <value> has
few below limitations.

1. Command is limited to set VF parameters only.
It cannot set the default MAC address for the PCI PF.

2. It can be set only on system where PCI SR-IOV is supported.
In smartnic based system, eswitch of a NIC resides on a different
embedded cpu which has the VF and PF representors for the SR-IOV
support on a host system in which this smartnic is plugged-in.

3. It cannot setup the function attributes of sub-function described
in detail in comprehensive RFC [1] and [2].

This series covers the first small part to let user query and set MAC
address (hardware address) of a PCI PF/VF which is represented by
devlink port.

[1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/
[2] https://marc.info/?l=linux-netdev&m=158555928517777&w=2

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 14:49:53 +00:00
Parav Pandit 4dca81e9a8 devlink: Support setting port function hardware address
Support setting devlink port function hardware address.

Example of a PCI VF port which supports a port function:
Set hardware address of the VF's port function.

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port function set pci/0000:06:00.0/2 hw_addr 00:11:22:33:44:55

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
  function:
    hw_addr 00:11:22:33:44:55

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 14:49:32 +00:00
Parav Pandit b3adafd154 devlink: Support querying hardware address of port function
Add support to query the hardware address of function represented
by devlink port function.

Example of a PCI VF port which supports a port function:
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
  function:
    hw_addr 00:11:22:33:44:66

$ devlink port show pci/0000:06:00.0/2 -jp
{
    "port": {
        "pci/0000:06:00.0/2": {
            "type": "eth",
            "netdev": "enp6s0pf0vf1",
            "flavour": "pcivf",
            "pfnum": 0,
            "vfnum": 1,
            "function": {
                "hw_addr": "00:11:22:33:44:66"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 14:49:22 +00:00
Parav Pandit 2de449df19 devlink: Move devlink port code at start to reuse
To reuse print routines for port function in subsequent patch, move
print routine specific to devlink device at start of the file.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 14:48:34 +00:00
David Ahern e17466e484 Update kernel headers
Update kernel headers to commit:
   e1f046704404 ("Merge branch 'qlogic-use-generic-power-management'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 14:33:15 +00:00
Stephen Hemminger 2f31d12a25 man/tc: remove obsolete reference to ipchains
It isn't Linux 2.2 anymore.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-24 12:13:46 -07:00
Roi Dayan 473d18e219 ip address: Fix loop initial declarations are only allowed in C99
On some distros, i.e. rhel 7.6, compilation fails with the following:

ipaddress.c: In function ‘lookup_flag_data_by_name’:
ipaddress.c:1260:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
  for (int i = 0; i < ARRAY_SIZE(ifa_flag_data); ++i) {
  ^
ipaddress.c:1260:2: note: use option -std=c99 or -std=gnu99 to compile your code

This commit fixes the single place needed for compilation to pass.

Fixes: 9d59c86e57 ("iproute2: ip addr: Organize flag properties structurally")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-11 15:05:20 -07:00
Stephen Hemminger 3d66d83d25 uapi: update to magic.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-11 09:52:38 -07:00
Ido Schimmel abda1e9d2b devlink: Add 'mirror' trap action
Allow setting 'mirror' trap action for traps that support it. Extend the
devlink-trap man page and bash completion accordingly.

Example:

# devlink -jp trap show netdevsim/netdevsim10 trap igmp_query
{
    "trap": {
        "netdevsim/netdevsim10": [ {
                "name": "igmp_query",
                "type": "control",
                "generic": true,
                "action": "mirror",
                "group": "mc_snooping"
            } ]
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-11 09:51:10 -07:00
Ido Schimmel fd71244a20 devlink: Add 'control' trap type
This type is used for traps that trap control packets such as ARP
request and IGMP query to the CPU.

Example:

# devlink -jp trap show netdevsim/netdevsim10 trap igmp_v1_report
{
    "trap": {
        "netdevsim/netdevsim10": [ {
                "name": "igmp_v1_report",
                "type": "control",
                "generic": true,
                "action": "trap",
                "group": "mc_snooping"
            } ]
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-11 09:51:10 -07:00
Stephen Hemminger 12fafa27c7 devlink: update include files
Use the tool iwyu to get more complete list of includes for
all the bits used by devlink.

This should also fix build with musl libc.

Fixes: c4dfddccef ("fix JSON output of mon command")
Reported-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-11 09:49:46 -07:00
Roopa Prabhu 468f787f64 bridge: support for nexthop id in fdb entries
This patch adds support to assign a nexthop group
id to an fdb entry.

$bridge fdb add 02:02:00:00:00:13 dev vx10 nhid 102 self

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-06-11 15:52:58 +00:00
Roopa Prabhu a56d17463c ipnexthop: support for fdb nexthops
This patch adds support to add and delete
ecmp nexthops of type fdb. Such nexthops can
be linked to vxlan fdb entries.

$ip nexthop add id 12 via 172.16.1.2 fdb
$ip nexthop add id 13 via 172.16.1.3 fdb
$ip nexthop add id 102 group 12/13 fdb

$bridge fdb add 02:02:00:00:00:13 dev vx10 nhid 102 self

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-06-11 15:52:29 +00:00
David Ahern 5f6f17db3b Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-06-08 14:40:54 +00:00
Stephen Hemminger e4932ae6b3 uapi: update headers
Update kernel headers from 5.8.0 merge

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-05 08:36:54 -07:00
Stephen Hemminger 0a5dbbeddb Merge git://git.kernel.org/pub/scm/network/iproute2/iproute2-next 2020-06-05 08:33:29 -07:00
Stephen Hemminger 1bfa3b3f66 v5.7.0 2020-06-02 20:35:00 -07:00
Donald Sharp 2c78aba2fb nexthop: Fix Deletion display
Actually display that deletions are happening
when monitoring nexthops.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-01 08:08:46 -07:00
Stephen Hemminger 6facadcfb6 uapi: fix comment in xfrm.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-01 08:07:02 -07:00
Ian K. Coolidge 5413a735a6 iproute2: ip addr: Add support for setting 'optimistic'
optimistic DAD is controllable via sysctl for an interface
or all interfaces on the system. This would affect addresses
added by the kernel only.

Recent kernels, however, have enabled support for adding optimistic
address via userspace. This plumbs that support.

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-31 23:01:33 +00:00
Ian K. Coolidge 9d59c86e57 iproute2: ip addr: Organize flag properties structurally
This creates a nice systematic way to check that the various flags are
mutable from userspace and that the address family is valid.

Mutability properties are preserved to avoid introducing any behavioral
change in this CL. However, previously, immutable flags were ignored and
fell through to this confusing error:

Error: either "local" is duplicate, or "dadfailed" is a garbage.

But now, they just warn more explicitly:

Warning: dadfailed option is not mutable from userspace
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-31 23:01:22 +00:00
Roman Mashak bd4b8c632e tc: report time an action was first used
Have print_tm() dump firstuse value along with install, lastuse
and expires.

v2: Resubmit after 'master' merged into next

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-31 22:51:19 +00:00
Andrea Claudi 354efaec38 bpf: Fixes a snprintf truncation warning
gcc v9.3.1 reports:

bpf.c: In function ‘bpf_get_work_dir’:
bpf.c:784:49: warning: ‘snprintf’ output may be truncated before the last format character [-Wformat-truncation=]
  784 |  snprintf(bpf_wrk_dir, sizeof(bpf_wrk_dir), "%s/", mnt);
      |                                                 ^
bpf.c:784:2: note: ‘snprintf’ output between 2 and 4097 bytes into a destination of size 4096
  784 |  snprintf(bpf_wrk_dir, sizeof(bpf_wrk_dir), "%s/", mnt);
      |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix this simply checking snprintf return code and properly handling the error.

Fixes: e42256699c ("bpf: make tc's bpf loader generic and move into lib")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-27 15:05:25 -07:00
Andrea Claudi 358abfe004 Revert "bpf: replace snprintf with asprintf when dealing with long buffers"
This reverts commit c0325b0638.
It introduces a segfault in bpf_make_custom_path() when custom pinning is used.

This happens because asprintf allocates exactly the space needed to hold a
string in the buffer passed as its first argument, but if this buffer is later
used in strcat() or similar we have a buffer overrun.

As the aim of commit c0325b0638 is simply to fix a compiler warning, it
seems safe and reasonable to revert it.

Fixes: c0325b0638 ("bpf: replace snprintf with asprintf when dealing with long buffers")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-27 15:05:25 -07:00
David Ahern e50290e687 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-27 02:08:27 +00:00
Tuong Lien 9a25abde3a tipc: enable printing of broadcast rcv link stats
This commit allows printing the statistics of a broadcast-receiver link
using the same tipc command, but with additional 'link' options:

$ tipc link stat show --help
Usage: tipc link stat show [ link { LINK | SUBSTRING | all } ]

With:
+ 'LINK'      : print the stats of the specific link 'LINK';
+ 'SUBSTRING' : print the stats of all the links having the 'SUBSTRING'
                in name;
+ 'all'       : print all the links' stats incl. the broadcast-receiver
                ones;

Also, a link stats can be reset in the usual way by specifying the link
name in command.

For example:

$ tipc l st sh l br
Link <broadcast-link>
  Window:50 packets
  RX packets:0 fragments:0/0 bundles:0/0
  TX packets:5011125 fragments:4968774/149643 bundles:38402/307061
  RX naks:781484 defs:0 dups:0
  TX naks:0 acks:0 retrans:330259
  Congestion link:50657  Send queue max:0 avg:0

Link <broadcast-link:1001001>
  Window:50 packets
  RX packets:95146 fragments:95040/1980 bundles:1/2
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:380938 defs:83962 dups:403
  TX naks:8362 acks:0 retrans:170662
  Congestion link:0  Send queue max:0 avg:0

Link <broadcast-link:1001002>
  Window:50 packets
  RX packets:0 fragments:0/0 bundles:0/0
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:400546 defs:0 dups:0
  TX naks:0 acks:0 retrans:159597
  Congestion link:0  Send queue max:0 avg:0

$ tipc l st sh l 1001002
Link <1001003:data0-1001002:data0>
  ACTIVE  MTU:1500  Priority:10  Tolerance:1500 ms  Window:50 packets
  RX packets:99546 fragments:0/0 bundles:33/877
  TX packets:629 fragments:0/0 bundles:35/828
  TX profile sample:8 packets average:390 octets
  0-64:75% -256:0% -1024:0% -4096:25% -16384:0% -32768:0% -66000:0%
  RX states:488714 probes:7397 naks:0 defs:4 dups:5
  TX states:27734 probes:18016 naks:5 acks:2305 retrans:0
  Congestion link:0  Send queue max:0 avg:0

Link <broadcast-link:1001002>
  Window:50 packets
  RX packets:0 fragments:0/0 bundles:0/0
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:400546 defs:0 dups:0
  TX naks:0 acks:0 retrans:159597
  Congestion link:0  Send queue max:0 avg:0

$ tipc l st re l broadcast-link:1001002

$ tipc l st sh l broadcast-link:1001002
Link <broadcast-link:1001002>
  Window:50 packets
  RX packets:0 fragments:0/0 bundles:0/0
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:0 defs:0 dups:0
  TX naks:0 acks:0 retrans:0
  Congestion link:0  Send queue max:0 avg:0

Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-27 02:07:22 +00:00
Alexander Aring 9f91f1b7b8 lwtunnel: add support for rpl segment routing
This patch adds support for rpl segment routing settings.
Example:

ip -n ns0 -6 route add 2001::3 encap rpl segs \
fe80::c8fe:beef:cafe:cafe,fe80::c8fe:beef:cafe:beef dev lowpan0

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-27 00:03:17 +00:00
Roman Mashak db35e411ec tc: action: fix time values output in JSON format
Report tcf_t values in seconds, not jiffies, in JSON format as it is now
for stdout.

v2: use PRINT_ANY, drop the useless casts and fix the style (Stephen Hemminger)

Fixes: 2704bd6255 ("tc: jsonify actions core")
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-19 21:19:04 -07:00
Stephen Hemminger 1c7aa12104 uapi: update to bpf.h
Part of the zero-length array changes

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-19 14:31:54 -07:00
Eric Dumazet d7c67a6ed4 utils: remove trailing zeros in print_time() and print_time64()
Before :

tc qd sh dev eth1

... refill_delay 40.0ms timer_slack 10.000us horizon 10.000s

After :
... refill_delay 40ms timer_slack 10us horizon 10s

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-19 14:30:30 -07:00
Paul Blakey 924c43778a man: tc-ct.8: Add manual page for ct tc action
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-19 14:30:24 -07:00
Maciej Fijalkowski 42796dcd36 tc: mqprio: reject queues count/offset pair count higher than num_tc
Provide a sanity check that will make sure whether queues count/offset
pair count will not exceed the actual number of TCs being created.

Example command that is invalid because there are 4 count/offset pairs
whereas num_tc is only 2.

 # tc qdisc add dev enp96s0f0 root mqprio num_tc 2 map 0 0 0 0 1 1 1 1
queues 4@0 4@4 4@8 4@12 hw 1 mode channel

Store the parsed count/offset pair count onto a dedicated variable that
will be compared against opt.num_tc after all of the command line
arguments were parsed. Bail out if this count is higher than opt.num_tc
and let user know about it.

Drivers were swallowing such commands as they were iterating over
count/offset pairs where num_tc was used as a delimiter, so this is not
a big deal, but better catch such misconfiguration at the command line
argument parsing level.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-18 14:57:15 +00:00
Dmitry Yakunin 7bd9188581 ss: add checks for bc filter support
As noted by David Ahern, now if some bytecode filter is not supported
by running kernel printed error message is not clear. This patch is attempt to
detect such case and print correct message. This is done by providing checking
function for new filter types. As example check function for cgroup filter
is implemented. It sends correct lightweight request (idiag_states = 0)
with zero cgroup condition to the kernel and checks returned errno. If filter
is not supported EINVAL is returned. Result of checking is cached to
avoid extra checks if several same filters are specified.

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-13 14:28:38 +00:00
Dmitry Yakunin 14f4bda590 ss: add support for cgroup v2 information and filtering
This patch introduces two new features: obtaining cgroup information and
filtering sockets by cgroups. These features work based on cgroup v2 ID
field in the socket (kernel should be compiled with CONFIG_SOCK_CGROUP_DATA).

Cgroup information can be obtained by specifying --cgroup flag and now contains
only pathname. For faster pathname lookups cgroup cache is implemented. This
cache is filled on ss startup and missed entries are resolved and saved
on the fly.

Cgroup filter extends EXPRESSION and allows to specify cgroup pathname
(relative or absolute) to obtain sockets attached only to this cgroup.
Filter syntax: ss [ cgroup PATHNAME ]
Examples:
    ss -a cgroup /sys/fs/cgroup/unified (or ss -a cgroup .)
    ss -a cgroup /sys/fs/cgroup/unified/cgroup1 (or ss -a cgroup cgroup1)

v2:
  - style fixes (David Ahern)

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-13 14:28:35 +00:00
Dmitry Yakunin d5e6ee0dac ss: introduce cgroup2 cache and helper functions
This patch prepares infrastructure for matching sockets by cgroups.
Two helper functions are added for transformation between cgroup v2 ID
and pathname. Cgroup v2 cache is implemented as hash table indexed by ID.
This cache is needed for faster lookups of socket cgroup.

v2:
  - style fixes (David Ahern)

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-13 14:28:04 +00:00
Po Liu 965a5f6a1b iproute2-next: add gate action man page
This patch is to add the man page for the tc gate action.

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-13 02:20:12 +00:00
Po Liu 07d5ee70b5 iproute2-next:tc:action: add a gate control action
Introduce a ingress frame gate control flow action.
Tc gate action does the work like this:
Assume there is a gate allow specified ingress frames can pass at
specific time slot, and also drop at specific time slot. Tc filter
chooses the ingress frames, and tc gate action would specify what slot
does these frames can be passed to device and what time slot would be
dropped.
Tc gate action would provide an entry list to tell how much time gate
keep open and how much time gate keep state close. Gate action also
assign a start time to tell when the entry list start. Then driver would
repeat the gate entry list cyclically.
For the software simulation, gate action require the user assign a time
clock type.

Below is the setting example in user space. Tc filter a stream source ip
address is 192.168.0.20 and gate action own two time slots. One is last
200ms gate open let frame pass another is last 100ms gate close let
frames dropped.

 # tc qdisc add dev eth0 ingress
 # tc filter add dev eth0 parent ffff: protocol ip \

            flower src_ip 192.168.0.20 \
            action gate index 2 clockid CLOCK_TAI \
            sched-entry open 200000000ns -1 8000000b \
            sched-entry close 100000000ns

 # tc chain del dev eth0 ingress chain 0

"sched-entry" follow the name taprio style. Gate state is
"open"/"close". Follow the period nanosecond. Then next -1 is internal
priority value means which ingress queue should put to. "-1" means
wildcard. The last value optional specifies the maximum number of
MSDU octets that are permitted to pass the gate during the specified
time interval, the overlimit frames would be dropped.

Below example shows filtering a stream with destination mac address is
10:00:80:00:00:00 and ip type is ICMP, follow the action gate. The gate
action would run with one close time slot which means always keep close.
The time cycle is total 200000000ns. The base-time would calculate by:

     1357000000000 + (N + 1) * cycletime

When the total value is the future time, it will be the start time.
The cycletime here would be 200000000ns for this case.

 #tc filter add dev eth0 parent ffff:  protocol ip \
           flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \
           action gate index 12 base-time 1357000000000ns \
           sched-entry CLOSE 200000000ns \
           clockid CLOCK_TAI

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-13 02:19:46 +00:00
David Ahern 0e9b227e2d Update kernel headers and import tc_gate.h
Update kernel headers to commit:
    fb9f2e92864f ("net: dsa: tag_sja1105: appease sparse checks for ethertype accessors")
and import tc_act/tc_gate.h

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-13 02:18:15 +00:00
Eric Dumazet 0ecb90b33c tc: fq: fix two issues
My latest patch missed the fact that this file got JSON support.

Also fixes a spelling error added during JSON change.

Fixes: be9ca9d541 ("tc: fq: add timer_slack parameter")
Fixes: d15e2bfc04 ("tc: fq: add support for JSON output")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-05 10:27:26 -07:00
Stephen Hemminger 8142c76232 ss: update to bw print
Display kilobit with the standard suffix.
Add comment to describe where data rate suffixes come from.
Add support for terrabit.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-05 10:18:58 -07:00
Jakub Kicinski ec04b6fc24 devlink: support kernel-side snapshot id allocation
Make ID argument optional and read the snapshot info
that kernel sends us.

$ devlink region new netdevsim/netdevsim1/dummy
netdevsim/netdevsim1/dummy: snapshot 0
$ devlink -jp region new netdevsim/netdevsim1/dummy
{
    "regions": {
        "netdevsim/netdevsim1/dummy": {
            "snapshot": [ 1 ]
        }
    }
}
$ devlink region show netdevsim/netdevsim1/dummy
netdevsim/netdevsim1/dummy: size 32768 snapshot [0 1]

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-05 17:10:27 +00:00
Eric Dumazet e133fa9c73 ss: add support for Gbit speeds in sprint_bw()
Also use 'g' specifier instead of 'f' to remove trailing zeros,
and increase precision.

Examples of output :
 Before        After
 8.0Kbps       8Kbps
 9.9Mbps       9.92Mbps
 55001Mbps     55Gbps

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-05 09:50:22 -07:00
David Ahern 8c109059b5 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-05 16:49:38 +00:00
David Ahern c1b21f5286 Import rpl.h and rpl_iptunnel.h uapi headers
Import rpl.h and rpl_iptunnel.h as of kernel commit:
    354d86141796 ("Merge branch 'net-reduce-dynamic-lockdep-keys'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-05 16:23:14 +00:00
Davide Caratti 3175bca718 tc: full JSON support for 'bpf' filter
example using eBPF:

 # tc filter add dev dummy0 ingress bpf \
 > direct-action obj ./bpf/filter.o sec tc-ingress
 # tc  -j filter show dev dummy0 ingress | jq
 [
   {
     "protocol": "all",
     "pref": 49152,
     "kind": "bpf",
     "chain": 0
   },
   {
     "protocol": "all",
     "pref": 49152,
     "kind": "bpf",
     "chain": 0,
     "options": {
       "handle": "0x1",
       "bpf_name": "filter.o:[tc-ingress]",
       "direct-action": true,
       "not_in_hw": true,
       "prog": {
         "id": 101,
         "tag": "a04f5eef06a7f555",
         "jited": 1
       }
     }
   }
 ]

v2:
 - use print_nl(), thanks to Andrea Claudi
 - use print_0xhex() for filter handle, thanks to Stephen Hemminger

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-05 16:19:06 +00:00
David Ahern ae57e82da0 Update kernel headers
Update kernel headers to commit:
    354d86141796 ("Merge branch 'net-reduce-dynamic-lockdep-keys'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-05 16:11:22 +00:00
Benjamin Poirier 0501fe734f Replace open-coded instances of print_nl()
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-04 17:13:53 -07:00
Benjamin Poirier e0c457b1a5 bridge: Align output columns
Use fixed column widths to improve readability.

Before:
root@vsid:/src/iproute2# ./bridge/bridge vlan tunnelshow
port    vlan-id tunnel-id
vx0      2       2
         1010-1020       1010-1020
         1030    65556
vx-longname      2       2

After:
root@vsid:/src/iproute2# ./bridge/bridge vlan tunnelshow
port              vlan-id    tunnel-id
vx0               2          2
                  1010-1020  1010-1020
                  1030       65556
vx-longname       2          2

Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-04 17:13:53 -07:00
Benjamin Poirier 5a07a5df5a json_print: Return number of characters printed
When outputting in normal mode, forward the return value from
color_fprintf().

Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-04 17:13:53 -07:00
Benjamin Poirier b262a9becb bridge: Fix output with empty vlan lists
Consider this configuration:

ip link add br0 type bridge
ip link add vx0 type vxlan dstport 4789 external
ip link set dev vx0 master br0
bridge vlan del vid 1 dev vx0
ip link add vx1 type vxlan dstport 4790 external
ip link set dev vx1 master br0

	root@vsid:/src/iproute2# ./bridge/bridge vlan
	port    vlan-id
	br0      1 PVID Egress Untagged

	vx0     None
	vx1      1 PVID Egress Untagged

	root@vsid:/src/iproute2#

Note the useless and inconsistent empty lines.

	root@vsid:/src/iproute2# ./bridge/bridge vlan tunnelshow
	port    vlan-id tunnel-id
	br0
	vx0     None
	vx1

What's the difference between "None" and ""?

	root@vsid:/src/iproute2# ./bridge/bridge -j -p vlan tunnelshow
	[ {
		"ifname": "br0",
		"tunnels": [ ]
	    },{
		"ifname": "vx1",
		"tunnels": [ ]
	    } ]

Why does vx0 appear in normal output and not json output?
Why output an empty list for br0 and vx1?

Fix these inconsistencies and avoid outputting entries with no values. This
makes the behavior consistent with other iproute2 commands, for example
`ip -6 addr`: if an interface doesn't have any ipv6 addresses, it is not
part of the listing.

Fixes: 8652eeb3ab ("bridge: vlan: support for per vlan tunnel info")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-04 17:13:53 -07:00
Benjamin Poirier 91b1b49ed3 bridge: Fix typo
Fixes: 7abf5de677 ("bridge: vlan: add support to display per-vlan statistics")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-04 17:13:53 -07:00
Benjamin Poirier 594b2d7799 bridge: Use consistent column names in vlan output
Fix singular vs plural. Add a hyphen to clarify that each of those are
single fields.

Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-04 17:13:53 -07:00
Xin Long 4e578c78fe tc: f_flower: add options support for erspan
This patch is to add TCA_FLOWER_KEY_ENC_OPTS_ERSPAN's parse and
print to implement erspan options support in m_tunnel_key, like
Commit 56155d4df8 ("tc: f_flower: add geneve option match
support to flower") for geneve options support.

Option is expressed as version:index:dir:hwid, dir and hwid will
be parsed when version is 2, while index will be parsed when
version is 1. erspan doesn't support multiple options.

With this patch, users can add and dump erspan options like:

  # ip link add name erspan1 type erspan external
  # tc qdisc add dev erspan1 ingress
  # tc filter add dev erspan1 protocol ip parent ffff: \
      flower \
        enc_src_ip 10.0.99.192 \
        enc_dst_ip 10.0.99.193 \
        enc_key_id 11 \
        erspan_opts 1:2:0:0/1:255:0:0 \
        ip_proto udp \
        action mirred egress redirect dev eth1
  # tc -s filter show dev erspan1 parent ffff:

     filter protocol ip pref 49152 flower chain 0 handle 0x1
       eth_type ipv4
       ip_proto udp
       enc_dst_ip 10.0.99.193
       enc_src_ip 10.0.99.192
       enc_key_id 11
       erspan_opts 1:2:0:0/1:255:0:0
       not_in_hw
         action order 1: mirred (Egress Redirect to device eth1) stolen
         index 1 ref 1 bind 1
         Action statistics:
         Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
         backlog 0b 0p requeues 0

v1->v2:
  - no change.
v2->v3:
  - no change.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print version, index, dir and hwid as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-01 16:33:27 +00:00
Xin Long 93c8d5f72f tc: f_flower: add options support for vxlan
This patch is to add TCA_FLOWER_KEY_ENC_OPTS_VXLAN's parse and
print to implement vxlan options support in m_tunnel_key, like
Commit 56155d4df8 ("tc: f_flower: add geneve option match
support to flower") for geneve options support.

Option is expressed a 32bit number for gbp only, and vxlan
doesn't support multiple options.

With this patch, users can add and dump vxlan options like:

  # ip link add name vxlan1 type vxlan dstport 0 external
  # tc qdisc add dev vxlan1 ingress
  # tc filter add dev vxlan1 protocol ip parent ffff: \
      flower \
        enc_src_ip 10.0.99.192 \
        enc_dst_ip 10.0.99.193 \
        enc_key_id 11 \
        vxlan_opts 65793/4008635966 \
        ip_proto udp \
        action mirred egress redirect dev eth1
  # tc -s filter show dev vxlan1 parent ffff:

     filter protocol ip pref 49152 flower chain 0 handle 0x1
       eth_type ipv4
       ip_proto udp
       enc_dst_ip 10.0.99.193
       enc_src_ip 10.0.99.192
       enc_key_id 11
       vxlan_opts 65793/4008635966
       not_in_hw
         action order 1: mirred (Egress Redirect to device eth1) stolen
         index 3 ref 1 bind 1
         Action statistics:
         Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
         backlog 0b 0p requeues 0

v1->v2:
  - get_u32 with base = 0 for gbp.
v2->v3:
  - implement proper JSON array for opts.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print gbp as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-01 16:33:22 +00:00
Xin Long 668fd9b25d tc: m_tunnel_key: add options support for erpsan
This patch is to add TCA_TUNNEL_KEY_ENC_OPTS_ERSPAN's parse and
print to implement erspan options support in m_tunnel_key, like
Commit 6217917a38 ("tc: m_tunnel_key: Add tunnel option support
to act_tunnel_key") for geneve options support.

Option is expressed as version:index:dir:hwid, dir and hwid will
be parsed when version is 2, while index will be parsed when
version is 1. erspan doesn't support multiple options.

With this patch, users can add and dump erspan options like:

  # ip link add name erspan1 type erspan external
  # tc qdisc add dev eth0 ingress
  # tc filter add dev eth0 protocol ip parent ffff: \
      flower indev eth0 \
        ip_proto udp \
        action tunnel_key \
          set src_ip 10.0.99.192 \
          dst_ip 10.0.99.193 \
          dst_port 6081 \
          id 11 \
          erspan_opts 1:2:0:0 \
      action mirred egress redirect dev erspan1
  # tc -s filter show dev eth0 parent ffff:

     filter protocol ip pref 49151 flower chain 0 handle 0x1
       indev eth0
       eth_type ipv4
       ip_proto udp
       not_in_hw
         action order 1: tunnel_key  set
         src_ip 10.0.99.192
         dst_ip 10.0.99.193
         key_id 11
         dst_port 6081
         erspan_opts 1:2:0:0
         csum pipe
           index 2 ref 1 bind 1
         ...
v1->v2:
  - no change.
v2->v3:
  - no change.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print version, index, dir and hwid as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-01 16:33:18 +00:00
Xin Long f72c3ad00f tc: m_tunnel_key: add options support for vxlan
This patch is to add TCA_TUNNEL_KEY_ENC_OPTS_VXLAN's parse and
print to implement vxlan options support in m_tunnel_key, like
Commit 6217917a38 ("tc: m_tunnel_key: Add tunnel option support
to act_tunnel_key") for geneve options support.

Option is expressed a 32bit number for gbp only, and vxlan
doesn't support multiple options.

With this patch, users can add and dump vxlan options like:

  # ip link add name vxlan1 type vxlan dstport 0 external
  # tc qdisc add dev eth0 ingress
  # tc filter add dev eth0 protocol ip parent ffff: \
      flower indev eth0 \
        ip_proto udp \
        action tunnel_key \
          set src_ip 10.0.99.192 \
          dst_ip 10.0.99.193 \
          dst_port 6081 \
          id 11 \
          vxlan_opts 65793 \
      action mirred egress redirect dev vxlan1
  # tc -s filter show dev eth0 parent ffff:

     filter protocol ip pref 49152 flower chain 0 handle 0x1
       indev eth0
       eth_type ipv4
       ip_proto udp
       not_in_hw
         action order 1: tunnel_key  set
         src_ip 10.0.99.192
         dst_ip 10.0.99.193
         key_id 11
         dst_port 6081
         vxlan_opts 65793
         ...

v1->v2:
  - get_u32 with base = 0 for gbp.
  - use to print_unint("0x%x") to print gbp.
v2->v3:
  - implement proper JSON array for opts.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print gbp as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-01 16:33:14 +00:00
Xin Long 39fa047938 iproute_lwtunnel: add options support for erspan metadata
This patch is to add LWTUNNEL_IP_OPTS_ERSPAN's parse and print to implement
erspan options support in iproute_lwtunnel.

Option is expressed as version:index:dir:hwid, dir and hwid will be parsed
when version is 2, while index will be parsed when version is 1. All of
these are numbers. erspan doesn't support multiple options.

With this patch, users can add and dump erspan options like:

  # ip netns add a
  # ip netns add b
  # ip -n a link add eth0 type veth peer name eth0 netns b
  # ip -n a link set eth0 up
  # ip -n b link set eth0 up
  # ip -n a addr add 10.1.0.1/24 dev eth0
  # ip -n b addr add 10.1.0.2/24 dev eth0
  # ip -n b link add erspan1 type erspan key 1 seq erspan 123 \
    local 10.1.0.2 remote 10.1.0.1
  # ip -n b addr add 1.1.1.1/24 dev erspan1
  # ip -n b link set erspan1 up
  # ip -n b route add 2.1.1.0/24 dev erspan1
  # ip -n a link add erspan1 type erspan key 1 seq local 10.1.0.1 external
  # ip -n a addr add 2.1.1.1/24 dev erspan1
  # ip -n a link set erspan1 up
  # ip -n a route add 1.1.1.0/24 encap ip id 1 \
    erspan_opts 2:123:1:2 dst 10.1.0.2 dev erspan1
  # ip -n a route show
  # ip netns exec a ping 1.1.1.1 -c 1

   1.1.1.0/24  encap ip id 1 src 0.0.0.0 dst 10.1.0.2 ttl 0 tos 0
     erspan_opts 2:0:1:2 dev erspan1 scope link

   PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
   64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=0.124 ms

v1->v2:
  - improve the changelog.
  - use PRINT_ANY to support dumping with json format.
v2->v3:
  - implement proper JSON object for opts instead of just bunch of strings.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print version, index, dir and hwid as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-01 16:33:09 +00:00
Xin Long b1bc0f3892 iproute_lwtunnel: add options support for vxlan metadata
This patch is to add LWTUNNEL_IP_OPTS_VXLAN's parse and print to implement
vxlan options support in iproute_lwtunnel.

Option is expressed a number for gbp only, and vxlan doesn't support
multiple options.

With this patch, users can add and dump vxlan options like:

  # ip netns add a
  # ip netns add b
  # ip -n a link add eth0 type veth peer name eth0 netns b
  # ip -n a link set eth0 up
  # ip -n b link set eth0 up
  # ip -n a addr add 10.1.0.1/24 dev eth0
  # ip -n b addr add 10.1.0.2/24 dev eth0
  # ip -n b link add vxlan1 type vxlan id 1 local 10.1.0.2 \
    remote 10.1.0.1 dev eth0 ttl 64 gbp
  # ip -n b addr add 1.1.1.1/24 dev vxlan1
  # ip -n b link set vxlan1 up
  # ip -n b route add 2.1.1.0/24 dev vxlan1
  # ip -n a link add vxlan1 type vxlan local 10.1.0.1 dev eth0 ttl 64 \
    gbp external
  # ip -n a addr add 2.1.1.1/24 dev vxlan1
  # ip -n a link set vxlan1 up
  # ip -n a route add 1.1.1.0/24 encap ip id 1 \
    vxlan_opts 1110 dst 10.1.0.2 dev vxlan1
  # ip -n a route show
  # ip netns exec a ping 1.1.1.1 -c 1

   1.1.1.0/24  encap ip id 1 src 0.0.0.0 dst 10.1.0.2 ttl 0 tos 0
     vxlan_opts 1110 dev vxlan1 scope link

   PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
   64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=0.111 ms

v1->v2:
  - improve the changelog.
  - get_u32 with base = 0 for gbp.
  - use PRINT_ANY to support dumping with json format.
v2->v3:
  - implement proper JSON array for opts.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print gbp as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-01 16:33:03 +00:00
Xin Long ca7614d4c6 iproute_lwtunnel: add options support for geneve metadata
This patch is to add LWTUNNEL_IP(6)_OPTS and LWTUNNEL_IP_OPTS_GENEVE's
parse and print to implement geneve options support in iproute_lwtunnel.

Options are expressed as class:type:data and multiple options may be
listed using a comma delimiter, class and type are numbers and data
is a hex string.

With this patch, users can add and dump geneve options like:

  # ip netns add a
  # ip netns add b
  # ip -n a link add eth0 type veth peer name eth0 netns b
  # ip -n a link set eth0 up; ip -n b link set eth0 up
  # ip -n a addr add 10.1.0.1/24 dev eth0
  # ip -n b addr add 10.1.0.2/24 dev eth0
  # ip -n b link add geneve1 type geneve id 1 remote 10.1.0.1 ttl 64
  # ip -n b addr add 1.1.1.1/24 dev geneve1
  # ip -n b link set geneve1 up
  # ip -n b route add 2.1.1.0/24 dev geneve1
  # ip -n a link add geneve1 type geneve external
  # ip -n a addr add 2.1.1.1/24 dev geneve1
  # ip -n a link set geneve1 up
  # ip -n a route add 1.1.1.0/24 encap ip id 1 geneve_opts \
    1:1:1212121234567890,1:1:1212121234567890,1:1:1212121234567890 \
    dst 10.1.0.2 dev geneve1
  # ip -n a route show
  # ip netns exec a ping 1.1.1.1 -c 1

   1.1.1.0/24  encap ip id 1 src 0.0.0.0 dst 10.1.0.2 ttl 0 tos 0
     geneve_opts 1:1:1212121234567890,1:1:1212121234567890 ...

   PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
   64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=0.079 ms

v1->v2:
  - improve the changelog.
  - use PRINT_ANY to support dumping with json format.
v2->v3:
  - implement proper JSON array for opts instead of just bunch of strings.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print class and type as uint and print data as hex string.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-01 16:31:58 +00:00
Jacob Keller 7ae84fedcb devlink: add support for DEVLINK_CMD_REGION_NEW
Add support to request that a new snapshot be taken immediately for
a devlink region. To avoid confusion, the desired snapshot id must be
provided.

Note that if a region does not support snapshots on demand, the kernel
will reject the request with -EOPNOTSUP.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-29 22:31:55 -07:00
Stephen Hemminger 2b93f66863 uapi: update bpf.h
Minor spelling in comment
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-29 22:30:48 -07:00
Petr Machata 081d6c310d tc: pedit: Support JSON dumping
The action pedit does not currently support dumping to JSON. Convert
print_pedit() to the print_* family of functions so that dumping is correct
both in plain and JSON mode. In plain mode, the output is character for
character the same as it was before. In JSON mode, this is an example dump:

$ tc filter add dev dummy0 ingress prio 125 flower \
         action pedit ex munge udp dport set 12345 \
	                 munge ip ttl add 1        \
			 munge offset 10 u8 clear
$ tc -j filter show dev dummy0 ingress | jq
[
  {
    "protocol": "all",
    "pref": 125,
    "kind": "flower",
    "chain": 0
  },
  {
    "protocol": "all",
    "pref": 125,
    "kind": "flower",
    "chain": 0,
    "options": {
      "handle": 1,
      "keys": {},
      "not_in_hw": true,
      "actions": [
        {
          "order": 1,
          "kind": "pedit",
          "control_action": {
            "type": "pass"
          },
          "nkeys": 3,
          "index": 1,
          "ref": 1,
          "bind": 1,
          "keys": [
            {
              "htype": "udp",
              "offset": 0,
              "cmd": "set",
              "val": "3039",
              "mask": "ffff0000"
            },
            {
              "htype": "ipv4",
              "offset": 8,
              "cmd": "add",
              "val": "1000000",
              "mask": "ffffff"
            },
            {
              "htype": "network",
              "offset": 8,
              "cmd": "set",
              "val": "0",
              "mask": "ffff00ff"
            }
          ]
        }
      ]
    }
  }
]

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-30 02:43:23 +00:00
William Tu 846b6b2da8 erspan: Add type I version 0 support.
The Type I ERSPAN frame format is based on the barebones
IP + GRE(4-byte) encapsulation on top of the raw mirrored frame.
Both type I and II use 0x88BE as protocol type. Unlike type II
and III, no sequence number or key is required.

To creat a type I erspan tunnel device:
$ ip link add dev erspan11 type erspan \
	local 172.16.1.100 remote 172.16.1.200 \
	erspan_ver 0

CC: Dmitriy Andreyevskiy <dandreye@cisco.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-30 02:40:10 +00:00
Paolo Abeni 0c42c6b130 man: ip.8: add reference to mptcp man-page
While at it, additionally fix a mandoc warning in mptcp.8

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-29 17:36:14 +00:00
David Ahern d38f2a10dd Merge branch 'mptcp' into next
Paolo Abeni  says:

====================

This introduces support for the MPTCP PM netlink interface, allowing admins
to configure several aspects of the MPTCP path manager. The subcommand is
documented with a newly added man-page.

This series also includes support for MPTCP subflow diag.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-29 16:50:25 +00:00
Paolo Abeni 2d8b5fe93e man: mptcp man page
describe the mptcp subcommands implemented so far.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-29 16:47:45 +00:00
Davide Caratti 712fdd98c0 ss: allow dumping MPTCP subflow information
[root@f31 packetdrill]# ss -tni

 ESTAB    0        0           192.168.82.247:8080           192.0.2.1:35273
          cubic wscale:7,8 [...] tcp-ulp-mptcp flags:Mec token:0000(id:0)/5f856c60(id:0) seq:b810457db34209a5 sfseq:1 ssnoff:0 maplen:190

Additionally extends ss manpage to describe the new entry layout.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-29 16:44:55 +00:00
Paolo Abeni 7e0767cd86 add support for mptcp netlink interface
Implement basic commands to:
- manipulate MPTCP endpoints list
- manipulate MPTCP connection limits

Examples:
1. Allows multiple subflows per MPTCP connection
   $ ip mptcp limits set subflows 2

2. Accept ADD_ADDR announcement from the peer (server):
   $ ip mptcp limits set add_addr_accepted 2

3. Add a ipv4 address to be annunced for backup subflows:
   $ ip mptcp endpoint add 10.99.1.2 signal backup

4. Add an ipv6 address used as source for additional subflows:
   $ ip mptcp endpoint add 2001::2 subflow

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-29 16:43:18 +00:00
David Ahern 02ade5a8ea Update kernel headers and import mptcp.h
Update kernel headers to commit
    790ab249b55d ("net: ethernet: fec: Prevent MII event after MII_SPEED write")

and import mptcp.h

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-29 16:41:39 +00:00
Eric Dumazet be9ca9d541 tc: fq: add timer_slack parameter
Commit 583396f4ca4d ("net_sched: sch_fq: enable use of hrtimer slack")
added TCA_FQ_TIMER_SLACK parameter, with a default value of 10 usec.

Add the corresponding tc support to get/set this tunable.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-27 14:56:42 -07:00
Eric Dumazet 7868f802e2 tc: fq_codel: add drop_batch parameter
Commit 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
added the new TCA_FQ_CODEL_DROP_BATCH_SIZE parameter, set by default to 64.

Add to tc command the ability to get/set the drop_batch

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-27 14:56:42 -07:00
Xin Long d27fc6390c xfrm: also check for ipv6 state in xfrm_state_keep
As commit f9d696cf41 ("xfrm: not try to delete ipcomp states when using
deleteall") does, this patch is to fix the same issue for ip6 state where
xsinfo->id.proto == IPPROTO_IPV6.

  # ip xfrm state add src 2000::1 dst 2000::2 spi 0x1000 \
    proto comp comp deflate mode tunnel sel src 2000::1 dst \
    2000::2 proto gre
  # ip xfrm sta deleteall
  Failed to send delete-all request
  : Operation not permitted

Note that the xsinfo->proto in common states can never be IPPROTO_IPV6.

Fixes: f9d696cf41 ("xfrm: not try to delete ipcomp states when using deleteall")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-27 14:50:37 -07:00
Jiri Pirko 0149dabf2a tc: m_action: check cookie hex string len
Check the cookie hex string len is dividable by 2 as the valid hex
string always should be.

Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-27 14:50:27 -07:00
David Ahern 60f1075c21 Merge branch 'macsec-offload' into next
Igor Russkikh  says:

====================

From: Mark Starovoytov <mstarovoitov@marvell.com>

This series adds support for selecting the offloading mode of a MACsec
interface at link creation time.
Available modes are for now 'off', 'phy' and 'mac', 'off' being the default
when an interface is created.

First patch adds support for MAC offloading.

Last patch allows a user to change the offloading mode at runtime
through a new attribute, `ip link add link ... offload`:

  # ip link add link enp1s0 type macsec encrypt on offload off
  # ip link add link enp1s0 type macsec encrypt on offload phy
  # ip link add link enp1s0 type macsec encrypt on offload mac

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-26 18:32:20 +00:00
Mark Starovoytov bcbeb35ca4 macsec: add support for specifying offload at link add time
This patch adds support for configuring offload mode upon MACsec
device creation.

If offload mode is not specified, then netlink attribute is not
added. Default behavior on the kernel side in this case is
backward-compatible (offloading is disabled by default).

Example:
$ ip link add link eth0 macsec0 type macsec port 11 encrypt on offload mac

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-26 18:32:03 +00:00
Mark Starovoytov 998534c99e macsec: add support for MAC offload
This patch enables MAC HW offload usage in iproute, since MACSec
implementation supports it now.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-26 18:31:37 +00:00
Stephen Hemminger b831c5ffcc bridge: man page spelling fixes
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:48:57 -07:00
Bastien Roucariès 8d5d91fd58 State of bridge STP port are now case insensitive
Improve use experience

Signed-off-by: Bastien Roucariès <rouca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:45:37 -07:00
Bastien Roucariès 498883a00f Document root_block option
Root_block is also called root port guard, document it.

Signed-off-by: Bastien Roucariès <rouca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:45:37 -07:00
Bastien Roucariès 19bbebc459 Better documentation of BDPU guard
Document that guard disable the port and how to reenable it

Signed-off-by: Bastien Roucariès <rouca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:45:37 -07:00
Bastien Roucariès 420febf961 Document BPDU filter option
Disabled state is also BPDU filter

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:45:37 -07:00
Bastien Roucariès 1cad8f8d78 Improve hairpin mode description
Mention VEPA and reflective relay.

Signed-off-by: Bastien Roucariès <rouca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:45:37 -07:00
Bastien Roucariès 706f7d35e2 Better documentation of mcast_to_unicast option
This option is useful for Wifi bridge but need some tweak.

Document it from kernel patches documentation

Signed-off-by: Bastien Roucariès <rouca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:45:37 -07:00
Brian Norris 8b9d5728c1 man: replace $(NETNS_ETC_DIR) and $(NETNS_RUN_DIR) in ip-netns(8)
These can be configured to different paths. Reflect that in the
generated documentation.

Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:39:27 -07:00
Brian Norris 48e05899d0 man: add ip-netns(8) as generation target
Prepare for adding new variable substitutions. Unify the sed rules while
we're at it, since there's no need to write this out 4 times.

Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:39:27 -07:00
Benjamin Lee f03ad792f3 tc: fq_codel: fix class stat deficit is signed int
The fq_codel class stat deficit is a signed int.  This is a regression
from when JSON output was added.

Fixes: 997f2dc193 ("tc: Add JSON output of fq_codel stats")
Signed-off-by: Benjamin Lee <ben@b1c1l1.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:34:56 -07:00
Odin Ugedal 14d2df8874 q_cake: properly print memlimit
Load memlimit so that it will be printed if it isn't set to zero.

Also add a space to properly print it.

Signed-off-by: Odin Ugedal <odin@ugedal.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:33:15 -07:00
Odin Ugedal 6f883f168c q_cake: Make fwmark uint instead of int
This will help avoid overflow, since setting it to 0xffffffff would
result in -1 when converted to integer, resulting in being "-1", setting
the fwmark to 0x00.

Signed-off-by: Odin Ugedal <odin@ugedal.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:33:15 -07:00
Odin Ugedal e07c57e94e tc_util: detect overflow in get_size
This detects overflow during parsing of value using get_size:

eg. running:

$ tc qdisc add dev lo root cake memlimit 11gb

currently gives a memlimit of "3072Mb", while with this patch it errors
with 'illegal value for "memlimit": "11gb"', since memlinit is an
unsigned integer.

Signed-off-by: Odin Ugedal <odin@ugedal.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:31:01 -07:00
Eran Ben Elisha 4aa0c9c9f8 devlink: Add devlink health auto_dump command support
Add support for configuring auto_dump attribute per reporter.
With this attribute, one can indicate whether the devlink kernel core
should execute automatic dump on error.

The change will be reflected in show, set and man commands.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-19 22:27:13 +00:00
David Ahern 59ba1dd011 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-19 22:26:27 +00:00
Benjamin Lee fe821d64e6 man: tc-htb.8: fix class prio is not mandatory
Fix description for htb class prio parameter to indicate it's not
mandatory.

Signed-off-by: Benjamin Lee <ben@b1c1l1.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-13 14:04:00 -07:00
Benjamin Lee 6ecd0198c0 man: tc-htb.8: add missing class parameter quantum
Add description for htb class parameter quantum.

Signed-off-by: Benjamin Lee <ben@b1c1l1.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-13 14:04:00 -07:00
Benjamin Lee d8d59421b6 man: tc-htb.8: add missing qdisc parameter r2q
Add description for htb qdisc parameter r2q.

Signed-off-by: Benjamin Lee <ben@b1c1l1.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-13 14:04:00 -07:00
Petr Machata 20927e0525 ip: link_gre: Do not send ERSPAN attributes to GRE tunnels
In the commit referenced below, ip link started sending ERSPAN-specific
attributes even for GRE and gretap tunnels. Fix by more carefully
distinguishing between the GRE/tap and ERSPAN modes. Do not show
ERSPAN-related help in GRE/tap mode, likewise do not accept ERSPAN
arguments, or send ERSPAN attributes.

Fixes: 83c543af87 ("erspan: set erspan_ver to 1 by default")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-13 14:02:54 -07:00
Jiri Pirko c4dfddccef devlink: fix JSON output of mon command
The current JSON output of mon command is broken. Fix it and make sure
that the output is a valid JSON. Also, handle SIGINT gracefully to allow
to end the JSON properly.

Example:
$ devlink mon -j -p
{
    "mon": [ {
            "command": "new",
            "dev": {
                "netdevsim/netdevsim10": {}
            }
        },{
            "command": "new",
            "port": {
                "netdevsim/netdevsim10/0": {
                    "type": "notset",
                    "flavour": "physical",
                    "port": 1
                }
            }
        },{
            "command": "new",
            "port": {
                "netdevsim/netdevsim10/0": {
                    "type": "eth",
                    "netdev": "eth0",
                    "flavour": "physical",
                    "port": 1
                }
            }
        },{
            "command": "new",
            "port": {
                "netdevsim/netdevsim10/0": {
                    "type": "notset",
                    "flavour": "physical",
                    "port": 1
                }
            }
        },{
            "command": "del",
            "port": {
                "netdevsim/netdevsim10/0": {
                    "type": "notset",
                    "flavour": "physical",
                    "port": 1
                }
            }
        },{
            "command": "del",
            "dev": {
                "netdevsim/netdevsim10": {}
            }
        } ]
}

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-13 13:59:12 -07:00
David Ahern 5c762c3bc2 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-09 14:42:33 +00:00
Petr Machata 74c8610f3b man: tc-pedit: Drop the claim that pedit ex is only for IPv4
This sentence predates addition of extended pedit for IPv6 packets.

Reported-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-09 14:39:59 +00:00
Petr Machata f91f788c70 man: tc-pedit: Add examples for dsfield and retain
Describe a way to update just the DSCP and just the ECN part of the
dsfield. That is useful on its own, but also it shows how retain works.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-09 14:39:58 +00:00
Petr Machata 2d9a8dc439 tc: p_ip6: Support pedit of IPv6 dsfield
Support keywords dsfield, traffic_class and tos in the IPv6 context.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-09 14:39:58 +00:00
Jiri Pirko 1c3ed78001 devlink: remove unused "jw" field
This field is not used. Remove it.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-09 14:39:28 +00:00
Stephen Hemminger 27136cab54 man/tc-actions: fix formatting
Fix error from make check.
n-old.tmac: <standard input>: line 86: 'R' is a string (producing the registered sign), not a macro.
Error in tc-actions.8

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-06 10:07:54 -07:00
Jiri Pirko e00248d296 man: add man page for devlink dpipe
Add simple man page for devlink dpipe.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-06 10:06:00 -07:00
Jiri Pirko 885f4b0d7a devlink: remove "dev" object sub help messages
Remove duplicate sub help messages for "dev" object and have them all
show help message for "dev".

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-06 10:00:32 -07:00
Jiri Pirko b2522187d8 devlink: Fix help message for dpipe
Have one help message for all dpipe commands, as it is done for the rest
of the devlink object. Possible and required options to the help.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-06 10:00:32 -07:00
Jiri Pirko 342f462efa devlink: rename dpipe_counters_enable struct field to dpipe_counters_enabled
To be consistent with the rest of the code and name of netlink
attribute, rename the dpipe_counters_enable struct fielt
to dpipe_counters_enabled.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-06 10:00:32 -07:00
Jiri Pirko 192e7b3ffa devlink: Add alias "counters_enabled" for "counters" option
To be consistent with netlink attribute name and also with the
"dpipe table show" output, add "counters_enabled" for "counters" in
"dpipe table set" command.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-06 10:00:32 -07:00
Jiri Pirko 0b1875cdc6 devlink: fix encap mode manupulation
DEVLINK_ATTR_ESWITCH_ENCAP_MODE netlink attribute carries enum. But the
code assumes bool value. Fix this by treating the encap mode in the same
way as other eswitch mode attributes, switching from "enable"/"disable"
to "basic"/"none", according to the enum. Maintain the backward
compatibility to allow user to pass "enable"/"disable" too. Also to be
in-sync with the rest of the "mode" commands, rename to "encap-mode".
Adjust the help and man page accordingly.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-06 10:00:32 -07:00
Jiri Pirko 90ce848b05 devlink: Fix help and man of "devlink health set" command
Fix the help and man page of "devlink health set" command to be aligned
with the rest of helps and man pages.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-06 10:00:32 -07:00
Jiri Pirko b37a863cb2 devlink: remove custom bool command line options parsing
Change the code that is doing custom processing of boolean command line
options to use dl_argv_bool(). Extend strtobool() to accept
"enable"/"disable" to maintain current behavior.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-06 10:00:32 -07:00
Stephen Hemminger 5d10f24fdd Merge ../iproute2-next 2020-04-06 10:00:12 -07:00
Jiri Pirko 0827cc53f3 tc: show used HW stats types
If kernel provides the attribute, show the used HW stats types.

Example:

$ tc filter add dev enp3s0np1 ingress proto ip handle 1 pref 1 flower dst_ip 192.168.1.1 action drop
$ tc -s filter show dev enp3s0np1 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  dst_ip 192.168.1.1
  in_hw in_hw_count 2
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 10 sec used 10 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
        used_hw_stats immediate     <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-31 23:30:04 +00:00
Ido Schimmel 0141ca64b8 bash-completion: devlink: Extend bash-completion for new commands
Extend bash-completion for two new commands:

devlink trap policer set DEV policer POLICER [ rate RATE ] [ burst BURST ]
devlink trap policer show DEV policer POLICER

And for "policer" / "nopolicer" parameters in existing command:

devlink trap group set DEV group GROUP [ action { trap | drop } ]
                       [ policer POLICER ] [ nopolicer ]

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-31 23:25:13 +00:00
Ido Schimmel 02a2a6683f devlink: Add ability to bind policer to trap group
Add ability to associate a policer with a trap group. The policer can be
unbound by using the 'nopolicer' keyword. In which case, the value
encoded in the 'DEVLINK_ATTR_TRAP_POLICER_ID' attribute will be '0'.
This is consistent with ip-link 'nomaster' keyword and the 'IFLA_MASTER'
attribute.

Example:

# devlink trap group set netdevsim/netdevsim10 group l3_drops policer 2
# devlink -jp trap group show netdevsim/netdevsim10 group l3_drops
{
    "trap_group": {
        "netdevsim/netdevsim10": [ {
                "name": "l3_drops",
                "generic": true,
                "policer": 2
            } ]
    }
}

# devlink trap group set netdevsim/netdevsim10 group l3_drops nopolicer
# devlink -jp trap group show netdevsim/netdevsim10 group l3_drops
{
    "trap_group": {
        "netdevsim/netdevsim10": [ {
                "name": "l3_drops",
                "generic": true
            } ]
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-31 23:25:07 +00:00
Ido Schimmel a66af55693 devlink: Add devlink trap policer set and show commands
The trap policer set command allows the user to set the parameters of
the packet trap policer, such as rate and burst size. Example:

# devlink trap policer set netdevsim/netdevsim10 policer 1 rate 1000 burst 32

The trap policer show command allows the user to get the current
parameters of an individual policer or a dump of all policers in case
one is not specified. When '-s' is specified the policer's statistics
are shown. Example:

# devlink -jps trap policer show netdevsim/netdevsim10 policer 1
{
    "trap_policer": {
        "netdevsim/netdevsim10": [ {
                "policer": 1,
                "rate": 1000,
                "burst": 32,
                "stats": {
                    "rx": {
                        "dropped": 53
                    }
                }
            } ]
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-31 23:24:35 +00:00
David Ahern ce9191ffee Update kernel headers
Update kernel headers to commit:
    7f80ccfe9968 ("net: ipv6: rpl_iptunnel: Fix potential memory leak in rpl_do_srh_inline")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-31 23:23:28 +00:00
Stephen Hemminger 29981db0e0 v5.6.0 2020-03-30 08:06:08 -07:00
Andrea Claudi 0641bed8a3 man: bridge.8: fix bridge link show description
When multiple bridges are present, 'bridge link show' diplays ports
for all bridges. Make this clear in the command description, and
point out the user to the ip command to display ports for a specific
bridge.

Reported-by: Marc Muehlfeld <mmuehlfe@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-03-30 08:01:02 -07:00
Danielle Ratson 5a3faf2949 bash-completion: devlink: add bash-completion function
Add function for command completion for devlink in bash, and update Makefile
to install it under /usr/share/bash-completion/completions/.

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-25 16:46:09 +00:00
Petr Machata 6c10fdca70 tc: q_red: Support 'nodrop' flag
Recognize the new configuration option of the RED Qdisc, "nodrop". Add
support for passing flags through TCA_RED_FLAGS, and use it when passing
TC_RED_NODROP flag.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-25 16:45:37 +00:00
Jakub Kicinski 1c74c20cbe tc: m_action: rename hw stats type uAPI
Follow the kernel rename to shorten the identifiers.
Rename hw_stats_type to hw_stats.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-25 16:42:33 +00:00
David Ahern 1ff1edb6d5 Update kernel headers
Update kernel headers to commit:
    cd556e40fdf3 ("devlink: expand the devlink-info documentation")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-25 16:41:49 +00:00
Stephen Hemminger adcab267b8 uapi: update linux/in.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-03-20 11:05:26 -07:00
Jiri Pirko 341903dd3b tc: m_action: introduce support for hw stats type
Introduce support for per-action hw stats type config.

This patch allows user to specify one of the following types of HW
stats for added action:
immediate - queried during dump time
delayed - polled from HW periodically or sent by HW in async manner
disabled - no stats needed

Note that if "hw_stats" option is not passed, user does not care about
the type, just expects any type of stats.

Examples:
$ tc filter add dev enp0s16np28 ingress proto ip handle 1 pref 1 flower skip_sw dst_ip 192.168.1.1 action drop hw_stats disabled
$ tc -s filter show dev enp0s16np28 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  dst_ip 192.168.1.1
  skip_sw
  in_hw in_hw_count 2
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 7 sec used 2 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
        hw_stats disabled

$ tc filter add dev enp0s16np28 ingress proto ip handle 1 pref 1 flower skip_sw dst_ip 192.168.1.1 action drop hw_stats immediate
$ tc -s filter show dev enp0s16np28 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  dst_ip 192.168.1.1
  skip_sw
  in_hw in_hw_count 2
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 11 sec used 4 sec
        Action statistics:
        Sent 102 bytes 1 pkt (dropped 1, overlimits 0 requeues 0)
        Sent software 0 bytes 0 pkt
        Sent hardware 102 bytes 1 pkt
        backlog 0b 0p requeues 0
        hw_stats immediate

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-20 16:18:44 +00:00
David Ahern 25091a761f Update kernel headers
Update kernel headers to commit:
    3fd177cb2b47 ("net: stmmac: dwmac_lib: remove unnecessary checks in dwmac_dma_reset()")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-20 16:17:55 +00:00
Guillaume Nault 72cc0bafb9 iproute2: fix MPLS label parsing
The initial value of "label" in parse_mpls() is 0xffffffff. Therefore
we should test for this value, and not 0, to detect if a label has been
provided. The "!label" test not only fails to detect a missing label
parameter, it also prevents the use of the IPv4 explicit NULL label,
which actually equals 0.

Reproducer:
  $ ip link add name dm0 type dummy
  $ tc qdisc add dev dm0 ingress

  $ tc filter add dev dm0 parent ffff: matchall action mpls push
  Error: act_mpls: Label is required for MPLS push.
  We have an error talking to the kernel
  --> Filter was pushed to the kernel, where it got rejected.

  $ tc filter add dev dm0 parent ffff: matchall action mpls push label 0
  Error: argument "label" is required
  --> Label 0 was rejected by iproute2.

Expected result:
  $ tc filter add dev dm0 parent ffff: matchall action mpls push
  Error: argument "label" is required
  --> Filter was directly rejected by iproute2.

  $ tc filter add dev dm0 parent ffff: matchall action mpls push label 0
  --> Filter is accepted.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-03-15 09:56:53 -07:00
Andrea Claudi d9b868436a nexthop: fix error reporting in filter dump
nh_dump_filter is missing a return value check in two cases.
Fix this simply adding an assignment to the proper variable.

Fixes: 63df8e8543 ("Add support for nexthop objects")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-03-15 09:54:42 -07:00
Leslie Monis 94c4ce822c Revert "tc: pie: change maximum integer value of tc_pie_xstats->prob"
This reverts commit 92cfe3260e.

Kernel commit 3f95f55eb55d ("net: sched: pie: change tc_pie_xstats->prob")
removes the need to change the maximum integer value of
tc_pie_stats->prob here.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-10 18:29:26 +00:00
Leslie Monis 92cfe3260e tc: pie: change maximum integer value of tc_pie_xstats->prob
Kernel commit 105e808c1da2 ("pie: remove pie_vars->accu_prob_overflows"),
changes the maximum value of tc_pie_xstats->prob from (2^64 - 1) to
(2^56 - 1).

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-09 02:46:45 +00:00
David Ahern ad19a18e42 Merge branch 'macsec-offload' into next
Antoine Tenart  says:

====================

This series adds support for selecting and reporting the offloading mode
of a MACsec interface. Available modes are for now 'off' and 'phy',
'off' being the default when an interface is created. Modes are not only
'off' and 'on' as the MACsec operations can be offloaded to multiple
kinds of specialized hardware devices, at least to PHYs and Ethernet
MACs. The later isn't currently supported in the kernel though.

The first patch adds support for reporting the offloading mode currently
selected for a given MACsec interface through the `ip macsec show`
command:

   # ip macsec show
   18: macsec0: protect on validate strict sc off sa off encrypt on send_sci on end_station off scb off replay off
       cipher suite: GCM-AES-128, using ICV length 16
       TXSC: 3e5035b67c860001 on SA 0
           0: PN 1, state on, key 00000000000000000000000000000000
       RXSC: b4969112700f0001, state on
           0: PN 1, state on, key 01000000000000000000000000000000
->     offload: phy
   19: macsec1: protect on validate strict sc off sa off encrypt on send_sci on end_station off scb off replay off
       cipher suite: GCM-AES-128, using ICV length 16
       TXSC: 3e5035b67c880001 on SA 0
           1: PN 1, state on, key 00000000000000000000000000000000
       RXSC: b4969112700f0001, state on
           1: PN 1, state on, key 01000000000000000000000000000000
->     offload: off

The second patch allows an user to change the offloading mode at runtime
through a new subcommand, `ip macsec offload`:

  # ip macsec offload macsec0 phy
  # ip macsec offload macsec0 off

If a mode isn't supported, `ip macsec offload` will report an issue
(-EOPNOTSUPP).

Giving the offloading mode when a macsec interface is created was
discussed; it is not implemented in this series. It could come later
on, when needed, as we'll still want to support updating the offloading
mode at runtime (what's implemented in this series).
====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-04 19:59:35 +00:00
Antoine Tenart c15674d80d macsec: add an accessor for validate_str
This patch adds an accessor for the validate_str array, to handle future
changes adding a member.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-04 19:57:41 +00:00
Antoine Tenart 69166f909b man: document the ip macsec offload command
Add a description of the `ip macsec offload` command used to select the
offloading mode on a macsec interface when the underlying device
supports it.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-04 19:57:36 +00:00
Antoine Tenart 791bc7ee48 macsec: add support for changing the offloading mode
MacSEC can now be offloaded to specialized hardware devices. Offloading
is off by default when creating a new MACsec interface, but the mode can
be updated at runtime. This patch adds a new subcommand,
`ip macsec offload`, to allow users to select the offloading mode of a
MACsec interface. It takes the mode to switch to as an argument, which
can for now either be 'off' or 'phy':

  # ip macsec offload macsec0 phy
  # ip macsec offload macsec0 off

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-04 19:57:30 +00:00
Antoine Tenart da6abdba09 macsec: report the offloading mode currently selected
This patch adds support to report the MACsec offloading mode currently
being enabled, which as of now can either be 'off' or 'phy'. This
information is reported through the `ip macsec show` command:

  # ip macsec show
  18: macsec0: protect on validate strict sc off sa off encrypt on send_sci on end_station off scb off replay off
      cipher suite: GCM-AES-128, using ICV length 16
      TXSC: 3e5035b67c860001 on SA 0
          0: PN 1, state on, key 00000000000000000000000000000000
      RXSC: b4969112700f0001, state on
          0: PN 1, state on, key 01000000000000000000000000000000
      offload: phy
  19: macsec1: protect on validate strict sc off sa off encrypt on send_sci on end_station off scb off replay off
      cipher suite: GCM-AES-128, using ICV length 16
      TXSC: 3e5035b67c880001 on SA 0
          1: PN 1, state on, key 00000000000000000000000000000000
      RXSC: b4969112700f0001, state on
          1: PN 1, state on, key 01000000000000000000000000000000
      offload: off

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-04 19:56:41 +00:00
Parav Pandit a5c44b821c devlink: Introduce devlink port flavour virtual
Currently PCI PF and VF devlink devices register their ports as
physical port in non-representors mode.

Introduce a new port flavour as virtual so that virtual devices can
register 'virtual' flavour to make it more clear to users.

An example of one PCI PF and 2 PCI virtual functions, each having
one devlink port.

$ devlink port show
pci/0000:06:00.0/1: type eth netdev ens2f0 flavour physical port 0
pci/0000:06:00.2/1: type eth netdev ens2f2 flavour virtual port 0
pci/0000:06:00.3/1: type eth netdev ens2f3 flavour virtual port 0

Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-04 19:48:11 +00:00
Jiri Pirko 4fe07b8146 devlink: add trap metadata type for flow action cookie
Flow action cookie has been recently added to kernel, print it out.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-04 19:46:29 +00:00
David Ahern b6b8e40bf7 Update kernel headers
Update kernel headers to commit
ef71037047b0 ("Merge branch 'act_ct-software-offload-of-established-flows-fixes'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-04 19:44:21 +00:00
David Ahern b6de0bf7db Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-02-28 22:42:49 +00:00
Stephen Hemminger b5a77cf701 uapi: update bpf.h
Updated upstream

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-28 13:55:38 -08:00
Andrea Claudi 31824e2299 man: rdma-statistic: Add filter description
Add description for filters on rdma statistics show command.
Also add a filter description on the help message of the command.
Additionally, fix some whitespace issue in the man page.

Reported-by: Zhaojuan Guo <zguo@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-28 13:53:00 -08:00
Andrea Claudi 8f1c9d4a3c man: rdma.8: Add missing resource subcommand description
Add resource subcommand in the OBJECT section and a short
description for it.

Reported-by: Zhaojuan Guo <zguo@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-28 13:53:00 -08:00
Xin Long f9d696cf41 xfrm: not try to delete ipcomp states when using deleteall
In kernel space, ipcomp(sub) states used by main states are not
allowed to be deleted by users, they would be freed only when
all main states are destroyed and no one uses them.

In user space, ip xfrm sta deleteall doesn't filter these ipcomp
states out, and it causes errors:

  # ip xfrm state add src 192.168.0.1 dst 192.168.0.2 spi 0x1000 \
      proto comp comp deflate mode tunnel sel src 192.168.0.1 dst \
      192.168.0.2 proto gre
  # ip xfrm sta deleteall
  Failed to send delete-all request
  : Operation not permitted

This patch is to fix it by filtering ipcomp states with a check
xsinfo->id.proto == IPPROTO_IPIP.

Fixes: c7699875be ("Import patch ipxfrm-20040707_2.diff")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-28 13:50:58 -08:00
Andrea Claudi 229bb886a3 man: ip.8: Add missing vrf subcommand description
Add description to the vrf subcommand and a reference to the
dedicated man page.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-28 13:48:23 -08:00
Donald Sharp 320c5c6e09 ip route: Do not imply pref and ttl-propagate are per nexthop
Currently `ip -6 route show` gives us this output:

sharpd@eva ~/i/ip (master)> ip -6 route show
::1 dev lo proto kernel metric 256 pref medium
4:5::6:7 nhid 18 proto static metric 20
        nexthop via fe80::99 dev enp39s0 weight 1
        nexthop via fe80::44 dev enp39s0 weight 1 pref medium

Displaying `pref medium` as the last bit of output implies
that the RTA_PREF is a per nexthop value, when it is infact
a per route piece of data.

Change the output to display RTA_PREF and RTA_TTL_PROPAGATE
before the RTA_MULTIPATH data is shown:

sharpd@eva ~/i/ip (master)> ./ip -6 route show
::1 dev lo proto kernel metric 256 pref medium
4:5::6:7 nhid 18 proto static metric 20 pref medium
        nexthop via fe80::99 dev enp39s0 weight 1
        nexthop via fe80::44 dev enp39s0 weight 1

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-28 13:42:59 -08:00
Andrea Claudi 2c7056ac26 nstat: print useful error messages in abort() cases
When nstat temporary file is corrupted or in some other corner cases,
nstat use abort() to stop its execution. This can puzzle some users,
wondering what is the reason for the crash.

This commit replaces abort() with some meaningful error messages and exit()

Reported-by: Renaud Métrich <rmetrich@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-23 13:52:08 -08:00
Xin Long 83c543af87 erspan: set erspan_ver to 1 by default
Commit 2897636267 ("erspan: add erspan version II support")
breaks the command:

 # ip link add erspan1 type erspan key 1 seq erspan 123 \
    local 10.1.0.2 remote 10.1.0.1

as erspan_ver is set to 0 by default, then IFLA_GRE_ERSPAN_INDEX
won't be set in gre_parse_opt().

  # ip -d link show erspan1
    ...
    erspan remote 10.1.0.1 local 10.1.0.2 ... erspan_index 0 erspan_ver 1
                                              ^^^^^^^^^^^^^^

This patch is to change to set erspan_ver to 1 by default.

Fixes: 2897636267 ("erspan: add erspan version II support")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-23 13:50:17 -08:00
Stephen Hemminger 0a6ea03be4 uapi: update magic.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-11 08:16:42 -08:00
Moshe Shemesh 5023df6a21 devlink: Add health error recovery status monitoring
Add support for devlink health error recovery status monitoring.
Update devlink-monitor man page accordingly.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-02-10 05:29:24 +00:00
Mohit P. Tahiliani 9dced637f8 tc: add support for FQ-PIE packet scheduler
This patch adds support for the FQ-PIE packet Scheduler

Principles:
  - Packets are classified on flows.
  - This is a Stochastic model (as we use a hash, several flows might
                                be hashed to the same slot)
  - Each flow has a PIE managed queue.
  - Flows are linked onto two (Round Robin) lists,
    so that new flows have priority on old ones.
  - For a given flow, packets are not reordered.
  - Drops during enqueue only.
  - ECN capability is off by default.
  - ECN threshold (if ECN is enabled) is at 10% by default.
  - Uses timestamps to calculate queue delay by default.

Usage:
tc qdisc ... fq_pie [ limit PACKETS ] [ flows NUMBER ]
                    [ target TIME ] [ tupdate TIME ]
                    [ alpha NUMBER ] [ beta NUMBER ]
                    [ quantum BYTES ] [ memory_limit BYTES ]
                    [ ecn_prob PERCENTAGE ] [ [no]ecn ]
                    [ [no]bytemode ] [ [no_]dq_rate_estimator ]

defaults:
  limit: 10240 packets, flows: 1024
  target: 15 ms, tupdate: 15 ms (in jiffies)
  alpha: 1/8, beta : 5/4
  quantum: device MTU, memory_limit: 32 Mb
  ecnprob: 10%, ecn: off
  bytemode: off, dq_rate_estimator: off

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: V. Saicharan <vsaicharan1998@gmail.com>
Signed-off-by: Mohit Bhasi <mohitbhasi1998@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-04 03:24:39 -08:00
Peter Junos 39995691b5 ss: fix tests to reflect compact output
This fixes broken tests in commit c4f5862994 ("ss: use compact output
for undetected screen width")

It also escapes stars as grep is used and more bugs could sneak under
the radar with the previous solution.

Signed-off-by: Peter Junos <petoju@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-04 03:21:06 -08:00
Stephen Hemminger 8f9f2b9cdf devlink: fix warning from unchecked write
Warning seen on Ubuntu

devlink.c: In function ‘cmd_dev_flash’:
devlink.c:3071:3: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Wunused-result]
 3071 |   write(pipe_w, &err, sizeof(err));
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fixes: 9b13cddfe2 ("devlink: implement flash status monitoring")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-02 04:20:58 -08:00
Andrea Claudi 5cdeb77cd6 ip link: xstats: fix TX IGMP reports string
This restore the string format we have before jsonification, adding a
missing space between v2 and v3 on TX IGMP reports string.

Fixes: a9bc23a792 ("ip: bridge: add xstats json support")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-29 10:11:35 -08:00
Andrea Claudi 38dd041bfe ip-xfrm: Fix help messages
After commit 8589eb4efd ("treewide: refactor help messages") help
messages for xfrm state and policy are broken, printing many times the
same protocol in UPSPEC section:

$ ip xfrm state help
[...]
UPSPEC := proto { { tcp | tcp | tcp | tcp } [ sport PORT ] [ dport PORT ] |
                  { icmp | icmp | icmp } [ type NUMBER ] [ code NUMBER ] |
                  gre [ key { DOTTED-QUAD | NUMBER } ] | PROTO }

This happens because strxf_proto function is non-reentrant and gets called
multiple times in the same fprintf instruction.

This commit fix the issue avoiding calls to strxf_proto() with a constant
param, just hardcoding strings for protocol names.

Fixes: 8589eb4efd ("treewide: refactor help messages")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-29 10:11:14 -08:00
David Ahern 8e66c8c112 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-29 15:16:54 +00:00
Stephen Hemminger f9ed2db593 uapi: updates to tcp.h, snmp.h and if_bridge.h
Upstream changes

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-29 05:48:19 -08:00
Stephen Hemminger 74da271551 uapi/pkt_sched: upstream changes from fq_pie
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-29 05:46:43 -08:00
Stephen Hemminger aa6d6b223b uapi: update bpf.h and btf.h
Upstream headers from 5.6 pre rc1

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-29 05:45:53 -08:00
Stephen Hemminger d80d22d5fd Merge branch 'master' of git://git.kernel.org/pub/scm/network/iproute2/iproute2-next
Resolved conflict in tc/f_flower.c
2020-01-29 05:44:53 -08:00
Stephen Hemminger d4df55404a v5.5.0 2020-01-27 05:53:09 -08:00
Ron Diskin ff360fe984 devlink: Replace pr_out_bool/uint() wrappers with common print functions
Replace calls for pr_out_bool() and pr_out_uint() with direct calls
to common json_print library function print_bool() and print_uint().

Signed-off-by: Ron Diskin <rondi@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-27 05:43:54 -08:00
Ron Diskin 5a71671a94 devlink: Replace pr_#type_value wrapper functions with common functions
Replace calls for pr_bool/uint/uint64_value with direct calls for the
matching common json_print library function: print_bool(), print_uint()
and print_u64()

Signed-off-by: Ron Diskin <rondi@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-27 05:43:54 -08:00
Ron Diskin 3666eb0eb6 devlink: Replace pr_out_str wrapper function with common function
Replace calls for pr_out_str() and pr_out_str_value() with direct calls to
common json_print library functions.

Signed-off-by: Ron Diskin <rondi@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-27 05:43:54 -08:00
Ron Diskin b20555e402 devlink: Replace json prints by common library functions
Substitute json prints to use json_print.c common library functions,
instead of directly calling jsonw_functions.

Signed-off-by: Ron Diskin <rondi@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-27 05:43:54 -08:00
Ron Diskin 98e48e7dd0 json_print: Add new json object function not as array item
Currently new json object opens (and delete_json_obj closes) the object as
an array, what adds prints for the matching bracket '[' ']' at the
start/end of the object. This patch adds new_json_obj_plain() and the
matching delete_json_obj_plain() to enable opening and closing json object,
not as array and leave it to the using function to decide which type of
object to open/close as the main object.

Signed-off-by: Ron Diskin <rondi@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-27 05:43:54 -08:00
Ron Diskin 31ca29b2be json_print: Introduce print_#type_name_value
Until now print_#type functions supported printing constant names and
unknown (variable) values only.
Add functions to allow printing when the name is also sent to the
function as a variable.

Signed-off-by: Ron Diskin <rondi@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-27 05:43:54 -08:00
Leslie Monis eae5f4b5c8 tc: parse attributes with NLA_F_NESTED flag
The kernel now requires all new nested attributes to set the
NLA_F_NESTED flag. Enable tc {qdisc,class,filter} to parse
attributes that have the NLA_F_NESTED flag set.

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-22 03:45:48 +00:00
Sabrina Dubroca 22aec42679 ip: xfrm: add espintcp encapsulation
While at it, convert xfrm_xfrma_print and xfrm_encap_type_parse to use
the UAPI macros for encap_type as suggested by David Ahern, and add the
UAPI udp.h header (sync'd from ipsec-next to get the TCP_ENCAP_ESPINTCP
definition).

Co-developed-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-22 03:42:01 +00:00
David Ahern 4df5ad933c Update kernel headers and import udp.h
Update kernel headers to commit:
    4f2c17e0f332 ("Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next")

and import udp.h for the next patch.

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-22 03:40:26 +00:00
Roi Dayan 919046d326 tc: flower: fix print with oneline option
This commit fix all location in flower to use _SL_ instead of \n for
newline to allow support for oneline option.

Example before this commit:

filter protocol ip pref 2 flower chain 0 handle 0x1
  indev ens1f0
  dst_mac 11:22:33:44:55:66
  eth_type ipv4
  ip_proto tcp
  src_ip 2.2.2.2
  src_port 99
  dst_port 1-10\  tcp_flags 0x5/5
  ip_flags frag
  ct_state -trk\  ct_zone 4\  ct_mark 255
  ct_label 00000000000000000000000000000000
  skip_hw
  not_in_hw\    action order 1: ct zone 5 pipe
         index 1 ref 1 bind 1 installed 287 sec used 287 sec
        Action statistics:\     Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0\

Example output after this commit:

filter protocol ip pref 2 flower chain 0 handle 0x1 \  indev ens1f0\  dst_mac 11:22:33:44:55:66\  eth_type ipv4\  ip_proto tcp\  src_ip 2.2.2.2\  src_port 99\  dst_port 1-10\  tcp_flags 0x5/5\  ip_flags frag\  ct_state -trk\  ct_zone 4\  ct_mark 255\  ct_label 00000000000000000000000000000000\  skip_hw\  not_in_hw\action order 1: ct zone 5 pipe
         index 1 ref 1 bind 1 installed 346 sec used 346 sec
        Action statistics:\     Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0\

Signed-off-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-21 15:40:21 -08:00
Ethan Sommer 5f78bc3e1d make yacc usage POSIX compatible
config: put YACC in config.mk and use environmental variable if present

ss:
use YACC variable instead of hardcoding bison
place options before source file argument
use -b to specify file prefix instead of output file, as -o isn't POSIX
compatible, this generates ssfilter.tab.c instead of ssfilter.c
replace any references to ssfilter.c with references to ssfilter.tab.c

tc:
use -p flag to set name prefix instead of bison-specific api.prefix
directive
remove unneeded bison-specific directives
use -b instead of -o, replace references to previously generated
emp_ematch.yacc.[ch] with references to newly generated
emp_ematch.tab.[ch]

Signed-off-by: Ethan Sommer <e5ten.arch@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-20 09:43:22 -08:00
Jan Engelhardt 31f45088c9 build: fix build failure with -fno-common
$ make CCOPTS=-fno-common
gcc ... -o ip
ld: rt_names.o (symbol from plugin): in function "rtnl_rtprot_n2a":
(.text+0x0): multiple definition of "numeric"; ip.o (symbol from plugin):(.text+0x0): first defined here

gcc ... -o tipc
ld: ../lib/libutil.a(utils.o):(.bss+0xc): multiple definition of `pretty';
tipc.o:tipc.c:28: first defined here

References: https://bugzilla.opensuse.org/1160244
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-20 09:40:59 -08:00
Stephen Hemminger f4d7ce9bfa ip: use print_nl() to handle one line mode
The helper function print_nl() does the right thing and prints
the newline or backslash.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-20 09:32:51 -08:00
Vladis Dronov 970db267a0 ip: fix link type and vlan oneline output
Move link type printing in print_linkinfo() so multiline output does not
break link options line. Add oneline support for vlan's ingress and egress
qos maps.

Before the fix:

5: veth90.4000@veth90: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 26:9a:05:af:db:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 0 maxmtu 65535
    vlan protocol 802.1Q id 4000 <REORDER_HDR>               the option line is broken ^^^
      ingress-qos-map { 1:2 }
      egress-qos-map { 2:1 } addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

5: veth90.4000@veth90: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\    link/ether 26:9a:05:af:db:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 0 maxmtu 65535 \    vlan protocol 802.1Q id 4000 <REORDER_HDR>
      ingress-qos-map { 1:2 }   <<< a multiline output despite -oneline
      egress-qos-map { 2:1 } addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

After the fix:

5: veth90.4000@veth90: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 26:9a:05:af:db:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 0 maxmtu 65535 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    vlan protocol 802.1Q id 4000 <REORDER_HDR>
      ingress-qos-map { 1:2 }
      egress-qos-map { 2:1 }

5: veth90.4000@veth90: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\    link/ether 26:9a:05:af:db:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 0 maxmtu 65535 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 \    vlan protocol 802.1Q id 4000 <REORDER_HDR> \      ingress-qos-map { 1:2 } \      egress-qos-map { 2:1 }

Fixes: 5c302d518f ("vlan support")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206241
Reported-by: George Shuklin <george.shuklin@gmail.com>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-20 09:28:39 -08:00
David Ahern 46eb08dc2e Merge branch 'tc-ets-qdisc' into next
Petr Machata  says:

====================

A new Qdisc, "ETS", has been accepted into Linux at kernel commit
6bff00170277 ("Merge branch 'ETS-qdisc'"). Add iproute2 support for this
Qdisc.

Patch #1, changes libnetlink to admit NLA_F_NESTED in nested attributes.
Patch #2 then adds ETS support as such.

Examples (taken from the kernel patchset):

- Add a Qdisc with 6 bands, 3 strict and 3 ETS with 45%-30%-25% weights:

    # tc qdisc add dev swp1 root handle 1: \
	ets strict 3 quanta 4500 3000 2500 priomap 0 1 1 1 2 3 4 5
    # tc qdisc sh dev swp1
    qdisc ets 1: root refcnt 2 bands 6 strict 3 quanta 4500 3000 2500 priomap 0 1 1 1 2 3 4 5 5 5 5 5 5 5 5 5

- Tweak quantum of one of the classes of the previous Qdisc:

    # tc class ch dev swp1 classid 1:4 ets quantum 1000
    # tc qdisc sh dev swp1
    qdisc ets 1: root refcnt 2 bands 6 strict 3 quanta 1000 3000 2500 priomap 0 1 1 1 2 3 4 5 5 5 5 5 5 5 5 5
    # tc class ch dev swp1 classid 1:3 ets quantum 1000
    Error: Strict bands do not have a configurable quantum.

- Purely strict Qdisc with 1:1 mapping between priorities and TCs:

    # tc qdisc add dev swp1 root handle 1: \
	ets strict 8 priomap 7 6 5 4 3 2 1 0
    # tc qdisc sh dev swp1
    qdisc ets 1: root refcnt 2 bands 8 strict 8 priomap 7 6 5 4 3 2 1 0 7 7 7 7 7 7 7 7

- Use "bands" to specify number of bands explicitly. Underspecified bands
  are implicitly ETS and their quantum is taken from MTU. The following
  thus gives each band the same weight:

    # tc qdisc add dev swp1 root handle 1: \
	ets bands 8 priomap 7 6 5 4 3 2 1 0
    # tc qdisc sh dev swp1
    qdisc ets 1: root refcnt 2 bands 8 quanta 1514 1514 1514 1514 1514 1514 1514 1514 priomap 7 6 5 4 3 2 1 0 7 7 7 7 7 7 7 7

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-18 21:55:02 +00:00
Petr Machata d2773f1261 tc: Add support for ETS Qdisc
Add a new module to generate and parse options specific to the ETS Qdisc.

Example output:

    bands 8 strict 3 priomap 0 1 2 3 4 5 6 7
qdisc ets 1: root refcnt 2 offloaded bands 8 strict 3 quanta 1514 1514 1514 1514 1514 priomap 0 1 2 3 4 5 6 7 7 7 7 7 7 7 7 7
[
  {
    "kind": "ets",
    "handle": "1:",
    "root": true,
    "refcnt": 2,
    "offloaded": true,
    "options": {
      "bands": 8,
      "strict": 3,
      "quanta": [1514, 1514, 1514, 1514, 1514],
      "priomap": [0, 1, 2, 3, 4, 5, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7]
    }
  }
]

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-18 21:54:12 +00:00
Petr Machata 0da4cfaa5d libnetlink: parse_rtattr_nested should allow NLA_F_NESTED flag
In kernel commit 8cb081746c03 ("netlink: make validation more configurable
for future strictness"), Linux started implicitly flagging nests with
NLA_F_NESTED, unless the nest is created with nla_nest_start_noflag().

The ETS code uses nla_nest_start() where possible, so it does not work with
the current iproute2 code. Have libnetlink catch up by admitting the flag
in the attribute.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-18 21:46:12 +00:00
Ido Schimmel ed81a2a040 ip route: Print "rt_offload" and "rt_trap" indication
The kernel now signals the offload state of a route using the
'RTM_F_OFFLOAD' and 'RTM_F_TRAP' flags. Print these to help users
understand the offload state of each route. The "rt_" prefix is used in
order to distinguish it from the offload state of nexthops.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-18 21:40:20 +00:00
David Ahern 8b802d20e4 Update kernel headers
Update kernel headers to commit
    9aaa29494030 ("Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-18 21:39:15 +00:00
Tuong Lien d5391e186f tipc: fix clang warning in tipc/node.c
When building tipc with clang, the following warning is found:

tipc
    CC       bearer.o
    CC       cmdl.o
    CC       link.o
    CC       media.o
    CC       misc.o
    CC       msg.o
    CC       nametable.o
    CC       node.o
node.c:182:24: warning: field 'key' with variable sized type 'struct tipc_aead_key' not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
                struct tipc_aead_key key;

This commit fixes it by putting the memory area allocated for the user
input key along with the variable-sized 'key' structure in the 'union'
form instead.

Fixes: 24bee3bf97 ("tipc: add new commands to set TIPC AEAD key")
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-06 13:12:48 -08:00
Stephen Hemminger f8bebea915 tc: skbprio: add support for JSON output
Print limit in JSON

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-06 13:12:02 -08:00
Stephen Hemminger 1d6b73be70 tc: prio: fix space in JSON tag
The priomap should not have extra space in the tag.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-06 13:11:41 -08:00
Peter Junos c4f5862994 ss: use compact output for undetected screen width
This change fixes calculation of width in case user pipes the output.

SS output output works correctly when stdout is a terminal. When one
pipes the output, it tries to use 80 or 160 columns. That adds a
line-break if user has terminal width of 100 chars and output is of
the similar width. No width is assumed here.

To reproduce the issue, call
ss | less
and see every other line empty if your screen is between 80 and 160
columns wide.

This second version of the patch fixes screen_width being set to arbitrary
value.

Signed-off-by: Peter Junos <petoju@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-02 18:38:08 +00:00
David Ahern 404f2de114 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-02 17:49:45 +00:00
Andy Roulin 39ac2d2b80 iplink: bond: print lacp actor/partner oper states as strings
The 802.3ad/LACP actor/partner operating states are only printed as
numbers, e.g,

ad_actor_oper_port_state 15

Add an additional output in ip link show that prints a string describing
the individual 3ad bit meanings in the following way:

ad_actor_oper_port_state_str <active,short_timeout,aggregating,in_sync>

JSON output is also supported, the field becomes a json array:

"ad_actor_oper_port_state_str":
	["active","short_timeout","aggregating","in_sync"]

Signed-off-by: Andy Roulin <aroulin@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-02 17:45:32 +00:00
David Ahern a6cf98c23f Update kernel headers
Update kernel headers to commit:
    fe23d63422c8 Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-02 17:26:50 +00:00
Leslie Monis e819d3a03d tc: fq_codel: fix missing statistic in JSON output
Print JSON object even if tc_fq_codel_xstats->class_stats.drop_next
is negative.

Cc: Toke Høiland-Jørgensen <toke@toke.dk>
Fixes: 997f2dc193 ("tc: Add JSON output of fq_codel stats")
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:57:27 -08:00
Leslie Monis 669314e817 tc: tbf: add support for JSON output
Enable proper JSON output for the TBF Qdisc.
Also, fix the style of the statement that's calculating "latency" in
tbf_print_opt().

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:57:27 -08:00
Leslie Monis 85fdef052b tc: sfq: add support for JSON output
Enable proper JSON output for the SFQ Qdisc.
Use the long double format specifier to print the value of
"probability".
Also, fix the indentation in the online output of the contents in the
tc_sfqred_stats structure.

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:57:27 -08:00
Leslie Monis 46d032d002 tc: sfb: add support for JSON output
Enable proper JSON output for the SFB Qdisc.
Make the output for options "rehash" and "db" explicit.
Use the long double format specifier to print probability values.
Use sprint_time() to print time values.
Also, fix the indentation in sfb_print_opt().

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:57:27 -08:00
Leslie Monis 0154d096c5 tc: pie: add support for JSON output
Enable proper JSON output for the PIE Qdisc.
Use sprint_time() to print the value of tc_pie_xstats->delay.
Use the long double format specifier to print tc_pie_xstats->prob.
Also, fix the indentation in the oneline output of statistics and update
the man page to reflect this change.

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:57:27 -08:00
Leslie Monis f6564ed60d tc: hhf: add support for JSON output
Enable proper JSON output for the HHF Qdisc.
Also, use sprint_size() to print size values.

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:57:27 -08:00
Leslie Monis d15e2bfc04 tc: fq: add support for JSON output
Enable proper JSON output for the FQ Qdisc.
Use the "KEY VALUE" format for oneline output of statistics instead of
"VALUE KEY", and remove unnecessary commas from the output.
Use sprint_size() to print size values in fq_print_opt().
Use sprint_time64() to print time values in fq_print_xstats().
Also, update the man page to reflect the changes in the output format.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:57:27 -08:00
Leslie Monis 90a50a6fa2 tc: codel: add support for JSON output
Enable proper JSON output for the CoDel Qdisc.

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:57:27 -08:00
Leslie Monis d3136b1e80 tc: choke: add support for JSON output
Enable proper JSON output for the choke Qdisc.
Also, use the long double format specifier to print the value of
"probability".

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:57:27 -08:00
Leslie Monis d8f673074b tc: cbs: add support for JSON output
Enable proper JSON output for the CBS Qdisc.

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:57:27 -08:00
Stephen Hemminger 2dda733f6d utils: fix indentation
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:53:09 -08:00
Antony Antony 2cf4d7af72 ip: xfrm if_id -ve value is error
if_id is u32, error on -ve values instead of setting to 0

after :
 ip link add ipsec1 type xfrm dev lo if_id -10
 Error: argument "-10" is wrong: if_id value is invalid

before : note xfrm if_id 0
 ip link add ipsec1 type xfrm dev lo if_id -10
 ip -d  link show dev ipsec1
 9: ipsec1@lo: <NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/none 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500
    xfrm if_id 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

Fixes: 286446c1e8 ("ip: support for xfrm interfaces")
Signed-off-by: Antony Antony <antony@phenome.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-25 12:38:13 -08:00
Vivien Didelot 0dcf36db1a iplink: add support for STP xstats
Add support for the BRIDGE_XSTATS_STP xstats, as follow:

    # ip link xstats type bridge_slave dev lan4 stp
    lan4
                        STP BPDU:  RX: 0 TX: 61
                        STP TCN:   RX: 0 TX: 0
                        STP Transitions: Blocked: 2 Forwarding: 1

Or below as JSON:

    # ip -j -p link xstats type bridge_slave dev lan0 stp
    [ {
            "ifname": "lan0",
            "stp": {
                "rx_bpdu": 0,
                "tx_bpdu": 500,
                "rx_tcn": 0,
                "tx_tcn": 0,
                "transition_blk": 0,
                "transition_fwd": 0
            }
        } ]

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-12-17 16:31:39 +00:00
Michal Kubecek 2b8e6995fe ip link: show permanent hardware address
Display permanent hardware address of an interface in output of
"ip link show" and "ip addr show". To reduce noise, permanent address is
only shown if it is different from current one.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-12-17 16:28:02 +00:00
David Ahern 974f889c2d Update kernel headers
Update kernel headers to commit:
    6f6dded1385c ("Merge branch 'WireGuard-CI-and-housekeeping'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-12-17 16:25:26 +00:00
Aya Levin af646bf953 devlink: Fix fmsg nesting in non JSON output
When an object or an array opening follows a name (label), add a new
line and indentation before printing the label. When name (label) is
followed by a value, print both at the same line.

Prior to this patch nesting was not visible in a non JSON output:
JSON:
{
    "Common config": {
        "SQ": {
            "stride size": 64,
            "size": 1024
        },
        "CQ": {
            "stride size": 64,
            "size": 1024
        } },
    "SQs": [ {
            "channel ix": 0,
            "sqn": 10,
            "HW state": 1,
            "stopped": false,
            "cc": 0,
            "pc": 0,
            "CQ": {
                "cqn": 6,
                "HW status": 0
            }
         },{
            "channel ix": 0,
            "sqn": 14,
            "HW state": 1,
            "stopped": false,
            "cc": 0,
            "pc": 0,
            "CQ": {
                "cqn": 10,
                "HW status": 0
            }
         } ]
}

Before this patch:
Common Config: SQ: stride size: 64 size: 1024
CQ: stride size: 64 size: 1024
SQs:
  channel ix: 0 tc: 0 txq ix: 0 sqn: 10 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 6 HW status: 0
  channel ix: 1 tc: 0 txq ix: 1 sqn: 14 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 10 HW status: 0

With this patch:
Common config:
  SQ:
    stride size: 64 size: 1024
    CQ:
      stride size: 64 size: 1024
SQs:
  channel ix: 0 sqn: 10 HW state: 1 stopped: false cc: 0 pc: 0
  CQ:
    cqn: 6 HW status: 0
  channel ix: 1 sqn: 14 HW state: 1 stopped: false cc: 0 pc: 0
  CQ:
    cqn: 10 HW status: 0

Fixes: 7b8baf834d ("devlink: Add devlink health diagnose command")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:52:37 -08:00
Aya Levin 746b66b005 devlink: Add a new time-stamp format for health reporter's dump
Introduce a new attribute representing a new time-stamp format: current
time in ns (to comply with y2038) instead of jiffies. If the new
attribute was received, translate the time-stamp accordingly (ns).

Fixes: 2f1242efe9 ("devlink: Add devlink health show command")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:52:37 -08:00
Aya Levin f678a2d08e devlink: Print health reporter's dump time-stamp in a helper function
Add pr_out_dump_reporter prefix to the helper function's name and
encapsulate the print in it.

Fixes: 2f1242efe9 ("devlink: Add devlink health show command")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:52:37 -08:00
Benjamin Poirier 1a500c78ae bridge: Fix tunnelshow json output
repeats for "vlan tunnelshow" what commit 0f36267485 ("bridge: fix vlan
show formatting") did for "vlan show". This fixes problems in json output.

Note that the resulting json output format of "vlan tunnelshow" is not the
same as the original, introduced in commit 8652eeb3ab ("bridge: vlan:
support for per vlan tunnel info"). Changes similar to the ones done for
"vlan show" in commit 0f36267485 ("bridge: fix vlan show formatting") are
carried over to "vlan tunnelshow".

Fixes: c7c1a1ef51 ("bridge: colorize output and use JSON print library")
Fixes: 0f36267485 ("bridge: fix vlan show formatting")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:49:28 -08:00
Benjamin Poirier 955a20be02 bridge: Deduplicate vlan show functions
print_vlan() and print_vlan_tunnel() are almost identical copies, save for
a missing newline in the latter which leads to broken output of "vlan
tunnelshow" in normal mode.

Fixes: c7c1a1ef51 ("bridge: colorize output and use JSON print library")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:49:28 -08:00
Benjamin Poirier dfa13e2273 bridge: Fix vni printing
Since commit c7c1a1ef51 ("bridge: colorize output and use JSON print
library"), print_range() is used for vid (16bits) and vni. However, the
latter are 32bits so they get truncated. They got truncated even before
that commit though.

Fixes: 8652eeb3ab ("bridge: vlan: support for per vlan tunnel info")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:49:28 -08:00
Benjamin Poirier 1f53ba7297 bridge: Fix BRIDGE_VLAN_TUNNEL attribute sizes
As per the kernel's vlan_tunnel_policy, IFLA_BRIDGE_VLAN_TUNNEL_VID and
IFLA_BRIDGE_VLAN_TUNNEL_FLAGS have type NLA_U16.

Fixes: 8652eeb3ab ("bridge: vlan: support for per vlan tunnel info")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:49:28 -08:00
Benjamin Poirier df1262155c bridge: Fix src_vni argument in man page
"SRC VNI" is only one argument and should appear as such. Moreover, this
argument to the src_vni option is documented under three forms: "SRC_VNI",
"SRC VNI" and "VNI" in different places. Consistenly use the simplest form,
"VNI".

Fixes: c5b176e5ba ("bridge: fdb: add support for src_vni option")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:49:28 -08:00
Benjamin Poirier 43b0b6ec84 bridge: Fix typo in error messages
Fixes: 9eff0e5cc4 ("bridge: Add vlan configuration support")
Fixes: 7abf5de677 ("bridge: vlan: add support to display per-vlan statistics")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:49:28 -08:00
Benjamin Poirier d88a6a98e8 testsuite: Fix line count test
a substring match is not enough, ex: 10 != 1

Fixes: 30383b074d ("tests: Add output testing")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:49:28 -08:00
Benjamin Poirier 15322f46c3 json_print: Remove declaration without implementation
Fixes: 6377572f0a ("ip: ip_print: add new API to print JSON or regular format output")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:49:28 -08:00
Paolo Lungaroni 0486388a87 add support for table name in SRv6 End.DT* behaviors
it allows to specify also the table name in addition to the table number in
SRv6 End.DT* behaviors.

To add an End.DT6 behavior route specifying the table by name:

    $ ip -6 route add 2001:db8::1 encap seg6local action End.DT6 table main dev eth0

The ip route show to print output this route:

    $ ip -6 route show 2001:db8::1
    2001:db8::1  encap seg6local action End.DT6 table main dev eth0 metric 1024 pref medium

The JSON output:
    $ ip -6 -j -p route show 2001:db8::1
    [ {
            "dst": "2001:db8::1",
            "encap": "seg6local",
            "action": "End.DT6",
            "table": "main",
            "dev": "eth0",
            "metric": 1024,
            "flags": [ ],
            "pref": "medium"
        } ]

Signed-off-by: Paolo Lungaroni <paolo.lungaroni@cnit.it>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-12-11 17:22:07 +00:00
Stephen Hemminger 7b0d424abe tc: do not output newline in oneline mode
In oneline mode the line seperator should be \
but several parts of tc aren't doing it right.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-12-11 17:21:10 +00:00
David Ahern f39da545b6 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-12-11 17:13:49 +00:00
Brian Vazquez 9eee92a41a ss: fix end-of-line printing in misc/ss.c
The previous change to ss to show header broke the printing of
end-of-line for the last entry.

Tested:

diff <(./ss.old -nltp) <(misc/ss -nltp)
38c38
< LISTEN   0  128   [::1]:35417  [::]:*  users:(("foo",pid=65254,fd=116))
\ No newline at end of file

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-05 12:19:00 -08:00
Brian Vazquez 908985c670 tc: fix warning in tc/q_pie.c
Warning was:
q_pie.c:202:22: error: implicit conversion from 'unsigned long' to
'double'

Fixes: 492ec9558b ("tc: pie: change maximum integer value of tc_pie_xstats->prob")
Cc: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-05 12:18:54 -08:00
Brian Vazquez cad1b0bc5f tc: fix warning in tc/m_ct.c
Warning was:
m_ct.c:370:13: warning: variable 'nat' is used uninitialized whenever
'if' condition is false

Cc: Paul Blakey <paulb@mellanox.com>
Fixes: c8a494314c ("tc: Introduce tc ct action")
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-04 11:32:53 -08:00
Bjarni Ingi Gislason 9ab56784a2 man: Fix unequal number of .RS and .RE macros
Add missing or excessive ".RE" macros.

  Remove an excessive ".EE" macro.

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-04 11:13:12 -08:00
Gautam Ramakrishnan 920700a425 tc: pie: add dq_rate_estimator option
PIE now uses per packet timestamps to calculate queuing
delay. The average dequeue rate based queue delay
calculation is now made optional. This patch adds the option
to enable or disable the use of Little's law to calculate
queuing delay.

Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-04 10:49:42 -08:00
Stephen Hemminger 42060e8d35 tc_util: break long lines
Try to keep lines less than 100 characters or so.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-04 10:45:47 -08:00
Eric Dumazet 81b365eb50 tc_util: support TCA_STATS_PKT64 attribute
Kernel exports 64bit packet counters for qdisc/class stats in linux-5.5

Tested:

$ tc -s -d qd sh dev eth1 | grep pkt
 Sent 4041158922097 bytes 46393862190 pkt (dropped 0, overlimits 0 requeues 2072)
 Sent 501362903764 bytes 5762621697 pkt (dropped 0, overlimits 0 requeues 247)
 Sent 533282357858 bytes 6128246542 pkt (dropped 0, overlimits 0 requeues 329)
 Sent 515878280709 bytes 5875638916 pkt (dropped 0, overlimits 0 requeues 267)
 Sent 516221011694 bytes 5933395197 pkt (dropped 0, overlimits 0 requeues 258)
 Sent 513175109761 bytes 5898402114 pkt (dropped 0, overlimits 0 requeues 231)
 Sent 480207942964 bytes 5519535407 pkt (dropped 0, overlimits 0 requeues 229)
 Sent 483111196765 bytes 5552917950 pkt (dropped 0, overlimits 0 requeues 240)
 Sent 497920120322 bytes 5723104387 pkt (dropped 0, overlimits 0 requeues 271)
$ tc -s -d cl sh dev eth1 | grep pkt
 Sent 513196316238 bytes 5898645862 pkt (dropped 0, overlimits 0 requeues 231)
 Sent 533304444981 bytes 6128500406 pkt (dropped 0, overlimits 0 requeues 329)
 Sent 480227709687 bytes 5519762597 pkt (dropped 0, overlimits 0 requeues 229)
 Sent 501383660279 bytes 5762860276 pkt (dropped 0, overlimits 0 requeues 247)
 Sent 483131168192 bytes 5553147506 pkt (dropped 0, overlimits 0 requeues 240)
 Sent 515899485505 bytes 5875882649 pkt (dropped 0, overlimits 0 requeues 267)
 Sent 497940747031 bytes 5723341475 pkt (dropped 0, overlimits 0 requeues 271)
 Sent 516242376893 bytes 5933640774 pkt (dropped 0, overlimits 0 requeues 258)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-04 10:43:46 -08:00
Stephen Hemminger 214299be7b uapi: update to magic.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-04 10:42:32 -08:00
Tuong Lien 24bee3bf97 tipc: add new commands to set TIPC AEAD key
Two new commands are added as part of 'tipc node' command:

 $tipc node set key KEY [algname ALGNAME] [nodeid NODEID]
 $tipc node flush key

which enable user to set and remove AEAD keys in kernel TIPC (requires
the kernel option - 'TIPC_CRYPTO').

For the 'set key' command, the given 'nodeid' parameter decides the
mode to be applied to the key, particularly:

- If NODEID is empty, the key is a 'cluster' key which will be used for
all message encryption/decryption from/to the node (i.e. both TX & RX).
The same key will be set in the other nodes.

- If NODEID is own node, the key is used for message encryption (TX)
from the node. Whereas, if NODEID is a peer node, the key is for
message decryption (RX) from that peer node. This is the 'per-node-key'
mode that each nodes in the cluster has its specific (TX) key.

Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-25 23:14:11 +00:00
David Ahern 7438afd2cc Update kernel headers
Update kernel headers to commit:
    c431047c4efe ("enetc: add support Credit Based Shaper(CBS) for hardware offload")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-25 23:13:09 +00:00
Eli Britstein 482fd40adf tc: flower: support masked port destination and source match
Extend destination and source port match to support masks, accepting
both decimal and hexadecimal formats.
Also add missing documentation to synopsis in manpage.

$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 protocol ip parent ffff: prio 1 flower skip_hw \
      ip_proto tcp dst_port 1234/0xff00 action drop

$ tc -s filter show dev eth0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  ip_proto tcp
  dst_port 1234/0xff00
  skip_hw
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 26 sec used 26 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

$ tc -p -j filter show dev eth0 parent ffff:
        "options": {
            "keys": {
                "dst_port": 1234,
                "dst_port_mask": 65280
                ...

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-25 21:37:08 +00:00
Eli Britstein 75fb816d9f tc_util: add functions for big endian masked numbers
Add functions for big endian masked numbers as a pre-step towards masked
port numbers.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-25 21:37:01 +00:00
Eli Britstein b20dcd0b31 tc: flower: add u16 big endian parse option
Add u16 big endian parse option as a pre-step towards TCP/UDP/SCTP
ports usage.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-25 21:36:25 +00:00
Moshe Shemesh 5ccc365740 ip: fix oneline output
Ip tool oneline option should output each record on a single line. While
oneline option is active the variable _SL_ replaces line feeds with the
'\' character. However, at the end of print_linkinfo() the variable _SL_
shouldn't be used, otherwise the whole output is on a single line.

Before this fix:
$ip -o link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode
DEFAULT group default qlen 1000\    link/loopback 00:00:00:00:00:00 brd
00:00:00:00:00:00\2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
qdisc fq_codel state UP mode DEFAULT group default qlen 1000\
link/ether 52:54:00:60:0a:db brd ff:ff:ff:ff:ff:ff\3: eth1:
<BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode
DEFAULT group default qlen 1000\    link/ether 00:50:56:1b:05:cd brd
ff:ff:ff:ff:ff:ff\

After this fix:
$ip -o link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode
DEFAULT group default qlen 1000\    link/loopback 00:00:00:00:00:00 brd
00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state
UP mode DEFAULT group default qlen 1000\    link/ether 52:54:00:60:0a:db
brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state
UP mode DEFAULT group default qlen 1000\    link/ether 00:50:56:1b:05:cd
brd ff:ff:ff:ff:ff:ff

Fixes: 3aa0e51be6 ("ip: add support for alternative name addition/deletion/list")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-25 21:34:13 +00:00
David Ahern 3d9608b923 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-25 21:33:28 +00:00
Stephen Hemminger 74ea2526bf v5.4.0 2019-11-25 08:07:24 -08:00
Stephen Hemminger 1285475fe0 netem: remove redundant README
The README for distribution format was already in netem/
2019-11-25 08:02:43 -08:00
Stephen Hemminger 6fec41d5ef remove no longer useful README for lnstat
Superseded by man page.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-25 08:01:23 -08:00
Jakub Kicinski 668bd8d356 devlink: fix requiring either handle
devlink sb occupancy show requires device or port handle.
It passes both device and port handle bits as required to
dl_argv_parse() so since commit 1896b100af ("devlink: catch
missing strings in dl_args_required") devlink will now
complain that only one is present:

$ devlink sb occupancy show pci/0000:06:00.0/0
BUG: unknown argument required but not found

Drop the bit for the handle which was not found from required.

Reported-by: Shalom Toledo <shalomt@mellanox.com>
Fixes: 1896b100af ("devlink: catch missing strings in dl_args_required")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Tested-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-21 21:38:52 +00:00
David Ahern 536dcd2016 Merge branch 'master' into next
Conflicts:
	include/uapi/linux/devlink.h

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-20 02:31:01 +00:00
Ido Kalir b0a688a542 rdma: Rewrite custom JSON and prints logic to use common API
Instead of doing open-coded solution to generate JSON and prints, let's
reuse existing infrastructure and APIs to do the same as ip/*.

Before this change:
 if (rd->json_output)
     jsonw_uint_field(rd->jw, "sm_lid", sm_lid);
 else
     pr_out("sm_lid %u ", sm_lid);

After this change:
 print_uint(PRINT_ANY, "sm_lid", "sm_lid %u ", sm_lid);

All the print functions are converted to support color but for now the
type of color is COLOR_NONE. This is done as a preparation to addition
of color enable option. Such change will require rewrite of command line
arguments parser which is out-of-scope for this patch.

Signed-off-by: Ido Kalir <idok@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-20 02:27:36 +00:00
Danit Goldberg 738728cc6c ip link: Add support to get SR-IOV VF node GUID and port GUID
Extend iplink to show VF GUIDs (IFLA_VF_IB_NODE_GUID, IFLA_VF_IB_PORT_GUID),
giving the ability for user-space application to print GUID values.
This ability is added to the one of setting new node GUID and port GUID values.

Suitable ip link command:
- ip link show <device>

For example:
- ip link set ib4 vf 0 node_guid 22:44:33:00:33:11:00:33
- ip link set ib4 vf 0 port_guid 10:21:33:12:00:11:22:10
- ip link show ib4
ib4: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN mode DEFAULT group default qlen 256
link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
vf 0     link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff,
spoof checking off, NODE_GUID 22:44:33:00:33:11:00:33, PORT_GUID 10:21:33:12:00:11:22:10, link-state disable, trust off, query_rss off

Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-20 02:25:50 +00:00
Stephen Hemminger a7fa739d12 uapi: devlink.h health timestamp
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-19 11:38:17 -08:00
Eli Britstein 9479ec1ed0 tc: flower: fix output for ip tos and ttl
Fix the output for ip tos and ttl to be numbers in JSON format.

Example:
$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 protocol ip parent ffff: prio 1 flower skip_hw \
      ip_tos 5/0xf action drop

Non JSON format remains the same:
$ tc filter show dev eth0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  ip_tos 5/0xf
  skip_hw
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1

JSON format is changed (partial output):
$ tc -p -j filter show dev eth0 parent ffff:
Before:
        "options": {
            "keys": {
                "ip_tos": "0x5/f",
                ...
After:
        "options": {
            "keys": {
                "ip_tos": 5,
                "ip_tos_mask": 15,
                ...

Fixes: 6ea2c2b1cf ("tc: flower: add support for matching on ip tos and ttl")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-19 11:36:05 -08:00
Eli Britstein bb3ee8b313 tc_util: fix JSON prints for ct-mark and ct-zone
Fix the output of ct-mark and ct-zone (both for matches and actions) to
be different in JSON/non-JSON mode.

Example:
$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 protocol ip parent ffff: prio 1 flower skip_hw \
      ct_zone 5 ct_mark 6/0xf action ct commit zone 7 mark 8/0xf drop

Non JSON format remains the same:
$ tc filter show dev eth0 parent ffff:
$ tc -s filter show dev ens1f0_0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  ct_zone 5
  ct_mark 6/0xf
  skip_hw
  not_in_hw
        action order 1: ct commit mark 8/0xf zone 7 drop
         index 1 ref 1 bind 1 installed 108 sec used 108 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

JSON format is changed (partial output):
$ tc -p -j filter show dev eth0 parent ffff:
Before:
        "options": {
            "keys": {
                "ct_zone": "5",
                "ct_mark": "6/0xf"
                ...
        "actions": [ {
                "order": 1,
                "kind": "ct",
                "action": "commit",
                "mark": "8/0xf",
                "zone": "7",
                ...
After:
        "options": {
            "keys": {
                "ct_zone": 5,
                "ct_mark": 6,
                "ct_mark_mask": 15
                ...
        "actions": [ {
                "order": 1,
                "kind": "ct",
                "action": "commit",
                "mark": 8,
                "mark_mask": 15,
                "zone": 7,
                ...

Fixes: c8a494314c ("tc: Introduce tc ct action")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-19 11:36:05 -08:00
Eli Britstein 99d5ee8368 tc: flower: fix newline prints for ct-mark and ct-zone
Matches of ct-mark and ct-zone were printed all in the same line. Fix
that so each ct match is printed in a separate line.

Example:
$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 protocol ip parent ffff: prio 1 flower skip_hw \
      ct_zone 5 ct_mark 6/0xf action ct commit zone 7 mark 8/0xf drop

Before:
$ tc -s filter show dev eth0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4 ct_zone 5 ct_mark 6/0xf
  skip_hw
  not_in_hw
        action order 1: ct commit mark 8/0xf zone 7 drop
         index 1 ref 1 bind 1 installed 31 sec used 31 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

After:
$ tc -s filter show dev eth0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  ct_zone 5
  ct_mark 6/0xf
  skip_hw
  not_in_hw
        action order 1: ct commit mark 8/0xf zone 7 drop
         index 1 ref 1 bind 1 installed 108 sec used 108 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

Fixes: c8a494314c ("tc: Introduce tc ct action")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-19 11:36:05 -08:00
Eli Britstein 746e6c0fd3 tc_util: add an option to print masked numbers with/without a newline
Add an option to print masked numbers with or without a newline, as a
pre-step towards using a common function.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-19 11:36:05 -08:00
Eli Britstein 04b215015b tc_util: introduce a function to print JSON/non-JSON masked numbers
Introduce a function to print masked number with a different output for
JSON or non-JSON methods, as a pre-step towards printing numbers using
this common function.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-19 11:36:05 -08:00
Roman Mashak cc08619c3c man: tc-ematch.8: documented canid() ematch rule
tc-ematch.8 was missing the description of canid() ematch rule, so document
this.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-17 12:31:04 -08:00
Roman Mashak 5d5c394726 man: tc-ematch.8: update list of filter using extended matches
Extended match rules are currently supported by basic, flow and cgroup
filters, so update the man page.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-17 12:28:01 -08:00
Stephen Hemminger d24f5ae3f2 uapi: SPDX license updates
Upstream changes to SPDX licenses in headers.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-14 09:24:10 -08:00
Hritik Vijay 5883c6eba5 ss: show header for --processes/-p
ss by default shows headers for every column but omits it for --processes
for no apparent reason. This patch adds the "Process" header.

Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-14 09:20:14 -08:00
Guillaume Nault 130f549604 man: remove ppp from list of devices not allowed to change netns
PPP devices can be moved to different network namespaces. The feature
was added by commit 79c441ae505c ("ppp: implement x-netns support")
in Linux 4.3.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-14 09:19:39 -08:00
David Ahern 611d3123be Merge branch 'nsid-cleanup' into next
Guillaume Nault  says:

====================

It's currently hard to review ipnetns. The netns ids are inconsistently
treated as signed or unsigned and most helper functions aren't prepared
to use negative ids.

Netns id attributes can be negative: NETNSA_NSID_NOT_ASSIGNED =3D=3D -1.
So let's consistently treat nsids as signed and also reject negative
values in functions that are supposed to only handle assigned netns
ids.

While there, let's drop the extra blank line generated by some command
line parsing errors (patch 5/5).

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-09 01:34:24 +00:00
Guillaume Nault d0b645a51e ipnetns: remove blank lines printed by invarg() messages
Since invarg() automatically adds a '\n' character, having one in the
error message generates an extra blank line.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-09 01:33:06 +00:00
Guillaume Nault 1c9b69276c ipnetns: don't print unassigned nsid in json export
Don't output the nsid and current-nsid json keys if they're not set.
Otherwise a parser would have to special case the "not-assigned"
string.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-09 01:33:05 +00:00
Guillaume Nault 08ba67db7b ipnetns: harden helper functions wrt. negative netns ids
Negative values are invalid netns ids. Ensure that helper functions
don't accidentally try to process them.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-09 01:33:03 +00:00
Guillaume Nault df6da60bcb ipnetns: fix misleading comment about 'ip monitor nsid'
'ip monitor nsid' doesn't call print_nsid().

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-09 01:33:02 +00:00
Guillaume Nault f19966efee ipnetns: treat NETNSA_NSID and NETNSA_CURRENT_NSID as signed
These attributes are signed (with -1 meaning NETNSA_NSID_NOT_ASSIGNED).
So let's use rta_getattr_s32() and print_int() instead of their
unsigned counterpart to avoid confusion.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-09 01:33:00 +00:00
Jakub Kicinski c3f69bf923 devlink: allow full range of resource sizes
Resource size is a 64 bit attribute at netlink level.
Make the command line argument 64 bit as well.

Fixes: 8cd6440958 ("devlink: Add support for devlink resource abstraction")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-09 00:39:50 +00:00
Jakub Kicinski 1896b100af devlink: catch missing strings in dl_args_required
Currently if dl_args_required doesn't contain a string
for a given option the fact that the option is missing
is silently ignored.

Add a catch-all case and print a generic error.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-09 00:39:44 +00:00
Jakub Kicinski 9a0a2fcbf4 devlink: fix referencing namespace by PID
netns parameter for devlink reload is supposed to take PID
as well as string name. However, the PID parsing has two
bugs:
 - the opts->netns member is unsigned so the < 0
   condition is always false;
 - the parameter list is not rewinded after parsing as
   a name, so parsing as a pid uses the wrong argument.

Fixes: 08e8e1ca3e ("devlink: extend reload command to add support for network namespace change")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-09 00:39:03 +00:00
David Ahern 081140bbc4 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-09 00:38:37 +00:00
Jakub Kicinski 0932814458 devlink: require resource parameters
If devlink resource set parameters are not provided it crashes:
$ devlink resource set netdevsim/netdevsim0
Segmentation fault (core dumped)

This is because even though DL_OPT_RESOURCE_PATH and
DL_OPT_RESOURCE_SIZE are passed as o_required, the validation
table doesn't contain a relevant string.

Fixes: 8cd6440958 ("devlink: Add support for devlink resource abstraction")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-07 20:36:08 -08:00
Vlad Buslov fb2e033add tc: implement support for action flags
Implement setting and printing of action flags with single available flag
value "no_percpu" that translates to kernel UAPI TCA_ACT_FLAGS value
TCA_ACT_FLAGS_NO_PERCPU_STATS. Update man page with information regarding
usage of action flags.

Example usage:

 # tc actions add action gact drop no_percpu
 # sudo tc actions list action gact
 total acts 1

        action order 0: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 0
        no_percpu

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-02 07:44:23 -07:00
David Ahern 17a948c80a Update kernel headers
Update kernel headers to commit:
    c23fcbbc6aa4 ("tc-testing: added tests with cookie for conntrack TC action")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-02 07:43:01 -07:00
Michał Łyszczek eca5123948 libnetlink.c, ss.c: properly handle fread() errors
fread(3) returns size_t data type which is unsigned, thus check
`if (fread(...) < 0)' is always false. To check if fread(3) has
failed, user should check error indicator with ferror(3).

This commit also changes read logic a little bit by being less
forgiving for errors. Previous logic was checking if fread(3)
read *at least* required ammount of data, now code checks if
fread(3) read *exactly* expected ammount of data. This makes
sense because code parses very specific binary file, and reading
even 1 less/more byte than expected, will later corrupt data anyway.

Signed-off-by: Michał Łyszczek <michal.lyszczek@bofc.pl>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-01 09:05:41 -07:00
Vlad Buslov cb83101626 tc: remove duplicated NEXT_ARG_FWD() in parse_ct()
Function parse_ct() manually calls NEXT_ARG_FWD() after
parse_action_control_dflt(). This is redundant because
parse_action_control_dflt() modifies argc and argv itself. Moreover, such
implementation parses out any following actions option. For example, adding
action ct with cookie errors:

$ sudo tc actions add action ct cookie 111111111111
Bad action type 111111111111
Usage: ... gact <ACTION> [RAND] [INDEX]
Where:  ACTION := reclassify | drop | continue | pass | pipe |
                  goto chain <CHAIN_INDEX> | jump <JUMP_COUNT>
        RAND := random <RANDTYPE> <ACTION> <VAL>
        RANDTYPE := netrand | determ
        VAL : = value not exceeding 10000
        JUMP_COUNT := Absolute jump from start of action list
        INDEX := index value used

With fix:

$ sudo tc actions add action ct cookie 111111111111
$ sudo tc actions list action ct
total acts 1

        action order 0: ct zone 0 pipe
         index 1 ref 1 bind 0
        cookie 111111111111

Fixes: c8a494314c ("tc: Introduce tc ct action")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-01 09:04:28 -07:00
Julien Fortin 4f73cd7f0d ip: fix ip route show json output for multipath nexthops
print_rta_multipath doesn't support JSON output:

{
    "dst":"27.0.0.13",
    "protocol":"bgp",
    "metric":20,
    "flags":[],
    "gateway":"169.254.0.1"dev uplink-1 weight 1 ,
    "flags":["onlink"],
    "gateway":"169.254.0.1"dev uplink-2 weight 1 ,
    "flags":["onlink"]
},

since RTA_MULTIPATH has nested objects we should print them
in a json array.

With the path we have the following output:

{
    "flags": [],
    "dst": "36.0.0.13",
    "protocol": "bgp",
    "metric": 20,
    "nexthops": [
        {
            "weight": 1,
            "flags": [
                "onlink"
            ],
            "gateway": "169.254.0.1",
            "dev": "uplink-1"
        },
        {
            "weight": 1,
            "flags": [
                "onlink"
            ],
            "gateway": "169.254.0.1",
            "dev": "uplink-2"
        }
    ]
}

Fixes: 663c3cb231 ("iproute: implement JSON and color output")

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-01 09:03:54 -07:00
Michał Łyszczek 6749801b06 rdma/sys.c: fix possible out-of-bound array access
netns_modes_str[] array has 2 elements, when netns_mode is 2,
condition (2 <= 2) will be true and `mode_str = netns_modes_str[2]'
will be executed, which will result in out-of-bound read.

Signed-off-by: Michał Łyszczek <michal.lyszczek@bofc.pl>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-28 10:33:27 -07:00
David Ahern 7534b3cd8c Merge branch 'alt-names' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-28 07:38:24 -07:00
Jiri Pirko afd67550c2 ip: allow to use alternative names as handle
Extend ll_name_to_index() to get the index of a netdevice using
alternative interface name. Allow alternative long names to pass checks
in couple of ip link/addr commands.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-28 07:35:29 -07:00
Jiri Pirko 3aa0e51be6 ip: add support for alternative name addition/deletion/list
Implement addition/deletion of lists of properties, currently
alternative ifnames. Also extent the ip link show command to list them.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-28 07:35:29 -07:00
Jiri Pirko 20fbe90771 lib/ll_map: cache alternative names
Alternative names are related to the "parent name". That means,
whenever ll_remember_index() is called to add/delete/update and it founds
the "parent name" im object by ifindex, processes related
alternative name im objects too. Put them in a list which holds the
relationship with the parent.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-28 07:35:29 -07:00
David Ahern 9a1e4561f1 Merge branch 'rdma-mr-stats' into next
Leon Romanovsky  says:

====================

This is supplementary part of "ODP information and statistics"
kernel series.
https://lore.kernel.org/linux-rdma/20191016062308.11886-1-leon@kernel.org

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-27 10:28:49 -07:00
Erez Alfasi 5c78ffa0e5 rdma: Document MR statistics
Add document of accessing the MR counters into
the rdma-statistic man pages.

Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-10-27 10:28:38 -07:00
Erez Alfasi 33552ade17 rdma: Add "stat show mr" support
Show MR counters statistics. Filters are also enabled.

Examples:
~$: rdma stat show mr
dev mlx5_0 mrn 8 page_faults 1221 page_invalidations 0
dev mlx5_0 mrn 9 page_faults 1221 page_invalidations 0

~$: rdma stat show mr mrn 8
dev mlx5_0 mrn 8 page_faults 1221 page_invalidations 0

Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-10-27 10:28:30 -07:00
David Ahern c9dc3af42e Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-27 09:53:46 -07:00
Stephen Hemminger 085ab19bc3 don't install examples
No longer relevant
2019-10-23 10:21:06 -07:00
Stephen Hemminger e15011b5e5 remove out of date README
The original old README refers to stuff from the pre 2.6
era including cbz. Just kill it.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-23 10:19:45 -07:00
Michał Łyszczek b7f28e0bd9 ipnetns: do not check netns NAME when -all is specified
When `-all' argument is specified netns runs cmd on all namespaces
and NAME is not used, but netns nevertheless checks if argv[1] is a
valid namespace name ignoring the fact that argv[1] contains cmd
and not NAME. This results in bug where user cannot specify
absolute path to command.

    # ip -all netns exec /usr/bin/whoami
    Invalid netns name "/usr/bin/whoami"

This forces user to have his command in PATH.

Solution is simply to not validate argv[1] when `-all' argument is
specified.

Signed-off-by: Michał Łyszczek <michal.lyszczek@bofc.pl>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-23 09:18:08 -07:00
Stephen Hemminger d49e5c2437 examples: remove diffserv
The diffserv examples here are out of date and incomplete.
Remove them rather than try and fix them.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-23 09:13:55 -07:00
Stephen Hemminger 86c0bf5982 examples: remove gaiconf
The gaiconf script is a workaround for something now handled
in distros as part of libc.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-23 09:12:19 -07:00
Stephen Hemminger 14fd32d3c6 examples: remove out of date cbq stuff
The examples around cbq are out of date and never updated.
There are better ways to achieve same kind of thing with more
modern qdisc.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-22 09:53:26 -07:00
Nicolas Dichtel 6ed2915f9c ip-netns.8: document target-nsid and nsid options of list-id
This is a follow up of the commit eaefb07804 ("ipnetns: enable to dump
nsid conversion table").

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-16 12:18:37 -07:00
Nicolas Dichtel 63ab204e7b ip-netns.8: document the 'auto' keyword of 'ip netns set'
This is a follow up of the commit ebe3ce2fcc ("ipnetns: parse nsid as a
signed integer").

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-16 12:18:37 -07:00
Florent Fourcot 10d39984b7 man: remove "defaut group" sentence on ip link
By default, all devices are listed, not only the default group.

Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: Romain Bellan <romain.bellan@wifirst.fr>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-16 12:18:37 -07:00
Davide Caratti 14cadc707b ss: allow dumping kTLS info
now that INET_DIAG_INFO requests can dump TCP ULP information, extend 'ss'
to allow diagnosing kTLS when it is attached to a TCP socket. While at it,
import kTLS uAPI definitions from the latest net-next tree.

CC: Andrea Claudi <aclaudi@redhat.com>
Co-developed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-14 20:07:21 -07:00
David Ahern 4c23b12865 Update kernel headers and import tls.h
Update kernel headers to commit:
    85a83a8fca7f ("Merge branch 'PTP-driver-refactoring-for-SJA1105-DSA'")

and add tls.h.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-14 20:07:20 -07:00
Nicolas Dichtel eaefb07804 ipnetns: enable to dump nsid conversion table
This patch enables to dump/get nsid from a netns into another netns.

Example:
$ ./test.sh
+ ip netns add foo
+ ip netns add bar
+ touch /var/run/netns/init_net
+ mount --bind /proc/1/ns/net /var/run/netns/init_net
+ ip netns set init_net 11
+ ip netns set foo 12
+ ip netns set bar 13
+ ip netns
init_net (id: 11)
bar (id: 13)
foo (id: 12)
+ ip -n foo netns set init_net 21
+ ip -n foo netns set foo 22
+ ip -n foo netns set bar 23
+ ip -n foo netns
init_net (id: 21)
bar (id: 23)
foo (id: 22)
+ ip -n bar netns set init_net 31
+ ip -n bar netns set foo 32
+ ip -n bar netns set bar 33
+ ip -n bar netns
init_net (id: 31)
bar (id: 33)
foo (id: 32)
+ ip netns list-id target-nsid 12
nsid 21 current-nsid 11 (iproute2 netns name: init_net)
nsid 22 current-nsid 12 (iproute2 netns name: foo)
nsid 23 current-nsid 13 (iproute2 netns name: bar)
+ ip -n foo netns list-id target-nsid 21
nsid 11 current-nsid 21 (iproute2 netns name: init_net)
nsid 12 current-nsid 22 (iproute2 netns name: foo)
nsid 13 current-nsid 23 (iproute2 netns name: bar)
+ ip -n bar netns list-id target-nsid 33 nsid 32
nsid 32 current-nsid 32 (iproute2 netns name: foo)
+ ip -n bar netns list-id target-nsid 31 nsid 32
nsid 12 current-nsid 32 (iproute2 netns name: foo)
+ ip netns list-id nsid 13
nsid 13 (iproute2 netns name: bar)

CC: Petr Oros <poros@redhat.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Tested-by: Petr Oros <poros@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-14 13:04:19 -07:00
Aya Levin 94ff4d6882 devlink: Fix inconsistency between command input and output
In devlink health show command the reporter's name parameter is called
reporter, but in the output the reporter's name is referred to as name

Before this patch:
$ devlink health show pci/0000:04:00.0 reporter tx
pci/0000:04:00.0:
   name tx
     state healthy error 0 recover 0 grace_period 500 auto_recover true

After this patch:
$ devlink health show pci/0000:04:00.0 reporter tx
pci/0000:04:00.0:
   reporter tx
     state healthy error 0 recover 0 grace_period 500 auto_recover true

Reported-by: Jiri Pirko <jiri@mellanox.com>
Fixes: 2f1242efe9 ("devlink: Add devlink health show command")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-08 20:22:13 -07:00
Aya Levin 7dd3d51b91 devlink: Left justification on FMSG output
FMSG output is dynamic, space separator must be on the left hand side of
the value. Otherwise output has redundant left indentation regardless
the hierarchy.

Before the patch:
 Common config: SQ: stride size: 64 size: 1024
 CQ: stride size: 64 size: 1024
 SQs:
   channel ix: 0 tc: 0 txq ix: 0 sqn: 10 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 6 HW status: 0
   channel ix: 1 tc: 0 txq ix: 1 sqn: 14 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 10 HW status: 0
   channel ix: 2 tc: 0 txq ix: 2 sqn: 18 HW state: 1 stopped: false cc: 5 pc: 5 CQ: cqn: 14 HW status: 0
   channel ix: 3 tc: 0 txq ix: 3 sqn: 22 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 18 HW status: 0

With the patch:
Common config: SQ: stride size: 64 size: 1024
CQ: stride size: 64 size: 1024
SQs:
  channel ix: 0 tc: 0 txq ix: 0 sqn: 10 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 6 HW status: 0
  channel ix: 1 tc: 0 txq ix: 1 sqn: 14 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 10 HW status: 0
  channel ix: 2 tc: 0 txq ix: 2 sqn: 18 HW state: 1 stopped: false cc: 5 pc: 5 CQ: cqn: 14 HW status: 0
  channel ix: 3 tc: 0 txq ix: 3 sqn: 22 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 18 HW status: 0

Fixes: 844a61764c ("devlink: Add helper functions for name and value separately")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-08 20:22:13 -07:00
Aya Levin 56b725a442 devlink: Add helper for left justification print
Introduce a helper function which wraps code that adds a left hand side
space separator unless it follows a newline.

Fixes: e3d0f0c0e3 ("devlink: add option to generate JSON output")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-08 20:22:13 -07:00
Andrea Claudi e047ca988f tc: fix segmentation fault on gact action
tc segfaults if gact action is used without action or index:

$ ip link add type dummy
$ tc actions add action pipe index 1
$ tc filter add dev dummy0 parent ffff: protocol ip \
  pref 10 u32 match ip src 127.0.0.2 flowid 1:10 action gact
Segmentation fault

We expect tc to fail gracefully with an error message.

This happens if gact is the last argument of the incomplete
command. In this case the "gact" action is parsed, the macro
NEXT_ARG_FWD() is executed and the next matches() crashes
because of null argv pointer.

To avoid this, simply use NEXT_ARG() instead.

With this change in place:

$ ip link add type dummy
$ tc actions add action pipe index 1
$ tc filter add dev dummy0 parent ffff: protocol ip \
  pref 10 u32 match ip src 127.0.0.2 flowid 1:10 action gact
Command line is not complete. Try option "help"

Fixes: fa49588973 ("tc: Fix binding of gact action by index.")
Reported-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-08 20:18:51 -07:00
Damien Robert 7c503d88d2 man: add reference to `ip route add encap ... src`
The ability to specify the source adresse for 'encap ip' / 'encap ip6'
was added in commit 94a8722f2f but the man
page was not updated.

Also fixes a missing page in ip-route.8.in.

Signed-off-by: Damien Robert <damien.olivier.robert+git@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-08 20:18:15 -07:00
David Ahern 47a4c1533c Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2019-10-07 22:02:36 +00:00
Jiri Pirko 08e8e1ca3e devlink: extend reload command to add support for network namespace change
Extend existing devlink reload command by adding option "netns" by which
user can instruct kernel to reload the devlink instance into specified
network namespace.

Example:

$ ip netns add testns1
$ devlink dev reload netdevsim/netdevsim10 netns testns1

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2019-10-07 22:00:49 +00:00
Jiri Pirko 29993df876 devlink: introduce cmdline option to switch to a different namespace
Similar to ip tool, add an option to devlink to operate under certain
network namespace. Unfortunately, "-n" is already taken, so use "-N"
instead.

Example:

$ devlink -N testns1 dev show

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2019-10-07 21:59:50 +00:00
Leon Romanovsky f93134841e rdma: Relax requirement to have PID for HW objects
RDMA has weak connection between PIDs and HW objects, because
the latter tied to file descriptors for their lifetime management.

The outcome of such connection is that for the following scenario,
the returned PID will be 0 (not-valid):
 1. Create FD and context
 2. Share it with ephemeral child
 3. Create any object and exit that child

This flow was revealed in testing environment and of course real users
are not running such scenario, because it makes no sense at all in RDMA
world.

Let's do two changes in the code to support such workflow anyway:
 1. Remove need to provide PID/kernel name. Code already supports it,
    just need to remove extra validation.
 2. Ball-out in case PID is 0.

Link: https://lore.kernel.org/linux-rdma/20191002123245.18153-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2019-10-07 21:54:30 +00:00
David Ahern 9dcd8788fe Update kernel headers
Update kernel headers to commit:
    940f13821528 ("Merge branch 'dpaa2-eth-misc-cleanup'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2019-10-07 20:43:13 +00:00
Stephen Hemminger 2d0445c67b uapi: update btf from 5.4-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-01 08:55:01 -07:00
Roopa Prabhu 6284236237 ipneigh: neigh get support
This patch adds support to lookup a neigh entry
using recently added support in the kernel using RTM_GETNEIGH

example:
$ip neigh get 10.0.2.4 dev test-dummy0
10.0.2.4 dev test-dummy0 lladdr de:ad:be:ef:13:37 PERMANENT

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Tested-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-01 08:23:43 -07:00
Roopa Prabhu 4ed5ad7bd3 bridge: fdb get support
This patch adds support to lookup a bridge fdb entry
using recently added support in the kernel using RTM_GETNEIGH
(and AF_BRIDGE family).

example:
$bridge fdb get 02:02:00:00:00:03 dev test-dummy0 vlan 1002
02:02:00:00:00:03 dev test-dummy0 vlan 1002 master bridge

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Tested-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-01 08:22:32 -07:00
Julien Fortin 4ecefff3cf ip: fix ip route show json output for multipath nexthops
print_rta_multipath doesn't support JSON output:

{
    "dst":"27.0.0.13",
    "protocol":"bgp",
    "metric":20,
    "flags":[],
    "gateway":"169.254.0.1"dev uplink-1 weight 1 ,
    "flags":["onlink"],
    "gateway":"169.254.0.1"dev uplink-2 weight 1 ,
    "flags":["onlink"]
},

since RTA_MULTIPATH has nested objects we should print them
in a json array.

With the path we have the following output:

{
    "flags": [],
    "dst": "36.0.0.13",
    "protocol": "bgp",
    "metric": 20,
    "nexthops": [
        {
            "weight": 1,
            "flags": [
                "onlink"
            ],
            "gateway": "169.254.0.1",
            "dev": "uplink-1"
        },
        {
            "weight": 1,
            "flags": [
                "onlink"
            ],
            "gateway": "169.254.0.1",
            "dev": "uplink-2"
        }
    ]
}

Fixes: 663c3cb231 ("iproute: implement JSON and color output")

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-01 07:59:47 -07:00
Thomas Haller 0d82ee9939 man: add note to ip-macsec manual about necessary key management
The man page of ip-macsec and the existance of the tool makes it seem like
the user could just configure static keys once, and be done with it. That is
not the case. Some form or key management must be done in user space.

Add a note about that.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-09-26 14:11:27 -07:00
David Ahern 8c2093e5d2 ip vrf: Add json support for show command
Add json support to 'ip vrf sh':
$ ip -j -p vrf ls
[ {
        "name": "mgmt",
        "table": 1001
    } ]

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-24 19:35:41 -07:00
David Ahern 92754430a6 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-24 19:34:34 -07:00
Stephen Hemminger 8d88c37724 uapi: update headers from 5.4-rc
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-09-24 12:38:57 -07:00
Stephen Hemminger 38e9ba9dc9 Merge ../iproute2-next 2019-09-24 12:37:33 -07:00
Stephen Hemminger 18e631bd4b v5.3.0 2019-09-24 12:32:05 -07:00
Joe Stringer e4c4685fd6 bpf: Fix race condition with map pinning
If two processes attempt to invoke bpf_map_attach() at the same time,
then they will both create maps, then the first will successfully pin
the map to the filesystem and the second will not pin the map, but will
continue operating with a reference to its own copy of the map. As a
result, the sharing of the same map will be broken from the two programs
that were concurrently loaded via loaders using this library.

Fix this by adding a retry in the case where the pinning fails because
the map already exists on the filesystem. In that case, re-attempt
opening a fd to the map on the filesystem as it shows that another
program already created and pinned a map at that location.

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-09-24 12:29:38 -07:00
David Ahern a32692ac9c Merge branch 'master' into next
Conflicts:
	devlink/devlink.c

Fixed the conflict by updating the numbering for all new attributes
after the ones in master branch.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-19 07:55:53 -07:00
Jiri Pirko 88fdbf8030 devlink: add reload failed indication
Add indication about previous failed devlink reload.

Example outputs:

$ devlink dev
netdevsim/netdevsim10: reload_failed true
$ devlink dev -j -p
{
    "dev": {
        "netdevsim/netdevsim10": {
            "reload_failed": true
        }
    }
}

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-19 07:51:47 -07:00
Andrea Claudi c0325b0638 bpf: replace snprintf with asprintf when dealing with long buffers
This reduces stack usage, as asprintf allocates memory on the heap.

This indirectly fixes a snprintf truncation warning (from gcc v9.2.1):

bpf.c: In function ‘bpf_get_work_dir’:
bpf.c:784:49: warning: ‘snprintf’ output may be truncated before the last format character [-Wformat-truncation=]
  784 |  snprintf(bpf_wrk_dir, sizeof(bpf_wrk_dir), "%s/", mnt);
      |                                                 ^
bpf.c:784:2: note: ‘snprintf’ output between 2 and 4097 bytes into a destination of size 4096
  784 |  snprintf(bpf_wrk_dir, sizeof(bpf_wrk_dir), "%s/", mnt);
      |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fixes: e42256699c ("bpf: make tc's bpf loader generic and move into lib")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-19 07:49:46 -07:00
Nicolas Dichtel 80d0e62673 link_xfrm: don't force to set phydev
Since linux commit 22d6552f827e ("xfrm interface: fix management of
phydev"), phydev is not mandatory anymore.

Note that it also could be useful before the above commit to not force the
user to put a phydev (the kernel was checking it anyway).
For example, it was useful to not set it in case of x-netns, because the
phydev is not available in the current netns:

Before the patch:
$ ip netns add foo
$ ip link add xfrm1 type xfrm dev eth1 if_id 1
$ ip link set xfrm1 netns foo
$ ip -n foo link set xfrm1 type xfrm dev eth1 if_id 2
Cannot find device "eth1"
$ ip -n foo link set xfrm1 type xfrm if_id 2
must specify physical device

Fixes: 286446c1e8 ("ip: support for xfrm interfaces")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Matt Ellison <matt@arroyo.io>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-09-17 17:26:21 +02:00
Andrea Claudi 6296d51825 man: ss.8: add documentation for drop counter
After commit 6df9c7a06a ("ss: add SK_MEMINFO_DROPS display") ss -m
displays also a drop counter for each socket.

This commit properly document it into the man page.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-09-17 17:25:25 +02:00
Mark Zhang 4e2d9fc4d8 rdma: Check comm string before print in print_comm()
Broken kernels (not-upstream) can provide wrong empty "comm" field.
It causes to segfault while printing in JSON format.

Fixes: 8ecac46a60 ("rdma: Add QP resource tracking information")
Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-09-17 17:17:38 +02:00
Jiri Pirko 9b13cddfe2 devlink: implement flash status monitoring
Listen to status notifications coming from kernel during flashing and
put them on stdout to inform user about the status.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-16 07:49:25 -07:00
Jiri Pirko 853be43f9e devlink: implement flash update status monitoring
Kernel sends notifications about flash update status, so implement these
messages for monitoring.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-16 07:49:24 -07:00
Dirk van der Merwe 850de16f12 devlink: unknown 'fw_load_policy' string validation
The 'fw_load_policy' devlink parameter now supports an unknown value.

Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-15 10:55:16 -07:00
Dirk van der Merwe c240e6748e devlink: add 'reset_dev_on_drv_probe' devlink param
Add support for the new devlink parameter along with string to uint
conversion.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-15 10:52:53 -07:00
David Dai 1157a6fc36 iproute2-next: police: support 64bit rate and peakrate in tc utility
For high speed adapter like Mellanox CX-5 card, it can reach upto
100 Gbits per second bandwidth. Currently htb already supports 64bit rate
in tc utility. However police action rate and peakrate are still limited
to 32bit value (upto 32 Gbits per second). Taking advantage of the 2 new
attributes TCA_POLICE_RATE64 and TCA_POLICE_PEAKRATE64 from kernel,
tc can use them to break the 32bit limit, and still keep the backward
binary compatibility.

Tested-by: David Dai <zdai@linux.vnet.ibm.com>
Signed-off-by: David Dai <zdai@linux.vnet.ibm.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-15 10:39:19 -07:00
David Ahern 3d72f125c3 Update kernel headers
Update kernel headers to commit:
    aa2eaa8c272a ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-15 10:32:58 -07:00
David Ahern 2caa8012e8 nexthop: Add space after blackhole
Add a space after 'blackhole' is missing to properly separate the
protocol when it is given.

Fixes: 63df8e8543 ("Add support for nexthop objects")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-09-04 12:05:43 -07:00
Andrea Claudi 4fb98f0895 devlink: fix segfault on health command
devlink segfaults when using grace_period without reporter

$ devlink health set pci/0000:00:09.0 grace_period 3500
Segmentation fault

devlink is instead supposed to gracefully fail printing a warning
message

$ devlink health set pci/0000:00:09.0 grace_period 3500
Reporter's name is expected.

This happens because DL_OPT_HEALTH_REPORTER_NAME and
DL_OPT_HEALTH_REPORTER_GRACEFUL_PERIOD are both defined as BIT(27).
When dl_opts_put() parse options and grace_period is set, it erroneously
tries to set reporter name to null.

This is fixed simply shifting by 1 bit enumeration starting with
DL_OPT_HEALTH_REPORTER_GRACEFUL_PERIOD.

Fixes: b18d89195b ("devlink: Add devlink health set command")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-09-04 12:01:19 -07:00
Donald Sharp 84b9168328 ip nexthop: Allow flush|list operations to specify a specific protocol
In the case where we have a large number of nexthops from a specific
protocol, allow the flush and list operations to take a protocol
to limit the commands scopes.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-04 07:48:20 -07:00
David Ahern 1a5141715e Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-04 07:48:15 -07:00
Stephen Hemminger 98631f134d uapi: update bpf.h header
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-29 16:20:21 -07:00
David Ahern efaef0be95 Merge branch 'devlink-trap' into next
Ido Schimmel  says:

====================

From: Ido Schimmel <idosch@mellanox.com>

This patchset adds devlink-trap support in iproute2.

Patch #1 increases the number of options devlink can handle.

Patches #2-#3 gradually add support for all devlink-trap commands.

Patch #4 adds a man page for devlink-trap.

See individual commit messages for example usage and output.

Changes in v2:
* Remove report option and monitor command since monitoring is done
  using drop monitor

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-08-18 11:51:38 -07:00
Ido Schimmel a7a56f6f9d devlink: Add man page for devlink-trap
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-08-18 11:50:32 -07:00
Ido Schimmel 4ede9e9d56 devlink: Add devlink trap group set and show commands
These commands are similar to the trap set and show commands, but
operate on a trap group and not individual traps. Example:

# devlink trap group set netdevsim/netdevsim10 group l3_drops action trap
# devlink -jps trap group show netdevsim/netdevsim10 group l3_drops
{
    "trap_group": {
        "netdevsim/netdevsim10": [ {
                "name": "l3_drops",
                "generic": true,
                "stats": {
                    "rx": {
                        "bytes": 0,
                        "packets": 0
                    }
                }
            } ]
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-08-18 11:49:48 -07:00
Ido Schimmel ef12d6dafa devlink: Add devlink trap set and show commands
The trap set command allows the user to set the action of an individual
trap. Example:

# devlink trap set netdevsim/netdevsim10 trap blackhole_route action trap

The trap show command allows the user to get the current status of an
individual trap or a dump of all traps in case one is not specified.
When '-s' is specified the trap's statistics are shown. When '-v' is
specified the metadata types the trap can provide are shown. Example:

# devlink -jvps trap show netdevsim/netdevsim10 trap blackhole_route
{
    "trap": {
        "netdevsim/netdevsim10": [ {
                "name": "blackhole_route",
                "type": "drop",
                "generic": true,
                "action": "trap",
                "group": "l3_drops",
                "metadata": [ "input_port" ],
                "stats": {
                    "rx": {
                        "bytes": 0,
                        "packets": 0
                    }
                }
            } ]
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-08-18 11:49:27 -07:00
Ido Schimmel b83220db37 devlink: Increase number of supported options
Currently, the number of supported options is capped at 32 which is a
problem given we are about to add a few more and go over the limit.

Increase the limit to 64 options.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-08-18 11:48:52 -07:00
David Ahern e3af717a8d Update kernel headers
Update kernel headers to commit:
    d83d508b74c4 ("Merge branch 'stmmac-next'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-08-18 11:48:02 -07:00
David Ahern 7ad06c82e7 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-08-18 11:40:30 -07:00
Donald Sharp dba639a598 ip nexthop: Add space to display properly when showing a group
When displaying a nexthop group made up of other nexthops, the display
line shows this when you have additional data at the end:

id 42 group 43/44/45/46/47/48/49/50/51/52/53/54/55/56/57/58/59/60/61/62/63/64/65/66/67/68/69/70/71/72/73/74proto zebra

Modify code so that it shows:

id 42 group 43/44/45/46/47/48/49/50/51/52/53/54/55/56/57/58/59/60/61/62/63/64/65/66/67/68/69/70/71/72/73/74 proto zebra

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-15 13:21:15 -07:00
Stephen Hemminger 260dc56ae3 lib: fix spelling errors
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-12 18:21:10 -07:00
Stephen Hemminger 69df9bf981 tc: fix spelling errors
Minor spelling errors found by codespell

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-12 18:18:51 -07:00
Stephen Hemminger 42a66ee5f3 uapi: update socket.h
Upstream change to resolve gcc-9 issues.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-12 10:58:49 -07:00
Ido Schimmel 395370035e tc: Fix block-handle support for filter operations
The revert of batchsize accidently reverted more than it should
and broke shared block functionality.  Fix this by restoring the
original functionality.

To reproduce:

	dst_ip 192.0.2.0/24 action drop
Unknown filter "block", hence option "10" is unparsable

Fixes: e991c04d64 ("Revert "tc: Add batchsize feature for filter and actions"")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-12 10:31:24 -07:00
Andrea Claudi a8360dd3f2 ip tunnel: add json output
Add json support on iptunnel and ip6tunnel.
The plain text output format should remain the same.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-08-07 12:00:58 -07:00
Gal Pressman 39307384ce rdma: Add driver QP type string
RDMA resource tracker now tracks driver QPs as well, add driver QP type
string to qp_types_to_str function.

Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-08-07 12:00:01 -07:00
David Ahern 74ddde9b5f Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-08-07 11:59:19 -07:00
Patrick Talbert 2d7cb22240 ss: sctp: Formatting tweak in sctp_show_info for locals
'locals' output does not include a leading space so it runs up against
skmem:() output. Add a leading space to fix it.

Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-06 08:50:11 -07:00
Patrick Talbert 18db049f6f ss: sctp: fix typo for nodelay
nodealy should be nodelay.

Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-06 08:49:53 -07:00
Jiri Pirko efd12cd2d9 devlink: finish queue.h to list.h transition
Loose the "q" from the names and name the structure fields in the same
way rest of the code does. Also, fix list_add arg order which leads
to segfault.

Fixes: 33267017fa ("iproute2: devlink: port from sys/queue.h to list.h")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-05 11:35:23 -07:00
Stephen Hemminger 4dd599fdb8 tc: fflush after each command in batch mode
Restore behaviour of tc batch mode.
Flush stdout after each command.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-02 09:34:55 -07:00
Stephen Hemminger e991c04d64 Revert "tc: Add batchsize feature for filter and actions"
This reverts commit 485d0c6001.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-02 09:34:51 -07:00
Stephen Hemminger bfdda70d59 Revert "tc: fix batch force option"
This reverts commit b133392468.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-02 09:34:46 -07:00
Stephen Hemminger 350bc27cf3 Revert "tc: flush after each command in batch mode"
This reverts commit d66fdfda71.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-02 09:34:42 -07:00
Stephen Hemminger 11120881d9 Revert "tc: Remove pointless assignments in batch()"
This reverts commit 6358bbc381.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-02 09:34:36 -07:00
Yamin Friedman 432b21bec7 rdma: Document adaptive-moderation
Add document of setting the adaptive-moderation for the ib device.

Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-08-02 09:30:56 -07:00
Yamin Friedman 8a56ef325c rdma: Control CQ adaptive moderation (DIM)
In order to set adaptive-moderation for an ib device the command is:
rdma dev set [DEV] adaptive-moderation [on|off]

rdma dev show -d
0: mlx5_0: node_type ca fw 16.25.0319 node_guid 248a:0703:00a5:29d0
sys_image_guid 248a:0703:00a5:29d0 adaptive-moderation on
caps: <BAD_PKEY_CNTR, BAD_QKEY_CNTR, AUTO_PATH_MIG, CHANGE_PHY_PORT,
PORT_ACTIVE_EVENT, SYS_IMAGE_GUID, RC_RNR_NAK_GEN, MEM_WINDOW, XRC,
MEM_MGT_EXTENSIONS, BLOCK_MULTICAST_LOOPBACK, MEM_WINDOW_TYPE_2B,
RAW_IP_CSUM, CROSS_CHANNEL, MANAGED_FLOW_STEERING, SIGNATURE_HANDOVER,
ON_DEMAND_PAGING, SG_GAPS_REG, RAW_SCATTER_FCS, PCI_WRITE_END_PADDING>

rdma resource show cq
dev mlx5_0 cqn 0 cqe 1023 users 4 poll-ctx UNBOUND_WORKQUEUE
adaptive-moderation off comm [ib_core]

Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-08-02 09:30:56 -07:00
Stephen Hemminger 067925e2e1 json_print: drop extra semi-colons
The _PRINT_FUNC() macro expands to a function call.
Putting a semi-colon is unnecessary and causes warnings with -pedantic

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-29 08:45:32 -07:00
Kurt Kanzenbach c875433b14 utils: Fix get_s64() function
get_s64() uses internally strtoll() to parse the value out of a given
string. strtoll() returns a long long. However, the intermediate variable is
long only which might be 32 bit on some systems. So, fix it.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-29 08:44:20 -07:00
Stephen Hemminger ab45d91d6a iplink: document 'change' option to ip link
Add the command alias "change" to man page.
Don't show it on usage, since it is not commonly used.

Reported-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Matteo Croce <mcroce@redhat.com>
2019-07-29 08:43:24 -07:00
Antonio Borneo 36e584ad8a iplink_can: fix format output of clock with flag -details
The command
	ip -details link show can0
prints in the last line the value of the clock frequency attached
to the name of the following value "numtxqueues", e.g.
	clock 49500000numtxqueues 1 numrxqueues 1 gso_max_size
	 65536 gso_max_segs 65535

Add the missing space after the clock value.

Signed-off-by: Antonio Borneo <borneo.antonio@gmail.com>
2019-07-26 15:05:20 -07:00
Sergei Trofimovich 33267017fa iproute2: devlink: port from sys/queue.h to list.h
sys/queue.h does not exist on linux-musl targets and fails build as:

    devlink.c:28:10: fatal error: sys/queue.h: No such file or directory
       28 | #include <sys/queue.h>
          |          ^~~~~~~~~~~~~

The change ports to list.h API and drops dependency of 'sys/queue.h'.
The API maps one-to-one.

Build-tested on linux-musl and linux-glibc.

Bug: https://bugs.gentoo.org/690486
CC: Stephen Hemminger <stephen@networkplumber.org>
CC: netdev@vger.kernel.org
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-26 15:05:20 -07:00
Stephen Hemminger b89d6202c9 uapi: update kernel headers from 5.3-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-22 09:45:09 -07:00
Mark Zhang ca084842da rdma: Document counter statistic
Add document of accessing the QP counter, including bind/unbind a QP
to a counter manually or automatically, and dump counter statistics.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-19 10:51:13 -07:00
Mark Zhang a7137e517f rdma: Add default counter show support
Show default counter statistics, which are same through the sysfs
interface: /sys/class/infiniband/<dev>/ports/<port>/hw_counters/

Example:
$ rdma stat show link mlx5_2/1
link mlx5_2/1 rx_write_requests 8 rx_read_requests 4 rx_atomic_requests 0
out_of_buffer 0 out_of_sequence 0 duplicate_request 0 rnr_nak_retry_err 0
packet_seq_err 0 implied_nak_seq_err 0 local_ack_timeout_err 0
resp_local_length_error 0 resp_cqe_error 0 req_cqe_error 0
req_remote_invalid_request 0 req_remote_access_errors 0
resp_remote_access_errors 0 resp_cqe_flush_error 0 req_cqe_flush_error 0
rp_cnp_ignored 0 rp_cnp_handled 0 np_ecn_marked_roce_packets 0
np_cnp_sent 0 rx_icrc_encapsulated 0

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-19 10:51:13 -07:00
Mark Zhang a6d0773ebe rdma: Add stat manual mode support
In manual mode a QP can be manually bound to a counter. If the counter
id(cntn) is not specified that kernel will allocate one. After a
successful bind, the cntn can be seen through "rdma statistic qp show".
And in unbind if lqpn is not specified then all QPs on this counter will
be unbound.
The manual and auto mode are mutual-exclusive.

Examples:
$ rdma statistic qp bind link mlx5_2/1 lqpn 178
$ rdma statistic qp bind link mlx5_2/1 lqpn 178 cntn 4
$ rdma statistic qp unbind link mlx5_2/1 cntn 4
$ rdma statistic qp unbind link mlx5_2/1 cntn 4 lqpn 178

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-19 10:51:13 -07:00
Mark Zhang cbe10b4e44 rdma: Make get_port_from_argv() returns valid port in strict port mode
When strict_port is set, make get_port_from_argv() returns failure if
no valid port is specified.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-19 10:51:13 -07:00
Mark Zhang 887fc739eb rdma: Add rdma statistic counter per-port auto mode support
With per-QP statistic counter support, a user is allowed to monitor
specific QPs categories, which are bound to/unbound from counters
dynamically allocated/deallocated.

In per-port "auto" mode, QPs are bound to counters automatically
according to common criteria. For example a per "type"(qp type)
scheme, where in each process all QPs have same qp type are bind
automatically to a single counter.
Currently only "type" (qp type) is supported. Examples:

$ rdma statistic qp set link mlx5_2/1 auto type on
$ rdma statistic qp set link mlx5_2/1 auto off

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-19 10:51:13 -07:00
Mark Zhang 1b2ca7ada7 rdma: Add get per-port counter mode support
Add an interface to show which mode is active. Two modes are supported:
- "auto": In this mode all QPs belong to one category are bind automatically
  to a single counter set. Currently only "qp type" is supported;
- "manual": In this mode QPs are bound to a counter manually.

Examples:
$ rdma statistic qp mode
0/1: mlx5_0/1: qp auto off
1/1: mlx5_1/1: qp auto off
2/1: mlx5_2/1: qp auto type on
3/1: mlx5_3/1: qp auto off

$ rdma statistic qp mode link mlx5_0
0/1: mlx5_0/1: qp auto off

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-19 10:51:13 -07:00
Mark Zhang 5937552b42 rdma: Add "stat qp show" support
This patch presents link, id, task name, lqpn, as well as all sub
counters of a QP counter.
A QP counter is a dynamically allocated statistic counter that is
bound with one or more QPs. It has several sub-counters, each is
used for a different purpose.

Examples:
$ rdma stat qp show
link mlx5_2/1 cntn 5 pid 31609 comm client.1 rx_write_requests 0
rx_read_requests 0 rx_atomic_requests 0 out_of_buffer 0 out_of_sequence 0
duplicate_request 0 rnr_nak_retry_err 0 packet_seq_err 0
implied_nak_seq_err 0 local_ack_timeout_err 0 resp_local_length_error 0
resp_cqe_error 0 req_cqe_error 0 req_remote_invalid_request 0
req_remote_access_errors 0 resp_remote_access_errors 0
resp_cqe_flush_error 0 req_cqe_flush_error 0
    LQPN: <178>
$ rdma stat show link rocep1s0f5/1
link rocep1s0f5/1 rx_write_requests 0 rx_read_requests 0 rx_atomic_requests 0 out_of_buffer 0 duplicate_request 0
rnr_nak_retry_err 0 packet_seq_err 0 implied_nak_seq_err 0 local_ack_timeout_err 0 resp_local_length_error 0 resp_cqe_error 0
req_cqe_error 0 req_remote_invalid_request 0 req_remote_access_errors 0 resp_remote_access_errors 0 resp_cqe_flush_error 0
req_cqe_flush_error 0 rp_cnp_ignored 0 rp_cnp_handled 0 np_ecn_marked_roce_packets 0 np_cnp_sent 0
$ rdma stat show link rocep1s0f5/1 -p
link rocep1s0f5/1
    rx_write_requests 0
    rx_read_requests 0
    rx_atomic_requests 0
    out_of_buffer 0
    duplicate_request 0
    rnr_nak_retry_err 0
    packet_seq_err 0
    implied_nak_seq_err 0
    local_ack_timeout_err 0
    resp_local_length_error 0
    resp_cqe_error 0
    req_cqe_error 0
    req_remote_invalid_request 0
    req_remote_access_errors 0
    resp_remote_access_errors 0
    resp_cqe_flush_error 0
    req_cqe_flush_error 0
    rp_cnp_ignored 0
    rp_cnp_handled 0
    np_ecn_marked_roce_packets 0
    np_cnp_sent 0

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-19 10:51:13 -07:00
Stephen Hemminger 51a8f9f8fb uapi: fix bpf comment typo
From upstream.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-19 10:49:36 -07:00
Ivan Delalande ed54f76484 json: fix backslash escape typo in jsonw_puts
Fixes: fcc16c22 ("provide common json output formatter")
Signed-off-by: Ivan Delalande <colona@arista.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-19 10:48:38 -07:00
Vedang Patel a794d05237 tc: taprio: Update documentation
Add documentation for the latest options, flags and txtime-delay, to the
taprio manpage.

This also adds an example to run tc in txtime offload mode.

Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 15:47:07 -07:00
Vedang Patel 1738a16de9 tc: etf: Add documentation for skip_sock_check.
Document the newly added option (skip_sock_check) on the etf man-page.

Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 15:47:02 -07:00
Vedang Patel a5e6ee3b34 taprio: add support for setting txtime_delay.
This adds support for setting the txtime_delay parameter which is useful
for the txtime offload mode of taprio.

Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 15:46:36 -07:00
Vinicius Costa Gomes ee000bf217 taprio: Add support for setting flags
This allows a new parameter, flags, to be passed to taprio. Currently, it
only supports enabling the txtime-assist mode. But, we plan to add
different modes for taprio (e.g. hardware offloading) and this parameter
will be useful in enabling those modes.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 15:46:31 -07:00
Vedang Patel d9114263d0 etf: Add skip_sock_check
ETF Qdisc currently checks for a socket with SO_TXTIME socket option. If
either is not present, the packet is dropped. In the future commits, we
want other Qdiscs to add packet with launchtime to the ETF Qdisc. Also,
there are some packets (e.g. ICMP packets) which may not have a socket
associated with them.  So, add an option to skip this check.

Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 15:44:21 -07:00
David Ahern 86545eaffb Merge branch 'tc-conntrack' into next
Paul Blakey  says:

====================

This patch series add connection tracking capabilities in tc.
It does so via a new tc action, called act_ct, and new tc flower classifier matching.
Act ct and relevant flower matches, are still under review in net-next mailing list.

Usage is as follows:
$ tc qdisc add dev ens1f0_0 ingress
$ tc qdisc add dev ens1f0_1 ingress

$ tc filter add dev ens1f0_0 ingress \
  prio 1 chain 0 proto ip \
  flower ip_proto tcp ct_state -trk \
  action ct zone 2 pipe \
  action goto chain 2
$ tc filter add dev ens1f0_0 ingress \
  prio 1 chain 2 proto ip \
  flower ct_state +trk+new \
  action ct zone 2 commit mark 0xbb nat src addr 5.5.5.7 pipe \
  action mirred egress redirect dev ens1f0_1
$ tc filter add dev ens1f0_0 ingress \
  prio 1 chain 2 proto ip \
  flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \
  action ct nat pipe \
  action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_1 ingress \
  prio 1 chain 0 proto ip \
  flower ip_proto tcp ct_state -trk \
  action ct zone 2 pipe \
  action goto chain 1
$ tc filter add dev ens1f0_1 ingress \
  prio 1 chain 1 proto ip \
  flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \
  action ct nat pipe \
  action mirred egress redirect dev ens1f0_0

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 15:42:13 -07:00
Paul Blakey 2fffb1c030 tc: flower: Add matching on conntrack info
Matches on conntrack state, zone, mark, and label.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 15:41:30 -07:00
Paul Blakey c8a494314c tc: Introduce tc ct action
New tc action to send packets to conntrack module, commit
them, and set a zone, labels, mark, and nat on the connection.

It can also clear the packet's conntrack state by using clear.

Usage:
   ct clear
   ct commit [force] [zone] [mark] [label] [nat]
   ct [nat] [zone]

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 15:41:02 -07:00
David Ahern f47081beff Import tc_act/tc_ct.h uapi file
Import include/uapi/linux/tc_act/tc_ct.h header from commit of last
kernel headers sync.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 15:40:07 -07:00
Paul Blakey 18aa9f5583 tc: add NLA_F_NESTED flag to all actions options nested block
Strict netlink validation now requires this flag on all nested
attributes, add it for action options.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 15:38:09 -07:00
Andrea Claudi ee713339d3 tunnel: factorize printout of GRE key and flags
print_tunnel() functions in ip6tunnel.c and iptunnel.c contains
the same code to print out GRE key and flags

This commit factorize the code in a helper function in tunnel.c

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 10:19:47 -07:00
Andrea Claudi d035cc1b4e ip tunnel: warn when changing IPv6 tunnel without tunnel name
Tunnel change fails if a tunnel name is not specified while using
'ip -6 tunnel change'. However, no warning message is printed and
no error code is returned.

$ ip -6 tunnel add ip6tnl1 mode ip6gre local fd::1 remote fd::2 tos inherit ttl 127 encaplimit none dev dummy0
$ ip -6 tunnel change dev dummy0 local 2001🔢:1 remote 2001🔢:2
$ ip -6 tunnel show ip6tnl1
ip6tnl1: gre/ipv6 remote fd::2 local fd::1 dev dummy0 encaplimit none hoplimit 127 tclass inherit flowlabel 0x00000 (flowinfo 0x00000000)

This commit checks if tunnel interface name is equal to an empty
string: in this case, it prints a warning message to the user.
It intentionally avoids to return an error to not break existing
script setup.

This is the output after this commit:
$ ip -6 tunnel add ip6tnl1 mode ip6gre local fd::1 remote fd::2 tos inherit ttl 127 encaplimit none dev dummy0
$ ip -6 tunnel change dev dummy0 local 2001🔢:1 remote 2001🔢:2
Tunnel interface name not specified
$ ip -6 tunnel show ip6tnl1
ip6tnl1: gre/ipv6 remote fd::2 local fd::1 dev dummy0 encaplimit none hoplimit 127 tclass inherit flowlabel 0x00000 (flowinfo 0x00000000)

Reviewed-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-16 12:26:21 -07:00
Andrea Claudi ad04dbc5b4 Revert "ip6tunnel: fix 'ip -6 {show|change} dev <name>' cmds"
This reverts commit ba126dcad2.
It breaks tunnel creation when using 'dev' parameter:

$ ip link add type dummy
$ ip -6 tunnel add ip6tnl1 mode ip6ip6 remote 2001:db8:ffff💯:2 local 2001:db8:ffff💯:1 hoplimit 1 tclass 0x0 dev dummy0
add tunnel "ip6tnl0" failed: File exists

dev parameter must be used to specify the device to which
the tunnel is binded, and not the tunnel itself.

Reported-by: Jianwen Ji <jiji@redhat.com>
Reviewed-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-16 12:25:05 -07:00
Stephen Hemminger 78d3832335 uapi: rdma netlink.h update
From upstream 5.3-rc

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-16 11:58:44 -07:00
Stephen Hemminger 03dafe13f4 uapi: update uapi/magic.h
From upstream

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-16 11:56:58 -07:00
Aya Levin f359942a25 devlink: Remove enclosing array brackets binary print with json format
Keep pr_out_binary_value function only for printing. Inner relations
like array grouping should be done outside the function.

Fixes: 844a61764c ("devlink: Add helper functions for name and value separately")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-15 13:50:55 -07:00
Aya Levin 1d05cca2fd devlink: Fix binary values print
Fix function pr_out_binary_value() to start printing the binary buffer
from offset 0 instead of offset 1. Remove redundant new line at the
beginning of the output

Example:
With patch:
 mlx5e_txqsq:
   05 00 00 00 05 00 00 00 01 00 00 00 00 00 00 00
   00 00 00 00 00 00 00 00 8e 6e 3a 13 07 00 00 00
   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   c0
Without patch
  mlx5e_txqsq:

  00 00 00 05 00 00 00 01 00 00 00 00 00 00 00 00
  00 00 00 00 00 00 00 8e 6e 3a 13 07 00 00 00 00
  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c0

Fixes: 844a61764c ("devlink: Add helper functions for name and value separately")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-15 13:50:55 -07:00
Aya Levin b4d97ef57f devlink: Change devlink health dump show command to dumpit
Although devlink health dump show command is given per reporter, it
returns large amounts of data. Trying to use the doit cb results in
OUT-OF-BUFFER error. This complementary patch raises the DUMP flag in
order to invoke the dumpit cb. We're safe as no existing drivers
implement the dump health reporter option yet.

Fixes: 041e6e651a ("devlink: Add devlink health dump show command")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-15 13:50:55 -07:00
Matteo Croce 1f420318bd utils: don't match empty strings as prefixes
iproute has an utility function which checks if a string is a prefix for
another one, to allow use of abbreviated commands, e.g. 'addr' or 'a'
instead of 'address'.

This routine unfortunately considers an empty string as prefix
of any pattern, leading to undefined behaviour when an empty
argument is passed to ip:

    # ip ''
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever

    # tc ''
    qdisc noqueue 0: dev lo root refcnt 2

    # ip address add 192.0.2.0/24 '' 198.51.100.1 dev dummy0
    # ip addr show dev dummy0
    6: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether 02:9d:5e:e9:3f:c0 brd ff:ff:ff:ff:ff:ff
        inet 192.0.2.0/24 brd 198.51.100.1 scope global dummy0
           valid_lft forever preferred_lft forever

Rewrite matches() so it takes care of an empty input, and doesn't
scan the input strings three times: the actual implementation
does 2 strlen and a memcpy to accomplish the same task.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-15 13:48:48 -07:00
Andrea Claudi 6bc13e4a20 tc: util: constrain percentage in 0-100 interval
parse_percent() currently allows to specify negative percentages
or value above 100%. However this does not seems to make sense,
as the function is used for probabilities or bandiwidth rates.

Moreover, using negative values leads to erroneous results
(using Bernoulli loss model as example):

$ ip link add test type dummy
$ ip link set test up
$ tc qdisc add dev test root netem loss gemodel -10% limit 10
$ tc qdisc show dev test
qdisc netem 800c: root refcnt 2 limit 10 loss gemodel p 90% r 10% 1-h 100% 1-k 0%

Using values above 100% we have instead:

$ ip link add test type dummy
$ ip link set test up
$ tc qdisc add dev test root netem loss gemodel 140% limit 10
$ tc qdisc show dev test
qdisc netem 800f: root refcnt 2 limit 10 loss gemodel p 40% r 60% 1-h 100% 1-k 0%

This commit changes parse_percent() with a check to ensure
percentage values stay between 1.0 and 0.0.
parse_percent_rate() function, which already employs a similar
check, is adjusted accordingly.

With this check in place, we have:

$ ip link add test type dummy
$ ip link set test up
$ tc qdisc add dev test root netem loss gemodel -10% limit 10
Illegal "loss gemodel p"

Fixes: 927e3cfb52 ("tc: B.W limits can now be specified in %.")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-15 13:45:59 -07:00
Stephen Hemminger fda6f26e9b uapi: fix bpf.h link
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-11 15:36:29 -07:00
Stephen Hemminger d5ddb441a5 tc: print all error messages to stderr
Many tc modules were printing error messages to stdout.
This is problematic if using JSON or other output formats.
Change all these places to use fprintf(stderr, ...) instead.

Also, remove unnecessary initialization and places
where else is used after error return.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-11 15:35:07 -07:00
David Ahern 1f250b6c53 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-10 14:41:13 -07:00
David Ahern 25c567bdcd Merge branch 'tc-mpls-action' into next
John Hurley  says:

====================

Recent kernel additions to TC allows the manipulation of MPLS headers as
filter actions.

The following patchset creates an iproute2 interface to the new actions
and includes documentation on how to use it.

v1->v2:
- change error from print_string() to fprintf(strerr,) (Stephen Hemminger)
- split long line in explain() message (David Ahern)
- use _SL_ instead of /n in print message (David Ahern)

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-10 14:07:42 -07:00
John Hurley 3b810b3b9a man: update man pages for TC MPLS actions
Add a man page describing the newly added TC mpls manipulation actions.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-10 14:06:36 -07:00
John Hurley fb57b0920f tc: add mpls actions
Create a new action type for TC that allows the pushing, popping, and
modifying of MPLS headers.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-10 14:06:32 -07:00
John Hurley 11d7087a4e lib: add mpls_uc and mpls_mc as link layer protocol names
Update the llproto_names array to allow users to reference the mpls
protocol ids with the names 'mpls_uc' for unicast MPLS and 'mpls_mc' for
multicast.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-10 14:06:28 -07:00
David Ahern 1827694858 Import tc_mpls.h uapi header
Import tc_mpls.h uapi header from kernel headers at commit:
        1ff2f0fa450e ("net/mlx5e: Return in default case statement in tx_post_resync_params")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-10 14:05:19 -07:00
Parav Pandit 056399399a devlink: Introduce PCI PF and VF port flavour and attribute
Introduce PCI PF and VF port flavour and port attributes such as PF
number and VF number.

$ devlink port show
pci/0000:05:00.0/0: type eth netdev eth0 flavour pcipf pfnum 0
pci/0000:05:00.0/1: type eth netdev eth1 flavour pcivf pfnum 0 vfnum 0
pci/0000:05:00.0/2: type eth netdev eth2 flavour pcivf pfnum 0 vfnum 1

Signed-off-by: Parav Pandit <parav@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-10 13:57:19 -07:00
Vincent Bernat eb105e0d94 ip: bond: add peer notification delay support
Ability to tweak the delay between gratuitous ND/ARP packets has been
added in kernel commit 07a4ddec3ce9 ("bonding: add an option to
specify a delay between peer notifications"), through
IFLA_BOND_PEER_NOTIF_DELAY attribute. Add support to set and show this
value.

Example:

    $ ip -d link set bond0 type bond peer_notify_delay 1000
    $ ip -d link l dev bond0
    2: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue
    state UP mode DEFAULT group default qlen 1000
        link/ether 50:54:33:00:00:01 brd ff:ff:ff:ff:ff:ff
        bond mode active-backup active_slave eth0 miimon 100 updelay 0
    downdelay 0 peer_notify_delay 1000 use_carrier 1 arp_interval 0
    arp_validate none arp_all_targets any primary eth0
    primary_reselect always fail_over_mac active xmit_hash_policy
    layer2 resend_igmp 1 num_grat_arp 5 all_slaves_active 0 min_links
    0 lp_interval 1 packets_per_slave 1 lacp_rate slow ad_select
    stable tlb_dynamic_lb 1 addrgenmode eu

Signed-off-by: Vincent Bernat <vincent@bernat.ch>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-10 13:54:09 -07:00
David Ahern 01db6c4174 Update kernel headers
Update kernel headers to commit:
    1ff2f0fa450e ("net/mlx5e: Return in default case statement in tx_post_resync_params")

import include/uapi/linux/const.h per new dependency in
include/uapi/linux/pkt_cls.h.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-10 13:52:48 -07:00
Roman Mashak 26a49de4db tc: document 'mask' parameter in skbedit man page
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-09 17:31:16 -07:00
Roman Mashak 82f3df2028 tc: added mask parameter in skbedit action
Add 32-bit missing mask attribute in iproute2/tc, which has been long
supported by the kernel side.

v2: print value in hex with print_hex() as suggested by Stephen Hemminger.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-09 17:31:16 -07:00
Andrea Claudi 89ce8012d7 ip-route: fix json formatting for metrics
Setting metrics for routes currently lead to non-parsable
json output. For example:

$ ip link add type dummy
$ ip route add 192.168.2.0 dev dummy0 metric 100 mtu 1000 rto_min 3
$ ip -j route | jq
parse error: ':' not as part of an object at line 1, column 319

Fixing this opening a json object in the metrics array and using
print_string() instead of fprintf().

This is the output for the above commands applying this patch:

$ ip -j route | jq
[
  {
    "dst": "192.168.2.0",
    "dev": "dummy0",
    "scope": "link",
    "metric": 100,
    "flags": [],
    "metrics": [
      {
        "mtu": 1000,
        "rto_min": 3
      }
    ]
  }
]

Fixes: 663c3cb231 ("iproute: implement JSON and color output")
Fixes: 968272e791 ("iproute: refactor metrics print")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reported-by: Frank Hofmann <fhofmann@cloudflare.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-09 17:30:06 -07:00
Parav Pandit 2eb23f3e7a devlink: Show devlink port number
Show devlink port number whenever kernel reports that attribute.

An example output for a physical port.
$ devlink port show
pci/0000:06:00.1/65535: type eth netdev eth1_p1 flavour physical port 1

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-09 14:56:14 -07:00
David Ahern a257456f96 ss: Change resolve_services to numeric
Commit ca697cee4c ("ip: add a new parameter -Numeric") changed
!resolve_services to numeric in ss.c.

A commit in master:
  d791e75d74 ("ss: in --numeric mode, print raw numbers for data rates")
added another reference to !resolve_services. Convert it to numeric.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-09 14:54:34 -07:00
David Ahern 830ac9abe6 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-09 14:26:44 -07:00
Stephen Hemminger af2583437e v5.2.0 2019-07-08 11:09:59 -07:00
Andrea Claudi 90f0b587d8 tc: netem: fix r parameter in Bernoulli loss model
As the man page for tc netem states:

    To use the Bernoulli model, the only needed parameter is p while the
    others will be set to the default values r=1-p, 1-h=1 and 1-k=0.

However r parameter is erroneusly set to 1, and not to 1-p.
Fix this using the same approach of the 4-state loss model.

Fixes: 3c7950af59 ("netem: add support for 4 state and GE loss model")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-08 08:17:22 -07:00
Tomasz Torcz d791e75d74 ss: in --numeric mode, print raw numbers for data rates
ss by default shows data rates in human-readable form - as Mbps/Gbps etc.
 Enhance --numeric mode to show raw values in bps, without conversion.

  Signed-of-by: Tomasz Torcz <tomasz.torcz@nordea.com>

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-08 08:16:23 -07:00
Andrea Claudi c95e17dcba man: tc-netem.8: fix URL for netem page
URL for netem page on sources section points to a no more existent
resource. Fix this using the correct URL.

Fixes: cd72dcf13c ("netem: add man-page")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-02 17:44:51 -07:00
Denis Kirjanov 0f48f9f46a ipaddress: correctly print a VF hw address in the IPoIB case
Current code assumes that we print ethernet mac and
that doesn't work in the IPoIB case with SRIOV-enabled hardware

Before:
11: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 256
        link/infiniband
80:00:00:66:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a4:3e:7c brd
00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
        vf 0 MAC 14:80:00:00:66:fe, spoof checking off, link-state
disable,
    trust off, query_rss off
    ...

After:
11: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 256
        link/infiniband
80:00:00:66:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a4:3e:7c brd
00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
        vf 0     link/infiniband
80:00:00:66:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a4:3e:7c brd
00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff, spoof
checking off, link-state disable, trust off, query_rss off

v1->v2: updated kernel headers to uapi commit
v2->v3: fixed alignment
v3->v4: aligned print statements as used through the source

Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
[ committer note: flipped argument order for print_vfinfo to keep fp first
  and fixed alignment issues ]
2019-06-28 16:20:12 -07:00
David Ahern ea985eb42d Update kernel headers
Update kernel headers to commit:
    5cdda5f1d6ad ("ipv4: enable route flushing in network namespaces")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-28 16:14:25 -07:00
Andrea Claudi 1e5746d5e1 utils: move parse_percent() to tc_util
As parse_percent() is used only in tc.

This reduces ip, bridge and genl binaries size:

$ bloat-o-meter -t bridge/bridge bridge/bridge.new
add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-109 (-109)
Total: Before=50973, After=50864, chg -0.21%

$ bloat-o-meter -t genl/genl genl/genl.new
add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-109 (-109)
Total: Before=30298, After=30189, chg -0.36%

$ bloat-o-meter ip/ip ip/ip.new
add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-109 (-109)
Total: Before=674164, After=674055, chg -0.02%

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-28 16:06:26 -07:00
Hoang Le 1df17579d8 tipc: support interface name when activating UDP bearer
Support for indicating interface name has an ip address in parallel
with specifying ip address when activating UDP bearer.
This liberates the user from keeping track of the current ip address
for each device.

Old command syntax:
$tipc bearer enable media udp name NAME localip IP

New command syntax:
$tipc bearer enable media udp name NAME [localip IP|dev DEVICE]

v2:
    - Removed initial value for fd
    - Fixed the returning value for cmd_bearer_validate_and_get_addr
      to make its consistent with using: zero or non-zero
v3: - Switch to use helper 'get_ifname' to retrieve interface name
v4: - Replace legacy SIOCGIFADDR by netlink
v5: - Fix leaky rtnl_handle

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-28 16:03:16 -07:00
Baruch Siach d0272f5404 devlink: fix libc and kernel headers collision
Since commit 2f1242efe9 ("devlink: Add devlink health show command") we
use the sys/sysinfo.h header for the sysinfo(2) system call. But since
iproute2 carries a local version of the kernel struct sysinfo, this
causes a collision with libc that do not rely on kernel defined sysinfo
like musl libc:

In file included from devlink.c:25:0:
.../sysroot/usr/include/sys/sysinfo.h:10:8: error: redefinition of 'struct sysinfo'
 struct sysinfo {
        ^~~~~~~
In file included from ../include/uapi/linux/kernel.h:5:0,
                 from ../include/uapi/linux/netlink.h:5,
                 from ../include/uapi/linux/genetlink.h:6,
                 from devlink.c:21:
../include/uapi/linux/sysinfo.h:8:8: note: originally defined here
 struct sysinfo {
        ^~~~~~~

Move the sys/sysinfo.h userspace header before kernel headers, and
suppress the indirect include of linux/sysinfo.h.

Cc: Aya Levin <ayal@mellanox.com>
Cc: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-28 15:20:00 -07:00
Baruch Siach ee09370a72 devlink: fix format string warning for 32bit targets
32bit targets define uint64_t as long long unsigned. This leads to the
following build warning:

devlink.c: In function ‘pr_out_u64’:
devlink.c:1729:11: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘uint64_t {aka long long unsigned int}’ [-Wformat=]
    pr_out("%s %lu", name, val);
           ^
devlink.c:59:21: note: in definition of macro ‘pr_out’
   fprintf(stdout, ##args);   \
                     ^~~~

Use uint64_t specific conversion specifiers in the format string to fix
that.

Cc: Aya Levin <ayal@mellanox.com>
Cc: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-28 15:20:00 -07:00
Andrea Claudi 68c46872ce ip address: do not set mngtmpaddr option for IPv4 addresses
'mngtmpaddr' option make the kernel manage temporary addresses
created from the specified one as template on behalf of Privacy
Extensions (RFC3041). This option should be available only for
IPv6 addresses, as correctly stated in the manpage.

However it is possible to set mngtmpaddr on IPv4 addresses, too:

$ ip link add dummy0 type dummy
$ ip -4 addr add 192.168.1.1 dev dummy0 mngtmpaddr
$ ip a
1: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
   link/ether 1a:6d:c6:96:ca:f8 brd ff:ff:ff:ff:ff:ff
   inet 192.168.1.1/32 scope global mngtmpaddr dummy0
      valid_lft forever preferred_lft forever

Fix this adding a check on the protocol family before setting
IFA_F_MANAGETEMPADDR flag.

Fixes: 5b7e21c417 ("add support for IFA_F_MANAGETEMPADDR")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-28 15:18:28 -07:00
Andrea Claudi e4448b6c7d ip address: do not set home option for IPv4 addresses
'home' option designates a IPv6 address as "home address" as
defined in RFC 6275. This option should be available only for
IPv6 addresses, as correctly stated in the manpage.

However it is possible to set home on IPv4 addresses, too:

$ ip link add dummy0 type dummy
$ ip -4 addr add 192.168.1.1 dev dummy0 home
$ ip a
1: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
   link/ether 1a:6d:c6:96:ca:f8 brd ff:ff:ff:ff:ff:ff
   inet 192.168.1.1/32 scope global home dummy0
      valid_lft forever preferred_lft forever

Fix this adding a check on the protocol family before setting
IFA_F_HOMEADDRESS flag.

Fixes: bac735c53a ("enabled to manipulate the flags of IFA_F_HOMEADDRESS or IFA_F_NODAD from ip.")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-28 15:18:28 -07:00
Andrea Claudi 8ae99cc46d ip address: do not set nodad option for IPv4 addresses
Duplicate Address Detection (RFC 4862) is available only for IPv6
addresses. As a consequence, 'nodad' option, turning it off, should
be available only for IPv6, and is defined like that in the man page.

However it is possible to set nodad on IPv4 addresses, too:

$ ip link add dummy0 type dummy
$ ip -4 addr add 192.168.1.1 dev dummy0 nodad
$ ip a
1: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
   link/ether 1a:6d:c6:96:ca:f8 brd ff:ff:ff:ff:ff:ff
   inet 192.168.1.1/32 scope global nodad dummy0
      valid_lft forever preferred_lft forever

Fix this adding a check on the protocol family before setting
IFA_F_NODAD flag.

Fixes: bac735c53a ("enabled to manipulate the flags of IFA_F_HOMEADDRESS or IFA_F_NODAD from ip.")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-28 15:18:28 -07:00
Stefano Brivio b5cf263670 iproute: Set flags and attributes on dump to get IPv6 cached routes to be flushed
With a current (5.1) kernel version, IPv6 exception routes can't be listed
(ip -6 route list cache) or flushed (ip -6 route flush cache). Kernel
support for this is being added back. Relevant net-next commits:

  564c91f7e563 fib_frontend, ip6_fib: Select routes or exceptions dump from RTM_F_CLONED
  ef11209d4219 Revert "net/ipv6: Bail early if user only wants cloned entries"
  3401bfb1638e ipv6/route: Don't match on fc_nh_id if not set in ip6_route_del()
  bf9a8a061ddc ipv6/route: Change return code of rt6_dump_route() for partial node dumps
  1e47b4837f3b ipv6: Dump route exceptions if requested
  40cb35d5dc04 ip6_fib: Don't discard nodes with valid routing information in fib6_locate_1()

However, to allow the kernel to filter routes based on the RTM_F_CLONED
flag, we need to make sure this flag is always passed when we want cached
routes to be dumped, and we can also pass table and output interface
attributes to have the kernel filtering on them, if requested by the user.

Use the existing iproute_dump_filter() as a filter for the dump request in
iproute_flush(). This way, 'ip -6 route flush cache' works again.

v2: Instead of creating a separate 'filter' function dealing with
    RTM_F_CACHED only, use the existing iproute_dump_filter() and get
    table and oif kernel filtering for free. Suggested by David Ahern.

Fixes: aba5acdfdb ("(Logical change 1.3)")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-26 14:27:00 -07:00
Hangbin Liu 5a403866f3 ip/iptoken: fix dump error when ipv6 disabled
When we disable IPv6 from the start up (ipv6.disable=1), there will be
no IPv6 route info in the dump message. If we return -1 when
ifi->ifi_family != AF_INET6, we will get error like

$ ip token list
Dump terminated

which will make user feel confused. There is no need to return -1 if the
dump message not match. Return 0 is enough.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-26 14:23:12 -07:00
Stephen Hemminger f799505372 devlink: replace print macros with functions
Using functions is safer, and printing is not performance
critical.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-26 09:18:18 -07:00
Eyal Birger bfa757e02f tc: adjust xtables_match and xtables_target to changes in recent iptables
iptables commit 933400b37d09 ("nft: xtables: add the infrastructure to translate from iptables to nft")
added an additional member to struct xtables_match and struct xtables_target.

This change is available for libxtables12 and up.
Add these members conditionally to support both newer and older versions.

Fixes: dd29621578 ("tc: add em_ipt ematch for calling xtables matches from tc matching context")
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-24 16:12:17 -07:00
David Ahern f7eef91897 Merge branch 'master' into next
Conflicts:
	include/uapi/linux/snmp.h

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-21 15:59:24 -07:00
Jakub Kicinski b3cf1167e7 tc: q_netem: JSON-ify the output
Add JSON output support to q_netem.

The normal output is untouched.

In JSON output always use seconds as the base of time units,
and non-percentage numbers (0.01 instead of 1%). Try to always
report the fields, even if they are zero.
All this should make the output more machine-friendly.

v2: less macroes

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-21 15:51:35 -07:00
Nicolas Dichtel 6d77d9c6ae ip monitor: display interfaces from all groups
Only interface from group 0 were displayed.

ip monitor calls ipaddr_reset_filter() and there is no reason to not reset
the filter group in this function.

Fixes: c4fdf75d3d ("ip link: fix display of interface groups")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-21 12:59:50 -07:00
Matteo Croce b2e2922373 netns: make netns_{save,restore} static
The netns_{save,restore} functions are only used in ipnetns.c now, since
the restore is not needed anymore after the netns exec command.
Move them in ipnetns.c, and make them static.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-20 14:30:41 -07:00
Matteo Croce d81d4ba15d ip vrf: use hook to change VRF in the child
On vrf exec, reset the VRF associations in the child process, via the
new hook added to cmd_exec(). In this way, the parent doesn't have to
reset the VRF associations before spawning other processes.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-20 14:30:41 -07:00
Matteo Croce 903818fbf9 netns: switch netns in the child when executing commands
'ip netns exec' changes the current netns just before executing a child
process, and restores it after forking. This is needed if we're running
in batch or do_all mode.
Some cleanups must be done both in the parent and in the child: the
parent must restore the previous netns, while the child must reset any
VRF association.
Unfortunately, if do_all is set, the VRF are not reset in the child, and
the spawned processes are started with the wrong VRF context. This can
be triggered with this script:

	# ip -b - <<-'EOF'
		link add type vrf table 100
		link set vrf0 up
		link add type dummy
		link set dummy0 vrf vrf0 up
		netns add ns1
	EOF
	# ip -all -b - <<-'EOF'
		vrf exec vrf0 true
		netns exec setsid -f sleep 1h
	EOF
	# ip vrf pids vrf0
	  314  sleep
	# ps 314
	  PID TTY      STAT   TIME COMMAND
	  314 ?        Ss     0:00 sleep 1h

Refactor cmd_exec() and pass to it a function pointer which is called in
the child before the final exec. In the netns exec case the function just
resets the VRF and switches netns.

Doing it in the child is less error prone and safer, because the parent
environment is always kept unaltered.

After this refactor some utility functions became unused, so remove them.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-20 14:30:41 -07:00
Pete Morici b16f525323 Add support for configuring MACsec gcm-aes-256 cipher type.
Signed-off-by: Pete Morici <pmorici@dev295.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-18 09:55:51 -07:00
Andrea Claudi 8063feebba Makefile: use make -C
make provides a handy -C option to change directory before reading
the makefiles or doing anything else.

Use that instead of the "cd dir && make && cd .." pattern, thus
simplifying sintax for some makefiles.

Changes from v1:
- Drop an obviously wrong leftover on testsuite/iproute2/Makefile

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-18 09:52:58 -07:00
Stephen Hemminger 77a380379f uapi: update headers and add if_link.h and if_infiniband.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-18 09:48:21 -07:00
Michael Forney 578cadcc68 ipmroute: Prevent overlapping storage of `filter` global
This variable has the same name as `struct xfrm_filter filter` in
ip/ipxfrm.c, but overrides that definition since `struct rtfilter`
is larger.

This is visible when built with -Wl,--warn-common in LDFLAGS:

	/usr/bin/ld: ipxfrm.o: warning: common of `filter' overridden by larger common from ipmroute.o

Signed-off-by: Michael Forney <mforney@mforney.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-18 09:43:29 -07:00
Hangbin Liu ca697cee4c ip: add a new parameter -Numeric
Add a new parameter '-Numeric' to show the number of protocol, scope,
dsfield, etc directly instead of converting it to human readable name.
Do the same on tc and ss.

This patch is based on David Ahern's previous patch.

Suggested-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-18 08:37:47 -07:00
David Ahern e92d221022 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-14 07:29:40 -07:00
David Ahern 82cdb4d445 tools: Fix include path for generate_nlmsg
Compile of tools directory fails with:

make -C tools
    CC       generate_nlmsg
../../lib/libnetlink.c:28:27: fatal error: linux/nexthop.h: No such file or directory
 #include <linux/nexthop.h>
                           ^
compilation terminated.

Add local uapi to build path.

Fixes: 74829ca7dd ("libnetlink: Add helper to create nexthop dump request")
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-14 06:50:55 -07:00
Andrea Claudi 41bf0c69c0 Makefile: use make -C to change directory
make provides a handy -C option to change directory before reading
the makefiles or doing anything else.

Use that instead of the "cd dir && make && cd .." pattern, thus
simplifying sintax for some makefiles.

Changes from v1:
- Drop an obviously wrong leftover in testsuite/iproute2/Makefile

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-and-tested-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-14 06:44:39 -07:00
Stephen Hemminger b0a09ace39 testsuite: intent if/else in Makefile
Indent both arms of if/else equally.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-12 08:48:33 -07:00
Moshe Shemesh c934da8aaa devlink: mnlg: Catch returned error value of dumpit commands
Devlink commands which implements the dumpit callback may return error.
The netlink function netlink_dump() sends the errno value as the payload
of the message, while answering user space with NLMSG_DONE.
To enable receiving errno value for dumpit commands we have to check for
it in the message. If it is a negative value then the dump returned an
error so we should set errno accordingly and check for ext_ack in case
it was set.

Fixes: 049c58539f ("devlink: mnlg: Add support for extended ack")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-12 08:43:14 -07:00
David Ahern 2357abbbfa Merge branch 'nexthop-objects' into next
David Ahern  says:

====================

This set adds support for nexthop objects to the ip command. The syntax
for nexthop objects is identical to the current 'ip route .. nexthop ...'
syntax making it easy to convert existing use cases.

v2
- Fixed header use in rtnl_nexthopdump_req as noted by roopa
- made rth_del static per Stephen's request and fixed coding style
- removed print_nh_gateway and exported print_rta_gateway to reuse
  the iproute.c code (keeps consistency in output)
- added examples to commit message
- fixed monitor use when specific groups requested
- fixed usage in 'ip nexthop'
- added manpage

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-11 10:32:07 -07:00
David Ahern e7cd93e7af ipmonitor: Add nexthop option to monitor
Add capability to ip-monitor to listen and dump nexthop messages.
Since the nexthop group = 32 which exceeds the max groups bit
field, 2 separate flags are needed - one that defaults on to indicate
nexthop group is joined by default and a second that indicates a
specific selection by the user (e.g, ip mon nexthop route).

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-11 10:31:30 -07:00
David Ahern 12387e2c14 ip route: Add option to use nexthop objects
Add nhid option for routes to use nexthop objects by id.

Example:
  $ ip nexthop add id 1 via 10.99.1.2 dev veth1
  $ ip route add 10.100.1.0/24 nhid 1
  $ ip route ls
  ...
  10.100.1.0/24 nhid 1 via 10.99.1.2 dev veth1

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-11 10:31:28 -07:00
David Ahern 42cce67e71 ip: Add man page for nexthop command
Document 'ip nexthop' options in a man page with a few examples.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-11 10:31:06 -07:00
David Ahern 63df8e8543 Add support for nexthop objects
Add nexthop subcommand to ip. Implement basic commands for creating,
deleting and dumping nexthop objects. Syntax follows 'nexthop' syntax
from existing 'ip route' command.

Examples:
1. Single path
    $ ip nexthop add id 1 via 10.99.1.2 dev veth1
    $ ip nexthop ls
    id 1 via 10.99.1.2 src 10.99.1.1 dev veth1 scope link

2. ECMP
    $ ip nexthop add id 2 via 10.99.3.2 dev veth3
    $ ip nexthop add id 1001 group 1/2
      --> creates a nexthop group with 2 component nexthops:
          id 1 and id 2 both the same weight

    $ ip nexthop ls
    id 1 via 10.99.1.2 src 10.99.1.1 dev veth1 scope link
    id 2 via 10.99.3.2 src 10.99.3.1 dev veth3 scope link
    id 1001 group 1/2

3. Weighted multipath
    $ ip nexthop add id 1002 group 1,10/2,20
      --> creates a nexthop group with 2 component nexthops:
          id 1 with a weight of 10 and id 2 with a weight of 20

    $ ip nexthop ls
    id 1 via 10.99.1.2 src 10.99.1.1 dev veth1 scope link
    id 2 via 10.99.3.2 src 10.99.3.1 dev veth3 scope link
    id 1001 group 1/2
    id 1002 group 1,10/2,20

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-11 10:30:58 -07:00
David Ahern 48a1e96d90 ip route: Export print_rt_flags, print_rta_if and print_rta_gateway
Export print_rt_flags and print_rta_if for use by the nexthop
command.

Change print_rta_gateway to take the family versus rtmsg struct and
export for use by the nexthop command.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-11 10:30:55 -07:00
David Ahern 74829ca7dd libnetlink: Add helper to create nexthop dump request
Add rtnl_nexthopdump_req to initiate a dump request of nexthop objects.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-11 10:30:53 -07:00
David Ahern 10631938f1 uapi: Import nexthop object API
Add nexthop.h from kernel with the uapi for nexthop objects.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-11 10:30:50 -07:00
David Ahern 9860becfe3 libnetlink: Add helper to add a group via setsockopt
groups > 31 have to be joined using the setsockopt. Since the nexthop
group is 32, add a helper to allow 'ip monitor' to listen for nexthop
messages.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-11 10:30:48 -07:00
David Ahern 7392401027 lwtunnel: Pass encap and encap_type attributes to lwt_parse_encap
lwt_parse_encap currently assumes the encap attribute is RTA_ENCAP
and the type is RTA_ENCAP_TYPE. Change lwt_parse_encap to take these
as input arguments for reuse by nexthop code which has the attributes
as NHA_ENCAP and NHA_ENCAP_TYPE.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-11 10:30:46 -07:00
David Ahern 2360b8cb21 libnetlink: Set NLA_F_NESTED in rta_nest
Kernel now requires NLA_F_NESTED to be set on new nested
attributes. Set NLA_F_NESTED in rta_nest.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-11 10:30:39 -07:00
Mahesh Bandewar ba126dcad2 ip6tunnel: fix 'ip -6 {show|change} dev <name>' cmds
Inclusion of 'dev' is allowed by the syntax but not handled
correctly by the command. It produces no output for show
command and falsely successful for change command but does
not make any changes.

can be verified with the following steps
  # ip -6 tunnel add ip6tnl1 mode ip6gre local fd::1 remote fd::2 tos inherit ttl 127 encaplimit none
  # ip -6 tunnel show ip6tnl1
  <correct output>
  # ip -6 tunnel show dev ip6tnl1
  <no output but correct output after this change>
  # ip -6 tunnel change dev ip6tnl1 local 2001🔢:1 remote 2001🔢:2 encaplimit none ttl 127 tos inherit allow-localremote
  # echo $?
  0
  # ip -6 tunnel show ip6tnl1
  <no changes applied, but changes are correctly applied after this change>

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-10 10:43:09 -07:00
Matteo Croce 80a931d41c ip: reset netns after each command in batch mode
When creating a new netns or executing a program into an existing one,
the unshare() or setns() calls will change the current netns.
In batch mode, this can run commands on the wrong interfaces, as the
ifindex value is meaningful only in the current netns. For example, this
command fails because veth-c doesn't exists in the init netns:

    # ip -b - <<-'EOF'
        netns add client
        link add name veth-c type veth peer veth-s netns client
        addr add 192.168.2.1/24 dev veth-c
    EOF
    Cannot find device "veth-c"
    Command failed -:7

But if there are two devices with the same name in the init and new netns,
ip will build a wrong ll_map with indexes belonging to the new netns,
and will execute actions in the init netns using this wrong mapping.
This script will flush all eth0 addresses and bring it down, as it has
the same ifindex of veth0 in the new netns:

    # ip addr
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
        link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
        inet 192.168.122.76/24 brd 192.168.122.255 scope global dynamic eth0
           valid_lft 3598sec preferred_lft 3598sec

    # ip -b - <<-'EOF'
        netns add client
        link add name veth0 type veth peer name veth1
        link add name veth-ns type veth peer name veth0 netns client
        link set veth0 down
        address flush veth0
    EOF

    # ip addr
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default qlen 1000
        link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
    3: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether c2:db:d0:34:13:4a brd ff:ff:ff:ff:ff:ff
    4: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether ca:9d:6b:5f:5f:8f brd ff:ff:ff:ff:ff:ff
    5: veth-ns@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether 32:ef:22:df:51:0a brd ff:ff:ff:ff:ff:ff link-netns client

The same issue can be triggered by the netns exec subcommand with a
sligthy different script:

    # ip netns add client
    # ip -b - <<-'EOF'
        netns exec client true
        link add name veth0 type veth peer name veth1
        link add name veth-ns type veth peer name veth0 netns client
        link set veth0 down
        address flush veth0
    EOF

Fix this by adding two netns_{save,reset} functions, which are used
to get a file descriptor for the init netns, and restore it after
each batch command.
netns_save() is called before the unshare() or setns(),
while netns_restore() is called after each command.

Fixes: 0dc34c7713 ("iproute2: Add processless network namespace support")
Reviewed-and-tested-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-10 10:42:14 -07:00
David Ahern 9a4f0ba478 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-10 10:32:07 -07:00
Kevin Darbyshire-Bryant d7f2bccd0f tc: add support for action act_ctinfo
ctinfo is a tc action restoring data stored in conntrack marks to
various fields.  At present it has two independent modes of operation,
restoration of DSCP into IPv4/v6 diffserv and restoration of conntrack
marks into packet skb marks.

It understands a number of parameters specific to this action in
additional to the usual action syntax.  Each operating mode is
independent of the other so all options are optional, however not
specifying at least one mode is a bit pointless.

Usage: ... ctinfo [dscp mask [statemask]] [cpmark [mask]] [zone ZONE]
		  [CONTROL] [index <INDEX>]

DSCP mode

dscp enables copying of a DSCP stored in the conntrack mark into the
ipv4/v6 diffserv field.  The mask is a 32bit field and specifies where
in the conntrack mark the DSCP value is located.  It must be 6
contiguous bits long. eg. 0xfc000000 would restore the DSCP from the
upper 6 bits of the conntrack mark.

The DSCP copying may be optionally controlled by a statemask.  The
statemask is a 32bit field, usually with a single bit set and must not
overlap the dscp mask.  The DSCP restore operation will only take place
if the corresponding bit/s in conntrack mark ANDed with the statemask
yield a non zero result.

eg. dscp 0xfc000000 0x01000000 would retrieve the DSCP from the top 6
bits, whilst using bit 25 as a flag to do so.  Bit 26 is unused in this
example.

CPMARK mode

cpmark enables copying of the conntrack mark to the packet skb mark.  In
this mode it is completely equivalent to the existing act_connmark
action.  Additional functionality is provided by the optional mask
parameter, whereby the stored conntrack mark is logically ANDed with the
cpmark mask before being stored into skb mark.  This allows shared usage
of the conntrack mark between applications.

eg. cpmark 0x00ffffff would restore only the lower 24 bits of the
conntrack mark, thus may be useful in the event that the upper 8 bits
are used by the DSCP function.

Usage: ... ctinfo [dscp mask [statemask]] [cpmark [mask]] [zone ZONE]
		  [CONTROL] [index <INDEX>]
where :
	dscp MASK is the bitmask to restore DSCP
	     STATEMASK is the bitmask to determine conditional restoring
	cpmark MASK mask applied to restored packet mark
	ZONE is the conntrack zone
	CONTROL := reclassify | pipe | drop | continue | ok |
		   goto chain <CHAIN_INDEX>

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-10 10:24:38 -07:00
David Ahern ed624243da uapi: Import tc_ctinfo uapi
Add tc_ctinfo.h uapi file from kernel.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-10 10:23:32 -07:00
David Ahern b2f8eb7f8a Update kernel headers
Update kernel headers to commit:
    ad3a9ee0b623 ("ocelot: remove unused variable 'rc' in vcap_cmd()")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-10 09:39:08 -07:00
Davide Caratti 0ee4d17954 tc: simple: don't hardcode the control action
the following TDC test case:

 b776 - Replace simple action with invalid goto chain control

checks if the kernel correctly validates the 'goto chain' control action,
when it is specified in 'act_simple' rules. The test systematically fails
because the control action is hardcoded in parse_simple(), i.e. it is not
parsed by command line arguments, so its value is constantly TC_ACT_PIPE.
Because of that, the following command:

 # tc action add action simple sdata "test" drop index 7

installs an 'act_simple' rule that never drops packets, and whose 'index'
is the first IDR available, plus an 'act_gact' rule with 'index' equal to
7, that drops packets.

Use parse_action_control_dflt(), like we did on many other TC actions, to
make the control action configurable also with 'act_simple'. The expected
results of test b776 are summarized below:

 iproute2
   v       kernel->| 5.1-rc2 (and previous)  | 5.1-rc3 (and subsequent)
 ------------------+-------------------------+-------------------------
 5.1.0             | FAIL (bad IDR)          | FAIL (bad IDR)
 5.1.0(patched)    | FAIL (no rule/bad sdata)| PASS

Changes since v1:
 - reword commit message, thanks Stephen Hemminger

Fixes: 087f46ee4e ("tc: introduce simple action")
CC: Andrea Claudi <aclaudi@redhat.com>
CC: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-06 14:43:08 -07:00
Roman Mashak fa49588973 tc: Fix binding of gact action by index.
The following operation fails:
% sudo tc actions add action pipe index 1
% sudo tc filter add dev lo parent ffff: \
       protocol ip pref 10 u32 match ip src 127.0.0.2 \
       flowid 1:10 action gact index 1

Bad action type index
Usage: ... gact <ACTION> [RAND] [INDEX]
Where:  ACTION := reclassify | drop | continue | pass | pipe |
                  goto chain <CHAIN_INDEX> | jump <JUMP_COUNT>
        RAND := random <RANDTYPE> <ACTION> <VAL>
        RANDTYPE := netrand | determ
        VAL : = value not exceeding 10000
        JUMP_COUNT := Absolute jump from start of action list
        INDEX := index value used

However, passing a control action of gact rule during filter binding works:

% sudo tc filter add dev lo parent ffff: \
       protocol ip pref 10 u32 match ip src 127.0.0.2 \
       flowid 1:10 action gact pipe index 1

Binding by reference, i.e. by index, has to consistently work with
any tc action.

Since tc is sensitive to the order of keywords passed on the command line,
we can teach gact to skip parsing arguments as soon as it sees 'gact'
followed by 'index' keyword.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-06 14:41:31 -07:00
Parav Pandit 2cc10ce81d devlink: Increase bus, device buffer size to 64 bytes
Device name on mdev bus is 36 characters long which follow standard uuid
RFC 4122.
This is probably the longest name that a kernel will return for a
device.

Hence increase the buffer size to 64 bytes.

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-06 14:41:17 -07:00
Davide Caratti 4ae441e3d1 man: tc-skbedit.8: document 'inheritdsfield'
while at it, fix missing square bracket near 'ptype' and a typo in the
action description (it's -> its).

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-04 09:39:53 -07:00
David Ahern 339b14ab5e Merge branch 'rdma-net-namespace' into next
Parav Pandit  says:

====================

RDMA subsystem can be running in either of the modes.
(a) Sharing RDMA devices among multiple net namespaces or
(b) Exclusive mode where RDMA device is bound to single net namespace

This patch series adds
(1) query command to query rdma subsystem sharing mode
(2) set command to change rdma subsystem sharing mode
(3) assign rdma device to a net namespace

rdma tool examples:
(a) Query current rdma subsys net namespace sharing mode
$ rdma sys show
netns shared

(b) Change rdma subsys mode to exclusive mode
$ rdma sys set netns exclusive

$ rdma sys show
netns exclusive

(c) Assign rdma device to a specific newly created net namespace
$ ip netns add foo
$ rdma dev set mlx5_1 netns foo

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-31 15:10:55 -07:00
Parav Pandit c2ffce5d39 rdma: Add man page for rdma dev set netns command
Add man page to describe additional set netns command
for rdma device.

Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-31 15:10:33 -07:00
Parav Pandit d17a0248a2 rdma: Add an option to set net namespace of rdma device
Enrich rdmatool with an option to set network namespace of RDMA
device. After successful execution of it, rdma device will
be accessible only in assigned network namespace.

rdma tool command examples and output.

First set netns mode to exclusive.

$ rdma system set netns exclusive

Now create network namespace and assign RDMA device to this
network namespace.

$ ip netns add foo
$ rdma dev set mlx5_1 netns foo

Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-31 15:10:32 -07:00
Parav Pandit e861272015 rdma: Add man pages for rdma system commands
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-31 15:10:31 -07:00
Parav Pandit c4572a465b rdma: Add an option to query,set net namespace sharing sys parameter
Enrich rdmatool with an option to query rdma subsystem parameter
whether rdma devices are shared among multiple network namespaces
or exclusive to single network namespace.

rdma tool command examples and output.

$ rdma system show
netns shared

$ rdma system set netns exclusive

$ rdma system show
netns exclusive

Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-31 15:10:29 -07:00
Nicolas Dichtel c442234858 iplink: don't try to get ll addr len when creating an iface
It will obviously fail. This is a follow up of the
commit 757837230a ("lib: suppress error msg when filling the cache").

Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-30 11:03:20 -07:00
Nikolay Aleksandrov a9661b8b0f bridge: mdb: restore text output format
While I fixed the mdb json output, I did overlook the text output.
This patch returns the original text output format:
 dev <bridge> port <port> grp <mcast group> <temp|permanent> <flags> <timer>
Example (old format, restored by this patch):
 dev br0 port eth8 grp 239.1.1.11 temp

Example (changed format after the commit below):
 23: br0  eth8  239.1.1.11  temp

We had some reports of failing scripts which were parsing the output.
Also the old format matches the bridge mdb command syntax which makes
it easier to build commands out of the output.

Fixes: c7c1a1ef51 ("bridge: colorize output and use JSON print library")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-30 11:01:53 -07:00
David Ahern 2761769472 Update kernel headers
Update kernel headers to commit:
    1167187f2759 ("Merge branch 'qed-Fix-inifinite-spinning-of-PTP-poll-thread'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-29 12:36:58 -07:00
Lukasz Czapnik 767b6fd620 tc: flower: fix port value truncation
sscanf truncates read port values silently without any error. As sscanf
man says:
(...) sscanf() conform to C89 and C99 and POSIX.1-2001. These standards
do not specify the ERANGE error.

Replace sscanf with safer get_be16 that returns error when value is out
of range.

Example:
tc filter add dev eth0 protocol ip parent ffff: prio 1 flower ip_proto
tcp dst_port 70000 hw_tc 1

Would result in filter for port 4464 without any warning.

Fixes: 8930840e67 ("tc: flower: Classify packets based port ranges")
Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-28 12:27:01 -07:00
Nicolas Dichtel 757837230a lib: suppress error msg when filling the cache
Before the patch:
$ ip netns add foo
$ ip link add name veth1 address 2a:a5:5c:b9:52:89 type veth peer name veth2 address 2a:a5:5c:b9:53:90 netns foo
RTNETLINK answers: No such device
RTNETLINK answers: No such device

But the command was successful. This may break script. Let's remove those
error messages.

Fixes: 55870dfe7f ("Improve batch and dump times by caching link lookups")
Reported-by: Philippe Guibert <philippe.guibert@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-28 12:23:52 -07:00
Stephen Hemminger 1bb38f6c5e uapi: minor upstream btf.h header change
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-24 15:51:06 -07:00
Paolo Abeni 6eccf7ecdb m_mirred: don't bail if the control action is missing
The mirred act admits an optional control action, defaulting
to TC_ACT_PIPE. The parsing code currently emits an error message
if the control action is not provided on the command line, even
if the command itself completes with no error.

This change shuts down the error message, using the appropriate
parsing helper.

Fixes: e67aba5595 ("tc: actions: add helpers to parse and print control actions")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-22 11:51:31 -07:00
Stephen Hemminger cd35c95423 man: fix macaddr section of ip-link
The formatting of setting mac address was confusing.
Break lines and fix highlighting.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-21 11:27:14 -07:00
Matteo Croce 8589eb4efd treewide: refactor help messages
Every tool in the iproute2 package have one or more function to show
an help message to the user. Some of these functions print the help
line by line with a series of printf call, e.g. ip/xfrm_state.c does
60 fprintf calls.
If we group all the calls to a single one and just concatenate strings,
we save a lot of libc calls and thus object size. The size difference
of the compiled binaries calculated with bloat-o-meter is:

        ip/ip:
        add/remove: 0/0 grow/shrink: 5/15 up/down: 103/-4796 (-4693)
        Total: Before=672591, After=667898, chg -0.70%
        ip/rtmon:
        add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-54 (-54)
        Total: Before=48879, After=48825, chg -0.11%
        tc/tc:
        add/remove: 0/2 grow/shrink: 31/10 up/down: 882/-6133 (-5251)
        Total: Before=351912, After=346661, chg -1.49%
        bridge/bridge:
        add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-459 (-459)
        Total: Before=70502, After=70043, chg -0.65%
        misc/lnstat:
        add/remove: 0/1 grow/shrink: 1/0 up/down: 48/-486 (-438)
        Total: Before=9960, After=9522, chg -4.40%
        tipc/tipc:
        add/remove: 0/0 grow/shrink: 1/1 up/down: 18/-62 (-44)
        Total: Before=79182, After=79138, chg -0.06%

While at it, indent some strings which were starting at column 0,
and use tabs where possible, to have a consistent style across helps.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-20 14:35:07 -07:00
David Ahern 2cc9b5f4fa Merge branch 'iproute2-master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-20 14:34:26 -07:00
Stephen Hemminger f99ea67684 rdma: update uapi headers
Based on 5.2-rc
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-18 06:38:39 -07:00
Gal Pressman 7087f7c0ce rdma: Update node type strings
Fix typo in usnic_udp node type and add a string for the unspecified
node type.

Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-18 06:38:35 -07:00
Stephen Hemminger b60ed9a372 uapi: merge bpf.h from 5.2
Upstream commit to fix spelling errors.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-15 09:53:07 -07:00
Stephen Hemminger 2f31cb4fd6 uapi: add sockios.h
Forgot to add this to earlier commit.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-15 09:51:15 -07:00
Stephen Hemminger 441f67de19 mailmap: map David's mail address
Cleans up multiple mail addresses in shortlog output.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-15 09:50:42 -07:00
Stephen Hemminger a99b08624d mailmap: add myself
Put entries in for past commit mail addresses

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-15 09:30:47 -07:00
Stephen Hemminger afa588490b uapi: update headers to import asm-generic/sockios.h
import asm-generic/sockios.h to fix the compile errors from the
movement of timestamp macros.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-13 14:56:15 -07:00
Stephen Hemminger 0812dc7025 uapi: add include/linux/net.h
All kernel headers must come from this repo,
and ss is including linux/net.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-13 14:54:26 -07:00
David Ahern d53d7ce382 Merge branch 'iproute2-master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-10 12:01:01 -07:00
David Ahern 5106aeaaf2 Update kernel headers and add asm-generic/sockios.h
Update kernel headers to commit
    b970afcfcabd ("Merge tag 'powerpc-5.2-1'")

and import asm-generic/sockios.h to fix the compile errors from the
movement of timestamp macros.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-10 10:06:41 -07:00
Stephen Hemminger f9339f8a8f uapi: update to elf-em header
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-10 08:56:52 -07:00
Stephen Hemminger f9e2cf35eb Merge ../iproute2-next 2019-05-10 08:55:11 -07:00
Stephen Hemminger 3eea00d777 v5.1.0 2019-05-10 08:45:14 -07:00
Phil Sutter cd21ae4013 ip-xfrm: Respect family in deleteall and list commands
Allow to limit 'ip xfrm {state|policy} list' output to a certain address
family and to delete all states/policies by family.

Although preferred_family was already set in filters, the filter
function ignored it. To enable filtering despite the lack of other
selectors, filter.use has to be set if family is not AF_UNSPEC.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-06 13:32:44 -07:00
Zhiqiang Liu 9bf2c538a0 ipnetns: use-after-free problem in get_netnsid_from_name func
Follow the following steps:
 # ip netns add net1
 # export MALLOC_MMAP_THRESHOLD_=0
 # ip netns list
then Segmentation fault (core dumped) will occur.

In get_netnsid_from_name func, answer is freed before
rta_getattr_u32(tb[NETNSA_NSID]), where tb[] refers to answer`s
content. If we set MALLOC_MMAP_THRESHOLD_=0, mmap will be adoped to
malloc memory, which will be freed immediately after calling free
func.  So reading tb[NETNSA_NSID] will access the released memory
after free(answer).

Here, we will call get_netnsid_from_name(tb[NETNSA_NSID]) before free(answer).

Fixes: 86bf43c7c2 ("lib/libnetlink: update rtnl_talk to support malloc buff at run time")
Reported-by: Huiying Kou <kouhuiying@huawei.com>
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-06 08:36:18 -07:00
Ido Schimmel e73306048f devlink: Fix monitor command
The command is supposed to allow users to filter events related to
certain objects, but returns an error when an object is specified:

# devlink mon dev
Command "dev" not found

Fix this by allowing the command to process the specified objects.

Example:

# devlink/devlink mon dev &
# echo "10 1" > /sys/bus/netdevsim/new_device
[dev,new] netdevsim/netdevsim10

# devlink/devlink mon port &
# echo "11 1" > /sys/bus/netdevsim/new_device
[port,new] netdevsim/netdevsim11/0: type notset flavour physical
[port,new] netdevsim/netdevsim11/0: type eth netdev eth1 flavour physical

# devlink/devlink mon &
# echo "12 1" > /sys/bus/netdevsim/new_device
[dev,new] netdevsim/netdevsim12
[port,new] netdevsim/netdevsim12/0: type notset flavour physical
[port,new] netdevsim/netdevsim12/0: type eth netdev eth2 flavour physical

Fixes: a3c4b484a1 ("add devlink tool")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-06 08:35:44 -07:00
Vinicius Costa Gomes 92f4b6032e taprio: Add support for cycle_time and cycle_time_extension
This allows a cycle-time and a cycle-time-extension to be specified.

Specifying a cycle-time will truncate that cycle, so when that instant
is reached, the cycle will start from its beginning.

A cycle-time-extension may cause the last entry of a cycle, just
before the start of a new schedule (the base-time of the "admin"
schedule) to be extended by at maximum "cycle-time-extension"
nanoseconds. The idea of this feauture, as described by the IEEE
802.1Q, is too avoid too narrow gate states.

Example:

tc qdisc change dev IFACE parent root handle 100 taprio \
	      sched-entry S 0x1 1000000 \
	      sched-entry S 0x0 2000000 \
	      sched-entry S 0x1 3000000 \
	      sched-entry S 0x0 4000000 \
	      cycle-time-extension 100000 \
	      cycle-time 9000000 \
	      base-time 12345678900000000

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-04 09:22:15 -07:00
Vinicius Costa Gomes 602fae856d taprio: Add support for changing schedules
This allows for a new schedule to be specified during runtime, without
removing the current one.

For that, the semantics of the 'tc qdisc change' operation in the
context of taprio is that if "change" is called and there is a running
schedule, a new schedule is created and the base-time (let's call it
X) of this new schedule is used so at instant X, it becomes the
"current" schedule. So, in short, "change" doesn't change the current
schedule, it creates a new one and sets it up to it becomes the
current one at some point.

In IEEE 802.1Q terms, it means that we have support for the
"Oper" (current and read-only) and "Admin" (future and mutable)
schedules.

Example of creating the first schedule, then adding a new one:

(1)
tc qdisc add dev IFACE parent root handle 100 taprio \
      	      num_tc 1 \
	      map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
	      queues 1@0 \
	      sched-entry S 0x1 1000000 \
	      sched-entry S 0x0 2000000 \
	      sched-entry S 0x1 3000000 \
	      sched-entry S 0x0 4000000 \
	      base-time 100000000 \
	      clockid CLOCK_TAI

(2)
tc qdisc change dev IFACE parent root handle 100 taprio \
	      base-time 7500000000000 \
	      sched-entry S 0x0 5000000 \
              sched-entry S 0x1 5000000 \

It was necessary to fix a bug, so the clockid doesn't need to be
specified when changing the schedule.

Most of the changes are related to make it easier to reuse the same
function for printing the "admin" and "oper" schedules.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-04 09:22:15 -07:00
Paolo Abeni c865c52365 tc: add support for plug qdisc
sch_plug can be used to perform functional qdisc unit tests
controlling explicitly the queuing behaviour from user-space.

Plug support lacks since its introduction in 2012. This change
introduces basic support, to control the tc status.

v1 -> v2:
 - use the SPDX identifier

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-04 09:22:14 -07:00
David Ahern fd6580972b Update kernel headers
Update kernel headers to commit
   a734d1f4c2fc ("net: openvswitch: return an error instead of doing BUG_ON()")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-04 09:13:26 -07:00
David Ahern 420b36a874 uapi: wrap SIOCGSTAMP and SIOCGSTAMPNS in ifndef
These warnings:
    ../include/uapi/linux/sockios.h:42:0: warning: "SIOCGSTAMP" redefined
    ../include/uapi/linux/sockios.h:43:0: warning: "SIOCGSTAMPNS" redefined

are from kernel commit 0768e17073dc5 ("net: socket: implement 64-bit
timestamps"). This commit moved the definitions of SIOCGSTAMP and
SIOCGSTAMPNS from include/asm-generic/sockios.h to
include/uapi/linux/sockios.h. Older OS'es already define them in
/usr/include/asm-generic/sockios.h resulting in ugly compile errors now:

In file included from ll_types.c:24:0:
../include/uapi/linux/sockios.h:42:0: warning: "SIOCGSTAMP" redefined
 #define SIOCGSTAMP SIOCGSTAMP_OLD

In file included from /usr/include/x86_64-linux-gnu/asm/sockios.h:1:0,
                 from /usr/include/asm-generic/socket.h:5,
                 from /usr/include/x86_64-linux-gnu/asm/socket.h:1,
                 from /usr/include/x86_64-linux-gnu/bits/socket.h:368,
                 from /usr/include/x86_64-linux-gnu/sys/socket.h:38,
                 from ll_types.c:17:
/usr/include/asm-generic/sockios.h:11:0: note: this is the location of the previous definition
 #define SIOCGSTAMP 0x8906  /* Get stamp (timeval) */

so wrap them in #ifndef.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-03 08:23:08 -07:00
Josh Hunt 296b5de724 ss: add option to print socket information on one line
Multi-line output in ss makes it difficult to search for things with
grep. This new option will make it easier to find sockets matching
certain criteria with simple grep commands.

Example without option:
$ ss -emoitn
State      Recv-Q Send-Q Local Address:Port               Peer Address:Port
ESTAB      0      0      127.0.0.1:13265              127.0.0.1:36743               uid:1974 ino:48271 sk:1 <->
	 skmem:(r0,rb2227595,t0,tb2626560,f0,w0,o0,bl0,d0) ts sack reno wscale:7,7 rto:211 rtt:10.245/16.616 ato:40 mss:65483 cwnd:10 bytes_acked:41865496 bytes_received:21580440 segs_out:242496 segs_in:351446 data_segs_out:242495 data_segs_in:242495 send 511.3Mbps lastsnd:2383 lastrcv:2383 lastack:2342 pacing_rate 1022.6Mbps rcv_rtt:92427.6 rcv_space:43725 minrtt:0.007

Example with new option:
$ ss -emoitnO
State    Recv-Q Send-Q          Local Address:Port            Peer Address:Port
ESTAB    0      0                   127.0.0.1:13265              127.0.0.1:36743 uid:1974 ino:48271 sk:1 <-> skmem:(r0,rb2227595,t0,tb2626560,f0,w0,o0,bl0,d0) ts sack reno wscale:7,7 rto:211 rtt:10.067/16.429 ato:40 mss:65483 pmtu:65535 rcvmss:536 advmss:65483 cwnd:10 bytes_sent:41868244 bytes_acked:41868244 bytes_received:21581866 segs_out:242512 segs_in:351469 data_segs_out:242511 data_segs_in:242511 send 520.4Mbps lastsnd:14355 lastrcv:14355 lastack:14314 pacing_rate 1040.7Mbps delivery_rate 74837.7Mbps delivered:242512 app_limited busy:1861946ms rcv_rtt:92427.6 rcv_space:43725 rcv_ssthresh:43690 minrtt:0.007

Signed-off-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-02 16:06:06 -07:00
Ido Schimmel 517ea57c6d devlink: Increase column size for larger shared buffers
With current number of spaces the output is mangled if the shared buffer
is congested.

Before:

# devlink sb occupancy show swp25
swp25:
  pool: 0:    33384960/39344256 1:          0/0       2:          0/0       3:          0/0
        4:          0/720     5:          0/0       6:          0/0       7:          0/0
        8:          0/288     9:          0/0      10:          0/0
  itc:  0(0): 33272064/39344256 1(0):       0/0       2(0):       0/0       3(0):       0/0
        4(0):       0/0       5(0):       0/0       6(0):       0/0       7(0):       0/0
  etc:  0(4):       0/720     1(4):       0/0       2(4):       0/0       3(4):       0/0
        4(4):       0/0       5(4):       0/0       6(4):       0/0       7(4):       0/0
        8(8):       0/288     9(8):       0/0      10(8):       0/0      11(8):       0/0
       12(8):       0/0      13(8):       0/0      14(8):       0/0      15(8):       0/0

After:

# devlink sb occupancy show swp25
swp25:
  pool: 0:      39070080/39344256   1:             0/0          2:             0/0          3:             0/0
        4:             0/720        5:             0/0          6:             0/0          7:             0/0
        8:             0/288        9:             0/0         10:             0/0
  itc:  0(0):   39062016/39344256   1(0):          0/0          2(0):          0/0          3(0):          0/0
        4(0):          0/0          5(0):          0/0          6(0):          0/0          7(0):          0/0
  etc:  0(4):          0/720        1(4):          0/0          2(4):          0/0          3(4):          0/0
        4(4):          0/0          5(4):          0/0          6(4):          0/0          7(4):          0/0
        8(8):          0/288        9(8):          0/0         10(8):          0/0         11(8):          0/0
       12(8):          0/0         13(8):          0/0         14(8):          0/0         15(8):          0/0

v2:
* Increase number of spaces to make the change more future-proof

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-30 11:23:08 -07:00
Nikolay Aleksandrov 09e0528cf9 ip: mroute: add fflush to print_mroute
Similar to other print functions we need to flush buffered data
in order to work with pipes and output redirects.

After this patch ip monitor mroute &>log works properly.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-29 15:04:18 -07:00
Lucas Siba 2019-04-20 11:40 UTC 4d9e90f36b Update tc-bpf.8 man page examples
This patch updates the tc-bpf.8 example application for changes to the
struct bpf_elf_map definition. In it's current form, things compile, but
the resulting object file is rejected by the verifier when attempting to
load it through tc.

Signed-off-by: Lucas Siba <lucas.siba@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
[ dropped the unnecessary flags initialization on commit ]
2019-04-26 14:05:47 -07:00
David Ahern 10fb5faec1 Merge branch 'iproute2-master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-26 11:13:54 -07:00
Mike Manning 3f2e457ae4 iplink_vlan: add support for VLAN bridge binding flag
This patch adds support for the VLAN bridge binding flag that is
provided in net-next kernel by the series merged by 1ab839281cf7
("net-support-binding-vlan-dev-link-state-to-vlan-member-bridge-ports")

Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-26 11:12:58 -07:00
David Ahern 70de8a7fa7 Update kernel headers
Update kernel headers to commit
    148f025d41a8 ("Merge branch 'hns3-next'")

Note, these warnings:
../include/uapi/linux/sockios.h:42:0: warning: "SIOCGSTAMP" redefined
../include/uapi/linux/sockios.h:43:0: warning: "SIOCGSTAMPNS" redefined

are due to kernel commit
    0768e17073dc5 ("net: socket: implement 64-bit timestamps")

which moved the definitions from include/asm-generic/sockios.h
to include/uapi/linux/sockios.h

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-26 11:11:03 -07:00
Stephen Hemminger 38983334f6 tc/ematch: fix deprecated yacc warning
Newer versions of Bison deprecated some directives.

    YACC     emp_ematch.yacc.c
emp_ematch.y:11.1-14: warning: deprecated directive, use ‘%define parse.error verbose’ [-Wdeprecated]
 %error-verbose
 ^~~~~~~~~~~~~~
emp_ematch.y:12.1-22: warning: deprecated directive, use ‘%define api.prefix {ematch_}’ [-Wdeprecated]
 %name-prefix "ematch_"

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-24 15:10:22 -07:00
Thomas Haller 62de07faf7 iprule: always print realms keyword for rule
# rule add priority 10 realms 1/0xF
    # rule add priority 10 realms 0/0xF
    # ip rule
    10:     from all lookup main 15
    10:     from all lookup main realms 1/15

The previous behavior was there since the beginning.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-24 15:06:15 -07:00
Thomas Haller 927632d4da iprule: refactor print_rule() to use leading space before printing attribute
When printing the actions, we avoid adding the trailing space after the
attribute. Possibly because we expect the action to be the last output
on the line and not end with a space.

But for FR_ACT_TO_TBL nothing is printed. That means, we add double
spaces if a protocol is printed as well:

    # ip rule add priority 10 protocol 10 type 1

will be printed as

    10:     from all lookup 1  proto mrt

The only visible effect of the patch is to avoid the double-space and
avoid a trailing space if the action is FR_ACT_TO_TBL.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-24 15:06:15 -07:00
Thomas Haller 461f0405f3 iprule: avoid trailing space in print_rule() after printing protocol
It seems print_rule() tries to avoid a trailing space at the end
of the line. At least, when printing details about the actions,
they no longer append the space. Probably expecting to be the
last attribute that will be printed.

Don't let the protocol add the trailing space. The space at the end
of the line should be printed consistently (or not).

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-24 15:06:15 -07:00
Thomas Haller 6f87b544ca iprule: avoid printing extra space after gateway for nat action
For all other actions we avoid the trailing space, so do it here
as well.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-24 15:06:15 -07:00
Kristian Evensen 112112b8eb ip fou: Support binding FOU ports
This patch adds support for binding FOU ports using iproute2.
Kernel-support was added in 1713cb37bf67 ("fou: Support binding FoU
socket").

The parse function now handles new arguments for setting the
binding-related attributes, while the print function writes the new
attributes if they are set. Also, the man page has been updated.

v2->v3:
* Remove redundant ll_init_map()-calls (thanks David Ahern).

v1->v2 (all changes suggested by David Ahern):
* Fix reverse Christmas tree ordering.
* Remove redundant peer_port_set-variable, it is enough to check
peer_port.
* Add proper error handling of invalid local/peer addresses.
* Use interface name and not index.
* Remove updating fou-header file, it is already done.

Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-22 11:42:54 -07:00
Nikolay Aleksandrov 90306a1440 iplink: bridge: add support for vlan_stats_per_port
Add support for manipulating and showing the vlan_stats_per_port bridge
option which can be toggled only when there are no port VLANs
configured. Also update the man page with the new option.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-21 06:47:39 -07:00
Ido Schimmel 185ba5e2d4 ipneigh: Print neighbour offload indication
Print the offload indication in case it is set on the neighbour.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-21 06:23:23 -07:00
Nikolay Aleksandrov 6e982e7b9b bridge: vlan: fix standard stats output
Each of the commits below broke the vlan stats output in a different
way:
- 45fca4ed94 ("bridge: fix vlan show stats formatting")
 Added a second print of an interface name (e.g. eth4eth4)
- c7c1a1ef51 ("bridge: colorize output and use JSON print library")
 Broke normal vlan stats output by not printing a new line after them
 Also printed interfaces without any vlans when printing stats

This fix is not pretty but it brings back the previous behaviour.

Before this fix:
$ bridge -s vlan show
port             vlan id
br0br0              1 PVID Egress Untagged
                   RX: 0 bytes 0 packets
                   TX: 0 bytes 0 packets 4
                   RX: 0 bytes 0 packets
                   TX: 0 bytes 0 packetseth4eth4             4
                   RX: 0 bytes 0 packets
                   TX: 0 bytes 0 packetsroot@debian:~/

After this fix:
$ bridge -s vlan show
port             vlan id
br0              1 PVID Egress Untagged
                   RX: 0 bytes 0 packets
                   TX: 0 bytes 0 packets
                 4
                   RX: 0 bytes 0 packets
                   TX: 0 bytes 0 packets
eth4             4
                   RX: 0 bytes 0 packets
                   TX: 0 bytes 0 packets

Fixes: 45fca4ed94 ("bridge: fix vlan show stats formatting")
Fixes: c7c1a1ef51 ("bridge: colorize output and use JSON print library")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-17 16:30:05 -07:00
Nikolay Aleksandrov b11b495e7c bridge: mdb: restore valid json output
Since the commit below mdb's json output has been invalid and also with
changed format. Restore it to a valid json like the previous format.
Also takes care of a double "Deleted" print when monitoring for changes.

Example bridge -p -d -j mdb show:
 [ {
        "mdb": [ {
                "index": 4,
                "dev": "virbr0",
                "port": "vnet2",
                "grp": "ff02::202",
                "state": "temp",
                "flags": [ ]
            },{
                "index": 4,
                "dev": "virbr0",
                "port": "vnet2",
                "grp": "ff02::1:fffb:1939",
                "state": "temp",
                "flags": [ ]
            },{
                "index": 6,
                "dev": "virbr1",
                "port": "vnet7",
                "grp": "ff02::202",
                "state": "temp",
                "flags": [ ]
            },{
                "index": 6,
                "dev": "virbr1",
                "port": "vnet7",
                "grp": "ff02::1:ffd0:f61f",
                "state": "temp",
                "flags": [ ]
            } ],
        "router": {
            "virbr0": [ {
                    "port": "vnet1"
                },{
                    "port": "vnet0"
                } ],
            "virbr1": [ {
                    "port": "vnet5"
                } ]
        }
    } ]

Fixes: c7c1a1ef51 ("bridge: colorize output and use JSON print library")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-17 16:27:06 -07:00
Beniamino Galvani d6abae5a7a ip: add missing space after 'external' in detailed mode
Add a missing space after the 'external' keyword in the detailed mode
of tunnel links output:

 # ip -d link
 79: geneve1: <BROADCAST,MULTICAST> mtu 65465 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/ether da:e9:e4:2b:f9:d4 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65465
     geneve externaladdrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
 80: vxlan1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/ether 7a:a8:19:07:da:01 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
     vxlan externaladdrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
 84: gre1@NONE: <NOARP> mtu 1476 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/none 00:00:00:00 brd 00:00:00:00 promiscuity 0 minmtu 0 maxmtu 0
     gre externaladdrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
 87: ip6gre1@NONE: <NOARP> mtu 1448 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/gre6 :: brd :: promiscuity 0 minmtu 0 maxmtu 0
     ip6gre externaladdrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
 88: ip6tnl1@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/tunnel6 :: brd :: promiscuity 0 minmtu 68 maxmtu 65407
     ip6tnl externaladdrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
 90: ipip1@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/ipip 0.0.0.0 brd 0.0.0.0 promiscuity 0 minmtu 0 maxmtu 0
     ipip externaladdrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

Fixes: 00ff4b8e31 ("ip/tunnel: Be consistent when printing tunnel collect metadata")
Reviewed-and-tested-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Beniamino Galvani <bgalvani@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-17 16:26:31 -07:00
David Ahern 188c7fe6ea Update kernel headers
Update kernel headers to commit
    6b0a7f84ea1f ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-17 14:07:48 -07:00
David Ahern 43de4ef694 Merge branch 'iproute2-master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-17 13:59:44 -07:00
Eyal Birger aed63ae1ac ip xfrm: support setting/printing XFRMA_IF_ID attribute in states/policies
The XFRMA_IF_ID attribute is set in policies/states for them to be
associated with an XFRM interface (4.19+).

Add support for setting / displaying this attribute.

Note that 0 is a valid value therefore set XFRMA_IF_ID if any value
was provided in command line.

Tested-by: Antony Antony <antony@phenome.org>
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-11 15:26:43 -07:00
Ralf Baechle 8391023680 ip: display netrom link type
For a NETROM "ip link show dev nr0" will show

4: nr0: <NOARP,UP,LOWER_UP> mtu 236 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/generic 88:98:6a:a4:84:40:0a brd 00:00:00:00:00:00:00

But rather link/netrom is expected to be displayed.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-11 15:25:50 -07:00
Matt Ellison 286446c1e8 ip: support for xfrm interfaces
Interfaces take a 'if_id' which is an interface id which can be set on
an xfrm policy as its interface lookup key (XFRMA_IF_ID).

Signed-off-by: Matt Ellison <matt@arroyo.io>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-05 15:05:00 -07:00
Toke Høiland-Jørgensen d5d27f27d8 q_cake: Add support for setting the fwmark option
This adds support for the newly added fwmark option to CAKE, which allows
overriding the tin selection from the per-packet firewall marks. The fwmark
field is a bitmask that is applied to the fwmark to select the tin.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-05 15:01:31 -07:00
Stephen Hemminger 41fc3fa04c uapi: update bpf.h
Updated bpf.h from 5.1-rc

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-05 15:00:48 -07:00
David Ahern 806a553c81 Merge branch 'rdma-dynamic-link-create' into next
Steve Wise  says:

====================

This series adds rdmatool support for creating/deleting rdma links.
This will be used, mainly, by soft rdma drivers to allow adding/deleting
rdma links over netdev interfaces.  It provides the user side for
the following kernel changes merged in linux-5.1.

Changes since v2:

- move checks for required parameters in the parameter handlers
- move final 'link add' processing to link_add_netdev()
- added reviewed-by tags

Changes since v1:

- move error receive checking from rd_sendrecv_msg() to rd_recv_msg().
- Add rd->suppress_errors to allow control over whether errors when
  reading a response should be ignored.  Namely: resource queries can
  get errors like "none found" when querying for resources, and this
  error should not be displayed.  So on a rd object basis, error
  suppression can be controlled.
- Rebased on rdma/for-next UABI (no need to sync rdma_netlink.h now)
- use chains of struct rd_cmd and rd_exec_cmd vs open coding the parsing
  for the 'link add' command.
- minor nit resolution
- added .mailmap file.  If this is not desired for iproute2, then please
  drop the patch.

Changes since RFC:

- add rd_sendrecv_msg() and make use of it in dev_set as well
  as the new link commands.
- fixed problems with the man pages
- changed the command line to use "netdev" as the keyword
  for the network device, do avoid confused with the ib_device
  name.
- got rid of the "type" parameter for link delete.  Also pass
  down the device index instead of the name, using the common
  rd services for validating the device name and fetching the
  index.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-03 12:05:17 -07:00
Steve Wise 1d45bf724e rdma: man page update for link add/delete
Update the 'rdma link' man page with 'link add/delete' info.

Signed-off-by: Steve Wise <larrystevenwise@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-03 12:04:33 -07:00
Steve Wise 4336c5821a rdma: add 'link add/delete' commands
Add new 'link' subcommand 'add' and 'delete' to allow binding a soft-rdma
device to a netdev interface.

EG:

rdma link add rxe_eth0 type rxe netdev eth0
rdma link delete rxe_eth0

Signed-off-by: Steve Wise <larrystevenwise@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-03 12:04:30 -07:00
Steve Wise 8f5cfd23cd rdma: add helper rd_sendrecv_msg()
This function sends the constructed netlink message and then
receives the response.

Change rd_recv_msg() to display any error messages.

Change 'rdma dev set' to use rd_sendrecv_msg().

Signed-off-by: Steve Wise <larrystevenwise@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-03 12:04:25 -07:00
Steve Wise 65147bbe8f Add .mailmap file
.mailmap allows tracking multiple email addresses to the proper user name.

Signed-off-by: Steve Wise <larrystevenwise@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-03 12:04:00 -07:00
Leslie Monis 519ace17f9 tc: pie: update man page
Update man page to reflect the changes made in Linux.

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-29 14:26:00 -07:00
Leslie Monis 492ec9558b tc: pie: change maximum integer value of tc_pie_xstats->prob
tc_pie_xstats->prob has a maximum value of (2^64 - 1).

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-29 14:26:00 -07:00
Stephen Hemminger 6754e1d978 ip: fix typo in iplink_vlan usage message
Need to use bar "|" rather than slash to indicate alternatives.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-27 07:56:07 -07:00
Hoang Le 35114a4cfe tipc: add link broadcast man page
Add a man page describing tipc link broadcast command get and set

Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-26 16:09:21 -07:00
Hoang Le 5027f233e3 tipc: add link broadcast get
The command prints the actually method that multicast
is running in the system.
Also 'ratio' value for AUTOSELECT method.

A sample usage is shown below:
$tipc link get broadcast
BROADCAST

$tipc link get broadcast
AUTOSELECT ratio:30%

$tipc link get broadcast -j -p
[ {
        "method": "AUTOSELECT"
    },{
        "ratio": 30
    } ]

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-26 16:09:16 -07:00
Hoang Le 0ea46945bc tipc: add link broadcast set method and ratio
The command added here makes it possible to forcibly configure the
broadcast link to use either broadcast or replicast, in addition to
the already existing auto selection algorithm.

A sample usage is shown below:
$tipc link set broadcast BROADCAST
$tipc link set broadcast AUTOSELECT ratio 25

$tipc link set broadcast -h
Usage: tipc link set broadcast PROPERTY

PROPERTIES
 BROADCAST         - Forces all multicast traffic to be
                     transmitted via broadcast only,
                     irrespective of cluster size and number
                     of destinations

 REPLICAST         - Forces all multicast traffic to be
                     transmitted via replicast only,
                     irrespective of cluster size and number
                     of destinations

 AUTOSELECT        - Auto switching to broadcast or replicast
                     depending on cluster size and destination
                     node number

 ratio SIZE        - Set the AUTOSELECT criteria, percentage of
                     destination nodes vs cluster size

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-26 16:08:49 -07:00
David Ahern cdeb2674aa Update kernel headers
Update kernel headers to
    fa7e428c6b7e ("openvswitch: add seqadj extension when NAT is used.")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-26 16:08:05 -07:00
Stephen Hemminger f76ad635f2 man: break long lines in man page sources
No impact for output, just easier to edit.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-22 10:05:31 -07:00
Tobias Jungel b5a754b1db ip: bridge: add mcast to unicast config flag
This adds configuration for the IFLA_BRPORT_MCAST_TO_UCAST flag that
allows multicast packets to be replicated as unicast packets.

Signed-off-by: Tobias Jungel <tobias.jungel@bisdn.de>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-22 09:44:49 -07:00
David Ahern 3efdd43667 Merge branch 'iproute2-master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-20 00:38:08 -07:00
Stephen Hemminger dd4a2b6833 uapi: bpf add set_ce
New api from upstream.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-19 10:37:55 -07:00
Stephen Hemminger 828132fdd1 uapi: in6.h add router alert isolate
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-19 10:37:28 -07:00
Stephen Hemminger a7cd7baded uapi: add CAKE FWMARK
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-19 10:36:56 -07:00
Stephen Hemminger 8e1554625d rdma: update uapi headers from 5.1-rc1
Update from upstream.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-19 10:34:32 -07:00
Stephen Hemminger 50cf634899 Merge branch 'master' of ../iproute2-next 2019-03-19 10:32:45 -07:00
Stephen Hemminger 93d0d92f61 v5.0.0 2019-03-19 10:06:19 -07:00
Matteo Croce a0a639d9c0 ip route: get: print JSON output when -j is given
The ip -j option to print output as JSON is ignored when using 'route get':

    $ ip -j route get 127.0.0.1
    local 127.0.0.1 dev lo src 127.0.0.1 uid 1000
        cache <local>

Enable JSON output in iproute_get(), and don't let print_cache_flags() close
the JSON output, as it's not always the last called JSON function.

Tested on different route types:

    $ ip -j -p route get 127.0.0.1
    [ {
            "type": "local",
            "dst": "127.0.0.1",
            "dev": "lo",
            "prefsrc": "127.0.0.1",
            "flags": [ ],
            "uid": 1000,
            "cache": [ "local" ]
        } ]

    $ ip -d -j -p route get 192.0.2.1
    [ {
            "type": "unicast",
            "dst": "192.0.2.1",
            "gateway": "192.168.85.1",
            "dev": "wlp3s0",
            "table": "main",
            "prefsrc": "192.168.85.2",
            "flags": [ ],
            "uid": 1000,
            "cache": [ ]
        } ]

Fixes: 663c3cb231 ("iproute: implement JSON and color output")
Acked-by: Phil Sutter <phil@nwl.cc>
Reviewed-and-tested-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-19 09:50:01 -07:00
Matteo Croce 0736617738 ip route: print route type in JSON output
ip route generates an invalid JSON if the route type has to be printed,
eg. when detailed mode is active, or the type is different that unicast:

    $ ip -d -j -p route show
    [ {"unicast",
            "dst": "192.168.122.0/24",
            "dev": "virbr0",
            "protocol": "kernel",
            "scope": "link",
            "prefsrc": "192.168.122.1",
            "flags": [ "linkdown" ]
        } ]

    $ ip -j -p route show
    [ {"unreachable",
            "dst": "192.168.23.0/24",
            "flags": [ ]
        },{"prohibit",
            "dst": "192.168.24.0/24",
            "flags": [ ]
        },{"blackhole",
            "dst": "192.168.25.0/24",
            "flags": [ ]
        } ]

Fix it by printing the route type as the "type" attribute:

    $ ip -d -j -p route show
    [ {
            "type": "unicast",
            "dst": "default",
            "gateway": "192.168.85.1",
            "dev": "wlp3s0",
            "protocol": "dhcp",
            "scope": "global",
            "metric": 600,
            "flags": [ ]
        },{
            "type": "unreachable",
            "dst": "192.168.23.0/24",
            "protocol": "boot",
            "scope": "global",
            "flags": [ ]
        },{
            "type": "prohibit",
            "dst": "192.168.24.0/24",
            "protocol": "boot",
            "scope": "global",
            "flags": [ ]
        },{
            "type": "blackhole",
            "dst": "192.168.25.0/24",
            "protocol": "boot",
            "scope": "global",
            "flags": [ ]
        } ]

Fixes: 663c3cb231 ("iproute: implement JSON and color output")
Acked-by: Phil Sutter <phil@nwl.cc>
Reviewed-and-tested-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-19 09:49:36 -07:00
Kevin 'ldir' Darbyshire-Bryant ef1e02e6ac tc: m_connmark: fix action error messages
action m_connmark returns error messages identifying itself as the
'simple' action instead of 'connmark' action. e.g.

tc filter add dev eth0 protocol all u32 match u32 0 0 flowid 1:1 \
	action connmark index wrong
simple: Illegal "index"
bad action parsing
parse_action: bad value (3:connmark)!
Illegal "action"

In what is most likely a copy/paste error from the simple action example
code, fix connmark error messages to identify themselves as coming from
connmark.

tc filter add dev eth0 protocol all u32 match u32 0 0 flowid 1:1 \
	action connmark index wrong
connmark: Illegal "index"
bad action parsing
parse_action: bad value (3:connmark)!
Illegal "action"

While we're here also fixup the 'Illegal "Zone"' error code to say
'Illegal "zone"' instead of 'Illegal "index"'

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-19 09:49:07 -07:00
David Ahern 7081aa6556 Merge branch 'bond-bridge-xstats-json' into next
Nikolay Aleksandrov  says:

====================

This set adds json output support to the xstats API (patch 01) and then
adds json support to the bridge xstats output (patch 02) and adds xstats
output support (both plain text and json) for the bonding (patch 03).
It doesn't change the bridge's plain text output, but it fixes an
inconsistency that could happen if new bridge xstats attributes were
added (print the interface name once for each group of xstats attrs).

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-15 14:01:35 -07:00
Nikolay Aleksandrov 440c5075d6 ip: bond: add xstats support
Add bond and bond_slave xstats support with optional json output.
Example:
- Plain text:
$ ip link xstats type bond 802.3ad
 bond0
                    LACPDU Rx 2017
                    LACPDU Tx 2038
                    LACPDU Unknown type Rx 0
                    LACPDU Illegal Rx 0
                    Marker Rx 0
                    Marker Tx 0
                    Marker response Rx 0
                    Marker response Tx 0
                    Marker unknown type Rx 0

- JSON:
$ ip -j -p link xstats type bond 802.3ad
  [ {
        "ifname": "bond0",
        "802.3ad": {
            "lacpdu_rx": 219,
            "lacpdu_tx": 241,
            "lacpdu_unknown_rx": 0,
            "lacpdu_illegal_rx": 0,
            "marker_rx": 0,
            "marker_tx": 0,
            "marker_response_rx": 0,
            "marker_response_tx": 0,
            "marker_unknown_rx": 0
        }
    } ]

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-15 13:58:16 -07:00
Nikolay Aleksandrov a9bc23a792 ip: bridge: add xstats json support
Add json support for bridge's xstats output.
The plain text output format should remain the same.
Note that this patch pulls the interface out of the attribute
loop, this was an oversight when the set was upstreamed. This does not
change the output format, but fixes it when new xstats attributes show
up.

Example:
$ ip -p -j link xstats type bridge
  [ {
        "ifname": "br0",
        "multicast": {
            "igmp_queries": {
                "rx_v1": 0,
                "rx_v2": 32,
                "rx_v3": 0,
                "tx_v1": 0,
                "tx_v2": 0,
                "tx_v3": 0
            },
            "igmp_reports": {
                "rx_v1": 0,
                "rx_v2": 32,
                "rx_v3": 0,
                "tx_v1": 0,
                "tx_v2": 0,
                "tx_v3": 0
            },
            "igmp_leaves": {
                "rx": 0,
                "tx": 0
            },
            "igmp_parse_errors": 0,
            "mld_queries": {
                "rx_v1": 33,
                "rx_v2": 0,
                "tx_v1": 0,
                "tx_v2": 0
            },
            "mld_reports": {
                "rx_v1": 66,
                "rx_v2": 2,
                "tx_v1": 0,
                "tx_v2": 0
            },
            "mld_leaves": {
                "rx": 0,
                "tx": 0
            },
            "mld_parse_errors": 0
        }
    } ]

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-15 13:58:09 -07:00
Nikolay Aleksandrov 8ff3d1d3a3 ip: xstats: add json output support
This adds only initial object support if json argument is specified.
Later patches convert the current xstats users to json.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-15 13:55:57 -07:00
Stephen Hemminger f36f8fe535 ipaddress: print error message on stderr
Convention is to print error messages only on stderr.
Helps when scripting.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-15 08:30:26 -07:00
Thomas Haller 546109a7cf iprule: fix printing hint about unresolved iifname and oifname
was displayed as

    10:     from all iif eth1 [detached] goto 10000unresolved proto mrt

now:

    10:     from all iif eth1 [detached] goto 10000 [unresolved] proto mrt

Fixes: 0dd4ccc56c ("iprule: add json support")

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-07 16:14:09 -08:00
David Ahern be029b3a58 Merge branch 'iproute2-master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-05 07:55:05 -08:00
Roopa Prabhu c5b176e5ba bridge: fdb: add support for src_vni option
We already print src_vni for a fdb entry when present.
This patch adds the ability to set src_vni on a fdb
entry. When not specified, kernel will use vni specified
on the vxlan device. This can be used on a vxlan fdb entry
when the vxlan device is in external or collect metadata
mode.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-05 07:52:34 -08:00
Petr Vorel 8b5f9338e5 man: Document COLORFGBG environment variable
Default colors are not contrast enough on dark backround
and this functionality, which uses more suitable colors
is hidden in the code.

Signed-off-by: Petr Vorel <pvorel@suse.cz>
2019-03-01 11:09:18 -08:00
Dmytro Linkin 2f103545a5 tc/pedit: Fix wrong pedit ipv6 structure id
Tc pedit action with more than two ip6 munge in a row cause infinite
loop.

Example:

$ tc filter add dev eth0 protocol ipv6 parent ffff: \
flower ip_proto sctp \
    action pedit ex \
        munge ip6 hoplimit set 0x1 \
        munge ip6 src set 2001:0db8:0:f101::1 \
        munge that cause infinite loop

The example command never returns, instead of failing with parse error
as expected. Pedit ipv6 structure has wrong id, which leads to the
creation linked list with one node in tc/m_pedit.c:get_pedit_kind(),
referring to itself. This node is created if command have two ip6 munge
in a row, and any third ip6 munge will cause infinite loop.
Changing this id from "ipv6" to "ip6" solves the problem.

Fixes: f3e1b2448a ("pedit: Introduce ipv6 support")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-01 11:05:00 -08:00
David Ahern 83a8b8d9f1 Merge branch 'devlink-health' into next
Aya Levin  says:

====================

This series adds support for devlink health commands:
 devlink health show     [ DEV reporter REPORTER_NAME ]
 devlink health recover    DEV reporter REPORTER_NAME
 devlink health diagnose   DEV reporter REPORTER_NAME
 devlink health dump show  DEV reporter REPORTER_NAME
 devlink health dump clear DEV reporter REPORTER_NAME
 devlink health set        DEV reporter REPORTER_NAME { grace_period | auto_recover } { msec | boolean }

The first patch refactors the validation of input parameters, which
grow way too long. Second and third patches fix bugs that were
discovered during the devlink health development. The forth patch adds
helper functions which enable output of value and labels separately.
Patches 5-10 add the devlink health functionality by command, the last
is the man page.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 08:00:19 -08:00
Aya Levin 3147e0d372 devlink: Add devlink-health man page
Add a man page describing devlink health's command set. Also add a
reference link from devlink main man page.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:56:21 -08:00
Aya Levin b18d89195b devlink: Add devlink health set command
Add devlink set command which enables the user to configure parameters
related to the devlink health mechanism per reporter.
1) grace_period [msec] time interval between auto recoveries.
2) auto_recover [true/false] whether the devlink should execute automatic
recover on error.
Add a helper function to retrieve a boolean value as an input parameter.
Example:
$ devlink health set pci/0000:00:09.0 reporter tx grace_period 3500
$ devlink health set pci/0000:00:09.0 reporter tx auto_recover false

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:56:16 -08:00
Aya Levin 04b583d0ff devlink: Add devlink health dump clear command
Add devlink dump clear command which deletes the last saved dump file.
Clearing the last saved dump enables a new dump file to be saved.
Example:
$ devlink health dump clear pci/0000:00:09.0 reporter tx

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:56:10 -08:00
Aya Levin 041e6e651a devlink: Add devlink health dump show command
Add devlink dump show command which displays the last saved dump.
Devlink health saves a single dump. If a dump is not already stored
by the devlink for this reporter, devlink generates a new dump. The dump
can be generated automatically when a reporter reports on an
error or manually by user's request.
The dump's output is defined by the reporter. The command uses the
infra structure for flexible format output introduced in previous patch.
Example:
$ devlink health dump show pci/0000:00:09.0 reporter tx

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:56:04 -08:00
Aya Levin 7b8baf834d devlink: Add devlink health diagnose command
Add devlink health diagnose command: enabling retrieval of diagnostics data
by the user on a reporter on a device. The command's output is a
free text defined by the reporter.

This patch also introduces an infra structure for flexible format
output. This allow the command to display different data fields
according to the reporter.
Example:
$ devlink health diagnose pci/0000:00:0a.0 reporter tx
SQs:
  sqn: 4403 HW state: 1 stopped: false
  sqn: 4408 HW state: 1 stopped: false
  sqn: 4413 HW state: 1 stopped: false
  sqn: 4418 HW state: 1 stopped: false
  sqn: 4423 HW state: 1 stopped: false

$ devlink health diagnose pci/0000:00:0a.0 reporter tx -jp
{
 "SQs":[
      {
       "sqn":4403,
       "HW state":1,
       "stopped":false
     },
      {
       "sqn":4408,
       "HW state":1,
       "stopped":false
     },
      {
       "sqn":4413,
       "HW state":1,
       "stopped":false
     },
      {
       "sqn":4418,
       "HW state":1,
       "stopped":false
     },
      {
       "sqn":4423,
       "HW state":1,
       "stopped":false
     }
   ]
}

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:55:57 -08:00
Aya Levin 9083a1344d devlink: Add devlink health recover command
Add devlink health recover command which enables the user to initiate a
recovery on a reporter (if a recovery cb was supplied by the reporter).
This operation will increment the recoveries counter displayed in the
show command.
Example:
$ devlink health recover pci/0000:00:09.0 reporter tx

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:55:50 -08:00
Aya Levin 2f1242efe9 devlink: Add devlink health show command
Add devlink health show command which displays status and configuration
info on a specific reporter on a device or dump the info on all
reporters on all devices. Add helper functions to display status and
dump's time stamp.
Example:
$ devlink health show pci/0000:00:09.0 reporter tx
pci/0000:00:09.0:
 name tx
  state healthy error 0 recover 1 last_dump_date 2019-02-14 last_dump_time 10:10:10 grace_period 600 auto_recover true
$ devlink health show pci/0000:00:09.0 reporter tx -jp
{
 "health":{
  "pci/0000:00:0a.0":[
     {
     "name":"tx",
     "state":"healthy",
     "error":0,
     "recover":1,
     "last_dump_date":"2019-Feb-14",
     "last_dump_time":"10:10:10",
     "grace_period":600,
     "auto_recover":true
    }
  ]
}

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:55:41 -08:00
Aya Levin 844a61764c devlink: Add helper functions for name and value separately
Add a new helper functions which outputs only values (without name
label) for different types: boolean, uint, uint64, string and binary.
In addition add a helper function which prints only the name label.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:55:35 -08:00
Aya Levin 8257e6c49c devlink: Fix boolean JSON print
This patch removes the inverted commas from boolean values in JSON
format: true/false instead of "true"/"false".

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:55:30 -08:00
Aya Levin 86648a1960 devlink: Fix print of uint64_t
This patch prints uint64_t with its corresponding format and avoid implicit
cast to uint32_t.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:54:32 -08:00
Aya Levin ae72e65518 devlink: Refactor validation of finding required arguments
Introducing argument's metadata structure matching a bitmap flag per
required argument and an error message if missing. Using this static
array to refactor validation of finding required arguments in devlink
command line and to ease further maintenance.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:53:36 -08:00
Leon Romanovsky 359f69f76f rdma: Add the prefix for driver attributes
There is a need to distinguish between driver vs. general exposed
attributes. The most common use case is to expose some internal
garbage under extremely common and sexy name, e.g. pi, ci e.t.c

In order to achieve that, we will add "drv_" prefix to all strings
which were received through RDMA_NLDEV_ATTR_DRIVER_* attributes.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>a
Tested-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-27 08:25:47 -08:00
Jakub Kicinski d326d79c22 devlink: add support for updating device flash
Add new command for updating flash of devices via devlink API.
Example:

$ cp flash-boot.bin /lib/firmware/
$ devlink dev flash pci/0000:05:00.0 file flash-boot.bin

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-27 08:25:14 -08:00
David Ahern 41fda879a1 Update kernel headers
Update kernel headers to commit:
    ff8285f81822 ("net: sched: pie: fix 64-bit division")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-27 08:23:22 -08:00
David Ahern 65a94784fb Merge branch 'rdma-object-ids' into next
Leon Romanovsky says:

====================

This series adds ability to present and query all known to rdmatool
object by their respective, unique IDs (e.g. pdn. mrn, cqn e.t.c).
All objects which have "parent" object has this information too.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:13:21 -08:00
Leon Romanovsky 78728b7ee0 rdma: Provide and reuse filter functions
Globally replace all filter function in safer variants of those
is_filtered functions, which take into account the availability/lack
of netlink attributes.

Such conversion allowed to fix a number of places in the code, where
the previous implementation didn't honor filter requests if netlink
attribute wasn't present.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:09:24 -08:00
Leon Romanovsky 5a823593d6 rdma: Perform single .doit call to query specific objects
If user provides specific index, we can speedup query
by using .doit callback and save full dump and filtering
after that.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:09:15 -08:00
Leon Romanovsky 127ff95610 rdma: Unify netlink attribute checks prior to prints
Place check if netlink attribute available in general place,
instead of doing the same check in many paces.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:09:09 -08:00
Leon Romanovsky 6da9d2517c rdma: Move QP code to separate function
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:09:05 -08:00
Leon Romanovsky f9a73796d1 rdma: Place PD parsing print routine into separate function
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:09:00 -08:00
Leon Romanovsky 46695227d6 rdma: Move MR code to be suitable for per-line parsing
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:07:46 -08:00
Leon Romanovsky 83ea72289e rdma: Refactor CQ prints
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:07:40 -08:00
Leon Romanovsky 7d06b31f0e rdma: Simplify CM_ID print code
Refactor our the CM_ID print code.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:07:33 -08:00
Leon Romanovsky 05846c9cd3 rdma: Simplify code to reuse existing functions
Remove duplicated functions in favour general res_print_uint() call.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:07:28 -08:00
Leon Romanovsky 835d83216b rdma: Properly mark RDMAtool license
RDMA subsystem is dual-licensed with "GPL-2.0 OR Linux-OpenIB" proper
license and Mellanox submission are supposed to have this type of license.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:07:19 -08:00
Leon Romanovsky 687daf98f9 rdma: Move resource QP logic to separate file
Logically separate resource QP logic to separate file,
in order to make PD specific logic self-contained.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:06:56 -08:00
Leon Romanovsky 438fac3a25 rdma: Move out resource CM-ID logic to separate file
Logically separate resource CM-ID logic to separate file,
in order to make CM-ID specific logic self-contained.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:06:48 -08:00
Leon Romanovsky fcdd2e0c68 rdma: Move out resource CQ logic to separate file
Logically separate resource CQ logic to separate file,
in order to make CQ specific logic self-contained.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:06:40 -08:00
Leon Romanovsky 42ed283e4a rdma: Refactor out resource MR logic to separate file
Logically separate resource MR logic to separate file,
in order to make MR specific logic self-contained.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:06:25 -08:00
Leon Romanovsky cc6131276c rdma: Move resource PD logic to separate file
Logically separate resource PD logic to separate file,
in order to make PD specific logic self-contained.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:06:12 -08:00
Leon Romanovsky 96f59e7fdc rdma: Provide parent context index for all objects except CM_ID
Allow users to correlate allocated object with relevant parent

[leonro@server ~]$ rdma res show pd
dev mlx5_0 users 5 pid 0 comm [ib_core] pdn 1
dev mlx5_0 users 7 pid 0 comm [ib_ipoib] pdn 2
dev mlx5_0 users 0 pid 0 comm [mlx5_ib] pdn 3
dev mlx5_0 users 2 pid 548 comm ibv_rc_pingpong ctxn 0 pdn 4

[leonro@server ~]$ rdma res show cq cqn 0-100
dev mlx5_0 cqe 2047 users 6 poll-ctx UNBOUND_WORKQUEUE pid 0 comm [ib_core] cqn 2
dev mlx5_0 cqe 255 users 2 poll-ctx SOFTIRQ pid 0 comm [mlx5_ib] cqn 3
dev mlx5_0 cqe 511 users 1 poll-ctx DIRECT pid 0 comm [ib_ipoib] cqn 4
dev mlx5_0 cqe 255 users 1 poll-ctx DIRECT pid 0 comm [ib_ipoib] cqn 5
dev mlx5_0 cqe 255 users 0 poll-ctx SOFTIRQ pid 0 comm [mlx5_ib] cqn 6
dev mlx5_0 cqe 511 users 2 pid 548 comm ibv_rc_pingpong cqn 7 ctxn 0

[leonro@server ~]$ rdma res show mr
dev mlx5_0 mrlen 4096 pid 548 comm ibv_rc_pingpong mrn 4 pdn 0

[leonro@nps-server-14-015 ~]$ /images/leonro/src/iproute2/rdma/rdma res show qp
link mlx5_0/1 lqpn 0 type SMI state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_0/1 lqpn 1 type GSI state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_0/1 lqpn 7 type UD state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_0/1 lqpn 8 type UD state RTS sq-psn 0 pid 0 comm [ib_ipoib]
link mlx5_0/1 lqpn 9 pdn 4 rqpn 0 type RC state INIT rq-psn 0 sq-psn 0 path-mig-state MIGRATED pid 548 comm ibv_rc_pingpong

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:05:43 -08:00
Leon Romanovsky 1dc035865d rdma: Provide unique indexes for all visible objects
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:04:02 -08:00
Leon Romanovsky beac6a3990 rdma: Remove duplicated print code
There is no need to keep same print functions for
uint32_t and uint64_t, unify them into one function.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:03:58 -08:00
Leon Romanovsky a985cc06bd rdma: update uapi headers
Update rdma_netlink.h file upto kernel commit
f2a0e45f36b0 RDMA/nldev: Don't expose number of not-visible entries

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:03:14 -08:00
David Ahern 55870dfe7f Improve batch and dump times by caching link lookups
ip route uses ll_name_to_index and ll_index_to_name to convert between
device names and indices. At the moment both use for the ioctl based glibc
functions if_nametoindex and if_indextoname and does not cache the result.
When using a batch file or dumping large number of routes this means the
same device lookups can be done repeatedly adding unnecessary overhead
(socket + ioctl + close for each device lookup).

Add a new function, ll_link_get, to send a netlink based RTM_GETLINK. If
successful, cache the result in idx_head and name_head so future lookups
can re-use the entry. Update ll_name_to_index and ll_index_to_name to use
ll_link_get and only fallback to the glibc functions if it fails.

With this change the time to install 720,022 routes with 2 ecmp nexthops
where the nexthop device is given is reduced from 31.4 seconds to 19.2
seconds. A dump of those routes drops from 13.3 to 2.8 seconds.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-22 18:51:20 -08:00
David Ahern db1aafd883 ip link: Drop cache entry on any changes
Remove any entry from the link cache when the link is modified.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-22 18:51:18 -08:00
David Ahern 25c6339b22 ll_map: Add function to remove link cache entry by index
Add ll_drop_by_index to remove an entry from the link cache.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-22 18:51:15 -08:00
David Ahern 9f78e995a8 Merge branch 'iproute2-master' into next
Conflicts:
	misc/ss.c

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-22 18:50:39 -08:00
Stefano Brivio aa5bd6a252 ss: Render buffer to output every time a number of chunks are allocated
Eric reported that, with 10 million sockets, ss -emoi (about 1000 bytes
output per socket) can easily lead to OOM (buffer would grow to 10GB of
memory).

Limit the maximum size of the buffer to five chunks, 1M each. Render and
flush buffers whenever we reach that.

This might make the resulting blocks slightly unaligned between them, with
occasional loss of readability on lines occurring every 5k to 50k sockets
approximately. Something like (from ss -tu):

[...]
CLOSE-WAIT   32       0           192.168.1.50:35232           10.0.0.1:https
ESTAB        0        0           192.168.1.50:53820           10.0.0.1:https
ESTAB       0        0           192.168.1.50:46924            10.0.0.1:https
CLOSE-WAIT  32       0           192.168.1.50:35228            10.0.0.1:https
[...]

However, I don't actually expect any human user to scroll through that
amount of sockets, so readability should be preserved when it matters.

The bulk of the diffstat comes from moving field_next() around, as we now
call render() from it. Functionally, this is implemented by six lines of
code, most of them in field_next().

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Fixes: 691bd854bf ("ss: Buffer raw fields first, then render them as a table")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-21 14:45:45 -08:00
Thomas De Schampheleire 9700927a00 ss: fix compilation under glibc < 2.18
Commit c759116a0b introduced support for
AF_VSOCK. This define is only provided since glibc version 2.18, so
compilation fails when using older toolchains.

Provide the necessary definitions if needed.

Signed-off-by: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-21 14:40:52 -08:00
Stephen Hemminger 6f618a6a82 uapi: update inet_diag_info.h
Upstream changes.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-21 14:24:07 -08:00
Vivien Didelot 02723cf230 bridge: make mcast_flood description consistent
This patch simply changes the description of the mcast_flood flag
with "flood" instead of "be flooded with" to avoid confusion, and be
consistent with the description of the flooding flag, which "Controls
whether a given port will *flood* unicast traffic for which there is
no FDB entry."

At the same time, fix the documentation for the "flood" flag which
is incorrectly described as "flooding on" or "flooding off".

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-21 14:22:05 -08:00
Jiri Pirko 0e7e181945 devlink: relax dpipe table show dependency on resources
Dpipe table show command has a depencency on getting resources.
If resource get command is not supported by the driver, dpipe table
show fails. However, resource is only additional information
in dpipe table show output. So relax the dependency and let
the dpipe tables be shown even if resources get command fails.

Fixes: ead180274c ("devlink: Add support for resource/dpipe relation")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-21 14:21:19 -08:00
Phil Sutter d7cf2416fc ip-address: Use correct max attribute value in print_vf_stats64()
IFLA_VF_MAX is larger than the highest valid index in vf array.

Fixes: a1b99717c7 ("Add displaying VF traffic statistics")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-21 14:16:08 -08:00
Thomas Haller f5f8e96953 ip-rule: fix json key "to_tbl" for unspecific rule action
The key should not be called "to_tbl" because it is exactly
not a FR_ACT_TO_TBL action. Change it to "action".

    # ip rule add blackhole
    # ip -j rule | python -m json.tool
    ...
    {
        "priority": 0,
        "src": "all",
        "to_tbl": "blackhole"
    },

This is an API break of JSON output as it was added in v4.17.0.
Still change it as the API is relatively new and unstable.

Fixes: 0dd4ccc56c ("iprule: add json support")

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-19 15:21:06 -08:00
Luca Boccassi c2f9dc14c4 ip route: get: allow zero-length subnet mask
A /0 subnet mask is theoretically valid, but ip route get doesn't allow
it:

$ ip route get 1.0.0.0/0
need at least a destination address

Change the check and remember whether we found an address or not, since
according to the documentation it's a mandatory parameter.

$ ip/ip route get 1.0.0.0/0
1.0.0.0 via 192.168.1.1 dev eth0 src 192.168.1.91 uid 1000
    cache

Reported-by: Clément Hertling <wxcafe@wxcafe.net>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-19 15:19:31 -08:00
Matteo Croce 619765fe14 iplink: document XDP subcommand to force the XDP mode.
When attaching an eBPF program to a device, ip link can force the XDP mode
by using the xdp{generic,drv,offload} keyword instead of just 'xdp'.
Document this behaviour also in the help output.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Fixes: 14683814 ("bpf: add xdpdrv for requesting XDP driver mode")
Fixes: 1b5e8094 ("bpf: allow requesting XDP HW offload")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-13 14:02:44 -08:00
Konstantin Khlebnikov 0f3f0ca3a2 ss: add option --tos for requesting ipv4 tos and ipv6 tclass
Also show socket class_id/priority used by classful qdisc.
Kernel report this together with tclass since commit
("inet_diag: fix reporting cgroup classid and fallback to priority")

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-13 13:59:11 -08:00
Eric Dumazet bb5ae621d0 lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg()
In the past, we tried to increase the buffer size up to 32 KB in order
to reduce number of syscalls per dump.

Commit 2d34851cd3 ("lib/libnetlink: re malloc buff if size is not enough")
brought the size back to 4KB because the kernel can not know the application
is ready to receive bigger requests.

See kernel commits 9063e21fb026 ("netlink: autosize skb lengthes") and
d35c99ff77ec ("netlink: do not enter direct reclaim from netlink_dump()")
for more details.

Fixes: 2d34851cd3 ("lib/libnetlink: re malloc buff if size is not enough")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hangbin Liu <liuhangbin@gmail.com>
Cc: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-13 13:51:44 -08:00
Davide Caratti ca81444303 use print_{,h}hu instead of print_uint when format specifier is %{,h}hu
in this way, a useless cast to unsigned int is avoided in bpf_print_ops()
and print_tunnel().

Tested with:
 # ./tdc.py -c bpf

Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Cc: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-10 19:00:59 -08:00
Marcos Antonio Moraes 9e46c5c206 tc: use bits not mbits/sec in rate percent
As /sys/class/net/<iface>/speed indicates a value in Mbits/sec, the
conversion is necessary to create the correct limits.

This guarantees the same result for the following commands in an
1000Mbit/sec device:

tc class add ... htb rate 500Mbit
tc class add ... htb rate 50%

Fixes: 927e3cfb52 ("tc: B.W limits can now be specified in %.")
Signed-off-by: Marcos Antonio Moraes <marcos.antonio@digirati.com.br>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-08 09:59:45 -08:00
Stephen Hemminger 817204d0b0 tc: avoid problems with hard coded rate string length
The parse_percent_rate function assumed the buffer was 20 characters.
Better to pass length in case the size ever changes.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-06 10:49:47 -08:00
Stephen Hemminger 2d603d55a8 tc: fix memory leak in error path
If value passed to parse_percent was not valid, it would
leak the dynamic allocation from sscanf.

Fixes: 927e3cfb52 ("tc: B.W limits can now be specified in %.")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-06 10:41:58 -08:00
Jakub Kicinski 05bc89e95e devlink: add info subcommand
Add support for reading the device serial number, driver name
and various versions.  Example:

$ devlink dev info pci/0000:82:00.0
pci/0000:82:00.0:
  driver nfp
  serial_number 16240145
  versions:
      fixed:
        board.id AMDA0081-0001
        board.rev 15
        board.vendor SMA
        board.model hydrogen
      running:
        fw.mgmt 010181.010181.0101d4
        fw.cpld 0x1030000
        fw.app abm-d372b6
        fw.undi 0.0.2
        chip.init AMDA-0081-0001  20160318164536
      stored:
        fw.mgmt 010181.010181.0101d4
        fw.app abm-d372b6
        fw.undi 0.0.2
        chip.init AMDA-0081-0001  20160318164536

$ devlink -jp dev info pci/0000:82:00.0
{
    "info": {
        "pci/0000:82:00.0": {
            "driver": "nfp",
            "serial_number": "16240145",
            "versions": {
                "fixed": {
                    "board.id": "AMDA0081-0001",
                    "board.rev": "15",
                    "board.vendor": "SMA",
                    "board.model": "hydrogen"
                },
                "running": {
                    "fw.mgmt": "010181.010181.0101d4",
                    "fw.cpld": "0x1030000",
                    "fw.app": "abm-d372b6",
                    "fw.undi": "0.0.2",
                    "chip.init": "AMDA-0081-0001  20160318164536"
                },
                "stored": {
                    "fw.mgmt": "010181.010181.0101d4",
                    "fw.app": "abm-d372b6",
                    "fw.undi": "0.0.2",
                    "chip.init": "AMDA-0081-0001  20160318164536"
                }
            }
        }
    }
}

v5:
 - remove spurious new line.
v4:
 - more commit message improvements.
v3:
 - show up-to-date output in the commit message.
v2 (Jiri):
 - remove filtering;
 - add example in the commit message.
RFCv2:
 - make info subcommand of dev.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-06 08:46:50 -08:00
Jakub Kicinski 9d886c6609 devlink: report cell size
Print the value of DEVLINK_ATTR_SB_POOL_CELL_SIZE, if reported.

Example:
pci/0000:82:00.0:
  sb 1 pool 0 type egress size 40945664 thtype static cell_size 2048
  sb 2 pool 0 type egress size 258867200 thtype static cell_size 10240
...

v3: - don't double space.
v2: - fix spelling.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-06 08:46:22 -08:00
David Ahern 6b2d60bdfc Update kernel headers
Update kernel headers to commit:
bfbae2eafe05 ("Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-06 08:45:41 -08:00
Yonghong Song 3da6d055d9 bpf: add btf func and func_proto kind support
The issue is discovered for bpf selftest test_skb_cgroup.sh.
Currently we have,
  $ ./test_skb_cgroup_id.sh
  Wait for testing link-local IP to become available ... OK
  Object has unknown BTF type: 13!
  [PASS]

In the above the BTF type 13 refers to BTF kind
BTF_KIND_FUNC_PROTO.
This patch added support of BTF_KIND_FUNC_PROTO and
BTF_KIND_FUNC during type parsing.
With this patch, I got
  $ ./test_skb_cgroup_id.sh
  Wait for testing link-local IP to become available ... OK
  [PASS]

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-05 15:29:20 -08:00
Ido Schimmel 264be1d887 bridge: fdb: Fix FDB dump with strict checking disabled
While iproute2 correctly uses ifinfomsg struct as the ancillary header
when requesting an FDB dump on old kernels, it sets the message type to
RTM_GETLINK. This results in wrong reply being returned.

Fix this by using RTM_GETNEIGH instead.

Before:
$ bridge fdb show brport dummy0
Not RTM_NEWNEIGH: 00000158 00000010 00000002

After:
$ bridge fdb show brport dummy0
2a:0b:41:1c:92:d3 vlan 1 master br0 permanent
2a:0b:41:1c:92:d3 master br0 permanent
33:33:00:00:00:01 self permanent
01:00:5e:00:00:01 self permanent

Fixes: 05880354c2 ("bridge: fdb: Fix filtering with strict checking disabled")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: LiLiang <liali@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-05 15:27:28 -08:00
Chris Mi 17ed56fdf3 libnetlink: linkdump_req: AF_PACKET family also expects ext_filter_mask
Without this fix, the VF info can't be showed using command
"ip link".

146: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 24:8a:07:ad:78:52 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 02:25:d0:12:01:01, spoof checking off, link-state auto, trust off, query_rss off
    vf 1 MAC 02:25:d0:12:01:02, spoof checking off, link-state auto, trust off, query_rss off

Fixes: d97b16b2c9 ("libnetlink: linkdump_req: Only AF_UNSPEC family expects an ext_filter_mask")

Signed-off-by: Chris Mi <chrism@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-05 15:25:43 -08:00
Davide Caratti e8a3d76919 tc: add 'kind' property to 'csum' action
unlike other TC actions already supporting JSON printout, 'csum' does not
print the value of TCA_KIND in the 'kind' property: remove 'csum' word
from 'csum' property, and add a separate 'kind' property containing the
action name. The human-readable printout is preserved.

Tested with:
 # ./tdc.py -c csum

Cc: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-03 09:10:38 -08:00
Davide Caratti 52d57f6bbd tc: full JSON support for 'bpf' actions
Add full JSON output support in the dump of 'act_bpf'.

Example using eBPF:

 # tc actions flush action bpf
 # tc action add action bpf object bpf/action.o section 'action-ok'
 # tc -j action list action bpf | jq
 [
   {
     "total acts": 1
   },
   {
     "actions": [
       {
         "order": 0,
         "kind": "bpf",
         "bpf_name": "action.o:[action-ok]",
         "prog": {
           "id": 33,
           "tag": "a04f5eef06a7f555",
           "jited": 1
         },
         "control_action": {
           "type": "pipe"
         },
         "index": 1,
         "ref": 1,
         "bind": 0
       }
     ]
   }
 ]

Example using cBPF:

 # tc actions flush action bpf
 # a=$(mktemp)
 # tcpdump -ddd not ether proto 0x888e >$a
 # tc action add action bpf bytecode-file $a index 42
 # rm $a
 # tc -j action list action bpf | jq
 [
   {
     "total acts": 1
   },
   {
     "actions": [
       {
         "order": 0,
         "kind": "bpf",
         "bytecode": {
           "length": 4,
           "insns": [
             {
               "code": 40,
               "jt": 0,
               "jf": 0,
               "k": 12
             },
             {
               "code": 21,
               "jt": 0,
               "jf": 1,
               "k": 34958
             },
             {
               "code": 6,
               "jt": 0,
               "jf": 0,
               "k": 0
             },
             {
               "code": 6,
               "jt": 0,
               "jf": 0,
               "k": 262144
             }
           ]
         },
         "control_action": {
           "type": "pipe"
         },
         "index": 42,
         "ref": 1,
         "bind": 0
       }
     ]
   }
 ]

Tested with:
 # ./tdc.py -c bpf

Cc: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-03 09:10:10 -08:00
Björn Töpel 2abc3d76e3 ss: add AF_XDP support
AF_XDP is an address family that is optimized for high performance
packet processing.

This patch adds AF_XDP support to ss(8) so that sockets can be queried
and monitored.

Example:
$ sudo ss --xdp -e -p -m
Recv-Q      Send-Q           Local Address:Port             Peer Address:Port

0           0                   enp134s0f0:q20                          *
 users:(("xdpsock",pid=17787,fd=3)) ino:39424 sk:4
        rx(entries:2048)
        tx(entries:2048)
        umem(id:1,size:8388608,num_pages:2048,chunk_size:2048,headroom:0,ifindex:7,
qid:20,zc:0,refs:1)
        fr(entries:2048)
        cr(entries:2048) skmem:(r0,rb212992,t0,tb212992,f0,w0,o0,bl0,d0)
0           0                    enp24s0f0:q0                           *
 users:(("xdpsock",pid=17780,fd=3)) ino:37384 sk:5
        rx(entries:2048)
        tx(entries:2048)
        umem(id:0,size:8388608,num_pages:2048,chunk_size:2048,headroom:0,ifindex:6,
qid:0,zc:1,refs:1)
        fr(entries:2048)
        cr(entries:2048) skmem:(r0,rb212992,t0,tb212992,f0,w0,o0,bl0,d0)

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-30 20:57:45 -08:00
David Ahern f79b7733b4 Update kernel headers and add xdp_diag.h
Update kernel headers to commit:
c829f5f52db9 ("cxgb4: cxgb4_tc_u32: use struct_size() in kvzalloc()")

and import xdp_diag.h for the next patch.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-29 18:34:40 -08:00
Matteo Croce e3dbcb2a12 netns: add subcommand to attach an existing network namespace
ip tracks namespaces with dummy files in /var/run/netns/, but can't see
namespaces created with other tools.
Creating the dummy file and bind mounting the correct procfs entry will
make ip aware of that namespace.
Add an ip netns subcommand to automate this task.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Reviewed-by: Andrea Claudi <aclaudi@redhat.com>
Tested-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-29 18:18:03 -08:00
Stephen Hemminger 6f1940da8e tc: replace left side comparison
The kernel (and iproute2) don't use the if (NULL == x) style
and instead prefer if (!x)

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-28 08:51:03 -08:00
Hans Dedecker 2874714662 f_flower: fix build with musl libc
XATTR_SIZE_MAX requires the usage of linux/limits.h; let's include it

Signed-off-by: Hans Dedecker <dedeckeh@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-01-25 09:20:03 +13:00
wenxu 3d65cefbef iproute: Set ip/ip6 lwtunnel flags
ip l add dev tun type gretap external
ip r a 10.0.0.1 encap ip dst 192.168.152.171 id 1000 dev gretap

For gretap example when the command set the id but don't set the
TUNNEL_KEY flags. There is no key field in the send packet

User can set flags with key, csum, seq
ip r a 10.0.0.1 encap ip dst 192.168.152.171 id 1000 key csum dev gretap

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-01-25 09:17:27 +13:00
David Ahern b45664e064 Merge 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-22 08:30:38 -08:00
Jakub Kicinski 8513f4a926 ip route: get: only set RTM_F_LOOKUP_TABLE flag for IPv4
Kernel ignores the RTM_F_LOOKUP_TABLE flag for all families
but IPv4.  Don't set it, otherwise it may fall foul of
strict checking policies.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-01-22 16:04:13 +13:00
Adi Nissim dc0332b1e8 tc: m_tunnel_key: Allow key-less tunnels
Change the id parameter of the tunnel_key set action from mandatory to
optional.

Some tunneling protocols (e.g. GRE) specify the id as an optional field.

Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-01-22 16:04:07 +13:00
Stephen Hemminger 3bc2dc7668 uapi: in.h change
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-01-22 16:03:31 +13:00
Benedict Wong a6af9f2e61 xfrm: add option to hide keys in state output
ip xfrm state show currently dumps keys unconditionally. This limits its
use in logging, as security information can be leaked.

This patch adds a nokeys option to ip xfrm ( state show | monitor ), which
prevents the printing of keys. This allows ip xfrm state show to be used
in logging without exposing keys.

Signed-off-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-21 08:31:20 -08:00
Cong Wang b0ca46a1f8 tc: add hit counter for matchall
Cc: Martin Olsson <martin.olsson+netdev@sentorsecurity.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-21 08:30:07 -08:00
David Ahern dad02ef478 Update kernel headers
Update kernel headers to commit
28f9d1a3d4fe ("Merge branch 'mlxsw-spectrum_router-Add-GRE-tunnel-support-for-Spectrum-2'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-21 08:29:26 -08:00
Leon Romanovsky b058f969df rdma: Add unbound workqueue to list of poll context types
Kernel commit f794809a7259 ("IB/core: Add an unbound WQ type to the new CQ API")
added new CQ poll context type, reflect this change in rdmatool.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-21 08:19:44 -08:00
Leon Romanovsky 54a0ade6d4 clang-format: add configuration file
The codebase of iproute2 follows Linux kernel coding style,
so it will be very helpful to reuse existing clang configuration
file to reliably format code.

For more information see kernel commit d4ef8d3ff005
("clang-format: add configuration file").

Updated upto commit v5.0-rc1 with small number of ForEachMacros.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-17 13:38:23 -08:00
Luca Boccassi 0cf061183e Makefile: check manpages for syntax errors
Pass the same parameters Lintian uses in Debian.

$ make check
<...>
Checking manpages for syntax errors...
<standard input>:48: warning: macro `Q' not defined
Error in tc-taprio.8
Makefile:27: recipe for target 'check' failed

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-01-14 08:01:51 -08:00
Luca Boccassi 8242808ced man: tc-taprio.8: fix syntax error
.Q does not exist so groff complains and the "queues" word is actually
not displayed.

Fixes: 579acb4bc5 ("taprio: Add manpage for tc-taprio(8)")

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-01-14 08:01:51 -08:00
Luca Boccassi cffeeb3946 man: ss.8: more line breaks
groff stiff complains about unbreakable lines:
  96: warning [p 2, 3.0i]: can't break line

Indent it some more.

Fixes: 7f5047524c ("man: ss.8: break and indent long line")

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-01-14 08:01:51 -08:00
David Ahern 3d14706e54 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-07 16:30:13 -08:00
Dmitry V. Levin db4ad742e1 configure: fix typo in check_xt_old_internal_h
Fixes: 377a09902a ("configure: Minor code cleanup")
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-01-07 14:42:01 -08:00
Stephen Hemminger 80e5ddec14 rdma: update uapi headers
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-01-07 11:41:39 -08:00
Stephen Hemminger e1ccc46bdd uapi: update headers from 4.21-rc1
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2019-01-07 11:39:26 -08:00
Stephen Hemminger 724ec5aeb0 Merge ../iproute2-next 2019-01-07 11:36:41 -08:00
Stephen Hemminger 97864a5af3 v4.20.0 2019-01-07 10:24:02 -08:00
Tobias Jungel c9159af51a ipneigh: print dst for AF_BRIDGE
In case a neighbour message is of family AF_BRIDE the NDA_DST attribute
was not printed so far. With this patch the family is evaluated to pass
the correct family to format_host_rta.

Signed-off-by: Tobias Jungel <tobias.jungel@bisdn.de>
2019-01-07 10:22:03 -08:00
David Ahern 97b44d571d libnetlink: linkdump_req is done for AF_BRIDGE as well
The bridge command 'vlan show' calls rtnl_linkdump_req_filter for
family AF_BRIDGE. Update rtnl_linkdump_req_filter to send the filter
for that family as well.

Fixes: d97b16b2c9 ("libnetlink: linkdump_req: Only AF_UNSPEC family expects an ext_filter_mask")
Reported-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
2019-01-07 08:36:58 -08:00
David Ahern dfa2c3787f Merge branch 'iproute2-master' into iproute2-next
Conflicts:
	ip/iprule.c

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-04 12:22:47 -08:00
David Ahern 6267d9533b Merge branch 'strict-updates' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-04 12:19:37 -08:00
David Ahern 05880354c2 bridge: fdb: Fix filtering with strict checking disabled
Older kernels expect an ifinfomsg struct as the ancillary header, and
after kernel commit bd961c9bc664 ("rtnetlink: fix rtnl_fdb_dump() for ndmsg
header") can handle either ifinfomsg or ndmsg. Strict data checking only
allows ndmsg.

Use the new RTNL_HANDLE_F_STRICT_CHK flag to know which header to send.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
2019-01-04 12:17:19 -08:00
David Ahern 285033bfeb libnetlink: Add RTNL_HANDLE_F_STRICT_CHK flag
Add RTNL_HANDLE_F_STRICT_CHK flag and set in rth flags to let know
commands know if the kernel supports strict checking.

Extracted from patch from Ido to fix filtering with strict checking
enabled.

Cc: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-04 12:17:17 -08:00
David Ahern 66b4199f22 bridge: Update fdb show to use rtnl_neighdump_req
Add fdb_dump_filter to set filter attributes in dump request
and convert fdb_show to use rtnl_neighdump_req.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-04 12:17:15 -08:00
David Ahern 101ec10a76 ip neigh: Convert do_show_or_flush to use rtnl_neighdump_req
Add ipneigh_dump_filter to add filter attributes to the neighbor
dump request and update do_show_or_flush to use rtnl_neighdump_req.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-04 12:17:13 -08:00
David Ahern f255ab1225 libnetlink: Add filter function to rtnl_neighdump_req
Add filter function to rtnl_neighdump_req and a buffer to the
request for the filter functions to append attributes.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-04 12:17:11 -08:00
Leon Romanovsky f0cabaca38 rdma: Fix incorrectly handled NLA validation
mnl_attr_type_valid() receives maximum attribute type, which means that
we were supposed to supply the latest valid netlink attribute and not
the number of attributes. Such coding mistake caused to failures while
NLA attributes were extended.

Fixes: 74bd75c2b6 ("rdma: Add basic infrastructure for RDMA tool")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-31 22:15:13 -08:00
wenxu cb65a9cb81 iprule: Add tun_id filed in the selector
ip rule add from all iif gretap tun_id 2000 lookup 200

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-31 22:13:13 -08:00
Eric Dumazet 72cdb77d1a nstat: fix load_ugly_table() limits
A recent change reduced max line length from 4096 to 2048 bytes,
but we already have lines above the 2048 threshold, and we keep
adding more SNMP counters in linux.

Switch to getline() and do not worry about future kernel changes.

Fixes: da8034a019 ("misc: avoid snprintf warnings in ss and nstat")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-31 21:45:53 -08:00
Ido Schimmel 66e8e73edc bridge: fdb: Use 'struct ndmsg' for FDB dumping
Since commit aea41afcfd ("ip bridge: Set NETLINK_GET_STRICT_CHK on
socket") iproute2 uses strict checking on kernels that support it. This
causes FDB dumping to fail [1], as iproute2 uses 'struct ifinfomsg'
whereas the kernel expects 'struct ndmsg'.

Note that with this change iproute2 continues to work on old kernels
that do not support strict checking, but contain the fix introduced in
kernel commit bd961c9bc664 ("rtnetlink: fix rtnl_fdb_dump() for ndmsg
header").

[1]
# bridge fdb show
[ 5365.137224] netlink: 4 bytes leftover after parsing attributes in process `bridge'.
Error: bytes leftover after parsing attributes.
Dump terminated

Fixes: aea41afcfd ("ip bridge: Set NETLINK_GET_STRICT_CHK on socket")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-30 16:56:34 -08:00
Michael Guralnik 40fc8c2cec rdma: Add print of link CapabilityMask2 flags
CapabilityMask2 is defined in IBTA spec as a member of PortInfo.
Add translation to string of new CapabilityMask2 expansion of link caps.

The flags are concatenated to current caps print as seen in this example
printing EXT_INFO flag:

root@server-22 $ rdma -d link
1/1: mlx5_0/1: subnet_prefix fe80:0000:0000:0000 lid 2 sm_lid 2 lmc 0
	state ACTIVE physical_state LINK_UP
caps: <SM, TRAP, SL_MAP, SYS_IMAGE_GUID, CABLE_INFO, EXTENDED_SPEEDS,
	CAP_MASK2, CM, DEVICE_MGMT, VENDOR_CLASS, CAP_MASK_NOTICE,
	CLIENT_REG, OTHER_LOCAL_CHANGES, MULT_PKER_TRAP, EXT_INFO>

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:41:21 -08:00
David Ahern 0c187b7f24 Merge branch 'strict-dumps' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:37:44 -08:00
David Ahern 6b83edc061 neighbor: Add support for protocol attribute
Add support to set protocol on neigh entries and to print the protocol
on dumps.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:37:12 -08:00
David Ahern 8d4f35de17 ip route: Rename do_ipv6 to dump_family
do_ipv6 is really the preferred dump family. Rename it to make
that apparent.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:36:51 -08:00
David Ahern aea41afcfd ip bridge: Set NETLINK_GET_STRICT_CHK on socket
iproute2 has been updated for the new strict policy in the kernel. Add a
helper to call setsockopt to enable the feature. Add a call to ip.c and
bridge.c

The setsockopt fails on older kernels and the error can be safely ignored
- any new fields or attributes are ignored by the older kernel.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:36:29 -08:00
David Ahern 8847097850 ip address: Set device index in dump request
Add a filter function to rtnl_addrdump_req to set device index in the
address dump request if the user is filtering addresses by device. In
addition, add a new ipaddr_link_get to do a single RTM_GETLINK request
instead of a device dump yet still store the data in the linfo list.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:35:49 -08:00
David Ahern 7ca9cee8d8 ip address: Split ip_linkaddr_list into link and addr functions
Split ip_linkaddr_list into one function that generates a list of devices
and a second that generates the list of addresses.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:35:14 -08:00
David Ahern e41ede8939 mroute: Add table id attribute for kernel side filtering
Similar to 'ip route' add the table id to the dump request for
kernel side filtering if it is supported.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:34:50 -08:00
David Ahern 98ce99273f mroute: fix up family handling
Only ipv4 and ipv6 have multicast routing. Set family
accordingly and just return for other cases.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:34:28 -08:00
David Ahern c7e6371bc4 ip route: Add protocol, table id and device to dump request
Add protocol, table id and device to dump request if set in filter. If
kernel side filtering is supported it is used to reduce the amount of
data sent to userspace.

Older kernels do not parse attributes on a route dump request, so these
are silently ignored and ip will do the filtering in userspace.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:33:59 -08:00
David Ahern 43fd93ae46 ip route: Remove rtnl_rtcache_request
Add a filter option to rtnl_routedump_req and use it to set rtm_flags
removing the need for rtnl_rtcache_request for dump requests.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:33:34 -08:00
David Ahern d97b16b2c9 libnetlink: linkdump_req: Only AF_UNSPEC family expects an ext_filter_mask
Only AF_UNSPEC handled by rtnl_dump_ifinfo expects an ext_filter_mask
on a dump request. Update the linkdump request functions to only set
and send ext_filter_mask for AF_UNSPEC.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:33:05 -08:00
David Ahern 92e03242c4 libnetlink: Use NLMSG_LENGTH to set nlmsg_len
Change nlmsg_len from sizeof(req) to use NLMSG_LENGTH on the header.
2 of the inner headers are not 4-byte aligned, so add a 0-length buf
after the header with the __aligned(NLMSG_ALIGNTO) to ensure the size
of the request is large enough. Use NLMSG_ALIGN in NLMSG_LENGTH to set
nlmsg_len.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:32:57 -08:00
David Ahern 2750252d7e libnetlink: dump extack string in done message
Print any extack message that has been appended to a NLMSG_DONE message.
To avoid duplication, move the existing print code to a new helper.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:32:31 -08:00
David Ahern fdce94d0d1 Update kernel headers
Update kernel headers to commit
ce28bb445388 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-22 07:36:52 -08:00
Petr Vorel 261a5290dd testsuite: Fix colorize
bash and dash require for escape sequence to use 'echo -e' or printf
(but working on zsh). Choosing printf as it's implementation is IMHO
more portable than echo implementations.
dash also require to use \033[0; as escape sequence instead of \e[0;

NOTE: \e[0; kept in lib/color.c as it's not problematic for C code
(working when run ip on various shells).

Fixes: 7e2f71b4 ("testsuite: colorize test result output")

Signed-off-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2018-12-20 20:16:28 -08:00
Stephen Hemminger c579ec14a7 uapi/iptunnel: make TUNNEL_FLAGS available
ip l add dev tun type gretap external
ip r a 10.0.0.1 encap ip dst 192.168.152.171 id 1000 dev gretap

For gretap Key example when the command set the id but don't set the
TUNNEL_KEY flags. There is no key field in the send packet

In the lwtunnel situation, some TUNNEL_FLAGS should can be set by
userspace

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-20 09:19:33 -08:00
Stephen Hemminger 2db63d290b uapi/netlink.h: rename NETLINK_DUMP_STRICT_CHK -> NETLINK_GET_STRICT_CHK
NETLINK_DUMP_STRICT_CHK can be used for all GET requests,
dumps as well as doit handlers.  Replace the DUMP in the
name with GET make that clearer.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-20 09:18:29 -08:00
Stephen Hemminger ef7f9fae2e uapi/in.h: Allow class-e address assignment
While most distributions long ago switched to the iproute2 suite
of utilities, which allow class-e (240.0.0.0/4) address assignment,
distributions relying on busybox, toybox and other forms of
ifconfig cannot assign class-e addresses without this kernel patch.

While CIDR has been obsolete for 2 decades, and a survey of all the
open source code in the world shows the IN_whatever macros are also
obsolete... rather than obsolete CIDR from this ioctl entirely, this
patch merely enables class-e assignment, sanely.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-20 09:17:05 -08:00
David Ahern 17689d3075 Update kernel headers
Update kernel headers to commit
   055722716c39 ("tipc: fix uninitialized value for broadcast retransmission")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-19 12:47:29 -08:00
Stephen Hemminger db9af1f1e3 testsuite: drop unrunnable test
The classifier testbed test never worked and was always being
skipped. It depended on some files it tests/cls which never made
it into the iproute2 git repository.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-19 12:10:36 -08:00
Stephen Hemminger e5cd5a51f9 doc: remove trailing whitespace
Run whitespace scrubbing script to remove unnecessary trailing
blanks at end of line and end of file.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-19 12:02:38 -08:00
David Ahern 6065ddfaa7 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-19 12:02:17 -08:00
Petr Vorel 8b2ea19276 examples: Remove cbq.init-v0.7.3
This script is obsolete.

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2018-12-18 10:52:35 -08:00
Petr Vorel bb955fd127 examples: Remove dhcp-client-script
This script is obsolete.

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2018-12-18 10:52:35 -08:00
Petr Vorel 377a09902a configure: Minor code cleanup
Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2018-12-18 10:52:35 -08:00
Petr Vorel fce84d6450 configure: Remove non-posix shell expansion
+ change shebang to /bin/sh

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2018-12-18 10:52:35 -08:00
Petr Vorel 3de834e6e2 configure: Remove unused function check_prog()
Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2018-12-18 10:52:35 -08:00
Petr Vorel ec7cac05ff tests: Use /bin/sh shebang
Bashisms for tests were removed in ecd44e68 ("tests: Remove
bashisms (s/source/.)"), so no need to use bash shebang.

+ remove trailing whitespace.

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2018-12-18 10:52:35 -08:00
Petr Vorel ee32695387 man: rtpr: Rename s/bash/shell/
ip/rtpr mentioned in man as bash script is actually posix shell script
(doesn't require to use bash).

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2018-12-18 10:52:35 -08:00
Luca Boccassi 85bcb524a2 testsuite: remove gre kmods if the test loads them
The tunnel test leaves behind link devices created by the GRE kernel
modules:

$ ip -br link
...
gre0@NONE    DOWN 0.0.0.0 <NOARP>
gretap0@NONE DOWN 00:00:00:00:00:00 <BROADCAST,MULTICAST>
erspan0@NONE DOWN 00:00:00:00:00:00 <BROADCAST,MULTICAST>
ip6tnl0@NONE DOWN :: <NOARP>
ip6gre0@NONE DOWN 00:00:00:00:

$ lsmod | grep gre
ip6_gre      40960  0
ip6_tunnel   40960  1 ip6_gre
ip_gre       32768  0
ip_tunnel    24576  1 ip_gre
gre          16384  2 ip6_gre,ip_gre

Check beforehand if the gre kernel module is loaded, and if not unload
them all at the end of the test. This should avoid causing problems if
a user is already using GRE for other purposes.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Tested-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2018-12-18 09:58:10 -08:00
Luca Boccassi eaed928b64 testsuite: delete dummy interface after default route test
Signed-off-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2018-12-18 09:58:10 -08:00
Luca Boccassi 61f9ade9fb testsuite: declare dependency between $(TESTS) and generate_nlmsg
Parallel make from the top level directory fails since tests are at the
same time as generate_nlmsg:

$ make check -j4

...

cd testsuite && make && make alltests
echo "Entering iproute2" && cd iproute2 && make configure && cd ..;
Entering iproute2
make -C tools
Removing results dir ...
make[1]: ./tools/generate_nlmsg: Command not found
make[1]: ./tools/generate_nlmsg: Command not found
Makefile:64: recipe for target 'ip/netns/set_nsid_batch.t' failed
make[1]: *** [ip/netns/set_nsid_batch.t] Error 127
make[1]: ./tools/generate_nlmsg: Command not found
make[1]: *** Waiting for unfinished jobs....
Makefile:64: recipe for target 'ip/netns/set_nsid.t' failed
make[1]: *** [ip/netns/set_nsid.t] Error 127
Makefile:64: recipe for target 'ip/link/show_dev_wo_vf_rate.t' failed
make[1]: *** [ip/link/show_dev_wo_vf_rate.t] Error 127
    CC       generate_nlmsg
Makefile:123: recipe for target 'check' failed
make: *** [check] Error 2

Add an explicit dependency in testuite/Makefile's $(TESTS) rule so
that the tool correctly gets compiled before any test runs.

Fixes: 3537633dcf ("testsuite: Generate generate_nlmsg when needed")

Signed-off-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: Petr Vorel <petr.vorel@gmail.com>
Tested-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2018-12-18 09:58:10 -08:00
Luca Boccassi 0115d55e9f Makefile: have check target depend on all
Otherwise it will simply fail immediately from a just-cleaned
workspace:

$ make check -j1
cd testsuite && make && make alltests
echo "Entering iproute2" && cd iproute2 && make configure && cd ..;
Entering iproute2
make -C tools
Makefile:3: ../../config.mk: No such file or directory
make[2]: *** No rule to make target '../../config.mk'.  Stop.

Fixes: 8804a8c0d3 ("Makefile: Add check target")

Signed-off-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: Petr Vorel <petr.vorel@gmail.com>
Tested-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2018-12-18 09:58:10 -08:00
Masatake YAMATO cec6b03124 man: ss: fix typos about wscale
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2018-12-18 07:50:41 -08:00
Stephen Hemminger 738aebe52b drop support for DECnet
DECnet belongs in the history museum of dead protocols along
with Appletalk and IPX.

Linux support has outlived its natural life and the time has
come to remove it from iproute2. Dead code is a source
of bugs and exploits.

If anyone actually has DECnet running on some old distribution
they can just keep to the old version of iproute2.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-13 12:50:01 -08:00
Syrone Wong 6ddb36c3a9 tc: fix xtables incorrect usage of LDFLAGS
The incorrect setting of LDFLAGS causes error below:

> em_ipt.o: In function `em_ipt_print_epot':
> em_ipt.c:(.text.em_ipt_print_epot+0x2e): undefined reference to
> `xtables_init_all'

em_ipt.c gets involved when TC_CONFIG_XT=y, which requires xtables,
while tc/Makefile doesn't pass flags correctly. It adds '-lxtables'
to LDFLAGS instead of LDLIBS.

Fixes: dd296215 ("tc: add em_ipt ematch for calling xtables matches from tc matching context")

Signed-off-by: Syrone Wong <wong.syrone@gmail.com>
Acked-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-13 11:38:43 -08:00
Stephen Hemminger 3a1f602ade remove redundant long int
Using unsigned long is sufficient no need to be more
verbose and use unsigned long int.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-13 11:36:59 -08:00
Leon Romanovsky 378dd31b4b rdma: Fix broken 32-bit compilation
Allow compilation of rdmatool on 32-bits platforms.

rdma
    CC       rdma.o
    CC       utils.o
    CC       dev.o
    CC       link.o
In file included from rdma.h:26:0,
                 from dev.c:12:
dev.c: In function 'dev_caps_tostr':
../include/utils.h:269:38: warning: left shift count >= width of type [-Wshift-count-overflow]
 #define BIT(nr)                 (1UL << (nr))
                                      ^
rdma.h:32:61: note: in expansion of macro 'BIT'
 #define RDMA_BITMAP_ENUM(name, bit_no) RDMA_BITMAP_##name = BIT(bit_no),
                                                             ^~~

Fixes: 40df8263a0 ("rdma: Add dev object")
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-13 11:34:44 -08:00
Stephen Hemminger 90c5c969f0 fix print_0xhex on 32 bit
The argument to print_0xhex is converted to unsigned long long
so the format string give for normal printout has to be some
variant of %llx. Otherwise, bogus values will be printed on
32 bit platforms.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-10 14:20:32 -08:00
Stephen Hemminger 33fde2b600 lib/bpf: fix build warning if no elf
Function was not used unlesss HAVE_ELF causing:

bpf.c:105:13: warning: ‘bpf_map_offload_neutral’ defined but not used [-Wunused-function]

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-10 13:50:17 -08:00
Stephen Hemminger 79940533c0 ipmacsec: fix warning on 32bit platform
On some 32 bit platforms, the printf was causing warning:
ipmacsec.c: In function ‘getattr_u64’:
ipmacsec.c:655:47: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 3 has type ‘unsigned int’ [-Wformat=]
   fprintf(stderr, "invalid attribute length %lu\n",

Resolve by computing length as size_t first.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-10 13:47:58 -08:00
Stephen Hemminger 028766aed2 uapi: update bpf header
Changes from 4.20-rc6

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-10 09:22:23 -08:00
David Ahern fbe7da2306 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-07 13:02:08 -08:00
Shalom Toledo a463fd4fa4 devlink: Add support for 'fw_load_policy' generic parameter
Add string to uint conversion for 'fw_load_policy' generic parameter.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-07 13:00:40 -08:00
Shalom Toledo 2557dca2b0 devlink: Add string to uint{8,16,32} conversion for generic parameters
Allow setting u{8,16,32} generic parameters as a well defined strings in
devlink user space tool.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-07 13:00:38 -08:00
Stephen Hemminger a9c49b8f8f rdma: align uapi headers with 4.20-rc5
Upstream headers were updated.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-07 09:25:59 -08:00
Hoang Le 853adffe13 tipc: fix misalignment printout in non-JSON output
In the commit 1304f50a5b ("tipc: JSON support for showing nametable"),
introduced misalignment in the columns of the printout in non-JSON mode
compare to the list header. Add one space per column to make alignment
with the list header.

before:
$tipc name show
Type       Lower      Upper      Scope    Port       Node
1         1         1         node    4071367628

after:
$tipc name show
Type       Lower      Upper      Scope    Port       Node
1          1          1          node     4071367628

Reported-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-07 09:24:01 -08:00
Martin Jeřábek 2e320d8b7e ip: iplink_can.c: fix json formatting
Previously the CAN state was always printed in human-readable txt format,
resulting in invalid JSON.

Signed-off-by: Martin Jeřábek <martin.jerabek01@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-07 09:22:29 -08:00
Petr Machata 4ac4217464 testsuite: Add a test for batch processing
Test that when a second or following command in a batch fails, tc
reports it correctly. This is a test for the previous patch.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-04 14:28:31 -08:00
Petr Machata 0951cbcddf libnetlink: Process further iovs on no error
When no error is reported in the first iov, do not prematurely return,
but process further iovs. This fixes batch processing.

Fixes: c60389e4f9 ("libnetlink: fix leak and using unused memory on error")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-04 14:28:31 -08:00
Emeric Dupont a7a7e45017 iproute2: Installation errors without libmnl
When performing make install in iproute2 (current git master),
     if $(HAVE_MNL) is not selected, some Makefiles try to call
     install with an empty target, which causes a non-critical make error.

Signed-off-by: Emeric Dupont <emeric.dupont@zii.aero>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-04 14:27:08 -08:00
Stephen Hemminger f3188cfa39 devlink: don't need to call pkg-config twice
pkg-config for libmnl is already done in config.mk

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-04 14:19:33 -08:00
Amritha Nambiar 8930840e67 tc: flower: Classify packets based port ranges
Added support for filtering based on port ranges.
UAPI changes have been accepted into net-next.

Example:
1. Match on a port range:
-------------------------
$ tc filter add dev enp4s0 protocol ip parent ffff:\
  prio 1 flower ip_proto tcp dst_port 20-30 skip_hw\
  action drop

$ tc -s filter show dev enp4s0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  ip_proto tcp
  dst_port 20-30
  skip_hw
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 85 sec used 3 sec
        Action statistics:
        Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

2. Match on IP address and port range:
--------------------------------------
$ tc filter add dev enp4s0 protocol ip parent ffff:\
  prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port 100-200\
  skip_hw action drop

$ tc -s filter show dev enp4s0 parent ffff:
filter protocol ip pref 1 flower chain 0 handle 0x2
  eth_type ipv4
  ip_proto tcp
  dst_ip 192.168.1.1
  dst_port 100-200
  skip_hw
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 2 ref 1 bind 1 installed 58 sec used 2 sec
        Action statistics:
        Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

v6:
Modified to change json output format as object for sport/dport.

 "dst_port":{
           "start":2000,
           "end":6000
 },
 "src_port":{
           "start":50,
           "end":60
 }

v5:
Simplified some code and used 'sscanf' for parsing. Removed
space in output format.

v4:
Added man updates explaining filtering based on port ranges.
Removed 'range' keyword.

v3:
Modified flower_port_range_attr_type calls.

v2:
Addressed Jiri's comment to sync output format with input

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-03 16:02:58 -08:00
David Ahern dd7d522a67 Revert "tc: flower: Classify packets based port ranges"
This reverts commit e20e50b0c1.

Inadvertently pushed v3 of this patch.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-03 16:01:07 -08:00
David Ahern fb417073a3 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-03 15:39:29 -08:00
Andrea Claudi b876b7e2b4 l2tp: Fix printing of cookie and peer_cookie values
print_cookie() invocations miss %s format specifier.
While at it, align printout to the previous lines.

Fixes: 98453b6580 ("ip/l2tp: add JSON support")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-03 14:35:58 -08:00
Eric Dumazet 3adcbf3757 tc: add a missing space between rate estimator and backlog
When a rate estimator is active, "tc -s qd" displays
something like :

rate 12616bit 11ppsbacklog 0b 0p requeues 2

instead of :

rate 12616bit 11pps backlog 0b 0p requeues 2

Fixes: 4fcec7f366 ("tc: jsonify stats2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-03 14:34:05 -08:00
Phil Sutter 6495bca92e ssfilter: Fix for inverted last expression
When fixing for shift/reduce conflicts, possibility to invert the last
expression by prefixing with '!' or 'not' was accidentally removed.

Fix this by allowing for expr to be an inverted expr so that any
reference to it in exprlist accepts the inverted prefix.

Reported-by: Eric Dumazet <edumazet@google.com>
Fixes: b2038cc0b2 ("ssfilter: Eliminate shift/reduce conflicts")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-03 14:33:19 -08:00
Eric Dumazet 5eead6270a ss: add support for bytes_sent, bytes_retrans, dsack_dups and reord_seen
Wei Wang added these fields in linux-4.19

Tested:

ss -ti ...

	ts sack cubic wscale:8,8 rto:7 rtt:2.678/0.267 mss:1428 pmtu:1500
    rcvmss:536 advmss:1428 cwnd:91 ssthresh:65
(*) bytes_sent:17470606104 bytes_retrans:2856
    bytes_acked:17470483297
    segs_out:12234320 segs_in:622983
    data_segs_out:12234318 send 388.2Mbps lastrcv:986784 lastack:1
    pacing_rate 465.8Mbps delivery_rate 162.7Mbps
    delivered:12234235 delivered_ce:3669056
    busy:986784ms unacked:84 retrans:0/2
(*) dsack_dups:2
    rcv_space:14280 rcv_ssthresh:65535 notsent:2016336 minrtt:0.183

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-29 10:49:46 -08:00
Phil Sutter 7ab8f249aa man: ip-route.8: Fix ENCAP references in synopsis
The different encapsulation types are described in ENCAP_*
non-terminals, but ENCAP definition lists them without the ENCAP_
prefix. Fix this for consistency.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-28 16:00:18 -08:00
Roopa Prabhu f38e278b84 bridge: make -c match -compressvlans first instead of -color
commit c7c1a1ef51 ("bridge: colorize output and use JSON print library")
broke previous use of -c to represent compressvlans. This restores
previous use of -c to represent compressvlans. Understand the original
motivation to use -c to represent color consistently everywhere but
there are apps and network interface managers out there that are already
using -c to prepresent compressed vlans.

Fixes: c7c1a1ef51 ("bridge: colorize output and use JSON print library")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-28 15:59:21 -08:00
Eric Dumazet 2f4d834b99 ss: add support for delivered and delivered_ce fields
Kernel support was added in linux-4.18 in commit feb5f2ec6464
("tcp: export packets delivery info")

Tested:

ss -ti
...
ESTAB   0 2270520      [2607:f8b0:8099:e16::]:47646   [2607:f8b0:8099:e18::]:38953
	 ts sack cubic wscale:8,8 rto:7 rtt:2.824/0.278 mss:1428
     pmtu:1500 rcvmss:536 advmss:1428 cwnd:89 ssthresh:62 bytes_acked:2097871945
    segs_out:1469144 segs_in:65221 data_segs_out:1469142 send 360.0Mbps lastsnd:2
    lastrcv:99231 lastack:2 pacing_rate 431.9Mbps delivery_rate 246.4Mbps
(*) delivered:1469099 delivered_ce:424799
    busy:99231ms unacked:44 rcv_space:14280 rcv_ssthresh:65535
    notsent:2207688 minrtt:0.228

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-28 15:57:30 -08:00
Phil Sutter b2ec8f4314 man: rdma: Add reference to rdma-resource.8
All rdma-related man pages list each other in SEE ALSO section, only
rdma-resource.8 is missing. Add it for the sake of consistency.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-28 15:56:31 -08:00
Eric Dumazet 6d03d6f7d9 man: tc: update man page for fq packet scheduler
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-25 09:40:36 -08:00
Eric Dumazet 55e106c480 tc: fq: support ce_threshold attribute
Kernel commit 48872c11b772 ("net_sched: sch_fq: add dctcp-like marking")
added support for TCA_FQ_CE_THRESHOLD attribute.

This patch adds iproute2 support for it.

It also makes sure fq_print_xstats() can deal with smaller tc_fq_qd_stats
structures given by older kernels.

Usage :

FQATTRS="ce_threshold 4ms"
TXQS=8

for ETH in eth0
do
 tc qd del dev $ETH root 2>/dev/null
 tc qd add dev $ETH root handle 1: mq
 for i in `seq 1 $TXQS`
 do
  tc qd add dev $ETH parent 1:$i fq $FQATTRS
 done
done

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:30:24 -08:00
Stephen Hemminger ce5071eda6 drop support for IPX
IPX has been depracted then removed from upstream kernels.
Drop support from ip route as well.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:27:56 -08:00
David Ahern 27f4e51003 Merge branch 'tc-gred-json-perf-vq-config' into iproute2-next
Jakub Kicinski  says:

====================

This set brings GRED support up to date with recent kernel changes.
In particular the new netlink attributes for more fine-grained stats
and per-virtual queue flags.

To make GRED usable in modern deployments the patch set starts with
adding JSON output.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:17:30 -08:00
Jakub Kicinski f7a8749aff tc: gred: allow controlling and dumping per-DP RED flags
Kernel now support setting ECN and HARDDROP flags per-virtual
queue.  Allow users to tweak the settings, and print them on
dump.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:11:40 -08:00
Jakub Kicinski 2d7c564a1e tc: gred: support controlling RED flags
Kernel GRED qdisc supports ECN marking, and the harddrop flag
but setting and dumping this flag is not possible with iproute2.
Add the support.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:11:36 -08:00
Jakub Kicinski fdaff63c6a tc: gred: use extended stats if available
Use the extended attributes with extra and better stats, when
possible.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:11:19 -08:00
Jakub Kicinski c3e1cd28c1 tc: gred: separate out stats printing
Printing GRED statistics is long and deserves a function on its own.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:11:09 -08:00
Jakub Kicinski 6475e6a580 tc: gred: jsonify GRED output
Make GRED dump JSON-compatible.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:11:04 -08:00
Jakub Kicinski 33021752cd tc: move RED flag printing to helper
Number of qdiscs use the same set of flags to control shared RED
implementation.  Add a helper for printing those flags.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:10:58 -08:00
Jakub Kicinski b640e85d2d json: add %hhu helpers
Add helpers for printing char-size values.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:09:53 -08:00
Jakub Kicinski c8f201e3d2 tc: gred: remove unclear comment
The comment about providing a proper message seems similar to
the comment in the kernel which says:

    /* hack -- fix at some point with proper message
       This is how we indicate to tc that there is no VQ
       at this DP */

it's unclear what that message would be, and whether it's needed.
Remove the confusing comment.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:08:16 -08:00
David Ahern 6ae54b1326 Revert "rdma: make local functions static"
This reverts commit e99c4443ae.

Patch added to iproute2-master breaks builds of -next because of a
more recent patch in -next that relies on the exports. Revert the
offending patch. Unfortunately this leaves a window where builds
break.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:06:17 -08:00
David Ahern 0868c8ab07 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:06:11 -08:00
Quentin Monnet 1a7d3ad8a5 bpf: initialise map symbol before retrieving and comparing its type
In order to compare BPF map symbol type correctly in regard to the
latest LLVM, commit 7a04dd84a7 ("bpf: check map symbol type properly
with newer llvm compiler") compares map symbol type to both NOTYPE and
OBJECT. To do so, it first retrieves the type from "sym.st_info" and
stores it into a temporary variable.

However, the type is collected from the symbol "sym" before this latter
symbol is actually updated. gelf_getsym() is called after that and
updates "sym", and when comparison with OBJECT or NOTYPE happens it is
done on the type of the symbol collected in the previous passage of the
loop (or on an uninitialised symbol on the first passage). This may
eventually break map collection from the ELF file.

Fix this by assigning the type to the temporary variable only after the
call to gelf_getsym().

Fixes: 7a04dd84a7 ("bpf: check map symbol type properly with newer llvm compiler")
Reported-by: Ron Philip <ron.philip@netronome.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-21 09:36:30 -08:00
Nicolas Dichtel ebe3ce2fcc ipnetns: parse nsid as a signed integer
Don't confuse the user, nsid is a signed integer, this kind of command
should return an error: 'ip netns set foo 0xffffffff'.

Also, a valid value is a positive value. To let the kernel chooses a value,
the keyword 'auto' must be used.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-21 09:35:37 -08:00
Amritha Nambiar e20e50b0c1 tc: flower: Classify packets based port ranges
Added support for filtering based on port ranges.
UAPI changes have been accepted into net-next.

Example:
1. Match on a port range:
-------------------------
$ tc filter add dev enp4s0 protocol ip parent ffff:\
  prio 1 flower ip_proto tcp dst_port range 20-30 skip_hw\
  action drop

$ tc -s filter show dev enp4s0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  ip_proto tcp
  dst_port range 20-30
  skip_hw
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 85 sec used 3 sec
        Action statistics:
        Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

2. Match on IP address and port range:
--------------------------------------
$ tc filter add dev enp4s0 protocol ip parent ffff:\
  prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port range 100-200\
  skip_hw action drop

$ tc -s filter show dev enp4s0 parent ffff:
filter protocol ip pref 1 flower chain 0 handle 0x2
  eth_type ipv4
  ip_proto tcp
  dst_ip 192.168.1.1
  dst_port range 100-200
  skip_hw
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 2 ref 1 bind 1 installed 58 sec used 2 sec
        Action statistics:
        Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

v3:
Modified flower_port_range_attr_type calls.

v2:
Addressed Jiri's comment to sync output format with input

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-20 14:34:56 -08:00
David Ahern 8d42678dfb Update kernel headers
Update kernel headers to
  b1a200484143 ("net-next/hinic: fix a bug in rx data flow")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-20 14:33:09 -08:00
Stephen Hemminger e99c4443ae rdma: make local functions static
Several functions only used inside utils.c

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger 946a135c58 tc/pedit: use structure initialization
The pedit callback structure table should be iniatialized using
structure initialization to avoid structure changes problems.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger 9e96e71594 tc/action: make variables static
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger 42d9eed451 tc/meta: make meta_table static and const
The mapping table is only used by em_meta.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger 9455bec52a tc/util: make local functions static
The tc util library parse/print has functions only used locally
(and some dead code removed).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger 33043dfc9c tc/ematch: make local functions static
The print handling is only used in tc/m_ematch.c

Remove unused function to print_ematch_tree.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger 7527b221d6 tc/pedit: make functions static
The parse and pack functions are only used by the pedit routines.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger 3f7bd0fd90 ss: make local variables static
Several variables only used in this code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger a38fadf401 tc/police: make print_police static
print_police function only used by m_police.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger 7e569d92a9 tc/class: make filter variables static
Only used in this file.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger f63c3f9a81 tipc: make cmd_find static
Function only used in one file.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger babc56b68c tc: drop unused name_to_id function
Not used in current code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger fa92d8cb09 ipxfrm: make local functions static
Make functions only used in ipxfrm.c static.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger 3e4b255ca9 ipmonitor: make local variable static
prefix_banner only used in one file.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger 086277b591 ip: make flag names const/static
The table of filter flags is only used in ipaddress

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger d63786c642 bridge: make local variables static
enable_color and set_color_palette only used here.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger 13530c46b5 genl: remove dead code
The function genl_ctrl_resolve_family is defined but never used
in current code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger 1d2fac4145 libnetlnk: unused and local functions cleanup
rntl_talk_extack and parse_rtattr_index not used in current code.
rtnl_dump_filter_l is only used in this file.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger cc5b7e37ac lib/ll_map: make local function static
ll_idx_a2n is only used in ll_map.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger f7bf88dfd5 lib/color: make local functions static
color_enable etc, only used here.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger b8795a3208 lib/utils: make local functions static
Some of the print/parsing is only used internally.
Drop unused get_s8/get_s16.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger 07b20a6197 lib/ll_addr: whitespace and indent cleanup
Run old ll_addr through kernel Lindent.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger 57ddc275f5 Merge branch 'master' of ra.kernel.org:/pub/scm/linux/kernel/git/shemminger/iproute2 2018-11-19 11:40:37 -08:00
Phil Sutter 133db49b49 ip-address: Fix filtering by negated address flags
When disabling a flag, one needs to AND with the inverse not the flag
itself. Otherwise specifying for instance 'home -nodad' will effectively
clear the flags variable.

While being at it, simplify the code a bit by merging common parts of
negated and non-negated case branches. Also allow for the "special
cases" to be inverted, too.

Fixes: f73ac674d0 ("ip: change flag names to an array")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:38:24 -08:00
Stephen Hemminger 87e3ec0e2f testsuite: drop old kernel configs
The testsuite directory had several ancient kernel configs.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:37:08 -08:00
Phil Sutter 05d978e085 ip-route: Fix nexthop encap parsing
When parsing nexthop parameters, a buffer of 4k bytes is provided. Yet,
in lwt_parse_encap() and some functions called by it, buffer size was
assumed to be 1k despite the actual size was provided. This led to
spurious buffer size errors if the buffer was filled by previous nexthop
parameters to exceed that 1k boundary.

Fixes: 1e5293056a ("lwtunnel: Add encapsulation support to ip route")
Fixes: 5866bddd9a ("ila: Add support for ILA lwtunnels")
Fixes: ed67f83806 ("ila: Support for checksum neutral translation")
Fixes: 86905c8f05 ("ila: support for configuring identifier and hook types")
Fixes: b15f440e78 ("lwt: BPF support for LWT")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-14 11:18:59 -08:00
Stephen Hemminger 7e2f71b431 testsuite: colorize test result output
When running testsuite it is easy as a human to miss failure.
Add symbol colorizing to SKIPED/PASS/FAIL output.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-14 11:18:03 -08:00
Phil Sutter 6cd959bb12 man: ip-route.8: Document nexthop limit
Add a note to 'nexthop' description stating the maximum number of
nexthops per command and pointing at 'append' command as a workaround.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-14 11:13:24 -08:00
Stefano Brivio 6d1af9abc5 testsuite: ss: Fix spacing in expected output for ssfilter.t
Since commit 00240899ec ("ss: Actually print left delimiter for
columns") changes spacing in ss output, we also need to adjust for that in
the ss filter test.

Fixes: 00240899ec ("ss: Actually print left delimiter for columns")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-12 08:30:34 -08:00
Luca Boccassi ed27d27a5c testsuite: correctly use CC macros for generate_nlmsg
It's $(QUIET_CC)$(CC) not $(QUIET_CC), copy-paste error. CI does
verbose build so it slipped through.

Fixes: 6e7d347aab ("testsuite: build generate_nlmsg with QUIET_CC")

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-09 08:58:55 -08:00
Stefano Brivio 64dbd03ea1 iplink_geneve: Add DF configuration
Allow to set the DF bit behaviour for outgoing IPv4 packets: it can be
always on, inherited from the inner header, or, by default, always off,
which is the current behaviour.

v2:
- Indicate in the man page what DF refers to, using RFC 791 wording
  (David Ahern)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-09 08:51:47 -08:00
Stefano Brivio 3d98eba4fe iplink_vxlan: Add DF configuration
Allow to set the DF bit behaviour for outgoing IPv4 packets: it can be
always on, inherited from the inner header, or, by default, always off,
which is the current behaviour.

v2:
- Indicate in the man page what DF refers to, using RFC 791 wording
  (David Ahern)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-09 08:51:12 -08:00
David Ahern 3a7246dce4 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-09 08:50:50 -08:00
Stephen Hemminger 9abb69f8a0 uapi: sctp header change
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-09 08:18:26 -08:00
Jakub Kicinski 9c5f4251d6 tc: f_u32: allow skip_hw and skip_sw flags to be last
u32 uses NEXT_ARG() incorrectly when parsing skip_hw and skip_sw
flags.  NEXT_ARG() ensures there is another argument on the command
line, and is used in handling <keyword> <value> syntax to move past
<keyword> and ensure there is a <value> to read.

Commit 5e5b3008d1 ("tc: f_u32: Add support for skip_hw and skip_sw
flags") seems to have copy pasted the handling from the previous
command - "police", which needs an extra parameter and is kind of
special due to the use of parse_police() helper.

The combination of NEXT_ARG() and continue worked fine as long as
skip_sw/skip_hw wasn't last, e.g.:

$ tc filter add dev dummy0 ingress prio 101 protocol ipv6 \
    u32 match ip6 priority 0xa0 0xe0 skip_hw action pass

But would fail if it was last:

$ tc filter add dev dummy0 ingress prio 101 protocol ipv6 \
    u32 match ip6 priority 0xa0 0xe0 flowid :1 skip_hw
Command line is not complete. Try option "help"

Remove the NEXT_ARG()s and the continues, and let the argc--; argv++;
at the end of the loop do its job.

Fixes: 5e5b3008d1 ("tc: f_u32: Add support for skip_hw and skip_sw flags")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-09 08:12:29 -08:00
Luca Boccassi 1a03ac6b05 Pass CPPFLAGS to the compiler
When building Debian packages pre-processor flags are passed via
CPPFLAGS, as the convention indicates. Specifically, the hardening
-D_FORTIFY_SOURCE=2 flag is used.
Pass CPPFLAGS to all calls of QUIET_CC together with CFLAGS.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-09 08:07:18 -08:00
Luca Boccassi 6e7d347aab testsuite: build generate_nlmsg with QUIET_CC
Follow the standard pattern, and respect user's verbosity setting.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-09 08:07:18 -08:00
Alex Vesker 995015be31 devlink: Add missing region option to devlink man page
The region field was not added to the devlink man page.

Fixes: 8b4fbf0bed ("devlink: Add support for devlink-region access")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-09 08:04:33 -08:00
Luca Boccassi a6bb5b9e7c Fix warning in tc-skbprio.8 manpage
". If" gets interpreted as a macro, so move the period to the previous
line:

  33: warning: macro `If' not defined

Fixes: 141b55f854 ("Add SKB Priority qdisc support in tc(8)")

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-09 08:03:40 -08:00
Luca Boccassi 7f5047524c man: ss.8: break and indent long line
Fixes groff warning:
  ss.8 92: warning [p 2, 2.8i]: can't break line

And makes the line also more readable.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-09 08:02:43 -08:00
Roopa Prabhu a795211fe5 bridge: fdb: remove redundant dev string in show output
After commit 4abb8c723a ("bridge: fdb: Fix for missing
keywords in non-JSON output"), I am seeing a double print for dev
in bridge fdb show. eg:
"44:38:39:00:6a:82 dev dev bridge vlan 1 master bridge permanent"

this patch removes the redundant print.

Fixes: 4abb8c723a ("bridge: fdb: Fix for missing keywords in non-JSON output")
CC: Phil Sutter <phil@nwl.cc>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-09 07:50:01 -08:00
Leon Romanovsky e89feffae3 rdma: Document IB device renaming option
[leonro@server /]$ lspci |grep -i Ether
00:08.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:09.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
[leonro@server /]$ sudo rdma dev
1: mlx5_0: node_type ca fw 3.8.9999 node_guid 5254:00c0:fe12:3455
sys_image_guid 5254:00c0:fe12:3455
[leonro@server /]$ sudo rdma dev set mlx5_0 name hfi1_0
[leonro@server /]$ sudo rdma dev
1: hfi1_0: node_type ca fw 3.8.9999 node_guid 5254:00c0:fe12:3455
sys_image_guid 5254:00c0:fe12:3455

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-05 19:12:03 -08:00
Luca Boccassi 6d2fd4a53f Include bsd/string.h only in include/utils.h
This is simpler and cleaner, and avoids having to include the header
from every file where the functions are used. The prototypes of the
internal implementation are in this header, so utils.h will have to be
included anyway for those.

Fixes: 508f3c231e ("Use libbsd for strlcpy if available")

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-05 08:38:32 -08:00
Stephen Hemminger 39776a8665 uapi: update headers to 4.20-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-05 08:37:41 -08:00
Leon Romanovsky 4ee770eec9 rdma: Refresh help section of resource information
After commit 4060e4c0d2 ("rdma: Add PD resource tracking
information"), the resource information shows PDs and MRs,
but help pages didn't fully reflect it.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-05 08:36:36 -08:00
David Ahern cde2f706aa Merge branch 'rdma-dev-rename' into iproute2-next
Leon Romanovsky  says:

====================

An example:

[leonro@server /]$ lspci |grep -i Ether
00:08.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:09.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
[leonro@server /]$ sudo rdma dev
1: mlx5_0: node_type ca fw 3.8.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455
[leonro@server /]$ sudo rdma dev set mlx5_0 name hfi1_0
[leonro@server /]$ sudo rdma dev
1: hfi1_0: node_type ca fw 3.8.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-02 09:41:12 -07:00
Leon Romanovsky 4443c9c6a0 rdma: Add an option to rename IB device interface
Enrich rdmatool with an option to rename IB devices,
the command interface follows Iproute2 convention:
"rdma dev set [OLD-DEVNAME] name NEW-DEVNAME"

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-02 09:38:56 -07:00
Leon Romanovsky a14ceed325 rdma: Introduce command execution helper with required device name
In contradiction to various show commands, the set command explicitly
requires to use device name as an argument. Provide new command
execution helper which enforces it.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-02 09:38:18 -07:00
Leon Romanovsky 3fb00075d9 rdma: Update kernel include file to support IB device renaming
Bring kernel header file changes upto commit 05d940d3a3ec
("RDMA/nldev: Allow IB device rename through RDMA netlink")

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-02 09:38:15 -07:00
David Ahern b2e8bf1584 ip rule: Add ipproto and port range to filter list
Allow ip rule dumps and flushes to filter based on ipproto, sport
and dport. Example:

$ ip ru ls ipproto udp
99:	from all to 8.8.8.8 ipproto udp dport 53 lookup 1001
$ ip ru ls dport 53
99:	from all to 8.8.8.8 ipproto udp dport 53 lookup 1001

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-02 09:37:14 -07:00
David Ahern 70bc1f6550 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-02 09:34:18 -07:00
David Ahern 2380120926 ip rule: Require at least one argument for add
'ip rule add' with no additional arguments just adds another rule
for the main table - which exists by default. Require at least
1 argument similar to delete.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-01 12:49:48 -07:00
David Ahern b65b4c0870 ip rule: Honor filter arguments on flush
'ip ru flush' currently removes all rules with priority > 0 regardless
of any other command line arguments passed in. Update flush_rule to
call filter_nlmsg to determine if the rule should be flushed or not.
This enables rule flushing such as 'ip ru flush table 1001' and
'ip ru flush pref 99'.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-01 12:49:48 -07:00
Luca Boccassi 508f3c231e Use libbsd for strlcpy if available
If libc does not provide strlcpy check for libbsd with pkg-config to
avoid relying on inline version.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-01 12:47:03 -07:00
Yonghong Song 7a04dd84a7 bpf: check map symbol type properly with newer llvm compiler
With llvm 7.0 or earlier, the map symbol type is STT_NOTYPE.
  -bash-4.4$ cat t.c
  __attribute__((section("maps"))) int g;
  -bash-4.4$ clang -target bpf -O2 -c t.c
  -bash-4.4$ readelf -s t.o

  Symbol table '.symtab' contains 2 entries:
     Num:    Value          Size Type    Bind   Vis      Ndx Name
       0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
       1: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    3 g

The following llvm commit enables BPF target to generate
proper symbol type and size.
  commit bf6ec206615b9718869d48b4e5400d0c6e3638dd
  Author: Yonghong Song <yhs@fb.com>
  Date:   Wed Sep 19 16:04:13 2018 +0000

      [bpf] Symbol sizes and types in object file

      Clang-compiled object files currently don't include the symbol sizes and
      types.  Some tools however need that information.  For example, ctfconvert
      uses that information to generate FreeBSD's CTF representation from ELF
      files.
      With this patch, symbol sizes and types are included in object files.

      Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
      Reported-by: Yutaro Hayakawa <yhayakawa3720@gmail.com>

Hence, for llvm 8.0.0 (currently trunk), symbol type will be not NOTYPE, but OBJECT.
  -bash-4.4$ clang -target bpf -O2 -c t.c
  -bash-4.4$ readelf -s t.o

  Symbol table '.symtab' contains 3 entries:
     Num:    Value          Size Type    Bind   Vis      Ndx Name
       0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
       1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS t.c
       2: 0000000000000000     4 OBJECT  GLOBAL DEFAULT    3 g

This patch makes sure bpf library accepts both NOTYPE and OBJECT types
of global map symbols.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-31 08:27:07 -07:00
Stefano Brivio 00240899ec ss: Actually print left delimiter for columns
While rendering columns, we use a local variable to keep track of the
field currently being printed, without touching current_field, which is
used for buffering.

Use the right pointer to access the left delimiter for the current column,
instead of always printing the left delimiter for the last buffered field,
which is usually an empty string.

This fixes an issue especially visible on narrow terminals, where some
columns might be displayed without separation.

Reported-by: YoyPa <yoann.p.public@gmail.com>
Fixes: 691bd854bf ("ss: Buffer raw fields first, then render them as a table")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: YoyPa <yoann.p.public@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-31 08:11:11 -07:00
Tobias Jungel 45fca4ed94 bridge: fix vlan show stats formatting
The output of -statistics vlan show was broken previous change for json
output. This aligns the format to vlan show.

v2: fixed too greedy deletion that caused a -Wmaybe-uninitialized

Signed-off-by: Tobias Jungel <tobias.jungel@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-29 09:58:03 -07:00
Peter Korsgaard f900a21611 utils.h: provide fallback CLOCK_TAI definition
q_{etf,taprio}.c uses CLOCK_TAI, which isn't exposed by glibc < 2.21 or
uClibc, breaking the build. Provide a fallback definition like it is done
for IPPROTO_MPLS and others.

Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-29 09:54:52 -07:00
David Ahern 6e221408e6 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-23 10:55:09 -07:00
Hangbin Liu 35b857f9c6 ip/geneve: fix ttl inherit behavior
Currently when we add geneve with "ttl inherit", we only set ttl to 0, which
is actually use whatever default value instead of inherit the inner protocol's
ttl value.

To make a difference with ttl inherit and ttl == 0, we add an attribute
IFLA_GENEVE_TTL_INHERIT in kernel commit 52d0d404d39dd ("geneve: add ttl
inherit support"). Now let's use "ttl inherit" to inherit the inner
protocol's ttl, and use "ttl auto" to means "use whatever default value",
the same behavior with ttl == 0.

v2:
1) remove IFLA_GENEVE_TTL_INHERIT defination in if_link.h as it's already
   updated.
2) Still use addattr8() so we can enable/disable ttl inherit, as Michal
   suggested.

v3: Update man page

Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-23 10:53:16 -07:00
Stephen Hemminger 2808241af5 v4.19.0 2018-10-23 10:14:57 -07:00
Phil Sutter 737b8258b3 tc: htb: Print default value in hex
Value of 'default' is assumed to be hexadecimal when parsing, so
consequently it should be printed in hex as well. This is a regression
introduced when adding JSON output.

As requested, also change JSON output to print the value as hex string.

Fixes: f354fa6aa5 ("tc: jsonify htb qdisc")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-23 10:07:10 -07:00
Phil Sutter 6358bbc381 tc: Remove pointless assignments in batch()
All these assignments are later overwritten without reading in between,
so just drop them.

Fixes: 485d0c6001 ("tc: Add batchsize feature for filter and actions")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-22 10:05:43 -07:00
Phil Sutter 8d05f33a38 tipc: Drop unused variable 'genl'
Although initialized by call to libmnl, the variable is used only in a
call to sizeof(). Drop it and call sizeof with its type instead.

Fixes: f043759dd4 ("tipc: add new TIPC configuration tool")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-22 10:05:43 -07:00
Phil Sutter 3b5c5ef0a7 ip-route: Fix parse_encap_seg6() srh parsing
In case caller did not specify 'segs' parameter, parse_srh() would read
garbage while iterating over 'segbuf'. Avoid this by initializing
'segbuf' to an empty string.

Fixes: e8493916a8 ("iproute: add support for SR-IPv6 lwtunnel encapsulation")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-22 10:05:43 -07:00
Phil Sutter cdefe1d8e4 rdma: Don't pass garbage to rd_check_is_filtered()
Variables 'src_port' and 'dst_port' are initialized only if attributes
RDMA_NLDEV_ATTR_RES_SRC_ADDR or RDMA_NLDEV_ATTR_RES_DST_ADDR are
present. Make sure to pass them over to rd_check_is_filtered() only if
that is the case.

Fixes: 9a362cc71a ("rdma: Add CM_ID resource tracking information")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-22 10:05:43 -07:00
Phil Sutter e5da392ff8 ip-route: Fix for memleak in error path
If call to rta_addattr_l() failed, parse_encap_seg6() would leak memory.
Fix this by making sure calls to free() are not skipped.

Fixes: bd59e5b151 ("ip-route: Fix segfault with many nexthops")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-22 10:05:43 -07:00
Phil Sutter 3b0070f6b1 rdma: Fix for ineffective check in add_filter()
With 'name' field defined as array in struct filters, it will always
contain a value irrespective of whether a name was assigned or not.

Fix this by turning the field into a const char pointer.

Fixes: 1174be72d1 ("rdma: Add filtering infrastructure")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-22 10:05:18 -07:00
Phil Sutter b1ffc1f465 devlink: Fix error reporting in cmd_resource_set()
resource_path_parse() returns either zero or a negative error code,
hence the negated value must be passed to strerror().

Fixes: 8cd6440958 ("devlink: Add support for devlink resource abstraction")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-22 10:05:18 -07:00
Petr Vorel 92885e1973 testsuite: Fix make check when need build generate_nlmsg
make check from top level Makefile defines several flags which break
building generate_nlmsg:

$ make check
make -C tools
gcc  -Wall -Wstrict-prototypes  -Wmissing-prototypes -Wmissing-declarations -Wold-style-definition -Wformat=2 -O2 -I../include -I../include/uapi -DRESOLVE_HOSTNAMES -DLIBDIR=\"/usr/lib\" -DCONFDIR=\"/etc/iproute2\" -DNETNS_RUN_DIR=\"/var/run/netns\" -DNETNS_ETC_DIR=\"/etc/netns\" -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE  -DHAVE_SETNS -DHAVE_SELINUX -DHAVE_ELF -DHAVE_LIBMNL -I/usr/include/libmnl -DNEED_STRLCPY -DHAVE_LIBCAP ../lib/libutil.a ../lib/libnetlink.a -lselinux -lelf -lmnl -lcap  -I../../include -include../../include/uapi/linux/netlink.h -o generate_nlmsg generate_nlmsg.c ../../lib/libnetlink.c -lmnl
gcc: error: ../lib/libutil.a: No such file or directory
gcc: error: ../lib/libnetlink.a: No such file or directory
make[2]: *** [Makefile:5: generate_nlmsg] Error 1
make[1]: *** [Makefile:40: generate_nlmsg] Error 2

To fix it reset CFLAGS in sub Makefile and remove LDLIBS entirely (as
required -lmnl flag was specified in 5dc2204c ("testsuite: add libmnl").

Fixes: 8804a8c0 ("Makefile: Add check target")

Signed-off-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-22 10:05:18 -07:00
David Ahern 260137e24d iplink: Remove flags argument from iplink_get
iplink_get has 1 caller and the flags arg is 0, so just remove it.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-22 09:45:25 -07:00
David Ahern cd554f2c2f Tree wide: Drop sockaddr_nl arg
No function, filter, or print function uses the sockaddr_nl arg,
so just drop it.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-22 09:43:48 -07:00
David Ahern 9d16a1de1f Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-22 09:43:33 -07:00
Stephen Hemminger 95debca728 util: spelling fix 2018-10-18 13:23:38 -07:00
Stephen Hemminger c60683e246 tipc: spelling fix 2018-10-18 13:23:30 -07:00
Stephen Hemminger 94b0c90152 ip: spelling fixes
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-18 13:23:11 -07:00
Stephen Hemminger f5a398bf17 tc: spelling fixes
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-18 13:22:51 -07:00
Stephen Hemminger 5cc4639471 config: spelling fixes
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-18 13:22:25 -07:00
Stephen Hemminger ab7318f9d4 examples: fix spelling errors
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-18 13:18:56 -07:00
Stephen Hemminger 9d715cf65a doc/man: spelling fixes
Use ispell and codespell to find/fix spelling errors in documentation
and man pages.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-18 13:15:45 -07:00
Phil Sutter 0b9b0d08c2 ip-addrlabel: Fix printing of label value
Passing the return value of RTA_DATA() to rta_getattr_u32() is wrong
since that function will call RTA_DATA() by itself already.

Fixes: a7ad1c8a68 ("ipaddrlabel: add json support")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-16 11:51:05 -07:00
Lorenzo Bianconi c7a3b22961 utils: fix get_rtnl_link_stats_rta stats parsing
iproute2 walks through the list of available tunnels using netlink
protocol in order to get device info instead of reading
them from proc filesystem. However the kernel reports device statistics
using IFLA_INET6_STATS/IFLA_INET6_ICMP6STATS attributes nested in
IFLA_PROTINFO one but iproutes expects these info in
IFLA_STATS64/IFLA_STATS attributes.
The issue can be triggered with the following reproducer:

$ip link add ip6d0 type ip6tnl mode ip6ip6 local 1111::1 remote 2222::1
$ip -6 -d -s tunnel show ip6d0
ip6d0: ipv6/ipv6 remote 2222::1 local 1111::1 encaplimit 4 hoplimit 64
tclass 0x00 flowlabel 0x00000 (flowinfo 0x00000000)
Dump terminated

Fix the issue introducing IFLA_INET6_STATS attribute parsing

Fixes: 3e95393871 ("iptunnel/ip6tunnel: Use netlink to walk through
tunnels list")

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
2018-10-15 09:40:15 -07:00
Lorenzo Bianconi 9e030e77f2 uapi: add snmp header file
Introduce snmp header file. It will be used in subsequent patch in
order to parse device statistics reported in
IFLA_INET6_STATS/IFLA_INET6_ICMP6STATS netlink attributes

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-15 09:37:42 -07:00
Sabrina Dubroca 9b45f8ec13 macsec: fix off-by-one when parsing attributes
I seem to have had a massive brainfart with uses of
parse_rtattr_nested(). The rtattr* array must have MAX+1 elements, and
the call to parse_rtattr_nested must have MAX as its bound. Let's fix
those.

Fixes: b26fc590ce ("ip: add MACsec support")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-15 09:35:48 -07:00
Sabrina Dubroca 45ec4771d4 json: make 0xhex handle u64
Stephen converted macsec's sci to use 0xhex, but 0xhex handles
unsigned int's, not 64 bits ints. Thus, the output of the "ip macsec
show" command is mangled, with half of the SCI replaced with 0s:

# ip macsec show
11: macsec0: [...]
    cipher suite: GCM-AES-128, using ICV length 16
    TXSC: 0000000001560001 on SA 0

# ip -d link show macsec0
11: macsec0@ens3: [...]
    link/ether 52:54:00:12:01:56 brd ff:ff:ff:ff:ff:ff promiscuity 0
    macsec sci 5254001201560001 [...]

where TXSC and sci should match.

Fixes: c0b904de62 ("macsec: support JSON")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-15 09:32:18 -07:00
Phil Sutter 4abb8c723a bridge: fdb: Fix for missing keywords in non-JSON output
While migrating to JSON print library, some keywords were dropped from
standard output by accident. Add them back to unbreak output parsers.

Fixes: c7c1a1ef51 ("bridge: colorize output and use JSON print library")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-15 09:23:55 -07:00
David Ahern 0d30c1f8d4 Merge branch 'master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-13 19:31:37 -07:00
Nikolay Aleksandrov d13d52d0d5 bridge: add support for backup port
This patch adds support for the new backup port option that can be set
on a bridge port. If the port's carrier goes down all of the traffic
gets redirected to the configured backup port. We add the following new
arguments:
$ ip link set dev brport type bridge_slave backup_port brport2
$ ip link set dev brport type bridge_slave nobackup_port

$ bridge link set dev brport backup_port brport2
$ bridge link set dev brport nobackup_port

The man pages are updated respectively.
Also 2 minor style adjustments:
- add missing space to bridge man page's state argument
- use lower starting case for vlan_tunnel in ip-link man page (to be
consistent with the rest)

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-13 19:26:46 -07:00
Roopa Prabhu 4c45b684f9 ipneigh: support for NTF_EXT_LEARNED flag on neigh entries
Adds new option extern_learn to set NTF_EXT_LEARNED flag
on neigh entries.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-13 19:24:45 -07:00
Stephen Hemminger bfb3bf189f libnetlink: use local variable
Now that err->error is in local variable, use it consistently.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-09 09:46:11 -07:00
Vlad Buslov 8c50b728b2 libnetlink: fix use-after-free of message buf
In __rtnl_talk_iov() main loop, err is a pointer to memory in dynamically
allocated 'buf' that is used to store netlink messages. If netlink message
is an error message, buf is deallocated before returning with error code.
However, on return err->error code is checked one more time to generate
return value, after memory which err points to has already been
freed. Save error code in temporary variable and use the variable to
generate return value.

Fixes: c60389e4f9 ("libnetlink: fix leak and using unused memory on error")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-09 09:41:03 -07:00
Jakub Kicinski 650a10e032 tc: jsonify output of q_fifo
Print limits correctly in JSON context.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-08 09:22:22 -07:00
David Ahern f8805f567e Merge branch 'taprio-scheduler' into iproute2-next
Vinicius Costa Gomes  says:

====================

This is the iproute2 side of the taprio v1 series.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-07 10:37:57 -07:00
Vinicius Costa Gomes 579acb4bc5 taprio: Add manpage for tc-taprio(8)
This documents the parameters and provides an example of usage.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-07 10:32:16 -07:00
Vinicius Costa Gomes 0dd1644935 tc: Add support for configuring the taprio scheduler
This traffic scheduler allows traffic classes states (transmission
allowed/not allowed, in the simplest case) to be scheduled, according
to a pre-generated time sequence. This is the basis of the IEEE
802.1Qbv specification.

Example configuration:

tc qdisc replace dev enp3s0 parent root handle 100 taprio \
          num_tc 3 \
	  map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
	  queues 1@0 1@1 2@2 \
	  base-time 1528743495910289987 \
	  sched-entry S 01 300000 \
	  sched-entry S 02 300000 \
	  sched-entry S 04 300000 \
	  clockid CLOCK_TAI

The configuration format is similar to mqprio. The main difference is
the presence of a schedule, built by multiple "sched-entry"
definitions, each entry has the following format:

     sched-entry <CMD> <GATE MASK> <INTERVAL>

The only supported <CMD> is "S", which means "SetGateStates",
following the IEEE 802.1Qbv-2015 definition (Table 8-6). <GATE MASK>
is a bitmask where each bit is a associated with a traffic class, so
bit 0 (the least significant bit) being "on" means that traffic class
0 is "active" for that schedule entry. <INTERVAL> is a time duration
in nanoseconds that specifies for how long that state defined by <CMD>
and <GATE MASK> should be held before moving to the next entry.

This schedule is circular, that is, after the last entry is executed
it starts from the first one, indefinitely.

The other parameters can be defined as follows:

 - base-time: specifies the instant when the schedule starts, if
  'base-time' is a time in the past, the schedule will start at

 	      base-time + (N * cycle-time)

   where N is the smallest integer so the resulting time is greater
   than "now", and "cycle-time" is the sum of all the intervals of the
   entries in the schedule;

 - clockid: specifies the reference clock to be used;

The parameters should be similar to what the IEEE 802.1Q family of
specification defines.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-07 10:32:08 -07:00
Jesus Sanchez-Palencia d791f3ad86 libnetlink: Add helper for getting a __s32 from netlink msgs
This function retrieves a signed 32-bit integer from a netlink message
and returns it.

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-07 10:31:35 -07:00
Vinicius Costa Gomes de63cd9044 include: Add helper to retrieve a __s64 from a netlink msg
This allows signed 64-bit integers to be retrieved from a netlink
message.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-07 10:30:30 -07:00
Vinicius Costa Gomes a066bac8a2 utils: Implement get_s64()
Add this helper to read signed 64-bit integers from a string.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-07 10:30:28 -07:00
David Ahern 720a44a751 Update kernel headers
update kernel headers to commit:
72438f8cef4e ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-07 10:29:25 -07:00
Vlad Buslov f6b498f957 tc: flower: expose hardware offload count
Recently flower classifier was updated to expose count of devices that
filter is offloaded to. Add support to print this counter as 'in_hw_count'.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
2018-10-07 10:14:09 -07:00
Hangbin Liu 952a7a1931 vxlan: show correct ttl inherit info
We should only show ttl inherit when IFLA_VXLAN_TTL_INHERIT supplied.
Otherwise show the ttl number, or auto when it is 0.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-04 09:20:45 -07:00
David Ahern 169e667c93 Merge branch 'hdrs-for-dump-req' into iproute2-next
David Ahern says:

====================

iproute2 currently uses ifinfomsg as the header for all dumps using the
wilddump headers. This is wrong as each message type actually has its own
header type. While the kernel has traditionally let it go as it for the
most part only uses the family entry, the use of kernel side filters is
increasing to alter what is returned on a request. The kernel side filters
really need to use the proper header type.

To that end, fix iproute2 to use the proper header struct for the GET type.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-02 18:41:52 -07:00
David Ahern 56eeeda978 libnetlink: Rename rtnl_wilddump_stats_req_filter to rtnl_statsdump_req_filter
rtnl_wilddump_stats_req_filter only takes RTM_GETSTATS as the type argument
so rename to rtnl_statsdump_req_filter for consistency with other request
functions and hardcode the type argument.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-02 18:39:36 -07:00
David Ahern 31ae2912f7 libnetlink: Rename rtnl_wilddump_* to rtnl_linkdump_*
Rename rtnl_wilddump_req_filter to rtnl_linkdump_req_filter,
rtnl_wilddump_request to rtnl_linkdump_req and
rtnl_wilddump_req_filter_fn to rtnl_linkdump_req_filter_fn.

In all cases drop the type argument which at this point is only
RTM_GETLINK and hardcode in the functions.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-02 18:39:08 -07:00
David Ahern efb0b383d9 libnetlink: Convert GETNSID dumps to use rtnl_nsiddump_req
Add rtnl_nsiddump_req for namespace id dumps using the proper rtgenmsg
as the header. Convert existing RTM_GETNSID dumps to use it.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-02 18:39:04 -07:00
David Ahern ff41db8a75 libnetlink: Convert GETNEIGHTBL dumps to use rtnl_neightbldump_req
Add rtnl_neightbldump_req for neighbor table dumps using the proper ndtmsg
as the header. Convert existing RTM_GETNEIGHTBL dumps to use it.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-02 18:39:02 -07:00
David Ahern 9e0ab19c4d libnetlink: Convert GETNEIGH dumps to use rtnl_neighdump_req
Add rtnl_neighdump_req for neighbor dumps using the proper ndmsg
as the header. Convert existing rtnl_wilddump_request for RTM_GETNEIGH
to use it.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-02 18:38:59 -07:00
David Ahern b05d9a3d58 libnetlink: Convert GETRULE dumps to use rtnl_ruledump_req
Add rtnl_ruledump_req for fib fule dumps using the proper fib_rule_hdr
as the header. Convert existing RTM_GETRULE dumps to use it.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-02 18:38:56 -07:00
David Ahern ddee16bc96 libnetlink: Convert GETNETCONF dumps to use rtnl_netconfdump_req
Add rtnl_netconfdump_req for netconf dumps using the proper netconfmsg
as the header. Convert existing RTM_GETNETCONF dumps to use it.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-02 18:38:34 -07:00
David Ahern 9dbe6df411 libnetlink: Convert GETMDB dumps to use rtnl_mdbdump_req
Add rtnl_mdbdump_req for mdb dumps using the proper br_port_msg as
the header. Convert existing RTM_GETMDB dumps to use it.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-02 18:38:31 -07:00
David Ahern 393600231a libnetlink: Convert GETADDRLABEL dumps to use rtnl_addrlbldump_req
Add rtnl_addrlbldump_req for address label dumps using the proper
ifaddrlblmsg as the header. Convert existing RTM_GETADDRALBEL dumps
to use it.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-02 18:38:29 -07:00
David Ahern bfb27dfaac libnetlink: Convert GETROUTE dumps to use rtnl_routedump_req
Add rtnl_routedump_req for route dumps using the proper rtmsg
as the header. Convert existing RTM_GETROUTE dumps to use it.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-02 18:38:27 -07:00
David Ahern 46917d0895 libnetlink: Convert GETADDR dumps to use rtnl_addrdump_req
Add rtnl_addrdump_req for address dumps using the proper ifaddrmsg
as the header. Convert existing RTM_GETADDR dumps to use it.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-02 18:38:21 -07:00
Eelco Chaudron 5ac138324e tc_util: Add support for showing TCA_STATS_BASIC_HW statistics
Add support for showing hardware specific counters to easy
troubleshooting hardware offload.

$ tc -s filter show dev enp3s0np0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  dst_ip 2.0.0.0
  src_ip 1.0.0.0
  ip_flags nofrag
  in_hw
        action order 1: mirred (Egress Redirect to device eth1) stolen
        index 1 ref 1 bind 1 installed 0 sec used 0 sec
        Action statistics:
        Sent 534884742 bytes 8915697 pkt (dropped 0, overlimits 0 requeues 0)
        Sent software 187542 bytes 4077 pkt
        Sent hardware 534697200 bytes 8911620 pkt
        backlog 0b 0p requeues 0
        cookie 89173e6a44447001becfd486bda17e29

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-02 14:45:33 -07:00
Pieter Jansen van Vuuren 56155d4df8 tc: f_flower: add geneve option match support to flower
Allow matching on options in Geneve tunnel headers.

The options can be described in the form
CLASS:TYPE:DATA/CLASS_MASK:TYPE_MASK:DATA_MASK, where CLASS is
represented as a 16bit hexadecimal value, TYPE as an 8bit
hexadecimal value and DATA as a variable length hexadecimal value.

e.g.
 # ip link add name geneve0 type geneve dstport 0 external
 # tc qdisc add dev geneve0 ingress
 # tc filter add dev geneve0 protocol ip parent ffff: \
     flower \
       enc_src_ip 10.0.99.192 \
       enc_dst_ip 10.0.99.193 \
       enc_key_id 11 \
       geneve_opts 0102:80:1122334421314151/ffff:ff:ffffffffffffffff \
       ip_proto udp \
       action mirred egress redirect dev eth1

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-02 14:39:55 -07:00
Roopa Prabhu 51eb02254b ipneigh: update man page and help for router
While at it also add missing text for proxy in the man page.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-01 17:36:35 -07:00
Nikolay Aleksandrov c3ded6e4a0 bridge: fdb: add support for sticky flag
Add support for the new sticky flag that can be set on fdbs and update the
man page.

CC: David Ahern <dsahern@gmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-09-28 10:52:22 -07:00
David Ahern d9c0be4e97 Update kernel headers
Update kernel headers to commit
a804e5e218754 ("selftests: forwarding: test for bridge sticky flag")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-09-28 10:51:15 -07:00
Roopa Prabhu c2cd14acc7 ipneigh: support setting of NTF_ROUTER on neigh entries
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-09-28 09:53:08 -07:00
David Ahern 7b2e200679 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-09-28 09:52:41 -07:00
Stephen Hemminger b45e300024 libnetlink: don't return error on success
Change to error handling broke normal code.

Fixes: c60389e4f9 ("libnetlink: fix leak and using unused memory on error")
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-25 10:08:48 +02:00
Stephen Hemminger 5dc2204c01 testsuite: add libmnl
Supporting external ack requires libmnl now.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-25 09:59:37 +02:00
Petr Vorel 8804a8c0d3 Makefile: Add check target
Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-25 09:56:40 +02:00
Lorenzo Bianconi c1360e3b48 iplink_vxlan: take into account preferred_family creating vxlan device
Take into account the configured preferred_family if neither saddr or
daddr are provided since otherwise vxlan kernel module will use IPv4 as
default remote inet family neglecting the one provided by userspace.
This behaviour was originally in commit 97d564b90c ("vxlan: use
preferred address family when neither group or remote is specified").
The issue can be triggered with the following reproducer:

$ip -6 link add vxlan1 type vxlan id 42 dev enp0s2 \
     proxy nolearning l2miss l3miss
$bridge fdb add 46:47:1f:a7:1c:25 dev vxlan1 dst 2000::2
RTNETLINK answers: Address family not supported by protocol

Fixes: 1e9b8072de ("iplink_vxlan: Get rid of inet_get_addr()")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-25 09:52:56 +02:00
Hangbin Liu fa1e658e84 iplink: fix incorrect any address handling for ip tunnels
After commit d42c7891d2 ("utils: Do not reset family for default, any,
all addresses"), when call get_addr() for any/all addresses, we will set
addr->flags to ADDRTYPE_INET_UNSPEC if family is AF_INET/AF_INET6, which
makes is_addrtype_inet() checking passed and assigns incorrect address
to kernel. The ip link cmd will return error like:

]# ip link add ipip1 type ipip local any remote 1.1.1.1
RTNETLINK answers: Numerical result out of range

Fix it by using is_addrtype_inet_not_unspec() to avoid unspec addresses.

geneve, vxlan are not affected as they use AF_UNSPEC family when call
get_addr()

Reported-by: Jianlin Shi <jishi@redhat.com>
Fixes: d42c7891d2 ("utils: Do not reset family for default, any, all addresses")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-21 11:28:33 -07:00
Stephen Hemminger 11152f0a0d Makefile: add help target
Add help target to Makefile

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-21 09:15:26 -07:00
Petr Vorel 133c1a6c87 testsuite: Warn about empty $(IPVERS)
alltests target requires having symlink created by configure target
(default target). Without that there is no test being run.

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-21 08:59:52 -07:00
Petr Vorel 3537633dcf testsuite: Generate generate_nlmsg when needed
Commit 886f2c43 added generate_nlmsg.c. Running alltests
target, which uses the binary required to run 'make -C tools' before.

Fixes: 886f2c43 testsuite: Generate nlmsg blob at runtime

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-21 08:59:52 -07:00
Petr Vorel f15836faec testsuite: Fix missing generate_nlmsg
Commit ad23e152 caused generate_nlmsg to be always missing:

$ make alltests
make: ./tools/generate_nlmsg: Command not found

Create testclean: to remove only results directory.

Fixes: ad23e152 testsuite: remove all temp files and implement make clean

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-21 08:59:52 -07:00
Hangbin Liu 88272775e2 iplink: add ipvtap support
IPVLAN and IPVTAP are using the same functions and parameters. So we can
just add a new link_util with id ipvtap. Others are the same.

Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-09-20 17:53:56 -07:00
David Ahern 34212c73b7 Merge branch 'iproute2-master' into iproute2-next
Conflicts:
	ip/iproute_lwtunnel.c

In addition to merge conflict between bd59e5b151 and 94a8722f2f,
updated the code added by the latter commit based on the change of the
former (ie., added ret = to the new rta_addattr_l).

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-09-20 17:53:27 -07:00
Leon Romanovsky d090fbf33b rdma: Fix representation of PortInfo CapabilityMask
The port capability mask represents IBTA PortInfo specification,
but as it is written in description of kernel commit 2f944c0fbf58
("RDMA: Fix storage of PortInfo CapabilityMask in the kernel"),
the bit 26 was mistakenly overwritten.

The rdmatool followed it too and mislead users by presenting wrong
value. Since it never showed proper value, we update the whole
port_cap_mask to comply with IBTA and show real HW values.

Fixes: da990ab40a ("rdma: Add link object")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-17 08:59:13 -07:00
Stephen Hemminger c60389e4f9 libnetlink: fix leak and using unused memory on error
If an error happens in multi-segment message (tc only)
then report the error and stop processing further responses.
This also fixes refering to the buffer after free.

The sequence check is not necessary here because the
response message has already been validated to be in
the window of the sequence number of the iov.

Reported-by: Mahesh Bandewar <mahesh@bandewar.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Mahesh Bandewar <maheshb@google.com>
2018-09-17 08:58:21 -07:00
Toke Høiland-Jørgensen 2153e01f36 q_cake: Also print nonat, nowash and no-ack-filter keywords
Similar to the previous patch for no-split-gso, the negative keywords for
'nat', 'wash' and 'ack-filter' were not printed either. Add those well.

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-14 11:32:46 -07:00
Hangbin Liu 92bba4ed40 bridge/mdb: fix missing new line when show bridge mdb
The bridge mdb show is broken on current iproute2. e.g.
]# bridge mdb show
34: br0  veth0_br  224.1.1.2  temp 34: br0  veth0_br  224.1.1.1  temp

After fix:
]# bridge mdb show
34: br0  veth0_br  224.1.1.2  temp
34: br0  veth0_br  224.1.1.1  temp

v2: Use json print lib as Stephen suggested.
v3: No need to use is_json_context() as print_string() could handle both cases.
v4: use new function print_nl() to print new line in non-json mode.

Reported-by: Ying Xu <yinxu@redhat.com>
Fixes: c7c1a1ef51 ("bridge: colorize output and use JSON print library")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-13 16:02:33 -07:00
Toke Høiland-Jørgensen b914fe5f1c q_cake: Add printing of no-split-gso option
When the GSO splitting was turned into dual split-gso/no-split-gso options,
the printing of the latter was left out. Add that, so output is consistent
with the options passed.

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-12 12:59:38 -07:00
Stephen Hemminger b85076cd74 lib: introduce print_nl
Common pattern in iproute commands is to print a line seperator
in non-json mode. Make that a simple function.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-11 08:29:33 -07:00
Phil Sutter bd59e5b151 ip-route: Fix segfault with many nexthops
It was possible to crash ip-route by adding an IPv6 route with 37
nexthop statements. A simple reproducer is:

| for i in `seq 37`; do
| 	nhs="nexthop via 1111::$i "$nhs
| done
| ip -6 route add 3333::/64 $nhs

The related code was broken in multiple ways:

* parse_one_nh() assumed that rta points to 4kB of storage but caller
  provided just 1kB. Fixed by passing 'len' parameter with the correct
  value.

* Error checking of rta_addattr*() calls in parse_one_nh() and called
  functions was completely absent, so with above fix in place output
  flood would occur due to parser looping forever.

While being at it, increase message buffer sizes to 4k. This allows for
at most 144 nexthops.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-10 12:14:50 -07:00
Caleb Raitto 40c2916fda tc/mqprio: Print extra info on invalid args.
Print the name of the argument that wasn't understood.

Signed-off-by: Caleb Raitto <caraitto@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-10 12:14:00 -07:00
Stephen Hemminger ae775666cf genl: remove unnecessary extern
extern not necessary on function prototype.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-10 11:53:07 -07:00
Stephen Hemminger ad618b7984 tc/fifo: remove unnecessary prototype
The prototype for prio_print_opt is already in tc_util.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-10 11:50:22 -07:00
Stephen Hemminger 0f36267485 bridge: fix vlan show formatting
The output of vlan show was broken previous change to use json_print.
Clean the code up and return to original format.

Note: the JSON syntax has changed to make the bridge vlan
show more like other outputs (e.g. ip -j li show).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-10 11:48:06 -07:00
Stephen Hemminger 2ed82667b8 bridge: use print_json for some outputs
Rather than using is_json_context(), use the print_string functions
which handle both cases.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-10 11:47:11 -07:00
Stephen Hemminger f5fc738736 bridge: minor change to mdb print
Get port ifname once rather than on both sides of if(is_json_context).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-10 11:47:11 -07:00
Caleb Raitto 781ee3270d man: Change numtc to num_tc
The argument parser only accepts num_tc:

https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/tree/tc/q_mqprio.c#n55

Signed-off-by: Caleb Raitto <caraitto@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-10 11:47:11 -07:00
Stephen Hemminger 27886a1241 uapi: update ib_verbs
Merge current uapi from 4.19-rc1

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-31 15:03:49 -07:00
David Ahern b555ff737a Merge branch 'netem-slot-param' into iproute2-next
Yousuk Seung  says:

====================

This series adds support for the new "slot" netem parameter for
slotting. Slotting is an approximation of shared media that gather up
packets within a varying delay window before delivering them nearly at
once.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-30 11:08:43 -07:00
Yousuk Seung 588dd51e2c q_netem: slotting with non-uniform distribution
Extend slotting with support for non-uniform distributions. This is
similar to netem's non-uniform distribution delay feature.

Syntax:
   slot distribution DISTRIBUTION DELAY JITTER [packets MAX_PACKETS] \
      [bytes MAX_BYTES]

The syntax and use of the distribution table is the same as in the
non-uniform distribution delay feature. A file DISTRIBUTION must be
present in TC_LIB_DIR (e.g. /usr/lib/tc) containing numbers scaled by
NETEM_DIST_SCALE. A random value x is selected from the table and it
takes DELAY + ( x * JITTER ) as delay. Correlation between values is not
supported.

Examples:
  Normal distribution delay with mean = 800us and stdev = 100us.
  > tc qdisc add dev eth0 root netem slot distribution normal \
    800us 100us

  Optionally set the max slot size in bytes and/or packets.
  > tc qdisc add dev eth0 root netem slot distribution normal \
    800us 100us bytes 64k packets 42

Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-30 11:08:19 -07:00
Dave Taht b6268fbd58 q_netem: support delivering packets in delayed time slots
Slotting is a crude approximation of the behaviors of shared media such
as cable, wifi, and LTE, which gather up a bunch of packets within a
varying delay window and deliver them, relative to that, nearly all at
once.

It works within the existing loss, duplication, jitter and delay
parameters of netem. Some amount of inherent latency must be specified,
regardless.

The new "slot" parameter specifies a minimum and maximum delay between
transmission attempts.

The "bytes" and "packets" parameters can be used to limit the amount of
information transferred per slot.

Examples of use:

tc qdisc add dev eth0 root netem delay 200us \
        slot 800us 10ms bytes 64k packets 42

A more correct example, using stacked netem instances and a packet limit
to emulate a tail drop wifi queue with slots and variable packet
delivery, with a 200Mbit isochronous underlying rate, and 20ms path
delay:

tc qdisc add dev eth0 root handle 1: netem delay 20ms rate 200mbit \
         limit 10000
tc qdisc add dev eth0 parent 1:1 handle 10:1 netem delay 200us \
         slot 800us 10ms bytes 64k packets 42 limit 512

Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-30 11:07:46 -07:00
Dave Taht abf70ef494 tc: support conversions to or from 64 bit nanosecond-based time
Using a 32 bit field to represent time in nanoseconds results in a
maximum value of about 4.3 seconds, which is well below many observed
delays in WiFi and LTE, and barely in the ballpark for a trip past the
Earth's moon, Luna.

Using 64 bit time fields in nanoseconds allows us to simulate
network diameters of several hundred light-years. However, only
conversions to and from ns, us, ms, and seconds are provided.

The iproute2 64 bit api uses signed values for time. Being able to
represent positive or negative time allows us to calculate +/- deltas
between, for example, the CLOCK_TAI and CLOCK_REALTIME clocks.

Time related utility functions in tc_util.c are moved to lib/utils.c.

Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-30 11:04:38 -07:00
David Ahern c4e0ea8e9b Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-30 11:04:05 -07:00
Florent Fourcot 2bfe28710e tc/htb: remove unused variable
Since introduction of htb module, this variable has never been used.

Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-30 08:00:45 -07:00
Mahesh Bandewar 5d5586b058 iproute: make clang happy
These are primarily fixes for "string is not string literal" warnings
/ errors (with -Werror -Wformat-nonliteral). This should be a no-op
change. I had to replace couple of print helper functions with the
code they call as it was becoming harder to eliminate these warnings,
however these helpers were used only at couple of places, so no
major change as such.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-30 07:58:09 -07:00
Mahesh Bandewar a5aaca9be2 ipmaddr: use preferred_family when given
When creating socket() AF_INET is used irrespective of the family
that is given at the command-line (with -4, -6, or -0). This change
will open the socket with the preferred family.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-30 07:57:11 -07:00
Stephen Hemminger 0ebb420929 uapi: update bpf headers
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-30 07:55:49 -07:00
Cong Wang 0bab7630e3 ss: add UNIX_DIAG_VFS and UNIX_DIAG_ICONS for unix sockets
UNIX_DIAG_VFS and UNIX_DIAG_ICONS are never used by ss,
make them available in ss -e output.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-30 07:53:39 -07:00
Stefan Bader 1a75322c5a iprule: Fix destination prefix output
When adding support for JSON output the new code for printing
the destination prefix adds a stray blank character before
the bitmask. This causes some user-space parsing to fail.

Current output:
  ...: from x.x.x.x/l to y.y.y.y /l
Previous output:
  ...: from x.x.x.x/l to y.y.y.y/l

Fixes: 0dd4ccc5 "iprule: add json support"
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-30 07:51:00 -07:00
Toke Høiland-Jørgensen 6526e604cf q_cake: Add description of the tc filter override mechanism to man page
Since CAKE now has three different settings that can be overridden by tc
filters (priority and host and flow hashes), documenting how they work is
probably a good idea.

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-24 23:15:03 -07:00
Luca Boccassi 88ecd4873b testsuite: run dmesg with sudo
Some distributions like Debian nowadays restrict the dmesg command to
root-only. Run it with sudo in the testsuite.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-24 23:14:09 -07:00
Luca Boccassi 012895ce4e testsuite: let make compile build the netlink helper
The generate_nlmsg binary is required but make -C testsuite compile
does not build it. Add the necessary includes and C*FLAGS to the tools
Makefile and have the compile target build it.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-24 23:14:09 -07:00
Luca Boccassi ad23e152b8 testsuite: remove all temp files and implement make clean
Some generated test files were not removed, including one executable in
the testsuite/tools directory.
Ensure make clean from the top level directory works for the testsuite
subdirs too, and that all the files are removed.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-24 23:14:09 -07:00
Stefan Bader 1019364964 testsuite: Handle large number of kernel options
Once there are more than a certain number of kernel config options
set (this happened for us with kernel 4.17), the method of passing
those as command line arguments exceeds the maximum number of
arguments the shell supports. This causes the whole testsuite to
fail.
Instead, create a temporary file and modify its contents so that
the config option variables are exported. Then this file can be
sourced in before running the tests.

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-24 23:13:26 -07:00
Stephen Hemminger a8e9f4ae14 tc: drop extern from function prototypes
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-20 16:01:31 -07:00
Stephen Hemminger 51070e8f18 genl: drop extern from function prototypes
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-20 16:01:01 -07:00
Stephen Hemminger cf7fe23859 bridge: drop extern from function prototypes
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-20 16:00:38 -07:00
Stephen Hemminger 84fb55ede1 ip: drop extern from function prototype
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-20 15:58:50 -07:00
Phil Sutter 515a766cd2 lib: Make check_enable_color() return boolean
As suggested, turn return code into true/false although it's not checked
anywhere yet.

Fixes: 4d82962ccc ("Merge common code for conditionally colored output")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-20 08:55:16 -07:00
Phil Sutter ff1ab8edf8 Make colored output configurable
Allow for -color={never,auto,always} to have colored output disabled,
enabled only if stdout is a terminal or enabled regardless of stdout
state.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-20 08:54:06 -07:00
Shmulik Ladkani 94a8722f2f iproute_lwtunnel: allow specifying 'src' for 'encap ip' / 'encap ip6'
This allows the user to specify the LWTUNNEL_IP_SRC/LWTUNNEL_IP6_SRC
when setting an lwtunnel encapsulation route.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-17 13:17:03 -07:00
Phil Sutter 644b9c238c ip: Add missing -M flag to help text
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-17 09:19:54 -07:00
Stephen Hemminger bb616be341 Merge branch 'ipmonitor' 2018-08-16 10:30:10 -07:00
Stephen Hemminger 3e5f8e0ab6 genl: code cleanup
Run through checkpatch and cleanup line wraps etc.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-16 10:28:13 -07:00
Phil Sutter d559db725c man: ss.8: Describe --events option
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-16 10:25:02 -07:00
Phil Sutter 6417c06b59 rtmon: List options in help text
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-16 10:25:02 -07:00
Phil Sutter 71170d854e man: rtacct.8: Fix nstat options
Add missing --pretty and --json options, correct --zero to --zeros and
correct the mess around --scan/--interval including broken man page
formatting.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-16 10:25:02 -07:00
Phil Sutter a486d25b9c man: ifstat.8: Document --json and --pretty options
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-16 10:25:02 -07:00
Phil Sutter d94974bc91 genl: Fix help text
The '| help' part was misleading: In fact, 'genl help' does not work but
'genl <OBJECT> help' does. Fix the help text to make that clear.

In addition to that, list -Version and -help flags as well.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-16 10:25:02 -07:00
Phil Sutter 29b1430ba9 man: devlink.8: Document -verbose option
This was the only bit missing in comparison to devlink help text.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-16 10:25:01 -07:00
Phil Sutter bb75b9bf2f devlink: trivial: Make help text consistent
Typically the part of the flag in brackets completes the leading part
instead of repeating it.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-16 10:25:01 -07:00
Phil Sutter f9ff0cd69c bridge: trivial: Make help text consistent
Change curly braces into brackets for -json option in help text to be
consistent with the rest.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-16 10:25:01 -07:00
Phil Sutter 05758f5c7b man: bridge.8: Document -oneline option
Copied the description from ip.8.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-16 10:25:01 -07:00
Stephen Hemminger e09d21f675 ipmonitor: decode DELNETCONF message
When device is deleted DELNETCONF is sent, but ipmonitor
was unable to decode it.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-16 09:50:34 -07:00
Stephen Hemminger 40443f49b3 ip: convert monitor to switch
The decoding of netlink message types is natural for a C
switch statement.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-16 09:49:00 -07:00
David Ahern db71144c0c Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-15 14:32:10 -07:00
Phil Sutter d67eb4fbf8 testsuite: Add a first ss test validating ssfilter
This tests a few ssfilter expressions by selecting sockets from a TCP
dump file. The dump was created using the following command:

| ss -ntaD testsuite/tests/ss/ss1.dump

It is fed into ss via TCPDIAG_FILE environment variable.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-15 14:25:18 -07:00
Phil Sutter 744bd07662 testsuite: Prepare for ss tests
This merges the shared bits from ts_tc() and ts_ip() into a common
function for being wrapped by the first ones and adds a third ts_ss()
for testing ss commands.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-15 14:25:18 -07:00
Phil Sutter 38d209ecf2 ss: Review ssfilter
The original problem was ssfilter rejecting single expressions if
enclosed in braces, such as:

| sport = 22 or ( dport = 22 )

This is fixed by allowing 'expr' to be an 'exprlist' enclosed in braces.
The no longer required recursion in 'exprlist' being an 'exprlist'
enclosed in braces is dropped.

In addition to that, a few other things are changed:

* Remove pointless 'null' prefix in 'appled' before 'exprlist'.
* For simple equals matches, '=' operator was required for ports but not
  allowed for hosts. Make this consistent by making '=' operator
  optional in both cases.

Reported-by: Samuel Mannehed <samuel@cendio.se>
Fixes: b2038cc0b2 ("ssfilter: Eliminate shift/reduce conflicts")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-15 14:25:18 -07:00
Phil Sutter 8a03a2f36f man: ip-route: Clarify referenced versions are Linux ones
Versioning scheme of Linux and iproute2 is similar, therefore the
referenced kernel versions are likely to confuse readers. Clarify this
by prefixing each kernel version by 'Linux' prefix.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-15 14:23:48 -07:00
Stephen Hemminger a84639fcb2 Merge branch 'master' of git://git.kernel.org/pub/scm/network/iproute2/iproute2-next 2018-08-15 14:21:45 -07:00
David Ahern ea9f9b910e Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-15 09:56:30 -07:00
Phil Sutter 4d82962ccc Merge common code for conditionally colored output
Instead of calling enable_color() conditionally with identical check in
three places, introduce check_enable_color() which does it in one place.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-15 09:55:27 -07:00
Phil Sutter 5332148deb bridge: Fix check for colored output
There is no point in calling enable_color() conditionally if it was
already called for each time '-color' flag was parsed. Align the
algorithm with that in ip and tc by actually making use of 'color'
variable.

Fixes: e9625d6aea ("Merge branch 'iproute2-master' into iproute2-next")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-15 09:54:51 -07:00
Phil Sutter 0d0e0e0bef tc: Fix typo in check for colored output
The check used binary instead of boolean AND, which means colored output
was enabled only if the number of specified '-color' flags was odd.

Fixes: 2d165c0811 ("tc: implement color output")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-15 09:54:32 -07:00
Nishanth Devarajan 141b55f854 Add SKB Priority qdisc support in tc(8)
sch_skbprio is a qdisc that prioritizes packets according to their skb->priority
field. Under congestion, it drops already-enqueued lower priority packets to
make space available for higher priority packets. Skbprio was conceived as a
solution for denial-of-service defenses that need to route packets with
different priorities as a means to overcome DoS attacks.

Signed-off-by: Nishanth Devarajan <ndev2021@gmail.com>
Reviewed-by: Michel Machado <michel@digirati.com.br>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-14 07:06:43 -07:00
Stephen Hemminger fa6b90904a Merge branch 'master' of git://git.kernel.org/pub/scm/network/iproute2/iproute2-next 2018-08-13 12:17:53 -07:00
Stephen Hemminger 31ad498a01 v4.18.0 2018-08-13 12:11:32 -07:00
Tobias Klauser b3b7c2a71b tc: bpf: update list of archs with eBPF support in manpage
Update the list of architectures supporting eBPF JIT as of Linux 4.18.
Also mention the Linux version where support for a particular
architecture was introduced. Finally, reformat the list of architectures
as a bullet list in order to make it more readable.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-13 12:09:11 -07:00
David Ahern c044be6b34 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-13 07:47:21 -07:00
Toke Høiland-Jørgensen 23a67b008a sch_cake: Make gso-splitting configurable
This patch makes sch_cake's gso/gro splitting configurable
from userspace.

To disable breaking apart superpackets in sch_cake:

tc qdisc replace dev whatever root cake no-split-gso

to enable:

tc qdisc replace dev whatever root cake split-gso

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-13 07:41:44 -07:00
Stephen Hemminger d97e266e5d ip: show min and max mtu
Add min/max MTU to the link details

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-12 14:24:31 -07:00
David Ahern 74eb09ad56 Update kernel headers
Update kernel headers to commit
78cbac647e61 (Merge branch 'ip-faster-in-order-IP-fragments'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-12 14:23:31 -07:00
Guillaume Nault bbc1cd0d27 l2tp: drop lns_mode
This option is never set.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-12 14:05:11 -07:00
Guillaume Nault 6022f4dd38 l2tp: drop mtu
This option can't be set by user and is never printed.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-12 14:05:11 -07:00
Guillaume Nault 99d6ff2101 l2tp: drop data_seq
This option can't be set by user and is never printed. Furthermore,
L2TP_ATTR_DATA_SEQ has always been a noop in Linux.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-12 14:05:11 -07:00
Keara Leibovitz e8bd395508 tc: fix bugs for tcp_flags and ip_attr hex output
Fix hex output for both the ip_attr and tcp_flags print functions.

Sample usage:

$ $TC qdisc add dev lo ingress
$ $TC filter add dev lo parent ffff: prio 3 proto ip flower ip_tos 0x8/32
$ $TC fitler add dev lo parent ffff: prio 5 proto ip flower ip_proto tcp \
	tcp_flags 0x909/f00

$ $TC filter show dev lo parent ffff:

filter protocol ip pref 3 flower chain 0
filter protocol ip pref 3 flower chain 0 handle 0x1
  eth_type ipv4
  ip_tos 0x8/32
  not_in_hw
filter protocol ip pref 5 flower chain 0
filter protocol ip pref 5 flower chain 0 handle 0x1
  eth_type ipv4
  ip_proto tcp
  tcp_flags 0x909/f00
  not_in_hw

$ $TC -j filter show dev lo parent ffff:

[{
    "protocol":"ip",
    "pref":3,
    "kind":"flower",
    "chain":0
},{
    "protocol":"ip",
    "pref":3,
    "kind":"flower",
    "chain":0,
    "options": {
	"handle":1,
	"keys": {
	    "eth_type":"ipv4",
	    "ip_tos":"0x8/32"
    },
    "not_in_hw":true
    }
},{
    "protocol":"ip",
    "pref":5,
    "kind":"flower",
    "chain":0
},{
    "protocol":"ip",
    "pref":5,
    "kind":"flower",
    "chain":0,
    "options": {
	"handle":1,
	"keys": {
	    "eth_type":"ipv4",
	    "ip_proto":"tcp",
	    "tcp_flags":"0x909/f00"
	},
	"not_in_hw":true
    }
}]

Signed-off-by: Keara Leibovitz <kleib@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-12 14:04:00 -07:00
Matteo Croce d56c7dde9d ip link: don't stop batch processing
When 'ip link show dev DEVICE' is processed in a batch mode, ip exits
and stop processing further commands.
This because ipaddr_list_flush_or_save() calls exit() to avoid printing
the link information twice.
Replace the exit with a classic goto out instruction.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-08 09:24:47 -07:00
Stephen Hemminger d66fdfda71 tc: flush after each command in batch mode
After each command flush output.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-08 09:23:48 -07:00
Lubomir Rintel 3655f788d3 lib/namespace: avoid double-mounting a /sys
This partly reverts 8f0807023d, bringing
back the umount(/sys) attempt.

In a LXC container we're unable to umount the sysfs instance, nor mount
a read-write one. We still are able to create a new read-only instance.

Nevertheless, it still makes sense to attempt the umount() even though
the sysfs is mounted read-only. Otherwise we may end up attempting to
mount a sysfs with the same flags as is already mounted, resulting in
an EBUSY error (meaning "Already mounted").

Perhaps this is not a very likely scenario in real world, but we hit
it in NetworkManager test suite and makes netns_switch() somewhat more
robust. It also fixes the case, when /sys wasn't mounted at all.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-27 13:40:12 -07:00
Stephen Hemminger e5faf729cb ip: show min and max mtu
Add min/max MTU to the link details

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-27 13:30:19 -07:00
Stephen Hemminger c8f7a754ed ip/address: fix bracketing in help message
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-27 13:26:21 -07:00
David Ahern a0bc57e1ef Merge branch 'iproute2-master' into iproute2-next
Conflicts:
	include/uapi/linux/bpf.h

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-25 10:08:04 -07:00
Jiri Pirko afcd06991d tc: introduce support for chain templates
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-25 10:00:28 -07:00
Eran Ben Elisha 8c7acf3a7a ip: Add violation counters to VF statisctics
Extend VFs statistics by receive and transmit violation counters.

Example: "ip -s link show dev enp5s0f0"

6: enp5s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 24:8a:07:a5:28:f0 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    0          0        0       0       0       2
    TX: bytes  packets  errors  dropped carrier collsns
    1406       17       0       0       0       0
    vf 0 MAC 00:00:ca:fe:ca:fe, vlan 5, spoof checking off, link-state auto, trust off, query_rss off
    RX: bytes  packets  mcast   bcast   dropped
    1666       29       14         32      0
    TX: bytes  packets   dropped
    2880       44       2412

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-25 09:59:36 -07:00
David Ahern 8b099da560 Update kernel headers
Update kernel headers to commit
aea5f654e6b7 ("net/sched: add skbprio scheduler")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-25 09:58:00 -07:00
Stephen Hemminger 7327f78565 rdam: uapi update ib_user_verbs.h
Merge in latest santized kernel header.
Put sanitized version of current ib_user_verbs.h.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-23 13:49:20 -07:00
Stephen Hemminger 7c16a8da6b uapi: fix tcp.h repair
Upstream define for TCP_REPAIR changed.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-23 13:47:22 -07:00
David Ahern 7f57c8b726 devlink: CTRL_ATTR_FAMILY_ID is a u16
CTRL_ATTR_FAMILY_ID is a u16, not a u32. Update devlink accordingly.

Fixes: a3c4b484a1 ("add devlink tool")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-23 13:44:36 -07:00
David Ahern 5f9c8c6a16 Merge branch 'tc-tunnels-tos-ttl' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-20 08:59:43 -07:00
Or Gerlitz 761ec9e29f tc/flower: Add match on encapsulating tos/ttl
Add matching on tos/ttl of the IP tunnel headers.

For example, here's decap rule that matches on the tunnel tos:

tc filter add dev vxlan_sys_4789 protocol ip parent ffff: prio 10 flower \
   enc_src_ip 192.168.10.2 enc_dst_ip 192.168.10.1 enc_key_id 100 enc_dst_port 4789 enc_tos 0x30 \
   src_mac e4:11:22:33:44:70 dst_mac e4:11:22:33:44:50  \
   action tunnel_key unset \
   action mirred egress redirect dev eth0_0

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-20 08:59:11 -07:00
Or Gerlitz 9f89b0cc0e tc/act_tunnel_key: Enable setup of tos and ttl
Allow to set tos and ttl for the tunnel.

For example, here's encap rule that sets tos to the tunnel:

tc filter add dev eth0_0 protocol ip parent ffff: prio 10 flower \
   src_mac e4:11:22:33:44:50 dst_mac e4:11:22:33:44:70 \
   action tunnel_key set src_ip 192.168.10.1 dst_ip 192.168.10.2 id 100 dst_port 4789 tos 0x30 \
   action mirred egress redirect dev vxlan_sys_4789

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-20 08:58:31 -07:00
David Ahern 204db84eb8 Update kernel headers
Update kernel headers to
a3eed83a1895 ("Merge branch 'qed-Add-support-for-phy-module-query'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-20 08:57:23 -07:00
Toke Høiland-Jørgensen 77c9fbd06e q_cake: Rename autorate_ingress parameter to use dash as word separator
This is consistent with the other multi-word parameters. Also change the
JSON output to be consistent with way it is formatted for the other
options.

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-20 08:46:42 -07:00
Jesus Sanchez-Palencia b625e36108 tc: Do not use addattr_nest_compat on mqprio and netem
Here we are partially reverting commit c14f9d92ee
"treewide: Use addattr_nest()/addattr_nest_end() to handle nested
attributes" .

As discussed in [1], changing from the 'manually' coded version that
used addattr_l() to addattr_nest_compat() wasn't functionally
equivalent, because now the messages have extra fields appended to it.

This introduced a regression since the implementation of parse_attr()
from both mqprio and netem can't handle this new message format.

Without this fix, mqprio returns an error. netem won't return an error
but its internal configuration ends up wrong.

As an example, this can be reproduced by the following commands when
this patch is not applied:

 1) mqprio
$ tc qdisc replace dev enp3s0 parent root handle 100 mqprio \
	num_tc 3 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
	queues 1@0 1@1 2@2 hw 0

RTNETLINK answers: Numerical result out of range

 2) netem
$ tc qdisc add dev enp3s0 root netem rate 5kbit 20 100 5 \
	distribution normal latency 1 1

$ tc -s qdisc

(...)
qdisc netem 8001: dev enp3s0 root refcnt 9 limit 1000 delay 0us  0us
 Sent 402 bytes 1 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
(...)

With this patch applied, the tc -s qdisc command above for netem instead
reads:

(...)
qdisc netem 8002: dev enp3s0 root refcnt 9 limit 1000 delay 0us  0us \
	rate 5Kbit packetoverhead 20 cellsize 100 celloverhead 5
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
(...)

[1] https://patchwork.ozlabs.org/patch/867860/#1893405

Fixes: c14f9d92ee ("treewide: Use addattr_nest()/addattr_nest_end() to handle nested attributes")
Reported-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-19 15:50:07 -07:00
Toke Høiland-Jørgensen 714444c0cb Add support for CAKE qdisc
sch_cake is intended to squeeze the most bandwidth and latency out of even
the slowest ISP links and routers, while presenting an API simple enough
that even an ISP can configure it.

Example of use on a cable ISP uplink:

tc qdisc add dev eth0 cake bandwidth 20Mbit nat docsis ack-filter

To shape a cable download link (ifb and tc-mirred setup elided)

tc qdisc add dev ifb0 cake bandwidth 200mbit nat docsis ingress wash besteffort

Cake is filled with:

* A hybrid Codel/Blue AQM algorithm, "Cobalt", tied to an FQ_Codel
  derived Flow Queuing system, which autoconfigures based on the bandwidth.
* A novel "triple-isolate" mode (the default) which balances per-host
  and per-flow FQ even through NAT.
* An deficit based shaper, that can also be used in an unlimited mode.
* 8 way set associative hashing to reduce flow collisions to a minimum.
* A reasonable interpretation of various diffserv latency/loss tradeoffs.
* Support for zeroing diffserv markings for entering and exiting traffic.
* Support for interacting well with Docsis 3.0 shaper framing.
* Support for DSL framing types and shapers.
* Support for ack filtering.
* Extensive statistics for measuring, loss, ecn markings, latency variation.

Various versions baking have been available as an out of tree build for
kernel versions going back to 3.10, as the embedded router world has been
running a few years behind mainline Linux. A stable version has been
generally available on lede-17.01 and later.

sch_cake replaces a combination of iptables, tc filter, htb and fq_codel
in the sqm-scripts, with sane defaults and vastly simpler configuration.

Cake's principal author is Jonathan Morton, with contributions from
Kevin Darbyshire-Bryant, Toke Høiland-Jørgensen, Sebastian Moeller,
Ryan Mounce, Tony Ambardar, Dean Scarff, Nils Andreas Svee, Dave Täht,
and Loganaden Velvindron.

Testing from Pete Heist, Georgios Amanakis, and the many other members of
the cake@lists.bufferbloat.net mailing list.

Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-19 09:23:46 -07:00
Alex Vesker 8b4fbf0bed devlink: Add support for devlink-region access
Devlink region allows access to driver defined address regions.
Each device can create its supported address regions and register
them. A device which exposes a region will allow access to it
using devlink.

This support allows reading and dumping regions snapshots as well
as presenting information such as region size and current available
snapshots.

A snapshot represents a memory image of a region taken by the driver.
If a device collects a snapshot of an address region it can be later
exposed using devlink region read or dump commands.
This functionality allows for future analyses on the snapshots.

The dump command is designed to read the full address space of a
region or of a snapshot unlike the read command which allows
reading only a specific section in a region/snapshot indicated by
an address and a length, current support is for reading and dumping
for a previously taken snapshot ID.

New commands added:
 devlink region show [ DEV/REGION ]
 devlink region delete DEV/REGION snapshot SNAPSHOT_ID
 devlink region dump DEV/REGION [ snapshot SNAPSHOT_ID ]
 devlink region read DEV/REGION [ snapshot SNAPSHOT_ID ]
                                address ADDRESS length length

Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-19 09:20:15 -07:00
Qiaobin Fu 697dce7b3a net:sched: add action inheritdsfield to skbedit
The new action inheritdsfield copies the field DS of
IPv4 and IPv6 packets into skb->priority. This enables
later classification of packets based on the DS field.

v4:
* Make tc use netlink helper functions

v3:
* Make flag represented in JSON output as a null value

v2:
* Align the output syntax with the input syntax

* Fix the style issues

Original idea by Jamal Hadi Salim <jhs@mojatatu.com>

Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
Reviewed-by: Michel Machado <michel@digirati.com.br>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-19 09:17:56 -07:00
Mathieu Xhonneux 04cb3c0d43 ip: add support for seg6local End.BPF action
This patch adds support for the End.BPF action of the seg6local
lightweight tunnel. Functions from the BPF lightweight tunnel are
re-used in this patch. Example:

$ ip -6 route add fc00::18 encap seg6local action End.BPF endpoint
obj my_bpf.o sec my_func dev eth0

$ ip -6 route show fc00::18
fc00::18  encap seg6local action End.BPF endpoint my_bpf.o:[my_func]
dev eth0 metric 1024 pref medium

v2: - re-use of print_encap_bpf_prog instead of fprintf
    - introduction of "endpoint" keyword for more consistency with
      others parameters

Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-18 15:56:18 -07:00
Serhey Popovych 8df708afd6 ipaddress: Fix and make consistent label match handling
Since commit 9516823051 ("ipaddress: Improve print_linkinfo()") we
return -1 instead of 0 when ip-address(8) label does not match network
device name as we did before change. This causes regression when trying
to output ip address matching label:

     # ip addr add 192.168.192.1/24 dev lo label lo:1
     # ip addr show label lo:1
     <no output>

This is special case and return 0 from print_linkinfo() earlier to match
only filter.ifindex and filter.up if given, but not rest fields in
@filter. Then call print_selected_addrinfo() without calling
print_link_stats() in ipaddr_list_flush_or_save().

Later print_selected_addrinfo() calls print_addrinfo() that finally
matches IFA_LABEL attribute in netlink buffer with filter.label using
ifa_label_match_rta().

On the other hand there is three conditions checked in print_linkinfo()
to determine label special case:

    1) filter.label != NULL
    2) filter.family == AF_UNSPEC || filter.family == AF_PACKET
    3) fnmatch(filter.label, name, 0)

With 1) it is ok to check if filtering by label is on by given pattern
in @filter.label.

Since label is IPv4 specific and AF_PACKET is for printing ip-link(8)
information (see ipaddr_link_list()::ipaddress.c as example) checking
for AF_PACKET in 2) doesn't take much sense: better to defer these
checks to print_addrinfo() determine valid combinations before calling
ifa_label_match_rta() to finally match IFA_LABEL to pattern in
filter.label.

For 3) we have following call for test case:

    fnmatch(pattern, string, flags) ->
      fnmatch(filter.label, name, 0) ->
        fnmatch("lo:1", "lo", 0) == FNM_NOMATCH (1) or non-zero on error

To support special case in print_linkinfo() for filtering by label we
only need to check if label pattern is given in filter.label and return
0 to skip print_link_stats() in ipaddr_list_flush_or_save(): actual
filtering will be done in print_addrinfo().

Before commit 9516823051 ("ipaddress: Improve print_linkinfo()"):
-------------------------------------------------------------------

$ ip addr sh label lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN \
group default qlen 1000
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                          fnmatch("lo", "lo", 0) == 0
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
$ ip addr show label 'lo:*'
    inet 192.168.192.1/24 scope global lo:1
       valid_lft forever preferred_lft forever
$ ip addr sh label lo:1
    inet 192.168.192.1/24 scope global lo:1
       valid_lft forever preferred_lft forever
$ ip -4 addr sh label lo:1
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN \
group default qlen 1000
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                             filter.family == AF_INET
    inet 192.168.192.1/24 scope global lo:1
       valid_lft forever preferred_lft forever

After this change applied:
--------------------------

$ ip/ip addr show label lo
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
$ ip/ip addr show label 'lo:*'
    inet 192.168.192.1/24 scope global lo:1
        valid_lft forever preferred_lft forever
$ ip/ip addr show label lo:1
    inet 192.168.192.1/24 scope global lo:1
       valid_lft forever preferred_lft forever
$ ip/ip -4 addr show label lo:1
    inet 192.168.192.1/24 scope global lo:1
       valid_lft forever preferred_lft forever

Note that we no longer show link information as we did previously:
    we are filtering by "label" pattern, not showing by "dev".

Fixes: commit 9516823051 ("ipaddress: Improve print_linkinfo()")
Reported-by: Vincent Bernat <vincent@bernat.im>
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-18 15:52:55 -07:00
David Ahern b05a68f721 Merge branch 'bpf-btf' into iproute2-next
Daniel Borkmann  says:

====================

Main part of this set is to: i) avoid strict af_alg kernel dependency,
ii) add loader support for bpf to bpf calls and iii) add btf loader
support with an option to annotate maps. For details please see the
individual patches. Thanks!

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-17 19:39:06 -07:00
Daniel Borkmann f823f36012 bpf: implement btf handling and map annotation
Implement loading of .BTF section from object file and build up
internal table for retrieving key/value id related to maps in
the BPF program. Latter is done by setting up struct btf_type
table.

One of the issues is that there's a disconnect between the data
types used in the map and struct bpf_elf_map, meaning the underlying
types are unknown from the map description. One way to overcome
this is to add a annotation such that the loader will recognize
the relation to both. BPF_ANNOTATE_KV_PAIR(map_foo, struct key,
struct val); has been added to the API that programs can use.

The loader will then pick the corresponding key/value type ids and
attach it to the maps for creation. This can later on be dumped via
bpftool for introspection.

Example with test_xdp_noinline.o from kernel selftests:

  [...]

  struct ctl_value {
        union {
                __u64 value;
                __u32 ifindex;
                __u8 mac[6];
        };
  };

  struct bpf_map_def __attribute__ ((section("maps"), used)) ctl_array = {
        .type		= BPF_MAP_TYPE_ARRAY,
        .key_size	= sizeof(__u32),
        .value_size	= sizeof(struct ctl_value),
        .max_entries	= 16,
        .map_flags	= 0,
  };
  BPF_ANNOTATE_KV_PAIR(ctl_array, __u32, struct ctl_value);

  [...]

Above could also further be wrapped in a macro. Compiling through LLVM and
converting to BTF:

  # llc --version
  LLVM (http://llvm.org/):
    LLVM version 7.0.0svn
    Optimized build.
    Default target: x86_64-unknown-linux-gnu
    Host CPU: skylake

    Registered Targets:
      bpf    - BPF (host endian)
      bpfeb  - BPF (big endian)
      bpfel  - BPF (little endian)
  [...]

  # clang [...] -O2 -target bpf -g -emit-llvm -c test_xdp_noinline.c -o - |
    llc -march=bpf -mcpu=probe -mattr=dwarfris -filetype=obj -o test_xdp_noinline.o
  # pahole -J test_xdp_noinline.o

Checking pahole dump of BPF object file:

  # file test_xdp_noinline.o
  test_xdp_noinline.o: ELF 64-bit LSB relocatable, *unknown arch 0xf7* version 1 (SYSV), with debug_info, not stripped
  # pahole test_xdp_noinline.o
  [...]
  struct ctl_value {
	union {
		__u64              value;                /*     0     8 */
		__u32              ifindex;              /*     0     4 */
		__u8               mac[0];               /*     0     0 */
	};                                               /*     0     8 */

	/* size: 8, cachelines: 1, members: 1 */
	/* last cacheline: 8 bytes */
  };

Now loading into kernel and dumping the map via bpftool:

  # ip -force link set dev lo xdp obj test_xdp_noinline.o sec xdp-test
  # ip a
  1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 xdpgeneric/id:227 qdisc noqueue state UNKNOWN group default qlen 1000
      link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
      inet 127.0.0.1/8 scope host lo
         valid_lft forever preferred_lft forever
      inet6 ::1/128 scope host
         valid_lft forever preferred_lft forever
  [...]
  # bpftool prog show id 227
  227: xdp  tag a85e060c275c5616  gpl
      loaded_at 2018-07-17T14:41:29+0000  uid 0
      xlated 8152B  not jited  memlock 12288B  map_ids 381,385,386,382,384,383
  # bpftool map dump id 386
   [{
        "key": 0,
        "value": {
            "": {
                "value": 0,
                "ifindex": 0,
                "mac": []
            }
        }
    },{
        "key": 1,
        "value": {
            "": {
                "value": 0,
                "ifindex": 0,
                "mac": []
            }
        }
    },{
  [...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-17 19:38:44 -07:00
Daniel Borkmann b5cb33aec6 bpf: implement bpf to bpf calls support
Implement missing bpf to bpf calls support. The loader will
recognize .text section and handle relocation entries that
are emitted by LLVM.

First step is processing of map related relocation entries
for .text section, and in a second step loader will copy .text
section into program section and adjust call instruction
offset accordingly.

Example with test_xdp_noinline.o from kernel selftests:

 1) Every function as __attribute__ ((always_inline)), rest
    left unchanged:

  # ip -force link set dev lo xdp obj test_xdp_noinline.o sec xdp-test
  # ip a
  1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 xdpgeneric/id:233 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
  [...]
  # bpftool prog dump xlated id 233
  [...]
  1669: (2d) if r3 > r2 goto pc+4
  1670: (79) r2 = *(u64 *)(r10 -136)
  1671: (61) r2 = *(u32 *)(r2 +0)
  1672: (63) *(u32 *)(r1 +0) = r2
  1673: (b7) r0 = 1
  1674: (95) exit        <-- 1674 insns total

 2) Every function as __attribute__ ((noinline)), rest
    left unchanged:

  # ip -force link set dev lo xdp obj test_xdp_noinline.o sec xdp-test
  # ip a
  1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 xdpgeneric/id:236 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
  [...]
  # bpftool prog dump xlated id 236
  [...]
  1000: (bf) r1 = r6
  1001: (b7) r2 = 24
  1002: (85) call pc+3   <-- pc-relative call insns
  1003: (1f) r7 -= r0
  1004: (bf) r0 = r7
  1005: (95) exit
  1006: (bf) r0 = r1
  1007: (bf) r1 = r2
  1008: (67) r1 <<= 32
  1009: (77) r1 >>= 32
  1010: (bf) r3 = r0
  1011: (6f) r3 <<= r1
  1012: (87) r2 = -r2
  1013: (57) r2 &= 31
  1014: (67) r0 <<= 32
  1015: (77) r0 >>= 32
  1016: (7f) r0 >>= r2
  1017: (4f) r0 |= r3
  1018: (95) exit        <-- 1018 insns total

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-17 19:38:43 -07:00
Daniel Borkmann 6e5094dbb7 bpf: remove strict dependency on af_alg
Do not bail out when AF_ALG is not supported by the kernel and
only do so when a map is requested in object ns where we're
calculating the hash. Otherwise, the loader can operate just
fine, therefore lets not fail early when it's not needed.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-17 19:38:40 -07:00
Daniel Borkmann 282a1fe1f8 bpf: move bpf_elf_map fixup notification under verbose
No need to spam the user with this if it can be fixed gracefully
anyway. Therefore, move it under verbose option.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-17 19:38:38 -07:00
David Ahern 5081979176 Import btf.h from kernel headers
Import btf.h from kernel headers at commit
    2aa4a3378ad0 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next")
which is the last sync point.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-17 19:37:50 -07:00
Roopa Prabhu 9c6a6d84ee ipneigh: exclude NTF_EXT_LEARNED from default filter
NUD_NOARP entries are filtered out by default by iproute2.
We dont want NUD_NOARP with NTF_EXT_LEARNED flag filtered out.
This patch extends the default filter check for ip neigh show
to include the NTF_EXT_LEARNED flag.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-17 18:57:21 -07:00
Jakub Kicinski da083b5a48 iplink: add support for reporting multiple XDP programs
Kernel now supports attaching XDP programs in the driver
and hardware at the same time.  Print that information
correctly.

In case there are multiple programs attached kernel will
not provide IFLA_XDP_PROG_ID, so don't expect it to be
there (this also improves the printing for very old kernels
slightly, as it avoids unnecessary "prog/xdp" line).

In short mode preserve the current outputs but don't print
IDs if there are multiple.

6: netdevsim0: <BROADCAST,NOARP> mtu 1500 xdpoffload/id:11 qdisc [...]

and:

6: netdevsim0: <BROADCAST,NOARP> mtu 1500 xdpmulti qdisc [...]

ip link output will keep using prog/xdp prefix if only one program
is attached, but can also print multiple program lines:

    prog/xdp id 8 tag fc7a51d1a693a99e jited

vs:

    prog/xdpdrv id 8 tag fc7a51d1a693a99e jited
    prog/xdpoffload id 9 tag fc7a51d1a693a99e

JSON output gains a new array called "attached" which will
contain the full list of attached programs along with their
attachment modes:

        "xdp": {
            "mode": 3,
            "prog": {
                "id": 11,
                "tag": "fc7a51d1a693a99e",
                "jited": 0
            },
            "attached": [ {
                    "mode": 3,
                    "prog": {
                        "id": 11,
                        "tag": "fc7a51d1a693a99e",
                        "jited": 0
                    }
                } ]
        },

In case there are multiple programs attached the general "xdp"
section will not contain program information:

        "xdp": {
            "mode": 4,
            "attached": [ {
                    "mode": 1,
                    "prog": {
                        "id": 10,
                        "tag": "fc7a51d1a693a99e",
                        "jited": 1
                    }
                },{
                    "mode": 3,
                    "prog": {
                        "id": 11,
                        "tag": "fc7a51d1a693a99e",
                        "jited": 0
                    }
                } ]
        },

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-15 13:10:03 -07:00
Jianbo Liu 1f0a5dfd38 tc: flower: Add support for QinQ
To support matching on both outer and inner vlan headers,
we add new cvlan_id/cvlan_prio/cvlan_ethtype for inner vlan header.

Example:
# tc filter add dev eth0 protocol 802.1ad parent ffff: \
    flower vlan_id 1000 vlan_ethtype 802.1q \
        cvlan_id 100 cvlan_ethtype ipv4 \
    action vlan pop \
    action vlan pop \
    action mirred egress redirect dev eth1

# tc filter show dev eth0 ingress
filter protocol 802.1ad pref 1 flower chain 0
filter protocol 802.1ad pref 1 flower chain 0 handle 0x1
  vlan_id 1000
  vlan_ethtype 802.1Q
  cvlan_id 100
  cvlan_ethtype ip
  eth_type ipv4
  in_hw

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-15 13:03:50 -07:00
David Ahern 3eebc1d4f4 Update kernel headers
Update kernel headers to commit
2aa4a3378ad0 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-15 13:02:51 -07:00
David Ahern 5910422d21 Merge branch 'tc-etf' into iproute2-next
Jesus Sanchez-Palencia  says:

====================

fixes since v3:
 - Add support for clock names with the "CLOCK_" prefix;
 - Print clock name on print_opt();
 - Use strcasecmp() instead of strncasecmp().

The ETF (earliest txtime first) qdisc was recently merged into net-next
[1], so this patchset adds support for it through the tc command line
tool.

An initial man page is also provided.

The first commit in this series is adding an updated version of
include/uapi/linux/pkt_sched.h and is not meant to be merged. It's
provided here just as a convenience for those who want to easily build
this patchset.

[1] https://patchwork.ozlabs.org/cover/938991/

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-11 17:51:46 -07:00
Jesus Sanchez-Palencia 85d699c3a8 man: Add initial manpage for tc-etf(8)
Add an initial manpage for tc-etf covering all config options, basic
concepts and operation modes.

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-11 17:50:53 -07:00
Vinicius Costa Gomes 7da5ef2200 tc: Add support for the ETF Qdisc
The "Earliest TxTime First" (ETF) queueing discipline allows precise
control of the transmission time of packets by providing a sorted
time-based scheduling of packets.

The syntax is:

tc qdisc add dev DEV parent NODE etf delta <DELTA>
                     clockid <CLOCKID> [offload] [deadline_mode]

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-11 17:50:10 -07:00
Stephen Hemminger b49759c0e7 tc: don't double print rate
Conversion to print stats in JSON forgot to remove existing
fprintf.

Fixes: 4fcec7f366 ("tc: jsonify stats2")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-09 09:53:45 -07:00
Jesus Sanchez-Palencia 4df5bb1be0 man: Fix typos on tc-cbs
Fix 2 typos on the man page of the CBS qdisc.

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-07 09:57:45 -07:00
fumihiko kakuma d529ea2ff4 tc: Fix the bug not to display prio and quantum options of htb
A commandline like 'tc -d class show dev dev-name' does not
display value of prio and quantum option when we use htb qdisc.
This patch fixes the bug.

Signed-off-by: Fumihiko Kakuma <kakuma@valinux.co.jp>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-07 09:57:45 -07:00
Roi Dayan 425dcc2741 tc: Fix output of ip attributes
Example output is of tos and ttl.
Befoe this fix the format used %x caused output of the pointer
instead of the intended string created in the out variable.

Fixes: e28b88a464 ("tc: jsonify flower filter")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-07 09:57:45 -07:00
Stephen Hemminger dc3ef235f3 uapi: update bpf.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-07 09:56:27 -07:00
Simon Horman 6217917a38 tc: m_tunnel_key: Add tunnel option support to act_tunnel_key
Allow setting tunnel options using the act_tunnel_key action.

Options are expressed as class:type:data and multiple options
may be listed using a comma delimiter.

 # ip link add name geneve0 type geneve dstport 0 external
 # tc qdisc add dev eth0 ingress
 # tc filter add dev eth0 protocol ip parent ffff: \
     flower indev eth0 \
        ip_proto udp \
        action tunnel_key \
            set src_ip 10.0.99.192 \
            dst_ip 10.0.99.193 \
            dst_port 6081 \
            id 11 \
            geneve_opts 0102:80:00800022,0102:80:00800022 \
    action mirred egress redirect dev geneve0

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-06 09:10:05 -07:00
Moshe Shemesh 13925ae9eb devlink: Add param command support
Add support for configuration parameters set and show.
Each parameter can be either generic or driver-specific.
The user can retrieve data on these configuration parameters by devlink
param show command and can set new value to a configuration parameter
by devlink param set command.
The configuration parameters can be set in different configuration
modes:
  runtime - set while driver is running, no reset required.
  driverinit - applied while driver initializes, requires restart
               driver by devlink reload command.
  permanent - written to device's non-volatile memory, hard reset
              required to apply.

New commands added:
  devlink dev param show [DEV name PARAMETER]
  devlink dev param set DEV name PARAMETER value VALUE
			    cmode { permanent | driverinit | runtime }

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-06 08:43:28 -07:00
David Ahern 22ddbd8204 Update kernel headers
Update kernel headers to commit
ab8565af68001 ("Merge branch 'IP-listification-follow-ups'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-06 08:42:22 -07:00
Nikolay Aleksandrov 05001bcfab bridge: add support for isolated option
This patch adds support for the new isolated port option which, if set,
would allow the isolated ports to communicate only with non-isolated
ports and the bridge device. The option can be set via the bridge or ip
link type bridge_slave commands, e.g.:
$ ip link set dev eth0 type bridge_slave isolated on
$ bridge link set dev eth0 isolated on

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-06 07:58:41 -07:00
David Ahern f2bfb31bef Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-06-21 08:12:39 -07:00
Keara Leibovitz 4757a54799 tc: jsonify nat action
Add json output support for nat action

Example output:

~$ $TC actions add action nat egress 10.10.10.1 20.20.20.2 index 2
~$ $TC actions add action nat ingress 100.100.100.1/32 200.200.200.2 \
	continue index 99
~$ $TC -j actions ls action nat

[{
	"total acts": 2
}, {
	"actions": [{
		"order": 0,
		"type": "nat",
		"direction": "egress",
		"old_addr": "10.10.10.1/32",
		"new_addr": "20.20.20.2",
		"control_action": {
			"type": "pass"
		},
		"index": 2,
		"ref": 1,
		"bind": 0
	}, {
		"order": 1,
		"type": "nat",
		"direction": "ingress",
		"old_addr": "100.100.100.1/32",
		"new_addr": "200.200.200.2",
		"control_action": {
			"type": "continue"
		},
		"index": 99,
		"ref": 1,
		"bind": 0
	}]
}]

Signed-off-by: Keara Leibovitz <kleib@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-06-20 10:20:34 -07:00
Eric S. Raymond a85f921ae5 devlink.8, translate unparseable callout syntax to parseable form.
Signed-off-by: Eric S. Raymond <esr@thyrsus.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-20 09:41:41 -07:00
Vlad Buslov b133392468 tc: fix batch force option
When sending accumulated compound command results an error, check 'force'
option before exiting. Move return code check after putting batch bufs and
freeing iovs to prevent memory leak. Break from loop, instead of returning
error code to allow cleanup at the end of batch function. Don't reset ret
code on each iteration.

Fixes: 485d0c6001 ("tc: Add batchsize feature for filter and actions")
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-20 09:32:36 -07:00
Subash Abhinov Kasiviswanathan 2ecb61a0c2 ip-xfrm: Add support for OUTPUT_MARK
This patch adds support for OUTPUT_MARK in xfrm state to exercise the
functionality added by kernel commit 077fbac405bf
("net: xfrm: support setting an output mark.").

Sample output-

(with mark and output-mark)
src 192.168.1.1 dst 192.168.1.2
        proto esp spi 0x00004321 reqid 0 mode tunnel
        replay-window 0 flag af-unspec
        mark 0x10000/0x3ffff output-mark 0x20000
        auth-trunc xcbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b211 96
        enc cbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b233
        anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000

(with mark only)
src 192.168.1.1 dst 192.168.1.2
        proto esp spi 0x00004321 reqid 0 mode tunnel
        replay-window 0 flag af-unspec
        mark 0x10000/0x3ffff
        auth-trunc xcbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b211 96
        enc cbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b233
        anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000

(with output-mark only)
src 192.168.1.1 dst 192.168.1.2
        proto esp spi 0x00004321 reqid 0 mode tunnel
        replay-window 0 flag af-unspec
        output-mark 0x20000
        auth-trunc xcbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b211 96
        enc cbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b233
        anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000

(no mark and output-mark)
src 192.168.1.1 dst 192.168.1.2
        proto esp spi 0x00004321 reqid 0 mode tunnel
        replay-window 0 flag af-unspec
        auth-trunc xcbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b211 96
        enc cbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b233
        anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000

v1->v2: Moved the XFRMA_OUTPUT_MARK print after XFRMA_MARK in
xfrm_xfrma_print() as mentioned by Lorenzo

v2->v3: Fix one help formatting error as mentioned by Lorenzo.
Keep mark and output-mark on the same line and add man page info as
mentioned by David.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-06-18 06:37:00 -07:00
Daniele Palmas 46c16a5d1e ip: add rmnet initial support
This patch adds basic support for Qualcomm rmnet devices.

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-15 11:15:14 -07:00
Patrick Talbert cad73425d8 ipaddress: strengthen check on 'label' input
As mentioned in the ip-address man page, an address label must
be equal to the device name or prefixed by the device name
followed by a colon. Currently the only check on this input is
to see if the device name appears at the beginning of the label
string.

This commit adds an additional check to ensure label == dev or
continues with a colon.

Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-15 11:14:19 -07:00
Hoang Le 5887ff0922 rdma: sync some IP headers with glibc
In the commit 9a362cc71a, new userspace header:
  (i.e rdma/rdma_user_cm.h -> linux/in6.h)
is included before the kernel space header:
  (i.e utils.h -> resolv.h -> netinet/in.h).

This leads to unsynchronous some IP headers and compiler got failure
with error: redefinition of some structs IP.

In this commit, just reorder this including to make them in-sync.

Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-15 11:11:51 -07:00
Hoang Le a56e0db7e8 tipc: JSON support for tipc link printouts
Add json output support for tipc link command

Example output:
$tipc -j -p link list

[ {
        "broadcast-link": "up",
        "1.1.1:bridge-1.1.104:eth0": "up",
        "1.1.1:bridge-1.1.105:eth0": "up",
        "1.1.1:bridge-1.1.106:eth0": "up"
    } ]

--------------------
$tipc -j -p link stat show link broadcast-link

[ {
        "link": "broadcast-link",
        "window": 50,
        "rx packets": {
            "rx packets": 0,
            "fragments": 0,
            "fragmented": 0,
            "bundles": 0,
            "bundled": 0
        },
        "tx packets": {
            "tx packets": 0,
            "fragments": 0,
            "fragmented": 0,
            "bundles": 0,
            "bundled": 0
        },
        "rx naks": {
            "rx naks": 0,
            "defs": 0,
            "dups": 0
        },
        "tx naks": {
            "tx naks": 0,
            "acks": 0,
            "retrans": 0
        },
        "congestion link": 0,
        "send queue max": 0,
        "avg": 0
    } ]

v2:
    Replace variable 'json_flag' by 'json' declared in include/utils.h

v3:
    Update manual page

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-06-13 20:45:59 -07:00
Hoang Le 1304f50a5b tipc: JSON support for showing nametable
Add json output support for nametable show

Example output:
$tipc -j -p nametable show

[ {
        "type": 0,
        "lower": 16781313,
        "upper": 16781313,
        "scope": "zone",
        "port": 0,
        "node": ""
    },{
        "type": 0,
        "lower": 16781416,
        "upper": 16781416,
        "scope": "cluster",
        "port": 0,
        "node": ""
    } ]

v2:
    Replace variable 'json_flag' by 'json' declared in include/utils.h
    Add new parameter '-pretty' to support pretty output

v3:
    Update manual page

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-06-13 20:45:38 -07:00
Donald Sharp a313455c6c iproute2: Add support for a few routing protocols
Add support for:

BGP
ISIS
OSPF
RIP
EIGRP

Routing protocols to iproute2.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-11 11:18:30 -07:00
David Ahern ee095a417e Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-06-10 07:30:32 -07:00
Stephen Hemminger 776f1813b5 uapi: update headers from linux-net
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-08 10:27:13 -07:00
Stephen Hemminger 17678d3059 Merge ../iproute2-next 2018-06-08 10:27:04 -07:00
Stephen Hemminger 2d3dd6f6c1 v4.17.0 2018-06-08 10:11:50 -07:00
Stephen Hemminger 4be85d574e uapi: update bpf.h to include padding
Last minute upstream 4.17 change.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-08 10:09:21 -07:00
Hoang Le 313ce6949c tipc: TIPC_NLA_LINK_NAME value pass on nesting entry TIPC_NLA_LINK
In the commit 94f6a80 on next-net, TIPC_NLA_LINK_NAME attribute should be
retrieved and validated via TIPC_NLA_LINK nesting entry in
tipc_nl_node_get_link().
According to that commit, TIPC_NLA_LINK_NAME value passing via
tipc link get command must follow above hierachy.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-08 10:07:13 -07:00
Nicolas Dichtel 974ef93bf1 iplink: enable to specify a name for the link-netns
The 'link-netnsid' argument needs a number. Add 'link-netns' when the user
wants to use the iproute2 netns name instead of the nsid.

Example:
ip link add ipip1 link-netns foo type ipip remote 10.16.0.121 local 10.16.0.249

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-08 10:06:21 -07:00
Nicolas Dichtel 9580bad7b9 ip: display netns name instead of nsid
When iproute2 has a name for the nsid, let's display it. It's more
user friendly than a number.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-08 10:06:21 -07:00
Keara Leibovitz 831b5d40d9 tc: add json support in csum action
Add json output support for checksum action.

Example output:

~$ $TC actions add action csum udp continue index 7
~$ $TC actions add action csum icmp iph igmp pipe index 200 cookie 112233
~$ $TC -j actions ls action csum

[{
    "total acts":2
}, {
    "actions": [{
        "order":0,
        "csum":"udp",
        "control_action": {
            "type":"continue"
        },
        "index":7,
        "ref":1,
        "bind":0
    }, {
        "order":1,
        "csum":"iph, icmp, igmp",
        "control_action": {
            "type":"pipe"
        },
        "index":200,
        "ref":1,
        "bind":0,
        "cookie":"112233"
    }]
}]

v2:
    Don't initialized char buf[64];
    Add output example

Signed-off-by: Keara Leibovitz <kleib@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-06-05 15:30:30 -07:00
David Ahern 55b973329c Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-06-05 14:22:15 -07:00
Ivan Vecera 9e4a92e5aa devlink: don't enforce NETLINK_{CAP,EXT}_ACK sock opts
Since commit 049c58539f ("devlink: mnlg: Add support for extended ack")
devlink requires NETLINK_{CAP,EXT}_ACK. This prevents devlink from
working with older kernels that don't support these features.

host # ./devlink/devlink
Failed to connect to devlink Netlink

Fixes: 049c58539f ("devlink: mnlg: Add support for extended ack")
Cc: Arkadi Sharshevsky <arkadis@mellanox.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
2018-06-01 16:03:59 -04:00
Nicolas Dichtel eaf89d7d52 ip: IFLA_NEW_NETNSID/IFLA_NEW_IFINDEX support
Parse and display those attributes.
Example:
ip l a type dummy
ip netns add foo
ip monitor link&
ip l s dummy1 netns foo
Deleted 6: dummy1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether 66:af:3a:3f:a0:89 brd ff:ff:ff:ff:ff:ff new-nsid 0 new-ifindex 6

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-01 15:59:40 -04:00
Nathan Harold b8e7799003 iproute2: fix 'ip xfrm monitor all' command
Currently, calling 'ip xfrm monitor all' will
actually invoke the 'all-nsid' command because the
soft-match for 'all-nsid' occurs before the precise
match for 'all'. This patch rearranges the checks
so that the 'all' command, itself an alias for
invoking 'ip xfrm monitor' with no argument, can
be called consistent with the syntax for other ip
commands that accept an 'all'.

Signed-off-by: Nathan Harold <nharold@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-01 15:57:26 -04:00
David Ahern 4c3493689e iplink_vrf: Save device index from response for return code
A recent commit changed rtnl_talk_* to return the response message in
allocated memory so callers need to free it. The change to name_is_vrf
did not save the device index which is pointing to a struct inside the
now allocated and freed memory resulting in garbage getting returned
in some cases.

Fix by using a stack variable to save the return value and only set
it to ifi->ifi_index after all checks are done and before the answer
buffer is freed.

Fixes: 86bf43c7c2 ("lib/libnetlink: update rtnl_talk to support malloc buff at run time")
Cc: Hangbin Liu <liuhangbin@gmail.com>
Cc: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-01 15:45:09 -04:00
Stephen Hemminger 2884af6d37 rt_protos: drop old experimental gated names
No longer need these petroglyph values.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-01 15:44:52 -04:00
Roopa Prabhu 804c7fff76 iproute: ip route get support for sport, dport and ipproto match
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-06-01 08:19:30 -07:00
David Ahern 9107c425ac ip route: print RTA_CACHEINFO if it exists
RTA_CACHEINFO can be sent for non-cloned routes. If the attribute is
present print it. Allows route dumps to print expires times for example
which can exist on FIB entries.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-06-01 08:18:31 -07:00
David Ahern 45c0dd7286 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-06-01 08:17:23 -07:00
David Ahern 78d04c7b27 ipaddress: Add support for address metric
Add support for IFA_RT_PRIORITY using the same keywords as iproute for
RTA_PRIORITY.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-30 08:20:04 -07:00
David Ahern 57ac202c78 Update kernel headers
Update kernel headers to commit
ae40832e53c3 ("bpfilter: fix a build err")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-30 08:06:19 -07:00
Stephen Hemminger 65083b5fe3 ip: defer lookup interface index
The ip command would always lookup the network device index
even when not necessary. This slows down operations like creating
lots of VLAN's.

David reported the original issue, this is an alternative patch
that solves it in a slightly more general method.

Using iproute2 to create a bridge and add 4094 vlans to it can take from
2 to 3 *minutes*. The reason is the extraneous call to ll_name_to_index.
ll_name_to_index results in an ioctl(SIOCGIFINDEX) call which in turn
invokes dev_load. If the index does not exist, which it won't when
creating a new link, dev_load calls modprobe twice -- once for
netdev-NAME and again for NAME. This is unnecessary overhead for each
link create.

When ip link is invoked for a new device, there is no reason to
call ll_name_to_index for the new device. With this patch, creating
a bridge and adding 4094 vlans takes less than 3 *seconds*.

	old:
	# time ip -batch ip-vlan.batch
	real    3m13.727s
	user    0m0.076s
	sys     0m1.959s

	new:
	# time ip -batch ip-vlan.batch
	real    0m3.222s
	user    0m0.044s
	sys     0m1.777s

Reported-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-25 07:48:40 -07:00
David Ahern 39d16a02d9 ip route: Print expires as signed int
rta_expires is a signed int; print it as one.

Fixes: 663c3cb231 ("iproute: implement JSON and color output")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-23 15:29:56 -07:00
Pavel Maltsev e2f5ceccda Allow to configure /var/run/netns directory
Currently NETNS_RUN_DIR is hardcoded and refers to /var/run/netns.
However, some systems (e.g. Android) doesn't have /var
which results in error attempts to create network namespaces on these
systems.  This change makes NETNS_RUN_DIR configurable at build time
by allowing to pass environment variable to make command.
Also, this change makes /etc/netns directory configurable through
NETNS_ETC_DIR environment variable.

For example: ./configure && NETNS_RUN_DIR=/mnt/vendor/netns make

Tested: verified that iproute2 with configuration mentioned above
creates namespaces in /mnt/vendor/netns

Signed-off-by: Pavel Maltsev <pavelm@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-23 15:16:53 -07:00
Jiri Pirko 852ed60528 devlink: introduce support for showing port flavours
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-23 12:58:55 -07:00
David Ahern c2a569e63c Update kernel headers
Update kernel headers to commit
e89e59c08d1b ("Merge branch 'net-sfp-small-improvements'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-23 12:58:34 -07:00
David Ahern 24bd5ac6b8 Merge branch 'rdma-resource-tracking' into iproute2-next
Steve Wise  says:

====================

This series enhances the iproute2 rdma tool to include displaying
driver-specific resource attributes.  It is the user-space part of the
kernel driver resource tracking series that has been accepted for merging
into linux-4.18 [1]

If there are no additional review comments, it can now be merged, I think.

Changes since v2:
- resync rdma_netlink.h to fix uapi break

Changes since v1:
- commit log editorial fixes
- cite kernel commits that updated rdma_netlink.h in the
  iproute2 commit syncing this header
- reorder stack definitions ala "reverse christmas tree"
- correctly handle unknown driver attributes when printing

Changes since v0/rfc:
- changed "provider" to "driver" based on kernel side changes
- updated man pages
- removed "RFC" tag

Thanks,

Steve.

[1] https://www.spinics.net/lists/linux-rdma/msg64199.html

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-18 09:21:42 -07:00
Steve Wise 853d222d78 rdma: update man pages
Update the man pages for the resource attributes as well
as the driver-specific attributes.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-18 09:20:01 -07:00
Steve Wise 331152752a rdma: print driver resource attributes
This enhancement allows printing rdma device-specific state, if provided
by the kernel. This is done in a generic manner, so rdma tool doesn't
need to know about the details of every type of rdma device.

Driver attributes for a rdma resource are in the form of <key,
[print_type], value> tuples, where the key is a string and the value can
be any supported driver attribute. The print_type attribute, if present,
provides a print format to use vs the standard print format for the type.
For example, the default print type for a PROVIDER_S32 value is "%d ",
but "0x%x " if the print_type of PRINT_TYPE_HEX is included inthe tuple.

Driver resources are only printed when the -dd flag is present.
If -p is present, then the output is formatted to not exceed 80 columns,
otherwise it is printed as a single row to be grep/awk friendly.

Example output:

# rdma resource show qp lqpn 1028 -dd -p
link cxgb4_0/- lqpn 1028 rqpn 0 type RC state RTS rq-psn 0 sq-psn 0 path-mig-state MIGRATED pid 0 comm [nvme_rdma]
    sqid 1028 flushed 0 memsize 123968 cidx 85 pidx 85 wq_pidx 106 flush_cidx 85 in_use 0
    size 386 flags 0x0 rqid 1029 memsize 16768 cidx 43 pidx 41 wq_pidx 171 msn 44 rqt_hwaddr 0x2a8a5d00
    rqt_size 256 in_use 128 size 130

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-18 09:17:16 -07:00
Steve Wise 366d20b91f rdma: update rdma_netlink.h to get new driver attributes
Pull in the rdma_netlink.h changes from kernel
commits:

25a0ad85156a ("RDMA/nldev: Add explicit pad attribute")
da5c85078215 ("RDMA/nldev: add driver-specific resource tracking)"
0d52d803767e ("RDMA/uapi: Fix uapi breakage")

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-18 09:16:22 -07:00
Jon Maloy 5947046dd9 tipc: fixed node and name table listings
We make it easier for users to correlate between 128-bit node
identities and 32-bit node hash number by extending the 'node list'
command to also show the hash number.

We also improve the 'nametable show' command to show the node identity
instead of the node hash number. Since the former potentially is much
longer than the latter, we make room for it by eliminating the (to the
user) irrelevant publication key. We also reorder some of the columns so
that the node id comes last, since this looks nicer and is more logical.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-18 09:12:24 -07:00
Roman Mashak 53d34eb66c tc: add missing space symbol in ife output
In order to make TDC tests match the output patterns, the missing space
character must be added in the mode output string.

Fixes: 8744c5d338 ("tc: jsonify ife action")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-18 09:10:48 -07:00
Marcelo Ricardo Leitner ac6a4c2299 tc: flower: add support for verbose logging
Currently there is no way to log offloading errors if the rule is not
explicitly marked as skip_sw, making it hard for other applications such
as Open vSwitch to log why a given could not be offloaded.

This patch adds support for signaling the kernel that more verbose
logging is wanted, which now will include such messages.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-18 09:06:04 -07:00
David Ahern 4276e65290 Update kernel headers
Update kernel headers to commit
64a2658b58ab ("net: mscc: Add SPDX identifier")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-18 09:05:07 -07:00
Stephen Hemminger 405e0c4ffe tc: allow 0% for percent options
Allowing 0% is sometimes useful for example in netem loss and drop
or perhaps dropping all traffic in a HTB bin.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199745
Reported-by: stuartmarsden@gmail.com
Fixes: 927e3cfb52 ("tc: B.W limits can now be specified in %.")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-17 16:20:50 -07:00
Marcelo Ricardo Leitner 4f59f4a5af tc-netem: fix limit description in man page
As the kernel code says, limit is actually the amount of packets it can
hold queued at a time, as per:

static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
                         struct sk_buff **to_free)
{
	...
        if (unlikely(sch->q.qlen >= sch->limit))
                return qdisc_drop_all(skb, sch, to_free);

So lets fix the description of the field in the man page.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-16 15:36:48 -07:00
Marcelo Ricardo Leitner 3f2c23811d tc-netem: fix limit description in man page
As the kernel code says, limit is actually the amount of packets it can
hold queued at a time, as per:

static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
                         struct sk_buff **to_free)
{
	...
        if (unlikely(sch->q.qlen >= sch->limit))
                return qdisc_drop_all(skb, sch, to_free);

So lets fix the description of the field in the man page.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-16 14:16:29 -07:00
David Ahern 961d0991bc Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-16 14:10:27 -07:00
Luca Boccassi 9b13cc98f5 ip: do not drop capabilities if net_admin=i is set
Users have reported a regression due to ip now dropping capabilities
unconditionally.
zerotier-one VPN and VirtualBox use ambient capabilities in their
binary and then fork out to ip to set routes and links, and this
does not work anymore.

As a workaround, do not drop caps if CAP_NET_ADMIN (the most common
capability used by ip) is set with the INHERITABLE flag.
Users that want ip vrf exec to work do not need to set INHERITABLE,
which will then only set when the calling program had privileges to
give itself the ambient capability.

Fixes: ba2fc55b99 ("Drop capabilities if not running ip exec vrf with libcap")

Signed-off-by: Luca Boccassi <bluca@debian.org>
2018-05-14 21:07:34 -07:00
David Ahern 4aba7dc4f4 Merge branch 'iproute2-master' into iproute2-next
Conflicts:
	rdma/include/uapi/rdma/rdma_netlink.h

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-09 21:04:16 -07:00
GhantaKrishnamurthy MohanKrishna 7d40bdbc8d tipc: Add support to set and get MTU for UDP bearer
In this commit we introduce the ability to set and get
MTU for UDP media and bearer.

For set and get properties such as tolerance, window and priority,
we already do:

    $ tipc media set PPROPERTY media MEDIA
    $ tipc media get PPROPERTY media MEDIA

    $ tipc bearer set OPTION media MEDIA ARGS
    $ tipc bearer get [OPTION] media MEDIA ARGS

The same has been extended for MTU, with an exception to support
only media type UDP.

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-09 20:53:32 -07:00
David Ahern fd95ec0e8e Update kernel headers
Update kernel headers to commit 53a7bdfb2a27
("dt-bindings: dsa: Remove unnecessary #address/#size-cells")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-09 20:52:52 -07:00
Stephen Hemminger 10f687736b ss: remove non-functional slabinfo
Ss was using slabinfo to try and intuit TCP statistics.
The slabinfo changed several times since 2.4 and all these statistics
are broken by renames and slab merging. Plus slabinfo does not exist
at all if kernel is compiled with SLUB option.

Rather than trying to fix kernel, just trim away the no longer
valid statistics.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-09 13:57:08 -07:00
Stephen Hemminger 9b2ab68516 rdma: add ib header files
The iproute2 header files must be complete to allow builds on
other places where some of the headers are not present.

For example, iproute2 is built on Windows Services for Linux
as a test tool. With the partial addition of rdma it was broken.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-09 08:14:55 -07:00
Stephen Hemminger 36eece51e3 rdma: align headers with upstream
This makes rdma/include/uapi/rdma headers align with those produced
by doing make headers_install from upstream (Linus) tree.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-09 08:12:13 -07:00
Jakub Kicinski 0c0394ff83 bpf: don't offload perf array maps
Perf arrays are handled specially by the kernel, don't request
offload even when used by an offloaded program.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-05 11:08:00 -07:00
David Ahern 7732148d1d Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-05 11:07:47 -07:00
Ido Schimmel c0ec7c9f87 iproute: Parse last nexthop in a multipath route
Continue parsing a multipath payload as long as another nexthop can fit
in the payload.

# ip route add 192.0.2.0/24 nexthop dev dummy0 nexthop dev dummy1

Before:
# ip route show 192.0.2.0/24
192.0.2.0/24
        nexthop dev dummy0 weight 1

After:
# ip route show 192.0.2.0/24
192.0.2.0/24
        nexthop dev dummy0 weight 1
        nexthop dev dummy1 weight 1

Fixes: f48e14880a ("iproute: refactor multipath print")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-01 19:29:44 -07:00
Baruch Siach 37bf5c6fcb arpd: remove pthread dependency
Explicit link with pthread is not needed when linking dynamically. Even
static link with recent libdb does not pull in the code that uses
pthread. Finally, the configure check introduced in commit a25df4887d
(configure: Check for Berkeley DB for arpd compilation) does not add
-lpthread to its link command.

This change allows arpd build with toolchains that do not provide
threads support.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-01 19:29:03 -07:00
Baruch Siach 3b07981a27 README: update libdb build dependency information
Debian does not distribute libdb4.x-dev for quite some time now. Current
stable carries libdb5.3-dev. Update the wording accordingly.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-01 19:29:03 -07:00
Toke Høiland-Jørgensen 4db2ff0db4 json_print: Fix hidden 64-bit type promotion
print_uint() will silently promote its variable type to uint64_t, but there
is nothing that ensures that the format string specifier passed along with
it fits (and the function name suggest to pass "%u").

Fix this by changing print_uint() to use a native 'unsigned int' type, and
introduce a separate print_u64() function for printing 64-bit values. All
call sites that were actually printing 64-bit values using print_uint() are
converted to use print_u64() instead.

Since print_int() was already using native int types, just add a
print_s64() to match, but don't convert any call sites. For symmetry,
also add a print_luint() method (with no users).

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-25 11:08:55 -07:00
Toke Høiland-Jørgensen bf717756b5 ingress: Don't break JSON output
The dash printed by the ingress qdisc breaks JSON output, so only print it
in regular output mode.

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-25 11:08:39 -07:00
Hangbin Liu 8f01001abc vxlan: add ttl auto in help message
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-23 19:43:46 -07:00
Sabrina Dubroca 7f520601f5 gre/gre6: allow clearing {,i,o}{key,seq,csum} flags
Currently, iproute allows setting those flags, but it's impossible to
clear them, since their current value is fetched from the kernel and
then we OR in the additional flags passed on the command line.

Add no* variants to allow clearing them.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-23 19:42:58 -07:00
Sabrina Dubroca d21c028cf7 man: ip link: document GRE tunnels
GRE tunnels are currently only documented together with IPIP and SIT
tunnels, but they actually have very different configuration
options. Let's separate them.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-23 19:42:44 -07:00
David Ahern 0d93d1e736 Merge branch 'master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-23 19:42:21 -07:00
Jakub Kicinski f5393225f9 iplink_geneve: correct size of message to avoid spurious errors
Commit 6c4b672738 ("iplink_geneve: Get rid of inet_get_addr()")
inadvertently changed the parameter to addattr_l() resulting in:

addattr_l ERROR: message exceeded bound of 4

when remote is specified.

Fixes: 6c4b672738 ("iplink_geneve: Get rid of inet_get_addr()")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
2018-04-20 10:39:53 -07:00
Stephen Hemminger 260a92afe6 bpf: fix warnings on gcc-8 about string truncation
In theory, the path for BPF could exceed the 4K PATH_MAX.
In practice, not really possible. But shut up gcc.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-20 10:38:00 -07:00
Roman Mashak 0aaf62fcb6 tc: return on invalid smac or dmac in ife action
Return on invalid smac/dmac and use invarg consistently for invalid
arguments report.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2018-04-20 10:35:21 -07:00
Stephen Hemminger 0b01f088ee flower: use 16 bit format where possible
Should use print_hu not print_uint for 16 bit value.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-20 10:35:00 -07:00
Stephen Hemminger 7cd3f08b6f ipneigh: fix missing format specifier
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-20 10:33:22 -07:00
Hangbin Liu 5b1c363c7b vxlan: fix ttl inherit behavior
Like kernel net-next commit 72f6d71e491e6 ("vxlan: add ttl inherit support"),
vxlan ttl inherit should means inherit the inner protocol's ttl value.

But currently when we add vxlan with "ttl inherit", we only set ttl 0,
which is actually use whatever default value instead of inherit the inner
protocol's ttl value.

To make a difference with ttl inherit and ttl == 0, we add an attribute
IFLA_VXLAN_TTL_INHERIT when "ttl inherit" specified. And use "ttl auto"
to means "use whatever default value", the same behavior with ttl == 0.

Reported-by: Jianlin Shi <jishi@redhat.com>
Suggested-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-19 11:11:27 -07:00
David Ahern 075bf62a70 Update kernel headers
Update kernel headers to commit 292eba02dbb4
("net-next/hinic: add arm64 support")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-19 11:10:27 -07:00
David Ahern d42c7891d2 utils: Do not reset family for default, any, all addresses
Thomas reported a change in behavior with respect to autodectecting
address families. Specifically, 'ip ro add default via fe80::1'
syntax was failing to treat fe80::1 as an IPv6 address as it did in
prior releases. The root causes appears to be a change in family when
the default keyword is parsed.

'default', 'any' and 'all' are relevant outside of AF_INET. Leave the
family arg as is for these when setting addr.

Fixes: 93fa12418d ("utils: Always specify family and ->bytelen in get_prefix_1()")
Reported-by: Thomas Deutschmann <whissi@gentoo.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Serhey Popovych <serhe.popovych@gmail.com>
2018-04-16 17:00:48 -07:00
Jakub Sitnicki ee53b42fd8 iproute: Abort if nexthop cannot be parsed
Attempt to add a multipath route where a nexthop definition refers to a
non-existent device causes 'ip' to crash and burn due to stack buffer
overflow:

  # ip -6 route add fd00::1/64 nexthop dev fake1
  Cannot find device "fake1"
  Cannot find device "fake1"
  Cannot find device "fake1"
  ...
  Segmentation fault (core dumped)

Don't ignore errors from the helper routine that parses the nexthop
definition, and abort immediately if parsing fails.

Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
2018-04-16 16:58:38 -07:00
Roman Mashak 8744c5d338 tc: jsonify ife action
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-15 17:23:17 -07:00
Roman Mashak 7b17701717 tc: jsonify skbedit action
v2:
   FIxed strings format in print_string()

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-15 17:09:16 -07:00
Stephen Hemminger 811ee8943c uapi/sctp: update header from 4.17-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-10 10:50:00 -07:00
Stephen Hemminger b7d3a4f009 uapi/tipc: update header from 4.17-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-10 10:49:41 -07:00
Stephen Hemminger dcf7997bcd uapi/bpf: update kernel header from 4.17-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-10 10:48:56 -07:00
Guillaume Nault ef36717816 bridge: fix typo in hairpin error message
No 'g' to hairpin.

Fixes: 64108901b7 ("bridge: Add support for setting bridge port attributes")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-09 11:17:50 -07:00
Roman Mashak 8feb516bfc tc: jsonify tunnel_key action
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-08 10:52:33 -07:00
Roman Mashak 1d3c91a7c4 tc: jsonify connmark action
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-08 10:52:32 -07:00
Leon Romanovsky 1525942736 rdma: Print net device name and index for RDMA device
The RDMA devices are operated in RoCE and iWARP modes have net device
underneath. Present their names in regular output and their net index
in detailed mode.

[root@nps ~]# rdma link show mlx5_3/1
4/1: mlx5_3/1: state ACTIVE physical_state LINK_UP netdev ens7
[root@nps ~]# rdma link show mlx5_3/1 -d
4/1: mlx5_3/1: state ACTIVE physical_state LINK_UP netdev ens7 netdev_index 7
    caps: <CM, IP_BASED_GIDS>

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-06 09:02:32 -07:00
David Ahern 09e3802b9a Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-06 09:02:02 -07:00
Guillaume Nault 458539ad35 l2tp: no need to export session offsets in JSON output
The offset and peer_offset parameters are only printed to avoid
confusing external scripts that may parse "ip l2tp show session"
output. There's no reason to keep them in JSON.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
2018-04-05 12:43:23 -07:00
Yuval Mintz 0927bf83e7 tc: Correct json output for actions
Commit 9fd3f0b255 ("tc: enable json output for actions") added JSON
support for tc-actions at the expense of breaking other use cases that
reach tc_print_action(), as the latter don't expect the 'actions' array
to be a new object.

Consider the following taken duringrun of tc_chain.sh selftest,
and see the latter command output is broken:

$ ./tc/tc -j -p actions list action gact | grep -C 3 actions
[ {
        "total acts": 1
    },{
        "actions": [ {
                "order": 0,

$ ./tc/tc -p -j -s filter show dev enp3s0np2 ingress | grep -C 3 actions
            },
            "skip_hw": true,
            "not_in_hw": true,{
                "actions": [ {
                        "order": 1,
                        "kind": "gact",
                        "control_action": {

Relocate the open/close of the JSON object to declare the object only
for the case that needs it.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Tested-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-04 16:41:36 -07:00
Guillaume Nault 2f75c5cf1a ip/l2tp: remove offset and peer-offset options
Ignore options "peer-offset" and "offset" when creating sessions. Keep
them when dumping sessions in order to avoid breaking external scripts.

"peer-offset" has always been a noop in iproute2. "offset" is now
ignored in Linux 4.16 (and was broken before that).

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-04 16:41:11 -07:00
Leon Romanovsky fda0a61dde rdma: Ignore unknown netlink attributes
The check if netlink attributes supplied more than maximum supported
is to strict and may lead to backward compatibility issues with old
application with a newer kernel that supports new attribute.

CC: Steve Wise <swise@opengridcomputing.com>
Fixes: 74bd75c2b6 ("rdma: Add basic infrastructure for RDMA tool")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-04 16:39:58 -07:00
David Ahern 2c62a64d60 Merge branch 'iproute2-master' into iproute2-next
Conflicts:
	bridge/mdb.c
	misc/ss.c
	tc/tc.c

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-02 10:47:34 -07:00
Stephen Hemminger 4b6c4177ee v4.16.0 2018-04-02 10:06:08 -07:00
Jiri Pirko 6b4f03f518 man: fix devlink object list
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-02 09:19:59 -07:00
Stephen Hemminger 200e9d1961 uapi/if_ether: add definition of ether type field
Part of upstream commit
4bbb3e0e8239 ("net: Fix vlan untag for bridge and vlan_dev with reorder_hdr off")

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-02 09:19:08 -07:00
David Ahern 43eb8728b3 devlink: Print size of -1 as unlimited
(u64)-1  essentially means the size is unlimited. Print as 'unlimited'
as opposed to the current unsigned int range of 4294967295.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-02 07:54:18 -07:00
Roman Mashak 7ada016aeb tc: jsonify sample action
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-01 08:44:31 -07:00
Roman Mashak c2f60f5c8e tc: support oneline mode in action generic printer functions
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-01 08:37:32 -07:00
David Ahern 386e37f543 Merge branch 'rdma-res-tracking' into iproute2-next
Steve Wise  says:

====================

This series enhances the iproute2 rdma tool to include dumping of
connection manager id (cm_id), completion queue (cq), memory region (mr),
and protection domain (pd) rdma resources.  It is the user-space part of
the kernel resource tracking series merged into rdma-next for 4.17 [1]
and [2].

Changes since v3:
- replaced rdma_cma.h inclusion with UAPI rdma_user_cm.h
- display only device names instead of device/port for cq, mr, and pd
since they are not associated with a specific port.

Changes since v2:
- pull in rdma-core:include/rdma/rdma_cma.h
- 80 column reformat
- add reviewed-by tags

Changes since v1/RFC:
- removed RFC tag
- initialize rd properly to avoid passing a garbage port number
- revert accidental change to qp_valid_filters
- removed cm_id dev/network/transport types
- cm_id ip addrs now passed up as __kernel_sockaddr_storage
- cm_id ip address ports printed as "address:port" strings
- only parse/display memory keys and iova if available
- filter on "users" for cqs and pds
- fixed memory leaks
- removed PD_FLAGS attribute
- filter on "mrlen" for mrs
- filter on "poll-ctx" for cqs
- don't require addrs or qp_type for parsing cm_ids
- only filter optional attrs if they are present
- remove PGSIZE MR attr to match kernel

[1] https://www.spinics.net/lists/linux-rdma/msg61720.html
[2] https://www.spinics.net/lists/linux-rdma/msg62979.html
    https://www.spinics.net/lists/linux-rdma/msg62980.html

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-01 08:19:21 -07:00
Steve Wise 4060e4c0d2 rdma: Add PD resource tracking information
Sample output:

Without CAP_NET_ADMIN capability:

dev mlx4_0 users 0 pid 0 comm [ib_srpt]
dev mlx4_0 users 0 pid 0 comm [ib_srp]
dev mlx4_0 users 1 pid 0 comm [ib_core]
dev cxgb4_0 users 0 pid 0 comm [ib_srp]

With CAP_NET_ADMIN capability:
dev mlx4_0 local_dma_lkey 0x8000 users 0 pid 0 comm [ib_srpt]
dev mlx4_0 local_dma_lkey 0x8000 users 0 pid 0 comm [ib_srp]
dev mlx4_0 local_dma_lkey 0x8000 users 1 pid 0 comm [ib_core]
dev cxgb4_0 local_dma_lkey 0x0 users 0 pid 0 comm [ib_srp]

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-01 08:19:01 -07:00
Steve Wise 8958a15c04 rdma: Add MR resource tracking information
Sample output:

Without CAP_NET_ADMIN:

$ rdma resource show mr mrlen 65536
dev mlx4_0 mrlen 65536 pid 0 comm [nvme_rdma]
dev cxgb4_0 mrlen 65536 pid 0 comm [nvme_rdma]

With CAP_NET_ADMIN:

# rdma resource show mr mrlen 65536
dev mlx4_0 rkey 0x12702 lkey 0x12702 iova 0x85724a000 mrlen 65536 pid 0 comm [nvme_rdma]
dev cxgb4_0 rkey 0x68fe4e9 lkey 0x68fe4e9 iova 0x835b91000 mrlen 65536 pid 0 comm [nvme_rdma]

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-01 08:18:56 -07:00
Steve Wise b0b8e32cbf rdma: Add CQ resource tracking information
Sample output:

# rdma resource show cq
dev cxgb4_0 cqe 46 users 2 pid 30503 comm rping
dev cxgb4_0 cqe 46 users 2 pid 30498 comm rping
dev mlx4_0 cqe 63 users 2 pid 30494 comm rping
dev mlx4_0 cqe 63 users 2 pid 30489 comm rping
dev mlx4_0 cqe 1023 users 2 poll_ctx WORKQUEUE pid 0 comm [ib_core]

# rdma resource show cq pid 30489
dev mlx4_0 cqe 63 users 2 pid 30489 comm rping

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-01 08:18:51 -07:00
Steve Wise 9a362cc71a rdma: Add CM_ID resource tracking information
Sample output:

# rdma resource
2: cxgb4_0: pd 5 cq 2 qp 2 cm_id 3 mr 7
3: mlx4_0: pd 7 cq 3 qp 3 cm_id 3 mr 7

# rdma resource show cm_id
link cxgb4_0/- lqpn 0 qp-type RC state LISTEN ps TCP pid 30485 comm rping src-addr 0.0.0.0:7174
link cxgb4_0/2 lqpn 1048 qp-type RC state CONNECT ps TCP pid 30503 comm rping src-addr 172.16.2.1:7174 dst-addr 172.16.2.1:38246
link cxgb4_0/2 lqpn 1040 qp-type RC state CONNECT ps TCP pid 30498 comm rping src-addr 172.16.2.1:38246 dst-addr 172.16.2.1:7174
link mlx4_0/- lqpn 0 qp-type RC state LISTEN ps TCP pid 30485 comm rping src-addr 0.0.0.0:7174
link mlx4_0/1 lqpn 539 qp-type RC state CONNECT ps TCP pid 30494 comm rping src-addr 172.16.99.1:7174 dst-addr 172.16.99.1:43670
link mlx4_0/1 lqpn 538 qp-type RC state CONNECT ps TCP pid 30492 comm rping src-addr 172.16.99.1:43670 dst-addr 172.16.99.1:7174

# rdma resource show cm_id dst-port 7174
link cxgb4_0/2 lqpn 1040 qp-type RC state CONNECT ps TCP pid 30498 comm rping src-addr 172.16.2.1:38246 dst-addr 172.16.2.1:7174
link mlx4_0/1 lqpn 538 qp-type RC state CONNECT ps TCP pid 30492 comm rping src-addr 172.16.99.1:43670 dst-addr 172.16.99.1:7174

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-01 08:18:47 -07:00
Steve Wise 80c0478fdf rdma: initialize the rd struct
Initialize the rd struct so port_idx is 0 unless set otherwise.
Otherwise, strict_port queries end up passing an uninitialized PORT
nlattr.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-01 08:18:43 -07:00
Steve Wise 8d61311611 rdma: add UAPI rdma_user_cm.h
This allows parsing rdma_cm_id UAPI values.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-01 08:18:38 -07:00
Steve Wise 29122c1aae rdma: update rdma_netlink.h
Pull in the latest rdma_netlink.h which has support for
the rdma nldev resource tracking objects being added
with this patch series.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-01 08:18:20 -07:00
Roman Mashak 9fd3f0b255 tc: enable json output for actions
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-30 08:55:17 -07:00
Roman Mashak 6e8634eb13 tc: add oneline mode
Add initial support for oneline mode in tc; actions, filters and qdiscs
will be gradually updated in the follow-up patches.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-30 08:18:58 -07:00
David Ahern 8c5bf7f0e6 Merge branch 'tipc-addr' into iproute2-next
Jon Maloy  says:

====================

1: We introduce ability to set/get 128-bit node identities
2: We rename 'net id' to 'cluster id' in the command API,
   of course in a compatible way.
3: We print out all 32-bit node addresses as an integer in hex format,
   i.e., we remove the assumption about an internal structure.
====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-29 10:50:30 -07:00
Alexander Zubkov c121807250 arrange prefix parsing code after redundant patches
A problem was reported with parsing of prefixes all/any/default.
Commit 7696f1097f fixes the problem,
but there were also other pathces applied:
00b31a6b2e, which were intended to
fix the same problem. And they became redundant now. This patch
reverts changes introduced by those redundant patches.

Signed-off-by: Alexander Zubkov <green@msu.ru>
2018-03-29 08:42:04 -07:00
Stephen Hemminger 89e3c36b06 namespace: limit the length of namespace name to avoid snprintf overflow
This fixes problem reported by gcc-8

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-29 08:40:26 -07:00
Stephen Hemminger 08a93b32f5 bpf: avoid compiler warnings about strncpy
Use strlcpy to avoid cases where sizeof(buf) == strlen(buf)

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-29 08:32:48 -07:00
Stephen Hemminger da8034a019 misc: avoid snprintf warnings in ss and nstat
Gcc 8 checks that target buffer is big enough.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-29 08:32:43 -07:00
Stephen Hemminger d5732e3470 ematch: fix possible snprintf overflow
Fixes gcc 8 warning about possible snprint overflow

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-29 08:32:43 -07:00
Stephen Hemminger b8a6088e13 tc_class: fix snprintf warning
Size buffer big enough to avoid any possible overflow.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-29 08:32:43 -07:00
Stephen Hemminger fcb18aa3d9 tunnel: use strlcpy to avoid strncpy warnings
Fixes warnings about strncpy size by using strlcpy.

tunnel.c: In function ‘tnl_gen_ioctl’:
tunnel.c:145:2: warning: ‘strncpy’ specified bound
 16 equals destination size [-Wstringop-truncation]
  strncpy(ifr.ifr_name, name, IFNAMSIZ);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-29 08:30:28 -07:00
Stephen Hemminger fc9d755a3e ip: use strlcpy() to avoid truncation
This fixes gcc-8 warnings about strncpy bounds by using
strlcpy instead.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-29 08:30:28 -07:00
Stephen Hemminger 95744efac4 pedit: fix strncpy warning
Newer versions of Gcc warn about string truncation.
Fix by using strlcpy.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-29 08:30:28 -07:00
Stephen Hemminger 6c6c0291d2 bridge: avoid snprint truncation on time
This fixes new gcc warning about possible string overflow.

mdb.c: In function ‘__print_router_port_stats’:
mdb.c:61:11: warning: ‘%.2i’ directive output may be truncated
 writing between 2 and 7 bytes into a region of size
 between 0 and 4 [-Wformat-truncation=]
      "%4i.%.2i", (int)tv.tv_sec,
           ^~~~
Note: already fixed in iproute2-next.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-29 08:30:27 -07:00
Jon Maloy 5aad0baa3d tipc: change node address printout formats
Since a node address now per definition is only an unstructured 32-bit
integer it makes no sense print it out as a structured string.

In this commit, we replace all occurrences of "<Z.C.N>" printouts with
just an "%x".

Acked-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-28 20:41:15 -07:00
Jon Maloy 725ebfbf62 tipc: introduce command for handling a new 128-bit node identity
We add the possibility to set and get a 128 bit node identifier, as
an alternative to the legacy 32-bit node address we are using now.

We also add an option to set and get 'clusterid' in the node. This
is the same as what we have so far called 'netid' and performs the
same operations. For compatibility the old 'netid' commands are
retained, -we just remove them from the help texts.

Acked-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-28 20:38:52 -07:00
Stephen Hemminger 98453b6580 ip/l2tp: add JSON support
Convert ip l2tp to use JSON output routines.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-28 20:37:00 -07:00
Stephen Hemminger 1f483fc618 ip/ila: support json and color
Use json print to enhance ila output.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-28 20:36:58 -07:00
David Ahern 083d782718 Merge branch 'tipc-stats' into iproute2-next
GhantaKrishnamurthy MohanKrishna
         says:

====================

The following patchset add user space TIPC socket diagnostics support
in ss tool of iproute2. It requires the sock_diag framework
for AF_TIPC support in the kernel, commit id: c30b70deb5f
(tipc: implement socket diagnostics for AF_TIPC).

tipc socket stats are requested with the "--tipc" option. Additional
tipc specific info are requested with "--tipcinfo" option.

This patchset is based on top of iproute2 v4.15.0-100-g4f63187
commitid: f85adc6. It has been co-authored by
Parthasarathy Bhuvaragan.

Example output (the first socket is the internal topology server)

State  Recv-Q  Send-Q     Local Address:Port           Peer Address:Port
UNCONN 0       0               16781313:2809484547                 -             ino:13348 sk:4 users:(("tipc-pipe",pid=292,fd=3))
LISTEN 0       0               16781313:4117673024                 -             ino:13346 sk:5 users:(("tipc-pipe",pid=291,fd=3))
ESTAB  0       0               16781313:484097386          16781313:3203149317   ino:13345 sk:6 users:(("tipc-pipe",pid=294,fd=4))
LISTEN 0       0               16781313:2438310591                 -             ino:13344 sk:7 users:(("tipc-pipe",pid=294,fd=3),("tipc-pipe",pid=290,fd=3))
LISTEN 0       0               16781313:2658440413                 -             ino:12368 sk:3
ESTAB  0       0               16781313:3203149317         16781313:484097386    ino:13349 sk:8 users:(("tipc-pipe",pid=293,fd=3))

State  Recv-Q  Send-Q     Local Address:Port           Peer Address:Port
UNCONN 0       0               16781313:2809484547                 -
type:RDM cong:none  drop:0  publ
LISTEN 0       0               16781313:4117673024                 -
type:SEQPACKET cong:none  drop:0  publ
ESTAB  0       0               16781313:484097386          16781313:3203149317
type:STREAM cong:none  drop:0  via {1000,1000}
LISTEN 0       0               16781313:2438310591                 -
type:STREAM cong:none  drop:0  publ
LISTEN 0       0               16781313:2658440413                 -
type:SEQPACKET cong:none  drop:0  publ
ESTAB  0       0               16781313:3203149317         16781313:484097386
type:STREAM cong:none  drop:0  via {1000,1000}

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-28 20:28:58 -07:00
GhantaKrishnamurthy MohanKrishna 5caf79a0bc ss: Add support for TIPC socket diag in ss tool
For iproute 4.x
Allow TIPC socket statistics to be dumped with --tipc
and tipc specific info with --tipcinfo.

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-28 20:28:06 -07:00
David Ahern 9effc146b7 Update kernel headers
Update kernel headers to commit 5d22d47b9ed9
("Merge branch 'sfc-filter-locking'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-28 20:26:25 -07:00
Stephen Hemminger 83b3c60544 rdma: fix man page typos
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-28 11:06:55 -07:00
Phil Sutter 3e1652c94c ss: Drop filter_default_dbs()
Instead call filter_db_parse(..., "all"). This eliminates the duplicate
default DB definition.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2018-03-27 17:02:38 -07:00
Phil Sutter 67d5fd5587 ss: Put filter DB parsing into a separate function
Use a table for database name parsing. The tricky bit is to allow for
association of a (nearly) arbitrary number of DBs with each name.
Luckily the number is not fully arbitrary as there is an upper bound of
MAX_DB items. Since it is not possible to have a variable length
array inside a variable length array, use this knowledge to make the
inner array of fixed length. But since DB values start from zero, an
explicit end entry needs to be present as well, so the inner array has
to be MAX_DB + 1 in size.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2018-03-27 17:02:38 -07:00
Phil Sutter c121111ecb ss: Allow excluding a socket table from being queried
The original problem was that a simple call to 'ss' leads to loading of
sctp_diag kernel module which might not be desired. While searching for
a workaround, it became clear how inconvenient it is to exclude a single
socket table from being queried.

This patch allows to prefix an item passed to '-A' parameter with an
exclamation mark to inverse its meaning.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2018-03-27 17:02:38 -07:00
Roman Mashak d64a22f393 tc: print index, refcnt & bindcnt for nat action
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2018-03-27 17:00:32 -07:00
Stephen Hemminger fec62c0ec7 tc: help and whitespace cleanup
Break long lines, and cleanup usage message.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-27 15:33:13 -07:00
David Ahern 54eae5f76d Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-27 12:33:02 -07:00
Luca Boccassi ba2fc55b99 Drop capabilities if not running ip exec vrf with libcap
ip vrf exec requires root or CAP_NET_ADMIN, CAP_SYS_ADMIN and
CAP_DAC_OVERRIDE. It is not possible to run unprivileged commands like
ping as non-root or non-cap-enabled due to this requirement.
To allow users and administrators to safely add the required
capabilities to the binary, drop all capabilities on start if not
invoked with "vrf exec".
Update the manpage with the requirements.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-27 11:48:23 -07:00
Phil Sutter b2038cc0b2 ssfilter: Eliminate shift/reduce conflicts
The problematic bit was the 'expr: expr expr' rule. Fix this by making
'expr' token represent a single filter only and introduce a new token
'exprlist' to represent a combination of filters.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2018-03-27 11:41:08 -07:00
Phil Sutter 8ee38d833c man: tc-vlan.8: Fix for incorrect example
This has to be a second match statement to the same u32 filter, not a
second one (which tc-filter doesn't support at all).

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-27 09:13:28 -07:00
Jiri Pirko da7a1aa7da devlink: fix port new monitoring message typo
s/net/new/

Fixes: a3c4b484a1 ("add devlink tool")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-27 09:13:09 -07:00
Stefano Brivio 32ea3d54b4 ss: Fix rendering of continuous output (-E, --events)
Roman Mashak reported that ss currently shows no output when it
should continuously report information about terminated sockets
(-E, --events switch).

This happens because I missed this case in 691bd854bf ("ss:
Buffer raw fields first, then render them as a table") and the
rendering function is simply not called.

To fix this, we need to:

- call render() every time we need to display new socket events
  from generic_show_sock(), which is only used to follow events.
  Always call it even if specific socket display functions
  return errors to ensure we clean up buffers

- get the screen width every time we have new events to display,
  thus factor out getting the screen width from main() into a
  function we'll call whenever we calculate columns width

- reset the current field pointer after rendering, more output
  might come after render() is called

Reported-by: Roman Mashak <mrv@mojatatu.com>
Fixes: 691bd854bf ("ss: Buffer raw fields first, then render them as a table")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-27 09:09:38 -07:00
Phil Sutter 79f49f58aa man: ip-route.8: ssthresh parameter is NUMBER
Synopsis section was inconsistent with regards to help text and later
description of ssthresh parameter.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2018-03-27 09:07:16 -07:00
Roman Mashak 990b1d90d7 tc: print actual action for connmark action
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2018-03-27 09:03:15 -07:00
Stephen Hemminger 00b31a6b2e Merge branch 'revert' 2018-03-27 08:58:36 -07:00
Alexander Zubkov 7696f1097f treat "default" and "all"/"any" addresses differenty
Debian maintainer found that basic command:
	# ip route flush all
No longer worked as expected which breaks user scripts and
expectations. It no longer flushed all IPv4 routes.

Recently behavior of "default" prefix parameter was corrected. But at
the same time behavior of "all"/"any" was altered too, because they
were the same branch of the code. As those parameters mean different,
they need to be treated differently in code too. This patch reflects
the difference.

Also after mentioned change, address parsing code was changed more
and address family was set explicitly even for "all"/"any" addresses.
And that broke matching conditions further. This patch fixes that too
and returns AF_UNSPEC to "all"/"any" address.

Now "default" is treated as top-level prefix (for example 0.0.0.0/0 in
IPv4) and "all"/"any" always matches anything in exact, root and match
modes.

Reported-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Alexander Zubkov <green@msu.ru>
2018-03-27 08:58:26 -07:00
Roi Dayan 17504be81d tc: Fix compilation error with old iptables
The compat_rev field does not exists in old versions of iptables.
e.g. iptables 1.4.

Fixes: dd29621578 ("tc: add em_ipt ematch for calling xtables matches from tc matching context")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-27 06:38:52 -07:00
Leon Romanovsky 2c6962cfaf rdma: Move RDMA UAPI header file to be under RDMA responsibility
In iproute2 package, the updates of UAPIs files are performed
after the needed feature lands in kernel's net-next tree.

Such development flow created delays to the rdma tool developers,
who uses rdma-next tree as a basis for their work.

Move RDMA UAPI file to be under rdma/ folder, so whole responsibility
of syncing this file will be on them.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-26 07:02:19 -07:00
Roopa Prabhu b4f84bf8c9 bridge: add option extern_learn to set NTF_EXT_LEARNED on fdb entries
NTF_EXT_LEARNED can be set by a user on bridge fdb entry.
Provide a bridge command option to allow a user to set
NTF_EXT_LEARNED on a bridge fdb entry.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-20 08:24:51 -07:00
Alexander Zubkov b8d2619989 treat "default" and "all"/"any" addresses differenty
Debian maintainer found that basic command:
	# ip route flush all
No longer worked as expected which breaks user scripts and
expectations. It no longer flushed all IPv4 routes.

Recently behavior of "default" prefix parameter was corrected. But at
the same time behavior of "all"/"any" was altered too, because they
were the same branch of the code. As those parameters mean different,
they need to be treated differently in code too. This patch reflects
the difference.

Also after mentioned change, address parsing code was changed more
and address family was set explicitly even for "all"/"any" addresses.
And that broke matching conditions further. This patch fixes that too
and returns AF_UNSPEC to "all"/"any" address.

Now "default" is treated as top-level prefix (for example 0.0.0.0/0 in
IPv4) and "all"/"any" always matches anything in exact, root and match
modes.

Reported-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Alexander Zubkov <green@msu.ru>
2018-03-19 09:17:28 -07:00
Roman Mashak bf7d148803 tc: use get_u32() in psample action to match types
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Acked-by: Yotam Gigi <yotam.gi@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-16 13:38:50 -07:00
Roman Mashak e9fa16583a tc: print actual action for sample action
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-16 13:38:38 -07:00
Toke Høiland-Jørgensen 997f2dc193 tc: Add JSON output of fq_codel stats
Enable proper JSON output support for fq_codel in `tc -s qdisc` output.

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-13 18:05:40 -07:00
Toke Høiland-Jørgensen d7d044ff53 tc: Add missing documentation for codel and fq_codel parameters
Add missing documentation of the memory_limit fq_codel parameter and the
ce_threshold codel and fq_codel parameters.

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-13 18:05:35 -07:00
Pieter Jansen van Vuuren fb4e6abfca tc: f_flower: Add support for matching first frag packets
Add matching support for distinguishing between first and later fragmented
packets.

 # tc filter add dev eth0 protocol ip parent ffff: \
     flower indev eth0 \
	ip_flags firstfrag \
        ip_proto udp \
    action mirred egress redirect dev eth1

 # tc filter add dev eth0 protocol ip parent ffff: \
     flower indev eth0 \
	ip_flags nofirstfrag \
        ip_proto udp \
    action mirred egress redirect dev eth1

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-13 18:03:21 -07:00
David Ahern 4de0a06b34 Update kernel headers
Update kernel headers to commit a870a02cc963
("pktgen: use dynamic allocation for debug print buffer")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-13 17:59:59 -07:00
David Ahern e9625d6aea Merge branch 'iproute2-master' into iproute2-next
Conflicts:
	bridge/mdb.c

Updated bridge/bridge.c per removal of check_if_color_enabled by commit
1ca4341d2c ("color: disable color when json output is requested")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-13 17:48:10 -07:00
Stephen Hemminger 96303c25ee Revert "iproute: "list/flush/save default" selected all of the routes"
This reverts commit 9135c4d603.

Debian maintainer found that basic command:
	# ip route flush all
No longer worked as expected which breaks user scripts and
expectations. It no longer flushed all IPv4 routes.

Reported-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-12 14:02:36 -07:00
David Ahern a121129df9 Merge branch 'mcast-json' into iproute2-next
Stephen Hemminger  says:

====================

From: Stephen Hemminger <sthemmin@microsoft.com>

Some more JSON support and report better error if kernel
is configured without multicast.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-11 18:53:36 -07:00
Stephen Hemminger e06e9a6bac ipmroute: better error message if no kernel mroute
If kernel does not support the IP multicast address family,
then it will report all routes (PF_UNSPEC).
Give the user a better error message and abort the command.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-11 18:52:34 -07:00
Stephen Hemminger 0f1475c268 ipmroute: convert to output JSON
Should be no change for non-json case except putting color
on address if desired.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-11 18:52:23 -07:00
Stephen Hemminger 311dca0aa0 ipmaddr: json and color support
Support printing mulitcast addresses in json and color mode.
Output format is unchanged for normal use.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-11 18:52:06 -07:00
David Ahern bea42e6c24 Merge branch 'iplink-parse' into iproute2-next
Serhey Popovych  says:

====================

This is main routine to parse ip-link(8) configuration parameters.

Move all code related to command line parsing and validation to it from
iptables_modify(). As benefit we reduce number of arguments as well as
checking for most of weired cases in single place to give benefit to
iptables_parse() users.

See individual patch description message for more information.

v4
  Drop patches intended to reduce number of arguments to
  iptables_parse(): postpone to the series with real use cases.

  Save only ifi_index in iplink_vxcan.c and link_veth.c: no need
  to save whole ifinfomsg data structure.

  Note that there is no sense to introduce custom version of
  iplink_parse() to use in iplink_vxcan.c and link_veth.c because
  there is too much parameters we need to support (except VF and
  few others) making huge code duplication.

v3
  Move vxlan/veth ifinfomsg save/restore to separate patch to
  make clear change that perform most of request buffer setups
  and checks in iplink_parse().

  Update commit message descriptions and extra new line from
  "utils: Introduce and use nodev() helper routine" patch.

v2
  Terminate via exit() when failing to parse command line arguments
  to help identify failing line in batch mode.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-11 18:46:07 -07:00
Serhey Popovych c58213f69c iplink: Perform most of request buffer setups and checks in iplink_parse()
To benefit other users (e.g. link_veth.c) of iplink_parse() from
additional attribute checks and setups made in iplink_modify(). This
catches most of weired cobination of parameters to peer device
configuration.

Drop @name, @dev, @link, @group and @index from iplink_parse() parameters
list: they are not needed outside.

While there change return -1 to exit(-1) for group parsing errors: we
want to stop further command processing unless -force option is given
to get error line easily.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2018-03-11 17:59:03 -07:00
Serhey Popovych b06a29603a iplink: Follow documented behaviour when "index" is given
Both ip-link(8) and error message when "index" parameter is given for
set/delete case says that index can only be given during network
device creation.

Follow this documented behaviour and get rid of ambiguous behaviour in
case of both "dev" and "index" specified for ip link delete scenario
(actually "index" being ignored in favor to "dev").

Prohibit "index" when configuring/deleting group of network devices.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2018-03-11 17:58:56 -07:00
Serhey Popovych a24315ba46 iplink: Use "dev" and "name" parameters interchangeable when possible
Both of them accept network device name as argument, but have different
meaning:

  dev  - is a device by it's name,
  name - name for specific device.

The only case where they treated separately is network device rename
case where need to specify both ifindex and new name. In rest of the
cases we can assume that dev == name.

With this change we do following:

  1) Kill ambiguity with both "dev" and "name" parameters given the same
     name:

       ip link {add|set} dev veth100a name veth100a ...

  2) Make sure we do not accept "name" more than once.

  3) For VF and XDP treat "name" as "dev". Fail in case of "dev" is
     given after VF and/or XDP parsing.

  4) Make veth and vxcan to accept both "name" and "dev" as their peer
     parameters, effectively following general ip-link(8) utility
     behaviour on link create:

       ip link add {name|dev} veth1a type veth peer {name|dev} veth1b

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2018-03-11 17:58:51 -07:00
Serhey Popovych fe99adbca4 utils: Introduce and use nodev() helper routine
There is a couple of places where we report error in case of no network
device is found. In all of them we output message in the same format to
stderr and either return -1 or 1 to the caller or exit with -1.

Introduce new helper function nodev() that takes name of the network
device caused error and returns -1 to it's caller. Either call exit()
or return to the caller to preserve behaviour before change.

Use -nodev() in traffic control (tc) code to return 1.

Simplify expression for checking for argument being 0/NULL in @if
statement.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2018-03-11 17:58:36 -07:00
Tariq Toukan 527f85141c ip-address: Fix negative prints of large TX rate limits
TX rate limit fields are unsigned (__u32).
Use %u and print_uint when printing.

Tested:
$ ip link set ens1 vf 1 rate 2294967296
$ ip link show |grep -iE "vf 1" | grep rate

before:
vf 1 MAC 00:00:00:00:00:00, tx rate -2000000000 (Mbps), max_tx_rate -2000000000Mbps, ...

after:
vf 1 MAC 00:00:00:00:00:00, tx rate 2294967296 (Mbps), max_tx_rate 2294967296Mbps, ...

Fixes: 3fd8663087 ("iproute2: rework SR-IOV VF support")
Fixes: 8c29ae7cc2 ("ip link: Fix crash on older kernels when show VF dev")
Fixes: f89a2a05ff ("Add support to configure SR-IOV VF minimum and maximum Tx rate through ip tool")
Fixes: ae7229d5f9 ("ip: Add support for setting and showing SR-IOV virtual funtion link params")
Fixes: d0e720111a ("ip: ipaddress.c: add support for json output")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
2018-03-10 09:00:27 -08:00
Roopa Prabhu f686f76468 iprule: support for ip_proto, sport and dport match options
add support to match on ip_proto, sport and dport ranges.
For ip_proto, this patch currently enumerates, tcp, udp and sctp.
This list can be extended in the future.

example:
$ip rule add sport 666-777 dport 999 ip_proto tcp table 100
$ip rule show
0:      from all lookup local
32765:  from all ip_proto 6 sport 666-777 dport 999 lookup 100
32766:  from all lookup main
32767:  from all lookup default

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-08 10:08:18 -08:00
Stephen Hemminger e93d922123 netns: add JSON support
Basic support for JSON output when showing network namespaces.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-08 09:53:11 -08:00
David Ahern 8c278ecad0 Update kernel headers to 4.16.0-rc4+
Update kernel headers to commit 08a24239cd46
("Merge branch 'hns3-next'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-08 09:34:05 -08:00
Leon Romanovsky f2ffa0a0ff rdma: Update device capabilities flags
In kernel commit e1d2e8873369 ("IB/core: Add PCI write
end padding flags for WQ and QP"), we introduced new
device capability to advertise PCI write end padding.

PCI write end padding is the device's ability to pad the ending of
incoming packets (scatter) to full cache line such that the last
upstream write generated by an incoming packet will be a full cache
line.

This commit updates RDMAtool to present this field.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-08 09:15:28 -08:00
Roman Mashak b80c9af8a4 tc: updated tc-bpf man page
Added description of direct-action parameter.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-07 14:55:08 -08:00
David Ahern 8966c2490f Merge branch 'macsec-json' into iproute2-next
Stephen Hemminger  says:

====================

The macsec code didn't really support JSON and had several
pieces of copy/pasted code.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-07 08:43:29 -08:00
Stephen Hemminger c0b904de62 macsec: support JSON
The JSON support in macsec code was mostly missing and what was
there was broken. This uses new json_print utilities to complete
output.

Compile tested only.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-07 08:41:43 -08:00
Stephen Hemminger d341863839 ipmacsec: collapse common code
Several places copy/paste same code for printing array of statistics.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-07 08:41:39 -08:00
Stephen Hemminger c2f260f4eb ip: macsec cleanup
Break long lines and use const as recommended by checkpatch.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-07 08:41:23 -08:00
David Ahern 65745eae83 Merge branch 'more-json' into iproute2-next
Stephen Hemminger says:

====================

The ip command implementation of JSON was very spotty. Only address
and link were originally implemented. After doing route for next,
went ahead and implemented it for a bunch of the other sub commands.

Hopefully will reach full coverage soon.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-06 15:48:22 -08:00
Stephen Hemminger 41b99db1c6 fou: support JSON output
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-06 15:39:34 -08:00
Stephen Hemminger 5c92c2eee5 fou: break long lines
Split up long lines.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-06 15:39:30 -08:00
Stephen Hemminger 689bef5dc9 tuntap: support JSON output
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-06 15:39:25 -08:00
Stephen Hemminger b62ec792a9 token: support JSON
Add JSON output to ip token command.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-06 15:39:19 -08:00
Stephen Hemminger 111f79ad38 ipsr: add json support
Add json flag to ip sr command outputs.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-06 15:39:14 -08:00
Stephen Hemminger 74498126fd tcp_metrics: add json support
Add JSON support to the ip tcp_metrics output.

$ ip -j -p tcp_metrics show
[ {
        "dst": "192.18.1.11",
        "age": 23617.8,
        "ssthresh": 7,
        "cwnd": 3,
        "rtt": 0.039176,
        "rttvar": 0.039176,
        "source": "192.18.1.2"
    }
...

The JSON output does scale values differently since there is no good
way to indicate units. The rtt values are displayed in seconds in
JSON and microseconds in the original (non JSON) mode. In the example
above the output in without the -j flag, the output would be
 ... rtt 39176us rttvar 39176us

I did this since all the other values in the JSON record are also in
floating point seconds.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-06 15:39:07 -08:00
Stephen Hemminger 8a61d8968c tcp_metrics; make tables const
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-06 15:39:02 -08:00
Stephen Hemminger 96032aaf7d ipnetconf: add JSON support
Basic JSON support for ip netconf command.
Also cleanup some checkpatch warnings about long lines.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-06 15:38:57 -08:00
Stephen Hemminger 3c1e087b05 ipntable: add json support
Add JSON (and limited color) to ip neighbor table parameter output.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-06 15:38:50 -08:00
Stephen Hemminger 0dd4ccc56c iprule: add json support
More JSON and colorizing.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-06 15:38:44 -08:00
Stephen Hemminger a7ad1c8a68 ipaddrlabel: add json support
Add missing json and color support to addrlabel display

Example:
$ ip -j -p addrlabel
[ {
        "address": "::1",
        "prefixlen": 128,
        "label": 56
    },{
        "address": "::",
        "prefixlen": 96,
        "label": 56
    },{
...

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-06 15:38:41 -08:00
Stephen Hemminger aac7f725fa ipneigh: add color and json support
Use json_print to provide json (and color) support to
ip neigh command.

Example:
$ ip -j -p neigh
[ {
        "dst": "192.168.1.29",
        "dev": "enp12s0",
        "state": [ "FAILED" ]
    },{
        "dst": "192.168.1.130",
        "dev": "enp12s0",
        "state": [ "FAILED" ]
    },{
        "dst": "192.168.1.131",
        "dev": "enp12s0",
        "lladdr": "00:15:5d:2a:16:4f",
        "state": [ "STALE" ]
    }
...

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-06 15:38:18 -08:00
Stephen Hemminger d9d8c8393e json_writer: add SPDX Identifier (GPL-2/BSD-2)
I wrote this code so put SPDX License on it and intentionally
allow use in BSD code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-06 14:39:19 -08:00
Roman Mashak 9426673910 tc: added tc monitor description in man page
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-05 15:02:12 -08:00
Davide Caratti 75ef7b18d2 tc: fix parsing of the control action
If the user didn't specify any control action, don't pop the command line
arguments: otherwise, parsing of the next argument (tipically the 'index'
keyword) results in an error, causing the following 'tc-testing' failures:

 Test a6d6: Add skbedit action with index
 Test 38f3: Delete skbedit action
 Test a568: Add action with ife type
 Test b983: Add action without ife type
 Test 7d50: Add skbmod action to set destination mac
 Test 9b29: Add skbmod action to set source mac
 Test e93a: Delete an skbmod action

Also, add missing parse for 'ok' control action to m_police, to fix the
following 'tc-testing' failure:

 Test 8dd5: Add police action with control ok

tested with:
 # ./tdc.py

test results:
 all tests ok using kernel 4.16-rc2, except 9aa8 "Get a single skbmod
 action from a list" (which is failing also before this commit)

Fixes: 3572e01a09 ("tc: util: Don't call NEXT_ARG_FWD() in __parse_action_control()")
Cc: Michal Privoznik <mprivozn@redhat.com>
Cc: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-04 09:01:38 -08:00
Jean-Philippe Brucker eb8559eff1 ss: fix NULL dereference when rendering without header
When ss is invoked with the no-header flag, if the query doesn't return
any result, render() is called with 'buffer' uninitialized. This
currently leads to a segfault. Ensure that buffer is initialized before
rendering.

The bug can be triggered with: ss -H sport = 100000

Signed-off-by: Jean-Philippe Brucker <jphilippe.brucker@gmail.com>
Acked-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-04 09:01:31 -08:00
David Ahern 3dec72672f libnetlink: __rtnl_talk_iov should only loop max iovlen times
William reported ip hanging and bisected to a recent commit for batching
allowing more than 1 command to be sent per message. The loop over
recvmsg should never cycle more than iovlen times -- 1 response for
each command in the message.

Fixes: 72a2ff3916 ("lib/libnetlink: Add a new function rtnl_talk_iov")
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-02 13:30:34 -08:00
Phil Sutter 06867c3719 ip-link: Fix use after free in nl_get_ll_addr_len()
Immediately after freeing the buffer returned from rtnl_talk(), it is
accessed again via pointer in struct rtattr array. This leads to some
builds not allowing to set an interface's MAC address because the
expected length value is garbage.

Fixes: 86bf43c7c2 ("lib/libnetlink: update rtnl_talk to support malloc buff at run time")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2018-03-02 13:29:40 -08:00
Joe Stringer a0405444f7 bpf: Print section name when hitting non ld64 issue
It's useful to be able to tell which section is being processed in the
ELF when this error is triggered, so print that detail.

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-02 13:28:53 -08:00
David Ahern 62964f1a95 Merge branch 'ip-rule-proto' into iproute2-next
Donald Sharp  says:

====================

Fix iprule.c to use the actual `struct fib_rule_hdr` and to
allow the end user to see and use the protocol keyword
for rule manipulation.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-28 19:45:56 -08:00
Donald Sharp 33f1e250ec ip: Allow rules to accept a specified protocol
Allow the specification of a protocol when the user
adds/modifies/deletes a rule.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-28 19:32:37 -08:00
Donald Sharp 7c083da77c ip: Display ip rule protocol used
Modify 'ip rule' command to notice when the kernel passes
to us the originating protocol.

Add code to allow the `ip rule flush protocol XXX`
command to be accepted and properly handled.

Modify the documentation to reflect these code changes.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-28 19:32:29 -08:00
Donald Sharp 5baaf07cb3 ip: Use the `struct fib_rule_hdr` for rules
The iprule.c code was using `struct rtmsg` as the data
type to pass into the kernel for the netlink message.
While 'struct rtmsg' and `struct fib_rule_hdr` are
the same size and mostly the same, we should use
the correct data structure.  This commit translates
the data structures to have iprule.c use the correct
one.

Additionally copy over the modified fib_rules.h file

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-28 19:32:21 -08:00
Arkadi Sharshevsky f85adc61dd devlink: Fix error reporting
The current code doesn't set errno in case of extended ack.

Fixes: 049c58539f ("devlink: mnlg: Add support for extended ack")
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-28 16:10:32 -08:00
David Ahern 7c6e942e84 Merge branch 'tc-ipt-ematch' into iproute2-next
Eyal Birger  says:

====================

This patchset extends tc to support the ipt ematch.

The first patch adds the ability for ematch cmdline parsers
to receive argc,argv parameters.
The second patch adds the em_ipt module.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-27 09:44:33 -08:00
Eyal Birger dd29621578 tc: add em_ipt ematch for calling xtables matches from tc matching context
The commit calls a new tc ematch for using netfilter xtable matches.

This allows early classification as well as mirroning/redirecting traffic
based on logic implemented in netfilter extensions.

Current supported use case is classification based on the incoming IPSec
state used during decpsulation using the 'policy' iptables extension
(xt_policy).

The matcher uses libxtables for parsing the input parameters.

Example use for matching an IPSec state with reqid 1:

tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ip parent ffff: \
    basic match 'ipt(-m policy --dir in --pol ipsec --reqid 1)' \
    action drop

This is the user-space counter part of kernel commit ccc007e4a746
("net: sched: add em_ipt ematch for calling xtables matches")

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-27 09:43:16 -08:00
Eyal Birger 526862038e tc: ematch: add parse_eopt_argv() method for providing ematches with argv parameters
ematche uses YACC to parse ematch arguments and places them in struct bstr
linked lists.

It is useful to be able to receive parameters as argc,argv in order to use
getopt (and alike) argument parsers.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-27 09:43:06 -08:00
David Ahern cb4ade6e38 Import tc_em_ipt.h from kernel at commit 08009a760213
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-27 09:42:23 -08:00
David Ahern 02ffee14ae Update kernel headers to 08009a760213
Update kernel headers to commit 08009a760213
("net: make kmem caches as __ro_after_init")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-26 13:24:38 -08:00
Sabrina Dubroca 7ba0a77b7e ip link: add json support for tun attributes
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Fixes: 118eda77d6 ("ip link: add support to display extended tun attributes")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-26 09:28:16 -08:00
Petr Machata f798a8ab52 ip: link_gre6.c: Support IP6_TNL_F_ALLOW_LOCAL_REMOTE flag
For IP-in-IP tunnels, one can specify the [no]allow-localremote command
when configuring a device. Under the hood, this flips the
IP6_TNL_F_ALLOW_LOCAL_REMOTE flag on the netdevice. However, ip6gretap
and ip6erspan devices, where the flag is also relevant, are not IP-in-IP
tunnels, and thus there's no way to configure the flag on these
netdevices. Therefore introduce the command to link_gre6 as well.

The original support was introduced in commit 21440d19d9
("ip: link_ip6tnl.c/ip6tunnel.c: Support IP6_TNL_F_ALLOW_LOCAL_REMOTE flag")

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-25 19:45:39 -08:00
Donald Sharp 728eb8d00b ip: Properly display AF_BRIDGE address information for neighbor events
The vxlan driver when a neighbor add/delete event occurs sends
NDA_DST filled with a union:

union vxlan_addr {
	struct sockaddr_in sin;
	struct sockaddr_in6 sin6;
	struct sockaddr sa;
};

This eventually calls rt_addr_n2a_r which had no handler for the
AF_BRIDGE family and "???" was being printed.

Add code to properly display this data when requested.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 11:27:09 -08:00
Leon Romanovsky 4ac152d003 rdma: Avoid memory leak for skipper resource
The call to get_task_name() allocates memory which is not freed
in case of skipping the object.

Fixes: 8ecac46a60 ("rdma: Add QP resource tracking information")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 08:42:28 -08:00
Arkadi Sharshevsky 58b48c5d75 devlink: Update man pages and add resource man
Add resource man, and update dev manual for reload command.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 08:36:05 -08:00
Arkadi Sharshevsky ead180274c devlink: Add support for resource/dpipe relation
Dpipe - Each dpipe table can have one resource which is mapped to it.
The resource is presented via its full path. Furthermore, the number
of units consumed by single table entry is presented.

Resource - Each resource presents the dpipe tables that use it.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 08:36:05 -08:00
Arkadi Sharshevsky 06a2cda9b0 devlink: Move dpipe context from heap to stack
Move dpipe context to stack instead of dynamically.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 08:36:05 -08:00
Arkadi Sharshevsky 06dd94f952 devlink: Add support for hot reload
Add support for hot reload. It should be used in order for resource
updates to take place.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 08:36:05 -08:00
Arkadi Sharshevsky 8cd6440958 devlink: Add support for devlink resource abstraction
Add support for devlink resource abstraction. The resources are
represented by a tree based structure and are identified by a name and
a size. Some resources can present their real time occupancy.

First the resources exposed by the driver can be observed, for example:

$devlink resource show pci/0000:03:00.0
pci/0000:03:00.0:
  name kvd size 245760 unit entry
    resources:
      name linear size 98304 occ 0 unit entry size_min 0 size_max 147456 size_gran 128
      name hash_double size 60416 unit entry size_min 32768 size_max 180224 size_gran 128
      name hash_single size 87040 unit entry size_min 65536 size_max 212992 size_gran 128

Some resource's size can be changed. Examples:

$devlink resource set pci/0000:03:00.0 path /kvd/hash_single size 73088
$devlink resource set pci/0000:03:00.0 path /kvd/hash_double size 74368

The changes do not apply immediately, this can be validate by the 'size_new'
attribute, which represents the pending changed size. For example

$devlink resource show pci/0000:03:00.0
pci/0000:03:00.0:
  name kvd size 245760 unit entry size_valid false
  resources:
    name linear size 98304 size_new 147456 occ 0 unit entry size_min 0 size_max 147456 size_gran 128
    name hash_double size 60416 unit entry size_min 32768 size_max 180224 size_gran 128
    name hash_single size 87040 unit entry size_min 65536 size_max 212992 size_gran 128

In case of a pending change the nested resources present an indication
for a valid configuration of its children (sum of its children sizes
doesn't exceed the parent's size).

In order for the changes to take place hot reload is needed. The hot
reload through devlink will be introduced in the following patch.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 08:36:05 -08:00
Arkadi Sharshevsky 049c58539f devlink: mnlg: Add support for extended ack
Add support for extended ack.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 08:36:05 -08:00
Arkadi Sharshevsky 844646a528 devlink: Change empty line indication with indentations
Currently multi-line objects are separated by new-lines. This patch
changes this behavior by using indentations for separation.

Signed-off-by: Arkadi Sharhsevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 08:36:05 -08:00
Masatake YAMATO 97352f1b33 ss: prepare rth when killing inet sock
kill_inet_sock() expects rhn_handle instance is passed
via inet_diag_arg argument. However on the following calling path:

    generic_show_sock
    => show_one_inet_sock
       => kill_inet_sock

rth field of inet_diag_arg is not filled with the address of
rhn_handle instance. As the result ss crashes.

This commit fills the field with newly created rhn_handle
instance.

Changes in v2:
Instead of creating rtn_handle instances for each socket, create
one in upper layer and reuse it.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 08:32:39 -08:00
Quentin Monnet a883dd8b06 README: re-add updated information link
The "Information" link was removed from README file in commit
d7843207e6 ("README: update location of git repositories, remove
broken info link"), because it redirected to a page that no longer
existed on the Linux Foundation wiki.

This page has just been restored, so we can add the link back again.
Since the previous link was a redirection, use the updated link instead.

Thanks to Luca Boccassi for investigating this issue, restoring and
updating the page.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
2018-02-23 08:19:38 -08:00
Vincent Bernat 1ca4341d2c color: disable color when json output is requested
Instead of declaring -color and -json exclusive, ignore -color when
-json is provided. The rationale is to allow to put -color in an alias
for ip while still being able to use -json. -color is merely a
presentation suggestion and we can assume there is nothing to color in
the JSON output.

Signed-off-by: Vincent Bernat <vincent@bernat.im>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 08:18:33 -08:00
Adam Vyskovsky 2fb854d07c tc: fix an off-by-one error while printing tc actions
The tc_print_action() function did not print all tc actions
when e.g. TCA_ACT_MAX_PRIO actions were defined for a single
tc filter.

Signed-off-by: Adam Vyskovsky <adamvyskovsky@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 08:18:29 -08:00
Timothy Redaelli 7bdd623948 bridge: Prevent a double space in bridge mdb show
Prevent a double space in "bridge mdb show" when the MDB entry is not
marked as "offload".

Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 08:18:18 -08:00
Lubomir Rintel 8f0807023d lib/namespace: don't try to mount rw /sys over a ro one
It will fail with EPERM on Linux 4.15.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 08:18:06 -08:00
Roopa Prabhu 430e05d33f ss: print skmeminfo for packet sockets
before:
$ss --packet -p -m
p_raw    0          0                            *:eth0
          users:(("lldpd",pid=2240,fd=11))

after:
$ss --packet -p -m
p_raw    0          0                            *:eth0
          users:(("lldpd",pid=2240,fd=11))
          skmem:(r0,rb266240,t0,tb266240,f0,w0,o320,bl0,d0)

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-22 14:45:27 -08:00
Leon Romanovsky 486fe5f03c rdma: Add batch command support
Implement an option (-b) to execute RDMAtool commands
from supplied file. This follows the same model as
in use for ip and devlink tools, by expecting
every new command to be on new line.

These commands are expected to be without any -*
(e.g. -d, -j, e.t.c) global flags, which should be
called externally.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-22 14:44:46 -08:00
David Ahern 472e59b0eb Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-22 14:43:33 -08:00
Stephen Hemminger 2d165c0811 tc: implement color output
Implement the -color option; in this case -co is ambiguous
since it was already used for -conf.
For now this just means putting device name in color.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-21 09:12:28 -08:00
David Ahern 14f2124a34 Merge branch 'bridge-color-json' into next
Stephen Hemminger  says:

====================

From: Stephen Hemminger <sthemmin@microsoft.com>

This set of patches adds color and full JSON support to bridge command.

The output format for bridge link command changes so that
  $ bridge link show
and
  $ ip link show
use same basic format.

The "-c" flag to bridge changes from shortened form of "-compressvlan"
to shortened form of "-color".  Once again this is so that ip
and bridge command take similar options.

Lastly the JSON output format changes slightly but this
could not impact any real user, because in several cases
the current format was invalid JSON!

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-21 08:47:03 -08:00
Stephen Hemminger 4328b687b4 ip: always print interface name in color
Even in brief mode the interface name should be printed
in color if desired. This makes output consistent across
regular and brief mode.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-21 08:42:04 -08:00
Stephen Hemminger 3a1ca9a5b6 bridge: update man page for new color and json changes
Document color option, and no longer have restriction on json

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-21 08:42:02 -08:00
Stephen Hemminger f32e4977dc bridge: add json support for link command
Add json output for bridge link show command and reuse code
from ip command to display interface information.

This also changes the output format slightly for the non JSON case so
that it has same format as the ip link show command.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-21 08:41:38 -08:00
Stephen Hemminger c7c1a1ef51 bridge: colorize output and use JSON print library
Use new functions from json_print to simplify code.
Provide standard flag for colorizing output.

The shortened -c flag is ambiguous it could mean color or
compressvlan; it is now changed to mean color for consistency
with other iproute2 commands.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-21 08:41:31 -08:00
Stephen Hemminger 01842eb581 bridge: implement json pretty print flag
Make bridge work like other iproute2 commands and accept
same json and pretty flags.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-21 08:41:28 -08:00
Stephen Hemminger 6bfa7a6b0e ip: remove dead code
Remove long dead code (in #if 0) from original iproute2
for numeric names.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-20 16:01:46 -08:00
Stephen Hemminger b68b361b4b ip: don't colorize the master device
Putting whole string "master eth0" in the interface name color
is wrong and confusing. Let's just turn color off for all attributes
of device.

Fixes: d92cc2d087 ("ipaddress: ll_map: Replace ll_idx_n2a() with ll_index_to_name()")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-20 12:16:42 -08:00
Stephen Hemminger a8beadb5f6 uapi: update if_ether compat headers
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-20 10:48:32 -08:00
Sabrina Dubroca 118eda77d6 ip link: add support to display extended tun attributes
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-20 07:19:22 -08:00
David Ahern 07ed8df604 Update kernel headers to 4.16.0-rc2+
Update kernel headers to commit f5c0c6f4299f
("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-19 20:06:04 -08:00
David Ahern 34894a7b96 Merge branch 'print_linkinfo_brief' into next
Serhey Popovych  says:

====================

With this series I propose to make print_linkinfo_brief() static in
favor of print_linkinfo() as single point for linkinfo printing.

Changes presented with this series tested using following script:

\#!/bin/bash

iproute2_dir="$1"
iface='eth0.2'

pushd "$iproute2_dir" &>/dev/null

for i in new old; do
	DIR="/tmp/$i"
	mkdir -p "$DIR"

	ln -snf ip.$i ip/ip

	# normal
	ip/ip link show                  >"$DIR/ip-link-show"
	ip/ip -4 addr show               >"$DIR/ip-4-addr-show"
	ip/ip -6 addr show               >"$DIR/ip-6-addr-show"
	ip/ip addr show dev "$iface"     >"$DIR/ip-addr-show-$iface"

	# brief
	ip/ip -br link show              >"$DIR/ip-br-link-show"
	ip/ip -br -4 addr show           >"$DIR/ip-br-4-addr-show"
	ip/ip -br -6 addr show           >"$DIR/ip-br-6-addr-show"
	ip/ip -br addr show dev "$iface" >"$DIR/ip-br-addr-show-$iface"
done
rm -f ip/ip

diff -urN /tmp/{old,new} |sed -n -Ee'/^(-{3}|\+{3})[[:space:]]+/!p'
rc=$?

popd &>/dev/null
exit $rc

Expected results : <no output>
Actual results   : <no output>

Although test coverage is far from ideal in my opinion it covers most
important aspects of the changes presented by the series.

All this work is done in prepare of iplink_get() enhancements to support
attribute parse that finally will be used to simplify ip/tunnel
RTM_GETLINK code.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-16 08:14:49 -08:00
Serhey Popovych c956e9a934 ipaddress: Make print_linkinfo_brief() static
It shares lot of code with print_linkinfo(): drop duplicated part,
change parameters list, make it static and call from print_linkinfo()
after common path.

While there move SPRINT_BUF() to the function scope from blocks to
avoid duplication and use "%s" to print "\n" to help compiler optimize
exit for both print_linkinfo_brief() and normal paths.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-16 08:14:25 -08:00
Serhey Popovych f5b50a18ae utils: Introduce and use print_name_and_link() to print name@link
There is at least three places implementing same things: two in
ipaddress.c print_linkinfo() & print_linkinfo_brief() and one in
bridge/link.c.

They are diverge from each other very little: bridge/link.c does not
support JSON output at the moment and print_linkinfo_brief() does not
handle IFLA_LINK_NETNS case.

Introduce and use print_name_and_link() routine to handle name@link
output in all possible variations; respect IFLA_LINK_NETNS attribute to
handle case when link is in different namespace; use ll_idx_n2a() for
interface name instead of "<nil>" to share logic with other code (e.g.
ll_name_to_index() and ll_index_to_name()) supporting such template.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-16 08:14:22 -08:00
Serhey Popovych fcac966526 utils: Introduce and use get_ifname_rta()
Be consistent in handling of IFLA_IFNAME attribute in all places: if
there is no attribute report bug to stderr and use ll_idx_n2a() as
last measure to get name in "if%u" format instead of "<nil>".

Use check_ifname() to validate network device name: this catches both
unexpected return from kernel and ll_idx_n2a().

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-16 08:14:20 -08:00
Serhey Popovych 0cec58dac4 lib: Correct object file dependencies
Neither internal libnetlink nor libgenl depends on ll_map.o: prepare for
upcoming changes that brings much more cleaner dependency between
utils.o and ll_map.o.

However ll_map.o depends on libnetlink.o functions so we need to provide
libnetlink.a after libutil.a in LIBNETLINK at global Makefile.

Tested using make clean && make -j4. No problems so far.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-16 08:14:18 -08:00
Serhey Popovych 1bccd1e43b ipaddress: Simplify print_linkinfo_brief() and it's usage
Simplify calling code in ipaddr_list_flush_or_save() by introducing
intermediate variable of @struct nlmsghdr, drop duplicated code:
print_linkinfo_brief() never returns values other than <= 0 so we can
move print_selected_addrinfo() outside of each block.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-16 08:14:16 -08:00
Serhey Popovych 9516823051 ipaddress: Improve print_linkinfo()
There are few places to improve:

  1) return -1 when entry is filtered instead of zero, which means
     accept entry: ipaddress_list_flush_or_save() the only user of this

  2) use ll_idx_n2a() as last resort to translate name to index for
     "should never happen" cases when cache shouldn't be considered

  3) replace open coded access to IFLA_IFNAME attribute data by
     RTA_DATA() with rta_getattr_str()

  4) simplify ifname printing since name is never NULL, thanks to (2).

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-16 08:14:15 -08:00
Serhey Popovych fe269b6e7c utils: Reimplement ll_idx_n2a() and introduce ll_idx_a2n()
Now all users of ll_idx_n2a() replaced with ll_index_to_name() we can
move it's functionality to ll_index_to_name() and implement index to
name conversion using snprintf() and "if%u".

Use %u specifier in "if%..." template consistently: network device
indexes are always greather than zero.

Also introduce ll_idx_n2a() conterpart: ll_idx_a2n() that is used
to translate name of the "if%u" form to index using sscanf().

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-16 08:14:13 -08:00
Serhey Popovych d92cc2d087 ipaddress: ll_map: Replace ll_idx_n2a() with ll_index_to_name()
There is no reentrancy as well as deferred result usage for all cases
where ll_idx_n2a() being used: it is safe to use ll_index_to_name() that
internally calls ll_idx_n2a() with static buffer to hold result.

While there print master network device name using correct color.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-16 08:14:11 -08:00
Serhey Popovych 17df3d607d ipaddress: Abstract IFA_LABEL matching code
There at least two places in ip/ipaddress.c where we match IFA_LABEL
against filter.label if that is given.

Get rid of "common" if () statement for inet_addr_match_rta() and
ifa_label_match_rta(): it is not common because first will check for
filter.pfx.family != AF_UNSPEC inside and second for filter.label being
non NULL.

This allows us to further simplify down code and prepare for
ll_idx_n2a() replacement with ll_index_to_name() without 80 columns
checkpatch notice.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-16 08:14:09 -08:00
Serhey Popovych 5433656705 ip: Use single variable to represent -pretty
After commit a233caa0aa ("json: make pretty printing optional") I get
following build failure:

    LINK     rtmon
    ../lib/libutil.a(json_print.o): In function `new_json_obj':
    json_print.c:(.text+0x35): undefined reference to `show_pretty'
    collect2: error: ld returned 1 exit status
    make[1]: *** [rtmon] Error 1
    make: *** [all] Error 2

It is caused by missing show_pretty variable in rtmon.

On the other hand tc/tc.c there are two distinct variables and single
matches() call that handles -pretty option thus setting show_pretty
will never happen. Note that since commit 44dcfe8201 ("Change
formatting of u32 back to default") show_pretty is used in tc/f_u32.c
so this is first place where -pretty introduced.

Furthermore other utilities like misc/ifstat.c and misc/nstat.c define
pretty variable, however only for their own purposes. They both support
JSON output and thus depend show_pretty in new_json_obj().

Assuming above use common variable to represent -pretty option, define
it in utils.c and declare in utils.h that is commonly used. Replace
show_pretty with pretty.

Fixes: a233caa0aa ("json: make pretty printing optional")
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-16 08:13:36 -08:00
David Ahern 3ce21b2d84 Merge branch 'unify-tunnel-endpoint-parsing' into next
Serhey Popovych  says:

====================

Use get_addr_rta() helper to unify address retriveal from netlink
message when configuring tunnel and get_addr() to parse endpoint
address into @inet_prefix.

This is next step towards ip and ipv6 tunnel module merge: endpoint
address parsing code will differ only in @family constant being
passed to get_addr_rta() and get_addr().

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-14 09:02:12 -08:00
Serhey Popovych 7bda9fd37a iptnl/ip6tnl: Unify local/remote endpoint and 6rd address parsing
We are going to merge link_iptnl.c and link_ip6tnl.c and this is final
step to make their diffs clear and show what needs to be changed during
merge.

Note that it is safe to omit endpoint address(es) from netlink create
request as kernel is aware of such case and will use zero for that
endpoint(s).

Make sure we initialize ip6rdprefix and ip6rdrelayprefix bitlen in
link_iptnl.c only when configuring existing tunnel: if kernel does not
submit prefixlen in corresponding attributes preceeding get_addr_rta()
will set bitlen to -1 which is incorrect value.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-14 09:01:07 -08:00
Serhey Popovych a066cc6623 gre/gre6: Unify local/remote endpoint address parsing
We are going to merge link_gre.c and link_gre6.c and this is final step
to make their diffs clear and show what needs to be changed during merge.

Note that it is safe to omit endpoint address(es) from netlink create
request as kernel is aware of such case and will use zero for that
endpoint(s).

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-14 09:01:03 -08:00
Serhey Popovych 3d4e0db65c vti/vti6: Unify local/remote endpoint address parsing
We are going to merge link_vti.c and link_vti6.c and this is final step
to make their diffs clear and show what needs to be changed during merge.

Note that it is safe to omit endpoint address(es) from netlink create
request as kernel is aware of such case and will use zero for that
endpoint(s).

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-14 09:00:46 -08:00
Serhey Popovych 9cc173d485 utils: Introduce and use inet_prefix_reset()
Initializing @inet_prefix using C initializers or memset() seems
inefficient and unnecessary: only small part of ->data[] field will be
used to store address corresponding to ->family.

Instead initialize ->flags with zero and assume no other fields accessed
before checking corresponding bits in ->flags. For example special
helpers (e.g. is_addrtype_*()) can be used to ensure that @inet_prefix
contains valid ip or ipv6 address.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-14 09:00:26 -08:00
Phil Sutter 8a237420f2 Remove leftovers from removed Latex documentation
Since there is no documentation in Latex format left, there is no need
to check for commands to build it. Also there is no need to ignore any
of the temporary files which were created by them.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2018-02-13 16:43:19 -08:00
Quentin Monnet d7843207e6 README: update location of git repositories, remove broken info link
Reflect the recent change of location for the git repositories, and the
creation of the -next development repo, in README and README.devel.

Also remove the link to the Linux Foundation wiki that contained
information about iproute2. The link is now broken, I did not find any
alternative page to point to.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: David Ahern <dsahern@gmail.com>
2018-02-13 16:42:51 -08:00
Stephen Hemminger 766fa4ac33 include: update rdma header from 4.16-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-13 16:42:00 -08:00
David Ahern 5a809da4f5 Merge branch 'iproute-json-color' into next
Stephen Hemminger  says:

====================

From: Stephen Hemminger <stephen@networkplumber.org>

This set of patches adds JSON output to route printing.
Tested for the simple cases, but there are many variations and there
such as lw tunnels which have not be tested.

The color formatting may need some additional tweaks. It looks
like for some tags the tag is also showing up in color.
This should be fixed in print_color_string rather than having
to do special case handling in so many places.

This patchset also changes the default JSON output to be compressed
(since the purpose of JSON is to make output machine readable);
but do optional pretty print formatting with -p flag.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-10 08:24:57 -08:00
Stephen Hemminger 663c3cb231 iproute: implement JSON and color output
Add JSON and color output formatting to ip route command.
Similar to existing address and link output.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-10 08:20:39 -08:00
Stephen Hemminger 6cbd9465bc json: fix newline at end of array
The json print library was toggling pretty print at the end of
an array to workaround a bug in underlying json_writer.
Instead, just fix json_writer to pretty print array correctly.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-10 08:18:49 -08:00
Stephen Hemminger bff0f25241 man: add documentation for json and pretty flags
Add description for -json and -pretty options.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-10 08:16:14 -08:00
Stephen Hemminger a233caa0aa json: make pretty printing optional
Since JSON is intended for programmatic consumption, it makes
sense for the default output format to be concise as possible.

For programmer and other uses, it is helpful to keep the pretty
whitespace format; therefore enable it with -p flag.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-10 08:15:08 -08:00
Serhey Popovych db2b8b6ef0 ip: Use print_0xhex() where appropriate
In gre/gre6 for non-JSON output 0x%x format is used: use print_0xhex()
to get the same value for JSON.

Get rid of custom _print_hex() in bridge slave code: print_0xhex() can
be used perfectly.

Break long print_uint() with long argument list to fit into 80 columns.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-09 08:05:30 -08:00
Serhey Popovych 5bd937957d ip/tunnel: Minor cleanups
Few minor changes to reduce diffs between ip and ipv6 tunnel code:

  1) reduce intendation by one level when adding attributes in gre and
     gre6; reorder addattr*() calls to simplify diff

  2) reorder local variables definition; change their type (e.g. for
     IFLA_LINK) to match ones returned by rta_getattr_*()

  3) move "mode" parameter parsing in link_iptnl.c to the similar
     position as in link_ip6tnl.c

  4) handle "tc" as shortcut for "tclass"/"tos" in link_iptnl.c

  5) add whitespace where required

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-09 08:05:18 -08:00
David Ahern 8e7548462e Merge branch 'unify_tunnel_help' into next
Serhey Popovych  says:

====================

To show only relevant diffs of ip and ipv6 variants help message print
routines needs to be unified and improved.

Get rid of print_usage() and usage() wrappers: use single function to
output help message. As side effect we return -1 from parse function
instead of calling exit(2) in case of "... tunnel <help|garbage>" is
found.

Additionally we get pointer to @struct link_util and can directly access
->id information to prepare customized help message.

Split calls to fprintf() two group: one that contains format string with
specifiers (thus requiring parameters) and another one that does not.
This helps compiler to optimize calls to fprintf() with fputs() when no
format specifiers in string. Do not use fputs() directly to keep code
formatting nice.

After this series applied following diffs:

  # diff -urN ip/link_gre{,6}.c
  # diff -urN ip/link_vti{,6}.c
  # diff -urN ip/link_ip{,6}tnl.c

in scope of help print routines reduced to necessary minimum.

Tested minimally by compiling and executing "ip link help <kind>" and
"ip link add type help" commands. Looks correct.

See individual patch description for more information.

Reviews, commands and suggestions are welcome.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-09 08:04:39 -08:00
Serhey Popovych 06e3975f4c iptnl/ip6tnl: Unify iptunnel_print_help()
Reduce diff lines between iptnl and ip6tnl help printing code.

Use @struct link_util ->id field to print correct link help: all callers
now pass this data structure to iptunnel_print_help().

Get rid of custom print_usage() and usage() functions and use
iptunnel_print_help() directly, return from function on "... type
<help|garbage>" instead of exit(2).

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-09 08:04:23 -08:00
Serhey Popovych ae91205c4d gre/gre6: Unify gre_print_help()
Reduce diff lines between gre and gre6 help printing code.

Use @struct link_util ->id field to print correct link help: all callers
now pass this data structure to gre_print_help().

Get rid of custom print_usage() and usage() functions and use
gre_print_help() directly, return from function on "... type
<help|garbage>" instead of exit(2).

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-09 08:04:19 -08:00
Serhey Popovych 4aa552eac1 vti/vti6: Unify vti_print_help()
Reduce diff lines between vti and vti6 help printing code.

Use @struct link_util ->id field to print correct link help: all callers
now pass this data structure to vti_print_help().

Get rid of custom print_usage() and usage() functions and use
vti_print_help() directly, return from function on "... type
<help|garbage>" instead of exit(2).

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-09 08:04:05 -08:00
Christian Brauner 375d51caaa netns: allow negative nsid
If the kernel receives a negative nsid it will automatically assign
the next available nsid. In this case alloc_netid() will set min and
max to 0 for ird_alloc(). And when max == 0 idr_alloc() will interpret
this as the maximum range, i.e. specific to nsids it will try to find
an id in the range [0,INT_MAX). This is intentionally supported in the
kernel for nsids.

Commit acbe9118ce ("ip netns: use strtol() instead of atoi()")
regressed ip netns in that respect although previously the use-case
was either accidentally supported or opaquely supported such that it
triggered the original commit. From what I can gather it went as
follows before: atoi() was called with a string indicating a negative
value which caused it to return -1 which was passed to the
kernel. Let's make it less opaque by introducing the keyword "auto":

ip netns set <netns-name> auto

will cause nsid to be set to -1 and the kernel will select an available
nsid.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-08 07:57:34 -08:00
Stephen Hemminger 432e5f97be iproute: make flush a separate function
Minor refactoring to move flush into separate function to improve
readability and reduce depth of nesting.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:28:23 -08:00
Stephen Hemminger cc5608250a iproute: don't do assignment in condition
Fix checkpatch complaints about assignment in conditions.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:28:19 -08:00
Stephen Hemminger 80d1c528b9 iproute: whitespace fixes
Add whitespace around operators for consistency.
Use tabs for indentation.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:28:11 -08:00
David Ahern 58cf7b6759 Merge branch 'dev_walk' into iproute2-next
Serhey Popovych  says:

====================

In this seris I replace /proc/net/dev and /sys/class/net usage for walk
through network device list in iptunnel/ip6tunnel and iptuntap with
netlink dump.

Following changed since RFC was sent:

  1) Treat @struct rtnl_link_stats and @struct rtnl_link_stats64 as
     array with __u32 and __u64 elements respectively in
     copy_rtnl_link_stats64() as suggested by Stephen Hemminger.

  2) Remove @name and @size parameters from @struct tnl_print_nlmsg_info
     since we can get them easily from other data.

Testing.
========

Following script is used to ensure I didn't broke things too much:

\#!/bin/bash

iproute2_dir="$1"
iface='gre1'

pushd "$iproute2_dir" &>/dev/null

for i in new old; do
	DIR="/tmp/$i"
	mkdir -p "$DIR"

	ln -snf ip.$i ip/ip

	for o in '' -s -d; do
		ip/ip $o tunnel show           >"$DIR/ip${o}-tunnel-show"
		ip/ip -4 $o tunnel show        >"$DIR/ip-4${o}-tunnel-show"
		ip/ip -6 $o tunnel show        >"$DIR/ip-6${o}-tunnel-show"
		ip/ip $o tunnel show dev "$iface" \
			>"$DIR/ip${o}-tunnel-show-$iface"
		ip/ip $o tuntap show           >"$DIR/ip${o}-tuntap-show"
	done
done
rm -f ip/ip

diff -urN /tmp/{old,new} |sed -n -Ee'/^(-{3}|\+{3})[[:space:]]+/!p'
rc=$?

popd &>/dev/null
exit $rc

Results:
========

...
fopen /sys/class/net/ipip1/tun_flags: No such file or directory
fopen /sys/class/net/ipip2/tun_flags: No such file or directory
fopen /sys/class/net/gre10/tun_flags: No such file or directory
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note that this comes from ip.old
...
diff -urN /tmp/old/ip-d-tuntap-show /tmp/new/ip-d-tuntap-show
@@ -1,4 +1,4 @@
-tun1: tap user 1004 group 27
-	Attached to processes:
 tun0: tun user 1000 group 27
 	Attached to processes:
+tun1: tap user 1004 group 27
+	Attached to processes:
diff -urN /tmp/old/ip-s-tuntap-show /tmp/new/ip-s-tuntap-show
@@ -1,2 +1,2 @@
-tun1: tap user 1004 group 27
 tun0: tun user 1000 group 27
+tun1: tap user 1004 group 27
diff -urN /tmp/old/ip-tuntap-show /tmp/new/ip-tuntap-show
@@ -1,2 +1,2 @@
-tun1: tap user 1004 group 27
 tun0: tun user 1000 group 27
+tun1: tap user 1004 group 27

So basically only print order for ip tuntap get changes. Rest is intact.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:19:12 -08:00
Serhey Popovych 8960d45fba tuntap: Use netlink to walk through tuntap list
It seems bad idea to depend on sysfs being mounted and reflected to the
current network namespace. Same applies to procfs.

Instead netlink should be used to talk to the kernel and get list of
specific network devices among with their parameters.

Support for kernel netlink message filtering by passing IFLA_INFO_KIND
in RTM_GETLINK request: if kernel does not support filtering by the kind
we will check it in reply anyway. Check for ifi->ifi_type to be either
ARPHRD_NONE or ARPHRD_ETHER to seed up things a bit without kernel level
filtering.

Unfortunately tun driver does not implement dumping it's configuration
via netlink and we still need to use read_prop() which depends on sysfs
to get additional tun device information.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:15:47 -08:00
Serhey Popovych 3e95393871 iptunnel/ip6tunnel: Use netlink to walk through tunnels list
Both tunnels use legacy /proc/net/dev interface to get tunnel device and
it's statistics. This may cause problems for cases when procfs either
not mounted or not unshare(2)d for given network namespace.

Use netlink to walk through list of tunnel devices which is network
namespace aware and provides additional information such as statistics
in the dump message.

Since both address family specific variants of do_tunnels_list() nearly
the same, except for tunnel parameters structure initialization,
matching and printing we can introduce common one in tunnel.c.

To implement address family specific parts introduce new data structure
@struct tnl_print_nlmsg_info what contains all necessary information as
well as pointers to ->init(), ->match() and ->print() callbacks.

Annotate data structures by const where appropriate.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:15:42 -08:00
Serhey Popovych ec89594080 iptunnel/ip6tunnel: Code cleanups
Use switch () instead of if () to compare tunnel type to fit into 80
columns and make code more readable. Print "\n" using fputc().

In iptunnel.c abstract tunnel parameters matching code in iptunnel.c
into ip_tunnel_parm_match() helper to conform with ip6tunnel.c. Use
memset() to initialize @p1.

In ip6tunnel.c no need to call ll_name_to_index() with name twice: just
use found previously index. Do not initialize @p1: this is done in
ip6_tnl_parm_init().

This is to show real differences between ip and ipv6 do_tunnels_list()
implementations and prepare for upcoming unification of them.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:15:37 -08:00
Serhey Popovych affb361785 tunnel: Split statistic getting and printing
This is first step to move tunnel code to use rtnl dump interface
instead of /proc/net/dev read.

Make tnl_print_stats() to accept @struct rtnl_link_stats64 parameter,
introduce tnl_get_stats() that will parse line from /proc/net/dev into
@struct rtnl_link_stats64.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:15:33 -08:00
Serhey Popovych 9a7bd5442b ip: Introduce get_rtnl_link_stats_rta() to get link statistics
Assume all statistics in ip(8) represented either by IFLA_STATS64 or
IFLA_STATS is 64 bit. It is clean that we can store __u32 counters of
@struct rtnl_link_stats in __u64 counters in @struct rtnl_link_stats64.

New get_rtnl_link_stats_rta() follows __print_link_stats() behaviour on
handling of stats attribute: copy no more than size of data structure
and no less than attribute length zeroing rest.

Drop print_link_stats32() as it's functionality can be handled by 64bit
variant. Move code from __print_link_stats() to print_link_stats64() and
finally rename print_link_stats64() to __print_link_stats().

More users of introduced function will come in future.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:15:28 -08:00
Serhey Popovych fb7b816827 ipaddress: Unify print_link_stats() and print_link_stats64()
To show real differences between these two variants adjust whitespace
intendation and use print_uint() instead of print_int() as all members
in both @struct rtnl_link_stats and @struct rtnl_link_stats64 are
unsigned.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:15:04 -08:00
David Ahern 306d616093 Merge branch 'route_print_refactor' into iproute2-next
Stephen Hemminger  says:

====================

This patch set breaks up the big print_route function into
smaller pieces for readability and to make later changes
to support JSON and color output easier.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:12:52 -08:00
Stephen Hemminger 1506d8d3f8 iproute: refactor printing of interface
For JSON and colorization, make common code a function.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:08:44 -08:00
Stephen Hemminger f48e14880a iproute: refactor multipath print
Make printing of multipath attributes a function to improve
readability.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:08:39 -08:00
Stephen Hemminger a3484a9f20 iproute: refactor newdst, gateway and via printing
Since these fields are printed in both route and multipath case;
avoid duplicating code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:08:34 -08:00
Stephen Hemminger 5782965a1e iproute: refactor printing flow info
Use common code for printing flow info.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:08:28 -08:00
Stephen Hemminger 968272e791 iproute: refactor metrics print
Make a separate function to improve readability and enable
easier JSON conversion.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:08:22 -08:00
Stephen Hemminger 6e41810e1b iproute: refactor cacheinfo printing
Make common function for decoding cacheinfo.
This code may print more info than old version in some cases.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:08:16 -08:00
Stephen Hemminger daf30f6fde iproute: make printing IPv4 cache flags a function
More refactoring prior to JSON support.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:08:10 -08:00
Stephen Hemminger 8cfc2d4739 iproute: make printing icmpv6 a function
Refactor to reduce size of print_route and improve
readability.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:08:04 -08:00
Stephen Hemminger b3ab1e68e7 iproute: refactor printing flags
Both next hop and route need to decode flags.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:07:44 -08:00
Leon Romanovsky 5f8265536f rdma: Check return value of strdup call
Fixes: 74bd75c2b6 ("rdma: Add basic infrastructure for RDMA tool")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-05 17:23:52 -08:00
Leon Romanovsky 860676b424 rdma: Document resource tracking
Spartan version of resource tracking documentation.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-05 17:23:52 -08:00
Leon Romanovsky 8ecac46a60 rdma: Add QP resource tracking information
This patch adds ss-similar interface to view various resource
tracked objects. At this stage, only QP is presented.

1. Get all QPs for the specific device:
$ rdma res show qp link mlx5_4
link mlx5_4/- lqpn 8 type UD state RESET sq-psn 0 pid 0 comm [ib_ipoib]
link mlx5_4/1 lqpn 7 type UD state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 1 type GSI state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 0 type SMI state RTS sq-psn 0 pid 0 comm [ib_core]

$ rdma res show qp link mlx5_4/
link mlx5_4/- lqpn 8 type UD state RESET sq-psn 0 pid 0 comm [ib_ipoib]
link mlx5_4/1 lqpn 7 type UD state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 1 type GSI state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 0 type SMI state RTS sq-psn 0 pid 0 comm [ib_core]

2. Provide illegal port number (0 is illegal):
$ rdma res show qp link mlx5_4/0
Wrong device name

3. Get QPs of specific port:
$ rdma res show qp link mlx5_4/1
link mlx5_4/1 lqpn 7 type UD state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 1 type GSI state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 0 type SMI state RTS sq-psn 0 pid 0 comm [ib_core]

4. Get QPs which have not assigned port yet:
link mlx5_4/- lqpn 8 type UD state RESET sq-psn 0 pid 0 comm [ib_ipoib]

5. Limit to specific Local QPNs:
$ rdma res show qp link mlx5_4/1 lqpn 1-3,7
link mlx5_4/1 lqpn 7 type UD state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 1 type GSI state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 0 type SMI state RTS sq-psn 0 pid 0 comm [ib_core]

. Filter types (strings):
$ rdma res show qp link mlx5_4/1 type UD,gSi
link mlx5_4/1 lqpn 7 type UD state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 1 type GSI state RTS sq-psn 0 pid 0 comm [ib_core]

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-05 17:23:52 -08:00
Leon Romanovsky 923aa825ff rdma: Add resource tracking summary
The global resource summary information. The object names, current utilization
and maximum numbers are received as is from the kernel.

$ rdma res
1: mlx5_0: pd 3 cq 5 qp 4
2: mlx5_1: pd 3 cq 5 qp 4
3: mlx5_2: pd 3 cq 5 qp 4
4: mlx5_3: pd 2 cq 3 qp 2
5: mlx5_4: pd 3 cq 5 qp 4

$ rdma res show mlx5_4
5: mlx5_4: pd 3 cq 5 qp 44

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-05 17:23:52 -08:00
Leon Romanovsky 684d82094b rdma: Allow external usage of compare string routine
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-05 17:23:52 -08:00
Leon Romanovsky 6e0de6e886 rdma: Set pointer to device name position
The dev and link execution callbacks expects that next
command line argument is device or port name.

Set pointer to device or port name position prior calls to
rd_exec_dev()/rd_exec_link().

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-05 17:23:52 -08:00
Leon Romanovsky 1174be72d1 rdma: Add filtering infrastructure
This patch adds general infrastructure to RDMAtool to handle various
filtering options needed for the downstream resource tracking patches.

The infrastructure is generic and stores filters in list of key<->value
entries. There are three types of filters:

1. Numeric - the values are intended to be digits combined with '-' to
mark range and ',' to mark multiple entries, e.g. pid 1-100,234,400-401
is perfectly legit filter to limit process ids.

2. String - the values are consist from strings and "," as a denominator.

3. Link - special case to allow '/' in string to provide link name, e.g.
link mlx4_1/2.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-05 17:23:52 -08:00
Leon Romanovsky 5f4892e2c8 rdma: Make visible the number of arguments
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-05 17:23:52 -08:00
Leon Romanovsky 6416d1a01f rdma: Add option to provide "-" sign for the port number
According to the IBTA spec [1], the physical connected port is provided
for the QP in RTR-to-INIT stage performed by modify_qp(). It causes
to do not have port number for newly created QPs.

The following patch adds "-" sign to present absence of port, because
QPs are going to be associated with rdmatool link object, which needs
port number as an index.

[1] InfiniBand Architecture Release 1.3 -
	"Table 96 QP State Transition Properties"

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-05 17:23:52 -08:00
Stephen Hemminger c30c7b6e35 include: update UAPI types.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-05 17:21:27 -08:00
Stephen Hemminger 25226777b8 include: update interface UAPI from 4.15-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-05 17:21:01 -08:00
Stephen Hemminger a0fc63ed68 include: update rdma uapi from 4.15-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-05 17:20:14 -08:00
Stephen Hemminger d707207f4d include: update netfilter headers from 4.15-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-05 17:19:32 -08:00
Stephen Hemminger d857a7fd4b include: update uapi with BPF from 4.15-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-05 17:18:53 -08:00
Serhey Popovych c14f9d92ee treewide: Use addattr_nest()/addattr_nest_end() to handle nested attributes
We have helper routines to support nested attribute addition into
netlink buffer: use them instead of open coding.

Use addattr_nest_compat()/addattr_nest_compat_end() where appropriate.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-02 15:01:09 -08:00
Serhey Popovych 28254695d1 ip: Minor cleanups
1) Rename @hdr parameter to @n to be coherent with rest of the parsing
     code.

  2) Use NLMSG_DATA() to get pointer to the data after nlmsghdr instead
     of calculating it directly in ip/tunnel code.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-02 14:58:26 -08:00
Serhey Popovych f7af0dc580 ip: Consolidate ip, xdp and lwtunnel parse/dump prototypes in ip_common.h
Having iplink_parse() and @struct iplink_req in include/utils.h does not
reflect it's IP nature: move to ip/ip_common.h.

Move contents of ip/iplink_xdp.h and ip/iproute_lwtunnel.h to
ip/ip_common.h since they are small (i.e. only two function prototypes):
ip/iplink_bridge.c and ip/iplink_vrf.c prototypes already there.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-02 14:55:12 -08:00
Serhey Popovych 2cd0578fcb Revert "ip address: Change print_linkinfo_brief to take filter as an input"
This reverts commit 63891c7013.

It seems print_linkinfo_brief() never accepts filter different than
default one and David Ahern suggests to revert it instead of making
new change that actually do revert.

Conflicts:
	ip/ipaddress.c
	ip/iplink.c

These are caused by JSON support addition after commit we reverting.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-01 08:27:29 -08:00
David Ahern 1e24e773f1 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-29 08:24:57 -08:00
Stephen Hemminger 50b8a842e8 v4.15.0 2018-01-29 08:08:52 -08:00
Jakub Kicinski 44c7655186 tc: fix second printing of requeues
Non-JSON tc qdisc output used to print the "requeues" statistic
twice.  Commit 4fcec7f366 ("tc: jsonify stats2") tried to preserve
this behaviour for both standard output and JSON, but used the wrong
statistic (q.qlen).  Also duplicating keys in JSON is not allowed,
so the second occurrence should be completely skipped with JSON.

Fixes: 4fcec7f366 ("tc: jsonify stats2")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-27 16:06:54 -08:00
Jakub Kicinski 7f536df7f3 ip: address: fix stats64 JSON object name
The JSON object name for statistics in ip link show is "stats644".
Looks like a typo, commit d0e720111a ("ip: ipaddress.c: add support
for json output") contains an example with the expected "stats64" name.

The fact that no one has noticed until now is probably an indication
that no one is using this object.  Hopefully it's not too late to fix
this, although IIUC this has already been in 4.13 and 4.14 releases :S

Fixes: d0e720111a ("ip: ipaddress.c: add support for json output")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-27 16:06:54 -08:00
Jakub Kicinski c061b75895 tc: prio: JSON-ify prio output
Make JSON output work with prio Qdiscs.  This will also make
other qdiscs which reuse the print_qopt work, like mqprio or
pfifo_fast.

Note that there is a double space between "priomap" and first
prio number.  Keep this original behaviour.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-26 13:00:18 -08:00
Jakub Kicinski 097415d510 tc: red: JSON-ify RED output
Make JSON output work with RED Qdiscs.  Float/double printing
helpers have to be added/uncommented to print the probability.
Since TC stats in general are not split out to a separate object
the xstats printed by this patch are not separated either.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-26 12:59:55 -08:00
David Ahern db9fd71038 Merge branch 'get_addr_rta' into iproute2-next
Serhey Popovych  says:

====================

Now we enhance get_addr() to return additional information about address
(e.g. if it unspecified or multicast) we want to have same functionality
for attributes in netlink message.

Introduce and use get_addr_rta() that parses given netlink attribute
into @inet_prefix data structure in the same way similar get_addr()
parses address from it's string representation.

Use attribute length to guess address family: force it by giving non
AF_UNSPEC @family to get_addr_rta() to ensure address is of expected
family.

Introduce and use inet_addr_match_rta() to further simplify and unify
code where get_addr_rta() intended to be used together with
inet_addr_match().

This is next step in ipv4 and ipv6 modules unification to prepare for
merge in the future.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-25 09:32:27 -08:00
Serhey Popovych b761fc4113 ip/tunnel: Unify local/remote endpoint address printing
Introduce and use tnl_print_endpoint() helper to print of tunnel
endpoint address.

Note that for AF_INET and AF_INET6 inet_ntop(3) is used that may return
NULL in case of failure and while unlikely format_host_rta() might
return NULL too. Handle this case when passing local/remote to
print_string().

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-25 09:31:29 -08:00
Serhey Popovych 228f2e97ba tcp_metric: Use get_addr_rta()
While there remove & from inet_prefix.data when since it is array.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-25 09:31:27 -08:00
Serhey Popovych 62f9f94acf ipl2tp: Use get_addr_rta()
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-25 09:31:25 -08:00
Serhey Popovych a4270fd8ae ipneigh: Use inet_addr_match_rta()
While there check return from get_prefix() for filter address.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-25 09:31:24 -08:00
Serhey Popovych ba6052df6d ipmroute: Use inet_addr_match_rta()
While there check return from get_prefix() for filter address.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-25 09:31:22 -08:00
Serhey Popovych 746035b4d1 iprule: Use inet_addr_match_rta()
While there check return from get_prefix() for filter address.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-25 09:31:21 -08:00
Serhey Popovych c4de9adaf5 ipaddress: Use inet_addr_match_rta()
While there check return from get_prefix() for filter address.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-25 09:31:19 -08:00
Serhey Popovych 27c523e209 utils: Introduce get_addr_rta() and inet_addr_match_rta()
First is used to get address from netlink attribute to
inet_prefix data structure. Use memcpy() with constant
value to let complier optimize by replacing a call by
inlining load/store instructions.

Second is used to match address in given netlink attribute
with one given as reference. It matches successfully if
no attribute is given (@rta is NULL), reference address
family is AF_UNSPEC or it's length isn't given; fails if
get_attr_rta() can't get attribute or it's family does
not match reference; calls inet_addr_match() to get final
verdict.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-25 09:31:16 -08:00
David Ahern 68944e6f93 Merge branch 'unify_external' into iproute2-next
Serhey Popovych  says:

====================

With this series I want to unify collect metadata
handling in tunnels:

  1) Use "external" name for JSON and non-JSON output.

     Do not *print* any options when tunnel in
     collect metadata mode: gre6 already do
     this, so just apply to others.

  2) Do not *add* any attributes when configuring
     gre tunnel in collect metadata mode.

     Other tunnels (e.g. gre6, iptnl, ip6tnl)
     alredy do that.

This is next step in ipv4 and ipv6 modules
unification to prepare for merge in the future.

Any comments, suggestions and criticism as always
welcome.

v2
  For all tunnels implementing collect metadata
  use "external" keyword for both JSON. Thanks
  to Jiri Benc for detailed explanation.
====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-24 10:02:27 -08:00
Serhey Popovych de54cdd3de gre/gre6: Unify attribute addition to netlink buffer
There are couple of minor improvements:

  1) Check erspan_ver == 2 in gre6. It still could
     be 1 if erspan_idx is 0.

  2) Add tunnel encapsulation attributes only when
     collect metadata not in effect in gre.

  3) Trivial: address checkpatch issues.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-24 10:01:30 -08:00
Serhey Popovych 00ff4b8e31 ip/tunnel: Be consistent when printing tunnel collect metadata
Print only "external" if collect meta data attribute
is given: rest of parameters are irrelevant. This is
to follow gre6.

For both JSON and non-JSON output use "external" for
all tunnels including vxlan and geneve.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-24 10:01:26 -08:00
David Ahern 6517b5c0ac Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-24 09:59:03 -08:00
Wolfgang Bumiller 7ac29190db tc/lexer: let quotes actually start strings
The lexer will go with the longest match, so previously
the starting double quotes of a string would be swallowed by
the [^ \t\r\n()]+ pattern leaving the user no way to
actually use strings with escape sequences.
Fix this by not allowing this case to start with double
quotes.

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2018-01-24 08:49:10 -08:00
Serhey Popovych 7a14358b16 iplink: Use ll_name_to_index() instead of if_nametoindex()
While benefit from using ll_name_to_index() with populated
cache can potentially be exploited only in few places
(e.g. bridge fdb/mdb/vlan show routines) there is another
advantage of ll_name_to_index() over plain if_nametoindex():

  in case of if_nametoindex() failure ll_name_to_index()
  will attempt to get index from common name in form "if%d"
  that may be returned from ll_index_to_name().

This makes output from ip(8) coherent with it's input.

Note that most of the code already switched from plain
if_nametoindex() to ll_name_to_index() to cached variant.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-23 14:50:59 -08:00
Serhey Popovych 9dc04a4fd2 vti/vti6: Minor improvements
In prepare of link_vti.c and link_vti6.c merge:

  1) Make @fwmark of __u32 type instead of unsigned int
     in vti to match with rest tunneling code.

  2) Report when unable to translate @link network device
     name to index instead of silently exiting in vti6.

  3) Remove newline separating local/remote attributes
     from the ikey/okey in vti6 to match vti module.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-23 14:50:59 -08:00
Serhey Popovych c4743c4d9b iptnl/ip6tnl: Unify ttl/hoplimit parsing routines
Handle "inherit" case properly for gre6 and ip6tnl.

Use get_u8() in gre to parse ttl/hoplimit.

Be consistent about "hlim" alias to ttl/hoplimit
support.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-23 14:50:59 -08:00
Serhey Popovych b53835de38 tunnel: Add space between encap-dport and encap-sport in non-JSON output
Fixes: bad76e6b1f ("ip/tunnel: Abstract tunnel encapsulation options printing")
Fixes: e2d4588331 ("ip: link_gre.c: add json output support")
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-23 14:50:59 -08:00
Serhey Popovych 3cb92eb9ab gre/gre6: Post merge fixes
Few minor changes after merge of 'master' into 'net-next' branch:

  1) Follow 80 line length for printing erspan_index parameter
     as we did in master with commit 2a8d0f6e9c ("gre/tunnel:
     Print erspan_index using print_uint()").

  2) Remove remnants of encapsulation option printing: now it
     is done using tnl_print_encap() helper in commit bad76e6b1f
     ("ip/tunnel: Abstract tunnel encapsulation options printing").

Fixes: 8c75f69411 ("Merge branch 'master' into net-next")
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2018-01-22 09:30:09 -08:00
David Ahern c2e368df0a Merge branch 'shared_block' into net-next
Jiri Pirko  says:

====================

From: Jiri Pirko <jiri@mellanox.com>

Kernel allows to share all filters between qdiscs with use
of shared block.

Example:

block number 22. "22" is just an identification:
$ tc qdisc add dev ens7 ingress_block 22 ingress
                        ^^^^^^^^^^^^^^^^
$ tc qdisc add dev ens8 ingress_block 22 ingress
                        ^^^^^^^^^^^^^^^^

If we don't specify "block" command line option, no shared block would
be created:
$ tc qdisc add dev ens9 ingress

Now if we list the qdiscs, we will see the block index in the output:

$ tc qdisc
qdisc ingress ffff: dev ens7 parent ffff:fff1 ingress_block 22
qdisc ingress ffff: dev ens8 parent ffff:fff1 ingress_block 22
qdisc ingress ffff: dev ens9 parent ffff:fff1

To make is more visual, the situation looks like this:

   ens7 ingress qdisc                 ens7 ingress qdisc
          |                                  |
          |                                  |
          +---------->  block 22  <----------+

Unlimited number of qdiscs may share the same block.

Block sharing is also supported for clsact qdisc:
$ tc qdisc add dev ens10 ingress_block 23 egress_block 24 clsact
$ tc qdisc show dev ens10
qdisc clsact ffff: dev ens10 parent ffff:fff1 ingress_block 23 egress_block 24

We can add filter using the block index:

$ tc filter add block 22 protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop

Note we cannot use the qdisc for filter manipulations of shared blocks:

$ tc filter add dev ens8 ingress protocol ip pref 1 flower dst_ip 192.168.100.2 action drop
Error: This filter block is shared. Please use the block index to manipulate the filters.

We will see the same output if we list filters for ingress qdisc of
ens7 and ens8, also for the block 22:

$ tc filter show block 22
filter protocol ip pref 25 flower chain 0
filter protocol ip pref 25 flower chain 0 handle 0x1
...

$ tc filter show dev ens7 ingress
filter block 22 protocol ip pref 25 flower chain 0
filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
...

$ tc filter show dev ens8 ingress
filter block 22 protocol ip pref 25 flower chain 0
filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
...

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 11:20:56 -08:00
Jiri Pirko 063463efd7 tc: implement ingress/egress block index attributes for qdiscs
During qdisc creation it is possible to specify shared block for bot
ingress and egress. Pass this values to kernel according to the command
line options.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 10:42:57 -08:00
Jiri Pirko 0c7cef9669 tc: introduce support for block-handle for filter operations
So far, qdisc was the only handle that could be used to manipulate
filters. Kernel added support for using block to manipulate it. So add
the support to use block index to manipulate filters. The magic
TCM_IFINDEX_MAGIC_BLOCK indicates the block index is in use.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 10:42:53 -08:00
Jiri Pirko d0bcedd549 tc: introduce tc_qdisc_block_exists helper
This hepler used qdisc dump to list all qdisc and find if block index in
question is used by any of them. That means the block with specified
index exists.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 10:42:35 -08:00
David Ahern 40cf5b0959 Merge branch 'inet_get_addr' into net-next
Serhey Popovych  says:

====================

It looks confusing to have multiple independent
routines to get internet address from it's string
representation: get_addr() and inet_get_addr().

Most complicated users of inet_get_addr() is
iplink_geneve.c and iplink_vxlan.c because they
required to handle both AF_INET and AF_INET6
for their local/remote endpoints.

On the other hand get_addr() does not provide
additional information like address type: need
to address this. to get rid of current and
possible future code duplications. Note that
this functionality is first step to make proto
independent handling of local/remote endpoints
in ip/tunnel code (there will be additional
series based on this one).

Also fix get_addr_1() and get_prefix() to make
sure it always provide correct ->family and
->bitlen.

As always comments, suggestions and criticism
are welcome.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 10:11:07 -08:00
Serhey Popovych 6caad8f505 ip: Get rid of inet_get_addr()
Both geneve and vxlan modules are converted to
use get_addr() we can replace inet_get_addr()
in less problematic places and finally get
rid of inet_get_addr().

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 09:38:26 -08:00
Serhey Popovych 1e9b8072de iplink_vxlan: Get rid of inet_get_addr()
Now we have additional information about address
class from get_addr() we can use it in place of
inet_get_addr().

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 09:38:23 -08:00
Serhey Popovych 6c4b672738 iplink_geneve: Get rid of inet_get_addr()
Now we have additional information about address
class from get_addr() we can use it in place of
inet_get_addr().

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 09:38:22 -08:00
Serhey Popovych 7bf5e876d0 utils: Fast inet address classification after get_addr()
It looks very useful to receive additional information
from get_addr_1() and get_addr() about address to simplify
caller and get rid of code duplications.

For now following information can be returned:

  1) address is unspecified (zero)
  2) address is multicast
  3) address is internet: family is either AF_INET or
     AF_INET6.

More information can be added in the future.

Introduce inline helpers to make code using this new
address classification interface more self explaining:

  bool is_addrtype_inet(inet_prefix *addr)
    true if @addr is inet address

  bool is_addrtype_inet_unspec(inet_prefix *addr)
    true if @addr is unspecified inet address

  bool is_addrtype_inet_multi(inet_prefix *addr)
    true if @addr is multicast inet address

  bool is_addrtype_inet_not_unspec(inet_prefix *addr)
    true if @addr is not unspecified inet address
    false if @addr is not inet or unspecified inet

  bool is_addrtype_inet_not_multi(inet_prefix *addr)
    true if @addr is not multicast inet address
    false if @addr is not inet or multicast inet

Last two are useful for case when we need inet address
that is not unspecified or multicast.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 09:38:21 -08:00
Serhey Popovych 93fa12418d utils: Always specify family and ->bytelen in get_prefix_1()
Handle default/all/any special case in get_addr_1() to setup
->family and ->bytelen correctly.

Make get_addr_1() return ->bitlen == -2 instead of -1 to
distinguish default/all/any special case from the rest:
it is safe because all callers check ->bitlen < 0, not
explicit value -1.

Reduce intendation by one level and get rid of goto/label
to make code more readable.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 09:38:19 -08:00
Serhey Popovych f2522007d8 utils: Always specify family for address in get_addr_1()
Set ->family correctly when string representing address
is "default", "all" or "any": get_addr_1() might be called
with AF_UNSPEC (e.g. get_addr() -> get_addr_1()).

Extend support for zero address to all address families,
not only AF_INET and AF_INET6 when one explicitly given
as @family: use af_byte_len() to correctly set address length.

Still assume AF_INET when @family is AF_UNSPEC.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 09:38:17 -08:00
David Ahern 8c75f69411 Merge branch 'master' into net-next
Conflicts:
	ip/link_gre.c
	ip/link_gre6.c

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 09:37:39 -08:00
Jakub Kicinski 5691e6bc58 bpf: support map offload
When program is loaded with a specified ifindex, use that
ifindex also when creating maps.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-19 12:35:41 -08:00
Jakub Kicinski e0850bdedc tc: red: allow setting th_min and th_max to the same value
Setting th_min and th_max to the same value may be useful for DCTCP
deployments.  The original DCTCP paper describes it as a simplest way
of achieving simple ECN threshold marking.  Indeed, there doesn't seem
to be any simpler qdisc in Linux which would allow such a setup today.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-19 12:35:23 -08:00
David Ahern c0788a09d4 Update kernel headers to 4.15-rc8
Update kernel headers to commit 30c3e9d47035
("l2tp: remove switch block in l2tp_nl_cmd_session_create()")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-19 12:33:41 -08:00
Serhey Popovych c9391f120e tunnel: Return constant string without copying it
We return constant string from tnl_strproto(), no need
to copy it to temporary buffer and then return such
buffer as const: return constant string instead.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-18 16:34:41 -08:00
Serhey Popovych b8dc6c5b0e vti6/tunnel: Unify and simplify link type help functions
Both of these two changes are missing for link_vti6.c:

  commit 8b47135474 ("ip: link: Unify link type help functions a bit")
  commit 561e650eff ("ip link: Shortify printing the usage of link type")

Replay them on link_vti6.c to bring link type help functions
inline with other tunneling code.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-18 16:34:41 -08:00
Serhey Popovych 34a8c54d6d vti/tunnel: Unify ikey/okey printing
For vti6 tunnel we print [io]key in dotted-quad notation
(ipv4 address) while in vti we do that in hex format.

For vti tunnel we print [io]key only if value is not
zero while for vti6 we miss such check.

Unify vti and vti6 tunnel [io]key output.

While here enlarge s2 buffer to the same size as in rest
of tunnel support code (64 bytes) and check return from
inet_ntop().

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-18 16:34:41 -08:00
Serhey Popovych 2a8d0f6e9c gre/tunnel: Print erspan_index using print_uint()
One is missing in JSON output because fprintf()
is used instead of print_uint().

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-18 16:34:40 -08:00
Serhey Popovych bad76e6b1f ip/tunnel: Abstract tunnel encapsulation options printing
Get rid of code duplications and consolidate encapsulation
options printing in single function - tnl_print_encap().

Introduce and use tnl_encap_str() to format encapsulation
option string according to tempate and given values to avoid
code duplication and simplify it.

Use print_string() instead of fputs() and fprintf() to
print encapsulation for !is_json_context().

Print "unknown" parameter for "encap" type in PRINT_FP
context using "%s " format specifier and benefit from
complite time string merge.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-18 16:34:40 -08:00
Serhey Popovych e97ad3d248 ip/tunnel: Use print_0xhex() instead of print_string()
No need for custom SPRINT_BUF() and snprintf() 0x%x
value to this buffer: we can use print_0xhex() instead
of print_string().

In link_iptnl.c use s2 instead of s1 buffer and remove
s1.

While there adjust fwmark option print order in iptnl
and ip6tnl to get it match each other.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-18 16:34:40 -08:00
Serhey Popovych 3caa526c7b ip/tunnel: Simplify and unify tos printing
For ip tunnels tos can be 0 when not configured, 1 when
inherited from encapsulated packet and rest specifying
diffserv (rfc2474) or tos (rfc1349) bits. It is stored
in packet tos/diffserv field and returned in tos
netlink attribute to userspace.

Simplify and unify tos printing by using print_0xhex()
and print_string() instead of fprintf() to output values.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-18 16:34:40 -08:00
Serhey Popovych 375560c4ab ip/tunnel: Correct and unify ttl/hoplimit printing
Both ttl/hoplimit is from 1 to 255. Zero has special meaning:
use encapsulated packet value. In ip-link(8) -d output this
looks like "ttl/hoplimit inherit". In JSON we have "int" type
for ttl and therefore values from 0 (inherit) to 255.

To do the best in handling ttl/hoplimit we need to accept
both cases: missing attribute in netlink dump and zero value
for "inherit"ed case. Last one is broken since JSON output
introduction for gre/iptnl versions and was never true for
gre6/ip6tnl.

For all tunnels, except ip6tnl change JSON type from "int" to
"uint" to reflect true nature of the ttl.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-18 16:34:40 -08:00
Serhey Popovych 45d3a6efb2 iplink: Use ll_index_to_name() instead of if_indextoname()
There are two reasons for switching to cached variant:

  1) ll_index_to_name() may return result from cache,
     eliminating expensive ioctl() to the kernel.

     Note that most of the code already switched from plain
     if_indextoname() to ll_index_to_name() to cached variant
     in print path because in most cases cache populated.

  2) It always return name in the form "if%d", even if
     entry is not in cache and ioctl() fails. This drops
     "link_index" from JSON output.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-18 16:34:37 -08:00
Arkadi Sharshevsky c619f2be8b devlink: Ignore unknown attributes
In case of extending the UAPI old packages would break.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-18 16:30:36 -08:00
Serhey Popovych c4fb35bdfc iplink: Fix "alias" parameter length calculations
We need NEXT_ARG() to get *argv pointing to "alias"
parameter value. Overwise we get and check "alias"
string length.

Fixes: f88becf35e ("iplink: Process "alias" parameter correctly")
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-18 16:24:43 -08:00
Gal Pressman c7db3921ec man: Document the meaning of zero in min/max_tx_rate parameters
Zero value in min/max_tx_rate has a special meaning of no rate limit,
document it.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-01-17 10:44:42 -08:00
Gal Pressman 39315157ab ipaddress: Make sure VF min/max rate API is supported before using it
When using the new minimum rate API and providing only one parameter
(minimum rate/maximum rate), we query the VF min and max rate regardless
of kernel support.
This resulted in segmentation fault in ipaddr_loop_each_vf, which tries
to access NULL pointer.

This patch identifies such cases by testing the VF table for NULL
pointer in IFLA_VF_RATE, and aborts the operation.
Aborting on the first VF is valid since if the kernel does not support
the new API for the first VF, it will not support it for the other VFs
as well.

Fixes: f89a2a05ff ("Add support to configure SR-IOV VF minimum and maximum Tx rate through ip tool")
Cc: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-01-17 10:44:42 -08:00
Gal Pressman 04be08e0bd iplink: Validate minimum tx rate is less than maximum tx rate
According to the documentation (man ip-link), the minimum TXRATE should
be always <= Maximum TXRATE, but commit f89a2a05ff ("Add support to
configure SR-IOV VF minimum and maximum Tx rate through ip tool") didn't
enforce it.

Fixes: f89a2a05ff ("Add support to configure SR-IOV VF minimum and maximum Tx rate through ip tool")
Cc: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-01-17 10:44:42 -08:00
Serhey Popovych a3e0229e25 ipaddress: Use family_name() for better code reuse
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2018-01-17 10:42:17 -08:00
Phil Sutter 6f7df6b2a1 tc: Optimize gact action lookup
When adding a filter with a gact action such as 'drop', tc first tries
to open a shared object with equivalent name (m_drop.so in this case)
before trying gact. Avoid this by matching the action name against those
handled by gact prior to calling get_action_kind().

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2018-01-17 10:27:47 -08:00
David Ahern e209d4c83d Merge branch 'tc-batch' into net-next
Chris Mi says:

====================
Currently in tc batch mode, only one command is read from the batch
file and sent to kernel to process. With this patchset, at most 128
commands can be accumulated before sending to kernel.

We introduced a new function in patch 1 to support for sending
multiple messages. In patch 2, we add this support for filter
add/delete/change/replace and actions add/change/replace commands.

But please note that kernel still processes the requests one by one.
To process the requests in parallel in kernel is another effort.
The time we're saving in this patchset is the user mode and kernel mode
context switch. So this patchset works on top of the current kernel.

Using the following script in kernel, we can generate 1,000,000 rules.
	tools/testing/selftests/tc-testing/tdc_batch.py

Without this patchset, 'tc -b $file' exection time is:

real    0m15.555s
user    0m7.211s
sys     0m8.284s

With this patchset, 'tc -b $file' exection time is:

real    0m12.360s
user    0m6.082s
sys     0m6.213s

The insertion rate is improved more than 10%.
====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-15 08:27:20 -08:00
Chris Mi 485d0c6001 tc: Add batchsize feature for filter and actions
Currently in tc batch mode, only one command is read from the batch
file and sent to kernel to process. With this support, at most 128
commands can be accumulated before sending to kernel.

Now it only works for the following successive commands:
1. filter add/delete/change/replace
2. actions add/change/replace

Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-14 09:03:35 -08:00
Chris Mi 72a2ff3916 lib/libnetlink: Add a new function rtnl_talk_iov
rtnl_talk can only send a single message to kernel. Add a new function
rtnl_talk_iov that can send multiple messages to kernel.
rtnl_talk_iov takes struct iovec * and iovlen as arguments.

Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-14 09:03:33 -08:00
Christian Ehrhardt 9bed02a5d5 tests: make sure rand_dev suffix has 6 chars
The change to limit the read size from /dev/urandom is a tradeoff.
Reading too much can trigger an issue, but so it could if the
suggested 250 random chars would not contain enough [:alpha:] chars.
If they contain:
 a) >=6 all is ok
 b) [1-5] the devname would be shorter than expected (non fatal).
 c) 0 things would break

In loops of hundreds of thousands it always was (a) for my, but since
if occuring in a test it might be hard to track what happened avoid
this issue by retrying on the length condition.

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-10 08:29:51 -08:00
Christian Ehrhardt 4afbeaeeaf tests: read limited amount from /dev/urandom
In some test environments like e.g. Ubuntu & Debian autopkgtest it
can happen that while generating random device names the pipes
between tr and head are considered dead while processing.
That prints (non fatal) issues like:
  Running ip/link/new_link.t [iproute2-this/4.13.0-17-generic]: tr:
write error: Broken pipe
  tr: write error
  PASS

This only happens if reading an infinite amount of chars with the
read from urandom, so reading a defined amount fixes the issue.

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-10 08:29:51 -08:00
Mike Frysinger a8b970d7d2 ifcfg/rtpr: convert to POSIX shell
These files are already mostly written in POSIX shell, so convert their
shebangs to /bin/sh and tweak the few bashisms in here.

URL: https://crbug.com/756559
Reported-by: Pat Erley <perley@chromium.org>
Signed-off-by: Mike Frysinger <vapier@chromium.org>
2018-01-10 08:26:09 -08:00
Mike Frysinger 54f5991acd mark shell scripts +x
This makes it easier to execute locally for testing.

Signed-off-by: Mike Frysinger <vapier@chromium.org>
2018-01-10 08:23:49 -08:00
Stephen Hemminger 7d63671030 tc: remove no longer relevant README
This document described how kernel and tc used to handle
timing. In last two years, kernel has switched over to using
ktime. Nothing to see here, move along.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-10 08:21:22 -08:00
Serhey Popovych cc899123cc ip6tnl/tunnel: Output hoplimit before encapsulation limit
To follow gre6 output print hoplimit before encapsulation
limit in link_ip6tnl.c.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-10 08:06:12 -08:00
Serhey Popovych 763cf4956d gre6/tunnel: Output flowlabel after tclass
To follow ip6tnl output print flowlabel after tclass
in link_gre6.c.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-10 08:06:12 -08:00
Serhey Popovych e3945d92b0 ip6/tunnel: Unify encap_limit printing
Use %u format specifier to print it in link_gre6.c and
make code more readable.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-10 08:06:12 -08:00
Serhey Popovych a0fd0c3a30 ip6/tunnel: Unify flowlabel printing
Use @s2 buffer to store string representation of
flowlabel and get rid of extra SPRINT_BUF(): no
need to preserve @s2 contents for later.

Use print_string(PRINT_ANY, ...) with prepared by
snprintf() string for both PRINT_JSON and PRINT_FP
cases.

Omit flowlabel from output if no flowinfo attribute
is given and IP6_TNL_F_USE_ORIG_FLOWLABEL isn't set.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-10 08:06:12 -08:00
Serhey Popovych 090524f899 ip6/tunnel: Unify tclass printing
Use @s2 buffer to store string representation of
tclass and get rid of extra SPRINT_BUF(): no
need to preserve @s2 contents for later.

Use print_string(PRINT_ANY, ...) with prepared by
snprintf() string for both PRINT_JSON and PRINT_FP
cases.

While there use __u32 for flowinfo in link_gre6.c
and check for IFLA_GRE_FLOWINFO attribute presense.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-10 08:06:12 -08:00
Serhey Popovych 4dc6665b6b ip6tnl/tunnel: Do not print obscure flowinfo
It is implementation internal and main purpose
of printing it seems debugging.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-10 08:06:12 -08:00
Serhey Popovych b76b24006c ip6/tunnel: Fix tclass output
In link_gre6.c it seems copy paste error: tclass is 8 bits,
not 20 as flowlabel.

In link_iptnl.c rename "flowinfo_tclass" to "tclass" as it
correct name since flowinfo is implementation internal name
used to label combined within u32 attribute tclass and
flowlabel.

Fixes: 1facc1c61c ("ip: link_ip6tnl.c: add json output support")
Fixes: 2e706e12d9 ("Merge branch 'master' into net-next")
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-10 08:06:11 -08:00
Jamal Hadi Salim 24a5a48e27 tc: Fix filter protocol output
Fixes: 249284ff5a ("tc: jsonify filter core")
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
2018-01-09 08:09:10 -08:00
Stephen Hemminger a366508913 include: update ethernet headers
Incorporate upstream changes to fix compliation with MUSL.
See commit 6926e041a892
 ("uapi/if_ether.h: prevent redefinition of struct ethhdr")

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-09 08:08:03 -08:00
Antonio Quartulli ebbb219c92 ss: fix NULL pointer access when parsing unix sockets with oldformat
When parsing and printing the unix sockets in unix_show(),
if the oldformat is detected, the peer_name member of the sockstat
object is left uninitialized (NULL).
For this reason, if a filter has been specified on the command line,
a strcmp() will crash when trying to access it.

Avoid crash by checking that peer_name is not NULL before
passing it to strcmp().

Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-09 08:02:46 -08:00
Antonio Quartulli 192be8fccb ss: fix crash when skipping disabled header field
When the first header field is disabled (i.e. when passing the -t
option), field_flush() is invoked with the `buffer` global variable
still zero'd.
However, in field_flush() we try to access buffer.cur->len
during variables initialization, thus leading to a SIGSEGV.

It's interesting to note that this bug appears only when the code
is compiled with -O0, because the compiler is smart
enough to immediately jump to the return statement if optimizations
are enabled and skip the faulty instruction.

Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-09 08:02:46 -08:00
Filip Moc 33f6dd23a5 ip fou: pass family attribute as u8
This fixes fou on big-endian systems.

Signed-off-by: Filip Moc <dev@moc6.cz>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-09 07:58:37 -08:00
David Ahern ac6561417a Restore --no-print-directory option for silent builds
Commit 69fed534a5 ("change how Config is used in Makefile's") removed
Config from Makefile. Config had the checks to set VERBOSE based on user
request and VERBOSE is used to add the --no-print-directory argument.
Since Config is gone, add the relevant setup for VERBOSE to Makefile
to restore quieter builds by default.

Fixes: 69fed534a5 ("change how Config is used in Makefile's")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-09 07:58:00 -08:00
David Ahern 6a21ca8a4a Merge branch 'master' into net-next
Conflicts:
	man/man8/ip-link.8.in

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-08 10:10:45 -08:00
Serhey Popovych 9deb754283 link_iptnl: Open "encap" JSON object
It seems missing pair of open_json_object()/close_json_object()
in iptnl implementation.

Note that we open "encap" JSON object in ip6tnl.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2018-01-05 16:35:47 -08:00
Serhey Popovych d9aefbc0b8 link_iptnl: Print tunnel mode
Tunnel mode does not appear in parameters print for iptnl
supported tunnels like ipip and sit, while printed for
ip6tnl.

Print tunnel mode as "proto" field name for JSON and
without any name when printing to cli to follow ip6tnl
behaviour.

For non JSON output we have:

   $ ip -d link show dev sit1

Before:
-------
17: sit1@NONE: <NOARP> mtu 1480 qdisc noop state DOWN ...
    link/sit X.X.X.X brd 0.0.0.0 promiscuity 0
    sit remote any local X.X.X.X ...
        ~~~

After:
------
17: sit1@NONE: <NOARP> mtu 1480 qdisc noop state DOWN ...
    link/sit X.X.X.X brd 0.0.0.0 promiscuity 0
    sit any remote any local X.X.X.X ...
        ^^^

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2018-01-05 16:35:47 -08:00
Serhey Popovych 68a7f5ed47 link_iptnl: Kill code duplication
Both sit and ipip "mode" parameter handling nearly the same.
Except for sit we have "ip6ip" mode: check it only when
configuring sit.

Note that there is no need strcmp(lu->id, "ipip"): if it is
not sit it is "ipip" because we have only these two link util
defined in module.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2018-01-05 16:35:47 -08:00
Matthias Schiffer cfd6ccbfd0 devlink, rdma, tipc: properly define TARGETS without HAVE_MNL
Leaving a variable with a generic name such as TARGETS undefined would lead
to Make picking up its value from the environment. Avoid this by always
defining TARGETS in the Makefiles.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
2018-01-05 16:32:17 -08:00
Jakub Kicinski 7d424c7193 ip: link: add support for netdevsim device type
netdevsim is a new software device for testing kernel APIs
without any hardware attached.  Allow users to create such
devices.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
2018-01-02 20:46:19 -08:00
Luca Boccassi c7d6cbaf85 man: fix small formatting errors
Lintian detected the following formatting errors:

 man/man8/devlink-sb.8.gz 230: warning: macro `b' not defined
 man/man8/ip-link.8.gz 1243: warning: macro `in-8' not defined
  (possibly missing space after `in')
 man/man8/tc-u32.8.gz `R' is a string (producing the registered sign),
  not a macro.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-02 11:29:39 -08:00
Luca Boccassi 36c1d2383a man: routel/routef: don't mention filesystem paths
The filesytem paths to these scripts might be different on various
distros, so don't mention it in the manpages. It is not really useful
information anyway.

Originally submitted as Debian bug:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561424

Reported-by: jidanni@jidanni.org
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-30 09:43:47 -08:00
Luca Boccassi fe2ab15d2c man: ip-address: document 15-char limit for LABEL
Trying to set a label longer than 15 characters returns an error:
 RTNETLINK answers: Numerical result out of range

Document the limit in the manpage.

Originally reported as a Debian bug:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=661886

Reported-by: Gabor Kiss <kissg@ssg.ki.iif.hu>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-30 09:43:47 -08:00
Luca Boccassi be78fade55 man: add more keywords to ip.8 short description
A Debian user suggested adding more network-related keywords to the
ip manpage, so that manpage-scraping and indexing software like
apropos can do a better job of categorizing the programs.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=877983

Suggested-by: Lynoure Braakman <lynoure@gmail.com>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-30 09:43:47 -08:00
Luca Boccassi cd25876440 man: drop references to Debian-specific paths
Documentation should be distribution-agnostic - any specific quirks
should be handled by downstream maintainers, if necessary.
Remove mentions of Debian paths and package names.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-30 09:43:47 -08:00
Serhey Popovych b760a8823a ip/tunnel: Document "external" parameter
Add it to ip-link(8) "type gre" output help message
as well as to ip-link(8) page.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2017-12-28 09:40:02 -08:00
Serhey Popovych 8c0b19d178 vxcan,veth: Forbid "type" for peer device
It is already given for original device we configure this
peer for.

Results from following command before/after change applied
are shown below:

  $ ip link add dev veth1a type veth peer name veth1b \
                           type veth peer name veth1c

Before:
-------

<no output, no netdevs created>

After:
------

Error: duplicate "type": "veth" is the second value.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-28 09:35:27 -08:00
Yuval Mintz b97c6fa71d qdisc: print offload indication
Use the newly added TCA_HW_OFFLOAD indication from kernel
to print a consistent 'offloaded' message to user when listing qdiscs.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-27 13:55:16 -08:00
Serhey Popovych afdf9277eb gre6/tunnel: Do not submit garbage in flowinfo
We always send flowinfo to the kernel. If flowlabel/tclass
was set first to non-inherit value and then reset to
inherit we do not clear flowlabel/tclass part in flowinfo,
send it to kernel and can get from the kernel back.

Even if we check for IP6_TNL_F_USE_ORIG_TCLASS and
IP6_TNL_F_USE_ORIG_FLOWLABEL when printing options
sending invalid flowlabel/tclass to the kernel seems
bad idea.

Note that ip6tnl always clean corresponding flowinfo
parts on inherit.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2017-12-27 13:45:37 -08:00
Serhey Popovych 147ade01b0 gre,ip6tnl/tunnel: Fix noencap- support
We must clear bit, not set all but given bit.

Fixes: 858dbb208e ("ip link: Add support for remote checksum offload to IP tunnels")
Fixes: 73516e128a ("ip6tnl: Support for fou encapsulation"
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2017-12-27 13:41:42 -08:00
Leon Romanovsky 874c734c1c rdma: Move link execution logic to common code
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2017-12-27 07:47:35 -08:00
Leon Romanovsky 5fc17280b1 rdma: Rename rd_free_devmap to be rd_free
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2017-12-27 07:47:35 -08:00
Leon Romanovsky 7109f4b212 rdma: Rename free function to be rd_cleanup
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2017-12-27 07:47:35 -08:00
Leon Romanovsky b1e6bc437f rdma: Get rid of dev_map_free call
The dev_map_free() is called once only and it is short,
so it is better to integrate it into the caller's site.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2017-12-27 07:47:35 -08:00
Leon Romanovsky b55644412f rdma: Print supplied device name in case of wrong name
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2017-12-27 07:47:35 -08:00
Leon Romanovsky e3dee3c81f rdma: Check that port index exists before operate on link layer
Link layer operates on port layer, hence it should check
it existence before execution commands.

Fixes: da990ab40a ("rdma: Add link object")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2017-12-27 07:47:35 -08:00
Leon Romanovsky 4e2eb9fdf9 rdma: Fix misspelled SYS_IMAGE_GUID
SYS_IMAGE_GUIG is actually SYS_IMAGE_GUID.

Fixes: da990ab40a ("rdma: Add link object")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2017-12-27 07:47:35 -08:00
Leon Romanovsky 8b92a2c930 rdma: Move per-device handler function to generic code
Most of the proposed objects are working in the scope "dev"
and will implement the same logic. Move the code to utils.c,
so other objects will be able to reuse the code.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2017-12-27 07:47:35 -08:00
Leon Romanovsky 99da90326e rdma: Protect dev_map_lookup from wrong input
Despite the fact that all callers to dev_map_lookup are ensuring that
there is always device name prior to call to that function, it is better
and safer to check that in the dev_map_lookup itself.

Fixes: 40df8263a0 ("rdma: Add dev object")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2017-12-27 07:47:35 -08:00
Leon Romanovsky 0fc8c30b4e rdma: Reduce scope of _dev_map_lookup call
There is no external users of _dev_map_lookup function,
so let's limit its scope to be local.

Fixes: 40df8263a0 ("rdma: Add dev object")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2017-12-27 07:47:35 -08:00
William Tu 1e4be52bf2 erspan: add erspan usage description
The patch adds erspan usage description, so 'ip link help erspan'
and 'ip link help ip6erspan' shows the options.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2017-12-27 07:47:35 -08:00
Serhey Popovych 08ede25fda ip/tunnel: No need to free answer after rtnl_talk() on error
Since rtnl_talk() never returns with answer buffer allocated
on error we do not need to release it manually. After this
initializing answer with NULL before rtnl_talk() is useless.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2017-12-26 09:07:43 -08:00
Serhey Popovych 1ed8a5ca87 utils: ll_addr: Handle ARPHRD_IP6GRE in ll_addr_n2a()
ll_addr_n2a() correctly prints tunnel endpoints for gre, ipip, sit
and ip6tnl, but not for ip6gre. Fix this by adding ARPHRD_IP6GRE to
IPv6 tunnel endpoing address conversion.

Before:
-------

$ ip link show
...
18: ip6tnl0: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default
    link/tunnel6 :: brd ::
19: ip6gre0: <NOARP> mtu 1456 qdisc noop state DOWN mode DEFAULT group default
    link/gre6 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00 brd \
00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00

After:
------

$ ip link show
...
18: ip6tnl0: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default
    link/tunnel6 :: brd ::
19: ip6gre0: <NOARP> mtu 1456 qdisc noop state DOWN mode DEFAULT group default
    link/gre6 :: brd ::

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2017-12-26 09:07:42 -08:00
William Tu 2897636267 erspan: add erspan version II support
The patch adds support for configuring the erspan v2, for both
ipv4 and ipv6 erspan implementation.  Three additional fields
are added: 'erspan_ver' for distinguishing v1 or v2, 'erspan_dir'
for specifying direction of the mirrored traffic, and 'erspan_hwid'
for users to set ERSPAN engine ID within a system.

As for manpage, the ERSPAN descriptions used to be under GRE, IPIP,
SIT Type paragraph.  Since IP6GRE/IP6GRETAP also supports ERSPAN,
the patch removes the old one, creates a separate ERSPAN paragrah,
and adds an example.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2017-12-20 15:18:04 -08:00
David Ahern f82517f80d Update headers from 4.15-rc3
Update kernel headers to commit f39a5c01c3d2 ("Merge branch
'nfp-flower-add-Geneve-tunnel-support'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2017-12-20 15:17:59 -08:00
Alexander Zubkov 9135c4d603 iproute: "list/flush/save default" selected all of the routes
When running "ip route list default" and not specifying address family,
one will get all of the routes instead of just default only. The same
is for "exact default" and "match default".

It behaves in such a way because default route with unspecified family
has the same all-zeroes value like no prefix specified at all. Thus
following code blindly ignores the fact, that prefix was actually
specified.

This patch adds the flag PREFIXLEN_SPECIFIED to the default route too.
And then checks its value when filtering routes.

Signed-off-by: Alexander Zubkov <green@msu.ru>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-19 08:23:09 -08:00
Alexander Zubkov ed7fdc950d iproute: list/flush/save filter also by metric
Metric is one of the "unique key" fields of the route in Linux. But
still one can not use its value in filter while running ip list.
Because of this writing checks in scripts for example is incovenient.

Signed-off-by: Alexander Zubkov <green@msu.ru>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-19 08:16:10 -08:00
Serhey Popovych 560cf61253 link_vti6: Always add local/remote endpoint attributes
All tunnels already support for parsing/adding zero
endpoints and vti6 isn't an exception.

This check was added as part of commit 2a80154fde
(vti6: fix local/remote any addr handling) and looks
too restrictive as purpose of change is to avoid
endpoint configuration from uninitialized data.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-19 08:14:59 -08:00
Serhey Popovych 95614cc8a3 link_ip6tnl: Use IN6ADDR_ANY_INIT to initialize local/remote endpoints
Use specialized helper to initialize endpoint addresses with
zeros instead of open coding this. This unifies initialization
style with other ipv6 tunnel variants (i.e. gre6 and vti6).

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-19 08:14:01 -08:00
Serhey Popovych 1f44b93744 ip/tunnel: Use tnl_parse_key() to parse tunnel key
It is added with
commit a7ed1520ee ("ip/tunnel: introduce tnl_parse_key()")
to avoid code duplication in ip6?tunnel.c.

Reuse it for gre/gre6 and vti/vti6 tunnel rtnl
configuration interface with the same purpose
it is used in tunnel ioctl interface in ip6?tunnel.c.

While there change type of key variables from
unsigned integer to __be32 to reflect nature of the
value they store and place error message in
tnl_parse_key() on a single line to make single
call to fprintf().

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-19 08:14:01 -08:00
Serhey Popovych dac9ff35ea iplink: Kill redundant network device name checks
Since commit 625df645b7 (Check user supplied interface name lengths)
iplink_parse() validates network device name using check_ifname()
helpers.

Remove redundant "name" length checks from iplink_parse() callers.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-19 08:12:41 -08:00
Serhey Popovych f88becf35e iplink: Process "alias" parameter correctly
Do not stop parameters processing after "alias" parameter: it might
not be a last one. Seems copy pasted from "type" parameter code.

Check it's length does not exceed IFALIASZ - 1. Better we warn
than get RTNL error.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-19 08:11:38 -08:00
Serhey Popovych b7ea12ae43 iplink: Improve index parameter handling
Correctly check for valid network device index supplied on
command line: indexes are always greather than zero. Check
for duplicate "index" argument.

Initialize @index to 0 to simplify handling it in iplink_modify().
Other callers (link_veth.c, iplink_vxcan.c) already did so.

No need to initialize ifi_index with 0 since it is already
initialized at the @struct req initialization time and not
modified in iplink_parse().

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-19 08:11:38 -08:00
Stephen Hemminger bd9cea5d8c utils: fix makeargs stack overflow
The makeargs() function did not handle end of string correctly
and would reference past end of string.

Found by fuzzing with ASAN.

Reported-by:Bug Basher <iamliketohack@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-18 11:19:48 -08:00
Stephen Hemminger 5073581835 ss: fix crash with invalid command input file
If given an invalid input file with -F flag, ss would crash.
Examples of invalid input are line to long, or null file.

Found by fuzzing with ASAN.

Reported-by:Bug Basher <iamliketohack@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-18 11:18:55 -08:00
Stephen Hemminger ae8e1cb83b ip: validate vlan value for vlan info
The VLAN tag must be 0..4095 to be valid.
Better to trap it here.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-16 13:14:38 -08:00
Serhey Popovych a6addd5cdc ip: gre: fix IFLA_GRE_LINK attribute sizing
Attribute IFLA_GRE_LINK is 32 bit long, not 8 bit.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2017-12-16 10:08:54 -08:00
Serhey Popovych 9aceaad71b ip/tunnel: Use get_addr() instead of get_prefix() for local/remote endpoints
Manual page ip-link(8) states that both local and remote accept
IPADDR not PREFIX. Use get_addr() instead of get_prefix() to
parse local/remote endpoint address correctly.

Force corresponding address family instead of using preferred_family
to catch weired cases as shown below.

Before this patch it is possible to create tunnel with commands:

  ip    li add dev ip6gre2 type ip6gre local fe80::1/64 remote fe80::2/64
  ip -4 li add dev ip6gre2 type ip6gre local 10.0.0.1/24 remote 10.0.0.2/24

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2017-12-16 10:08:54 -08:00
Serhey Popovych 57daab1e70 ip/tunnel: Unify setup and accept zero address for local/remote endpoints
It is fully legal to submit zero (INADDR_ANY/IN6ADDR_ANY_INIT)
value for local and/or remote endpoints for all tunnel drivers:
no need additionally check this in userspace.

Note that all tunnel specific code already can pass zero address
to the kernel.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2017-12-16 10:08:54 -08:00
Oliver Hartkopp 1eccc57341 ip: add vxcan/veth to ip-link man page
veth and vxcan both create a vitual tunnel between a pair of virtual network
devices. This patch adds the content for the now supported vxcan netdevices
and the documentation to create peer devices for vxcan and veth.

Additional remove 'can' that accidently was on the list of link types which
can be created by 'ip link add' as 'can' devices are real network devices.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-16 10:04:33 -08:00
Roman Mashak 3d791a326b ss: add missing path MTU parameter
v3:
   Rebase and use out() instead of printf().
v2:
   Print the path MTU immediately after the MSS, as it is easier to parse
   for humans (suggested by Neal Cardwell).

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-16 10:02:34 -08:00
Stephen Hemminger 2c6aaad949 include: qdisc offload defines
UAPI changes from upstream:
	net: sched: Add TCA_HW_OFFLOAD
	pkt_sched: Remove TC_RED_OFFLOADED from uapi

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-16 10:00:43 -08:00
Stephen Hemminger c189177efc Merge branch 'master' into net-next 2017-12-14 21:19:54 -08:00
William Tu 6231c5bec6 gre6: add collect metadata support
The patch adds 'external' option to support collect metadata
gre6 tunnel.  The 'external' keyword is already used to set the
device into collect metadata mode such as vxlan, geneve, ipip,
etc.  This patch extends support for ipv6 gre and gretap.
Example of L3 and L2 gre device:
bash:~# ip link add dev ip6gre123 type ip6gre external
bash:~# ip link add dev ip6gretap123 type ip6gretap external

Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
2017-12-14 21:19:49 -08:00
Chris Mi 83cf5bc73b tc: fix command "tc actions del" hang issue
If command is RTM_DELACTION, a non-NULL pointer is passed to rtnl_talk().
Then flag NLM_F_ACK is not set on n->nlmsg_flags and netlink_ack() will
not be called. Command tc will wait for the reply for ever.

Fixes: 86bf43c7c2 ("lib/libnetlink: update rtnl_talk to support malloc buff at run time")
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-14 21:17:04 -08:00
Stephen Hemminger 08f9d166c3 iplink: add definitions for GSO_MAX
Until kernel exports these, add GSO_MAX values into iplink
rather than assuming they are UINT_MAX + 1

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-14 18:22:56 -08:00
Solio Sarabia 051274b4db iplink: validate maximum gso_max_size
Validate the upper limit for gso_max_size, valid range is [0-65,536]
inclusive. Fix minor whitespace in iplink man page.

Signed-off-by: Solio Sarabia <solio.sarabia@intel.com>
2017-12-14 18:12:14 -08:00
Jiri Pirko 1876ab0779 tc: fix json array closing
Fixes: 2704bd6255 ("tc: jsonify actions core")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-12-13 18:16:27 -08:00
Oliver Hartkopp 7827b37603 ip: add vxcan to help text
Add missing tag 'vxcan' inside the help text which was missing in commit
efe459c76d ('ip: link add vxcan support').

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
2017-12-13 18:16:22 -08:00
Phil Dibowitz 7b17832445 Show 'external' link mode in output
Recently `external` support was added to the tunnel drivers, but there is no way
to introspect this from userspace. This adds support for that.

Now `ip -details link` shows it:

```
7: tunl60@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group
default qlen 1
    link/tunnel6 :: brd :: promiscuity 0
    ip6tnl external any remote :: local :: encaplimit 0 hoplimit 0 tclass 0x00 flowlabel 0x00000 (flowinfo 0x00000000) addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
```

Signed-off-by: Phil Dibowitz <phil@ipom.com>
2017-12-13 18:15:51 -08:00
Stephen Hemminger 3a2fbf007b Merge branch 'master' into net-next 2017-12-12 12:12:20 -08:00
Davide Caratti 88b428f03f tc: bash-completion: add missing 'classid' keyword
users of 'matchall' filter can specify a value for the class id: update
bash-completion accordingly.

Fixes: b32c0b64fa ("tc: bash-completion: Add support for matchall")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2017-12-12 12:11:37 -08:00
Stefano Brivio 87b1a7aec7 ss: Implement automatic column width calculation
Group fitting fields into lines and space them equally using the
remaining screen width for each line. If columns don't fit on
one line, break them into the least possible amount of lines and
keep them aligned across lines.

This is done by:
 - recording the length of the longest item in each column during
   formatting and buffering (which was added in the previous patch)
 - fitting as many fields as possible on each line of output
 - distributing the remaining padding space equally between the
   columns

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
2017-12-12 12:11:37 -08:00
Stefano Brivio 691bd854bf ss: Buffer raw fields first, then render them as a table
This allows us to measure the maximum field length for each
column before printing fields and will permit us to apply
optimal field spacing and distribution. Structure of the output
buffer with chunked allocation is described in comments.

Output is still unchanged, original spacing is used.

Running over one million sockets with -tul options by simply
modifying main() to loop 50,000 times over the *_show()
functions, buffering the whole output and rendering it at the
end, with 10 UDP sockets, 10 TCP sockets, while throwing
output away, doesn't show significant changes in execution time
on my laptop with an Intel i7-6600U CPU:

- before this patch:
$ time ./ss -tul > /dev/null
real	0m29.899s
user	0m2.017s
sys	0m27.801s

- after this patch:
$ time ./ss -tul > /dev/null
real	0m29.827s
user	0m1.942s
sys	0m27.812s

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
2017-12-12 12:11:37 -08:00
Stefano Brivio 59f46b7b5b ss: Introduce columns lightweight abstraction
Instead of embedding spacing directly while printing contents,
logically declare columns and functions to buffer their content,
to print left and right spacing around fields, to flush them to
screen, and to print headers.

This makes it a bit easier to handle layout changes and prepares
for full output buffering, needed for optimal spacing in field
output layout.

Columns are currently set up to retain exactly the same output
as before. This needs some slight adjustments of the values
previously calculated in main(), as the width value introduced
here already includes the width of left delimiters and spacing
is not explicitly printed anymore whenever a field is printed.
These calculations will go away altogether once automatic width
calculation is implemented.

We can also remove explicit printing of newlines after the final
content for a given line is printed, flushing the last field on
a line will cause field_flush() to print newlines where
appropriate.

No changes in output expected here.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
2017-12-12 12:11:37 -08:00
Stefano Brivio 90351722cb ss: Replace printf() calls for "main" output by calls to helper
This is preparation work for output buffering, which will allow
us to use optimal spacing and alignment of logical "columns".

The new out() function is just a re-implementation of a typical
libc's printf(), except that the return value of vfprintf() is
ignored as no callers use it. This implementation will be
replaced in the next patches to provide column width adjustment
and adequate spacing.

All printf() calls that output parts of the socket list are now
replaced by calls to out(). Output of summary and version is
excluded from this.

No functional differences here, output not affected.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
2017-12-12 12:11:37 -08:00
Stephen Hemminger 81724d6142 Merge branch 'master' into net-next 2017-12-11 16:06:11 -08:00
Stephen Hemminger 4b072e9b49 uapi: tun add eBPF based queue selection method
Upstream commit 96f84061620c6325a2ca9a9a05b410e6461d03c3
    tun: add eBPF based queue selection method

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-11 16:03:27 -08:00
Stephen Hemminger b7f5fd3698 uapi: add access to snd_cwnd and other sock_ops
From upstream kernel commit f19397a5c65665d66e3866b42056f1f58b7a366b
    bpf: Add access to snd_cwnd and others in sock_ops

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-11 16:01:17 -08:00
Roman Mashak 9f1a9ae888 ss: remove duplicate assignment
Fixes: 8250bc9ff4 ("ss: Unify inet sockets output")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-11 15:56:10 -08:00
Stephen Hemminger c2db423f7c iplink: allow configuring GSO max values
This allows sending GSO maximum values when configuring a device.
The values are advisory. Most devices will ignore them but for some
pseudo devices such as veth pairs they can be set.

Example:
	# ip link add dev vm1 type veth peer name vm2 gso_max_size 32768

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2017-12-08 21:33:08 -08:00
Stephen Hemminger 5c6e3478ac Merge branch 'master' into net-next 2017-12-08 21:32:33 -08:00
Michal Privoznik 3572e01a09 tc: util: Don't call NEXT_ARG_FWD() in __parse_action_control()
Not all callers want parse_action_control*() to advance the
arguments. For instance act_parse_police() does the argument
advancing itself.

Fixes: e67aba5595 ("tc: actions: add helpers to parse and print control actions")
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-12-08 10:29:01 -08:00
Wei Wang 00ac78d39c ss: print tcpi_rcv_ssthresh
tcpi_rcv_ssthresh is an important stats when debugging receive side
behavior.
Add it to the ss output.

Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
2017-12-08 10:27:57 -08:00
Stephen Hemminger 39be47fb5e update headers from 4.15-rc2
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-05 17:30:29 -08:00
Phil Sutter 6bf156415a man: tc-csum.8: Fix inconsistency in example description
Commit 6bbe5e6290 ("man: tc-csum.8: Fix example") changed both source
and destination IP addresses in example code but missed to update the
example's description accordingly.

Fixes: 6bbe5e6290 ("man: tc-csum.8: Fix example")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-11-29 10:14:51 -08:00
Stephen Hemminger b38778bb5e update bpf header from net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-28 18:16:51 -08:00
Stephen Hemminger f6351157b9 Merge branch 'master' into net-next 2017-11-28 09:53:28 -08:00
Jiri Pirko 615634c30e man: add -json option to tc manpage
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-28 09:52:26 -08:00
Robert Shearman b6fae7887f vxlan: Make id optional when modifying a link
Specifying the IFLA_VXLAN_LINK attribute on a vxlan link modify is
optional in the kernel, so make the id argument optional for "ip link
set ..." to avoid a user needing to specify it when changing another
attribute.

Signed-off-by: Robert Shearman <rs823p@att.com>
2017-11-28 09:48:26 -08:00
Robert Shearman 079e67816e gre: Fix ttl inherit option
Specifying "... ttl inherit" currently does nothing on a GRE link
modify since the previous ttl value is retrieved up front. Fix this by
explicitly setting ttl to 0 when "inherit" is specified for the
option, since 0 represents the semantics of inherit.

Signed-off-by: Robert Shearman <rs823p@att.com>
2017-11-28 09:48:22 -08:00
Phil Sutter 56708ae7c9 link_gre6: Detect invalid encaplimit values
Looks like a typo: get_u8() returns 0 on success and -1 on error, so the
error checking here was ineffective.

Fixes: a11b7b71a6 ("link_gre6: really support encaplimit option")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-11-28 09:48:13 -08:00
Stephen Hemminger c6a656f4f9 m_mirred: style cleanups
Fix whitespace and long lines.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-26 12:42:17 -08:00
Stephen Hemminger 5c235ac27e m_gact: whitespace cleanup
Fix whitespace errors reported by checkpatch

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-26 12:38:21 -08:00
Stephen Hemminger ed4856919f m_action: style cleanup
Break long lines, and use bool where possible.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-26 12:36:15 -08:00
Stephen Hemminger eb4bccf12b m_vlan: style cleanups
Break long lines and make duplicated code into function.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-26 12:28:55 -08:00
Jiri Pirko b021ee40f6 tc: jsonify vlan action
Add json output to vlan action.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko 502c4adf19 tc: jsonify mirred action
Add json output to mirred action.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko 66fedb6df0 tc: jsonify gact action
Add json output to gact action.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko 2704bd6255 tc: jsonify actions core
Add json output to actions core.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko 619ca351e3 tc: jsonify matchall filter
Add json output to matchall filter.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko e28b88a464 tc: jsonify flower filter
Add json output to flower filter.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko 249284ff5a tc: jsonify filter core
Add json output to filter core.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko f354fa6aa5 tc: jsonify htb qdisc
Add json output to htb qdisc.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko 378ac491f5 tc: jsonify fq_codel qdisc
Add json output to fq_codel qdisc.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko 4fcec7f366 tc: jsonify stats2
Add json output to stats2.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko c91d262f41 tc: jsonify qdisc core
Add json output to qdisc core.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko 81051c60c2 tc: remove action cookie len from printout
Make the output same as input and avoid printout of unnecessary len.

Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Fixes: fd8b3d2c1b ("actions: Add support for user cookies")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:18:38 -08:00
Jiri Pirko abff45b802 tc: move action cookie print out of the stats if
Cookie print was made dependent on show_stats for no good reason. Fix
this bu pushing cookie print ot of the stats if.

Fixes: fd8b3d2c1b ("actions: Add support for user cookies")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:18:38 -08:00
Jakub Kicinski 4f2eb14f71 iplink: communicate ifindex for xdp offload
When xdpoffload option is used, communicate the ifindex down
to the kernel to trigger device-specific load.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:58 -08:00
Jakub Kicinski eb91c55731 f_bpf: communicate ifindex for eBPF offload
Split parsing and loading of the eBPF program and if skip_sw is set
load the program for ifindex, to which the qdisc is attached.

Note that the ifindex will be ignored for programs which are already
loaded (e.g. when using pinned programs), but in that case we just
trust the user knows what he's doing.  Hopefully we will get extack
soon in the driver to help debugging this case.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:57 -08:00
Jakub Kicinski 01ea76b1cf tc_filter: resolve device name before parsing filter
Move resolving device name into an ifindex before calling filter
specific callbacks.  This way if filters need the ifindex, they
can read it from the request.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:57 -08:00
Jakub Kicinski 67c857df80 {f, m}_bpf: don't allow specifying multiple bpf programs
Both BPF filter and action will allow users to specify run
multiple times, and only the last one will be considered by
the kernel.  Explicitly refuse such command lines.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:57 -08:00
Jakub Kicinski 65fdae3d18 bpf: allow loading programs for a specific ifindex
For BPF offload we need to specify the ifindex when program is
loaded now.  Extend the bpf common code to accommodate that.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:57 -08:00
Jakub Kicinski 4a847fcb51 bpf: expose bpf_parse_common() and bpf_load_common()
Expose bpf_parse_common() and bpf_load_common() functions
for those users who may want to modify the parameters to
load after parsing is done.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:57 -08:00
Jakub Kicinski 399db8392b bpf: rename bpf_parse_common() to bpf_parse_and_load_common()
bpf_parse_common() parses and loads the program.  Rename it
accordingly.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:57 -08:00
Jakub Kicinski 3f0b9e620c bpf: split parse from program loading
Parsing command line is currently done together with potentially
loading a new eBPF program.  This makes it more difficult to
provide additional parameters for loading (which may come after
the eBPF program info on the command line).

Split the two (only internally for now).  Verbose parameter
has to be saved in struct bpf_cfg_in to be carried between
the stages.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:57 -08:00
Jakub Kicinski 51be754690 bpf: allocate opcode table in struct bpf_cfg_in
struct bpf_cfg_in already carries a pointer to sock_filter ops.
It's currently set to a local variable in bpf_parse_opt_tbl(),
shared between parsing and loading stages.  Move the array
entirely to struct bpf_cfg_in, this will allow us to split
parsing and loading.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:57 -08:00
Jakub Kicinski f20ff2f195 bpf: keep parsed program mode in struct bpf_cfg_in
bpf_parse() will parse command line arguments to find out the
program mode.  This mode will later be needed at loading time.
Instead of keeping it locally add it to struct bpf_cfg_in,
this will allow splitting parsing and loading stages.

enum bpf_mode has to be moved to the header file, because C
doesn't allow forward declaration of enums.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:57 -08:00
Jakub Kicinski 658cfebc27 bpf: pass program type in struct bpf_cfg_in
Program type is needed both for parsing and loading of
the program.  Parsing may also induce the type based on
signatures from __bpf_prog_meta.  Instead of passing
the type around keep it in struct bpf_cfg_in.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:57 -08:00
Stephen Hemminger 6054c1ebf7 SPDX license identifiers
For all files in iproute2 which do not have an obvious license
identification, mark them with SPDK GPL-2

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-24 12:21:35 -08:00
Stephen Hemminger 859af0a5dc tc: break long lines
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-24 11:31:36 -08:00
Nishanth Devarajan 927e3cfb52 tc: B.W limits can now be specified in %.
This patch adapts the tc command line interface to allow bandwidth limits
to be specified as a percentage of the interface's capacity.

Adding this functionality requires passing the specified device string to
each class/qdisc which changes the prototype for a couple of functions: the
.parse_qopt and .parse_copt interfaces. The device string is a required
parameter for tc-qdisc and tc-class, and when not specified, the kernel
returns ENODEV. In this patch, if the user tries to specify a bandwidth
percentage without naming the device, we return an error from userspace.

Signed-off-by: Nishanth Devarajan<ndev2021@gmail.com>
2017-11-24 11:22:13 -08:00
Stephen Hemminger b317557f58 tc: replace magic constant 16 with #define
For places where tc is expecting device name use IFNAMSIZ.
For others where it is a filter name, introduce a new constant.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-24 11:19:18 -08:00
Stephen Hemminger b2a2d9530c update bpf header from net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-24 09:23:13 -08:00
Stephen Hemminger a03c704b2f ila: fix formatting of help message
Make ip ila help look like ip route help

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-24 09:21:43 -08:00
Tom Herbert 010260a717 ila: create ila_common.h
Move common functions related to checksum, identifier and hook-type
parsing to a common include file.

Signed-off-by: Tom Herbert <tom@quantonium.net>
2017-11-24 09:14:13 -08:00
Tom Herbert 86905c8f05 ila: support for configuring identifier and hook types
Expose identifier type and hook types in ILA configuraiton
and reporting. This adds support in both ip ila ILA LWT.

Signed-off-by: Tom Herbert <tom@quantonium.net>
2017-11-24 09:14:13 -08:00
Tom Herbert 1177552398 ila: support to configure checksum neutral-map-auto
Configuration support in both ip ila and ip LWT for checksum
neutral-map-auto. This is a mode of ILA where checksum
neutral mapping is assumed for packets (there is no C-bit
in the identifier to indicate checksum neutral).

Signed-off-by: Tom Herbert <tom@quantonium.net>
2017-11-24 09:14:13 -08:00
Tom Herbert 2a1bc2fb7c ila: added csum neutral support to ipila
Add checksum neutral to ip ila configuration. This control whether
the C-bit is interpreted as checksum neutral bit.

Signed-off-by: Tom Herbert <tom@quantonium.net>
2017-11-24 09:14:13 -08:00
Tom Herbert d3357cfc7b ila: Fix reporting of ILA locators and locator match
Fix retrieval of locator value for RTA to get 64 bits instead of 32.

Signed-off-by: Tom Herbert <tom@quantonium.net>
2017-11-24 09:14:13 -08:00
Stephen Hemminger 7b8c436c30 update headers from 4.15-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-24 09:07:42 -08:00
Jakub Kicinski f6a54d72a5 bpf: initialize the verifier log
If program loading fails before verifier prints its first
message, the verifier log will not be initialized.  Always
set the first character of the log buffer to zero to make
sure we don't dump non-printable characters to the terminal.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-23 20:47:38 -08:00
Simon Ruderich de3ddbc27d man: document ip xfrm policy nosock
Signed-off-by: Simon Ruderich <simon@ruderich.org>
2017-11-20 10:40:33 -08:00
Simon Ruderich 7662e20161 man: document ip fou show
This was forgotten in cf4caf336a (2017-11-16, Add "show" subcommand to
"ip fou").

Signed-off-by: Simon Ruderich <simon@ruderich.org>
2017-11-20 10:40:33 -08:00
Simon Ruderich 2fc8883b9a man: document ip route get mark
Signed-off-by: Simon Ruderich <simon@ruderich.org>
2017-11-20 10:40:33 -08:00
Lorenzo Colitti 05b3b344b2 iproute2: fixes to compile on some systems.
1. Put the declarations of strlcpy and strlcat inside
   an #ifdef NEED_STRLCPY. Their declarations were already in a
   similar #ifdef.
2. In bpf_scm.h, include sys/un.h for struct sockaddr_un.
3. In utils.h, include time.h for struct timeval.

Tested: builds on ubuntu 14.04 with "make clean distclean; ./configure && make -j64"
Tested: 4.14.1 builds on Android with Android-specific #ifndefs for missing library code
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
2017-11-20 10:38:58 -08:00
Amritha Nambiar 2e67b57a43 man: tc-flower: add explanation for hw_tc option
Add details explaining the hw_tc option.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
2017-11-18 15:59:00 -08:00
Amritha Nambiar f63783c7bf man: tc-mqprio: add documentation for new offload options
This patch adds documentation for additional offload modes and
associated parameters in tc-mqprio.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
2017-11-18 15:59:00 -08:00
Greg Greenway cf4caf336a Add "show" subcommand to "ip fou"
Sample output:

$ sudo ./ip/ip fou add port 111 ipproto 11
$ sudo ./ip/ip fou add port 222 ipproto 22 -6
$ ./ip/ip fou show
port 222 ipproto 22 -6
port 111 ipproto 11

Signed-off-by: Greg Greenway <ggreenway@apple.com>
2017-11-16 17:05:07 -08:00
Phil Sutter 66942e522e tc_util: Silence spurious compiler warning
GCC version 7.2.1 complains that 'result1' may be used uninitialized in
parse_action_control_slash_spaces(). This should not be possible in
practice, so the actual value 'result1' is initialized with does not
matter.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-11-16 16:01:48 -08:00
Phil Sutter b7c61286de tc_util: Drop needless pointer check
The function parse_action_control_slash() returns early if 'p' is NULL,
so after the first call to action_a2n(), 'p' is guaranteed not to be
NULL. Otherwise, the assignment '*p = 0' above would dereference the
NULL pointer already anyway, so just drop this check here.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-11-16 16:01:48 -08:00
Jon Maloy aab3661bd2 tipc: change family attribute from u32 to u16
commit 28033ae4e0f ("net: netlink: Update attr validation to require
exact length for some types") introduces a stricter control on attributes
of type NLA_U* and NLA_S*.

Since the tipc tool is sending a family attribute of u32 instead of as
expected u16 the tool is now effectively broken.

We fix this by changing the type of the said attribute.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
2017-11-16 15:58:48 -08:00
Stephen Hemminger a60742aaf4 Merge branch 'master' into net-next 2017-11-13 10:35:17 -08:00
Stephen Hemminger 212b52299e v4.14.1 2017-11-13 10:09:57 -08:00
Stephen Hemminger b867d46daf utils: remove duplicate include of ctype.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-13 10:08:54 -08:00
Leon Romanovsky aba736dc25 ip: Fix compilation break on old systems
As was reported [1], the iproute2 fails to compile on old systems,
in Cong's case, it was Fedora 19, in our case it was RedHat 7.2, which
failed with the following errors during compilation:

ipxfrm.c: In function ‘xfrm_selector_print’:
ipxfrm.c:479:7: error: ‘IPPROTO_MH’ undeclared (first use in this
function)
  case IPPROTO_MH:
       ^
ipxfrm.c:479:7: note: each undeclared identifier is reported only once
for each function it appears in
ipxfrm.c: In function ‘xfrm_selector_upspec_parse’:
ipxfrm.c:1345:8: error: ‘IPPROTO_MH’ undeclared (first use in this
function)
   case IPPROTO_MH:
        ^                                                                                                                                                            make[1]: *** [ipxfrm.o] Error 1

The reason to it is the order of headers files. The IPPROTO_MH field is
set in kernel's UAPI header file (in6.h), but only in case
__UAPI_DEF_IPPROTO_V6 is set before. That define comes from other kernel's
header file (libc-compat.h) and is set in case there are no previous
libc relevant declarations.

In ip code, the include of <netdb.h> causes to indirect inclusion of
<netinet/in.h> and it sets __UAPI_DEF_IPPROTO_V6 to be zero and prevents from
IPPROTO_MH declaration.

This patch takes the simplest possible approach to fix the compilation
error by checking if IPPROTO_MH was defined before and in case it
wasn't, it defines it to be the same as in the kernel.

[1] https://www.spinics.net/lists/netdev/msg463980.html

Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Riad Abo Raed <riada@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-11-13 10:07:25 -08:00
Stephen Hemminger 9edf7016e8 Merge branch 'master' into net-next 2017-11-12 16:30:14 -08:00
Stephen Hemminger 7d14d00795 v4.14.0 2017-11-12 16:29:43 -08:00
Stephen Hemminger 913352fe54 drop unneeded include of syslog.h
Only arpd uses syslog

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-12 16:22:36 -08:00
Stephen Hemminger d72ac5a17b Merge branch 'master' into net-next 2017-11-12 16:17:37 -08:00
Ivan Vecera 3e897912cb devlink: add batch command support
The patch adds support to batch devlink commands.

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
2017-11-12 16:15:23 -08:00
Ivan Vecera 6648853975 lib: make resolve_hosts variable common
Any iproute utility that uses any function from lib/utils.c needs
to declare its own resolve_hosts variable instance although it does
not need/use hostname resolving functionality (currently only 'ip'
and 'ss' commands uses this).
The patch declares single common instance of resolve_hosts directly
in utils.c so the existing ones can be removed (the same approach
that is used for timestamp_short).

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2017-11-12 16:15:23 -08:00
Stephen Hemminger cd458a7764 update kernel headers from 4.14 net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-12 15:58:11 -08:00
Roman Mashak 274b63ae21 tc: distinguish Add/Replace qdisc operations
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-11-12 15:57:08 -08:00
Stephen Hemminger 840d95d348 update kernel headers
To 4.14 final kernel version
Note: SPDX tag added by upstream

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-12 15:55:49 -08:00
Jesus Sanchez-Palencia 1915af404f man: Clarify idleslope calculation for tc-cbs
In order to calculate the idleSlope parameter of CBS correctly, users
must take into account the entire packet size, including the overhead
from all layers.

Add some more details to the man page to clarify that, giving one
simple example and pointing users to the correct 802.1Q section for
further clarifications if needed.

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
2017-11-12 15:51:23 -08:00
William Tu 8595cc40e9 ip6_gre: add support for ERSPAN tunnel
The patch adds ERSPAN type II tunnel support for IPv6.

Signed-off-by: William Tu <u9012063@gmail.com>
2017-11-09 09:53:34 +09:00
David Ahern 844c37b423 libnetlink: Handle extack messages for non-error case
Kernel can now return non-fatal error messages in extack facility.
Update iproute2 to dump to use if present.
- rename nl_dump_ext_err to nl_dump_ext_ack
- rename errmsg to msg
- add call to nl_dump_ext_ack in rtnl_dump_done and __rtnl_talk for
  non-error path

Signed-off-by: David Ahern <dsahern@gmail.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
2017-11-09 09:46:50 +09:00
Stephen Hemminger b158c1790f Merge branch 'master' into net-next 2017-11-09 09:45:17 +09:00
Stephen Hemminger e4beb52787 netem: use fixed rather than floating point for scaling
Don't need to do floating point math to compute scaled random.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-07 11:15:34 +09:00
Thomas Egerer 0c7d651b38 xfrm_{state, policy}: Allow to deleteall polices/states with marks
Using 'ip deleteall' with policies that have marks, fails unless you
eplicitely specify the mark values. This is very uncomfortable when
bulk-deleting policies and states. With this patch all relevant states
and policies are wiped by 'ip deleteall' regardless of their mark
values.

Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
2017-11-07 11:12:30 +09:00
Thomas Egerer 5474d440b8 xfrm_policy: Do not attempt to deleteall a socket policy
Socket polices are added to a socket using setsockopt(2). They cannot be
deleted by iproute2. The attempt to delete them causes an error
(EINVAL).
To avoid this unnecessary error message all socket policies are skipped
in xfrm_policy_keep.

Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
2017-11-07 11:12:30 +09:00
Thomas Egerer 20e4840a0a xfrm_policy: Add filter option for socket policies
Listing policies on systems with a lot of socket policies can be
confusing due to the number of returned polices. Even if socket polices
are not of interest, they cannot be filtered. This patch adds an option
to filter all socket policies from the output.

Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
2017-11-07 11:12:30 +09:00
Amritha Nambiar 0d575c4dac flower: Represent HW traffic classes as classid values
This patch was previously submitted as RFC. Submitting this as
non-RFC now that the classid reservation scheme for hardware
traffic classes and offloads to route packets to a hardware
traffic class are accepted in net-next.

HW traffic classes 0 through 15 are represented using the
reserved classid values :ffe0 - :ffef.

Example:
Match Dst IPv4,Dst Port and route to TC1:
# tc filter add dev eth0 protocol ip parent ffff:\
  prio 1 flower dst_ip 192.168.1.1/32\
  ip_proto udp dst_port 12000 skip_sw\
  hw_tc 1

# tc filter show dev eth0 parent ffff:
filter pref 1 flower chain 0
filter pref 1 flower chain 0 handle 0x1 hw_tc 1
  eth_type ipv4
  ip_proto udp
  dst_ip 192.168.1.1
  dst_port 12000
  skip_sw
  in_hw

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
2017-11-07 11:04:54 +09:00
Stephen Hemminger ba914908eb Update kernel headers with new SPDK identifier
The kernel header sanitizisation process now puts SPDK GPLv2
license comment on files.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-07 11:02:41 +09:00
Stephen Hemminger 665ef5a5c0 Update kernel headers from 4.14-rc8 nete-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-07 11:02:08 +09:00
Roopa Prabhu 86d0988b16 bridge: fdb: print NDA_SRC_VNI if available
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2017-11-01 22:31:50 +01:00
Vinicius Costa Gomes d652988920 man: Add initial manpage for tc-cbs(8)
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-11-01 22:22:48 +01:00
Vinicius Costa Gomes c9681ac1b3 tc: Add support for the CBS qdisc
The Credit Based Shaper (CBS) queueing discipline allows bandwidth
reservation with sub-milisecond precision. It is defined by the
802.1Q-2014 specification (section 8.6.8.2 and Annex L).

The syntax is:

tc qdisc add dev DEV parent NODE cbs locredit <LOCREDIT>
   		hicredit <HICREDIT> sendslope <SENDSLOPE>
		idleslope <IDLESLOPE>

(The order is not important)

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-11-01 22:22:48 +01:00
Amritha Nambiar e1ac5b06f2 tc/mqprio: Offload mode and shaper options in mqprio
This patch was previously submitted as RFC. Submitting this as
non-RFC now that the tc/mqprio changes are accepted in net-next.

Adds new mqprio options for 'mode' and 'shaper'. The mode
option can take values for offload modes such as 'dcb' (default),
'channel' with the 'hw' option set to 1. The new 'channel' mode
supports offloading TCs and other queue configurations. The
'shaper' option is to support HW shapers ('dcb' default) and
takes the value 'bw_rlimit' for bandwidth rate limiting. The
parameters to the bw_rlimit shaper are minimum and maximum
bandwidth rates. New HW shapers in future can be supported
through the shaper attribute.

# tc qdisc add dev eth0 root mqprio num_tc 2  map 0 0 0 0 1 1 1 1\
  queues 4@0 4@4 hw 1 mode channel shaper bw_rlimit\
  min_rate 1Gbit 2Gbit max_rate 4Gbit 5Gbit

# tc qdisc show dev eth0

qdisc mqprio 804a: root  tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
             queues:(0:3) (4:7)
             mode:channel
             shaper:bw_rlimit   min_rate:1Gbit 2Gbit   max_rate:4Gbit 5Gbit

v2: Avoid buffer overrun and minor cleanup.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
2017-11-01 22:20:06 +01:00
Mahesh Bandewar 1ef5c95201 ip/ipvlan: enhance ability to add mode flags to existing modes
IPvlan supported bridge-only functionality prior to commits
a190d04db937 ('ipvlan: introduce 'private' attribute for all
existing modes.') and fe89aa6b250c ('ipvlan: implement VEPA mode').
These two commits allow to configure the VEPA and private modes now.
This patch adds those options in ip command.

e.g.
  bash:~# ip link add link eth0 name ipvl0 type ipvlan mode l2 private
  -or-
  bash:~# ip link add link eth0 type ipvl0 type ipvlan mode l2 vepa

Also the output will reflect the mode and the mode-flag accordingly.
e.g.
  bash:~# ip -details link show ipvl0
  4: ipvl0@eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc ...
     link/ether 00:1a:11:44:a5:3e brd ff:ff:ff:ff:ff:ff promiscuity 0
     ipvlan  mode l2 private addrgenmode eui64 numtxqueues 1 ...

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
2017-11-01 22:17:01 +01:00
Stephen Hemminger fe388b9e0c update kernel headers from 4.14-rc7 net-next 2017-11-01 22:15:50 +01:00
Stephen Hemminger 5ee63855dc Merge branch 'master' into net-next 2017-11-01 22:15:00 +01:00
Stefano Brivio 4357f5c31a ss: Fix width calculations when Netid or State columns are missing
If Netid or State columns are missing, we must not subtract one
for each of these two columns from the remaining screen width,
while distributing available space to columns. This one
character corresponding to one delimiting space has to be
subtracted only if the columns are actually printed.

Further, in the existing implementation, if the screen width is
an odd number, one additional character is added to the width of
one of the two columns.

But if both are not printed, this filling character needs to be
added somewhere else, in order to have the right spacing
allowing us to fill lines completely.

Address and port fields are printed in pairs (local and remote),
so we can't distribute the space to any of them, because it
would be doubled. Instead, print this additional space to the
right of the Send-Q column, to keep code changes to a minimum.

This is particularly visible with 'ss -f netlink -Z'. Before
this patch, with an 80 column terminal, we have:

$ ss -f netlink -Z|head -n3
Recv-Q Send-Q Local Address:Port                 Peer Address:Port
0      0            rtnl:evolution-calen/2049           *                     pr
oc_ctx=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
0      0            rtnl:clock-applet/1944              *                     pr
oc_ctx=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

and with an 81 column terminal:

$ ss -f netlink -Z|head -n3
Recv-Q Send-Q Local Address:Port                 Peer Address:Port
0      0            rtnl:evolution-calen/2049           *                     pro
c_ctx=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
0      0            rtnl:clock-applet/1944              *                     pro
c_ctx=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

After this patch, in both cases, the output is:
$ ss -f netlink -Z|head -n3
Recv-Q Send-Q Local Address:Port                 Peer Address:Port
0      0             rtnl:evolution-calen/2049            *
 proc_ctx=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
0      0             rtnl:clock-applet/1944               *
 proc_ctx=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2017-11-01 22:10:52 +01:00
Stefano Brivio 22658ff53a ss: Streamline process context printing in netlink_show_one()
There's no need to check 'pid_context' before calling free().

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2017-11-01 22:10:52 +01:00
Stefano Brivio 38509fa903 ss: Remove useless width specifier in process context print
Both local address and service, and remote address and service
fields are already printed out in netlink_show_one() before we
start printing process context, by calling sock_addr_print()
twice.

At this point, sock_addr_print() has already forced the remote
service field to be 'serv_width' wide -- that is, 'serv_width'
width has already been consumed, before we print process
context.

Hence, it makes no sense to force the display width of process
context to be 'serv_width' wide again: previous prints have
filled up the line already. Remove the width specifier and
prefix with a space instead, to keep this consistent with fields
which are displayed after the first output line.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2017-11-01 22:10:52 +01:00
Christoph Paasch e54ed38074 ip: add fastopen_no_cookie option to ip route
This patch adds fastopen_no_cookie option to enable/disable TCP fastopen
without a cookie on a per-route basis.

Support in Linux was added with 71c02379c762 (tcp: Configure TFO without
cookie per socket and/or per route).

Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
2017-11-01 22:07:51 +01:00
Roman Mashak acbe9118ce ip netns: use strtol() instead of atoi()
Use strtol-based API to parse and validate integer input; atoi() does
not detect errors and may yield undefined behaviour if result can't be
represented.

v2: use get_unsigned() since network namespace is really an unsigned value.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-11-01 22:06:05 +01:00
Shmulik Ladkani 21440d19d9 ip: link_ip6tnl.c/ip6tunnel.c: Support IP6_TNL_F_ALLOW_LOCAL_REMOTE flag
IP6_TNL_F_ALLOW_LOCAL_REMOTE allows tunnel traffic on ip6tnl devices
where the remote endpoint is a local host address.

Specifying "[no]allow-localremote" controls the
IP6_TNL_F_ALLOW_LOCAL_REMOTE flag on ip6tnl interfaces.

This is the user-space counterpart for kernel
commit 908d140a87a7 ("ip6_tunnel: Allow rcv/xmit even if remote address is a local address")

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
2017-10-31 18:15:30 +01:00
Roopa Prabhu 8652eeb3ab bridge: vlan: support for per vlan tunnel info
This patch uses kernel bridge vlan attribute
IFLA_BRIDGE_VLAN_TUNNEL_INFO to set/delete/show per vlan tunnel info.

$bridge vlan add dev vxlan0 vid 2000 tunnel_info id 2000
$bridge vlan add dev vxlan0 vid 1000-1001 tunnel_info id 2000-2001

$bridge vlan tunnelshow
port    vlan ids        tunnel id
vxlan0   1000-1001       1000-1001
         2000            2000

$bridge  -j vlan tunnelshow
{
    "dummy0": [],
    "dummy1": [],
    "bridge": [],
    "vxlan0": [{
            "vlan": 1000,
            "vlanEnd": 1001,
            "tunid": 1000,
            "tunidEnd": 1001
        },{
            "vlan": 2000,
            "tunid": 2000
        }
    ]
}

This patch also fixes a json termination bug in print_vlan
when filter vlan is provided by the user.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2017-10-31 18:04:30 +01:00
Roopa Prabhu 8cfde5c97f iplink: bridge: support bridge port vlan_tunnel attribute
This config maps to IFLA_BRPORT_VLAN_TUNNEL bridge port netlink
flag attribute. This flag enables vlan to tunnel mapping on a bridge
port. It is off by default.

set vlan_tunnel attribute on bridge port vxlan0:

$ip link set dev vxlan0 type bridge_slave vlan_tunnel on
$ip link set dev vxlan0 type bridge_slave vlan_tunnel off

or via bridge command

$bridge link set dev vxlan0 vlan_tunnel on
$bridge link set dev vxlan0 vlan_tunnel off

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2017-10-31 18:04:30 +01:00
Stephen Hemminger 0ac0017a1a Update kernel headers from net-next (4.14-rc6)
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-31 18:04:13 +01:00
Stephen Hemminger c1606c44b3 Merge branch 'master' into net-next 2017-10-31 18:03:12 +01:00
Stephen Hemminger e348889289 Update kernel headers based on 4.14-rc7
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-31 18:01:51 +01:00
Alexander Aring 25a24934ab tc: m_ife: fix match tcindex parsing
This patch changes ife_prio to ife_tcindex which is right variable to
assign in the argument in this case.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
2017-10-31 17:56:58 +01:00
Roman Mashak 103bc5f11d ip: added missing newline in man page
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-10-31 17:24:45 +01:00
Stephen Hemminger 106753c937 Merge branch 'master' into net-next 2017-10-27 09:27:43 +02:00
Stephen Hemminger bcddcddd29 bridge: checkpatch related cleanups
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-27 09:15:23 +02:00
Stephen Hemminger 21fef525fa iproute: source code cleanup
Break long lines.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-27 08:52:48 +02:00
Stephen Hemminger 1d2cfcf8b5 update kernel headers
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-27 08:31:26 +02:00
Stephen Hemminger 7fde8cfddc include: add TCP fastopen option
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-27 08:30:48 +02:00
Stephen Hemminger fa19d6bc01 bpf: update header file
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-27 08:28:36 +02:00
Roman Mashak fab9a18a2e bridge: request vlans along with link information
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-10-26 12:35:04 +02:00
Roman Mashak 52fd1fe36c bridge: dump vlan table information for link
Kernel also reports vlans a port is member of, so print it. Since vlan
table can be quite large, dump it only when detailed information is
requested.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-10-26 12:35:04 +02:00
Roman Mashak b97c679c9f bridge: isolate vlans parsing code in a separate API
IFLA_BRIDGE_VLAN_INFO parsing logic will be used in link and vlan
processing code, so it makes sense to move it in the separate function.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-10-26 12:35:04 +02:00
Hangbin Liu 86bf43c7c2 lib/libnetlink: update rtnl_talk to support malloc buff at run time
This is an update for 460c03f3f3 ("iplink: double the buffer size also in
iplink_get()"). After update, we will not need to double the buffer size
every time when VFs number increased.

With call like rtnl_talk(&rth, &req.n, NULL, 0), we can simply remove the
length parameter.

With call like rtnl_talk(&rth, nlh, nlh, sizeof(req), I add a new variable
answer to avoid overwrite data in nlh, because it may has more info after
nlh. also this will avoid nlh buffer not enough issue.

We need to free answer after using.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-10-26 12:29:29 +02:00
Hangbin Liu 2d34851cd3 lib/libnetlink: re malloc buff if size is not enough
With commit 72b365e8e0 ("libnetlink: Double the dump buffer size")
we doubled the buffer size to support more VFs. But the VFs number is
increasing all the time. Some customers even use more than 200 VFs now.

We could not double it everytime when the buffer is not enough. Let's just
not hard code the buffer size and malloc the correct number when running.

Introduce function rtnl_recvmsg() to always return a newly allocated buffer.
The caller need to free it after using.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-10-26 12:29:29 +02:00
yupeng 5a9bca7145 man: add additional explainations for ss
Add detail explains of -m, -o, -e and -i options, which are not documented anywhere

Signed-off-by: yupeng <yupeng0921@gmail.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2017-10-26 12:25:42 +02:00
Stephen Hemminger 66e40a4a86 update headers for TC and TIPC from net-next 2017-10-25 12:40:47 +02:00
Stephen Hemminger 2ac0c6c2c1 Merge branch 'master' into net-next 2017-10-25 12:39:18 +02:00
Jamal Hadi Salim 35f2a7639d tc/actions: introduce support for jump action
Sample use case:

... add ingress qdisc
sudo $TC qdisc add dev $ETH ingress

 ... if we exceed rate of 1kbps (burst of 90K), do an absolute jump of 2 actions
sudo $TC actions add action police rate 1kbit burst 90k conform-exceed jump 2 / pipe

sudo $TC -s actions ls action police
 action order 0:  police 0x4 rate 1Kbit burst 23440b mtu 2Kb action jump 2/pipe overhead 0b
 ref 1 bind 0 installed 41 sec used 41 sec
 Action statistics:
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0

... lets add a couple of marks so we can use them to mark exceed/not exceed
sudo $TC actions add action skbedit mark 11 ok index 11
sudo $TC actions add action skbedit mark 12 ok index 12

... if we dont exceed our rate we get a mark of 11, else mark of 12
sudo $TC filter add dev $ETH parent ffff: protocol ip prio 8 u32 \
match ip dst 127.0.0.8/32 flowid 1:10 \
action police index 4 \
action skbedit index 11 \
action skbedit index 12

Ok, lets keep this thing a little busy..
sudo ping -f -c 10000 127.0.0.8

... now lets see the filters..
sudo $TC -s filter ls dev $ETH parent ffff: protocol ip
filter pref 8 u32 chain 0
filter pref 8 u32 chain 0 fh 800: ht divisor 1
filter pref 8 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10 not_in_hw  (rule hit 20000 success 10000)
  match 7f000008/ffffffff at 16 (success 10000 )
	action order 1:  police 0x4 rate 1Kbit burst 23440b mtu 2Kb action jump 2/pipe overhead 0b
	ref 2 bind 1 installed 198 sec used 2 sec
	Action statistics:
	Sent 840000 bytes 10000 pkt (dropped 0, overlimits 9721 requeues 0)
	backlog 0b 0p requeues 0

	action order 2:  skbedit mark 11 pass
	 index 11 ref 2 bind 1 installed 127 sec used 2 sec
 	Action statistics:
	Sent 23436 bytes 279 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

	action order 3:  skbedit mark 12 pass
	 index 12 ref 2 bind 1 installed 127 sec used 2 sec
 	Action statistics:
	Sent 816564 bytes 9721 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

As can be seen 97.21% of the packets were marked as exceeding the allocated
rate; you could do something clever with the skb mark after this.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-25 12:33:46 +02:00
Nikolay Aleksandrov a5e3f41b4d ip: bridge_slave: add neigh_suppress to the type help and
Add neigh_suppress to the type help and document it in ip-link's man page.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-23 14:46:24 +02:00
Stephen Hemminger 702631416e Merge branch 'master' into net-next 2017-10-23 14:44:55 +02:00
Roman Mashak c4be5febaa ss: initialize 'fackets' member of tcpstat structure
'fackets' has never been initialized with kernel extracted information, thus
never really printed.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-10-23 14:43:11 +02:00
Michal Kubecek 21503ed2af ip maddr: fix filtering by device
Commit 530903dd90 ("ip: fix igmp parsing when iface is long") uses
variable len to keep trailing colon from interface name comparison.  This
variable is local to loop body but we set it in one pass and use it in
following one(s) so that we are actually using (pseudo)random length for
comparison. This became apparent since commit b48a1161f5 ("ipmaddr: Avoid
accessing uninitialized data") always initializes len to zero so that the
name comparison is always true. As a result, "ip maddr show dev eth0" shows
IPv4 multicast addresses for all interfaces.

Instead of keeping the length, let's simply replace the trailing colon with
a null byte. The bonus is that we get correct interface name in ma.name.

Fixes: 530903dd90 ("ip: fix igmp parsing when iface is long")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Phil Sutter <phil@nwl.cc>
Acked-by: Petr Vorel <pvorel@suse.cz>
2017-10-21 15:02:24 +02:00
Phil Sutter 572e893613 ss: Detect IPPROTO_ICMPV6 sockets
Prefix IPPROTO_ICMPV6 sockets with 'icmp6' instead of '???'.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-10-21 15:00:16 +02:00
Phil Sutter 1267c0b924 ss: Distinguish between IPv4 and IPv6 wildcard sockets
Commit aba9c23a6e ("ss: enclose IPv6 address in brackets") unified
display of wildcard sockets in IPv4 and IPv6 to print the unspecified
address as '*'. Users then complained that they can't distinguish
between address families anymore, so change this again to what Stephen
Hemminger suggested:

| *:80    << both IPV6 and IPV4
| [::]:80 << IPV6_ONLY
| 0.0.0.0:80  << IPV4_ONLY

Note that on older kernels which don't support INET_DIAG_SKV6ONLY
attribute, pure IPv6 sockets will still show as '*'.

Cc: Humberto Alves <hjalves@live.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-10-21 14:59:29 +02:00
Stephen Hemminger 4b4dde0ae6 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2 2017-10-18 17:11:50 -07:00
Nikolay Aleksandrov fdbdd356f0 ip: bridge_slave: add support for per-port group_fwd_mask
This patch adds the iproute2 support for getting and setting the
per-port group_fwd_mask. It also tries to resolve the value into a more
human friendly format by printing the known protocols instead of only
the raw value.
The man page is also updated with the new option.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2017-10-16 09:26:05 -07:00
Stephen Hemminger 75209f840b Merge branch 'master' into net-next 2017-10-16 09:25:56 -07:00
Petr Vorel 4b73d52f8a color: Rename enum
COLOR_NONE is more descriptive than COLOR_CLEAR.

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
2017-10-16 09:24:11 -07:00
Petr Vorel 99b89c518e color: Cleanup code to remove "magic" offset + 7
Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
2017-10-16 09:24:11 -07:00
Petr Vorel 24b058a2a4 color: Fix another ip segfault when using --color switch
Commit 959f1428 ("color: add new COLOR_NONE and disable_color function")
introducing color enum COLOR_NONE, which is not only duplicite of
COLOR_CLEAR, but also caused segfault, when running ip with --color
switch, as 'attr + 8' in color_fprintf() access array item out of
bounds. Thus removing it and restoring "magic" offset + 7.

Reproduce with:
$ ip -c a

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
2017-10-16 09:24:11 -07:00
Petr Vorel e6849a5722 color: Fix ip segfault when using --color switch
Commit d0e72011 ("ip: ipaddress.c: add support for json output")
introduced passing -1 as enum color_attr. This is not only wrong as no
color_attr has value -1, but also causes another segfault in color_fprintf()
on this setup as there is no item with index -1 in array of enum attr_colors[].
Using COLOR_CLEAR is valid option.

Reproduce with:
$ COLORFGBG='0;15' ip -c a

NOTE: COLORFGBG is environmental variable used for defining whether user
has light or dark background.
COLORFGBG="0;15" is used to ask for color set suitable for light background,
COLORFGBG="15;0" is used to ask for color set suitable for dark background.

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
2017-10-16 09:24:11 -07:00
Petr Vorel f1241a7e3b tests: Revert back /bin/sh in shebang
This was added by mistake in commit ecd44e68
("tests: Remove bashisms (s/source/.)")

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
2017-10-16 09:22:01 -07:00
Stephen Hemminger 4c6080b5c4 Merge branch 'master' into net-next 2017-10-12 09:06:10 -07:00
Stephen Hemminger 268a9eee98 netem: fix code indentation
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-11 18:08:15 -07:00
Stephen Hemminger 4999c57733 Merge branch 'master' into net-next 2017-10-11 11:07:20 -07:00
Ivan Delalande da9cc6ab90 ss: print MD5 signature keys configured on TCP sockets
These keys are reported by kernel 4.14 and later under the
INET_DIAG_MD5SIG attribute, when INET_DIAG_INFO is requested (ss -i)
and we have CAP_NET_ADMIN. The additional output looks like:

	md5keys:fe80::/64=signing_key,10.1.2.0/24=foobar,::1/128=Test

Signed-off-by: Ivan Delalande <colona@arista.com>
2017-10-11 11:04:47 -07:00
Ivan Delalande 7c72df5a95 utils: add print_escape_buf to format and print arbitrary bytes
Keep it as simple as possible for now: just escape anything that is not
isprint-able, is among the "escape" parameter or '\' as an octal escape
sequence. This should be pretty easy to extend if any other user needs
something more complex in the future.

Signed-off-by: Ivan Delalande <colona@arista.com>
2017-10-11 11:04:47 -07:00
Baruch Siach 4f6b73380d lib: fix multiple strlcpy definition
Some C libraries, like uClibc and musl, provide BSD compatible
strlcpy(). Add check_strlcpy() to configure, and avoid defining strlcpy
and strlcat when the C library provides them.

This fixes the following static link error with uClibc-ng:

.../sysroot/usr/lib/libc.a(strlcpy.os): In function `strlcpy':
strlcpy.c:(.text+0x0): multiple definition of `strlcpy'
../lib/libutil.a(utils.o):utils.c:(.text+0x1ddc): first defined here
collect2: error: ld returned 1 exit status

Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
2017-10-11 11:02:13 -07:00
Petr Vorel ecd44e6805 tests: Remove bashisms (s/source/.)
Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
2017-10-11 10:59:50 -07:00
Roopa Prabhu 41973a47dd iplink: new option to set neigh suppression on a bridge port
neigh suppression can be used to suppress arp and nd flood
to bridge ports. It maps to the recently added
kernel support for bridge port flag IFLA_BRPORT_NEIGH_SUPPRESS.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2017-10-11 10:56:36 -07:00
Yotam Gigi 2055bf15f1 ip: mroute: Print offload indication
Since kernel net-next commit c7c0bbeae950 ("net: ipmr: Add MFC offload
indication") the kernel indicates on an MFC entry whether it was offloaded
using the RTNH_F_OFFLOAD flag. Update the "ip mroute show" command to
indicate when a route is offloaded, similarly to the "ip route show"
command.

Example output:
$ ip mroute
(0.0.0.0, 239.255.0.1)      Iif: sw1p7  Oifs: t_br0 State: resolved offload
(192.168.1.1, 239.255.0.1)  Iif: sw1p7  Oifs: sw1p4 State: resolved offload

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
2017-10-11 10:54:27 -07:00
Stefan Hajnoczi c759116a0b ss: add AF_VSOCK support
The AF_VSOCK address family is a host<->guest communications channel
supported by VMware, KVM, and Hyper-V.  Initial VMware support was
released in Linux 3.9 in 2013 and transports for other hypervisors were
added later.

AF_VSOCK addresses are <u32 cid, u32 port> tuples.  The 32-bit cid
integer is comparable to an IP address.  AF_VSOCK ports work like
TCP/UDP ports.

Both SOCK_STREAM and SOCK_DGRAM socket types are available.

This patch adds AF_VSOCK support to ss(8) so that sockets can be
observed.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-10-11 10:51:03 -07:00
Stefan Hajnoczi b338a3e7e7 ss: allow AF_FAMILY constants >32
Linux has more than 32 address families defined in <bits/socket.h>.  Use
a 64-bit type so all of them can be represented in the filter->families
bitmask.

It's easy to introduce bugs when using (1 << AF_FAMILY) because the
value is 32-bit.  This can produce incorrect results from bitmask
operations so introduce the FAMILY_MASK() macro to eliminate these bugs.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-10-11 10:50:20 -07:00
Stephen Hemminger e9b0d82dfa uapi: add include linux/vm_sockets_diag.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-11 10:49:25 -07:00
Stephen Hemminger 07682b88d8 Merge branch 'master' into net-next 2017-10-11 10:47:55 -07:00
Stephen Hemminger 237a52731b rdma: move headers to uapi
And update with version from upstream.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-11 10:47:28 -07:00
Stephen Hemminger f53da99ad7 update uapi headers from 4.14-rc4 net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-11 10:43:38 -07:00
Stephen Hemminger 92503441cc Merge branch 'master' into net-next 2017-10-11 10:43:13 -07:00
Lorenzo Colitti 596b1c94aa iproute: build more easily on Android
iproute2 contains a bunch of kernel headers, including uapi ones.
Android's libc uses uapi headers almost directly, and uses a
script to fix kernel types that don't match what userspace
expects.

For example: https://issuetracker.google.com/36987220 reports
that our struct ip_mreq_source contains "__be32 imr_multiaddr"
rather than "struct in_addr imr_multiaddr". The script addresses
this by replacing the uapi struct definition with a #include
<bits/ip_mreq.h> which contains the traditional userspace
definition.

Unfortunately, when we compile iproute2, this definition
conflicts with the one in iproute2's linux/in.h.

Historically we've just solved this problem by running "git rm"
on all the iproute2 include/linux headers that break Android's
libc.  However, deleting the files in this way makes it harder to
keep up with upstream, because every upstream change to
an include file causes a merge conflict with the delete.

This patch fixes the problem by moving the iproute2 linux headers
from include/linux to include/uapi/linux.

Tested: compiles on ubuntu trusty (glibc)

Signed-off-by: Elliott Hughes <enh@google.com>
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
2017-10-11 10:35:45 -07:00
Stephen Hemminger b0af8fc1aa tipc: don't need custom CFLAGS
Since libmnl CFLAGS are now handled by config.mk

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-11 10:35:38 -07:00
Stephen Hemminger 60509b997d Merge branch 'master' into net-next 2017-10-02 08:04:13 -07:00
Stephen Hemminger 1db903def7 update headers from net-next rc
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-02 08:03:45 -07:00
Phil Sutter 625df645b7 Check user supplied interface name lengths
The original problem was that something like:

| strncpy(ifr.ifr_name, *argv, IFNAMSIZ);

might leave ifr.ifr_name unterminated if length of *argv exceeds
IFNAMSIZ. In order to fix this, I thought about replacing all those
cases with (equivalent) calls to snprintf() or even introducing
strlcpy(). But as Ulrich Drepper correctly pointed out when rejecting
the latter from being added to glibc, truncating a string without
notifying the user is not to be considered good practice. So let's
excercise what he suggested and reject empty, overlong or otherwise
invalid interface names right from the start - this way calls to
strncpy() like shown above become safe and the user has a chance to
reconsider what he was trying to do.

Note that this doesn't add calls to check_ifname() to all places where
user supplied interface name is parsed. In many cases, the interface
must exist already and is therefore looked up using ll_name_to_index(),
so if_nametoindex() will perform the necessary checks already.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-10-02 08:01:21 -07:00
Phil Sutter ee474849c8 tc: flower: No need to cache indev arg
Since addattrstrz() will copy the provided string into the attribute
payload, there is no need to cache the data.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-10-02 08:01:21 -07:00
Phil Sutter 26111ab1db ip{6, }tunnel: Avoid copying user-supplied interface name around
In both files' parse_args() functions as well as in iptunnel's do_prl()
and do_6rd() functions, a user-supplied 'dev' parameter is uselessly
copied into a temporary buffer before passing it to ll_name_to_index()
or copying into a struct ifreq.  Avoid this by just caching the argv
pointer value until the later lookup/strcpy.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-10-02 08:01:21 -07:00
Michal Kubecek 4c0939a29e ip xfrm: use correct key length for netlink message
When SA is added manually using "ip xfrm state add", xfrm_state_modify()
uses alg_key_len field of struct xfrm_algo for the length of key passed to
kernel in the netlink message. However alg_key_len is bit length of the key
while we need byte length here. This is usually harmless as kernel ignores
the excess data but when the bit length of the key exceeds 512
(XFRM_ALGO_KEY_BUF_SIZE), it can result in buffer overflow.

We can simply divide by 8 here as the only place setting alg_key_len is in
xfrm_algo_parse() where it is always set to a multiple of 8 (and there are
already multiple places using "algo->alg_key_len / 8").

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
2017-10-01 13:44:38 -07:00
Yulia Kartseva 73451259da tc: fix ipv6 filter selector attribute for some prefix lengths
Wrong TCA_U32_SEL attribute packing if prefixLen AND 0x1f equals 0x1f.
These are  /31, /63, /95 and /127 prefix lengths.

Example:
ip6 dst face:b00f::/31
filter parent b: protocol ipv6 pref 2307 u32
filter parent b: protocol ipv6 pref 2307 u32 fh 800: ht divisor 1
filter parent b: protocol ipv6 pref 2307 u32 fh 800::800 order 2048
key ht 800 bkt 0
  match faceb00f/ffffffff at 24

v2: previous patch was made with a wrong repo

Signed-off-by: Yulia Kartseva <hex@fb.com>
2017-10-01 13:41:29 -07:00
Stephen Hemminger f412357017 Merge branch 'master' into net-next 2017-09-29 12:03:16 -07:00
Phil Sutter e4139268ba ip-route: Fix for listing routes with RTAX_LOCK attribute
This fixes a corner-case for routes with a certain metric locked to
zero:

| ip route add 192.168.7.0/24 dev eth0 window 0
| ip route add 192.168.7.0/24 dev eth0 window lock 0

Since the kernel doesn't dump the attribute if it is zero, both routes
added above would appear as if they were equal although they are not.

Fix this by taking mxlock value for the given metric into account before
skipping it if it is not present.

Reported-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-09-29 12:02:09 -07:00
Stephen Hemminger ee7bfb52a7 Merge branch 'master' into net-next 2017-09-29 10:51:25 -07:00
Stephen Hemminger b2fd7a0e6e doc: drop old ip command documentation
The old IP cross reference manual was very out of date, barely updated
since 1999.  The correct documentation is in the man pages.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-09-29 10:51:02 -07:00
Julien Fortin 429f314ef7 lib: json_print: rework 'new_json_obj' drop FILE* argument
As Stephen Hemminger mentioned on the last submission the new_json_obj
function is always called with fp == stdout, so right now, there's no
need of this extra argument.

The background for the rework is the following:
The ip monitor didn't call `new_json_obj` (even for in non json context),
so the static FILE* _fp variable wasn't initialized, thus raising a
SIGSEGV in ipaddress.c. This patch should fix this issue for good, new
paths won't have to call `new_json_obj`.

How to reproduce:

$ ip -t mon label link
(gdb) bt
.#0  _IO_vfprintf_internal (s=s@entry=0x0, format=format@entry=0x45460d “%d: “, ap=ap@entry=0x7fffffff7f18) at vfprintf.c:1278
.#1  0x0000000000451310 in color_fprintf (fp=0x0, attr=<optimized out>, fmt=0x45460d “%d: “) at color.c:108
.#2  0x000000000044a856 in print_color_int (t=t@entry=PRINT_ANY, color=color@entry=4294967295, key=key@entry=0x4545fc “ifindex”,
    fmt=fmt@entry=0x45460d “%d: “, value=<optimized out>) at ip_print.c:132
.#3  0x000000000040ccd2 in print_int (value=<optimized out>, fmt=0x45460d “%d: “, key=0x4545fc “ifindex”, t=PRINT_ANY) at ip_common.h:189
.#4  print_linkinfo (who=<optimized out>, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipaddress.c:1107
.#5  0x0000000000422e13 in accept_msg (who=0x7fffffff8320, ctrl=0x7fffffff8310, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipmonitor.c:89
.#6  0x000000000044c58f in rtnl_listen (rtnl=0x672160 <rth>, handler=handler@entry=0x422c70 <accept_msg>, jarg=0x7ffff77a82a0 <_IO_2_1_stdout_>)
    at libnetlink.c:761
.#7  0x00000000004233db in do_ipmonitor (argc=<optimized out>, argv=0x7fffffffe5a0) at ipmonitor.c:310
.#8  0x0000000000408f74 in do_cmd (argv0=0x7fffffffe7f5 “mon”, argc=3, argv=0x7fffffffe588) at ip.c:116
.#9  0x0000000000408a94 in main (argc=4, argv=0x7fffffffe580) at ip.c:311

Fixes: 6377572f ("ip: ip_print: add new API to print JSON or regular format output")
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-09-29 10:10:47 -07:00
Stephen Hemminger a4cda980bb doc: remove outdated IPv6 flow label document
Not updated since Linux 2.2

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-09-29 10:06:50 -07:00
Stephen Hemminger bbf2a3634e doc: remove outdated tc-filters documentation
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-09-29 10:05:25 -07:00
Stephen Hemminger fd1aa86741 ignore generated Config file
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-09-29 10:02:45 -07:00
Stephen Hemminger 3e83c095e8 doc: remove outdated nstat/rtstat documentation
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-09-29 10:01:15 -07:00
Stephen Hemminger 760e9830fc doc: remove outdated arpd documentation
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-09-29 10:00:12 -07:00
Stephen Hemminger d77ce080d3 doc: remove outdated ss documentation
The current version is well documented on man page.
The latex documentation is very old and was never upated.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-09-29 09:58:39 -07:00
Stephen Hemminger 1298403e26 doc: remove obsolete ip-tunnels documentation
This file has not been updated since conversion to git
and is really old and outdated.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-09-29 09:58:02 -07:00
Julien Fortin 70556c1632 lib: json_print: rework 'new_json_obj' drop FILE* argument
As Stephen Hemminger mentioned on the last submission the new_json_obj
function is always called with fp == stdout, so right now, there's no
need of this extra argument.

The background for the rework is the following:
The ip monitor didn't call `new_json_obj` (even for in non json context),
so the static FILE* _fp variable wasn't initialized, thus raising a
SIGSEGV in ipaddress.c. This patch should fix this issue for good, new
paths won't have to call `new_json_obj`.

How to reproduce:

$ ip -t mon label link
(gdb) bt
.#0  _IO_vfprintf_internal (s=s@entry=0x0, format=format@entry=0x45460d “%d: “, ap=ap@entry=0x7fffffff7f18) at vfprintf.c:1278
.#1  0x0000000000451310 in color_fprintf (fp=0x0, attr=<optimized out>, fmt=0x45460d “%d: “) at color.c:108
.#2  0x000000000044a856 in print_color_int (t=t@entry=PRINT_ANY, color=color@entry=4294967295, key=key@entry=0x4545fc “ifindex”,
    fmt=fmt@entry=0x45460d “%d: “, value=<optimized out>) at ip_print.c:132
.#3  0x000000000040ccd2 in print_int (value=<optimized out>, fmt=0x45460d “%d: “, key=0x4545fc “ifindex”, t=PRINT_ANY) at ip_common.h:189
.#4  print_linkinfo (who=<optimized out>, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipaddress.c:1107
.#5  0x0000000000422e13 in accept_msg (who=0x7fffffff8320, ctrl=0x7fffffff8310, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipmonitor.c:89
.#6  0x000000000044c58f in rtnl_listen (rtnl=0x672160 <rth>, handler=handler@entry=0x422c70 <accept_msg>, jarg=0x7ffff77a82a0 <_IO_2_1_stdout_>)
    at libnetlink.c:761
.#7  0x00000000004233db in do_ipmonitor (argc=<optimized out>, argv=0x7fffffffe5a0) at ipmonitor.c:310
.#8  0x0000000000408f74 in do_cmd (argv0=0x7fffffffe7f5 “mon”, argc=3, argv=0x7fffffffe588) at ip.c:116
.#9  0x0000000000408a94 in main (argc=4, argv=0x7fffffffe580) at ip.c:311

Fixes: 6377572f ("ip: ip_print: add new API to print JSON or regular format output")
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-09-27 09:21:54 +01:00
Stephen Hemminger b7a38c397d Merge branch 'master' into net-next 2017-09-22 10:10:01 -07:00
Thomas Haller 01777e055d man: fix documentation for range of route table ID
Signed-off-by: Thomas Haller <thaller@redhat.com>
2017-09-22 10:09:04 -07:00
Daniel Borkmann bc2d4d838f bpf: properly output json for xdp
After merging net-next branch into master, Stephen asked
to fix up json dump for XDP. Thus, rework the json dump a
bit, such that 'ip -json l' looks as below.

  [{
        "ifindex": 1,
        "ifname": "lo",
        "flags": ["LOOPBACK","UP","LOWER_UP"],
        "mtu": 65536,
        "xdp": {
            "mode": 2,
            "prog": {
                "id": 5,
                "tag": "e1e9d0ec0f55d638",
                "jited": 1
            }
        },
        "qdisc": "noqueue",
        "operstate": "UNKNOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "txqlen": 1000,
        "link_type": "loopback",
        "address": "00:00:00:00:00:00",
        "broadcast": "00:00:00:00:00:00"
    },[...]
  ]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-09-22 10:07:15 -07:00
Daniel Borkmann 0b4b35e1e8 json: move json printer to common library
Move the json printer which is based on json writer into the
iproute2 library, so it can be used by library code and tools
other than ip. Should probably have been done from the beginning
like that given json writer is in the library already anyway.
No functional changes.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Julien Fortin <julien@cumulusnetworks.com>
2017-09-22 10:06:43 -07:00
Stephen Hemminger 58677cc2d3 tc: flower remove unused variable
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-09-20 18:08:43 -07:00
Benjamin LaHaise 7638ee13c1 tc: flower: support for matching MPLS labels
This patch adds support to the iproute2 tc filter command for matching MPLS
labels in the flower classifier.  The ability to match the Time To Live,
Bottom Of Stack, Traffic Control and Label fields are added as options to
the flower filter.

e.g.:
  tc filter add dev eth0 protocol 0x8847 parent ffff: \
    flower mpls_label 1 mpls_tc 2 mpls_ttl 3 mpls_bos 0 \
    action drop

Signed-off-by: Benjamin LaHaise <benjamin.lahaise@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2017-09-20 18:07:21 -07:00
Julien Fortin 6335c5ff67 ip: ipaddress: fix missing space after prefixlen
Fixes: d0e720111a ("ip: ipaddress.c: add support for json output")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-09-20 18:05:03 -07:00
Davide Caratti bc6ba66047 tc: fix typo in tc-tcindex man page
fix mis-typed 'pass_on' keyword.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2017-09-20 18:01:02 -07:00
Stephen Hemminger 44cf841560 BPF: update headers from 4.14-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-09-20 18:00:55 -07:00
Eric Dumazet ff28b7519d tc: fq: support low_rate_threshold attribute
TCA_FQ_LOW_RATE_THRESHOLD sch_fq attribute was added in linux-4.9

Tested:

lpaa5:/tmp# tc -qd add dev eth1 root fq
lpaa5:/tmp# tc -s qd sh dev eth1
qdisc fq 8003: root refcnt 5 limit 10000p flow_limit 1000p buckets 4096 \
 orphan_mask 4095 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 quantum 3648 \
 initial_quantum 18240 low_rate_threshold 550Kbit refill_delay 40.0ms
 Sent 62139 bytes 395 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  116 flows (114 inactive, 0 throttled)
  1 gc, 0 highprio, 0 throttled

lpaa5:/tmp# ./netperf -H lpaa6 -t TCP_RR -l10 -- -q 500000 -r 300,300 -o P99_LATENCY
99th Percentile Latency Microseconds
7081

lpaa5:/tmp# tc qd replace dev eth1 root fq low_rate_threshold 10Mbit
lpaa5:/tmp# ./netperf -H lpaa6 -t TCP_RR -l10 -- -q 500000 -r 300,300 -o P99_LATENCY
99th Percentile Latency Microseconds
858

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
2017-09-12 21:33:31 -07:00
Phil Sutter 1cfcf62c68 ipaddress: Fix segfault in 'addr showdump'
Obviously, 'addr showdump' feature wasn't adjusted to json output
support. As a consequence, calls to print_string() in print_addrinfo()
tried to dereference a NULL FILE pointer.

Fixes: d0e720111a ("ip: ipaddress.c: add support for json output")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-09-12 21:27:36 -07:00
Arkadi Sharshevsky b2947f8b2c devlink: Add support for protocol IPv4/IPv6/Ethernet special formats
Add support for protocol IPv4/IPv6/Ethernet special formats.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-09-07 15:10:25 -07:00
Arkadi Sharshevsky 31639589f3 devlink: Add support for special format protocol headers
In case of global header (protocol header), the header:field ids are used
to perform lookup for special format printer. In case no printer existence
fallback to plain value printing.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-09-07 15:10:25 -07:00
Arkadi Sharshevsky 92b2a5bb76 devlink: Make match/action parsing more flexible
This patch decouples the match/action parsing from printing. This is
done as a preparation for adding the ability to print global header
values, for example print IPv4 address, which require special formatting.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-09-07 15:10:25 -07:00
Phil Sutter 50ea3c6438 utils: strlcpy() and strlcat() don't clobber dst
As David Laight correctly pointed out, the first version of strlcpy()
modified dst buffer behind the string copied into it. Fix this by
writing NUL to the byte immediately following src string instead of to
the last byte in dst. Doing so also allows to reduce overhead by using
memcpy().

Improve strlcat() by avoiding the call to strlcpy() if dst string is
already full, not just as sanity check.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-09-07 15:06:47 -07:00
Stephen Hemminger 01e5409371 Merge branch 'net-next' 2017-09-05 09:48:36 -07:00
Stephen Hemminger 39740278a8 v4.13.0 2017-09-05 09:39:32 -07:00
Stephen Hemminger 4a5b3035de update headers from 4.14 merge
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-09-05 09:38:31 -07:00
Stephen Hemminger a17a01145f Merge branch 'master' into net-next 2017-09-05 09:33:29 -07:00
Daniel Borkmann a0b5b7cf5c bpf: consolidate dumps to use bpf_dump_prog_info
Consolidate dump of prog info to use bpf_dump_prog_info() when possible.
Moving forward, we want to have a consistent output for BPF progs when
being dumped. E.g. in cls/act case we used to dump tag as a separate
netlink attribute before we had BPF_OBJ_GET_INFO_BY_FD bpf(2) command.

Move dumping tag into bpf_dump_prog_info() as well, and only dump the
netlink attribute for older kernels. Also, reuse bpf_dump_prog_info()
for XDP case, so we can dump tag and whether program was jited, which
we currently don't show.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-09-05 09:26:34 -07:00
Daniel Borkmann 1b736dc469 bpf: minor cleanups for bpf_trace_pipe
Just minor nits, e.g. no need to fflush() and instead of returning
right away, just break and close the fd.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-09-05 09:26:34 -07:00
Simon Horman b75e0f6f4b tc actions: store and dump correct length of user cookies
Correct two errors which cancel each other out:
* Do not send twice the length of the actual provided by the user to the kernel
* Do not dump half the length of the cookie provided by the kernel

As the cookie is now stored in the kernel at its correct length rather
than double the that length cookies of up to the maximum size of 16 bytes
may now be stored rather than a maximum of half that length.

Output of dump is the same before and after this change,
but the data stored in the kernel is now exactly the cookie
rather than the cookie + as many trailing zeros.

Before:
 # tc filter add dev eth0 protocol ip parent ffff: \
       flower ip_proto udp action drop \
       cookie 0123456789abcdef0123456789abcdef
 RTNETLINK answers: Invalid argument

After:
 # tc filter add dev eth0 protocol ip parent ffff: \
       flower ip_proto udp action drop \
       cookie 0123456789abcdef0123456789abcdef
 # tc filter show dev eth0 ingress
   eth_type ipv4
   ip_proto udp
   not_in_hw
	 action order 1: gact action drop
	  random type none pass val 0
	  index 1 ref 1 bind 1 installed 1 sec used 1 sec
	 Action statistics:
	 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	 backlog 0b 0p requeues 0
	 cookie len 16 0123456789abcdef0123456789abcdef

Fixes: fd8b3d2c1b ("actions: Add support for user cookies")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-09-05 09:25:46 -07:00
Phil Sutter 7c87c7fed1 lib/bpf: Fix bytecode-file parsing
The signedness of char type is implementation dependent, and there are
architectures on which it is unsigned by default. In that case, the
check whether fgetc() returned EOF failed because the return value was
assigned an (unsigned) char variable prior to comparison with EOF (which
is defined to -1). Fix this by using int as type for 'c' variable, which
also matches the declaration of fgetc().

While being at it, fix the parser logic to correctly handle multiple
empty lines and consecutive whitespace and tab characters to further
improve the parser's robustness. Note that this will still detect double
separator characters, so doesn't soften up the parser too much.

Fixes: 3da3ebfca8 ("bpf: Make bytecode-file reading a little more robust")
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-09-04 12:06:50 -07:00
Stephen Hemminger 731e28cc28 Merge branch 'master' into net-next 2017-09-01 14:15:31 -07:00
Michal Kubecek 460c03f3f3 iplink: double the buffer size also in iplink_get()
Commit 72b365e8e0 ("libnetlink: Double the dump buffer size") increased
the buffer size for "ip link show" command to 32 KB to handle NICs with
large number of VFs. With "dev" filter, a different code path is taken and
iplink_get() still uses only 16 KB buffer.

The size of 32768 is not very future-proof as NICs supporting 120-128 VFs
are already in use so that single RTM_NEWLINK message in the dump can
exceed 30000 bytes. But it's what rtnl_talk() and rtnl_dump_filter_l() use
so let's be consistent. Once this proves insufficient, all three sizes
should be increased.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
2017-09-01 14:15:00 -07:00
Michal Kubecek 6599162b95 iplink: check for message truncation in iplink_get()
If message length exceeds maxlen argument of rtnl_talk(), it is truncated
to maxlen but unlike in the case of truncation to the length of local
buffer in rtnl_talk(), the caller doesn't get any indication of a problem.

In particular, iplink_get() passes the truncated message on and parsing it
results in various warnings and sometimes even a segfault (observed with
"ip link show dev ..." for a NIC with 125 VFs).

Handle message truncation in iplink_get() the same way as truncation in
rtnl_talk() would be handled: return an error.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
2017-09-01 14:15:00 -07:00
Stephen Hemminger 2e706e12d9 Merge branch 'master' into net-next
Needed to add JSON support to tclass.
2017-09-01 12:17:48 -07:00
Phil Sutter bc4a57b879 lnstat_util: Make sure buffer is NUL-terminated
Can't use strlcpy() here since lnstat is not linked against libutil.

While being at it, fix coding style in that chunk as well.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-09-01 12:10:54 -07:00
Phil Sutter 9376314b49 tc_util: No need to terminate an snprintf'ed buffer
snprintf() won't leave the buffer unterminated, so manually terminating
is not necessary here.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-09-01 12:10:54 -07:00
Phil Sutter 44cc6c792a ipxfrm: Replace STRBUF_CAT macro with strlcat()
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-09-01 12:10:54 -07:00
Phil Sutter 532b8874fe Convert harmful calls to strncpy() to strlcpy()
This patch converts spots where manual buffer termination was missing to
strlcpy() since that does what is needed.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-09-01 12:10:54 -07:00
Phil Sutter 18f156bfec Convert the obvious cases to strlcpy()
This converts the typical idiom of manually terminating the buffer after
a call to strncpy().

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-09-01 12:10:54 -07:00
Phil Sutter 8d15e012a3 utils: Implement strlcpy() and strlcat()
By making use of strncpy(), both implementations are really simple so
there is no need to add libbsd as additional dependency.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-09-01 12:10:54 -07:00
Phil Sutter 50f81afd4d link_gre6: Print the tunnel's tclass setting
Print the value analogous to flowlabel. While being at it, also break
the overlong lines to not exceed 80 characters boundary.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-09-01 12:09:42 -07:00
Phil Sutter e7fefb3214 link_gre6: Fix for changing tclass/flowlabel
When trying to change tclass or flowlabel of a GREv6 tunnel which has
the respective value set already, the code accidentally bitwise OR'ed
the old and the new value, leading to unexpected results. Fix this by
clearing the relevant bits of flowinfo variable prior to assigning the
new value.

Fixes: af89576d7a ("iproute2: GRE over IPv6 tunnel support.")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-09-01 12:09:42 -07:00
David Lebrun 9d563d52f6 man: add documentation for seg6 l2encap mode
This patch adds documentation for the seg6 L2ENCAP encapsulation mode.

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
2017-08-30 08:29:36 -07:00
David Lebrun cf87da417b iproute: add support for seg6 l2encap mode
This patch adds support for the L2ENCAP seg6 mode, enabling to encapsulate
L2 frames within SRv6 packets.

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
2017-08-30 08:29:36 -07:00
Alexander Aring 3ee52855a0 man: tc-ife: add default type note
This patch updates the tc-ife man page that the default IFE ethertype
will be used if it's not specified.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
2017-08-30 08:26:46 -07:00
Alexander Aring 38060de1eb tc: m_ife: report about kernels default type
This patch will report about if the ethertype for IFE is not specified
that the default IFE type is used.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
2017-08-30 08:26:46 -07:00
Alexander Aring 664f35aa7c tc: m_ife: print IEEE ethertype format
This patch uses the usually IEEE format to display an ethertype which is
4-digits and every digit in upper case.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
2017-08-30 08:26:46 -07:00
Alexander Aring bf338b60d4 tc: m_ife: allow ife type to zero
This patch allows to set an ethertype for IFE which is zero. There is no
kernel side validation which forbids a type to zero.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
2017-08-30 08:26:46 -07:00
Stephen Hemminger 8707fd8c93 update headers from net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-30 08:26:43 -07:00
Stephen Hemminger c5e2692b66 Merge branch 'master' into net-next 2017-08-30 08:24:57 -07:00
Phil Sutter 6c6bbc30f4 ss: Fix for added diag support check
Commit 9f66764e30 ("libnetlink: Add test for error code returned from
netlink reply") changed rtnl_dump_filter_l() to return an error in case
NLMSG_DONE would contain one, even if it was ENOENT.

This in turn breaks ss when it tries to dump DCCP sockets on a system
without support for it: The function tcp_show(), which is shared between
TCP and DCCP, will start parsing /proc since inet_show_netlink() returns
an error - yet it parses /proc/net/tcp which doesn't make sense for DCCP
sockets at all.

On my system, a call to 'ss' without further arguments prints the list
of connected TCP sockets twice.

Fix this by introducing a dedicated function dccp_show() which does not
have a fallback to /proc, just like sctp_show(). And since tcp_show()
is no longer "multi-purpose", drop it's socktype parameter.

Fixes: 9f66764e30 ("libnetlink: Add test for error code returned from netlink reply")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-30 08:18:13 -07:00
Stephen Hemminger b43b5b9acc devlink: header update
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-24 15:31:57 -07:00
Stephen Hemminger f474588028 Merge branch 'master' into net-next 2017-08-24 15:30:32 -07:00
Stephen Hemminger c4fc474b88 tc: use named initializer for default mqprio options
Use C99 initializer

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-24 15:28:15 -07:00
Phil Sutter 893deac4c4 lib/libnetlink: Don't pass NULL parameter to memcpy()
Both addattr_l() and rta_addattr_l() may be called with NULL data
pointer and 0 alen parameters. Avoid calling memcpy() in that case.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 15:22:10 -07:00
Phil Sutter ac3415f5c1 lib/fs: Fix and simplify make_path()
Calling stat() before mkdir() is racey: The entry might change in
between. Also, the call to stat() seems to exist only to check if the
directory exists already. So simply call mkdir() unconditionally and
catch only errors other than EEXIST.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 15:22:10 -07:00
Phil Sutter b5c78e1b2c lib/bpf: Check return value of write()
This is merely to silence the compiler warning. If write to stderr
failed, assume that printing an error message will fail as well so don't
even try.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 15:22:10 -07:00
Phil Sutter 92963d136d netem/maketable: Check return value of fscanf()
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 15:22:10 -07:00
Phil Sutter 0aa03350c0 ss: Make sure scanned index value to unix_state_map is sane
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 15:22:10 -07:00
Phil Sutter 4cbf5224f2 ss: Make struct tcpstat fields 'timer' and 'timeout' unsigned
Both 'timer' and 'timeout' variables of struct tcpstat are either
scanned as unsigned values from /proc/net/tcp{,6} or copied from
'idiag_timer' and 'idiag_expries' fields of struct inet_diag_msg, which
itself are unsigned. Therefore they may be unsigned as well, which
eliminates the need to check for negative values.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 15:22:09 -07:00
Stephen Hemminger 0b5eadc54f bpf: drop unused parameter to bpf_report_map_in_map
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-24 15:02:58 -07:00
Stephen Hemminger 0efa625765 libnetlink: drop unused parameter to rtnl_dump_done
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-24 15:02:48 -07:00
Stephen Hemminger 8f478ec2b3 rdma: fix duplicate initialization in port_names
Build with warnings enable spotted this.
link.c:51:58: note: (near initialization for ‘rdma_port_names[23]’)
   rdma_port_names[] = { RDMA_PORT_FLAGS(RDMA_BITMAP_NAMES) };

Assume that fields were in order and 25 is the missing value.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-24 15:02:16 -07:00
Phil Sutter 4b9e917822 lib/ll_map: Choose size of new cache items at run-time
Instead of having a fixed buffer of 16 bytes for the interface name,
tailor size of new ll_cache entry using the interface name's actual
length. This also makes sure the following call to strcpy() is safe.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 14:53:14 -07:00
Phil Sutter 56270e5466 tc/m_xt: Fix for potential string buffer overflows
- Use strncpy() when writing to target->t->u.user.name and make sure the
  final byte remains untouched (xtables_calloc() set it to zero).
- 'tname' length sanitization was completely wrong: If it's length
  exceeded the 16 bytes available in 'k', passing a length value of 16
  to strncpy() would overwrite the previously NULL'ed 'k[15]'. Also, the
  sanitization has to happen if 'tname' is exactly 16 bytes long as
  well.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 14:53:14 -07:00
Phil Sutter bc27878d21 lnstat_util: Simplify alloc_and_open() a bit
Relying upon callers and using unsafe strcpy() is probably not the best
idea. Aside from that, using snprintf() allows to format the string for
lf->path in one go.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 14:53:14 -07:00
Phil Sutter cfda500a7d lib/inet_proto: Review inet_proto_{a2n,n2a}()
The original intent was to make sure strings written by those functions
are NUL-terminated at all times, though it was suggested to get rid of
the 15 char protocol name limit as well which this patch accomplishes.

In addition to that, simplify inet_proto_a2n() a bit: Use the error
checking in get_u8() to find out whether passed 'buf' contains a valid
decimal number instead of checking the first character's value manually.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 14:53:14 -07:00
Phil Sutter eab4507898 lib/fs: Fix format string in find_fs_mount()
A field width of 4096 allows fscanf() to store that amount of characters
into the given buffer, though that doesn't include the terminating NULL
byte. Decrease the value by one to leave space for it.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 14:53:14 -07:00
Phil Sutter 45c2ec9e95 ipntable: Avoid memory allocation for filter.name
The original issue was that filter.name might end up unterminated if
user provided string was too long. But in fact it is not necessary to
copy the commandline parameter at all: just make filter.name point to it
instead.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 14:53:14 -07:00
Phil Sutter 70a6df3962 tipc/bearer: Prevent NULL pointer dereference
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 14:49:44 -07:00
Phil Sutter 75716932a0 tc/tc_filter: Make sure filter name is not empty
The later check for 'k[0] != 0' requires a non-empty filter name,
otherwise NULL pointer dereference in 'q' might happen.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 14:49:44 -07:00
Phil Sutter a754de3ccd tc/q_netem: Don't dereference possibly NULL pointer
Assuming 'opt' might be NULL, move the call to RTA_PAYLOAD to after the
check since it dereferences its parameter.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 14:49:44 -07:00
Phil Sutter 6d02518fdc ifstat, nstat: Check fdopen() return value
Prevent passing NULL FILE pointer to fgets() later.

Fix both tools in a single patch since the code changes are basically
identical.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 14:49:43 -07:00
Andreas Henriksson ae4e21c93f ss: fix help/man TCP-STATE description for listening
There's some misleading information in --help and ss(8) manpage about
TCP-STATE named 'listen'.
ss doesn't know such a state, but it knows 'listening' state.

$ ss -tua state listen
ss: wrong state name: listen

$ ss -tua state listening
[...]

Addresses: https://bugs.debian.org/872990
Reported-by: Pavel Lyulchenko <p.lyulchenko@gmail.com>
Signed-off-by: Andreas Henriksson <andreas@fatal.se>
2017-08-24 11:01:34 -07:00
William Tu 9a1381d509 gre: add support for ERSPAN tunnel
The patch adds ERSPAN type II tunnel support. The implementation is
based on the draft at
 https://tools.ietf.org/html/draft-foschiano-erspan-01.

One of the purposes is for Linux box to be able to receive ERSPAN
monitoring traffic sent from the Cisco switch, by creating a ERSPAN
tunnel device. In addition, the patch also adds ERSPAN TX, so traffic
can also be encapsulated into ERSPAN and sent out.

The implementation reuses the key as ERSPAN session ID, and
field 'erspan' as ERSPAN Index fields:
./ip link add dev ers11 type erspan seq key 100 erspan 123 \
		local 172.16.1.200 remote 172.16.1.100

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Meenakshi Vohra <mvohra@vmware.com>
2017-08-23 10:06:54 -07:00
Stephen Hemminger fb14560b76 add ERSPAN headers
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-23 10:05:08 -07:00
Stephen Hemminger 5f1df307b4 config: put CFLAGS/LDLIBS in config.mk
This renames Config to config.mk and includes more Make input.
Now configure generates all the required CFLAGS and LDLIBS for
the optional libraries.

Also, use pkg-config to test for libelf, rather than using a test
program. This makes it consistent with other libraries.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-23 10:03:09 -07:00
Stephen Hemminger 51186362ba Merge branch 'master' into net-next 2017-08-21 17:37:15 -07:00
Phil Sutter c3724e4bc3 lib/bpf: Don't leak fp in bpf_find_mntpt()
If fopen() succeeded but len != PATH_MAX, the function leaks the open
FILE pointer. Fix this by checking len value before calling fopen().

Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-08-21 17:35:07 -07:00
Phil Sutter 6e33f7b0f6 devlink: Check return code of strslashrsplit()
This function shouldn't fail because all callers of
__dl_argv_handle_port() make sure the passed string contains enough
slashes already, but better make sure if this changes in future the
function won't access uninitialized data.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-21 17:28:03 -07:00
Phil Sutter 84b6a3f4b5 iplink_vrf: Complain if main table is not found
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: David Ahern <dsahern@gmail.com>
2017-08-21 17:28:03 -07:00
Phil Sutter 7c66d89828 iproute: Check mark value input
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-21 17:28:03 -07:00
Phil Sutter 82ed9ffa2b tc/q_multiq: Don't pass garbage in TCA_OPTIONS
multiq_parse_opt() doesn't change 'opt' at all. So at least make sure
it doesn't fill TCA_OPTIONS attribute with garbage from stack.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-21 17:17:00 -07:00
Phil Sutter d304b05c12 netem/maketable: Check return value of fstat()
Otherwise info.st_size may contain garbage.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-21 17:17:00 -07:00
Phil Sutter 301826beb3 ss: Use C99 initializer in netlink_show_one()
This has the additional benefit of initializing st.ino to zero which is
used later in is_sctp_assoc() function.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-21 17:17:00 -07:00
Phil Sutter b48a1161f5 ipmaddr: Avoid accessing uninitialized data
Looks like this can only happen if /proc/net/igmp is malformed, but
better be sure.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-21 17:17:00 -07:00
Phil Sutter 258b7c0fa7 iplink_can: Prevent overstepping array bounds
can_state_names array contains at most CAN_STATE_MAX fields, so allowing
an index to it to be equal to that number is wrong. While here, also
make sure the array is indeed that big so nothing bad happens if
CAN_STATE_MAX ever increases.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-21 17:17:00 -07:00
Phil Sutter d044ea3e78 ipaddress: Avoid accessing uninitialized variable lcl
If no address was given, ipaddr_modify() accesses uninitialized data
when assigning to req.ifa.ifa_prefixlen.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-21 17:17:00 -07:00
Stephen Hemminger a4b8e88d87 Merge branch 'master' into net-next 2017-08-21 17:14:19 -07:00
Phil Sutter 73aa988868 tc/m_gact: Drop dead code
The use of 'ok' variable in parse_gact() is ineffective: The second
conditional increments it either if *argv is 'gact' or if
parse_action_control() doesn't fail (in which case exit() is called).
So this is effectively an unconditional increment and since no decrement
happens anywhere, all remaining checks for 'ok != 0' can be dropped.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-21 17:12:21 -07:00
Phil Sutter e469523e8e ss: Drop useless assignment
After '*b = *a', 'b->next' already has the same value as 'a->next'.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-21 17:12:21 -07:00
Phil Sutter 44448a90ea ss: Skip useless check in parse_hostcond()
The passed 'addr' parameter is dereferenced by caller before and in
parse_hostcond() multiple times before this check, so assume it is
always true.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-21 17:12:21 -07:00
Phil Sutter b3c5f84493 lib/rt_names: Drop dead code in rtnl_rttable_n2a()
Since 'id' is 32bit unsigned, it can never exceed RT_TABLE_MAX (which is
defined to 0xFFFFFFFF). Therefore drop that never matching conditional.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-21 17:12:21 -07:00
Phil Sutter 2a86625619 iproute: Fix for missing 'Oifs:' display
Covscan complained about dead code but after reading it, I assume the
author's intention was to prefix the interface list with 'Oifs: '.
Initializing first to 1 and setting it to 0 after above prefix was
printed should fix it.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-21 17:12:21 -07:00
Phil Sutter 2869262144 ipntable: No need to check and assign to parms_rta
This variable is initialized at declaration and nowhere else does any
assignment to it happen, so just drop the check.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-21 17:12:21 -07:00
Phil Sutter 8579a398c5 devlink: No need for this self-assignment
dl_argv_handle_both() will either assign to handle_bit or error out in
which case the variable is not used by the caller.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Jiri Pirko <jiri@mellanox.com>
2017-08-21 17:12:21 -07:00
Leon Romanovsky dbc76eb6cc rdma: Add initial manual for the tool
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-21 17:07:44 -07:00
Leon Romanovsky 7fc75744c0 rdma: Add json output to link object
An example for the JSON output for two devices system.

root@mtr-leonro:~# rdma link -d -p -j
[{
        "ifindex": 1,
        "port": 1,
        "ifname": "mlx5_0/1",
        "subnet_prefix": "fe80:0000:0000:0000",
        "lid": 13399,
        "sm_lid": 49151,
        "lmc": 0,
        "state": "ACTIVE",
        "physical_state": "LINK_UP",
        "caps": ["AUTO_MIG"
        ]
    },{
        "ifindex": 2,
        "port": 1,
        "ifname": "mlx5_1/1",
        "subnet_prefix": "fe80:0000:0000:0000",
        "lid": 13400,
        "sm_lid": 49151,
        "lmc": 0,
        "state": "ACTIVE",
        "physical_state": "LINK_UP",
        "caps": ["AUTO_MIG"
        ]
    }
]

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-21 17:07:44 -07:00
Leon Romanovsky ef353e2e94 rdma: Implement json output for dev object
The example output for machine with two devices

root@mtr-leonro:~# rdma dev -j -p
[{
	"ifindex": 1,
	"ifname": "mlx5_0",
	"node_type": "ca",
	"fw": "2.8.9999",
	"node_guid": "5254:00c0:fe12:3457",
	"sys_image_guid": 5254:00c0:fe12:3457",
	"caps": [ "BAD_PKEY_CNTR", "BAD_QKEY_CNTR", "CHANGE_PHY_POR",
		  "PORT_ACTIVE_EVENT", "SYS_IMAGE_GUID", "RC_RNR_NAK_GEN",
		  "MEM_WINDOW", "UD_IP_CSUM", "UD_TSO", "XRC",
		  "MEM_MGT_EXTENSIONS", "BLOCK_MULTICAST_LOOPBACK",
		  "MEM_WINDOW_TYPE_2B", "RAW_IP_CSUM",
		  "MANAGED_FLOW_STEERING", "RESIZE_MAX_WR" ]
	},{
	"ifindex": 2,
	"ifname": mlx5_1,
	"node_type": "ca",
	"fw": "2.8.9999",
	"node_guid": "5254:00c0:fe12:3458",
	"sys_image_guid": "5254:00c0:fe12:3458",
	"caps": [ "BAD_PKEY_CNTR", "BAD_QKEY_CNTR", "CHANGE_PHY_POR",
		  "PORT_ACTIVE_EVENT", "SYS_IMAGE_GUID", "RC_RNR_NAK_GEN",
		  "MEM_WINDOW", "UD_IP_CSUM", "UD_TSO", "XRC",
		  "MEM_MGT_EXTENSIONS", "BLOCK_MULTICAST_LOOPBACK",
		  "MEM_WINDOW_TYPE_2B", "RAW_IP_CSUM",
		  "MANAGED_FLOW_STEERING", "RESIZE_MAX_WR" ]
	}
]

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-21 17:07:44 -07:00
Leon Romanovsky ab6e2b7bdb rdma: Add json and pretty outputs
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-21 17:07:44 -07:00
Leon Romanovsky da990ab40a rdma: Add link object
Link (port) object represent struct ib_port to the user space.

Link properties:
 * Port capabilities
 * IB subnet prefix
 * LID, SM_LID and LMC
 * Port state
 * Physical state

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-21 17:07:44 -07:00
Leon Romanovsky 40df8263a0 rdma: Add dev object
Device (dev) object represents struct ib_device to the user space.

Device properties:
 * Device capabilities
 * FW version to the device output
 * node_guid and sys_image_guid
 * node_type

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-21 17:07:44 -07:00
Leon Romanovsky 74bd75c2b6 rdma: Add basic infrastructure for RDMA tool
RDMA devices are cross-functional devices from one side,
but very tailored for the specific markets from another.

Such diversity caused to spread of RDMA related configuration
across various tools, e.g. devlink, ip, ethtool, ib specific and
vendor specific solutions.

This patch adds ability to fill device and port information
by reading RDMA netlink.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-21 17:07:44 -07:00
Leon Romanovsky afdc119410 utils: Move BIT macro to common header
BIT() macro was implemented and used by devlink for now, but following
patches of rdmatool will reuse the same macro, so put it in common
header file.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-21 17:07:44 -07:00
Stephen Hemminger 18d7817c60 update kernel headers from net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-21 16:55:15 -07:00
Stephen Hemminger fa93d9a8aa Merge branch 'master' into net-next 2017-08-18 09:43:00 -07:00
Phil Sutter be55416add tipc/bearer: Fix resource leak in error path
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-18 09:39:52 -07:00
Phil Sutter 46131577cf ss: Fix potential memleak in unix_stats_print()
Fixes: 2d0e538f3e ("ss: Drop list traversal from unix_stats_print()")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-18 09:39:52 -07:00
Phil Sutter b530cef0e3 ifstat: Fix memleak in dump_kern_db() for json output
Looks like this was forgotten when converting to common json output
formatter.

Fixes: fcc16c2287 ("provide common json output formatter")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-18 09:39:52 -07:00
Phil Sutter 35f6adefb8 ifstat: Fix memleak in error case
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-18 09:39:52 -07:00
Phil Sutter 6ac5943bdd ipvrf: Fix error path of vrf_switch()
Apart from trying to close(-1), this also leaked memory.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-18 09:39:52 -07:00
Phil Sutter 3e587d9f43 tc/em_ipset: Don't leak sockfd on error path
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-18 09:16:59 -07:00
Phil Sutter 4b45ae221e ss: Don't leak fd in tcp_show_netlink_file()
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-18 09:16:59 -07:00
Phil Sutter 08806fb019 iproute_lwtunnel: csum_mode value checking was ineffective
ila_csum_name2mode() returning -1 on error but being declared as
returning __u8 doesn't make much sense. Change the code to correctly
detect this issue. Checking for __u8 overruns shouldn't be necessary
though since ila_csum_name2mode() return values are well-defined.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-18 09:13:17 -07:00
Phil Sutter 58a15e6c7e iproute_lwtunnel: Argument to strerror must be positive
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-18 09:13:17 -07:00
Phil Sutter 436270a45d tipc/node: Fix socket fd check in cmd_node_get_addr()
socket() returns -1 on error, not 0.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-18 09:13:17 -07:00
Phil Sutter 1e3197e0fd ifcfg: Quote left-hand side of [ ] expression
This prevents word-splitting and therefore leads to more accurate error
message in case 'grep -c' prints something other than a number.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-18 09:11:00 -07:00
Phil Sutter 2313b6bfe4 examples: Some shell fixes to cbq.init
This addresses the following issues:

- $@ is an array, so don't use it in quoted strings - use $* instead.

- Add missing quotes to components of [ ] expressions. These are not
  strictly necessary since the output of 'wc -l' should be a single word
  only, but in case of errors, bash prints "integer expression expected"
  instead of "too many arguments".

- Use -print0/-0 when piping from find to xargs to allow for filenames
  which contain whitespace.

- Quote arguments to 'eval' to prevent word-splitting.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-18 09:11:00 -07:00
David Ahern e5fa0e6fe7 libnetlink: Fix extack attribute parsing
Initialize tb in nl_dump_ext_err since not all attributes will be
sent in the messages.

Add error checking on mnl_attr_parse and print messages on the off
chance the ext ack attributes fail to validate.

Signed-off-by: David Ahern <dsahern@gmail.com>
2017-08-18 08:47:34 -07:00
Julien Fortin 43bc20ae73 ip: iplink_vlan.c: add json output support
Schema:
{
    "protocol": {
        "type": "string",
        "attr": "IFLA_VLAN_PROTOCOL"
    },
    "id": {
        "type": "uint",
        "attr": "IFLA_VLAN_ID"
    },
    "flags": {
        "type": "array",
        "attr": "IFLA_VLAN_FLAGS",
        "array": [
            {
                "type": "string"
            }
        ]
    },
    "ingress_qos": {
        "type": "array",
        "attr": "IFLA_VLAN_INGRESS_QOS",
        "array": [
            {
                "type": "dict",
                "dict": {
                    "from": {
                        "type": "uint"
                    },
                    "to": {
                        "type": "uint"
                    }
                }
            }
        ]
    },
    "egress_qos": {
        "type": "array",
        "attr": "IFLA_VLAN_EGRESS_QOS",
        "array": [
            {
                "type": "dict",
                "dict": {
                    "from": {
                        "type": "uint"
                    },
                    "to": {
                        "type": "uint"
                    }
                }
            }
        ]
    }
}

$ ip link add name eth0.42 link eth0 type vlan id 42
$ ip -details -json link show
[{
        "ifindex": 30,
        "ifname": "eth0.42",
        "link": "eth0",
        "flags": ["BROADCAST","MULTICAST"],
        "mtu": 1500,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ether",
        "address": "08:00:27:db:31:88",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "vlan",
            "info_data": {
                "protocol": "802.1Q",
                "id": 42,
                "flags": ["REORDER_HDR"]
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    }
]

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin 92b7454c31 ip: link_macvlan.c: add json output support
Schema:
{
    "mode": {
        "type": "string",
        "attr": "IFLA_MACVLAN_MODE"
    },
    "nopromisc": {
        "type": "bool",
        "attr": "MACVLAN_FLAG_NOPROMISC"
    },
    "macaddr_count": {
        "type": "int",
        "attr": "IFLA_MACVLAN_MACADDR_COUNT"
    },
    "macaddr_data": {
        "type": "array",
        "attr": "IFLA_MACVLAN_MACADDR_DATA",
        "array": [
            {
                "type": "string"
            }
        ]
    },
}

$ ip link add name peth0 link eth0 type macvlan
$ ip -details -json link show peth0
[{
        "ifindex": 26,
        "ifname": "peth0",
        "link": "eth0",
        "flags": ["BROADCAST","MULTICAST"],
        "mtu": 1500,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ether",
        "address": "7a:84:48:3e:7b:1c",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "macvlan",
            "info_data": {
                "mode": "vepa"
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    }
]

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin 063dd06cc1 ip: link_vti6.c: add json output support
Schema:
{
    "remote": {
        "type": "string",
        "attr": "IFLA_VTI_REMOTE"
    },
    "local": {
        "type": "string",
        "attr": "IFLA_VTI_LOCAL"
    },
    "link": {
        "type": "string",
        "attr": "IFLA_VTI_LINK",
        "mutually_exclusive": {
            "link_index": {
                "type": "uint",
            }
        }
    },
    "ikey": {
        "type": "string",
        "attr": "IFLA_VTI_IKEY"
    },
    "okey": {
        "type": "string",
        "attr": "IFLA_VTI_OKEY"
    }
}

➜  ~ ip -6 tunnel add name vti6 mode vti6 local 2001:db8:1::1/64 remote
2001:0db8:85a3:0000:0000:8a2e:0370:7334
➜  ~ ip link show
10: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT
group default
    link/tunnel6 :: brd ::
11: ip6_vti0@NONE: <NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT
group default
    link/tunnel6 :: brd ::
12: vti6@NONE: <POINTOPOINT,NOARP> mtu 1500 qdisc noop state DOWN mode
DEFAULT group default
    link/tunnel6 2001:db8:1::1 peer 2001:db8:85a3::8a2e:370:7334
➜  ~ ./ip -details -json link show
[{
        "ifindex": 10,
        "ifname": "ip6tnl0",
        "link": null,
        "flags": ["NOARP"],
        "mtu": 1452,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "tunnel6",
        "address": "::",
        "broadcast": "::",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "ip6tnl",
            "info_data": {
                "proto": "ip6ip6",
                "remote": "::",
                "local": "::",
                "encap_limit": 0,
                "ttl": 0,
                "flowinfo_tclass": "0x00",
                "flowlabel": "0x00000",
                "flowinfo": "0x00000000"
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 11,
        "ifname": "ip6_vti0",
        "link": null,
        "flags": ["NOARP"],
        "mtu": 1500,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "tunnel6",
        "address": "::",
        "broadcast": "::",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "vti6",
            "info_data": {
                "remote": "::",
                "local": "::",
                "ikey": "0.0.0.0",
                "okey": "0.0.0.0"
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 12,
        "ifname": "vti6",
        "link": null,
        "flags": ["POINTOPOINT","NOARP"],
        "mtu": 1500,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "tunnel6",
        "address": "2001:db8:1::1",
        "link_pointtopoint": true,
        "broadcast": "2001:db8:85a3::8a2e:370:7334",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "vti6",
            "info_data": {
                "remote": "2001:db8:85a3::8a2e:370:7334",
                "local": "2001:db8:1::1",
                "ikey": "0.0.0.0",
                "okey": "0.0.0.0"
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    }
]

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin 4c42a1c103 ip: link_vti.c: add json output support
Schema:
{
    "remote": {
        "type": "string",
        "attr": "IFLA_VTI_REMOTE"
    },
    "local": {
        "type": "string",
        "attr": "IFLA_VTI_LOCAL"
    },
    "link": {
        "type": "string",
        "attr": "IFLA_VTI_LINK",
        "mutually_exclusive": {
            "link_index": {
                "type": "uint",
            }
        }
    },
    "ikey": {
        "type": "string",
        "attr": "IFLA_VTI_IKEY"
    },
    "okey": {
        "type": "string",
        "attr": "IFLA_VTI_OKEY"
    }
}

$ ip tunnel add vti0 mode vti local 192.0.2.1 remote 198.51.100.3
$ ip link show
10: ip_vti0@NONE: <NOARP> mtu 1428 qdisc noop state DOWN mode DEFAULT group
default
    link/ipip 0.0.0.0 brd 0.0.0.0
11: vti0@NONE: <POINTOPOINT,NOARP> mtu 1428 qdisc noop state DOWN mode
DEFAULT group default
    link/ipip 192.0.2.1 peer 198.51.100.3
$ ./ip -details -json link show
[{
        "ifindex": 10,
        "ifname": "ip_vti0",
        "link": null,
        "flags": ["NOARP"],
        "mtu": 1428,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ipip",
        "address": "0.0.0.0",
        "broadcast": "0.0.0.0",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "vti",
            "info_data": {
                "remote": "any",
                "local": "any",
                "ikey": "0.0.0.0",
                "okey": "0.0.0.0"
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 11,
        "ifname": "vti0",
        "link": null,
        "flags": ["POINTOPOINT","NOARP"],
        "mtu": 1428,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ipip",
        "address": "192.0.2.1",
        "link_pointtopoint": true,
        "broadcast": "198.51.100.3",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "vti",
            "info_data": {
                "remote": "198.51.100.3",
                "local": "192.0.2.1",
                "ikey": "0.0.0.0",
                "okey": "0.0.0.0"
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    }
]

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin 2539a407bb ip: link_iptnl.c: add json output support
Schema
{
    "remote": {
        "type": "string",
        "attr": "IFLA_IPTUN_REMOTE"
    },
    "local": {
        "type": "string",
        "attr": "IFLA_IPTUN_LOCAL"
    },
    "link": {
        "type": "string",
        "attr": "IFLA_IPTUN_LINK",
        "mutually_exclusive": {
            "link_index": {
                "type": "uint",
            }
        }
    },
    "ttl": {
        "type": "int",
        "attr": "IFLA_IPTUN_TTL"
    },
    "tos": {
        "type": "string",
        "attr": "IFLA_IPTUN_TOS"
    },
    "pmtudisc": {
        "type": "bool",
        "attr": "IFLA_IPTUN_PMTUDISC"
    },
    "isatap": {
        "type": "bool",
        "attr": "SIT_ISATAP & IFLA_IPTUN_FLAGS"
    },
    "6rd": {
        "type": "dict",
        "attr": "IFLA_IPTUN_6RD_PREFIXLEN",
        "dict": {
            "prefix": {
                "type": "string"
            },
            "prefixlen": {
                "type": "uint",
                "attr": "IFLA_IPTUN_6RD_PREFIXLEN"
            },
            "relay_prefix": {
                "type": "string"
            },
            "relay_prefixlen": {
                "type": "uint",
                "attr": "IFLA_IPTUN_6RD_PREFIXLEN"
            }
        }
    },
    "encap": {
        "type": "dict",
        "attr": "IFLA_IPTUN_ENCAP_TYPE",
        "dict": {
            "type": {
                "type": "string",
                "attr": "IFLA_IPTUN_ENCAP_TYPE"
            },
            "sport": {
                "type": "uint",
                "attr": "IFLA_IPTUN_ENCAP_SPORT"
            },
            "dport": {
                "type": "uint",
                "attr": "IFLA_IPTUN_ENCAP_DPORT"
            },
            "csum": {
                "type": "bool",
                "attr": "TUNNEL_ENCAP_FLAG_CSUM"
            },
            "csum6": {
                "type": "bool",
                "attr": "TUNNEL_ENCAP_FLAG_CSUM6"
            },
            "remcsum": {
                "type": "bool",
                "attr": "TUNNEL_ENCAP_FLAG_REMCSUM"
            }
        }
    }
}

$ ip tunnel add tun0 mode ipip local 192.0.2.1 remote 198.51.100.3
$ ip link show
10: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group
default
    link/ipip 0.0.0.0 brd 0.0.0.0
11: tun0@NONE: <POINTOPOINT,NOARP> mtu 1480 qdisc noop state DOWN mode
DEFAULT group default
    link/ipip 192.0.2.1 peer 198.51.100.3
$ ip -details -json link show
[{
        "ifindex": 10,
        "ifname": "tunl0",
        "link": null,
        "flags": ["NOARP"],
        "mtu": 1480,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ipip",
        "address": "0.0.0.0",
        "broadcast": "0.0.0.0",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "ipip",
            "info_data": {
                "remote": "any",
                "local": "any",
                "ttl": 0,
                "pmtudisc": false
            }
        },
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 11,
        "ifname": "tun0",
        "link": null,
        "flags": ["POINTOPOINT","NOARP"],
        "mtu": 1480,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ipip",
        "address": "192.0.2.1",
        "link_pointtopoint": true,
        "broadcast": "198.51.100.3",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "ipip",
            "info_data": {
                "remote": "198.51.100.3",
                "local": "192.0.2.1",
                "ttl": 0,
                "pmtudisc": true
            }
        },
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    }
]

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin 1facc1c61c ip: link_ip6tnl.c: add json output support
Schema
{
    "proto": {
        "type": "string",
        "attr": "IFLA_IPTUN_PROTO"
    },
    "remote": {
        "type": "string",
        "attr": "IFLA_IPTUN_REMOTE"
    },
    "local": {
        "type": "string",
        "attr": "IFLA_IPTUN_LOCAL"
    },
    "link": {
        "type": "string",
        "attr": "IFLA_IPTUN_LINK",
        "mutually_exclusive": {
            "link_index": {
                "type": "uint",
            }
        }
    },
    "ip6_tnl_f_ign_encap_limit": {
        "type": "bool",
        "attr": "IP6_TNL_F_IGN_ENCAP_LIMIT"
    },
    "encap_limit": {
        "type": "uint",
        "attr": "IFLA_IPTUN_ENCAP_LIMIT"
    },
    "ttl": {
        "type": "uint",
        "attr": "IFLA_IPTUN_TTL"
    },
    "ip6_tnl_f_use_orig_tclass": {
        "type": "",
        "attr": "IP6_TNL_F_USE_ORIG_TCLASS"
    },
    "flowinfo_tclass": {
        "type": "string",
        "attr": "IP6_FLOWINFO_TCLASS"
    },
    "ip6_tnl_f_use_orig_flowlabel": {
        "type": "bool",
        "attr": "IP6_TNL_F_USE_ORIG_FLOWLABEL"
    },
    "flowlabel": {
        "type": "string",
        "attr": "IP6_FLOWINFO_FLOWLABEL"
    },
    "flowinfo": {
        "type": "string"
    },
    "ip6_tnl_f_rcv_dscp_copy": {
        "type": "bool",
        "attr": "IP6_TNL_F_RCV_DSCP_COPY"
    },
    "ip6_tnl_f_mip6_dev": {
        "type": "bool",
        "attr": "IP6_TNL_F_MIP6_DEV"
    },
    "ip6_tnl_f_use_orig_fwmark": {
        "type": "bool",
        "attr": "IP6_TNL_F_USE_ORIG_FWMARK"
    },
    "encap": {
        "type": "dict",
        "attr": "IFLA_IPTUN_ENCAP_TYPE",
        "dict": {
            "type": {
                "type": "string",
                "attr": "IFLA_IPTUN_ENCAP_TYPE"
            },
            "sport": {
                "type": "uint",
                "attr": "IFLA_IPTUN_ENCAP_SPORT"
            },
            "dport": {
                "type": "uint",
                "attr": "IFLA_IPTUN_ENCAP_DPORT"
            },
            "csum": {
                "type": "bool",
                "attr": "TUNNEL_ENCAP_FLAG_CSUM"
            },
            "csum6": {
                "type": "bool",
                "attr": "TUNNEL_ENCAP_FLAG_CSUM6"
            },
            "remcsum": {
                "type": "bool",
                "attr": "TUNNEL_ENCAP_FLAG_REMCSUM"
            }
        }
    }
}

$ ip link show
$ ip -6 tunnel add name tun6 mode ip6gre local 2001:db8:1::1/64 remote
2001:0db8:85a3:0000:0000:8a2e:0370:7334
$ ip link show
10: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group
default
    link/tunnel6 :: brd ::
11: ip6gre0@NONE: <NOARP> mtu 1448 qdisc noop state DOWN mode DEFAULT group
default
    link/gre6 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00 brd
00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
12: tun6@NONE: <POINTOPOINT,NOARP> mtu 1448 qdisc noop state DOWN mode
DEFAULT group default
    link/gre6 20:01:0d:b8:00:01:00:00:00:00:00:00:00:00:00:01 peer
20:01:0d:b8:85:a3:00:00:00:00:8a:2e:03:70:73:34
➜  ~ ./ip -details -json link show
[{
        "ifindex": 10,
        "ifname": "ip6tnl0",
        "link": null,
        "flags": ["NOARP"],
        "mtu": 1452,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "tunnel6",
        "address": "::",
        "broadcast": "::",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "ip6tnl",
            "info_data": {
                "proto": "ip6ip6",
                "remote": "::",
                "local": "::",
                "encap_limit": 0,
                "ttl": 0,
                "flowinfo_tclass": "0x00",
                "flowlabel": "0x00000",
                "flowinfo": "0x00000000"
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 11,
        "ifname": "ip6gre0",
        "link": null,
        "flags": ["NOARP"],
        "mtu": 1448,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "gre6",
        "address": "00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00",
        "broadcast": "00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "ip6gre",
            "info_data": {
                "remote": "any",
                "local": "any",
                "ttl": 0,
                "encap_limit": 0,
                "flowlabel": "0x00000"
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 12,
        "ifname": "tun6",
        "link": null,
        "flags": ["POINTOPOINT","NOARP"],
        "mtu": 1448,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "gre6",
        "address": "20:01:0d:b8:00:01:00:00:00:00:00:00:00:00:00:01",
        "link_pointtopoint": true,
        "broadcast": "20:01:0d:b8:85:a3:00:00:00:00:8a:2e:03:70:73:34",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "ip6gre",
            "info_data": {
                "remote": "2001:db8:85a3::8a2e:370:7334",
                "local": "2001:db8:1::1",
                "ttl": 64,
                "encap_limit": 4,
                "flowlabel": "0x00000"
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    }
]

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin 6856fb6548 ip: link_gre6.c: add json output support
Schema
{
    "remote": {
        "type": "string",
        "attr": "IFLA_GRE_REMOTE"
    },
    "local": {
        "type": "string",
        "attr": "IFLA_GRE_LOCAL"
    },
    "link": {
        "type": "string",
        "attr": "IFLA_GRE_LINK",
        "mutually_exclusive": {
            "link_index": {
                "type": "uint",
            }
        }
    },
    "ttl": {
        "type": "int",
        "attr": "IFLA_GRE_TTL"
    },
    "ip6_tnl_f_ign_encap_limit": {
        "type": "bool",
        "attr": "IP6_TNL_F_IGN_ENCAP_LIMIT"
    },
    "encap_limit": {
        "type": "int",
        "attr": "IFLA_GRE_ENCAP_LIMIT"
    },
    "ip6_tnl_f_use_orig_flowlabel": {
        "type": "bool",
        "attr": "IP6_TNL_F_USE_ORIG_FLOWLABEL"
    },
    "flowlabel": {
        "type": "string",
        "attr": "IP6_FLOWINFO_FLOWLABEL"
    },
    "ip6_tnl_f_rcv_dscp_copy": {
        "type": "bool",
        "attr": "IP6_TNL_F_RCV_DSCP_COPY"
    },
    "ikey": {
        "type": "string",
        "attr": "IFLA_GRE_IKEY"
    },
    "okey": {
        "type": "string",
        "attr": "IFLA_GRE_OKEY"
    },
    "iseq": {
        "type": "bool",
        "attr": "IFLA_GRE_IFLAGS & GRE_SEQ"
    },
    "oseq": {
        "type": "bool",
        "attr": "IFLA_GRE_OFLAGS & GRE_SEQ"
    },
    "icsum": {
        "type": "bool",
        "attr": "IFLA_GRE_IFLAGS & GRE_CSUM"
    },
    "ocsum": {
        "type": "bool",
        "attr": "IFLA_GRE_OFLAGS & GRE_CSUM"
    },
    "encap": {
        "type": "dict",
        "attr": "IFLA_GRE_ENCAP_TYPE != TUNNEL_ENCAP_NONE",
        "dict": {
            "type": {
                "type": "string",
                "attr": "IFLA_GRE_ENCAP_TYPE"
            },
            "sport": {
                "type": "uint",
                "attr": "IFLA_GRE_ENCAP_SPORT"
            },
            "dport": {
                "type": "uint",
                "attr": "IFLA_GRE_ENCAP_DPORT"
            },
            "csum": {
                "type": "bool",
                "attr": "TUNNEL_ENCAP_FLAG_CSUM"
            },
            "csum6": {
                "type": "bool",
                "attr": "TUNNEL_ENCAP_FLAG_CSUM6"
            },
            "remcsum": {
                "type": "bool",
                "attr": "TUNNEL_ENCAP_FLAG_REMCSUM"
            }
        }
    }
}

$ ip link show
$ ip -6 tunnel add name tun6 mode ip6gre local 2001:db8:1::1/64 remote
2001:0db8:85a3:0000:0000:8a2e:0370:7334
$ ip link show
10: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT
group default
    link/tunnel6 :: brd ::
11: ip6gre0@NONE: <NOARP> mtu 1448 qdisc noop state DOWN mode DEFAULT
group default
    link/gre6 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00 brd
00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
12: tun6@NONE: <POINTOPOINT,NOARP> mtu 1448 qdisc noop state DOWN mode
DEFAULT group default
    link/gre6 20:01:0d:b8:00:01:00:00:00:00:00:00:00:00:00:01 peer
20:01:0d:b8:85:a3:00:00:00:00:8a:2e:03:70:73:34
➜  ~ ./ip -details -json link show
[{
        "ifindex": 10,
        "ifname": "ip6tnl0",
        "link": null,
        "flags": ["NOARP"],
        "mtu": 1452,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "tunnel6",
        "address": "::",
        "broadcast": "::",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "ip6tnl",
            "info_data": {
                "proto": "ip6ip6",
                "remote": "::",
                "local": "::",
                "encap_limit": 0,
                "ttl": 0,
                "flowinfo_tclass": "0x00",
                "flowlabel": "0x00000",
                "flowinfo": "0x00000000"
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 11,
        "ifname": "ip6gre0",
        "link": null,
        "flags": ["NOARP"],
        "mtu": 1448,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "gre6",
        "address": "00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00",
        "broadcast": "00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "ip6gre",
            "info_data": {
                "remote": "any",
                "local": "any",
                "ttl": 0,
                "encap_limit": 0,
                "flowlabel": "0x00000"
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 12,
        "ifname": "tun6",
        "link": null,
        "flags": ["POINTOPOINT","NOARP"],
        "mtu": 1448,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "gre6",
        "address": "20:01:0d:b8:00:01:00:00:00:00:00:00:00:00:00:01",
        "link_pointtopoint": true,
        "broadcast": "20:01:0d:b8:85:a3:00:00:00:00:8a:2e:03:70:73:34",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "ip6gre",
            "info_data": {
                "remote": "2001:db8:85a3::8a2e:370:7334",
                "local": "2001:db8:1::1",
                "ttl": 64,
                "encap_limit": 4,
                "flowlabel": "0x00000"
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    }
]

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin e2d4588331 ip: link_gre.c: add json output support
Schema
{
    "external": {
        "type": "bool",
        "comment": "!tb[IFLA_GRE_COLLECT_METADATA]"
    },
    "remote": {
        "type": "string",
        "attr": "IFLA_GRE_REMOTE"
    },
    "local": {
        "type": "string",
        "attr": "IFLA_GRE_LOCAL"
    },
    "link": {
        "type": "string",
        "attr": "IFLA_GRE_LINK",
        "mutually_exclusive": {
            "link_index": {
                "type": "uint",
            }
        }
    },
    "ttl": {
        "type": "int",
        "attr": "IFLA_GRE_TTL"
    },
    "tos": {
        "type": "string",
        "attr": "IFLA_GRE_TOS"
    },
    "pmtudisc": {
        "type": "bool",
        "attr": "IFLA_GRE_PMTUDISC"
    },
    "ikey": {
        "type": "string",
        "attr": "IFLA_GRE_IKEY"
    },
    "okey": {
        "type": "string",
        "attr": "IFLA_GRE_OKEY"
    },
    "iseq": {
        "type": "bool",
        "attr": "IFLA_GRE_IFLAGS & GRE_SEQ"
    },
    "oseq": {
        "type": "bool",
        "attr": "IFLA_GRE_OFLAGS & GRE_SEQ"
    },
    "icsum": {
        "type": "bool",
        "attr": "IFLA_GRE_IFLAGS & GRE_CSUM"
    },
    "ocsum": {
        "type": "bool",
        "attr": "IFLA_GRE_OFLAGS & GRE_CSUM"
    },
    "ignore_df": {
        "type": "bool",
        "attr": "IFLA_GRE_IGNORE_DF"
    },
    "encap": {
        "type": "dict",
        "attr": "IFLA_GRE_ENCAP_TYPE != TUNNEL_ENCAP_NONE",
        "dict": {
            "type": {
                "type": "string",
                "attr": "IFLA_GRE_ENCAP_TYPE"
            },
            "sport": {
                "type": "uint",
                "attr": "IFLA_GRE_ENCAP_SPORT"
            },
            "dport": {
                "type": "uint",
                "attr": "IFLA_GRE_ENCAP_DPORT"
            },
            "csum": {
                "type": "bool",
                "attr": "TUNNEL_ENCAP_FLAG_CSUM"
            },
            "csum6": {
                "type": "bool",
                "attr": "TUNNEL_ENCAP_FLAG_CSUM6"
            },
            "remcsum": {
                "type": "bool",
                "attr": "TUNNEL_ENCAP_FLAG_REMCSUM"
            }
        }
    }
}

$ ip link show
$ ip tunnel add tun42 mode gre local 192.0.2.42 remote 203.0.113.42 key 42
$ ip link show
10: gre0@NONE: <NOARP> mtu 1476 qdisc noop state DOWN mode DEFAULT group
default
    link/gre 0.0.0.0 brd 0.0.0.0
11: gretap0@NONE: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN
mode DEFAULT group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
12: tun42@NONE: <POINTOPOINT,NOARP> mtu 1472 qdisc noop state DOWN mode
DEFAULT group default
    link/gre 192.0.2.42 peer 203.0.113.42
$ ip -details -json link show
[{
        "ifindex": 10,
        "ifname": "gre0",
        "link": null,
        "flags": ["NOARP"],
        "mtu": 1476,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "gre",
        "address": "0.0.0.0",
        "broadcast": "0.0.0.0",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "gre",
            "info_data": {
                "remote": "any",
                "local": "any",
                "ttl": 0,
                "pmtudisc": false
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 11,
        "ifname": "gretap0",
        "link": null,
        "flags": ["BROADCAST","MULTICAST"],
        "mtu": 1462,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "txqlen": 1000,
        "link_type": "ether",
        "address": "00:00:00:00:00:00",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "gretap",
            "info_data": {
                "remote": "any",
                "local": "any",
                "ttl": 0,
                "pmtudisc": false
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 12,
        "ifname": "tun42",
        "link": null,
        "flags": ["POINTOPOINT","NOARP"],
        "mtu": 1472,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "gre",
        "address": "192.0.2.42",
        "link_pointtopoint": true,
        "broadcast": "203.0.113.42",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "gre",
            "info_data": {
                "remote": "203.0.113.42",
                "local": "192.0.2.42",
                "ttl": 0,
                "pmtudisc": true,
                "ikey": "0.0.0.42",
                "okey": "0.0.0.42"
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    }
]

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin c339834682 ip: ipmacsec.c: add json output support
Schema
{
    "sci": {
        "type": "string",
        "attr": "IFLA_MACSEC_SCI"
    },
    "protect": {
        "type": "string",
        "attr": "IFLA_MACSEC_PROTECT"
    },
    "cipher_suite": {
        "type": "string",
        "attr": "IFLA_MACSEC_CIPHER_SUITE"
    },
    "icv_len": {
        "type": "uint",
        "attr": "IFLA_MACSEC_ICV_LEN"
    },
    "encoding_sa": {
        "type": "uint",
        "attr": "IFLA_MACSEC_ENCODING_SA"
    },
    "validation": {
        "type": "string",
        "attr": "IFLA_MACSEC_VALIDATION"
    },
    "encrypt": {
        "type": "string",
        "attr": "IFLA_MACSEC_ENCRYPT"
    },
    "inc_sci": {
        "type": "string",
        "attr": "IFLA_MACSEC_INC_SCI"
    },
    "es": {
        "type": "string",
        "attr": "IFLA_MACSEC_ES"
    },
    "scb": {
        "type": "string",
        "attr": "IFLA_MACSEC_SCB"
    },
    "replay_protect": {
        "type": "string",
        "attr": "IFLA_MACSEC_REPLAY_PROTECT"
    },
    "window": {
        "type": "int",
        "attr": ""
    }
}

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin 2f26065e25 ip: iplink_xdp.c: add json output support
Schema
{
    "attached": {
        "type": "uint",
        "attr": "IFLA_XDP_ATTACHED"
    },
    "prog_id": {
        "type": "uint",
        "attr": "IFLA_XDP_PROG_ID"
    }
}

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin 3b98d9b804 ip: iplink_vxlan.c: add json output support
Schema:
{
    "id": {
        "type": "uint",
        "attr": "IFLA_VXLAN_ID"
    },
    "group": {
        "type": "string",
        "attr": "IFLA_VXLAN_GROUP"
    },
    "remote": {
        "type": "string",
        "attr": "IFLA_VXLAN_GROUP"
    },
    "group6": {
        "type": "string",
        "attr": "IFLA_VXLAN_GROUP6"
    },
    "remote6": {
        "type": "string",
        "attr": "IFLA_VXLAN_GROUP6"
    },
    "local": {
        "type": "string",
        "attr": "IFLA_VXLAN_LOCAL"
    },
    "local6": {
        "type": "string",
        "attr": "IFLA_VXLAN_LOCAL6"
    },
    "link": {
        "type": "string",
        "attr": "IFLA_VXLAN_LINK",
        "mutually_exclusive": {
            "link_index": {
                "type": "uint",
                "comment": "if not ifname for ifindex"
            }
        }
    },
    "port_range": {
        "type": "dict",
        "attr": "IFLA_VXLAN_PORT_RANGE",
        "dict": {
            "low": {
                "type": "uint"
            },
            "high": {
                "type": "uint"
            }
        }
    },
    "port": {
        "type": "uint",
        "attr": "IFLA_VXLAN_PORT"
    },
    "learning": {
        "type": "bool",
        "attr": "IFLA_VXLAN_LEARNING"
    },
    "proxy": {
        "type": "bool",
        "attr": "IFLA_VXLAN_PROXY"
    },
    "rsc": {
        "type": "bool",
        "attr": "IFLA_VXLAN_RSC"
    },
    "l2miss": {
        "type": "bool",
        "attr": "IFLA_VXLAN_L2MISS"
    },
    "l3miss": {
        "type": "bool",
        "attr": "IFLA_VXLAN_L3MISS"
    },
    "tos": {
        "type": "string",
        "attr": "IFLA_VXLAN_TOS"
    },
    "ttl": {
        "type": "int",
        "attr": "IFLA_VXLAN_TTL"
    },
    "label": {
        "type": "string",
        "attr": "IFLA_VXLAN_LABEL"
    },
    "ageing": {
        "type": "uint",
        "attr": "IFLA_VXLAN_AGEING"
    },
    "limit": {
        "type": "uint",
        "attr": "IFLA_VXLAN_LIMIT"
    },
    "udp_csum": {
        "type": "bool",
        "attr": "IFLA_VXLAN_UDP_CSUM"
    },
    "udp_zero_csum6_tx": {
        "type": "bool",
        "attr": "IFLA_VXLAN_UDP_ZERO_CSUM6_TX"
    },
    "udp_zero_csum6_rx": {
        "type": "bool",
        "attr": "IFLA_VXLAN_UDP_ZERO_CSUM6_RX"
    },
    "remcsum_tx": {
        "type": "bool",
        "attr": "IFLA_VXLAN_REMCSUM_TX"
    },
    "remcsum_rx": {
        "type": "bool",
        "attr": "IFLA_VXLAN_REMCSUM_RX"
    },
    "collect_metadata": {
        "type": "bool",
        "attr": "IFLA_VXLAN_COLLECT_METADATA"
    },
    "gbp": {
        "type": "bool",
        "attr": "IFLA_VXLAN_GBP"
    },
    "gpe": {
        "type": "bool",
        "attr": "IFLA_VXLAN_GPE"
    }
}

$ ip link add name vxlan42 type vxlan id 42 dev eth0 remote 203.0.113.6
local 192.0.2.1 dstport 4789
$ ip link add name vxlan43 type vxlan id 43 dev eth0 group 239.0.0.1
dstport 4789
$ ip -details -json link show
[{
        "ifindex": 17,
        "ifname": "vxlan42",
        "flags": ["BROADCAST","MULTICAST"],
        "mtu": 1450,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ether",
        "address": "b2:92:0e:1a:c6:42",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "vxlan",
            "info_data": {
                "id": 42,
                "remote": "203.0.113.6",
                "local": "192.0.2.1",
                "link": "eth0",
                "port_range": {
                    "low": 0,
                    "high": 0
                },
                "port": 4789,
                "learning": true,
                "ttl": 0,
                "ageing": 300,
                "udp_csum": false,
                "udp_zero_csum6_tx": false,
                "udp_zero_csum6_rx": false
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 18,
        "ifname": "vxlan43",
        "flags": ["BROADCAST","MULTICAST"],
        "mtu": 1450,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ether",
        "address": "c6:51:4d:7f:f9:2f",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "vxlan",
            "info_data": {
                "id": 43,
                "group": "239.0.0.1",
                "link": "eth0",
                "port_range": {
                    "low": 0,
                    "high": 0
                },
                "port": 4789,
                "learning": true,
                "ttl": 0,
                "ageing": 300,
                "udp_csum": false,
                "udp_zero_csum6_tx": false,
                "udp_zero_csum6_rx": false
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    }
]

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin d9e84ec27c ip: iplink_vrf.c: add json output support
Schema:
{
    "table": {
        "type": "uint",
        "attr": "IFLA_VRF_TABLE"
    }
}

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin 8f24afc9d4 ip: iplink_ipvlan.c: add json output support
Schema:
{
    "mode": {
        "type": "string",
        "attr": "IFLA_IPVLAN_MODE"
    }
}

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin 3bec1cf84e ip: iplink_ipoib.c: add json output support
Schema:
{
    "key": {
        "type": "string",
        "attr": "IFLA_IPOIB_PKEY"
    },
    "mode": {
        "type": "string",
        "attr": "IFLA_IPOIB_PKEY"
    },
    "umcast": {
        "type": "string",
        "attr": "IFLA_IPOIB_UMCAST"
    }
}

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin 119fae0aa5 ip: iplink_geneve.c: add json output support
Schema:
{
    "id": {
        "type": "uint",
        "attr": "IFLA_GENEVE_ID"
    },
    "remote": {
        "type": "string",
        "attr": "IFLA_GENEVE_REMOTE"
    },
    "remote6": {
        "type": "string",
        "attr": "IFLA_GENEVE_REMOTE6"
    },
    "ttl": {
        "type": "int",
        "attr": "IFLA_GENEVE_TTL"
    },
    "tos": {
        "type": "string",
        "attr": "IFLA_GENEVE_TOS"
    },
    "label": {
        "type": "string",
        "attr": "IFLA_GENEVE_LABEL"
    },
    "port": {
        "type": "uint",
        "attr": "IFLA_GENEVE_PORT"
    },
    "collect_metadata": {
        "type": "bool",
        "attr": "IFLA_GENEVE_COLLECT_METADATA"
    },
    "udp_csum": {
        "type": "bool",
        "attr": "IFLA_GENEVE_UDP_CSUM"
    },
    "udp_zero_csum6_tx": {
        "type": "bool",
        "attr": "IFLA_GENEVE_UDP_ZERO_CSUM6_TX"
    },
    "udp_zero_csum6_rx": {
        "type": "bool",
        "attr": "IFLA_GENEVE_UDP_ZERO_CSUM6_RX"
    }
}

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin 529226009f ip: iplink_can.c: add json output support
Schema: IFLA_INFO_DATA
{
    "ctrlmode": {
        "type": "array",
        "attr": "IFLA_CAN_CTRLMODE",
        "array": [
            {
                "type": "string"
            }
        ]
    },
    "state": {
        "type": "string",
        "attr": "IFLA_CAN_STATE"
    },
    "berr_counter": {
        "type": "dict",
        "attr": "IFLA_CAN_BERR_COUNTER",
        "dict": {
            "tx": {
                "type": "int"
            },
            "rx": {
                "type": "int"
            }
        }
    },
    "restart_ms": {
        "type": "int",
        "attr": "IFLA_CAN_RESTART_MS"
    },
    "bittiming": {
        "type": "dict",
        "attr": "IFLA_CAN_BITTIMING",
        "dict": {
            "bitrate": {
                "type": "int"
            },
            "sample_point": {
                "type": "float"
            },
            "tq": {
                "type": "int"
            },
            "prop_seg": {
                "type": "int"
            },
            "phase_seg1": {
                "type": "int"
            },
            "phase_seg2": {
                "type": "int"
            },
            "sjw": {
                "type": "int"
            }
        }
    },
    "bittiming_const": {
        "type": "dict",
        "attr": "IFLA_CAN_BITTIMING_CONST",
        "dict": {
            "name": {
                "type": "string"
            },
            "tseg1": {
                "type": "dict",
                "dict": {
                    "min": {
                        "type": "int"
                    },
                    "max": {
                        "type": "int"
                    }
                }
            },
            "tseg2": {
                "type": "dict",
                "dict": {
                    "min": {
                        "type": "int"
                    },
                    "max": {
                        "type": "int"
                    }
                }
            },
            "sjw": {
                "type": "dict",
                "dict": {
                    "min": {
                        "type": "int"
                    },
                    "max": {
                        "type": "int"
                    }
                }
            },
            "brp": {
                "type": "dict",
                "dict": {
                    "min": {
                        "type": "int"
                    },
                    "max": {
                        "type": "int"
                    }
                }
            },
            "brp_inc": {
                "type": "int"
            }
        }
    },
    "bittiming_bitrate": {
        "type": "uint",
        "attr": "IFLA_CAN_BITTIMING"
    },
    "bitrate_const": {
        "type": "array",
        "attr": "IFLA_CAN_BITRATE_CONST",
        "array": [
            {
                "type": "uint"
            }
        ]
    },
    "data_bittiming": {
        "type": "dict",
        "attr": "IFLA_CAN_DATA_BITTIMING",
        "dict": {
            "bitrate": {
                "type": "int"
            },
            "sample_point": {
                "type": "float"
            },
            "tq": {
                "type": "int"
            },
            "prop_seg": {
                "type": "int"
            },
            "phase_seg1": {
                "type": "int"
            },
            "phase_seg2": {
                "type": "int"
            },
            "sjw": {
                "type": "int"
            }
        }
    },
    "data_bittiming_const": {
        "type": "dict",
        "attr": "IFLA_CAN_DATA_BITTIMING_CONST",
        "dict": {
            "name": {
                "type": "string"
            },
            "tseg1": {
                "type": "dict",
                "dict": {
                    "min": {
                        "type": "int"
                    },
                    "max": {
                        "type": "int"
                    }
                }
            },
            "tseg2": {
                "type": "dict",
                "dict": {
                    "min": {
                        "type": "int"
                    },
                    "max": {
                        "type": "int"
                    }
                }
            },
            "sjw": {
                "type": "dict",
                "dict": {
                    "min": {
                        "type": "int"
                    },
                    "max": {
                        "type": "int"
                    }
                }
            },
            "brp": {
                "type": "dict",
                "dict": {
                    "min": {
                        "type": "int"
                    },
                    "max": {
                        "type": "int"
                    }
                }
            },
            "brp_inc": {
                "type": "int"
            }
        }
    },
    "data_bittiming_bitrate": {
        "type": "uint",
        "attr": "IFLA_CAN_DATA_BITTIMING"
    },
    "data_bitrate_const": {
        "type": "array",
        "attr": "IFLA_CAN_DATA_BITRATE_CONST",
        "array": [
            {
                "type": "uint"
            }
        ]
    },
    "termination": {
        "type": "unsigned short",
        "attr": "IFLA_CAN_TERMINATION"
    },
    "termination_const": {
        "type": "array",
        "attr": "IFLA_CAN_TERMINATION_CONST",
        "array": [
            {
                "type": "unsigned short"
            }
        ]
    },
    "clock": {
        "type": "int",
        "attr": "IFLA_CAN_CLOCK"
    }
}

IFLA_INFO_XSTATS
{
    "restarts": {
        "type": "int"
    },
    "bus_error": {
        "type": "int"
    },
    "arbitration_lost": {
        "type": "int"
    },
    "error_warning": {
        "type": "int"
    },
    "error_passive": {
        "type": "int"
    },
    "bus_off": {
        "type": "int"
    }
}

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin 165a703909 ip: iplink_bridge_slave.c: add json output support
Schema:
bridge_slave: IFLA_INFO_SLAVE_DATA
{
    "state": {
        "type": "string",
        "attr": "IFLA_BRPORT_STATE",
        "mutually_exclusive": {
            "state_index": {
                "type": "uint",
                "comment": "if (state > BR_STATE_BLOCKING)"
            }
        }
    },
    "priority": {
        "type": "int",
        "attr": "IFLA_BRPORT_PRIORITY"
    },
    "cost": {
        "type": "int",
        "attr": "IFLA_BRPORT_COST"
    },
    "mode": {
        "type": "bool",
        "attr": "IFLA_BRPORT_MODE"
    },
    "guard": {
        "type": "bool",
        "attr": "IFLA_BRPORT_GUARD"
    },
    "protect": {
        "type": "bool",
        "attr": "IFLA_BRPORT_PROTECT"
    },
    "fast_leave": {
        "type": "bool",
        "attr": "IFLA_BRPORT_FAST_LEAVE"
    },
    "learning": {
        "type": "bool",
        "attr": "IFLA_BRPORT_LEARNING"
    },
    "unicast_flood": {
        "type": "bool",
        "attr": "IFLA_BRPORT_UNICAST_FLOOD"
    },
    "id": {
        "type": "string",
        "attr": "IFLA_BRPORT_ID"
    },
    "no": {
        "type": "string",
        "attr": "IFLA_BRPORT_NO"
    },
    "designated_port": {
        "type": "uint",
        "attr": "IFLA_BRPORT_DESIGNATED_PORT"
    },
    "designated_cost": {
        "type": "uint",
        "attr": "IFLA_BRPORT_DESIGNATED_COST"
    },
    "bridge_id": {
        "type": "string",
        "attr": "IFLA_BRPORT_BRIDGE_ID"
    },
    "root_id": {
        "type": "string",
        "attr": "IFLA_BRPORT_ROOT_ID"
    },
    "hold_timer": {
        "type": "float",
        "attr": "IFLA_BRPORT_HOLD_TIMER"
    },
    "message_age_timer": {
        "type": "float",
        "attr": "IFLA_BRPORT_MESSAGE_AGE_TIMER"
    },
    "forward_delay_timer": {
        "type": "float",
        "attr": "IFLA_BRPORT_FORWARD_DELAY_TIMER"
    },
    "topology_change_ack": {
        "type": "uint",
        "attr": "IFLA_BRPORT_TOPOLOGY_CHANGE_ACK"
    },
    "config_pending": {
        "type": "uint",
        "attr": "IFLA_BRPORT_CONFIG_PENDING"
    },
    "proxyarp": {
        "type": "bool",
        "attr": "IFLA_BRPORT_PROXYARP"
    },
    "proxyarp_wifi": {
        "type": "bool",
        "attr": "IFLA_BRPORT_PROXYARP_WIFI"
    },
    "multicast_router": {
        "type": "uint",
        "attr": "IFLA_BRPORT_MULTICAST_ROUTER"
    },
    "mcast_flood": {
        "type": "bool",
        "attr": "IFLA_BRPORT_MCAST_FLOOD"
    }
}

$ ip link add dev br42 type bridge
$ ip link add dev bond42 type bond
$ ip link set dev bond42 master br42
$ ip link set dev bond42 up
$ ip link set dev br42 up
$ ip -details link show
$ ip -details link show
15: br42: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
UP mode DEFAULT group default
    link/ether 22:8f:91:bb:9f:09 brd ff:ff:ff:ff:ff:ff promiscuity 0
    bridge forward_delay 1500 hello_time 200 max_age 2000 ageing_time
30000 stp_state 0 priority 32768 vlan_filtering 0 vlan_protocol 802.1Q
bridge_id 8000.22:8f:91:bb:9f:9 designated_root 8000.22:8f:91:bb:9f:9
root_port 0 root_path_cost 0 topology_change 0 topology_change_detected 0
hello_timer    0.00 tcn_timer    0.00 topology_change_timer    0.00
gc_timer  199.11 vlan_default_pvid 1 vlan_stats_enabled 0 group_fwd_mask 0
group_address 01:80:c2:00:00:00 mcast_snooping 1 mcast_router 1
mcast_query_use_ifaddr 0 mcast_querier 0 mcast_hash_elasticity 4096
mcast_hash_max 4096 mcast_last_member_count 2 mcast_startup_query_count 2
mcast_last_member_interval 100 mcast_membership_interval 26000
mcast_querier_interval 25500 mcast_query_interval 12500
mcast_query_response_interval 1000 mcast_startup_query_interval 3125
mcast_stats_enabled 0 mcast_igmp_version 2 mcast_mld_version 1
nf_call_iptables 0 nf_call_ip6tables 0 nf_call_arptables 0 addrgenmode
eui64
16: bond42: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc
noqueue master br42 state UNKNOWN mode DEFAULT group default
    link/ether 22:8f:91:bb:9f:09 brd ff:ff:ff:ff:ff:ff promiscuity 1
    bond mode 802.3ad miimon 100 updelay 0 downdelay 0 use_carrier 1
arp_interval 0 arp_validate none arp_all_targets any primary_reselect
always fail_over_mac none xmit_hash_policy layer3+4 resend_igmp 1
num_grat_arp 1 all_slaves_active 0 min_links 1 lp_interval 1
packets_per_slave 1 lacp_rate fast ad_select stable ad_actor_sys_prio
65535 ad_user_port_key 0 ad_actor_system 00:00:00:00:00:00
    bridge_slave state forwarding priority 8 cost 100 hairpin off guard
off root_block off fastleave off learning on flood on port_id 0x8001
port_no 0x1 designated_port 32769 designated_cost 0 designated_bridge
8000.22:8f:91:bb:9f:9 designated_root 8000.22:8f:91:bb:9f:9 hold_timer
0.00 message_age_timer    0.00 forward_delay_timer    0.00
topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off
mcast_router 1 mcast_fast_leave off mcast_flood on neigh_suppress off
addrgenmode eui64

$ ip -details -json link show
[{
        "ifindex": 15,
        "ifname": "br42",
        "flags": ["BROADCAST","MULTICAST","UP","LOWER_UP"],
        "mtu": 1500,
        "qdisc": "noqueue",
        "operstate": "UP",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ether",
        "address": "22:8f:91:bb:9f:09",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "bridge",
            "info_data": {
                "forward_delay": 1500,
                "hello_time": 200,
                "max_age": 2000,
                "ageing_time": 30000,
                "stp_state": 0,
                "priority": 32768,
                "vlan_filtering": 0,
                "vlan_protocol": "802.1Q",
                "bridge_id": "8000.22:8f:91:bb:9f:9",
                "root_id": "8000.22:8f:91:bb:9f:9",
                "root_port": 0,
                "root_path_cost": 0,
                "topology_change": 0,
                "topology_change_detected": 0,
                "hello_timer": 0.00,
                "tcn_timer": 0.00,
                "topology_change_timer": 0.00,
                "gc_timer": 298.27,
                "vlan_default_pvid": 1,
                "vlan_stats_enabled": 0,
                "group_fwd_mask": "0",
                "group_addr": "01:80:c2:00:00:00",
                "mcast_snooping": 1,
                "mcast_router": 1,
                "mcast_query_use_ifaddr": 0,
                "mcast_querier": 0,
                "mcast_hash_elasticity": 4096,
                "mcast_hash_max": 4096,
                "mcast_last_member_cnt": 2,
                "mcast_startup_query_cnt": 2,
                "mcast_last_member_intvl": 100,
                "mcast_membership_intvl": 26000,
                "mcast_querier_intvl": 25500,
                "mcast_query_intvl": 12500,
                "mcast_query_response_intvl": 1000,
                "mcast_startup_query_intvl": 3125,
                "mcast_stats_enabled": 0,
                "mcast_igmp_version": 2,
                "mcast_mld_version": 1,
                "nf_call_iptables": 0,
                "nf_call_ip6tables": 0,
                "nf_call_arptables": 0
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 16,
        "ifname": "bond42",
        "flags": ["BROADCAST","MULTICAST","MASTER","UP","LOWER_UP"],
        "mtu": 1500,
        "qdisc": "noqueue",
        "master": "br42",
        "operstate": "UNKNOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ether",
        "address": "22:8f:91:bb:9f:09",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 1,
        "linkinfo": {
            "info_kind": "bond",
            "info_data": {
                "mode": "802.3ad",
                "miimon": 100,
                "updelay": 0,
                "downdelay": 0,
                "use_carrier": 1,
                "arp_interval": 0,
                "arp_validate": null,
                "arp_all_targets": "any",
                "primary_reselect": "always",
                "fail_over_mac": "none",
                "xmit_hash_policy": "layer3+4",
                "resend_igmp": 1,
                "num_peer_notif": 1,
                "all_slaves_active": 0,
                "min_links": 1,
                "lp_interval": 1,
                "packets_per_slave": 1,
                "ad_lacp_rate": "fast",
                "ad_select": "stable",
                "ad_actor_sys_prio": 65535,
                "ad_user_port_key": 0,
                "ad_actor_system": "00:00:00:00:00:00"
            },
            "info_slave_kind": "bridge",
            "info_slave_data": {
                "state": "forwarding",
                "priority": 8,
                "cost": 100,
                "hairpin": false,
                "guard": false,
                "root_block": false,
                "fastleave": false,
                "learning": true,
                "flood": true,
                "id": "0x8001",
                "no": "0x1",
                "designated_port": 32769,
                "designated_cost": 0,
                "bridge_id": "8000.22:8f:91:bb:9f:9",
                "root_id": "8000.22:8f:91:bb:9f:9",
                "hold_timer": 0.00,
                "message_age_timer": 0.00,
                "forward_delay_timer": 11.97,
                "topology_change_ack": 0,
                "config_pending": 0,
                "proxy_arp": false,
                "proxy_arp_wifi": false,
                "multicast_router": 1,
                "mcast_flood": true
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 16,
        "num_rx_queues": 16,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    }
]

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin 20aeecfbf5 ip: iplink_bridge.c: add json output support
Schema and live example:
bridge: IFLA_INFO_DATA
{
    "forward_delay": {
        "type": "uint",
        "attr": "IFLA_BR_FORWARD_DELAY"
    },
    "hello_time": {
        "type": "uint",
        "attr": "IFLA_BR_HELLO_TIME"
    },
    "max_age": {
        "type": "uint",
        "attr": "IFLA_BR_MAX_AGE"
    },
    "ageing_time": {
        "type": "uint",
        "attr": "IFLA_BR_AGEING_TIME"
    },
    "stp_state": {
        "type": "uint",
        "attr": "IFLA_BR_STP_STATE"
    },
    "priority": {
        "type": "uint",
        "attr": "IFLA_BR_PRIORITY"
    },
    "vlan_filtering": {
        "type": "uint",
        "attr": "IFLA_BR_VLAN_FILTERING"
    },
    "vlan_protocol": {
        "type": "string",
        "attr": "IFLA_BR_VLAN_PROTOCOL"
    },
    "bridge_id": {
        "type": "string",
        "attr": "IFLA_BR_BRIDGE_ID"
    },
    "root_id": {
        "type": "string",
        "attr": "IFLA_BR_ROOT_ID"
    },
    "root_port": {
        "type": "uint",
        "attr": "IFLA_BR_ROOT_PORT"
    },
    "root_path_cost": {
        "type": "uint",
        "attr": "IFLA_BR_ROOT_PATH_COST"
    },
    "topology_change": {
        "type": "uint",
        "attr": "IFLA_BR_TOPOLOGY_CHANGE"
    },
    "topology_change_detected": {
        "type": "uint",
        "attr": "IFLA_BR_TOPOLOGY_CHANGE_DETECTED"
    },
    "hello_timer": {
        "type": "float",
        "attr": "IFLA_BR_HELLO_TIMER"
    },
    "tcn_timer": {
        "type": "float",
        "attr": "IFLA_BR_TCN_TIMER"
    },
    "topology_change_timer": {
        "type": "float",
        "attr": "IFLA_BR_TOPOLOGY_CHANGE_TIMER"
    },
    "gc_timer": {
        "type": "float",
        "attr": "IFLA_BR_GC_TIMER"
    },
    "vlan_default_pvid": {
        "type": "uint",
        "attr": "IFLA_BR_VLAN_DEFAULT_PVID"
    },
    "vlan_stats_enabled": {
        "type": "uint",
        "attr": "IFLA_BR_VLAN_STATS_ENABLED"
    },
    "group_fwd_mask": {
        "type": "string",
        "attr": "IFLA_BR_GROUP_FWD_MASK"
    },
    "group_addr": {
        "type": "string",
        "attr": "IFLA_BR_GROUP_ADDR"
    },
    "mcast_snooping": {
        "type": "uint",
        "attr": "IFLA_BR_MCAST_SNOOPING"
    },
    "mcast_router": {
        "type": "uint",
        "attr": "IFLA_BR_MCAST_ROUTER"
    },
    "mcast_query_use_ifaddr": {
        "type": "uint",
        "attr": "IFLA_BR_MCAST_QUERY_USE_IFADDR"
    },
    "mcast_querier": {
        "type": "uint",
        "attr": "IFLA_BR_MCAST_QUERIER"
    },
    "mcast_hash_elasticity": {
        "type": "uint",
        "attr": "IFLA_BR_MCAST_HASH_ELASTICITY"
    },
    "mcast_hash_max": {
        "type": "uint",
        "attr": "IFLA_BR_MCAST_HASH_MAX"
    },
    "mcast_last_member_cnt": {
        "type": "uint",
        "attr": "IFLA_BR_MCAST_LAST_MEMBER_CNT"
    },
    "mcast_startup_query_cnt": {
        "type": "uint",
        "attr": "IFLA_BR_MCAST_STARTUP_QUERY_CNT"
    },
    "mcast_last_member_intvl": {
        "type": "lluint",
        "attr": "IFLA_BR_MCAST_LAST_MEMBER_INTVL"
    },
    "mcast_membership_intvl": {
        "type": "lluint",
        "attr": "IFLA_BR_MCAST_MEMBERSHIP_INTVL"
    },
    "mcast_querier_intvl": {
        "type": "lluint",
        "attr": "IFLA_BR_MCAST_QUERIER_INTVL"
    },
    "mcast_query_intvl": {
        "type": "lluint",
        "attr": "IFLA_BR_MCAST_QUERY_INTVL"
    },
    "mcast_query_response_intvl": {
        "type": "lluint",
        "attr": "IFLA_BR_MCAST_QUERY_RESPONSE_INTVL"
    },
    "mcast_startup_query_intvl": {
        "type": "lluint",
        "attr": "IFLA_BR_MCAST_STARTUP_QUERY_INTVL"
    },
    "mcast_stats_enabled": {
        "type": "uint",
        "attr": "IFLA_BR_MCAST_STATS_ENABLED"
    },
    "mcast_igmp_version": {
        "type": "uint",
        "attr": "IFLA_BR_MCAST_IGMP_VERSION"
    },
    "mcast_mld_version": {
        "type": "uint",
        "attr": "IFLA_BR_MCAST_MLD_VERSION"
    },
    "nf_call_iptables": {
        "type": "uint",
        "attr": "IFLA_BR_NF_CALL_IPTABLES"
    },
    "nf_call_ip6tables": {
        "type": "uint",
        "attr": "IFLA_BR_NF_CALL_IP6TABLES"
    },
    "nf_call_arptables": {
        "type": "uint",
        "attr": "IFLA_BR_NF_CALL_ARPTABLES"
    }
}

$ ip link add dev br42 type bridge
$ ip link add dev bond42 type bond
$ ip link set dev bond42 master br42
$ ip link set dev bond42 up
$ ip link set dev br42 up

$ ip -details link show
15: br42: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
UP mode DEFAULT group default
    link/ether 22:8f:91:bb:9f:09 brd ff:ff:ff:ff:ff:ff promiscuity 0
    bridge forward_delay 1500 hello_time 200 max_age 2000 ageing_time
30000 stp_state 0 priority 32768 vlan_filtering 0 vlan_protocol 802.1Q
bridge_id 8000.22:8f:91:bb:9f:9 designated_root 8000.22:8f:91:bb:9f:9
root_port 0 root_path_cost 0 topology_change 0 topology_change_detected 0
hello_timer    0.00 tcn_timer    0.00 topology_change_timer    0.00
gc_timer  199.11 vlan_default_pvid 1 vlan_stats_enabled 0 group_fwd_mask 0
group_address 01:80:c2:00:00:00 mcast_snooping 1 mcast_router 1
mcast_query_use_ifaddr 0 mcast_querier 0 mcast_hash_elasticity 4096
mcast_hash_max 4096 mcast_last_member_count 2 mcast_startup_query_count 2
mcast_last_member_interval 100 mcast_membership_interval 26000
mcast_querier_interval 25500 mcast_query_interval 12500
mcast_query_response_interval 1000 mcast_startup_query_interval 3125
mcast_stats_enabled 0 mcast_igmp_version 2 mcast_mld_version 1
nf_call_iptables 0 nf_call_ip6tables 0 nf_call_arptables 0 addrgenmode
eui64
16: bond42: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc
noqueue master br42 state UNKNOWN mode DEFAULT group default
    link/ether 22:8f:91:bb:9f:09 brd ff:ff:ff:ff:ff:ff promiscuity 1
    bond mode 802.3ad miimon 100 updelay 0 downdelay 0 use_carrier 1
arp_interval 0 arp_validate none arp_all_targets any primary_reselect
always fail_over_mac none xmit_hash_policy layer3+4 resend_igmp 1
num_grat_arp 1 all_slaves_active 0 min_links 1 lp_interval 1
packets_per_slave 1 lacp_rate fast ad_select stable ad_actor_sys_prio
65535 ad_user_port_key 0 ad_actor_system 00:00:00:00:00:00
    bridge_slave state forwarding priority 8 cost 100 hairpin off guard
off root_block off fastleave off learning on flood on port_id 0x8001
port_no 0x1 designated_port 32769 designated_cost 0 designated_bridge
8000.22:8f:91:bb:9f:9 designated_root 8000.22:8f:91:bb:9f:9 hold_timer
0.00 message_age_timer    0.00 forward_delay_timer    0.00
topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off
mcast_router 1 mcast_fast_leave off mcast_flood on neigh_suppress off
addrgenmode eui64

$ ip -details -json link show
[{
        "ifindex": 15,
        "ifname": "br42",
        "flags": ["BROADCAST","MULTICAST","UP","LOWER_UP"],
        "mtu": 1500,
        "qdisc": "noqueue",
        "operstate": "UP",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ether",
        "address": "22:8f:91:bb:9f:09",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "bridge",
            "info_data": {
                "forward_delay": 1500,
                "hello_time": 200,
                "max_age": 2000,
                "ageing_time": 30000,
                "stp_state": 0,
                "priority": 32768,
                "vlan_filtering": 0,
                "vlan_protocol": "802.1Q",
                "bridge_id": "8000.22:8f:91:bb:9f:9",
                "root_id": "8000.22:8f:91:bb:9f:9",
                "root_port": 0,
                "root_path_cost": 0,
                "topology_change": 0,
                "topology_change_detected": 0,
                "hello_timer": 0.00,
                "tcn_timer": 0.00,
                "topology_change_timer": 0.00,
                "gc_timer": 298.27,
                "vlan_default_pvid": 1,
                "vlan_stats_enabled": 0,
                "group_fwd_mask": "0",
                "group_addr": "01:80:c2:00:00:00",
                "mcast_snooping": 1,
                "mcast_router": 1,
                "mcast_query_use_ifaddr": 0,
                "mcast_querier": 0,
                "mcast_hash_elasticity": 4096,
                "mcast_hash_max": 4096,
                "mcast_last_member_cnt": 2,
                "mcast_startup_query_cnt": 2,
                "mcast_last_member_intvl": 100,
                "mcast_membership_intvl": 26000,
                "mcast_querier_intvl": 25500,
                "mcast_query_intvl": 12500,
                "mcast_query_response_intvl": 1000,
                "mcast_startup_query_intvl": 3125,
                "mcast_stats_enabled": 0,
                "mcast_igmp_version": 2,
                "mcast_mld_version": 1,
                "nf_call_iptables": 0,
                "nf_call_ip6tables": 0,
                "nf_call_arptables": 0
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 16,
        "ifname": "bond42",
        "flags": ["BROADCAST","MULTICAST","MASTER","UP","LOWER_UP"],
        "mtu": 1500,
        "qdisc": "noqueue",
        "master": "br42",
        "operstate": "UNKNOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ether",
        "address": "22:8f:91:bb:9f:09",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 1,
        "linkinfo": {
            "info_kind": "bond",
            "info_data": {
                "mode": "802.3ad",
                "miimon": 100,
                "updelay": 0,
                "downdelay": 0,
                "use_carrier": 1,
                "arp_interval": 0,
                "arp_validate": null,
                "arp_all_targets": "any",
                "primary_reselect": "always",
                "fail_over_mac": "none",
                "xmit_hash_policy": "layer3+4",
                "resend_igmp": 1,
                "num_peer_notif": 1,
                "all_slaves_active": 0,
                "min_links": 1,
                "lp_interval": 1,
                "packets_per_slave": 1,
                "ad_lacp_rate": "fast",
                "ad_select": "stable",
                "ad_actor_sys_prio": 65535,
                "ad_user_port_key": 0,
                "ad_actor_system": "00:00:00:00:00:00"
            },
            "info_slave_kind": "bridge",
            "info_slave_data": {
                "state": "forwarding",
                "priority": 8,
                "cost": 100,
                "hairpin": false,
                "guard": false,
                "root_block": false,
                "fastleave": false,
                "learning": true,
                "flood": true,
                "id": "0x8001",
                "no": "0x1",
                "designated_port": 32769,
                "designated_cost": 0,
                "bridge_id": "8000.22:8f:91:bb:9f:9",
                "root_id": "8000.22:8f:91:bb:9f:9",
                "hold_timer": 0.00,
                "message_age_timer": 0.00,
                "forward_delay_timer": 11.97,
                "topology_change_ack": 0,
                "config_pending": 0,
                "proxy_arp": false,
                "proxy_arp_wifi": false,
                "multicast_router": 1,
                "mcast_flood": true
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 16,
        "num_rx_queues": 16,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    }
]

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:40 -07:00
Julien Fortin 69ffd27325 ip: iplink_hsr.c: add json output support
Schema:
hsr: IFLA_INFO_DATA
{
    "slave1": {
        "type": "string",
        "attr": "IFLA_HSR_SLAVE1"
    },
    "slave2": {
        "type": "string",
        "attr": "IFLA_HSR_SLAVE2"
    },
    "seq_nr": {
        "type": "int",
        "attr": "IFLA_HSR_SEQ_NR"
    },
    "supervision_addr": {
        "type": "int",
        "attr": "IFLA_HSR_SUPERVISION_ADDR"
    }
}

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:40 -07:00
Julien Fortin 707cce5a63 ip: iplink_bond_slave.c: add json output support (info_slave_data)
Schema and live example:
bond_slave: IFLA_INFO_SLAVE_DATA
{
    "state": {
        "type": "string",
        "attr": "IFLA_BOND_SLAVE_STATE",
        "mutually_exclusive": {
            "state_index": {
                "type": "int",
                "comment": "if (state >= ARRAY_SIZE(slave_states))"
            }
        }
    },
    "mii_status": {
        "type": "string",
        "attr": "IFLA_BOND_SLAVE_MII_STATUS",
        "mutually_exclusive": {
            "mii_status_index": {
                "type": "int",
                "comment": "if (status >= ARRAY_SIZE(slave_mii_status))"
            }
        }
    },
    "link_failure_count": {
        "type": "int",
        "attr": "IFLA_BOND_SLAVE_LINK_FAILURE_COUNT"
    },
    "perm_hwaddr": {
        "type": "string",
        "attr": "IFLA_BOND_SLAVE_PERM_HWADDR"
    },
    "queue_id": {
        "type": "int",
        "attr": "IFLA_BOND_SLAVE_QUEUE_ID"
    },
    "ad_aggregator_id": {
        "type": "int",
        "attr": "IFLA_BOND_SLAVE_AD_AGGREGATOR_ID"
    },
    "ad_actor_oper_port_state": {
        "type": "int",
        "attr": "IFLA_BOND_SLAVE_AD_ACTOR_OPER_PORT_STATE"
    },
    "ad_partner_oper_port_state": {
        "type": "int",
        "attr": "IFLA_BOND_SLAVE_AD_PARTNER_OPER_PORT_STATE"
    }
}

$ ip link add dev bond42 type bond
$ ip link set dev swp5 master bond42
$ ip link set dev bond42 up
$ ip link set dev swp5 up
$ ip -details -json link show
[{
        "ifindex": 7,
        "ifname": "swp5",
        "flags": ["BROADCAST","MULTICAST","SLAVE","UP","LOWER_UP"],
        "mtu": 1500,
        "qdisc": "pfifo_fast",
        "master": "bond42",
        "operstate": "UP",
        "linkmode": "DEFAULT",
        "group": "default",
        "txqlen": 1000,
        "link_type": "ether",
        "address": "08:00:27:5c:03:c6",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 0,
        "linkinfo": {
            "info_slave_kind": "bond",
            "info_slave_data": {
                "state": "BACKUP",
                "mii_status": "UP",
                "link_failure_count": 0,
                "perm_hwaddr": "08:00:27:5c:03:c6",
                "queue_id": 0,
                "ad_aggregator_id": 1,
                "ad_actor_oper_port_state": 79,
                "ad_partner_oper_port_state": 1
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 14,
        "ifname": "bond42",
        "flags": ["NO-CARRIER","BROADCAST","MULTICAST","MASTER","UP"],
        "mtu": 1500,
        "qdisc": "noqueue",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ether",
        "address": "08:00:27:5c:03:c6",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "bond",
            "info_data": {
                "mode": "802.3ad",
                "miimon": 100,
                "updelay": 0,
                "downdelay": 0,
                "use_carrier": 1,
                "arp_interval": 0,
                "arp_validate": null,
                "arp_all_targets": "any",
                "primary_reselect": "always",
                "fail_over_mac": "none",
                "xmit_hash_policy": "layer3+4",
                "resend_igmp": 1,
                "num_peer_notif": 1,
                "all_slaves_active": 0,
                "min_links": 1,
                "lp_interval": 1,
                "packets_per_slave": 1,
                "ad_lacp_rate": "fast",
                "ad_select": "stable",
                "ad_info": {
                    "aggregator": 1,
                    "num_ports": 1,
                    "actor_key": 0,
                    "partner_key": 1,
                    "partner_mac": "00:00:00:00:00:00"
                },
                "ad_actor_sys_prio": 65535,
                "ad_user_port_key": 0,
                "ad_actor_system": "00:00:00:00:00:00"
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 16,
        "num_rx_queues": 16,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    }
]

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:40 -07:00
Julien Fortin 7ff60b090f ip: iplink_bond.c: add json output support
Schema and live example:
bond: IFLA_INFO_DATA
{
    "mode": {
        "type": "string",
        "attr": "IFLA_BOND_MODE"
    },
    "active_slave": {
        "type": "string",
        "attr": "IFLA_BOND_ACTIVE_SLAVE",
        "mutually_exclusive": {
            "active_slave_index": {
                "type": "int",
                "comment": "if active slave doesn't have a valid ifname"
            }
        }
    },
    "miimon": {
        "type": "uint",
        "attr": "IFLA_BOND_MIIMON"
    },
    "updelay": {
        "type": "uint",
        "attr": "IFLA_BOND_UPDELAY"
    },
    "downdelay": {
        "type": "uint",
        "attr": "IFLA_BOND_DOWNDELAY"
    },
    "use_carrier": {
        "type": "uint",
        "attr": "IFLA_BOND_USE_CARRIER"
    },
    "arp_interval": {
        "type": "uint",
        "attr": "IFLA_BOND_ARP_INTERVAL"
    },
    "arp_ip_target": {
        "type": "array",
        "attr": "IFLA_BOND_ARP_IP_TARGET",
        "array": [
            {
                "type": "string"
            }
        ]
    },
    "arp_validate": {
        "type": "string",
        "attr": "IFLA_BOND_ARP_VALIDATE"
    },
    "arp_all_targets": {
        "type": "string",
        "attr": "IFLA_BOND_ARP_ALL_TARGETS"
    },
    "primary": {
        "type": "string",
        "attr": "IFLA_BOND_PRIMARY",
        "mutually_exclusive": {
            "primary_index": {
                "type": "int",
                "comment": "if primary doesn't have a valid ifname"
            }
        }
    },
    "primary_reselect": {
        "type": "string",
        "attr": "IFLA_BOND_PRIMARY_RESELECT"
    },
    "fail_over_mac": {
        "type": "string",
        "attr": "IFLA_BOND_FAIL_OVER_MAC"
    },
    "xmit_hash_policy": {
        "type": "string",
        "attr": "IFLA_BOND_XMIT_HASH_POLICY"
    },
    "resend_igmp": {
        "type": "uint",
        "attr": "IFLA_BOND_RESEND_IGMP"
    },
    "num_peer_notif": {
        "type": "uint",
        "attr": "IFLA_BOND_NUM_PEER_NOTIF"
    },
    "all_slaves_active": {
        "type": "uint",
        "attr": "IFLA_BOND_ALL_SLAVES_ACTIVE"
    },
    "min_links": {
        "type": "uint",
        "attr": "IFLA_BOND_MIN_LINKS"
    },
    "lp_interval": {
        "type": "uint",
        "attr": "IFLA_BOND_LP_INTERVAL"
    },
    "packets_per_slave": {
        "type": "uint",
        "attr": "IFLA_BOND_PACKETS_PER_SLAVE"
    },
    "ad_lacp_rate": {
        "type": "string",
        "attr": "IFLA_BOND_AD_LACP_RATE"
    },
    "ad_select": {
        "type": "string",
        "attr": "IFLA_BOND_AD_SELECT"
    },
    "ad_info": {
        "type": "dict",
        "attr": "IFLA_BOND_AD_INFO",
        "dict": {
            "aggregator": {
                "type": "int",
                "attr": "IFLA_BOND_AD_INFO_AGGREGATOR"
            },
            "num_ports": {
                "type": "int",
                "attr": "IFLA_BOND_AD_INFO_NUM_PORTS"
            },
            "actor_key": {
                "type": "int",
                "attr": "IFLA_BOND_AD_INFO_ACTOR_KEY"
            },
            "partner_key": {
                "type": "int",
                "attr": "IFLA_BOND_AD_INFO_PARTNER_KEY"
            },
            "partner_mac": {
                "type": "string",
                "attr": "IFLA_BOND_AD_INFO_PARTNER_MAC"
            }
        }
    },
    "ad_actor_sys_prio": {
        "type": "uint",
        "attr": "IFLA_BOND_AD_ACTOR_SYS_PRIO"
    },
    "ad_user_port_key": {
        "type": "uint",
        "attr": "IFLA_BOND_AD_USER_PORT_KEY"
    },
    "ad_actor_system": {
        "type": "string",
        "attr": "IFLA_BOND_AD_ACTOR_SYSTEM"
    },
    "tlb_dynamic_lb": {
        "type": "uint",
        "attr": "IFLA_BOND_TLB_DYNAMIC_LB"
    }
}

$ ip link add dev bond42 type bond
$ ip link set dev swp5 master bond42
$ ip link set dev bond42 up
$ ip link set dev swp5 up
$ ip -details -json link show
[{
        "ifindex": 7,
        "ifname": "swp5",
        "flags": ["BROADCAST","MULTICAST","SLAVE","UP","LOWER_UP"],
        "mtu": 1500,
        "qdisc": "pfifo_fast",
        "master": "bond42",
        "operstate": "UP",
        "linkmode": "DEFAULT",
        "group": "default",
        "txqlen": 1000,
        "link_type": "ether",
        "address": "08:00:27:5c:03:c6",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 0,
        "linkinfo": {
            "info_slave_kind": "bond",
            "info_slave_data": {
                "state": "BACKUP",
                "mii_status": "UP",
                "link_failure_count": 0,
                "perm_hwaddr": "08:00:27:5c:03:c6",
                "queue_id": 0,
                "ad_aggregator_id": 1,
                "ad_actor_oper_port_state": 79,
                "ad_partner_oper_port_state": 1
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 14,
        "ifname": "bond42",
        "flags": ["NO-CARRIER","BROADCAST","MULTICAST","MASTER","UP"],
        "mtu": 1500,
        "qdisc": "noqueue",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ether",
        "address": "08:00:27:5c:03:c6",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "bond",
            "info_data": {
                "mode": "802.3ad",
                "miimon": 100,
                "updelay": 0,
                "downdelay": 0,
                "use_carrier": 1,
                "arp_interval": 0,
                "arp_validate": null,
                "arp_all_targets": "any",
                "primary_reselect": "always",
                "fail_over_mac": "none",
                "xmit_hash_policy": "layer3+4",
                "resend_igmp": 1,
                "num_peer_notif": 1,
                "all_slaves_active": 0,
                "min_links": 1,
                "lp_interval": 1,
                "packets_per_slave": 1,
                "ad_lacp_rate": "fast",
                "ad_select": "stable",
                "ad_info": {
                    "aggregator": 1,
                    "num_ports": 1,
                    "actor_key": 0,
                    "partner_key": 1,
                    "partner_mac": "00:00:00:00:00:00"
                },
                "ad_actor_sys_prio": 65535,
                "ad_user_port_key": 0,
                "ad_actor_system": "00:00:00:00:00:00"
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 16,
        "num_rx_queues": 16,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    }
]

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:40 -07:00
Julien Fortin e4a1216aeb ip: iplink.c: open/close json obj for ip -brief -json link show dev DEV
Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:40 -07:00
Julien Fortin d0e720111a ip: ipaddress.c: add support for json output
This patch converts all output (mostly fprintfs) to the new ip_print api
which handle both regular and json output.
Initialize a json_writer and open an array object if -json was specified.
Note that the JSON attribute naming follows the NETLINK_ATTRIBUTE naming.

In many places throughout the code, IP, matches integer values with
hardcoded strings tables, such as link mode, link operstate or link
family.
In JSON context, this will result in a named string field. In the
very unlikely event that the requested index is out of bound, IP
displays the raw integer value. For JSON context this result in
having a different integer field example bellow:

if (mode >= ARRAY_SIZE(link_modes))
	print_int(PRINT_ANY, "linkmode_index", "mode %d ", mode);
else
	print_string(PRINT_ANY, "linkmode", "mode %s ",
		     link_modes[mode]);

The "_index" suffix is open to discussion and it is something that I came
up with. The bottom line is that you can't have a string field that may
become an int field in specific cases. Programs written in strongly type
languages (like C) might break if they are expecting a string value and
got an integer instead. We don't want to confuse anybody or make the code
even more complicated handling these specifics cases.
Hence the extra "_index" field that is easy to check for and deal with.

JSON schema, followed by live example:

Live config used:
$ ip link add dev vxlan42 type vxlan id 42
$ ip link add dev bond0 type bond
$ ip link add name swp1.50 link swp1 type vlan id 50
$ ip link add dev br0 type bridge
$ ip link set dev vxlan42 master br0
$ ip link set dev bond0 master br0
$ ip link set dev swp1.50 master br0
$ ip link set dev br0 up

$ ip -d link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode
DEFAULT group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0
addrgenmode eui64
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 1000
    link/ether 08:00:27:db:31:88 brd ff:ff:ff:ff:ff:ff promiscuity 0
addrgenmode eui64
3: swp1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
DEFAULT group default qlen 1000
    link/ether 08:00:27:5b:b1:75 brd ff:ff:ff:ff:ff:ff promiscuity 0
addrgenmode eui64
10: vxlan42: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master br0 state
DOWN mode DEFAULT group default
    link/ether 4a:d9:91:42:a2:d2 brd ff:ff:ff:ff:ff:ff promiscuity 1
    vxlan id 42 srcport 0 0 dstport 8472 ageing 300
    bridge_slave state disabled priority 8 cost 100 hairpin off guard off
root_block off fastleave off learning on flood on port_id 0x8001 port_no
0x1 designated_port 32769 designated_cost 0 designated_bridge
8000.8:0:27:5b:b1:75 designated_root 8000.8:0:27:5b:b1:75 hold_timer
0.00 message_age_timer    0.00 forward_delay_timer    0.00
topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off
mcast_router 1 mcast_fast_leave off mcast_flood on neigh_suppress off
addrgenmode eui64
11: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop master br0
state DOWN mode DEFAULT group default
    link/ether e2:aa:7b:17:c5:14 brd ff:ff:ff:ff:ff:ff promiscuity 1
    bond mode 802.3ad miimon 100 updelay 0 downdelay 0 use_carrier 1
arp_interval 0 arp_validate none arp_all_targets any primary_reselect
always fail_over_mac none xmit_hash_policy layer3+4 resend_igmp 1
num_grat_arp 1 all_slaves_active 0 min_links 1 lp_interval 1
packets_per_slave 1 lacp_rate fast ad_select stable ad_actor_sys_prio
65535 ad_user_port_key 0 ad_actor_system 00:00:00:00:00:00
    bridge_slave state disabled priority 8 cost 100 hairpin off guard off
root_block off fastleave off learning on flood on port_id 0x8002 port_no
0x2 designated_port 32770 designated_cost 0 designated_bridge
8000.8:0:27:5b:b1:75 designated_root 8000.8:0:27:5b:b1:75 hold_timer
0.00 message_age_timer    0.00 forward_delay_timer    0.00
topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off
mcast_router 1 mcast_fast_leave off mcast_flood on neigh_suppress off
addrgenmode eui64
12: swp1.50@swp1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop master
br0 state DOWN mode DEFAULT group default
    link/ether 08:00:27:5b:b1:75 brd ff:ff:ff:ff:ff:ff promiscuity 1
    vlan protocol 802.1Q id 50 <REORDER_HDR>
    bridge_slave state disabled priority 8 cost 100 hairpin off guard off
root_block off fastleave off learning on flood on port_id 0x8003 port_no
0x3 designated_port 32771 designated_cost 0 designated_bridge
8000.8:0:27:5b:b1:75 designated_root 8000.8:0:27:5b:b1:75 hold_timer
0.00 message_age_timer    0.00 forward_delay_timer    0.00
topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off
mcast_router 1 mcast_fast_leave off mcast_flood on neigh_suppress off
addrgenmode eui64
13: br0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state
DOWN mode DEFAULT group default
    link/ether 08:00:27:5b:b1:75 brd ff:ff:ff:ff:ff:ff promiscuity 0
    bridge forward_delay 1500 hello_time 200 max_age 2000 ageing_time
30000 stp_state 0 priority 32768 vlan_filtering 0 vlan_protocol 802.1Q
bridge_id 8000.8:0:27:5b:b1:75 designated_root 8000.8:0:27:5b:b1:75
root_port 0 root_path_cost 0 topology_change 0 topology_change_detected 0
hello_timer    0.00 tcn_timer    0.00 topology_change_timer    0.00
gc_timer  244.44 vlan_default_pvid 1 vlan_stats_enabled 0 group_fwd_mask 0
group_address 01:80:c2:00:00:00 mcast_snooping 1 mcast_router 1
mcast_query_use_ifaddr 0 mcast_querier 0 mcast_hash_elasticity 4096
mcast_hash_max 4096 mcast_last_member_count 2 mcast_startup_query_count 2
mcast_last_member_interval 100 mcast_membership_interval 26000
mcast_querier_interval 25500 mcast_query_interval 12500
mcast_query_response_interval 1000 mcast_startup_query_interval 3125
mcast_stats_enabled 0 mcast_igmp_version 2 mcast_mld_version 1
nf_call_iptables 0 nf_call_ip6tables 0 nf_call_arptables 0 addrgenmode
eui64

// Schema for: ip -brief link show
[
    {
        "deleted": {
            "type": "bool",
            "attr": "RTM_DELLINK"
        },
        "link": {
            "type": "string",
            "attr": "IFLA_LINK"
        },
        "ifname": {
            "type": "string",
            "attr": "IFNAME"
        },
        "operstate": {
            "type": "string",
            "attr": "IFLA_OPERSTATE",
            "mutually_exclusive": {
                "operstate_index": {
                    "type": "uint",
                    "comment": "if state >= ARRAY_SIZE(oper_states)"
                }
            }
        },
        "address": {
            "type": "string",
            "attr": "IFLA_ADDRESS"
        },
        "flags": {
            "type": "array",
            "attr": "IFF_LOOPBACK, IFF_BROADCAST...IFF_*"
        },
        "addr_info": {
            "type": "array",
            "array": [
                {
                    "deleted": {
                        "type": "bool",
                        "attr": "RTM_DELADDR"
                    },
                    "family": {
                        "type": "string",
                        "attr": "ifa->ifa_family",
                        "mutually_exclusive": {
                            "family_index": {
                                "type": "uint",
                                "comment": "if family is not known"
                            }
                        }
                    },
                    "local": {
                        "type": "string",
                        "attr": "IFA_LOCAL"
                    },
                    "address": {
                        "type": "string",
                        "attr": "IFLA_LOCAL && IFA_ADDRESS"
                    },
                    "prefixlen": {
                        "type": "int",
                        "attr": "IFLA_LOCAL"
                    }
                }
            ]
        }
    }
]

$ ip -json -brief link show
[{
        "ifname": "lo",
        "operstate": "UNKNOWN",
        "address": "00:00:00:00:00:00",
        "flags": ["LOOPBACK","UP","LOWER_UP"]
    },{
        "ifname": "eth0",
        "operstate": "UP",
        "address": "08:00:27:db:31:88",
        "flags": ["BROADCAST","MULTICAST","UP","LOWER_UP"]
    },{
        "ifname": "swp1",
        "operstate": "DOWN",
        "address": "08:00:27:5b:b1:75",
        "flags": ["BROADCAST","MULTICAST"]
    },{
        "ifname": "vxlan42",
        "operstate": "DOWN",
        "address": "4a:d9:91:42:a2:d2",
        "flags": ["BROADCAST","MULTICAST"]
    },{
        "ifname": "bond0",
        "operstate": "DOWN",
        "address": "e2:aa:7b:17:c5:14",
        "flags": ["BROADCAST","MULTICAST","MASTER"]
    },{
        "link": "swp1",
        "ifname": "swp1.50",
        "operstate": "DOWN",
        "address": "08:00:27:5b:b1:75",
        "flags": ["BROADCAST","MULTICAST","M-DOWN"]
    },{
        "ifname": "br0",
        "operstate": "DOWN",
        "address": "08:00:27:5b:b1:75",
        "flags": ["NO-CARRIER","BROADCAST","MULTICAST","UP"]
    }
]

Schema for normal plus -details: ip -json -details link show

[
    {
        "deleted": {
            "type": "bool",
            "attr": "RTM_DELLINK"
        },
        "ifindex": {
            "type": "int"
        },
        "ifname": {
            "type": "string",
            "attr": "IFLA_IFNAME"
        },
        "link": {
            "type": "string",
            "attr": "IFLA_LINK",
            "mutually_exclusive": {
                "link_index": {
                    "type": "int",
                    "comment": "if IFLA_LINK_NETNSID exists"
                }
            }
        },
        "flags": {
            "type": "array",
            "attr": "IFF_LOOPBACK, IFF_BROADCAST...IFF_*"
        },
        "mtu": {
            "type": "int",
            "attr": "IFLA_MTU"
        },
        "xdp": {
            "type": "object",
            "attr": "IFLA_XDP",
            "object": {
                "mode": {
                    "type": "utin",
                    "attr": "IFLA_XDP_ATTACHED"
                },
                "prog_id": {
                    "type": "uint",
                    "attr": "IFLA_XDP_PROG_ID"
                }
            }
        },
        "qdisc": {
            "type": "string",
            "attr": "IFLA_QDISC"
        },
        "master": {
            "type": "string",
            "attr": "IFLA_MASTER"
        },
        "operstate": {
            "type": "string",
            "attr": "IFLA_OPERSTATE",
            "mutually_exclusive": {
                "operstate_index": {
                    "type": "uint",
                    "comment": "if state >= ARRAY_SIZE(oper_states)"
                }
            }
        },
        "linkmode": {
            "type": "string",
            "attr": "IFLA_LINKMODE",
            "mutually_exclusive": {
                "linkmode_index": {
                    "type": "uint",
                    "comment": "if mode >= ARRAY_SIZE(link_modes)"
                }
            }
        },
        "group": {
            "type": "string",
            "attr": "IFLA_GROUP"
        },
        "txqlen": {
            "type": "int",
            "attr": "IFLA_TXQLEN"
        },
        "event": {
            "type": "string",
            "attr": "IFLA_EVENT",
            "mutually_exclusive": {
                "event_index": {
                    "type": "uint",
                    "attr": "IFLA_OPERSTATE",
                    "comment": "if event >= ARRAY_SIZE(link_events)"
                }
            }
        },
        "link_type": {
            "type": "string",
            "attr": "ifi_type"
        },
        "address": {
            "type": "string",
            "attr": "IFLA_ADDRESS"
        },
        "link_pointtopoint": {
            "type": "bool",
            "attr": "IFF_POINTOPOINT"
        },
        "broadcast": {
            "type": "string",
            "attr": "IFLA_BROADCAST"
        },
        "link_netnsid": {
            "type": "int",
            "attr": "IFLA_LINK_NETNSID"
        },
        "proto_down": {
            "type": "bool",
            "attr": "IFLA_PROTO_DOWN"
        },

        //
        // if -details
        //

        "promiscuity": {
            "type": "uint",
            "attr": "IFLA_PROMISCUITY"
        },
        "linkinfo": {
            "type": "dict",
            "attr": "IFLA_LINKINFO",
            "dict": {
                "info_kind": {
                    "type": "string",
                    "attr": "IFLA_INFO_KIND"
                },
                "info_data": {
                    "type": "dict",
                    "attr": "IFLA_INFO_DATA",
                    "dict": {}
                },
                "info_xstats": {
                    "type": "dict",
                    "attr": "IFLA_INFO_XSTATS",
                    "dict": {}
                },
                "info_slave_data": {
                    "type": "dict",
                    "attr": "IFLA_INFO_SLAVE_DATA",
                    "dict": {}
                }
            }
        },
        "inet6_addr_gen_mode": {
            "type": "string",
            "attr": "IFLA_INET6_ADDR_GEN_MODE"
        },
        "num_tx_queues": {
            "type": "uint",
            "attr": "IFLA_NUM_TX_QUEUES"
        },
        "num_rx_queues": {
            "type": "uint",
            "attr": "IFLA_NUM_RX_QUEUES"
        },
        "gso_max_size": {
            "type": "uint",
            "attr": "IFLA_GSO_MAX_SIZE"
        },
        "gso_max_segs": {
            "type": "uint",
            "attr": "IFLA_GSO_MAX_SEGS"
        },
        "phys_port_name": {
            "type": "string",
            "attr": "IFLA_PHYS_PORT_NAME"
        },
        "phys_port_id": {
            "type": "string",
            "attr": "IFLA_PHYS_PORT_ID"
        },
        "phys_switch_id": {
            "type": "string",
            "attr": "IFLA_PHYS_SWITCH_ID"
        },
        "ifalias": {
            "type": "string",
            "attr": "IFLA_IFALIAS"
        },
        "stats": {
            "type": "dict",
            "attr": "IFLA_STATS",
            "dict": {
                "rx": {
                    "type": "dict",
                    "dict": {
                        "bytes": {
                            "type": "uint"
                        },
                        "packets": {
                            "type": "uint"
                        },
                        "errors": {
                            "type": "uint"
                        },
                        "dropped": {
                            "type": "uint"
                        },
                        "over_errors": {
                            "type": "uint"
                        },
                        "multicast": {
                            "type": "uint"
                        },
                        "compressed": {
                            "type": "uint"
                        },
                        "length_errors": {
                            "type": "uint"
                        },
                        "crc_errors": {
                            "type": "uint"
                        },
                        "frame_errors": {
                            "type": "uint"
                        },
                        "fifo_errors": {
                            "type": "uint"
                        },
                        "missed_errors": {
                            "type": "uint"
                        },
                        "nohandler": {
                            "type": "uint"
                        }
                    }
                },
                "tx": {
                    "type": "dict",
                    "dict": {
                        "bytes": {
                            "type": "uint"
                        },
                        "packets": {
                            "type": "uint"
                        },
                        "errors": {
                            "type": "uint"
                        },
                        "dropped": {
                            "type": "uint"
                        },
                        "carrier_errors": {
                            "type": "uint"
                        },
                        "collisions": {
                            "type": "uint"
                        },
                        "compressed": {
                            "type": "uint"
                        },
                        "aborted_errors": {
                            "type": "uint"
                        },
                        "fifo_errors": {
                            "type": "uint"
                        },
                        "window_errors": {
                            "type": "uint"
                        },
                        "heartbeat_errors": {
                            "type": "uint"
                        },
                        "carrier_changes": {
                            "type": "uint"
                        }
                    }
                }
            }
        },
        "stats64": {
            "type": "dict",
            "attr": "IFLA_STATS64",
            "dict": {
                "rx": {
                    "type": "dict",
                    "dict": {
                        "bytes": {
                            "type": "uint"
                        },
                        "packets": {
                            "type": "uint"
                        },
                        "errors": {
                            "type": "uint"
                        },
                        "dropped": {
                            "type": "uint"
                        },
                        "over_errors": {
                            "type": "uint"
                        },
                        "multicast": {
                            "type": "uint"
                        },
                        "compressed": {
                            "type": "uint"
                        },
                        "length_errors": {
                            "type": "uint"
                        },
                        "crc_errors": {
                            "type": "uint"
                        },
                        "frame_errors": {
                            "type": "uint"
                        },
                        "fifo_errors": {
                            "type": "uint"
                        },
                        "missed_errors": {
                            "type": "uint"
                        },
                        "nohandler": {
                            "type": "uint"
                        }
                    }
                },
                "tx": {
                    "type": "dict",
                    "dict": {
                        "bytes": {
                            "type": "uint"
                        },
                        "packets": {
                            "type": "uint"
                        },
                        "errors": {
                            "type": "uint"
                        },
                        "dropped": {
                            "type": "uint"
                        },
                        "carrier_errors": {
                            "type": "uint"
                        },
                        "collisions": {
                            "type": "uint"
                        },
                        "compressed": {
                            "type": "uint"
                        },
                        "aborted_errors": {
                            "type": "uint"
                        },
                        "fifo_errors": {
                            "type": "uint"
                        },
                        "window_errors": {
                            "type": "uint"
                        },
                        "heartbeat_errors": {
                            "type": "uint"
                        },
                        "carrier_changes": {
                            "type": "uint"
                        }
                    }
                }
            }
        },
        "vfinfo_list": {
            "type": "array",
            "attr": "IFLA_VFINFO_LIST",
            "array": [
                {
                    "vf": {
                        "type": "int"
                    },
                    "mac": {
                        "type": "string"
                    },
                    "vlan_list": {
                        "type": "array",
                        "attr": "IFLA_VF_VLAN_LIST",
                        "array": [
                            {
                                "vlan": {
                                    "type": "int"
                                },
                                "qos": {
                                    "type": "int"
                                },
                                "protocol": {
                                    "type": "string"
                                }
                            }
                        ]
                    },
                    "vlan": {
                        "type": "int",
                        "attr": "!IFLA_VF_VLAN_LIST && IFLA_VF_VLAN"
                    },
                    "qos": {
                        "type": "int",
                        "attr": "!IFLA_VF_VLAN_LIST && IFLA_VF_VLAN"
                    },
                    "tx_rate": {
                        "type": "int"
                    },
                    "rate": {
                        "type": "dict",
                        "attr": "IFLA_VF_RATE",
                        "dict": {
                            "max_tx": {
                                "type": "int"
                            },
                            "min_tx": {
                                "type": "int"
                            }
                        }
                    },
                    "spoofchk": {
                        "type": "bool",
                        "attr": "IFLA_VF_SPOOFCHK"
                    },
                    "link_state": {
                        "type": "string",
                        "attr": "IFLA_VF_LINK_STATE"
                    },
                    "trust": {
                        "type": "bool",
                        "attr": "IFLA_VF_TRUST"
                    },
                    "query_rss_en": {
                        "type": "bool",
                        "attr": "IFLA_VF_RSS_QUERY_EN"
                    },
                    "stats": {
                        "type": "dict",
                        "attr": "IFLA_VF_STATS",
                        "dict": {
                            "rx": {
                                "type": "dict",
                                "dict": {
                                    "bytes": {
                                        "type": "uint",
                                        "attr": "IFLA_VF_STATS_RX_BYTES"
                                    },
                                    "packets": {
                                        "type": "uint",
                                        "attr": "IFLA_VF_STATS_RX_PACKETS"
                                    },
                                    "multicast": {
                                        "type": "uint",
                                        "attr": "IFLA_VF_STATS_MULTICAST"
                                    },
                                    "broadcast": {
                                        "type": "uint",
                                        "attr": "IFLA_VF_STATS_BROADCAST"
                                    }
                                }
                            },
                            "tx": {
                                "type": "dict",
                                "dict": {
                                    "bytes": {
                                        "type": "uint",
                                        "attr": "IFLA_VF_STATS_TX_BYTES"
                                    },
                                    "packets": {
                                        "type": "uint",
                                        "attr": "IFLA_VF_STATS_TX_PACKETS"
                                    }
                                }
                            }
                        }
                    }
                }
            ]
        }
    }
]

Example with the config previously given:
Note that here, linkinfo attributes are not populated.
The schemas are provided in each link type patches.

$ ip -details -json link show
[{
        "ifindex": 1,
        "ifname": "lo",
        "flags": ["LOOPBACK","UP","LOWER_UP"],
        "mtu": 65536,
        "qdisc": "noqueue",
        "operstate": "UNKNOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "loopback",
        "address": "00:00:00:00:00:00",
        "broadcast": "00:00:00:00:00:00",
        "promiscuity": 0,
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 2,
        "ifname": "eth0",
        "flags": ["BROADCAST","MULTICAST","UP","LOWER_UP"],
        "mtu": 1500,
        "qdisc": "pfifo_fast",
        "operstate": "UP",
        "linkmode": "DEFAULT",
        "group": "default",
        "txqlen": 1000,
        "link_type": "ether",
        "address": "08:00:27:db:31:88",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 0,
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 3,
        "ifname": "swp1",
        "flags": ["BROADCAST","MULTICAST"],
        "mtu": 1500,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "txqlen": 1000,
        "link_type": "ether",
        "address": "08:00:27:5b:b1:75",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 0,
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 10,
        "ifname": "vxlan42",
        "flags": ["BROADCAST","MULTICAST"],
        "mtu": 1500,
        "qdisc": "noop",
        "master": "br0",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ether",
        "address": "4a:d9:91:42:a2:d2",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 1,
        "linkinfo": {
            "info_kind": "vxlan",
            "info_data": {},
            "info_slave_kind": "bridge",
            "info_slave_data": {}
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 11,
        "ifname": "bond0",
        "flags": ["BROADCAST","MULTICAST","MASTER"],
        "mtu": 1500,
        "qdisc": "noop",
        "master": "br0",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ether",
        "address": "e2:aa:7b:17:c5:14",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 1,
        "linkinfo": {
            "info_kind": "bond",
            "info_data": {},
            "info_slave_kind": "bridge",
            "info_slave_data": {},
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 16,
        "num_rx_queues": 16,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 12,
        "ifname": "swp1.50",
        "link": "swp1",
        "flags": ["BROADCAST","MULTICAST","M-DOWN"],
        "mtu": 1500,
        "qdisc": "noop",
        "master": "br0",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ether",
        "address": "08:00:27:5b:b1:75",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 1,
        "linkinfo": {
            "info_kind": "vlan",
            "info_data": {},
            "info_slave_kind": "bridge",
            "info_slave_data": {},
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 13,
        "ifname": "br0",
        "flags": ["NO-CARRIER","BROADCAST","MULTICAST","UP"],
        "mtu": 1500,
        "qdisc": "noqueue",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ether",
        "address": "08:00:27:5b:b1:75",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "bridge",
            "info_data": {},
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    }
]

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:40 -07:00
Julien Fortin 6377572f0a ip: ip_print: add new API to print JSON or regular format output
To avoid code duplication and have a ligther impact on most of the files,
these functions were made to handle both stdout (FP context) or JSON
output. Using this api, the changes are easier to read and the code
stays as compact as possible.

includes json_writer.h in ip_common.h to make the lib/json_writer.c
functions available to the new "ip_print" api.

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:40 -07:00
Julien Fortin 7252f16b2d json_writer: add new json handlers (null, float with format, lluint, hu)
Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:40 -07:00
Julien Fortin 5df6077259 ip: add new command line argument -json (mutually exclusive with -color)
Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:40 -07:00
Julien Fortin 959f142863 color: add new COLOR_NONE and disable_color function
Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:40 -07:00
David Lebrun 0439990238 man: add documentation for seg6local lwt
This patch adds documentation in the ip-route man page
about the seg6local lightweight tunnel.

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
2017-08-15 16:44:23 -07:00
David Lebrun 8db158b9ca iproute: add support for SRv6 local segment processing
This patch adds support for the seg6local lightweight tunnel
("ip route add ... encap seg6local ...").

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
2017-08-15 16:44:23 -07:00
David Lebrun 00e76d4da3 iproute: add helper functions for SRH processing
This patch adds two helper functions to print and parse
Segment Routing Headers.

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
2017-08-15 16:44:23 -07:00
Stephen Hemminger b7f7c1b817 include: add pfkeyv2.h drop ipv6.h
pfkeyv2.h is include by ipsec.
linux/ipv6.h is not included by any code in current tree.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-15 16:43:16 -07:00
Stephen Hemminger e0495b84ab seg6: add include/linux/seg6_local.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-15 16:35:30 -07:00
Stephen Hemminger 3af3d358a3 more BPF headers update
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-10 16:42:35 -07:00
Stephen Hemminger 16ab6c47ba Merge branch 'master' into net-next 2017-08-10 16:41:59 -07:00
Daniel Borkmann 8cc360fe48 bpf: unbreak libelf linkage for bpf obj loader
Commit 69fed534a5 ("change how Config is used in Makefile's") moved
HAVE_MNL specific CFLAGS/LDLIBS for building with libmnl out of the
top level Makefile into sub-Makefiles. However, it also removed the
HAVE_ELF specific CFLAGS/LDLIBS entirely, which breaks the BPF object
loader for tc and ip with "No ELF library support compiled in." despite
having libelf detected in configure script. Fix it similarly as in
69fed534a5 for HAVE_ELF.

Fixes: 69fed534a5 ("change how Config is used in Makefile's")
Reported-by: Jeffrey Panneman <jeffrey.panneman@tno.nl>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-08-10 16:40:02 -07:00
Stephen Hemminger 5926276a2a Merge branch 'master' into net-next 2017-08-09 09:15:30 -07:00
David Ahern fb6cb30774 lib: Dump ext-ack string by default
In time, errfn can be implemented for link, route, etc commands to
give a much more detailed response (e.g., point to the attribute
that failed). Doing so is much more complicated to process the
message and convert attribute ids to names.

In any case the error string returned by the kernel should be dumped
to the user, so make that happen now.

Signed-off-by: David Ahern <dsahern@gmail.com>
2017-08-09 09:14:01 -07:00
Stephen Hemminger 7ef36c8cea Merge branch 'master' into net-next 2017-08-09 09:11:48 -07:00
Stephen Hemminger fcfcc40b7d vti: print keys in hex not dotted notation
The ikey and okey value are normal u32 values. The input accepts
them in dotted, hex or decimal form. For output, hex seems like
the best form since they are not really addresses.

Suggested-by: Christian Langrock <christian.langrock@secunet.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-09 09:11:02 -07:00
Stephen Hemminger 69fed534a5 change how Config is used in Makefile's
The recent LIBMNL changes was made more difficult to debug because
of how Config is handle in clean make. The Config file is generated
by top level make, but since it is not recursive, the values generated
would not be visible on a clean make.

The change is to not include Config in top level make, and move
all the conditionals down into sub makefiles. Not ideal, but beter
than going full autoconf route. Or forcing separate configure
step.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-09 09:10:52 -07:00
Stephen Hemminger e9155685b7 Merge branch 'master' into net-next 2017-08-09 08:41:34 -07:00
Stephen Hemminger 2a80154fde vti6: fix local/remote any addr handling
According to the IPv4 behavior of 'ip' it should be possible
to omit the arguments for local and remote address.
Without this patch omitting these parameters would lead to
uninitialized memory being interpreted as IPv6 addresses.

Reported-by: Christian Langrock <christian.langrock@secunet.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-09 08:39:27 -07:00
Stephen Hemminger 6ff66acc60 tc, ip: more Makefile updates for LIBMNL
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-09 08:38:51 -07:00
Stephen Hemminger 96421f92ef include: update headers from net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-09 08:35:26 -07:00
Stephen Hemminger ee66d60ee8 Merge branch 'master' into net-next 2017-08-09 08:34:38 -07:00
Alexander Alemayhu 4a340fe9b4 examples/bpf: update list of examples
Remove deleted examples and add the new map in map example.

Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-08-09 08:34:13 -07:00
Stephen Hemminger 089f85694a lib: need to pass LIBMNL flag
Missed on earlier conversion.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-09 08:33:31 -07:00
Stephen Hemminger 9d08319d08 Merge branch 'master' into net-next 2017-08-07 12:29:19 -07:00
Stephen Hemminger 7d23fa5591 lib: fix extended ack with and without libmnl
The code was always building without libmnl support, so it was
doing nothing.

Fixes: b6432e68ac ("iproute: Add support for extended ack to rtnl_talk")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-07 12:01:49 -07:00
Jamal Hadi Salim 5c8176ddbc actions: update the man page to describe the "since" time filter
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2017-08-04 13:16:51 -07:00
Jamal Hadi Salim 9e71352581 tc actions: Improved batching and time filtered dumping
dump more than TCA_ACT_MAX_PRIO actions per batch when the kernel
supports it.

Introduced keyword "since" for time based filtering of actions.
Some example (we have 400 actions bound to 400 filters); at
installation time. Using updated when tc setting the time of
interest to 120 seconds earlier (we see 400 actions):
prompt$ hackedtc actions ls action gact since 120000| grep index | wc -l
400

go get some coffee and wait for > 120 seconds and try again:

prompt$ hackedtc actions ls action gact since 120000 | grep index | wc -l
0

Lets see a filter bound to one of these actions:
....
filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10  (rule hit 2 success 1)
  match 7f000002/ffffffff at 12 (success 1 )
    action order 1: gact action pass
     random type none pass val 0
     index 23 ref 2 bind 1 installed 1145 sec used 802 sec
    Action statistics:
    Sent 84 bytes 1 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0
...

that coffee took long, no? It was good.

Now lets ping -c 1 127.0.0.2, then run the actions again:
prompt$ hackedtc actions ls action gact since 120 | grep index | wc -l
1

More details please:
prompt$ hackedtc -s actions ls action gact since 120000

    action order 0: gact action pass
     random type none pass val 0
     index 23 ref 2 bind 1 installed 1270 sec used 30 sec
    Action statistics:
    Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

And the filter?
filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10  (rule hit 4 success 2)
  match 7f000002/ffffffff at 12 (success 2 )
    action order 1: gact action pass
     random type none pass val 0
     index 23 ref 2 bind 1 installed 1324 sec used 84 sec
    Action statistics:
    Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2017-08-04 13:16:51 -07:00
Stephen Hemminger c936d85e19 Merge branch 'master' into net-next 2017-08-04 13:16:47 -07:00
Casey Callendrello d6a4076b6b netns: make /var/run/netns bind-mount recursive
When ip netns {add|delete} is first run, it bind-mounts /var/run/netns
on top of itself, then marks it as shared. However, if there are already
bind-mounts in the directory from other tools, these would not be
propagated. Fix this by recursively bind-mounting.

Signed-off-by: Casey Callendrello <casey.callendrello@coreos.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
2017-08-04 12:08:52 -07:00
Stephen Hemminger 0562af4a07 Merge branch 'master' into net-next 2017-08-04 12:05:31 -07:00
Stephen Hemminger aba9c23a6e ss: enclose IPv6 address in brackets
Based on patch by Lehner Florian <dev@der-flo.net>

Adds support for RFC2732 IPv6 address format with brackets.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-04 12:04:04 -07:00
Stephen Hemminger 566421c8f7 Merge branch 'master' into net-next 2017-08-04 09:54:44 -07:00
Stephen Hemminger b6432e68ac iproute: Add support for extended ack to rtnl_talk
Add support for extended ack error reporting via libmnl.
Add a new function rtnl_talk_extack that takes a callback as an input
arg. If a netlink response contains extack attributes, the callback is
is invoked with the the err string, offset in the message and a pointer
to the message returned by the kernel.

If iproute2 is built without libmnl, it will still work but
extended error reports from kernel will not be available.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-04 09:54:00 -07:00
Ido Schimmel 4b3409d863 iproute: Display offload indication per-nexthop
Since kernel commit 475abbf1ef67 ("ipv4: fib: Set offload indication
according to nexthop flags") offload indication is reported on a
per-nexthop basis.

Adjust iproute2 to display it.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
2017-08-03 16:13:16 -07:00
Stephen Hemminger 72e4ea5eb6 update headers from 4.13 net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-03 16:12:40 -07:00
Stephen Hemminger 89bcb455a1 Merge branch 'master' into net-next 2017-08-03 16:11:22 -07:00
Stephen Hemminger 620fc6696d tc: fix m_simple usage
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-03 16:10:18 -07:00
Phil Sutter e2a055dd23 tc-simple: Fix documentation
- CONTROL has to come last, otherwise 'index' applies to gact and not
  simple itself.
- Man page wasn't updated to reflect syntax changes.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-03 16:02:44 -07:00
Phil Sutter 34705c807a Really fix get_addr() and get_prefix() error messages
Both functions take the desired address family as a parameter. So using
that to notify the user what address family was expected is correct,
unlike using dst->family which will tell the user only what address
family was specified.

The situation which commit 334af76143 tried to fix was when 'ip'
would accept addresses from multiple families. In that case, the family
parameter is set to AF_UNSPEC so that get_addr_1() may accept any valid
address.

This patch introduces a wrapper around family_name() which returns the
string "any valid" for AF_UNSPEC instead of the three question marks
unsuitable for use in error messages.

Tests for AF_UNSPEC:

| # ip a a 256.10.166.1/24 dev d0
| Error: any valid prefix is expected rather than "256.10.166.1/24".

| # ip neighbor add proxy 2001:db8::g dev d0
| Error: any valid address is expected rather than "2001:db8::g".

Tests for explicit address family:

| # ip -6 addrlabel add prefix 1.1.1.1/24 label 123
| Error: inet6 prefix is expected rather than "1.1.1.1/24".

| # ip -4 addrlabel add prefix dead:beef::1/24 label 123
| Error: inet prefix is expected rather than "dead:beef::1/24".

Reported-by: Jaroslav Aster <jaster@redhat.com>
Fixes: 334af76143 ("fix get_addr() and get_prefix() error messages")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-03 16:01:03 -07:00
Stephen Hemminger cc21ebe843 update headers from 4.13-rc4
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-03 15:57:26 -07:00
Phil Sutter 3da3ebfca8 bpf: Make bytecode-file reading a little more robust
bpf_parse_string() will now correctly handle:

- Extraneous whitespace,
- OPs on multiple lines and
- overlong file names.

The added feature of allowing to have OPs on multiple lines (like e.g.
tcpdump prints them) is rather a side effect of fixing detection of
malformed bytecode files having random content on a second line, like
e.g.:

| 4,40 0 0 12,21 0 1 2048,6 0 0 262144,6 0 0 0
| foobar

Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-08-03 15:56:48 -07:00
Stephen Hemminger f73ac674d0 ip: change flag names to an array
For the most of the address flags, use a table of values rather
than open coding every value.  This allows for easier inevitable
expansion of flags.

This also fixes the missing stable-privacy flag.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-01 08:37:53 -07:00
Stephen Hemminger c369dc803b Update headers from net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-07-31 20:55:27 -07:00
Stephen Hemminger 1f215a308b Merge branch 'master' into net-next 2017-07-31 17:05:09 -07:00
Hangbin Liu 5ce897a03b utils: return default family when rtm_family is not RTNL_FAMILY_IPMR/IP6MR
When we get a multicast route, the rtm_type is RTN_MULTICAST, but the
rtm_family may be AF_INET. If we only check the type with RTNL_FAMILY_IPMR,
we will get malformed address. e.g.

+ ip -4 route add multicast 172.111.1.1 dev em1 table main

Before fix:
+ ip route list type multicast table main
multicast ac6f:101:800:400:400:0:3c00:0 dev em1 scope link

After fix:
+ ip route list type multicast table main
multicast 172.111.1.1 dev em1 scope link

Fixes: 56e3eb4c34 ("ip: route: fix multicast route dumps")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Phil Sutter <phil@nwl.cc>
2017-07-27 11:27:17 -07:00
Matteo Croce d3f0b09197 netns: more input validation
ip netns accepts invalid input as namespace name like an empty string or a
string longer than the maximum file name length.
Check that the netns name is not empty and less than or equal to NAME_MAX.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
2017-07-27 11:25:20 -07:00
Girish Moodalbail c2a85c3bcd geneve: support for modifying geneve device
Ability to change geneve device attributes was added to kernel through
commit 5b861f6baa3a ("geneve: add rtnl changelink support"), however one
cannot do the same through ip-link(8) command.  Changing the allowed
geneve device attributes using 'ip link set <geneve_name> type geneve id
<geneve_id> <allowed_attributes>' currently fails with 'operation not
supported' error.  This patch adds support for it.

Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
2017-07-27 11:22:50 -07:00
Stephen Hemminger 08f5fdb201 Merge branch 'master' into net-next 2017-07-25 11:59:13 -07:00
Daniel Borkmann 95ae9a4870 bpf: fix mnt path when from env
When bpf fs mount path is from env, behavior is currently broken as
we continue to search in default paths, thus fix this up.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-07-25 11:43:28 -07:00
Daniel Borkmann ecb05c0f99 bpf: improve error reporting around tail calls
Currently, it's still quite hard to figure out if a prog passed the
verifier, but later gets rejected due to different tail call ownership.
Figure out whether that is the case and provide appropriate error
messages to the user.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-07-25 11:43:28 -07:00
Élie Bouttier 2f406f2d0b ip route: replace exits with returns
This patch replaces exits with returns in ip route
commands.

Allows to continue when invoked with ip -batch.

Signed-off-by: Élie Bouttier <elie@bouttier.eu>
2017-07-25 11:37:49 -07:00
Philip Prindeville adbb296594 iproute2: add support for GRE ignore-df knob
In the presence of firewalls which improperly block ICMP Unreachable
(including Fragmentation Required) messages, Path MTU Discovery is
prevented from working.

The workaround is to handle IPv4 payloads opaquely, ignoring the DF
bit.

Kernel commit 22a59be8b7693eb2d0897a9638f5991f2f8e4ddd ("net: ipv4:
Add ability to have GRE ignore DF bit in IPv4 payloads") is
complemented by this user-space changeset which exposes control of
this setting.

Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Philip Prindeville <philipp@redfish-solutions.com>
2017-07-20 17:25:54 -07:00
Matteo Croce 79928fd055 netns: avoid directory traversal
ip netns keeps track of created namespaces with bind mounts named
/var/run/netns/<namespace>. No input sanitization is done, allowing creation and
deletion of files relatives to /var/run/netns or, if the path is non existent or
invalid, allows to create "untracked" namespaces (invisible to the tool).

This commit denies creation or deletion of namespaces with names contaning
"/" or matching exactly "." or "..".

Signed-off-by: Matteo Croce <mcroce@redhat.com>
2017-07-20 17:23:52 -07:00
Nikhil Gajendrakumar 44e0f6f3cd bridge: this patch adds json support for bridge mdb show
This patch adds json output to bridge mdb show

Normal Output:
$ bridge -d -s mdb show
dev br0 port swp3 grp 239.0.0.1 temp  vid 128 172.26
dev br0 port swp3 grp 239.0.0.1 temp  vid 64 172.26
dev br0 port swp2 grp 239.0.0.2 temp  vid 1024 172.26
dev br0 port swp2 grp 239.0.0.2 temp  vid 256 172.26
dev br0 port swp2 grp 239.0.0.2 temp  vid 1 172.26
dev br0 port swp3 grp 239.0.0.1 temp  vid 1 172.26
router ports on br0: swp4    0.00 permanent
router ports on br0: swp5    0.00 permanent

Json Output:
$ bridge -d -s -j mdb show
{
    "mdb": [{
            "dev": "br0",
            "port": "swp3",
            "grp": "239.0.0.1",
            "state": "temp",
            "vid": 128,
            "timer": " 166.74"
        },{
            "dev": "br0",
            "port": "swp3",
            "grp": "239.0.0.1",
            "state": "temp",
            "vid": 64,
            "timer": " 166.74"
        },{
            "dev": "br0",
            "port": "swp2",
            "grp": "239.0.0.2",
            "state": "temp",
            "vid": 1024,
            "timer": " 166.74"
        },{
            "dev": "br0",
            "port": "swp2",
            "grp": "239.0.0.2",
            "state": "temp",
            "vid": 256,
            "timer": " 166.74"
        },{
            "dev": "br0",
            "port": "swp2",
            "grp": "239.0.0.2",
            "state": "temp",
            "vid": 1,
            "timer": " 166.74"
        },{
            "dev": "br0",
            "port": "swp3",
            "grp": "239.0.0.1",
            "state": "temp",
            "vid": 1,
            "timer": " 166.74"
        }
    ],
    "router": {
        "br0": [{
                "port": "swp4",
                "timer": "   0.00",
                "type": "permanent"
            },{
                "port": "swp5",
                "timer": "   0.00",
                "type": "permanent"
            }
        ]
    }
}

Signed-off-by: Nikhil Gajendrakumar <nikhil@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2017-07-18 17:32:38 -07:00
Stephen Hemminger edadd6b076 Merge branch 'master' into net-next 2017-07-18 17:31:09 -07:00
Matteo Croce b09515553f tc: fix typo in manpage
Fix a typo in the 'tc' manpage and reword some sentences.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
2017-07-18 17:25:59 -07:00
Daniel Borkmann 779525cd77 bpf: dump id/jited info for cls/act programs
Make use of TCA_BPF_ID/TCA_ACT_BPF_ID that we exposed and print the ID
of the programs loaded and use the new BPF_OBJ_GET_INFO_BY_FD command
for dumping further information about the program, currently whether
the attached program is jited.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-07-18 17:20:45 -07:00
Daniel Borkmann 612ff099a1 bpf: support loading map in map from obj
Add support for map in map in the loader and add a small example program.
The outer map uses inner_id to reference a bpf_elf_map with a given ID
as the inner type. Loading maps is done in three passes, i) all non-map
in map maps are loaded, ii) all map in map maps are loaded based on the
inner_id map spec of a non-map in map with corresponding id, and iii)
related inner maps are attached to the map in map with given inner_idx
key. Pinned objetcs are assumed to be managed externally, so they are
only retrieved from BPF fs.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-07-18 17:20:45 -07:00
Daniel Borkmann 23b2ed2d64 bpf: remove obsolete samples
Remove old samples that have been added in pre BPF fs days which were
using file descriptor passing. It's long obsolete and not encouraged
to use this method given BPF fs is the default way like in the other
samples.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-07-18 17:20:45 -07:00
Roopa Prabhu 2e86ed542d iproute: extend route get for mpls routes
This patch extends route get to support mpls specific
route attributes like RTA_NEWDST.

Input:
RTA_DST - input label
RTA_NEWDST - labels in packet for multipath selection

By default the getroute handler returns matched
nexthop label, via and oif

With fibmatch keyword (RTM_F_FIB_MATCH flag), full matched
route is returned.

example:
$ip -f mpls route show
101
        nexthop as to 102/103 via inet 172.16.2.2 dev virt1-2
        nexthop as to 302/303 via inet 172.16.12.2 dev virt1-12
201
        nexthop as to 202/203 via inet6 2001:db8:2::2 dev virt1-2
        nexthop as to 402/403 via inet6 2001:db8:12::2 dev virt1-12

$ip -f mpls route get 103
RTNETLINK answers: Network is unreachable

$ip -f mpls route get 101
101 as to 102/103 via inet 172.16.2.2 dev virt1-2

$ip -f mpls route get as to 302/303 101
101 as to 302/303 via inet 172.16.12.2 dev virt1-12

$ip -f mpls route get fibmatch 103
RTNETLINK answers: Network is unreachable

$ip -f mpls route get fibmatch 101
101
        nexthop as to 102/103 via inet 172.16.2.2 dev virt1-2
        nexthop as to 302/303 via inet 172.16.12.2 dev virt1-12

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2017-07-18 17:17:27 -07:00
Stephen Hemminger 89ec74a3ea remove duplicated #include's
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-07-18 17:17:15 -07:00
Stephen Hemminger 517771e271 update headers to 4.13-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-07-18 17:16:56 -07:00
Stephen Hemminger f0b9b79572 update kernel headers from net-next
Just as net-next merge window opens.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-07-17 18:32:03 -07:00
Stephen Hemminger ef513fb04e Merge branch 'master' into net-next 2017-07-05 09:12:16 -07:00
Stephen Hemminger cdb90ce406 v4.12.0 2017-07-05 09:07:31 -07:00
Stephen Hemminger 79e7918a2a Merge branch 'master' into net-next 2017-07-05 09:07:30 -07:00
Krister Johansen 288c28bc11 iptunnel: add support for mpls/ip to ipip tunnels
Original-Author: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
2017-07-05 09:04:59 -07:00
Krister Johansen f005b700cf iptunnel: add support for mpls/ip to sit tunnels
Original-Author: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
2017-07-05 09:04:59 -07:00
Krister Johansen 7baca946c4 iptunnel: document mode parameter for sit tunnels
Original-Author: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
2017-07-05 09:04:58 -07:00
Lucas Bates 2ce280de9f Add new man page for tc actions.
This page is to highlight all operations and options that are
applicable to all tc actions.

Signed-off-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2017-07-05 09:00:37 -07:00
Roman Mashak 81ba3e6fbd tc: updated ife man page.
Explain when skbmark encoding may fail.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-06-30 15:27:07 -07:00
Jakub Kicinski 1b5e809466 bpf: allow requesting XDP HW offload
Let XDP link set command request that the program be offloaded.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-06-27 16:13:55 -07:00
Jakub Kicinski 1468381415 bpf: add xdpdrv for requesting XDP driver mode
Allow user to select XDP DRV_MODE flag by using xdpdrv keyword
instead of xdp or xdpgeneric.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-06-27 16:13:55 -07:00
Jakub Kicinski 2de3379701 bpf: print xdp offloaded mode
Add interpretation of XDP_ATTACHED_HW mode on dump.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-06-27 16:13:55 -07:00
Martin KaFai Lau 0b4ea60b5a bpf: Add support for IFLA_XDP_PROG_ID
This patch adds support to the newly added IFLA_XDP_PROG_ID.

./ip link show dev eth0
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric/id:2 qdisc [...]

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-06-27 16:13:55 -07:00
Stephen Hemminger 35a004dc8a update kernel headers from net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-06-27 16:11:12 -07:00
Stephen Hemminger 1fd8a8e23d Merge branch 'master' into net-next 2017-06-27 16:10:55 -07:00
Daniel Borkmann c9c3720d14 bpf: indicate lderr when bpf_apply_relo_data fails
When LLVM wrongly generates a rodata relo entry (llvm BZ #33599),
then just bail out instead of probing for prog w/o reloc, which
will fail in this case anyway.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-06-27 16:08:52 -07:00
Lukas Braun 3288e9b426 man: ip-route.8: Mention that lower metric means higher priority
This is quite counter-intuitive when using the 'preference' keyword.

Signed-off-by: Lukas Braun <koomi@moshbit.net>
2017-06-27 16:07:28 -07:00
Phil Sutter f2ca4a7a6f man: Collect names of man pages automatically
As it turned out, forgetting to add a man page to the respective
Makefile when introducing it is a common mistake. Overcome this once and
for all by using $(wildcard) function in Makefiles.

Fixes: 7124942942 ("genl: add manpage")
Fixes: 958cd21094 ("ifcfg: add manpage")
Fixes: e1b7f883e5 ("man: add documentation for IPv6 SR commands")
Fixes: 1949f82cdf ("Introduce ip vrf command")
Fixes: 535194a172 ("tipc: add peer remove functionality")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-06-27 16:00:09 -07:00
Roman Mashak 7cca407e28 tc: updated tc-u32 man page to reflect skip_sw and skip_hw parameters.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-06-21 08:34:29 -07:00
Roman Mashak fb12cea8d9 tc: fixed typo in usage text.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-06-21 08:34:28 -07:00
Jiri Benc 59eb271d1d tc: m_tunnel_key: add csum/nocsum option
Allows control of UDP zero checksum.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2017-06-16 09:11:42 -07:00
Jiri Benc 50907a8245 tc: m_tunnel_key: reformat the usage text
Adding new tunnel key fields would cause the usage line overflow 80 chars.
Make the usage text similar to other commands.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2017-06-16 09:11:42 -07:00
Jiri Pirko c794b7b179 tc: don't print error message on miss when parsing action with default
In case default control action parsing takes place, it is ok to miss.
So don't print error message.

Fixes: e67aba5595 ("tc: actions: add helpers to parse and print control actions")
Reported-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Jiri Benc <jbenc@redhat.com>
2017-06-16 09:07:31 -07:00
Stephen Hemminger 39f3776b50 update headers to get TCA_TUNNEL_CSUM
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-06-16 09:06:47 -07:00
Stephen Hemminger 236211a763 Merge branch 'master' into net-next 2017-06-16 09:05:53 -07:00
David Lebrun e4319590f7 iproute: fix compilation issue with older glibc
If a header that includes linux/in6.h is included before
iproute's utils.h, then iproute2 fails to compile on older
glibc versions.

Fixes: e8493916a8 ("iproute: add support for SR-IPv6 lwtunnel encapsulation")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
2017-06-16 09:03:48 -07:00
Hangbin Liu ad0a6a2c63 ip neigh: allow flush FAILED neighbour entry
After upstream commit 5071034e4af7 ('neigh: Really delete an arp/neigh entry
on "ip neigh delete" or "arp -d"'), we could delete a single FAILED neighbour
entry now. But `ip neigh flush` still skip the FAILED entry.

Move the filter after first round flush so we can flush FAILED entry on fixed
kernel and also do not keep retrying on old kernel.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2017-06-16 09:01:02 -07:00
Stephen Hemminger be8b93e3e2 Merge branch 'master' into net-next 2017-06-15 08:32:53 -07:00
Donald Sharp 3dc98cf2f5 ip: mroute: Add table output to show command
When the user specifies `table all` or `table 0` to
the `ip mroute show` command we dump the entirety of
the known mroute tables.  Without some sort of
divisor to tell us what table we are looking at
the command is useless.

Add `Table: <vrf name>` to the output of 'ip mroute show table 0'

Follow the convention established by 'ip route show table 0'
for when to display

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2017-06-15 08:29:30 -07:00
Nicolas Dichtel a11b7b71a6 link_gre6: really support encaplimit option
This option is documented in gre6 help, but was not supported.

Fixes: af89576d7a ("iproute2: GRE over IPv6 tunnel support.")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2017-06-15 08:29:30 -07:00
Stephen Hemminger a9ae195a21 xfrm: get #define's from linux includes
Use linux/ipsec.h and linux/in.h to get the definition of IP related
protocols.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-06-14 10:25:39 -07:00
Jakub Sitnicki 7b201d6019 iproute: Remove useless check for nexthop keyword when setting RTA_OIF
When modifying a route we set the RTA_OIF attribute only if a device was
specified with "dev" or "oif" keyword. But for some unknown reason we
earlier alternatively check also for the presence of "nexthop" keyword,
even though it has no effect. So remove the pointless check.

Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
2017-06-14 09:56:05 -07:00
Stephen Hemminger b68581d43e more bpf header updates
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-06-14 09:52:44 -07:00
Arkadi Sharshevsky 8a38e44fad bridge: Distinguish between externally learned vs offloaded FDBs
Distinguish between externally learned vs offloaded FDBs. This is done
in order to indicate that FDBs added by software was successfully
offloaded.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-06-14 09:50:25 -07:00
Jiri Pirko d5ebd6fdde tc: add support for TRAP action
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-06-08 11:03:12 -07:00
Jiri Pirko 18f05d0601 tc: gact: fix control action parsing
parse_action_control helper does advancing of the arg inside. So don't
do it outside.

Fixes: e67aba5595 ("tc: actions: add helpers to parse and print control actions")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-06-08 11:03:12 -07:00
Or Gerlitz 6ea2c2b1cf tc: flower: add support for matching on ip tos and ttl
Allow users to set flower classifier filter rules which
include matches for ip tos and ttl.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
2017-06-08 10:59:53 -07:00
Stephen Hemminger 410556ad99 update headers from net-next (bpf and tc)
More BPF and tc_action values.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-06-08 10:56:14 -07:00
Vlad Yasevich 735a52ceda ip: Add IFLA_EVENT output to ip monitor
Add IFLA_EVENT output so that event types can be viewed with
'monitor' command.  This gives a little more information for why
a given message was received.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
2017-06-05 12:38:19 -07:00
Roopa Prabhu aa883d86c0 ip: extend route get to return matching fib route
Uses newly introduced RTM_GETROUTE flag RTM_F_FIB_MATCH
to return a matching fib route. Introduces 'fibmatch'
keyword to ip route get.

ipv4:
----
$ip route show
default via 192.168.0.2 dev eth0
10.0.14.0/24
        nexthop via 172.16.0.3  dev dummy0 weight 1
        nexthop via 172.16.1.3  dev dummy1 weight 1

$ip route get 10.0.14.2
10.0.14.2 via 172.16.1.3 dev dummy1  src 172.16.1.1
    cache

$ip route get fibmatch 10.0.14.2
10.0.14.0/24
        nexthop via 172.16.0.3  dev dummy0 weight 1
        nexthop via 172.16.1.3  dev dummy1 weight 1

ipv6:
----
$ip -6 route show
2001:db9:100::/120  metric 1024
        nexthop via 2001:db8:2::2  dev dummy0 weight 1
        nexthop via 2001:db8:12::2  dev dummy1 weight 1

$ip -6 route get 2001:db9:100::1
2001:db9:100::1 from :: via 2001:db8:12::2 dev dummy1  \
                src 2001:db8:12::1  metric 1024  pref medium

$ip -6 route get fibmatch 2001:db9:100::1
2001:db9:100::/120  metric 1024
        nexthop via 2001:db8:12::2  dev dummy1 weight 1
        nexthop via 2001:db8:2::2  dev dummy0 weight 1

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: David Ahern <dsahern@gmail.com>
2017-06-05 12:33:50 -07:00
Stephen Hemminger d9bcafb4fe updated headers from net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-06-05 12:31:52 -07:00
Stephen Hemminger a5445c56e1 Merge branch 'master' into net-next 2017-06-05 12:31:19 -07:00
Eli Cohen 5a3ec4ba64 iplink: Update usage in help message
Add to usage message a description of how to configure Infiniband node
and port GUIDs. Also modify the man page to emphasize the GUIDs are
configured for Infiniband VFs.

Fixes: d91fb3f4c7 ("Add support for configuring Infiniband GUIDs")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
2017-06-05 12:29:36 -07:00
Oliver Hartkopp efe459c76d ip: link add vxcan support
Since commit a8f820a380a2a06 ('can: add Virtual CAN Tunnel driver (vxcan)')
for Linux 4.12 a virtual CAN tunnel driver analogue to veth is available in
Linux.

This patch adds the ability to create vxcan device pairs.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
2017-06-05 12:27:32 -07:00
Stephen Hemminger 309d5c2f83 Merge branch 'master' into net-next 2017-05-30 17:55:17 -07:00
David Ahern 1dddb60503 ip vrf: Add show command
Add show command to list all configured VRF and their table ids.

Signed-off-by: David Ahern <dsahern@gmail.com>
2017-05-30 17:54:03 -07:00
David Ahern 63891c7013 ip address: Change print_linkinfo_brief to take filter as an input
Change print_linkinfo_brief to take the filter as an input arg.
If the arg is NULL, use the global filter in ipaddress.c.

Signed-off-by: David Ahern <dsahern@gmail.com>
2017-05-30 17:54:03 -07:00
David Ahern 741dd5cd9c ip address: Move filter struct to ip_common.h
Move filter struct to ip_common.h as struct link_filter.

Signed-off-by: David Ahern <dsahern@gmail.com>
2017-05-30 17:54:03 -07:00
David Ahern 4ad875944f ip address: Export ip_linkaddr_list
ipaddr_list_flush_or_save generates a list of nlmsg's for links and
optionally for addresses. Move the code into ip_linkaddr_list and
export it along with the supporting infrastructure.

API to use this function is:
        struct nlmsg_chain linfo = { NULL, NULL};
        struct nlmsg_chain ainfo = { NULL, NULL};

        ip_linkaddr_list(family, filter_req, &linfo, &ainfo);

        ... error checking and code looping over linfo/ainfo ...

        free_nlmsg_chain(&linfo);
        free_nlmsg_chain(&ainfo);

Signed-off-by: David Ahern <dsahern@gmail.com>
2017-05-30 17:54:03 -07:00
Stephen Hemminger 6ef590ed88 Merge branch 'master' into net-next 2017-05-30 17:50:47 -07:00
Daniel Borkmann 218560185d bpf: dump error to the user when retrieving pinned prog fails
I noticed we currently don't dump an error message when a pinned
program couldn't be retrieved, thus add a hint to the user.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-05-30 17:49:09 -07:00
Daniel Borkmann 077bb1803c bpf: update printing of generic xdp mode
Follow-up to d67b9cd28c1d ("xdp: refine xdp api with regards to
generic xdp") in order to update the XDP dumping part.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-05-30 17:49:09 -07:00
Jiri Pirko 0c30d14d0a tc: flower: add support for tcp flags
Allow user to insert a flower classifier filter rule which includes
match for tcp flags.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-05-30 17:41:32 -07:00
Stephen Hemminger 2ecb169280 Merge branch 'master' into net-next 2017-05-30 17:40:57 -07:00
Remigiusz Kołłątaj 759fa6086e ip: add handling for new CAN netlink interface
This patch adds handling for new CAN netlink interface introduced in
4.11 kernel:
- IFLA_CAN_TERMINATION,
- IFLA_CAN_TERMINATION_CONST,
- IFLA_CAN_BITRATE_CONST,
- IFLA_CAN_DATA_BITRATE_CONST

Output example:
$ip -d link show can0
6: can0: <NOARP,ECHO> mtu 16 qdisc noop state DOWN mode DEFAULT group default qlen 10
    link/can  promiscuity 0
    can state STOPPED (berr-counter tx 0 rx 0) restart-ms 0
          bitrate 80000
             [   20000,    33333,    50000,    80000,    83333,   100000,
                125000,   150000,   175000,   200000,   225000,   250000,
                275000,   300000,   500000,   625000,   800000,  1000000 ]
          termination 0 [ 0, 120 ]
          clock 0numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

Signed-off-by: Remigiusz Kołłątaj <remigiusz.kollataj@mobica.com>
2017-05-30 17:39:33 -07:00
Phil Sutter f6fc1055e4 tc: m_xt: Prevent a segfault in libipt
This happens with NAT targets, such as SNAT, DNAT and MASQUERADE. These
are still not usable with this patch, but at least tc doesn't crash
anymore when one tries to use them.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-05-30 17:38:19 -07:00
Roi Dayan d315b706e9 devlink: Add option to set and show eswitch encapsulation support
This is an e-switch global knob to enable HW support for applying
encapsulation/decapsulation to VF traffic as part of SRIOV e-switch offloading.

The actual encap/decap is carried out (along with the matching and other
actions) per offloaded e-switch rules, e.g as done when offloading the TC tunnel
key action.

Possible values are enable/disable.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
2017-05-30 17:36:52 -07:00
David Ahern 05a14fc121 netlink: Change rtnl_dump_done to always show error
The original code which became rtnl_dump_done only shows netlink errors
if the protocol is NETLINK_SOCK_DIAG, but netlink dumps always appends
the length which contains any error encountered during the dump. Update
rtnl_dump_done to always show the error if there is one.

As an *example* without this patch, dumping a route object that exceeds
the internal buffer size terminates with no message to the user -- the
dump just ends because the NLMSG_DONE attribute was received. With this
patch the user at least gets a message that the dump was aborted.

$ ip ro ls
default via 10.0.2.2 dev eth0
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
10.10.0.0/16 dev veth1 proto kernel scope link src 10.10.0.1
172.16.1.0/24 dev br0.11 proto kernel scope link src 172.16.1.1
Error: Buffer too small for object
Dump terminated

The point of this patch is to notify the user of a failure versus
silently exiting on a partial dump. Because the NLMSG_DONE attribute
was received, the entire dump needs to be restarted to use a larger
buffer for EMSGSIZE errors. That could be done automatically but it
has other user impacts (e.g., duplicate output if the dump is
restarted) and should be the subject of a different patch.

Signed-off-by: David Ahern <dsahern@gmail.com>
2017-05-30 17:32:38 -07:00
Baruch Siach 98447086f8 ip: include libc headers first
Including libc headers first helps as a workaround to redefinition of struct
ethhdr with a suitably patched musl libc that suppresses the kernel
if_ether.h.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
2017-05-30 17:27:58 -07:00
Stephen Hemminger 8612ca2f13 update headers to get IFLA_EVENT
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-05-30 10:14:01 -07:00
Stephen Hemminger 0071f3d058 update headers to get changes for TCA_FLOWER
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-05-26 17:12:41 -07:00
Stephen Hemminger d4473c0257 update to current net-next headers 2017-05-26 17:11:02 -07:00
Roman Mashak cba134ae70 tc: fix Makefile to build skbmod
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-05-22 13:33:51 -07:00
Jiri Pirko d19f72f789 tc/actions: introduce support for goto chain action
Allow user to set control action "goto" with filter chain index as
a parameter.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-05-22 13:31:51 -07:00
Jiri Pirko e67aba5595 tc: actions: add helpers to parse and print control actions
Each tc action is terminated by a control action. Each action parses and
prints then intividually. Introduce set of helpers and allow to share
this code.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-05-22 13:31:51 -07:00
Jiri Pirko 732f03461b tc_filter: add support for chain index
Allow user to put filter to a specific chain identified by index.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-05-22 13:31:51 -07:00
Stephen Hemminger cda81a4ea5 include: remove no longer used iptables_common.h
Reported-by: Baruch Siach <baruch@tkos.co.il>

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-05-22 13:22:22 -07:00
Khem Raj ae717baf15 tc: include stdint.h explicitly for UINT16_MAX
Fixes
| tc_core.c:190:29: error: 'UINT16_MAX' undeclared (first use in this function); did you mean '__INT16_MAX__'?
|    if ((sz >> s->size_log) > UINT16_MAX) {
|                              ^~~~~~~~~~

Signed-off-by: Khem Raj <raj.khem@gmail.com>
2017-05-22 11:41:53 -07:00
Stephen Hemminger a2325adf0f update headers from 4.12-rc2
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-05-22 11:06:29 -07:00
David Ahern 4af4471606 ip: add support for more MPLS labels
Kernel now supports up to 30 labels but not defined as part of the uapi.
iproute2 handles up to 8 labels but in a non-consistent way. Update ip
to handle more labels, but in a more programmatic way.

For the MPLS address family, the data field in inet_prefix is used for
labels.  Increase that field to 64 u32's -- 64 as nothing more than a
convenient power of 2 number.

Update mpls_pton to take the length of the address field, convert that
length to number of labels and add better error handling to the parsing
of the user supplied string.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
2017-05-22 11:03:02 -07:00
Amir Vadai f3e1b2448a pedit: Introduce ipv6 support
Add support for modifying IPv6 headers using pedit.

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-15 15:05:20 -07:00
Amir Vadai a13426fe1a pedit: Check for extended capability in protocol parser
Do not allow using eth and udp header types if non-extended pedit kABI
is being used. Other protocol parsers already have this check.

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-15 15:05:20 -07:00
Amir Vadai cdca191862 pedit: Do not allow using retain for too big fields
Using retain for fields longer than 32 bits is not supported.
Do not allow user to do it.

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-15 15:05:20 -07:00
Amir Vadai 290cdc058d pedit: Fix a typo in warning
'ex' attribute should be placed after 'action pedit' and not after
'munge'.

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-15 15:05:20 -07:00
Girish Moodalbail 01c659955a vxlan: Add support for modifying vxlan device attributes
Ability to change vxlan device attributes was added to kernel through
commit 8bcdc4f3a20b ("vxlan: add changelink support"), however one
cannot do the same through ip(8) command.  Changing the allowed vxlan
device attributes using 'ip link set dev <vxlan_name> type vxlan
<allowed_attributes>' currently fails with 'operation not supported'
error.  This failure is due to the incorrect rtnetlink message
construction for the 'ip link set' operation.

The vxlan_parse_opt() callback function is called for parsing options
for both 'ip link add' and 'ip link set'. For the 'add' case, we pass
down default values for those attributes that were not provided as CLI
options. However, for the 'set' case we should be only passing down the
explicitly provided attributes and not any other (default) attributes.

Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
2017-05-11 11:11:08 -07:00
David Ahern aac40403ea ip: mpls: fix printing of mpls labels
If the kernel returns more labels than iproute2 expects, none of
the labels are printed and (null) is shown instead:
    $ ip -f mpls ro ls
    101 as to (null) via inet 172.16.2.2 dev virt12
    201 as to 202/203 via inet6 2001:db8:2::2 dev virt12

Remove the use of MPLS_MAX_LABELS and rely on buffer length that is
passed to mpls_ntop. With this change ip can print the label stack
returned by the kernel up to 255 characters (limit is due to size of
buf passed in) which amounts to 31 labels with a separator.

With this change the above is:
    $ ip/ip -f mpls ro ls
    101 as to 102/103/104/105/106/107/108/109/110 via inet 172.16.2.2 dev virt12

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
2017-05-11 11:08:02 -07:00
Alexander Alemayhu 5be9971c73 tc: bpf: add ppc64 and sparc64 to list of archs with eBPF support
sparc64 support was added in 7a12b5031c6b (sparc64: Add eBPF JIT., 2017-04-17)[0]
and ppc64 in 156d0e290e96 (powerpc/ebpf/jit: Implement JIT compiler for extended BPF, 2016-06-22)[1].

[0]: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=7a12b5031c6b
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=156d0e290e96
Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-05-08 23:05:35 -07:00
Or Gerlitz e57285b81a tc: Reflect HW offload status
Currently there is no way of querying whether a filter is
offloaded to HW or not when using "both" policy (where none
of skip_sw or skip_hw flags are set by user-space).

Add two new flags, "in hw" and "not in hw" such that user
space can determine if a filter is actually offloaded to
hw or not. The "in hw" UAPI semantics was chosen so it's
similar to the "skip hw" flag logic.

If none of these two flags are set, this signals running
over older kernel.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
2017-05-05 09:49:25 -07:00
Stephen Hemminger 76557951f5 update kernel headers during 4.12 merge window
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-05-05 09:48:54 -07:00
Arkadi Sharshevsky 153c1a9b21 devlink: Add support for pipeline debug (dpipe)
Add support for pipeline debug (dpipe). The headers are used both the
gain visibillity into the headers supported by the hardware, and to
build the headers/field database which is used by other commands.

Examples:

First we can see the headers supported by the hardware:

$devlink dpipe header show pci/0000:03:00.0

pci/0000:03:00.0:
  name mlxsw_meta
  field:
    name erif_port bitwidth 32 mapping_type ifindex
    name l3_forward bitwidth 1
    name l3_drop bitwidth 1

Note that mapping_type is presented only if relevant. Also the header/
field id's are reported by the kernel they are not shown by default.
They can be observed by using the -v option. Also the headers scope
(global/local) is specified.

$devlink -v dpipe header show pci/0000:03:00.0

pci/0000:03:00.0:
  name mlxsw_meta id 0 global false
  field:
    name erif_port id 0 bitwidth 32 mapping_type ifindex
    name l3_forward id 1 bitwidth 1
    name l3_drop id 2 bitwidth 1

Second we can examine the tables supported by the hardware. In order
to dump all the tables no table name should be provided:
$devlink dpipe table show pci/0000:03:00.0

In order to examine specific table its name have to be specified:
$devlink dpipe table show pci/0000:03:00.0 name erif

pci/0000:03:00.0:
  name mlxsw_erif size 800 counters_enabled true
  match:
    type field_exact header mlxsw_meta field erif_port mapping ifindex
  action:
    type field_modify header mlxsw_meta field l3_forward
    type field_modify header mlxsw_meta field l3_drop

To enable/disable counters on the table:
$devlink dpipe table set pci/0000:03:00.0 name erif counters enable
$devlink dpipe table set pci/0000:03:00.0 name erif counters disable

In order to see the current entries in the hardware for specific table:
$devlink dpipe table dump pci/0000:03:00.0 name erif

pci/0000:03:00.0:
  index 0 counter 0
  match_value:
    type field_exact header mlxsw_meta field erif_port mapping ifindex mapping_value 383 value 0
  action_value:
    type field_modify header mlxsw_meta field l3_forward value 1

  index 1 counter 0
  match_value:
    type field_exact header mlxsw_meta field erif_port mapping ifindex mapping_value 381 value 1
  action_value:
    type field_modify header mlxsw_meta field l3_forward value 1

In the above example the table contains two entries which does match
on erif port and forwards the packet or drop it (currently only the
forward count is implemented). The counter values are provided for
example. In case the counting is not enabled on the table the counters
will not be available.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-05-03 09:29:43 -07:00
Arkadi Sharshevsky 4f10cede93 devlink: Change netlink attribute validation
Currently the netlink attribute resolving is done by a sequence of
if's. Change the attribute resolving to table lookup.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
2017-05-03 09:29:42 -07:00
Phil Sutter 6a78ef97b6 man: ip.8: Document -brief flag
Brief output is especially useful for new users, so at least mention
it's existence in ip man page.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-05-03 09:28:40 -07:00
Stephen Hemminger 5bffc12ef4 Merge branch 'net-next' 2017-05-03 09:28:10 -07:00
Stephen Hemminger cbc7929b21 v4.11.0 2017-05-01 09:32:25 -07:00
Boris Pismenny cfd2e727f0 ip xfrm: Add xfrm state crypto offload
syntax:
ip xfrm state .... offload dev <if-name> dir <in or out>

Example to add inbound offload:
  ip xfrm state .... offload dev mlx0 dir in
Example to add outbound offload:
  ip xfrm state .... offload dev mlx0 dir out

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
2017-05-01 09:30:25 -07:00
Daniel Borkmann a872b870a5 bpf: add support for generic xdp
Follow-up to commit c7272ca720 ("bpf: add initial support for
attaching xdp progs") to also support generic XDP. This adds an
indicator for loaded generic XDP programs when programs are loaded
as shown in c7272ca720, but the driver still lacks native XDP
support.

  # ip link
  [...]
  3: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric qdisc [...]
      link/ether 0c:c4:7a:03:f9:25 brd ff:ff:ff:ff:ff:ff
  [...]

In case the driver does support native XDP, but the user wants
to load the program as generic XDP (e.g. for testing purposes),
then this can be done with the same semantics as in c7272ca720,
but with 'xdpgeneric' instead of 'xdp' command for loading:

  # ip -force link set dev eno1 xdpgeneric obj xdp.o

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David S. Miller <davem@davemloft.net>
2017-05-01 09:28:19 -07:00
Stephen Hemminger 7ff1fce549 update headers to 4.11 net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-05-01 09:27:46 -07:00
Stephen Hemminger d2b9100a08 Merge branch 'master' into net-next 2017-05-01 09:26:51 -07:00
Stephen Hemminger 1e600da057 pedit: fix whitespace
Add newlines to break long lines.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-05-01 09:25:22 -07:00
Or Gerlitz 3d2a7781ec tc/pedit: p_udp: introduce pedit udp support
For example, forward udp traffic destined to port 999 to veth0 and set
tcp port to 888:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
    flower \
      ip_proto udp \
      dst_port 999 \
    action pedit ex munge \
      udp dport set 888 \
    action mirred egress \
      redirect dev veth0

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Amir Vadai 2c6eb12ab8 tc/pedit: p_tcp: introduce pedit tcp support
For example, forward tcp traffic destined to port 80 to veth0 and set
tcp port to 8080:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
    flower \
      ip_proto tcp \
      dst_port 80 \
    action pedit ex munge \
      tcp dport set 8080 \
    action mirred egress \
      redirect dev veth0

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Amir Vadai 3cd5149ecd tc/pedit: p_eth: ETH header editor
For example, forward tcp traffic to veth0 and set
destination mac address to 11:22:33:44:55:66 :
$ tc filter add dev enp0s9 protocol ip parent ffff: \
    flower \
      ip_proto tcp \
    action pedit ex munge \
      eth dst set 11:22:33:44:55:66 \
    action mirred egress \
      redirect dev veth0

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Amir Vadai fa4652ff3b tc/pedit: Support fields bigger than 32 bits
Make parse_val() accept fields up to 128 bits long, this should be
enough for current use cases and involves a minimal change to code.

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Amir Vadai 8d193d9607 tc/pedit: p_ip: introduce editing ttl header
Enable user to edit IP header ttl field.

For example, to forward any TCP packet and decrease its TTL by one:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
    flower \
      ip_proto tcp \
    action pedit ex munge \
      ip ttl add 0xff pipe \
    action mirred egress \
      redirect dev veth0

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Amir Vadai c05ddaf9e0 tc/pedit: Introduce 'add' operation
This command could be useful to increase/decrease fields value.

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Amir Vadai 7c71a40cbd tc/pedit: Extend pedit to specify offset relative to mac/transport headers
Utilize the extended pedit netlink to set an offset relative to a
specific header type. Old netlink only enabled the user to set
approximated  offset relative to the IPv4 header.

To use this extended functionality need to use the 'ex' keyword after
'pedit' and before any 'munge'.
e.g:
$ tc filter add dev ens9 protocol ip parent ffff: \
    flower \
      ip_proto udp \
      dst_port 80 \
    action pedit ex munge \
      ip dst set 1.1.1.1 \
      pipe \
    action mirred egress redirect dev veth0

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Amir Vadai 51536ebbe8 tc/pedit: Fix a typo in pedit usage message
Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Stephen Hemminger bb6ab47b16 iplink: whitespace cleanup
Break lines to conform to 80 col guideline.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-05-01 09:13:09 -07:00
Zhang Shengju 432b92a702 iplink: add support for IFLA_CARRIER attribute
Add support to set IFLA_CARRIER attribute.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
2017-05-01 09:06:54 -07:00
Michal Kubeček 6ec14f1abb routel: fix infinite loop in line parser
As noticed by one of the few users of routel script, it ends up in an
infinite loop when they pull out the cable from the NIC used for some
route. This is caused by its parser expecting the line of "ip route show"
output consists of "key value" pairs (except for the initial target range),
together with an old trap of Bourne style shells that "shift 2" does
nothing if there is only one argument left. Some keywords, e.g. "linkdown",
are not followed by a value.

Improve the parser to

  (1) only set variables for keywords we care about
  (2) recognize (currently) known keywords without value

This is still far from perfect (and certainly not future proof) but to
fully fix the script, one would probably have to rewrite the logic
completely (and I'm not sure it's worth the effort).

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
2017-04-27 16:42:29 -07:00
Phil Sutter 843fc90068 man: ip-rule.8: Further clarify how to interpret priority value
Despite the past changes, users seemed to get confused by the seemingly
contradictory relation of priority value and actual rule priority.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-04-24 11:43:09 -07:00
Craig Gallek ad4b1425c3 iplink: Expose IFLA_*_FWMARK attributes for supported link types
This attribute allows the administrator to adjust the packet marking
attribute of tunnels that support policy based routing.

Signed-off-by: Craig Gallek <kraig@google.com>
2017-04-23 09:14:46 -07:00
Stephen Hemminger 590dde3a98 Merge branch 'master' into net-next 2017-04-23 09:14:35 -07:00
Craig Gallek 35893864c8 gre6: fix copy/paste bugs in GREv6 attribute manipulation
Fixes: af89576d7a8c("iproute2: GRE over IPv6 tunnel support.")
Signed-off-by: Craig Gallek <kraig@google.com>
2017-04-23 09:13:07 -07:00
Jamal Hadi Salim fd8b3d2c1b actions: Add support for user cookies
Make use of 128b user cookies

Introduce optional 128-bit action cookie.
Like all other cookie schemes in the networking world (eg in protocols
like http or existing kernel fib protocol field, etc) the idea is to
save user state that when retrieved serves as a correlator. The kernel
_should not_ intepret it. The user can store whatever they wish in the
128 bits.

Sample exercise(showing variable length use of cookie)

.. create an accept action with cookie a1b2c3d4
sudo $TC actions add action ok index 1 cookie a1b2c3d4

.. dump all gact actions..
sudo $TC -s actions ls action gact

    action order 0: gact action pass
     random type none pass val 0
     index 1 ref 1 bind 0 installed 5 sec used 5 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0
    cookie a1b2c3d4

.. bind the accept action to a filter..
sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1

... send some traffic..
$ ping 127.0.0.1 -c 3
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2017-04-23 09:10:02 -07:00
Stephen Hemminger 0e3cdd9ce0 remove unused header file sysctl.h
Not referred to in current source tree.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-04-21 17:47:30 -07:00
Stephen Hemminger 5b0aa88737 update kernel headers from net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-04-21 17:41:33 -07:00
David Lebrun e1b7f883e5 man: add documentation for IPv6 SR commands
This patch adds information about seg6 encapsulation in the ip-route
manual, as well as the ip-sr manual page.

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
2017-04-16 10:21:43 -07:00
David Lebrun e8493916a8 iproute: add support for SR-IPv6 lwtunnel encapsulation
This patch adds support for SEG6 encapsulation type
("ip route add ... encap seg6 ...").

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
2017-04-16 10:21:43 -07:00
David Lebrun 9386332823 ip: add ip sr command to control SR-IPv6 internal structures
This patch adds commands to support the tunnel source properties
("ip sr tunsrc") and the HMAC key -> secret, algorithm binding
("ip sr hmac").

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
2017-04-16 10:21:43 -07:00
Stephen Hemminger 85dd6ab510 add seg6.h kernel headers
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-04-16 10:21:34 -07:00
Stephen Hemminger 2c6a0636e2 Update kernel headers from 4.11 net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-04-16 10:19:32 -07:00
David Ahern 0da8250be8 ip vrf: Add command name next to pid
'ip vrf pids' is used to list processes bound to a vrf, but it only
shows the pid leaving a lot of work for the user. Add the command
name to the output. With this patch you get the more user friendly:

    $ ip vrf pids mgmt
     1121  ntpd
     1418  gdm-session-wor
     1488  gnome-session
     1491  dbus-launch
     1492  dbus-daemon
     1565  sshd
     ...

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-04-16 10:19:32 -07:00
David Ahern f443565f8d ip vrf: Add command name next to pid
'ip vrf pids' is used to list processes bound to a vrf, but it only
shows the pid leaving a lot of work for the user. Add the command
name to the output. With this patch you get the more user friendly:

    $ ip vrf pids mgmt
     1121  ntpd
     1418  gdm-session-wor
     1488  gnome-session
     1491  dbus-launch
     1492  dbus-daemon
     1565  sshd
     ...

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-04-16 10:06:33 -07:00
David Ahern c6858ef431 ip netconf: show all families on dev request
Currently specifying a device to ip netconf and it dumps only values
for IPv4. Change this to dump data for all families unless a specific
family is given.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-04-14 16:00:15 -07:00
David Ahern f052f5dfe0 ip netconf: Show all address families by default in dumps
Currently, 'ip netconf' only shows ipv4 and ipv6 netconf settings. If IPv6
is not enabled, the dump ends with
    RTNETLINK answers: Operation not supported

when IPv6 request is attempted. Further, if the mpls_router module is also
loaded a separate request is needed to get MPLS settings.

To make this better going forward, use the new PF_UNSPEC dump all option
if the kernel supports it. If the kernel does not, it sets NLMSG_ERROR and
returns EOPNOTSUPP which is trapped and we fall back to the existing output
to maintain compatibility with existing kernels.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-04-14 16:00:15 -07:00
David Ahern 3ad6d17638 netlink: Add flag to suppress print of nlmsg error
Allow callers of the dump API to handle nlmsg errors (e.g., an
unsupported feature). Setting RTNL_HANDLE_F_SUPPRESS_NLERR in the
rtnl_handle avoids unnecessary messages to the users in some case.
For example,

  RTNETLINK answers: Operation not supported

when probing for support of a new feature.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-04-14 16:00:15 -07:00
Stephen Hemminger dfb60ddd29 Merge branch 'master' into net-next 2017-04-14 15:59:12 -07:00
Stephen Hemminger 2d3af1675d netem: fix out of bounds access in maketable
The maketable program used to generate one of the configuration
files at build time for netem would access past the end of the array
for one input value. This is a bug inherited from original NISTnet.
Just fold the value, like other code there.

This is not a runtime error security problem.
It only impacts the build process if the build machine
had extra hardening enabled.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-04-12 10:13:14 -07:00
Robert Shearman 9688cf3b7a iproute: Add support for MPLS LWT ttl attribute
Add support for setting and displaying the ttl attribute
for MPLS IP lighweight tunnels.

Signed-off-by: Robert Shearman <rshearma@brocade.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
2017-04-12 10:02:15 -07:00
Robert Shearman c44d18ea96 iproute: Add support for ttl-propagation attribute
Add support for setting and displaying the ttl-propagation attribute
initially used by MPLS to control propagation of MPLS TTL to IPv4/IPv6
TTL/hop-limit on popping final label on a per-route basis.

Signed-off-by: Robert Shearman <rshearma@brocade.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
2017-04-12 10:02:15 -07:00
Stephen Hemminger 19beb1aa16 Merge branch 'master' into net-next 2017-04-12 10:02:07 -07:00
Timothy Redaelli 5551ed44d3 ip-route: Prevent some other double spaces in output
Print spaces only after text.

CC: Phil Sutter <phil@nwl.cc>
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
2017-04-12 09:53:23 -07:00
Stephen Hemminger 45f78b4dec update kernel headers from net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-04-04 14:57:46 -07:00
Stephen Hemminger f4878dfae4 Merge branch 'master' into net-next 2017-04-04 14:56:41 -07:00
Phil Sutter 058d28b44c man: ip-link: Specify min/max values for bridge slave priority and cost
The values are parsed as u16/u32, but kernel limits allowed values.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-04-04 14:54:44 -07:00
Phil Sutter 9fd7b86c2d ip: link: Add missing link type help texts
These are basically stubs: The types which lacked their own help text
simply don't accept any options (yet). Still it might be a bit confusing
to users if they are presented with the generic 'ip link' help text
instead of something saying there are no type specific options.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-04-04 14:51:29 -07:00
Phil Sutter 8b47135474 ip: link: Unify link type help functions a bit
Take help function in iplink_bridge.c as an example and make other link
types' help functions similar:

* Use a single fprintf() call (if possible).
* Don't state a full command line, just "... type OPTIONS".
* Put every option in it's own line, align options by column.
* List mandatory options first.

link_veth.c is intentionally left untouched because it's 'peer' option
eats all kinds of generic link options and the help text points this out
without duplicating all the options there again.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-04-04 14:51:29 -07:00
Phil Sutter e336868e09 ip: link: macvlan: Add newline to help output
A newline between synopsis and variable definition looks nice and is
consistent with others.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-04-04 14:51:29 -07:00
Phil Sutter be985020ab ip: link: bond: Fix whitespace in help text
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-04-04 14:51:29 -07:00
Sabrina Dubroca 3fbb5d43bb man: ip-link.8: document bridge options
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
2017-04-04 14:50:02 -07:00
Roman Mashak 878babffec tc: print skbedit action when dumping actions.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-04-04 14:48:54 -07:00
Alexander Alemayhu 5caba410c2 man: fix man page warnings
While generating PDFs from the man pages, I saw the warning below from
several files. Compared the tc-matchall.8 with bridge.8 and used .RI
instead of .R. It should have no effect on the man page rendering.

    `R' is a string (producing the registered sign), not a macro.

Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
2017-04-04 14:46:34 -07:00
Stephen Hemminger b285ba9ea4 update headers from net-next (post 4.11-rc3)
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-03-20 10:18:50 -07:00
Stephen Hemminger 10b9d499b6 Merge branch 'master' into net-next 2017-03-20 10:18:17 -07:00
Stephen Hemminger cfca3b356a update headers from 4.11-rc3
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-03-20 10:17:01 -07:00
Vincent Bernat 97d564b90c vxlan: use preferred address family when neither group or remote is specified
When neither group or remote is specified (or if they are specified with
the any address), nothing is sent to the kernel. In this case, the
kernel defaults to IPv4. This makes impossible to use IPv6 with
unspecified unicast remote ("bridge fdb add" will return
EAFNOTSUPPORT).

If the user specifies a preferred address family (eg, "ip -6 link add"),
then send either IFLA_VXLAN_GROUP or IFLA_VXLAN_GROUP6 to enforce the
use of the appropriate family.

Signed-off-by: Vincent Bernat <vincent@bernat.im>
2017-03-20 10:16:09 -07:00
David Ahern 3e14fd0411 ip route: Add missing space between nexthop and via for mpls multipath routes
MPLS multipath routes are missing a space between 'nexthop' and 'via':

$ ip -net ns1 -f mpls ro ls
100
	nexthopvia inet 172.16.2.2  dev virt12
	nexthopvia inet 172.16.3.2  dev br0

Add it.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-03-20 10:14:44 -07:00
Stephen Hemminger 4303c0dd99 Merge branch 'master' into net-next 2017-03-14 16:42:59 -07:00
Alexander Alemayhu 0db70c59e1 man: add examples to ip.8
Having some examples in the top level man page might make it a little bit easier
for new users to get started. Reused some words / sentences from the existing
man pages.

Suggested-by: 積丹尼 Dan Jacobson <jidanni@jidanni.org>
Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
2017-03-14 16:41:13 -07:00
Jiri Kosina 7c581a124d iproute2: add support for invisible qdisc dumping
Support the new TCA_DUMP_INVISIBLE netlink attribute that allows asking
kernel to perform 'full qdisc dump', as for historical reasons some of the
default qdiscs are being hidden by the kernel.

The command syntax is being extended by voluntary 'invisible' argument to
'tc qdisc show'.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-03-14 16:37:08 -07:00
Stephen Hemminger 2099b98385 update headers from net-next
Get TCA_DUMP_INVISIBLE and SCTP changes.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-03-14 16:36:36 -07:00
Stephen Hemminger 8fded9ffad update kernel headers from net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-03-13 08:32:13 -07:00
Stephen Hemminger a4280cfa72 update headers from 4.11-rc2
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-03-13 08:30:55 -07:00
Robert Shearman ad0e37403f man: Fix formatting of vrf parameter of ip-link show command
Add missing opening " [" for the vrf parameter.

Signed-off-by: Robert Shearman <rshearma@brocade.com>
2017-03-10 08:58:17 -08:00
Stephen Hemminger 60ccfcd7f2 pie: remove always false condition
When built with GCC warnings enabled:
q_pie.c: In function ‘pie_parse_opt’:
q_pie.c:78:38: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
        (alpha > ALPHA_MAX) || (alpha < ALPHA_MIN)) {
                                      ^
q_pie.c:85:35: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
        (beta > BETA_MAX) || (beta < BETA_MIN)) {
                                   ^

This is because MIN is 0 and unsigned number can never be less than 0.
Therefore just remove the _MIN values.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-03-10 08:58:01 -08:00
Robert Shearman 837552b445 iplink: add support for afstats subcommand
Add support for new afstats subcommand. This uses the new
IFLA_STATS_AF_SPEC attribute of RTM_GETSTATS messages to show
per-device, AF-specific stats. At the moment the kernel only supports
MPLS AF stats, so that is all that's implemented here.

The print_num function is exposed from ipaddress.c to be used for
printing the new stats so that the human-readable option, if set, can
be respected.

Example of use:

    $ ./ip/ip -f mpls link afstats dev eth1
    3: eth1
        mpls:
            RX: bytes  packets  errors  dropped  noroute
            9016       98       0       0        0
            TX: bytes  packets  errors  dropped
            7232       113      0       0

Signed-off-by: Robert Shearman <rshearma@brocade.com>
2017-03-10 08:44:55 -08:00
Phil Sutter 32b1a12713 man: ss.8: Add missing protocols to description of -A
The list was missing dccp and sctp protocols.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-03-10 08:42:13 -08:00
Roi Dayan 639785ff30 devlink: Add json and pretty options to help and man
While at it also fixed missing double dash for long opts.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
2017-03-08 17:59:01 -08:00
Daniel Borkmann 51361a9f1c bpf: test for valid type in bpf_get_work_dir
Jan-Erik reported an assertion in bpf_prog_to_subdir() failed where
type was BPF_PROG_TYPE_UNSPEC, which is only used in bpf_init_env()
to auto-mount and cache the bpf fs mount point.

Therefore, make sure when bpf_init_env() is called multiple times
(f.e. eBPF classifier with eBPF action attached) and bpf_mnt_cached
is set already that the type is also valid. In bpf_init_env(), we're
only interested in the mount point and not a type-specific subdir.

Fixes: e42256699c ("bpf: make tc's bpf loader generic and move into lib")
Reported-by: Jan-Erik Rediger <janerik@rediger.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-03-08 17:57:00 -08:00
Petr Vorel 54eab4c79a color: use "light" colors for dark background
COLORFGBG environment variable is used to detect dark background.

Idea and a bit of code is borrowed from Vim, thanks.

Signed-off-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-03-03 09:58:05 -08:00
Stephen Hemminger d896797c7b bpf: remove unnecessary cast
No need to cast RTA_DATA

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-02-24 15:25:02 -08:00
Stephen Hemminger a59b616200 tc: use rta_getattr_u32
Don't cast RTA_DATA use newish accessors.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-02-24 15:24:34 -08:00
Stephen Hemminger 84da4099e9 xfrm: remove unnecessary casts
Since RTA_DATA() returns void * no need to cast it.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-02-24 15:23:14 -08:00
Jiri Kosina be67f81297 iproute2: tc: introduce build dependency on libnetlink
Rebuilding libnetlink doesn't trigger rebuild of tc, which is wrong
(especially so for builds where libnetlink.a gets statically linked into
tc). Fix that by introducing an explicit dependency.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-02-24 15:11:32 -08:00
Stephen Hemminger 9f1370c0e5 netlink route attribute cleanup
Use the new helper functions rta_getattr_u* instead of direct
cast of RTA_DATA().  Where RTA_DATA() is a structure, then remove
the unnecessary cast since RTA_DATA() is void *

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-02-24 08:56:38 -08:00
Daniel Borkmann e37d706b56 {f,m}_bpf: dump tag over insns
We already export TCA_BPF_TAG resp. TCA_ACT_BPF_TAG from kernel commit
f1f7714ea51c ("bpf: rework prog_digest into prog_tag"), thus also dump
it when filter/actions are shown.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-02-23 09:02:19 -08:00
Roi Dayan 164a9ff401 tc: flower: Fix parsing ip address
Fix order of arguments when passed to __flower_parse_ip_addr.

Fixes: ("f888f4e20534 tc: flower: Support matching ARP")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
2017-02-23 09:01:15 -08:00
David Ahern 76f7d89d4d ip: Add support for MPLS netconf
Add support for MPLS netconf to ip monitor and ip netconf commands.
Changes to header files not included as those are typically pulled
in my a header sync with the kernel.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-02-23 08:58:40 -08:00
Stephen Hemminger 3f34574d0f Update headers based on 4.11 merge window
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-02-23 08:58:11 -08:00
Stephen Hemminger ae429903d7 update headers from net-next
updated sctp.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-02-20 08:53:50 -08:00
Stephen Hemminger 2b99748a60 add missing iplink_xstats.c
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-02-20 08:53:40 -08:00
Stephen Hemminger 29926015ea Merge branch 'master' into net-next 2017-02-20 08:51:22 -08:00
Stephen Hemminger f36ba8a4cd v4.10.0 2017-02-20 08:47:52 -08:00
Jiri Pirko cdd2f7ccd7 devlink: use DEVLINK_CMD_ESWITCH_* instead of DEVLINK_CMD_ESWITCH_MODE_*
Sync with kernel and don't use the obsolete enum values.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-02-19 12:01:47 -08:00
Nikolay Aleksandrov 217264a079 iplink: bridge_slave: add support for displaying xstats
This patch adds support to the bridge_slave link type for displaying
xstats by reusing the previously added bridge xstats callbacks.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2017-02-18 16:37:24 -08:00
Nikolay Aleksandrov 60ec0ecf0f iplink: bridge: add support for displaying xstats
Add support for the new parse/print_ifla_xstats callbacks and use them to
print the per-bridge multicast stats.
Example:
$ ip link xstats type bridge
br0
                    IGMP queries:
                      RX: v1 0 v2 0 v3 0
                      TX: v1 0 v2 0 v3 0
                    IGMP reports:
                      RX: v1 0 v2 0 v3 0
                      TX: v1 0 v2 0 v3 0
                    IGMP leaves: RX: 0 TX: 0
                    IGMP parse errors: 0
                    MLD queries:
                      RX: v1 0 v2 0
                      TX: v1 0 v2 0
                    MLD reports:
                      RX: v1 0 v2 0
                      TX: v1 0 v2 0
                    MLD leaves: RX: 0 TX: 0
                    MLD parse errors: 0

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2017-02-18 16:36:58 -08:00
Nikolay Aleksandrov 94f1a22aa7 iplink: add support for xstats subcommand
This patch adds support for a new xstats link subcommand which uses the
specified link type's new parse/print_ifla_xstats callbacks to display
extended statistics.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-02-18 16:36:01 -08:00
Stephen Hemminger bb8771573a Merge branch 'master' into net-next 2017-02-18 16:32:16 -08:00
Leon Romanovsky b77c77d294 devlink: Call dl_free in early exit case
Prior to parsing command options, the devlink tool allocates memory
to store results. In case of early exit (wrong parameters or version
check), this memory wasn't freed.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
2017-02-18 16:29:56 -08:00
Lucas Bates 5e4dc1951e man page: add page for skbmod action
Signed-off-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
2017-02-18 16:27:41 -08:00
Stephen Hemminger d250da9c68 Merge branch 'master' into net-next 2017-02-18 16:21:20 -08:00
Stephen Hemminger 2bf1a81a2f utils: hex2mem get rid of unnecessary goto
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-02-18 16:18:55 -08:00
Stephen Hemminger c72dab6624 Merge branch 'master' into net-next 2017-02-18 16:07:32 -08:00
Stephen Hemminger 835784525a update headers from 4.10-rc8
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-02-18 16:05:37 -08:00
Stephen Hemminger b6d8c4a606 Merge branch 'merge-4.10' of /tmp/iproute2 2017-02-18 16:04:25 -08:00
Stephen Hemminger ac94e16ca2 Merge branch 'merge-4.10' into next-merge 2017-02-17 15:34:24 -08:00
David Ahern b5377431df ip vrf: Detect invalid vrf name in pids command
Verify VRF name is valid before attempting to read cgroups files.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-02-17 15:33:24 -08:00
David Ahern 6a9783831c ip vrf: Handle VRF nesting in namespace
Since cgroups are not namespace aware, the directory heirarchy used by
ip vrf should account for network namespaces. In this case, change the
path from CGRP/BASE/vrf/NAME to CGRP/BASE/NETNS/vrf/NAME where CGRP is
the cgroup2 mount path, BASE in any base heirarchy inherited before VRF
is applied and NAME is the VRF name.

The intent is as follows: a user logs into the box into some namespace
with a name known to iproute2. Some other policy may have put the
process into a BASE heirarchy. From there the user executes a task in
a VRF and in doing so the task heirarchy becomes CGRP/BASE/NETNS/vrf/NAME.
The namespace level is omitted for the default namespace.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-02-17 15:33:24 -08:00
David Ahern 9c49438a67 ip netns: refactor netns_identify
Move guts of netns_identify into a standalone function that returns
the netns name in a given buffer.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-02-17 15:33:24 -08:00
David Ahern 46afa6947b ip vrf: Handle vrf in a cgroup hierarchy
Add support for VRF in a pre-existing hierarchy. For example, if the
current process is running in CGRP/foo/bar, the 'ip vrf exec NAME CMD'
should run CMD in the cgroup CGRP/foo/bar/vrf/NAME.

When listing process ids in a VRF, search for the directory vrf/NAME
regardless of base path (foo/bar/vrf/NAME and vrf/NAME) are still
running against the same vrf NAME.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-02-17 15:33:24 -08:00
Stephen Hemminger 732b18af97 Merge branch 'merge-4.10' into next-merge 2017-02-17 15:32:28 -08:00
Simon Horman 6374961a00 tc: flower: support masked ICMP code and type match
Extend ICMP code and type match to support masks.

Also add missing documentation to synopsis in manpage.

tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ipv6 parent ffff: flower \
	indev eth0 ip_proto icmpv6 type 128/240 code 0 action drop

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-02-17 15:32:03 -08:00
Simon Horman 9d36e54f36 tc: flower: provide generic masked u8 print helper
Provide generic masked u8 print helper and use it to print arp operations.

Also:
* Make name parameter of arp op print helper const.
* Consistently use __u8 rather than uint8_t, in keeping with the
  pervasive style in the file.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-02-17 15:32:03 -08:00
Simon Horman 180136e540 tc: flower: provide generic masked u8 parser helper
Provide generic masked u8 paser helper and use it to parse arp operations.

Also consistently use __u8 rather than uint8_t, in keeping with the
pervasive style in the file.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-02-17 15:32:03 -08:00
Stephen Hemminger cad5493448 update headers from net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-02-17 15:30:50 -08:00
Stephen Hemminger 9b0d47e58a Merge branch 'master' into next-merge 2017-02-17 15:29:24 -08:00
Asbjørn Sloth Tønnesen d754a64aed testsuite: search for kernel config in /boot
Add support for finding the kernel config in Debian
and derivatives.

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
2017-02-17 15:26:30 -08:00
Asbjørn Sloth Tønnesen 3064a44c69 testsuite: refactor kernel config search
Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
2017-02-17 15:26:30 -08:00
Or Gerlitz afdc1fed24 tc: matchall: Print skip flags when dumping a filter
Print the skip flags when we dump a filter.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
2017-02-17 15:25:24 -08:00
David Ahern 1ca2e08bd0 ip route: Make name of protocol 0 consistent
iproute2 can inconsistently show the name of protocol 0 if a route with
a custom protocol is added. For example:
  dsa@cartman:~$ ip -6 ro ls table all | egrep 'proto none|proto unspec'
  local ::1 dev lo  table local  proto none  metric 0  pref medium
  local fe80::225:90ff:fecb:1c18 dev lo  table local  proto none  metric 0  pref medium
  local fe80::92e2:baff:fe5c:da5d dev lo  table local  proto none  metric 0  pref medium

protocol 0 is pretty printed as "none". Add a route with a custom protocol:
  dsa@cartman:~$ sudo ip -6 ro add  2001:db8:200::1/128 dev eth0 proto 123

And now display has switched from "none" to "unspec":
  dsa@cartman:~$ ip -6 ro ls table all | egrep 'proto none|proto unspec'
  local ::1 dev lo  table local  proto unspec  metric 0  pref medium
  local fe80::225:90ff:fecb:1c18 dev lo  table local  proto unspec  metric 0  pref medium
  local fe80::92e2:baff:fe5c:da5d dev lo  table local  proto unspec  metric 0  pref medium

The rt_protos file has the id to name mapping as "unspec" while
rtnl_rtprot_tab[0] has "none". The presence of a custom protocol id
triggers reading the rt_protos file and overwriting the string in
rtnl_rtprot_tab. All of this is logic from 2004 and earlier.

Update rtnl_rtprot_tab to "unspec" to match the enum value.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-02-17 15:12:29 -08:00
Hangbin Liu e83435fcd7 man: ip-link.8: Document bridge_slave fdb_flush option
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2017-02-09 17:31:43 -08:00
Phil Sutter 3cef95926b testsuite: Search kernel config in modules dir also
At least in Fedora there is no /proc/config.gz but instead
/lib/modules/`uname -r`/config, so use that as a fallback.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-02-09 17:28:48 -08:00
Phil Sutter 886f2c43b5 testsuite: Generate nlmsg blob at runtime
Since netlink messages are in host byte order, shipping a pre-generated
nlmsg blob won't suffice on systems with different endianness. Therefore
generate the blob at runtime, so it's content fits the hosts endianness.

Note that the generated message will contain only a single interface
featuring two VFs instead of the full list before. Yet this is
sufficient, as it triggers the crash with iproute versions prior to
commit 8c29ae7cc2 ("ip link: Fix crash on older kernels when show VF
dev").

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-02-09 17:28:48 -08:00
Simon Horman c7ec052bb8 tc: flower: Update documentation to indicate ARP takes IPv4 prefixes
Unlike other PREFIXes documented in the usage for tc flower, which accept
both IPv4 and IPv6 prefixes, arp_sip and arp_tip only accepts IPv4
prefixes.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-02-08 11:39:33 -08:00
Simon Horman 81f6e5a727 tc: flower: use correct type when calling flower_icmp_attr_type
Use enum flower_icmp_field rather than bool as type of third parameter
when calling flower_icmp_attr_type.

Fixes: eb3b5696f1 ("tc: flower: support matching on ICMP type and code")
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-02-08 11:37:44 -08:00
Hangbin Liu 1e5b0e80ff man: ip-link.8: Document bridge_slave fdb_flush option
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2017-02-08 11:36:22 -08:00
Stephen Hemminger f0337c4475 tc: add missing sample file
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-02-07 11:53:24 -08:00
Stephen Hemminger 985091aa8c update headers from bridge tunnel metadata
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-02-07 11:52:49 -08:00
Yotam Gigi b32c0b64fa tc: bash-completion: Add support for matchall
Add support for the matchall classifier and its parameters.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
2017-02-07 11:44:53 -08:00
Yotam Gigi a9d2f4d861 tc: bash-completion: Add support for filter actions
Previously, the autocomplete routine did not complete actions after a
filter keyword, for example:

$ tc filter add dev eth0 u32 [...] action <TAB>

did not suggest the actions list, and:

$ tc filter add dev eth0 u32 [...] action mirred <TAB>

did not suggest the specific mirred parameters. Add the support for this
kind of completion by adding the _tc_filter_action_options routine and
invoking it from inside _tc_filter_options.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
2017-02-07 11:44:53 -08:00
Yotam Gigi 57086f7b25 tc: bash-completion: Make the *_KIND variables global
The QDISC_KIND, FILTER_KIND, ACTION_KIND variables may be used by other
routines, thus make them global variables.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
2017-02-07 11:44:53 -08:00
Yotam Gigi f62b54a106 tc: bash-completion: Prepare action autocomplete to support several actions
The action autocomplete routine (_tc_action_options) currently does not
support several actions statements in one tc command line as it uses the
_tc_once_attr and _tc_one_from_list.

For example, in that case:

$ tc filter add dev eth0 handle ffff: u32 [...]  \
	   action sample group 5 rate 12 	 \
	   action sample <TAB>

the _tc_once_attr function, when invoked with "group rate" will not
suggest those as they already exist on the command line.

Fix the function to use the _from variant, thus allowing each action
autocomplete start from the action keyword, and not from the beginning of
the command line.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
2017-02-07 11:44:53 -08:00
Yotam Gigi 26e0996a87 tc: bash-completion: Add the _from variant to _tc_one* funcs
The _tc_one_of_list and _tc_once_attr functions simplfy the bash
completion task by validating each attr exist only once on the command
line.

For example, for the command line:

$ a b c d e

and the call to _tc_once_attr with "a f g", the function will suggest
"f g" as "a" existed in the command line in args 0.

Add the _from variant to those functions, which allows having the command
line option once from a specified index. In the previous example, calling
_tc_once_attr with 4 and "a f g" will suggest "a f g".

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
2017-02-07 11:44:53 -08:00
Yotam Gigi 787317f50a tc: man: matchall: Update examples to include sample
Add an example of packet sampling to the tc-matchall man page examples
section. The example uses the matchall classifier and the sample action to
create packet sampling on a port.

Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
2017-02-06 14:24:52 -08:00
Yotam Gigi 515e943d76 tc: man: Add man entry for the tc-sample action
In addition to general information about the tc action, the man entry
contains common usage examples and information about the tlv fields packed
within each sampled packet.

Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
2017-02-06 14:24:52 -08:00
Yotam Gigi 0b1abd84fb tc: Add support for the sample tc action
The sample tc action allows sampling packets matching a classifier. It
peeks randomly packets, and samples them using the psample netlink
channel. The user can specify the psample group, which the packet will be
sampled to, the sampling rate and the packet truncation (to save
kernel-user traffic).

The sampled packets contain informative metadata, for example, the input
interface and the original packet length.

The action syntax:
tc filter add [...] \
	action sample rate <RATE> group <GROUP> [trunc <SIZE>]
	[...]

Where:
  RATE := The sampling rate which is the ratio of packets observed at the
	  data source to the samples generated
  GROUP := the psample module sampling group
  SIZE := optional truncation size

An example for a common usecase of the sample tc action: to sample ingress
traffic from interface eth1, one may use the commands:

tc qdisc add dev eth1 handle ffff: ingress

tc filter add dev eth1 parent ffff: \
       matchall action sample rate 12 group 4

Where the first command adds an ingress qdisc and the second starts
sampling randomly with an average of one sampled packet per 12 packets
on dev eth1 to psample group 4.

Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
2017-02-06 14:24:52 -08:00
Stephen Hemminger 818a10a77f Merge branch 'master' into net-next 2017-02-06 14:13:27 -08:00
Stephen Hemminger 17c4c446bd tcp: header file update
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-02-06 14:08:07 -08:00
Stephen Hemminger b5de688592 Merge branch 'master' into net-next 2017-02-06 14:07:13 -08:00
Phil Sutter 72dfff6e11 man: ip-route.8: Fix 'expires' indenting
Descriptions of each route sub-command's arguments are enclosed in
.RS/.RE pairs. For 'replace' sub-command, '.RE' was incorrectly put
before the last argument ('expires').

Fixes: 3fbe7ca847 ("iproute2: ip-route.8.in: Add expires option for ip route")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-02-06 13:52:52 -08:00
Eric Dumazet 38e6dbc4b3 ss: print tcpi_rcv_mss and tcpi_advmss
tcpi_rcv_mss and tcpi_advmss tcp info fields were not yet reported
by ss.

While adding GRO support to packetdrill, I found this was useful.

Signed-off-by: Eric Dumazet <edumazet@google.com>
2017-02-06 13:50:29 -08:00
Ralf Baechle e7867c34e8 ip: HSR: Fix cut and paste error
Fixes: 5c0aec93a5 ("ip: Add HSR support")
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-02-06 13:49:02 -08:00
Nogah Frankel aaacdfd570 ifstat: Add xstat to ifstat man page
Add documentation about the extended statistics to the ifstat man page.
Add ifstat man age to the man8 Makefile

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
2017-02-03 09:20:15 -08:00
Nogah Frankel 1c2df61344 ifstat: Add "sw only" extended statistics to ifstat
Add support for extended statistics of SW only type, for counting only the
packets that went via the cpu. (useful for systems with forward
offloading). It reads it from filter type IFLA_STATS_LINK_OFFLOAD_XSTATS
and sub type IFLA_OFFLOAD_XSTATS_CPU_HIT.

It is under the name 'cpu_hits'
(or any shorten of it as 'cpu' or simply 'c')

For example:
ifstat -x c

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
2017-02-03 09:20:15 -08:00
Nogah Frankel 5a52102b7c ifstat: Add extended statistics to ifstat
Extended stats are part of the RTM_GETSTATS method. This patch adds them
to ifstat.
While extended stats can come in many forms, we support only the
rtnl_link_stats64 struct for them (which is the 64 bits version of struct
rtnl_link_stats).
We support stats in the main nesting level, or one lower.
The extension can be called by its name or any shorten of it. If there is
more than one matched, the first one will be picked.

To get the extended stats the flag -x <stats type> is used.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
2017-02-03 09:20:15 -08:00
Nogah Frankel 3d8048dcc3 ifstat: Includes reorder
Reorder the includes in misc/ifstat.c to match convention.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
2017-02-03 09:20:15 -08:00
Yotam Gigi d65a744cdb tc: man: matchall: Fix example indentation
The man page contains two examples, which have different indentation. Fix
the indentation of the two examples to match.

Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
2017-01-31 16:18:33 -08:00
Stephen Hemminger b479a7d75b update kernel headers from net-next 2017-01-29 20:31:31 -08:00
Stephen Hemminger fefc93bb28 Merge branch 'master' into net-next 2017-01-29 20:30:05 -08:00
Roman Mashak 31951c47e9 tc: distinguish Add/Replace action operations.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Phil Sutter <phil@nwl.cc>
2017-01-29 20:26:44 -08:00
Phil Sutter 6bbe5e6290 man: tc-csum.8: Fix example
This fixes two issues with the provided example:

- Add missing 'dev' keyword to second command.
- Use a real IPv4 address instead of a bogus hex value since that will
  be rejected by get_addr_ipv4().

Fixes: dbfb17a67f ("man: tc-csum.8: Add an example")
Reported-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-01-29 20:25:35 -08:00
Benjamin LaHaise 4f7d406f5d f_flower: don't set TCA_FLOWER_KEY_ETH_TYPE for "protocol all"
v2 - update to address changes in 00697ca19a.

When using the tc flower filter, rules marked with "protocol all" do not
actually match all packets.  This is due to a bug in f_flower.c that passes
in ETH_P_ALL in the TCA_FLOWER_KEY_ETH_TYPE attribute when adding a rule.
Fix this by omitting TCA_FLOWER_KEY_ETH_TYPE if the protocol is set to
ETH_P_ALL.

Fixes: 488b41d020 ("tc: flower no need to specify the ethertype")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Benjamin LaHaise <benjamin.lahaise@netronome.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: Roi Dayan <roid@mellanox.com>
2017-01-29 20:23:58 -08:00
Paul Blakey 08f66c80c0 tc: flower: Refactor matching flags to be more user friendly
Instead of "magic numbers" we can now specify each flag
by name. Prefix of "no"  (e.g nofrag) unsets the flag,
otherwise it wil be set.

Example:
    # add a flower filter that will drop fragmented packets
    tc filter add dev ens4f0 protocol ip parent ffff: \
            flower \
            src_mac e4:1d:2d:fd:8b:01 \
            dst_mac e4:1d:2d:fd:8b:02 \
            indev ens4f0 \
            ip_flags frag \
    action drop

    # add a flower filter that will drop non-fragmented packets
    tc filter add dev ens4f0 protocol ip parent ffff: \
            flower \
            src_mac e4:1d:2d:fd:8b:01 \
            dst_mac e4:1d:2d:fd:8b:02 \
            indev ens4f0 \
            ip_flags nofrag \
    action drop

Fixes: 22a8f01989 ('tc: flower: support matching flags')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-01-20 10:36:45 -08:00
Hangbin Liu d1b41236e1 iplink: bridge_slave: add support for IFLA_BRPORT_FLUSH
This patch implements support for the IFLA_BRPORT_FLUSH attribute
in iproute2 so it can flush bridge slave's fdb dynamic entries.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2017-01-20 10:32:34 -08:00
Hangbin Liu c3f1e3c425 iplink: bridge: add support for IFLA_BR_MCAST_MLD_VERSION
This patch implements support for the IFLA_BR_MCAST_MLD_VERSION
attribute in iproute2 so it can change the mcast mld version.

Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2017-01-20 10:32:34 -08:00
Hangbin Liu 19756950f7 iplink: bridge: add support for IFLA_BR_MCAST_IGMP_VERSION
This patch implements support for the IFLA_BR_MCAST_IGMP_VERSION
attribute in iproute2 so it can change the mcast igmp version.

Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2017-01-20 10:32:34 -08:00
Hangbin Liu 6ddad009e2 iplink: bridge: add support for IFLA_BR_MCAST_STATS_ENABLED
This patch implements support for the IFLA_BR_MCAST_STATS_ENABLED
attribute in iproute2 so it can enable/disable mcast stats accounting.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2017-01-20 10:32:34 -08:00
Hangbin Liu 7d93b567ea iplink: bridge: add support for IFLA_BR_VLAN_STATS_ENABLED
This patch implements support for the IFLA_BR_VLAN_STATS_ENABLED
attribute in iproute2 so it can enable/disable vlan stats accounting.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2017-01-20 10:32:34 -08:00
Hangbin Liu f3372d62eb iplink: bridge: add support for IFLA_BR_FDB_FLUSH
This patch implements support for the IFLA_BR_FDB_FLUSH attribute
in iproute2 so it can flush bridge fdb dynamic entries.

Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2017-01-20 10:32:34 -08:00
Nikolay Aleksandrov e763e3310e ipmroute: add support for RTNH_F_UNRESOLVED
This patch adds a new field that is printed in the end of the line which
denotes the real entry state. Before this patch an entry's IIF could
disappear and it would look like an unresolved one (iif = unresolved):
(3.0.16.1, 225.11.16.1)          Iif: unresolved

with no way to really distinguish it from an unresolved entry.
After the patch if the dumped entry has RTNH_F_UNRESOLVED set we get:
(3.0.16.1, 225.11.16.1)          Iif: unresolved  State: unresolved

for unresolved entries and:
(0.0.0.0, 225.11.11.11)          Iif: eth4       Oifs: eth3  State: resolved

for resolved entries after the OIF list. Note that "State:" has ':' in
it so it cannot be mistaken for an interface name.

And for the example above, we'd get:
(0.0.0.0, 225.11.11.11)          Iif: unresolved     State: resolved

Also when dumping all routes via ip route show table all,
 it will show up as:
multicast 225.11.16.1/32 from 3.0.16.1/32 table default proto 17 unresolved

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-01-20 09:41:26 -08:00
David Ahern 11f2c75315 ip route: error out on multiple via without nexthop keyword
To specify multiple nexthops in a route the user is expected to use the
"nexthop" keyword which ip route uses to create the RTA_MULTIPATH.
However, ip route always accepts multiple 'via' keywords where only the
last one is used in the route leading to confusion. For example, ip
accepts this syntax:
    $ ip ro add vrf red  1.1.1.0/24 via 10.100.1.18 via 10.100.2.18

but the route entered inserted by the kernel is just the last gateway:
    1.1.1.0/24 via 10.100.2.18 dev eth2

which is not the full request from the user. Detect the presense of
multiple 'via' and give the user a hint to add nexthop:

    $ ip ro add vrf red  1.1.1.0/24 via 10.100.1.18 via 10.100.2.18
    Error: argument "via" is wrong: use nexthop syntax to specify multiple via

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-01-20 09:38:20 -08:00
Davide Caratti 6561cb28f2 tc: m_csum: add support for SCTP checksum
'sctp' parameter can now be used as 'csum' target to enable CRC32c
computation on SCTP packets.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2017-01-20 09:32:08 -08:00
Stephen Hemminger a044b36af3 update kernel headers from 4.10 net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-01-20 09:28:36 -08:00
Stephen Hemminger 9174b4cf3e Merge branch 'master' into net-next 2017-01-20 09:27:57 -08:00
Roi Dayan 00697ca19a tc: flower: Fix incorrect error msg about eth type
addattr16 may return an error about the nl msg size
but not about incorrect eth type.

Fixes: 488b41d020 ("tc: flower no need to specify the ethertype")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
2017-01-20 09:27:34 -08:00
Roi Dayan c85609b25f tc: flower: Add missing err check when parsing flower options
addattr32 may return an error.

Fixes: cfcabf18d8 ("tc: flower: Add skip_{hw|sw} support")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
2017-01-20 09:27:34 -08:00
Stephen Hemminger 6166cc35be update kernel headers (from 4.10-rc4)
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-01-20 09:26:27 -08:00
Alexander Heinlein d5eb0564da ip/xfrm: Fix deleteall when having many policies installed
Fix "Policy buffer overflow" when trying to use deleteall with many
policies installed.

Signed-off-by: Alexander Heinlein <alexander.heinlein@secunet.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-01-20 09:21:02 -08:00
Jiri Benc c3d09fba93 Revert "man pages: add man page for skbmod action"
This reverts commit a40995d1c7.

The patch is missing the actual tc-skbmod.8 file which causes 'make
install' to fail:

install -m 0755 -d /tmp/ip/usr/share/man/man8
install -m 0644 ip-address.8 ip-link.8 ip-route.8 ip.8 arpd.8 lnstat.8
routel.8 rtacct.8 rtmon.8 rtpr.8 ss.8 tc.8 tc-bfifo.8 tc-bpf.8 tc-cbq.8
tc-cbq-details.8 tc-choke.8 tc-codel.8 tc-fq.8 tc-drr.8 tc-ematch.8
tc-fq_codel.8 tc-hfsc.8 tc-htb.8 tc-pie.8 tc-mqprio.8 tc-netem.8 tc-pfifo.8
tc-pfifo_fast.8 tc-prio.8 tc-red.8 tc-sfb.8 tc-sfq.8 tc-stab.8 tc-tbf.8
bridge.8 rtstat.8 ctstat.8 nstat.8 routef.8 ip-addrlabel.8 ip-fou.8 ip-gue.8
ip-l2tp.8 ip-macsec.8 ip-maddress.8 ip-monitor.8 ip-mroute.8 ip-neighbour.8
ip-netns.8 ip-ntable.8 ip-rule.8 ip-tunnel.8 ip-xfrm.8 ip-tcp_metrics.8
ip-netconf.8 ip-token.8 tipc.8 tipc-bearer.8 tipc-link.8 tipc-media.8
tipc-nametable.8 tipc-node.8 tipc-socket.8 tc-basic.8 tc-cgroup.8 tc-flow.8
tc-flower.8 tc-fw.8 tc-route.8 tc-tcindex.8 tc-u32.8 tc-matchall.8
tc-connmark.8 tc-csum.8 tc-mirred.8 tc-nat.8 tc-pedit.8 tc-police.8
tc-simple.8 tc-skbedit.8 tc-vlan.8 tc-xt.8  tc-ife.8 tc-skbmod.8
tc-tunnel_key.8 devlink.8 devlink-dev.8 devlink-monitor.8 devlink-port.8
devlink-sb.8 /tmp/ip/usr/share/man/man8
install: cannot stat ‘tc-skbmod.8’: No such file or directory
make[2]: *** [install] Error 1
make[1]: *** [install] Error 2

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2017-01-18 08:59:54 -08:00
Roi Dayan b2141de1ad tc: flower: Fix flower output for src and dst ports
This fix a missing use case after the introduction of enum flower_endpoint.

Fixes: 6910d65661 ("tc: flower: introduce enum flower_endpoint")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Paul Blakey <paulb@mellanox.com>
2017-01-17 08:45:22 -08:00
Jamal Hadi Salim 1c570c50a3 utils: make hex2mem available to all users
hex2mem() api is useful for parsing hexstrings which are then packed in
a stream of chars.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2017-01-17 08:45:22 -08:00
Petr Vorel 530903dd90 ip: fix igmp parsing when iface is long
Entries with long vhost names in /proc/net/igmp have no whitespace
between name and colon, so sscanf() adds it to vhost and
'ip maddr show iface' doesn't include inet result.

Signed-off-by: Petr Vorel <pvorel@suse.cz>
2017-01-17 08:39:55 -08:00
Phil Sutter a05b9557f4 tc: m_xt: Drop needless parentheses from #if checks
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-01-13 16:33:54 -08:00
Stephen Hemminger facfc5c1c0 include: remove unused header
not used by any source here

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-01-13 14:11:12 -08:00
Stephen Hemminger 65047fa641 add more uapi header files
In order to ensure no backward/forward compatiablity problems,
make sure that all kernel headers used come from the local copy.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-01-12 17:54:39 -08:00
Simon Horman f888f4e205 tc: flower: Support matching ARP
Support matching on ARP operation, and hardware and protocol addresses
for Ethernet hardware and IPv4 protocol addresses.

Example usage:

tc qdisc add dev eth0 ingress

tc filter add dev eth0 protocol arp parent ffff: flower indev eth0 \                    arp_op request arp_sip 10.0.0.1 action drop
tc filter add dev eth0 protocol rarp parent ffff: flower indev eth0 \                   arp_op reply arp_tha 52:54:3f:00:00:00/24 action drop

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-01-12 17:46:37 -08:00
Stephen Hemminger e2ade8cefb kernel headers update
For flower, etc.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-01-12 17:45:30 -08:00
Stephen Hemminger 51dd3455a3 Merge branch 'master' into net-next 2017-01-12 17:44:44 -08:00
Simon Horman aeeaae2fa9 tc: ife: correct spelling of prio in example
Correct typo in example in ife man page.

Fixes: 06f9a59170 ("man: tc-ife.8: man page for ife action")
Cc: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-01-12 17:40:19 -08:00
Nikolay Aleksandrov 7f10090b9f bridge: fdb: add state filter support
This patch adds a new argument to the bridge fdb show command that allows
to filter by entry state.
Also update the man page to include all available show arguments.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2017-01-12 17:38:55 -08:00
David Ahern 5ffbf4508c rttable: Fix invalid range checking when table id is converted to u32
Frank reported that table ids for very large numbers are not properly
detected:
$ ip li add foobar type vrf table 98765432100123456789

command succeeds and resulting table id is actually:

21: foobar: <NOARP,MASTER> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether da:ea:d4:77:38:2a brd ff:ff:ff:ff:ff:ff promiscuity 0
    vrf table 4294967295 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

Make the temp variable 'i' unsigned long and let the typecast to u32
happen on assignment to id.

Reported-by: Frank Kellermann <frank.kellermann@atos.net>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-01-12 17:34:22 -08:00
David Forster 40f9070d94 ip6tunnel: Align ipv6 tunnel key display with ipv4
Show ipv6 tunnel keys on presence of GRE_KEY flag for tunnel types
other than GRE. Aligns ipv6 behaviour with ipv4.

Signed-off-by: dforster@brocade.com
2017-01-12 17:34:02 -08:00
Phil Sutter 97a02cabef tc: m_xt: Fix segfault with iptables-1.6.0
Said iptables version introduced struct xtables_globals field
'compat_rev', a function pointer. Initializing it is mandatory as
libxtables calls it without existence check.

Without this, tc segfaults when using the xt action like so:

| tc filter add dev d0 parent ffff: u32 match u32 0 0 \
|	action xt -j MARK --set-mark 20

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-01-12 17:32:26 -08:00
Stephen Hemminger 3bad1dbb20 whitespace cleanup
Get rid of blanks at end of line and extra lines at eof

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-01-12 17:31:20 -08:00
David Ahern 719e331ff6 Add support for rt_protos.d
Add support for reading proto id/name mappings from rt_protos.d
directory. Allows users to have custom protocol values converted
to human friendly names.

Each file under rt_protos.d has the 'id name' format used by
rt_protos. Only .conf files are read and parsed.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-01-12 17:31:18 -08:00
David Ahern 9b036afd3c ip vrf: Improve bpf error messages
Next up a non-root user gets various bpf related error messages:

$ ip vrf exec mgmt bash
Failed to load BPF prog: 'Operation not permitted'
Kernel compiled with CGROUP_BPF enabled?

Catch the EPERM error and do not show the kernel config option.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-01-09 12:13:09 -08:00
David Ahern 2bbc5b0726 ip vrf: Improve cgroup2 error messages
Currently, if a non-root user attempts to run ip vrf exec a non-helpful
error is returned:

$ ip vrf exec mgmt bash
Failed to mount cgroup2. Are CGROUPS enabled in your kernel?

Only show the CGROUPS kernel hint for the ENODEV error and for the
rest show the strerror for the errno. So now:

$ ip/ip vrf exec mgmt bash
Failed to mount cgroup2: Operation not permitted

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-01-09 12:13:08 -08:00
David Ahern edbae5e0b2 ip vrf: Fix run-on error message on mkdir failure
Andy reported a missing newline if a non-root user attempts to run
'ip vrf exec':

$ ./ip/ip vrf exec default /bin/echo asdf
mkdir failed for /var/run/cgroup2: Permission deniedFailed to setup vrf cgroup2 directory

Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-01-09 12:13:08 -08:00
Simon Horman a5ae170ed8 tc: flower: Update dest UDP port documentation
Since 41aa17ff46 ("tc/cls_flower: Add dest UDP port to tunnel params")
tc flower supports setting the dest UDP port.

* Use "port_number" to be consistent with other man-page text
* Re-add "enc_dst_port" documentation to manpage which was
  accidently removed by b2a1f740aa ("tc: flower: document that *_ip
  parameters take a PREFIX as an argument.")

Cc: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-01-09 12:09:46 -08:00
Stephen Hemminger e467a283b1 minor kernel header update
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-01-09 12:09:26 -08:00
Stephen Hemminger 1693e4f257 Merge branch 'master' into net-next 2017-01-09 12:08:34 -08:00
David Michael bb18c98198 tc: make tc linking depend on libtc.a
There was a race condition where the command to link the tc binary
could (rarely) run before the libtc.a archive existed.
2017-01-09 12:06:58 -08:00
Paul Blakey 22a8f01989 tc: flower: support matching flags
Enhance flower to support matching on flags.

The 1st flag allows to match on whether the packet is
an IP fragment.

Example:

	# add a flower filter that will drop fragmented packets
	# (bit 0 of control flags)
	tc filter add dev ens4f0 protocol ip parent ffff: \
		flower \
		src_mac e4:1d:2d:fd:8b:01 \
		dst_mac e4:1d:2d:fd:8b:02 \
		indev ens4f0 \
		matching_flags 0x1/0x1 \
	action drop

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
2016-12-29 10:42:08 -08:00
Stephen Hemminger d34adf67b5 Merge branch 'master' into net-next 2016-12-29 10:31:44 -08:00
Alexey Kodanev 7f97744777 fix typo in ip-xfrm man page, rmd610 -> rmd160
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
2016-12-29 10:24:35 -08:00
Baruch Siach d421bb4efe tc: add missing limits.h header
This fixes under musl build issues like:

f_matchall.c: In function ‘matchall_parse_opt’:
f_matchall.c:48:12: error: ‘LONG_MIN’ undeclared (first use in this function)
   if (h == LONG_MIN || h == LONG_MAX) {
            ^
f_matchall.c:48:12: note: each undeclared identifier is reported only once for each function it appears in
f_matchall.c:48:29: error: ‘LONG_MAX’ undeclared (first use in this function)
   if (h == LONG_MIN || h == LONG_MAX) {
                             ^

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
2016-12-29 10:24:35 -08:00
Hadar Hen Zion f6d3126ef9 tc/m_tunnel_key: Add to the usage encapsulation dest UDP port
tunnel key set parameters includes also dest UDP port, add it to the
usage.

Fixes: 449c709c38 ("tc/m_tunnel_key: Add dest UDP port to tunnel key action")
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reported-by: Simon Horman <simon.horman@netronome.com>
2016-12-22 11:02:00 -08:00
Hadar Hen Zion bf73c650ac tc/cls_flower: Add to the usage encapsulation dest UDP port
Encapsulation dest UDP port is part of the classifier matching
parameters, add it to the usage.

Fixes: 41aa17ff46 ("tc/cls_flower: Add dest UDP port to tunnel params")
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reported-by: Simon Horman <simon.horman@netronome.com>
2016-12-22 11:02:00 -08:00
Simon Horman c2078f8dc4 tc: flower: Allow *_mac options to accept a mask
* The argument to src_mac and dst_mac may now take an optional mask
  to limit the scope of matching.
* This address is is documented as a LLADDR in keeping with ip-link(8).
* The formats accepted match those already output when dumping flower
  filters from the kernel.

Example of use of LLADDR with and without a mask:

tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \
	src_mac 52:54:01:00:00:00/ff:ff:00:00:00:01 action drop
tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \
	src_mac 52:54:00:00:00:00/23 action drop
tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \
	src_mac 52:54:00:00:00:00 action drop

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2016-12-21 16:07:53 -08:00
Simon Horman b2a1f740aa tc: flower: document that *_ip parameters take a PREFIX as an argument.
* The argument to src_ip, dst_ip, enc_src_ip and enc_dst_ip take an
  optional prefix length which is used to provide a mask to limit the scope
  of matching.
* This is documented as a PREFIX in keeping with ip-route(8).

Example of uses of IPv4 and IPv6 prefixes

tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ip parent ffff: flower \
    indev eth0 dst_ip 192.168.1.1 action drop
tc filter add dev eth0 protocol ip parent ffff: flower \
    indev eth0 src_ip 10.0.0.0/8 action drop
tc filter add dev eth0 protocol ipv6 parent ffff: flower \
    indev eth0 src_ip 2001:DB8:1::/48 action drop
tc filter add dev eth0 protocol ipv6 parent ffff: flower \
    indev eth0 dst_ip 2001:DB8::1 action drop

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2016-12-21 16:07:41 -08:00
Stephen Hemminger 8578bb731d Revert "tc: flower: Allow *_mac options to accept a mask"
This reverts commit 0390185078.
2016-12-21 16:06:49 -08:00
Stephen Hemminger 10da552800 Revert "tc: flower: document that *_ip parameters take a PREFIX as an argument."
This reverts commit a8a1dccd2a.
2016-12-21 16:06:35 -08:00
Stephen Hemminger 176b6b7329 update kernel headers 2016-12-21 15:58:49 -08:00
Roman Mashak 00fe039dd5 tc: updated man page to reflect filter-id use in filter GET command.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2016-12-21 15:56:39 -08:00
Roman Mashak 17b9668a86 tc: fixed man page fonts for keywords and variable values
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2016-12-21 15:56:39 -08:00
Julien Fortin fd4ca03935 ip: vfinfo: remove code duplication for IFLA_VF_RSS_QUERY_EN
Fixes: 4fb4a10e12 ("ipaddress: Print IFLA_VF_QUERY_RSS_EN setting”)

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
Acked-by: Phil Sutter <phil@nwl.cc>
2016-12-21 15:56:39 -08:00
Simon Horman 0390185078 tc: flower: Allow *_mac options to accept a mask
* The argument to src_mac and dst_mac may now take an optional mask
  to limit the scope of matching.
* This address is is documented as a LLADDR in keeping with ip-link(8).
* The formats accepted match those already output when dumping flower
  filters from the kernel.

Example of use of LLADDR with and without a mask:

tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \
	src_mac 52:54:01:00:00:00/ff:ff:00:00:00:01 action drop
tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \
	src_mac 52:54:00:00:00:00/23 action drop
tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \
	src_mac 52:54:00:00:00:00 action drop

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2016-12-21 15:56:39 -08:00
Simon Horman a8a1dccd2a tc: flower: document that *_ip parameters take a PREFIX as an argument.
* The argument to src_ip, dst_ip, enc_src_ip and enc_dst_ip take an
  optional prefix length which is used to provide a mask to limit the scope
  of matching.
* This is documented as a PREFIX in keeping with ip-route(8).

Example of uses of IPv4 and IPv6 prefixes

tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ip parent ffff: flower \
    indev eth0 dst_ip 192.168.1.1 action drop
tc filter add dev eth0 protocol ip parent ffff: flower \
    indev eth0 src_ip 10.0.0.0/8 action drop
tc filter add dev eth0 protocol ipv6 parent ffff: flower \
    indev eth0 src_ip 2001:DB8:1::/48 action drop
tc filter add dev eth0 protocol ipv6 parent ffff: flower \
    indev eth0 dst_ip 2001:DB8::1 action drop

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2016-12-21 15:56:39 -08:00
David Ahern ee9369a05f ip netns: Reset vrf to default VRF on namespace switch
A vrf is local to a namespace. Drop any VRF association before trying
to exec a command in the new namespace.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-12-21 15:56:39 -08:00
David Ahern 2917b4f41a ip vrf: Fix reset to default VRF
Path in vrf_switch for "default" VRF is supposed to be MNT/vrf not
MNT/default. Also, default_vrf flag is redundant with ifindex. Remove
the flag in favor of ifindex != 0.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-12-21 15:56:39 -08:00
David Ahern b5efa59763 ip vrf: Refactor ipvrf_identify
Split ipvrf_identify into arg processing and a function that does the
actual cgroup file parsing. The latter function is used in a follow
on patch.

In the process, convert the reading of the cgroups file to use fopen
and fgets just in case the file ever grows beyond 4k. Move printing
of any error message and the vrf name to the caller of the new
vrf_identify.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-12-21 15:56:39 -08:00
David Ahern c94112faf5 ip vrf: Move kernel config hint to prog_load failure
Move the hint about CGROUP_BPF enabled to prog_load failure since
it fails before the attach. Update the existing error message to
print to stderr.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-12-21 15:56:39 -08:00
Stephen Hemminger 9f9ccc89f7 configure: fix elftest when warnings enabled
If compile testing with -W then elftest.c would fail because
of unused variables.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-12-14 19:11:02 -08:00
David Ahern 8b59612f99 Fix compile warning in get_addr_1
A recent cleanup causes a compile warning on Debian jessie:

    CC       utils.o
utils.c: In function ‘get_addr_1’:
utils.c:486:21: warning: passing argument 1 of ‘ll_addr_a2n’ from incompatible pointer type
   len = ll_addr_a2n(&addr->data, sizeof(addr->data), name);
                     ^
In file included from utils.c:34:0:
../include/rt_names.h:27:5: note: expected ‘char *’ but argument is of type ‘__u32 (*)[8]’
 int ll_addr_a2n(char *lladdr, int len, const char *arg);
     ^

Revert the removal of the typecast

Fixes: e1933b9281 ("utils: cleanup style")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-12-14 19:00:36 -08:00
Roman Mashak 530753184a tc: pass correct conversion specifier to print 'unsigned int' action index.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-12-14 19:00:36 -08:00
Stephen Hemminger ab91aee4b0 ipvrf: cleanup style issues
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-12-13 10:43:24 -08:00
Stephen Hemminger e1933b9281 utils: cleanup style
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-12-13 10:41:36 -08:00
Stephen Hemminger 892a25e286 libnetlink: break up dump function
Indentation is deep here.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-12-13 10:41:29 -08:00
David Ahern 1949f82cdf Introduce ip vrf command
'ip vrf' follows the user semnatics established by 'ip netns'.

The 'ip vrf' subcommand supports 3 usages:

1. Run a command against a given vrf:
       ip vrf exec NAME CMD

   Uses the recently committed cgroup/sock BPF option. vrf directory
   is added to cgroup2 mount. Individual vrfs are created under it. BPF
   filter attached to vrf/NAME cgroup2 to set sk_bound_dev_if to the VRF
   device index. From there the current process (ip's pid) is addded to
   the cgroups.proc file and the given command is exected. In doing so
   all AF_INET/AF_INET6 (ipv4/ipv6) sockets are automatically bound to
   the VRF domain.

   The association is inherited parent to child allowing the command to
   be a shell from which other commands are run relative to the VRF.

2. Show the VRF a process is bound to:
       ip vrf id
   This command essentially looks at /proc/pid/cgroup for a "::/vrf/"
   entry with the VRF name following.

3. Show process ids bound to a VRF
       ip vrf pids NAME
   This command dumps the file MNT/vrf/NAME/cgroup.procs since that file
   shows the process ids in the particular vrf cgroup.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-12-13 10:20:16 -08:00
David Ahern 463d9efaa2 libnetlink: Add variant of rtnl_talk that does not display RTNETLINK answers error
iplink_vrf has 2 functions used to validate a user given device name is
a VRF device and to return the table id. If the user string is not a
device name ip commands with a vrf keyword show a confusing error
message: "RTNETLINK answers: No such device".

Add a variant of rtnl_talk that does not display the "RTNETLINK answers"
message and update iplink_vrf to use it.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-12-13 10:20:16 -08:00
David Ahern 2330490f0e change name_is_vrf to return index
index of 0 means name is not a valid vrf.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-12-13 10:20:16 -08:00
David Ahern 1dafceb1c9 Add filesystem APIs to lib
Add make_path to recursively call mkdir as needed to create a given
path with the given mode.

Add find_cgroup2_mount to lookup path where cgroup2 is mounted. If it
is not already mounted, cgroup2 is mounted under /var/run/cgroup2 for
use by iproute2.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-12-13 10:20:16 -08:00
David Ahern 08bd33d77f move cmd_exec to lib utils
Code move only; no functional change intended.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-12-13 10:20:16 -08:00
David Ahern 10e51a76a9 bpf: Add BPF_ macros
Based on version in kernel repo, samples/bpf/libbpf.h

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2016-12-13 10:20:15 -08:00
David Ahern 869d889eed bpf: export bpf_prog_load
Code move only; no functional change intended.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2016-12-13 10:20:15 -08:00
David Ahern fc4ccce038 lib bpf: Add support for BPF_PROG_ATTACH and BPF_PROG_DETACH
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2016-12-13 10:20:15 -08:00
Roi Dayan 7d59d6354f tc: tunnel_key: Add tc-tunnel_key man page to Makefile
To be installed with the other man pages.

Fixes: d57639a475 ("tc/act_tunnel: Introduce ip tunnel action")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Amir Vadai <amir@vadai.me>
2016-12-13 10:15:11 -08:00
Roi Dayan 5c46a8fd61 tc: flower: Fix typo and style in flower man page
Replace vlan_eth_type with vlan_ethtype.

Fixes: 745d917260 ("tc: flower: Introduce vlan support")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
2016-12-13 10:15:11 -08:00
Hadar Hen Zion 449c709c38 tc/m_tunnel_key: Add dest UDP port to tunnel key action
Enhance tunnel key action parameters by adding destination UDP port.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
2016-12-13 10:15:11 -08:00
Hadar Hen Zion 41aa17ff46 tc/cls_flower: Add dest UDP port to tunnel params
Enhance IP tunnel parameters by adding destination UDP port.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
2016-12-13 10:15:11 -08:00
Stephen Hemminger b723368caa lwtunnel: style cleanup
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-12-12 15:37:00 -08:00
Thomas Graf b15f440e78 lwt: BPF support for LWT
Adds support to configure BPF programs as nexthop actions via the LWT
framework.

Example:
   ip route add 192.168.253.2/32 \
     encap bpf out obj lwt_len_hist_kern.o section len_hist \
     dev veth0

Signed-off-by: Thomas Graf <tgraf@suug.ch>
2016-12-12 15:32:54 -08:00
Stephen Hemminger ba2a2124ec update to net-next headers (pre 4.10 rc)
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-12-12 15:26:55 -08:00
Stephen Hemminger 2a56c090e4 Merge branch 'master' into net-next 2016-12-12 15:24:40 -08:00
Stephen Hemminger ae0969c893 v4.9.0 2016-12-12 15:07:42 -08:00
Stephen Hemminger dc5622cb66 update to 4.9 release headers
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-12-12 15:06:18 -08:00
David Ahern eebc7cc192 Makefile: really suppress printing of directories
Makefile adds --no-print-directory to MAKEFLAGS if VERBOSE is not
defined however Config always defines VERBOSE. Update the check to
whether VERBOSE is 0.

Fixes: 57bdf8b764 ("Make builds default to quiet mode")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-12-09 12:49:44 -08:00
Simon Horman eb3b5696f1 tc: flower: support matching on ICMP type and code
Support matching on ICMP type and code.

Example usage:

tc qdisc add dev eth0 ingress

tc filter add dev eth0 protocol ip parent ffff: flower \
	indev eth0 ip_proto icmp type 8 code 0 action drop

tc filter add dev eth0 protocol ipv6 parent ffff: flower \
	indev eth0 ip_proto icmpv6 type 128 code 0 action drop

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2016-12-09 12:46:34 -08:00
Simon Horman 6910d65661 tc: flower: introduce enum flower_endpoint
Introduce enum flower_endpoint and use it instead of a bool
as the type for paramatising source and destination.

This is intended to improve read-ability and provide some type
checking of endpoint parameters.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2016-12-09 12:45:59 -08:00
Daniel Borkmann c7272ca720 bpf: add initial support for attaching xdp progs
Now that we made the BPF loader generic as a library, reuse it
for loading XDP programs as well. This basically adds a minimal
start of a facility for iproute2 to load XDP programs. There
currently only exists the xdp1_user.c sample code in the kernel
tree that sets up netlink directly and an iovisor/bcc front-end.

Since we have all the necessary infrastructure in place already
from tc side, we can just reuse its loader back-end and thus
facilitate migration and usability among the two for people
familiar with tc/bpf already. Sharing maps, performing tail calls,
etc works the same way as with tc. Naturally, once kernel
configuration API evolves, we will extend new features for XDP
here as well, resp. extend dumping of related netlink attributes.

Minimal example:

  clang -target bpf -O2 -Wall -c prog.c -o prog.o
  ip [-force] link set dev em1 xdp obj prog.o       # attaching
  ip [-d] link                                      # dumping
  ip link set dev em1 xdp off                       # detaching

For the dump, intention is that in the first line for each ip
link entry, we'll see "xdp" to indicate that this device has an
XDP program attached. Once we dump some more useful information
via netlink (digest, etc), idea is that 'ip -d link' will then
display additional relevant program information below the "link/
ether [...]" output line for such devices, for example.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2016-12-09 12:44:12 -08:00
Daniel Borkmann fb24802b9c bpf: check for owner_prog_type and notify users when differ
Kernel commit 21116b7068b9 ("bpf: add owner_prog_type and accounted mem
to array map's fdinfo") added support for telling the owner prog type in
case of prog arrays. Give a notification to the user when they differ,
and the program eventually fails to load.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2016-12-09 12:44:12 -08:00
Thomas Graf 0f74d0f3a9 bpf: Fix number of retries when growing log buffer
The log buffer is automatically grown when the verifier output does not
fit into the default buffer size. The number of growing attempts was
not sufficient to reach the maximum buffer size so far.

Perform 9 iterations to reach max and let the 10th one fail.

j:0     i:65536         max:16777215
j:1     i:131072        max:16777215
j:2     i:262144        max:16777215
j:3     i:524288        max:16777215
j:4     i:1048576       max:16777215
j:5     i:2097152       max:16777215
j:6     i:4194304       max:16777215
j:7     i:8388608       max:16777215
j:8     i:16777216      max:16777215

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2016-12-09 12:42:11 -08:00
Roi Dayan 6566ca8cdb devlink: Add option to set and show eswitch inline mode
This is needed for some HWs to do proper macthing and steering.
Possible values are none, link, network, transport.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
2016-12-09 12:41:03 -08:00
Roi Dayan a93b6bb3a2 devlink: Add usage help for eswitch subcommand
Add missing usage help for devlink dev eswitch subcommand.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
2016-12-09 12:40:52 -08:00
Stephen Hemminger 3dd0bb51d7 update kernel headers from net-next
Net-next now closed.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-12-09 12:40:07 -08:00
Stephen Hemminger e6fee79104 Merge branch 'master' into net-next 2016-12-09 12:38:51 -08:00
Stephen Hemminger e49aef96bb update kernel headers 2016-12-09 12:38:35 -08:00
Stephen Hemminger b95e5c55a9 Revert "devlink: Add usage help for eswitch subcommand"
This reverts commit 11f4cd31d2.
2016-12-09 12:37:39 -08:00
Stephen Hemminger d646916993 Revert "devlink: Add option to set and show eswitch inline mode"
This reverts commit b9dcf9c282.

Intended for net-next
2016-12-09 12:37:19 -08:00
Simon Horman 6bd5b80cdc tc: flower: make use of flower_port_attr_type() safe and silent
Make use of flower_port_attr_type() safe:
* flower_port_attr_type() may return a valid index into tb[] or -1.
  Only access tb[] in the case of the former.
* Do not access null entries in tb[]

Also make usage silent - it is valid for ip_proto to be invalid,
for example if it is not specified as part of the filter.

Fixes: a1fb0d4842 ("tc: flower: Support matching on SCTP ports")
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2016-12-05 10:13:26 -08:00
Simon Horman 61dff9ac10 tc: flower: correct name of ip_proto parameter to flower_parse_port()
This corrects a typo.

Fixes: a1fb0d4842 ("tc: flower: Support matching on SCTP ports")
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2016-12-05 10:13:26 -08:00
Simon Horman 6ad7e60c1f tc: flower: document SCTP ip_proto
Add SCTP ip_proto to help text and man page.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2016-12-05 10:13:26 -08:00
Simon Horman 730381fede tc: flower: remove references to eth_type in manpage
Remove references to eth_type and ether_type (spelling error) in
the tc flower manpage.

Also correct formatting of boldface text with whitespace.

Cc: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-12-02 14:59:43 -08:00
Stephen Hemminger 143a704bf8 update kernel headers from net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-12-02 14:54:33 -08:00
Stephen Hemminger f2df31170f Merge branch 'master' into net-next 2016-12-02 14:19:08 -08:00
Simon Horman 1dd0cca7fa ss: initialise variables outside of for loop
Initialise for loops outside of for loops. GCC flags this as being
out of spec unless C99 or C11 mode is used.

With this change the entire tree appears to compile cleanly with -Wall.

$ gcc --version
gcc (Debian 4.9.2-10) 4.9.2
...
$ make
...
ss.c: In function ‘unix_show_sock’:
ss.c:3128:4: error: ‘for’ loop initial declarations are only allowed in C99 or C11 mode
...

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2016-12-02 14:17:09 -08:00
Amir Vadai d57639a475 tc/act_tunnel: Introduce ip tunnel action
This action could be used before redirecting packets to a shared tunnel
device, or when redirecting packets arriving from a such a device.

The 'unset' action is optional. It is used to explicitly unset the
metadata created by the tunnel device during decap. If not used, the
metadata will be released automatically by the kernel.
The 'set' operation, will set the metadata with the specified values for
the encap.

For example, the following flower filter will forward all ICMP packets
destined to 11.11.11.2 through the shared vxlan device 'vxlan0'. Before
redirecting, a metadata for the vxlan tunnel is created using the
tunnel_key action and it's arguments:

$ tc filter add dev net0 protocol ip parent ffff: \
    flower \
      ip_proto 1 \
      dst_ip 11.11.11.2 \
    action tunnel_key set \
      src_ip 11.11.0.1 \
      dst_ip 11.11.0.2 \
      id 11 \
    action mirred egress redirect dev vxlan0

Signed-off-by: Amir Vadai <amir@vadai.me>
2016-12-02 14:12:09 -08:00
Amir Vadai bb9b63b18e tc/cls_flower: Classify packet in ip tunnels
Introduce classifying by metadata extracted by the tunnel device.
Outer header fields - source/dest ip and tunnel id, are extracted from
the metadata when classifying.

For example, the following will add a filter on the ingress Qdisc of shared
vxlan device named 'vxlan0'. To forward packets with outer src ip
11.11.0.2, dst ip 11.11.0.1 and tunnel id 11. The packets will be
forwarded to tap device 'vnet0':

$ tc filter add dev vxlan0 protocol ip parent ffff: \
    flower \
      enc_src_ip 11.11.0.2 \
      enc_dst_ip 11.11.0.1 \
      enc_key_id 11 \
      dst_ip 11.11.11.1 \
    action mirred egress redirect dev vnet0

Signed-off-by: Amir Vadai <amir@vadai.me>
2016-12-02 14:12:09 -08:00
Amir Vadai aab0f61043 libnetlink: Introduce rta_getattr_be*()
Add the utility functions rta_getattr_be16() and rta_getattr_be32(), and
change existing code to use it.

Signed-off-by: Amir Vadai <amir@vadai.me>
2016-12-02 14:12:09 -08:00
Phil Sutter 039b3620cf ss: unix_show: No need to initialize members of calloc'ed structs
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:47 -08:00
Phil Sutter b710a72254 ss: Make sstate_namel local to scan_state()
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:47 -08:00
Phil Sutter 1882c0db02 ss: Make sstate_name local to sock_state_print()
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:47 -08:00
Phil Sutter 96d45daa92 ss: Make unix_state_map local to unix_show()
Also make it const, since there won't be any write access happening.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:47 -08:00
Phil Sutter 2f938ce1fa ss: Get rid of single-fielded struct snmpstat
A struct with only a single field does not make much sense. Besides
that, it was used by print_summary() only.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:47 -08:00
Phil Sutter 6b224dad23 ss: Get rid of useless goto in handle_follow_request()
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:46 -08:00
Phil Sutter b3535dd61d ss: Make slabstat_ids local to get_slabstat()
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:46 -08:00
Phil Sutter 95eafe438a ss: Make some variables function-local
addrp_width and screen_width are used in main() only, so no need to have
them globally available.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:46 -08:00
Phil Sutter b25bad2ffe ss: Make user_ent_hash_build_init local to user_ent_hash_build()
By having it statically defined, there is no need for it to be global.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:46 -08:00
Phil Sutter 86dfa1be4a ss: Make tmr_name local to tcp_timer_print()
It's used only there, so no need to have it globally defined.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:46 -08:00
Phil Sutter 0cb74a8610 ss: Turn generic_proc_open() wrappers into macros
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:46 -08:00
Phil Sutter f25062e9e7 ss: Eliminate unix_use_proc()
This function is used only at a single place anymore, so replace the
call to it by it's content, which makes that specific part of
unix_show() consistent with e.g. tcp_show().

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:46 -08:00
Phil Sutter 2d0e538f3e ss: Drop list traversal from unix_stats_print()
Although this complicates the dedicated procfs-based code path in
unix_show() a bit, it's the only sane way to get rid of unix_show_sock()
output diverging from other socket types in that it prints all socket
details in a new line.

As a side effect, it allows to eliminate all procfs specific code in
the same function.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:46 -08:00
Phil Sutter 5f27ac1db9 ss: introduce proc_ctx_print()
This consolidates identical code in three places. While the function
name is not quite perfect as there is different proc_ctx printing code
in netlink_show_one() as well, I sadly didn't find a more suitable one.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:46 -08:00
Phil Sutter be7e4d20b9 ss: Use sockstat->type in all socket types
Unix sockets used that field already to hold info about the socket type.
By replicating this approach in all other socket types, we can get rid
of protocol parameter in inet_stats_print() and have sock_state_print()
figure things out by itself.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:46 -08:00
Phil Sutter 4519999708 ss: Add missing tab when printing UNIX details
When dumping UNIX sockets and show_details is active but not show_mem
(ss -xne), the socket details are printed without being prefixed by tab.
Fix this by printing the tab character when either one of '-e' or '-m'
has been specified.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:46 -08:00
Phil Sutter 6babc649a9 ss: Drop empty lines in UDP output
When dumping UDP sockets and show_tcpinfo (-i) is active but not
show_mem (-m), print_tcpinfo() does not output anything leading to an
empty line being printed after every socket. Fix this by skipping the
call to print_tcpinfo() and the previous newline printing in that case.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:46 -08:00
Phil Sutter 36df1a6e92 ss: Mark fall through in arg parsing switch()
As there is a certain chance of overlooking this, better add a comment
to draw readers' attention.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:46 -08:00
Yuchung Cheng b6c7fc61fa ss: print new tcp_info fields: busy, rwnd-limited, sndbuf-limited times
Dump some new fields added to tcp_info in v4.10: tcpi_busy_time,
tcpi_rwnd_limited, tcpi_sndbuf_limited.

Example output for a flow busy for 110ms but never measurably limited by
receive window or send buffer:
   busy:110ms

Example output for a flow usually limited by receive window:
   busy:111ms rwnd_limited:101ms(91.0%)

Example output for a flow sometimes limited by send buffer:
   busy:50ms sndbuf_limited:10ms(20.0%)

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
2016-12-01 11:00:28 -08:00
Neal Cardwell 2f579872fb ss: print new tcp_info fields: delivery_rate and app_limited
Dump the new delivery_rate and delivery_rate_app_limited fields that
were added to tcp_info in Linux v4.9.

Example output:
  pacing_rate 65.7Mbps delivery_rate 62.9Mbps

And for the application-limited case this looks like:
  pacing_rate 1031.1Mbps delivery_rate 87.4Mbps app_limited

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
2016-12-01 11:00:28 -08:00
Cyrill Gorcunov 41fe6c34de ss: Add inet raw sockets information gathering via netlink diag interface
unix, tcp, udp[lite], packet, netlink sockets already support diag
interface for their collection and killing. Implement support
for raw sockets.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
2016-12-01 10:55:56 -08:00
Cyrill Gorcunov 9f66764e30 libnetlink: Add test for error code returned from netlink reply
In case if some diag module is not present in the system,
say the kernel is not modern enough, we simply skip the
error code reported. Instead we should check for data
length in NLMSG_DONE and process unsupported case.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
2016-12-01 10:55:56 -08:00
Stephen Hemminger bf9a0aff36 Update kernel headers for XDP and tcp_info
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-12-01 10:52:30 -08:00
Stephen Hemminger d6ad31db57 Merge branch 'master' into net-next 2016-12-01 10:48:05 -08:00
Phil Sutter f5f760b812 man: ip-route.8: Add notes about dropped IPv4 route cache
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-01 10:47:11 -08:00
Stephen Hemminger 328374dcfe Merge branch 'master' into net-next 2016-12-01 10:29:12 -08:00
Roi Dayan b9dcf9c282 devlink: Add option to set and show eswitch inline mode
This is needed for some HWs to do proper macthing and steering.
Possible values are none, link, network, transport.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
2016-11-29 19:17:20 -08:00
Roi Dayan 11f4cd31d2 devlink: Add usage help for eswitch subcommand
Add missing usage help for devlink dev eswitch subcommand.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
2016-11-29 19:17:20 -08:00
Zhang Shengju 6bd1ea28c5 link: add team and team_slave link type
Add missing team and team_slave link type.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
2016-11-29 14:03:00 -08:00
Stephen Hemminger 281db53ff8 l2tp: style cleanup
Make l2tp conform to kernel style guidelines
2016-11-29 13:40:06 -08:00
Asbjørn Sloth Tønnesen 51a9d01aaa man: ip-l2tp.8: document UDP checksum options
Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
2016-11-29 13:31:30 -08:00
Asbjørn Sloth Tønnesen f7982f5c95 l2tp: show tunnel: expose UDP checksum state
Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
2016-11-29 13:31:30 -08:00
Asbjørn Sloth Tønnesen 8a11421a5d l2tp: support sequence numbering
This patch implement and documents the user interface for
sequence numbering.

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
2016-11-29 13:31:30 -08:00
Asbjørn Sloth Tønnesen 35cc6ded4f l2tp: read IPv6 UDP checksum attributes from kernel
In case of an older kernel that doesn't set L2TP_ATTR_UDP_ZERO_CSUM6_{RX,TX}
the old hard-coded value is being preserved, since the attribute flag will be
missing.

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
2016-11-29 13:31:30 -08:00
Asbjørn Sloth Tønnesen c73fad7860 l2tp: fix L2TP_ATTR_UDP_CSUM handling
L2TP_ATTR_UDP_CSUM is read by the kernel as a NLA_FLAG value,
but is validated as a NLA_U8, so we will write it as an u8,
but the value isn't actually being read by the kernel.

It is written by the kernel as a NLA_U8, so we will read as
such.

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
2016-11-29 13:31:30 -08:00
Asbjørn Sloth Tønnesen 4d51b3331e l2tp: fix L2TP_ATTR_{RECV,SEND}_SEQ handling
L2TP_ATTR_RECV_SEQ and L2TP_ATTR_SEND_SEQ are declared as NLA_U8
attributes in the kernel, so let's threat them accordingly.

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
2016-11-29 13:31:30 -08:00
Asbjørn Sloth Tønnesen 31f63e7c42 l2tp: fix integers with too few significant bits
udp6_csum_{tx,rx}, tunnel and session are the only ones
currently used.

recv_seq, send_seq, lns_mode and data_seq are partially
implemented in a useless way.

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
2016-11-29 13:31:30 -08:00
Asbjørn Sloth Tønnesen d0baf5cac8 man: ip-l2tp.8: remove non-existent tunnel parameter name
The name parameter is only valid for sessions, not tunnels.

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
2016-11-29 13:31:30 -08:00
Asbjørn Sloth Tønnesen 222c4dab8e man: ip-l2tp.8: fix l2spec_type documentation
Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
2016-11-29 13:31:30 -08:00
Roman Mashak 98df0c81da tc: distinguish Add/Replace filter operations
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-11-29 13:26:10 -08:00
Daniel Hopf 3a4df03913 macsec: Nr. of packets and octets for macsec tx stats were swapped
Acked-by: Rami Rosen <roszenrami@gmail.com>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Daniel Hopf <daniel.hopf@continental-corporation.com>
2016-11-29 13:22:12 -08:00
Mike Frysinger eca7a74219 ifstat/nstat: fix help output alignment
Some lines use tabs while others use spaces.  Use spaces everywhere.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2016-11-29 13:17:08 -08:00
Stephen Hemminger 2c500a4dc2 libnetlink: style cleanups
Follow kernel style related cleanups:
 * break long lines
 * remove unnecessary void * cast
2016-11-29 13:15:08 -08:00
Zhang Shengju 1b109a30bf libnetlink: reduce size of message sent to kernel
Fixes commit 246f57c4086d99fa ("ip link: Add support for kernel
side filtering").

This patch reduce the size of message sent to kernel space. Before this
patch, for command: 'ip link show', we will sent 1056 bytes. With this
patch, we only need to send 40 bytes.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
2016-11-29 13:03:00 -08:00
Zhang Shengju 2d98dd4821 iproute2: fix the link group name getting error
In the situation where more than one entry live in the same hash bucket,
loop to get the correct one.

Before:
$ cat /etc/iproute2/group
0	default
256     test

$ sudo ip link set group test dummy1

$ ip link show type dummy
11: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group 0 qlen 1000
    link/ether 4e:3b:d3:6c:f0:e6 brd ff:ff:ff:ff:ff:ff
12: dummy1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group test qlen 1000
    link/ether d6:9c:a4:1f:e7:e5 brd ff:ff:ff:ff:ff:ff

After:
$ ip link show type dummy
11: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 4e:3b:d3:6c:f0:e6 brd ff:ff:ff:ff:ff:ff
12: dummy1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group test qlen 1000
    link/ether d6:9c:a4:1f:e7:e5 brd ff:ff:ff:ff:ff:ff

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
2016-11-29 12:48:07 -08:00
david decotigny ba7b97776e iproute2: a non-expected rtnl message is an error 2016-11-29 12:44:30 -08:00
david decotigny 8be2955816 iproute2: avoid exit in case of error.
Be consistent with how non-0 print_route() return values are handled
elesewhere: return -1.
2016-11-29 12:44:30 -08:00
michael-dev@fami-braun.de aa1b44ca77 iproute2: macvlan: add "source" mode
Adjusting iproute2 utility to support new macvlan link type mode called
"source".

Example of commands that can be applied:
  ip link add link eth0 name macvlan0 type macvlan mode source
  ip link set link dev macvlan0 type macvlan macaddr add 00:11:11:11:11:11
  ip link set link dev macvlan0 type macvlan macaddr del 00:11:11:11:11:11
  ip link set link dev macvlan0 type macvlan macaddr flush
  ip -details link show dev macvlan0

Based on previous work of Stefan Gula <steweg@gmail.com>

Signed-off-by: Michael Braun <michael-dev@fami-braun.de>

Cc: steweg@gmail.com

v5:
 - rebase and fix checkpatch

v4:
 - add MACADDR_SET support
 - skip FLAG_UNICAST / FLAG_UNICAST_ALL as this is not upstream
 - fix man page
2016-11-29 12:41:42 -08:00
Daniel Borkmann e42256699c bpf: make tc's bpf loader generic and move into lib
This work moves the bpf loader into the iproute2 library and reworks
the tc specific parts into generic code. It's useful as we can then
more easily support new program types by just having the same ELF
loader backend. Joint work with Thomas Graf. I hacked a rough start
of a test suite to make sure nothing breaks [1] and looks all good.

  [1] https://github.com/borkmann/clsact/blob/master/test_bpf.sh

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
2016-11-29 12:35:32 -08:00
Lorenzo Colitti 82252cdc50 ip: support UID range routing.
- Support adding, deleting and showing IP rules with UID ranges.
- Support querying per-UID routes via "ip route get uid <UID>".

UID range routing was added to net-next in 4fb7450683 ("Merge
branch 'uid-routing'")

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
2016-11-29 12:26:37 -08:00
Stephen Hemminger 512caeb273 tc: flower checkpatch cleanups
break long lines and minor whitespace changes.
2016-11-29 11:48:52 -08:00
Simon Horman a1fb0d4842 tc: flower: Support matching on SCTP ports
Support matching on SCTP ports in the same way that matching
on TCP and UDP ports is already supported.

Example usage:

tc qdisc add dev eth0 ingress

tc filter add dev eth0 protocol ip parent ffff: \
        flower indev eth0 ip_proto sctp dst_port 80 \
        action drop

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2016-11-29 11:44:46 -08:00
Stephen Hemminger 1a97748be4 update net-next headers 2016-11-29 11:43:40 -08:00
Stephen Hemminger 6e2e71e16e update headers based on 4.9-rc7 2016-11-29 11:41:58 -08:00
Stephen Hemminger b932e6f372 tc: cleanup style of qdisc code
Get rid of lingering mismatches with kernel style.
2016-11-29 11:41:58 -08:00
Roman Mashak d42e1444f2 tc: print raw qdisc handle.
This is v2 patch with fixed code indentation.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-11-29 11:41:58 -08:00
Roman Mashak 4b5451c4cd tc: improved usage help for fw classifier.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-11-29 11:41:58 -08:00
Phil Sutter 4fb4a10e12 ipaddress: Print IFLA_VF_QUERY_RSS_EN setting
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-11-29 11:41:58 -08:00
Roman Mashak 7bdcc0d942 tc: updated man page to reflect GET command to retrieve a single filter.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-11-29 11:41:58 -08:00
Stephen Hemminger 468fa020f1 ip: style cleanup
Make code more inline with current kernel style
2016-11-29 11:41:58 -08:00
Phil Sutter ff9463e048 ipaddress: Simplify vf_info parsing
Commit 7b8179c780 ("iproute2: Add new command to ip link to
enable/disable VF spoof check") tried to add support for
IFLA_VF_SPOOFCHK in a backwards-compatible manner, but aparently overdid
it: parse_rtattr_nested() handles missing attributes perfectly fine in
that it will leave the relevant field unassigned so calling code can
just compare against NULL. There is no need to layback from the previous
(IFLA_VF_TX_RATE) attribute to the next to check if IFLA_VF_SPOOFCHK is
present or not. To the contrary, it establishes a potentially incorrect
assumption of these two attributes directly following each other which
may not be the case (although up to now, kernel aligns them this way).

This patch cleans up the code to adhere to the common way of checking
for attribute existence. It has been tested to return correct results
regardless of whether the kernel exports IFLA_VF_SPOOFCHK or not.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Greg Rose <grose@lightfleet.com>
2016-11-29 11:41:58 -08:00
Stephen Hemminger 168d97f97b ss: break really long lines 2016-11-29 11:41:58 -08:00
Phil Sutter f89d46ad63 ss: Add support for SCTP protocol
This makes use of the sctp_diag interface recently added to the kernel.

Joint work with Xin Long who provided the PoC implementation which I
merely polished up a bit.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-11-29 11:41:57 -08:00
Phil Sutter 5dec02d7b4 include: Add linux/sctp.h
Add sanitized UAPI linux/sctp.h header file.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-11-29 11:41:57 -08:00
Paul Blakey d9c3995ab7 tc: flower: Fix usage message
Remove left over usage from removal of eth_type argument.

Fixes: 488b41d020 ('tc: flower no need to specify the ethertype')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
2016-11-12 10:19:06 +03:00
Isaac Boukris 878dadc79d iproute2: ss: escape all null bytes in abstract unix domain socket
Abstract unix domain socket may embed null characters,
these should be translated to '@' when printed by ss the
same way the null prefix is currently being translated.

Signed-off-by: Isaac Boukris <iboukris@gmail.com>
2016-11-12 10:16:24 +03:00
stefan@datenfreihafen.org 8ae2c5382b ip: update link types to show 6lowpan and ieee802.15.4 monitor
Both types have been missing here and thus ip always showed
only the numbers.

Based on a suggestion from Alexander Aring.

Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2016-11-12 10:14:03 +03:00
Shmulik Ladkani 5eca0a3701 tc: m_mirred: Add support for ingress redirect/mirror
So far, only the 'egress' direction was implemented.

Allow specifying 'ingress' as the direction packet appears on the target
interface.

For example, this takes incoming 802.1q frames on veth0 and redirects
them for input on dummy0:

 # tc filter add dev veth0 parent ffff: pref 1 protocol 802.1q basic \
     action mirred ingress redirect dev dummy0

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
2016-10-26 11:20:47 -07:00
Stephen Hemminger e770979cf1 update kernel headers to 4.9-net-next 2016-10-26 11:20:29 -07:00
Stephen Hemminger f3f339e959 cleanup debris from revert
Last revert didn't come out clean.
2016-10-26 11:19:11 -07:00
Stephen Hemminger c07a36c3db Revert "iproute2: macvlan: add "source" mode"
This reverts commit f33b727610.

The upstream changes are not in 4.9
2016-10-26 11:15:09 -07:00
Daniel Borkmann 4710e46ec3 tc, ipt: don't enforce iproute2 dependency on iptables-devel
Since 5cd1adba79 ("Update to current iptables headers") compilation
of iproute2 broke for systems without iptables-devel package [1].
Reason is that even though we fall back to build m_ipt.c, the include
depends on a xtables-version.h header, which only ships with
iptables-devel. Machines not having this package fail compilation with:

    [...]
    CC       m_ipt.o
In file included from ../include/iptables.h:5:0,
                 from m_ipt.c:17:
../include/xtables.h:34:29: fatal error: xtables-version.h: No such file or directory
compilation terminated.
../Config:31: recipe for target 'm_ipt.o' failed
make[1]: *** [m_ipt.o] Error 1

The configure script only barks that package xtables was not found in
the pkg-config search path. The generated Config then only contains f.e.
TC_CONFIG_IPSET. In tc's Makefile we thus fall back to adding m_ipt.o
to TCMODULES. m_ipt.c then includes the local include/iptables.h header
copy, which includes the include/xtables.h copy. Latter then includes
xtables-version.h, which only ships with iptables-devel.

One way to resolve this is to skip this whole mess when pkg-config has
no xtables config available. I've carried something along these lines
locally for a while now, but it's just too annyoing. :/ Build works fine
now also when xtables.pc is not available.

  [1] http://www.spinics.net/lists/netdev/msg366162.html

Fixes: 5cd1adba79 ("Update to current iptables headers")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-10-26 10:58:22 -07:00
Hangbin Liu 7a34b9d098 devlink: Convert conditional in dl_argv_handle_port() to switch()
Discovered by Phil's covscan. The final return statement is never reached.
This is not inherently clear from looking at the code, so change the
conditional to a switch() statement which should clarify this.

CC: Phil Sutter <phil@nwl.cc>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Phil Sutter <phil@nwl.cc>
2016-10-17 05:30:45 -07:00
Nikolay Aleksandrov 9208b4e7c9 bridge: add support for the multicast flood flag
Recently a new per-port flag was added which controls the flooding of
unknown multicast, this patch adds support for controlling it via iproute2.
It also updates the man pages with information about the new flag.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-10-17 05:29:24 -07:00
Jakub Kicinski 87e46a5198 tc: cls_bpf: handle skip_sw and skip_hw flags
Add support for controling hardware offload using (now standard)
skip_sw and skip_hw flags in cls_bpf.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
2016-10-17 05:27:59 -07:00
Nikolay Aleksandrov 660afec25f bridge: vlan: remove wrong stats help
When I did the per-vlan stats iproute2 support, I left out a hunk from a
previous version of the patch that was using a special subcommand "stats".
Since the latest version uses the -s switch remove the help for the stats
subcommand.

Fixes: 7abf5de677 ("bridge: vlan: add support to display per-vlan statistics")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-10-17 05:22:47 -07:00
Stephen Hemminger 7409334b87 ip: macvlan style cleanup
breaklong lines.
2016-10-12 15:23:27 -07:00
michael-dev@fami-braun.de f33b727610 iproute2: macvlan: add "source" mode
Adjusting iproute2 utility to support new macvlan link type mode called
"source".

Example of commands that can be applied:
  ip link add link eth0 name macvlan0 type macvlan mode source
  ip link set link dev macvlan0 type macvlan macaddr add 00:11:11:11:11:11
  ip link set link dev macvlan0 type macvlan macaddr del 00:11:11:11:11:11
  ip link set link dev macvlan0 type macvlan macaddr flush
  ip -details link show dev macvlan0

Based on previous work of Stefan Gula <steweg@gmail.com>

Signed-off-by: Michael Braun <michael-dev@fami-braun.de>

Cc: steweg@gmail.com
2016-10-12 15:22:14 -07:00
Lucas Bates a40995d1c7 man pages: add man page for skbmod action
Signed-off-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-10-12 15:21:55 -07:00
Stephen Hemminger ec2e005fe5 tc_filter: style cleanup
Break long lines and whtespace changes.
2016-10-12 15:21:13 -07:00
Jamal Hadi Salim 120f556d15 tc filters: add support to get individual filters by handle
sudo $TC filter add dev $ETH parent ffff: prio 2 protocol ip \
u32 match u32 0 0 flowid 1:1 \
action ok
sudo $TC filter add dev $ETH parent ffff: prio 1 protocol ip \
u32 match ip protocol 1 0xff flowid 1:10 \
action ok

now dump to see all rules..
$TC -s filter ls dev $ETH parent ffff: protocol ip
 ....
filter pref 1 u32
filter pref 1 u32 fh 801: ht divisor 1
filter pref 1 u32 fh 801::800 order 2048 key ht 801 bkt 0 flowid 1:10  (rule hit 0 success 0)
  match 00010000/00ff0000 at 8 (success 0 )
        action order 1: gact action drop
         random type none pass val 0
         index 6 ref 1 bind 1 installed 4 sec used 4 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

filter pref 2 u32
filter pref 2 u32 fh 800: ht divisor 1
filter pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1  (rule hit 336 success 336)
  match 00000000/00000000 at 0 (success 336 )
        action order 1: gact action pass
         random type none pass val 0
         index 5 ref 1 bind 1 installed 38 sec used 4 sec
        Action statistics:
        Sent 24864 bytes 336 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
 ....

..get filter 801::800
$TC -s filter get dev $ETH parent ffff: protocol ip \
handle 801:0:800 prio 2  u32

 ....
filter parent ffff: protocol ip pref 1 u32 fh 801::800 order 2048 key ht 801 bkt 0 flowid 1:10  (rule hit 260 success 130)
  match 00010000/00ff0000 at 8 (success 130 )
        action order 1: gact action drop
         random type none pass val 0
         index 6 ref 1 bind 1 installed 348 sec used 0 sec
        Action statistics:
        Sent 11440 bytes 130 pkt (dropped 130, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
 ....

..get other one
$TC -s filter get dev $ETH parent ffff: protocol ip \
handle 800:0:800 prio 2  u32

....
filter parent ffff: protocol ip pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1  (rule hit 514 success 514)
  match 00000000/00000000 at 0 (success 514 )
        action order 1: gact action pass
         random type none pass val 0
         index 5 ref 1 bind 1 installed 506 sec used 4 sec
        Action statistics:
        Sent 35544 bytes 514 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
....

..try something that doesnt exist
$TC -s filter get dev $ETH parent ffff: protocol ip  handle 800:0:803 prio 2  u32

.....
RTNETLINK answers: No such file or directory
We have an error talking to the kernel
.....

Note, added NLM_F_ECHO is for backward compatibility. old kernels never
before Eric's patch will not respond without it and newer kernels (after Erics patch)
will ignore it.
In old kernels there is a side effect:
In addition to a response to the GET you will receive an event (if you do tc mon).
But this is still better than what it was before (not working at all).

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-10-12 15:14:47 -07:00
Stephen Hemminger 557b705445 tc: skbmod style cleanup
break long lines
2016-10-12 15:12:51 -07:00
Jamal Hadi Salim 46871dc9c6 man pages: Add tc-ife to Makefile
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-10-12 15:09:52 -07:00
Lucas Bates d491a3480f man pages: update ife action to include tcindex
Signed-off-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-10-12 15:09:52 -07:00
Jamal Hadi Salim da65128998 actions: add skbmod action
This action is intended to be an upgrade from a usability perspective
from pedit (as well as operational debugability).
Compare this:

sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \
u32 match ip protocol 1 0xff flowid 1:2 \
action pedit munge offset -14 u8 set 0x02 \
    munge offset -13 u8 set 0x15 \
    munge offset -12 u8 set 0x15 \
    munge offset -11 u8 set 0x15 \
    munge offset -10 u16 set 0x1515 \
    pipe

to:

sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \
u32 match ip protocol 1 0xff flowid 1:2 \
action skbmod dmac 02:15:15:15:15:15

Or worse, try to debug a policy with destination mac, source mac and
etherype. Then make that a hundred rules and you'll get my point.

The most important ethernet use case at the moment is when redirecting or
mirroring packets to a remote machine. The dst mac address needs a re-write
so that it doesn't get dropped or confuse an interconnecting (learning) switch
or dropped by a target machine (which looks at the dst mac).

In the future common use cases on pedit can be migrated to this action
(as an example different fields in ip v4/6, transports like tcp/udp/sctp
etc). For this first cut, this allows modifying basic ethernet header.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-10-12 15:09:52 -07:00
Craig Dillabaugh 883c6708e4 action gact: list pipe as a valid action
Signed-off-by: Craig Dillabaugh <cdillaba@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-10-12 15:09:52 -07:00
Jamal Hadi Salim 8da6ff35cd actions ife: Introduce encoding and decoding of tcindex metadata
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-10-12 15:09:52 -07:00
Roman Mashak 1b600f4b54 ife: improve help text
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-10-12 15:09:52 -07:00
Roman Mashak 57ee4430f9 ife: print prio, mark and hash as unsigned
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-10-12 15:09:52 -07:00
Roman Mashak 9a56cca3f3 ife action: allow specifying index in hex
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-10-12 15:09:52 -07:00
Stephen Hemminger e147161b1a ip: iprule style cleanup
Trivial whitespace cleanup to iprule
2016-10-09 19:29:24 -07:00
Hangbin Liu ca89c52143 ip rule: add selector support
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2016-10-09 19:25:59 -07:00
Hangbin Liu cb294a1de6 ip rule: merge ip rule flush and list, save together
iprule_flush() and iprule_list_or_save() both call function
rtnl_wilddump_request() and rtnl_dump_filter(). So merge them
together just like other files do.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2016-10-09 19:25:59 -07:00
Stephen Hemminger 6773bcc227 iplink: cleanup style errors
Fix long strings causing checkpatch warnings
2016-10-09 19:24:38 -07:00
Moshe Shemesh 56e9f0ab19 ip link: Add support to configure SR-IOV VF to vlan protocol 802.1ad (VST QinQ)
Introduce a new API that exposes a list of vlans per VF (IFLA_VF_VLAN_LIST),
giving the ability for user-space application to specify it for the VF as
an option to support 802.1ad (VST QinQ).

We introduce struct vf_vlan_info, which extends struct vf_vlan and adds
an optional VF VLAN proto parameter.
Default VLAN-protocol is 802.1Q.

Add IFLA_VF_VLAN_LIST in addition to IFLA_VF_VLAN to keep backward
compatibility with older kernel versions.

Suitable ip link tool command examples:
 - Set vf vlan protocol 802.1ad (S-TAG)
	ip link set eth0 vf 1 vlan 100 proto 802.1ad
 - Set vf vlan S-TAG and vlan C-TAG (VST QinQ)
	ip link set eth0 vf 1 vlan 100 proto 802.1ad vlan 30 proto 802.1Q
 - Set vf to VST (802.1Q) mode
	ip link set eth0 vf 1 vlan 100 proto 802.1Q
 - Or by omitting the new parameter (backward compatible)
	ip link set eth0 vf 1 vlan 100

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
2016-10-09 19:17:15 -07:00
Eric Dumazet 39f8caeb96 tc: fq: display unthrottle latency
In linux-4.9 fq packet scheduler got a new stat :

unthrottle_latency in nano second units.

Gives a good indication of system load or timer implementation
latencies.

Signed-off-by: Eric Dumazet <edumazet@google.com>
2016-10-09 19:15:13 -07:00
Shmulik Ladkani 4654173e90 tc: m_vlan: Add vlan modify action
The 'vlan modify' action allows to replace an existing 802.1q tag
according to user provided settings.
It accepts same arguments as the 'vlan push' action.

For example, this replaces vid 6 with vid 5:

 # tc filter add dev veth0 parent ffff: pref 1 protocol 802.1q \
      basic match 'meta(vlan mask 0xfff eq 6)' \
      action vlan modify id 5 continue

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
2016-10-09 19:11:34 -07:00
Nikolay Aleksandrov 590bf22a34 ipmroute: add support for age dumping
Add support to dump the mroute cache entry age if the show_stats (-s)
switch is provided.
Example:
$ ip -s mroute
(0.0.0.0, 239.10.10.10)          Iif: eth0       Oifs: eth0
  0 packets, 0 bytes, Age  245.44

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-10-09 19:09:31 -07:00
Stephen Hemminger b96306f8d9 Merge branch 'master' into net-next 2016-10-09 19:04:50 -07:00
Stephen Hemminger 63ec17a3da v4.8.0 2016-10-09 19:00:11 -07:00
Anton Aksola e29a8e0537 iproute2: build nsid-name cache only for commands that need it
The calling of netns_map_init() before command parsing introduced
a performance issue with large number of namespaces.

As commands such as add, del and exec do not need to iterate through
/var/run/netns it would be good not no build the cache before executing
these commands.

Example:
unpatched:
time seq 1 1000 | xargs -n 1 ip netns add

real    0m16.832s
user    0m1.350s
sys    0m15.029s

patched:
time seq 1 1000 | xargs -n 1 ip netns add

real    0m3.859s
user    0m0.132s
sys    0m3.205s

Signed-off-by: Anton Aksola <aakso@iki.fi>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2016-10-09 18:56:47 -07:00
Stephen Hemminger d99272470a update headers from pre 4.9 (net-next) 2016-10-09 18:55:58 -07:00
Stephen Hemminger d54e3ab985 Merge branch 'master' into net-next 2016-10-09 18:53:52 -07:00
Sushma Sitaram 58d93d0030 tc: f_u32: Fill in 'linkid' provided by user
Currently, 'linkid' input by the user is parsed but 'handle' is appended to the netlink message.

# tc filter add dev enp1s0f1 protocol ip parent ffff: prio 99 u32 ht 800: \
	order 1 link 1: offset at 0 mask 0f00 shift 6 plus 0 eat match ip \
	protocol 6 ff

resulted in:
filter protocol ip pref 99 u32 fh 800::1 order 1 key ht 800 bkt 0
  match 00060000/00ff0000 at 8
    offset 0f00>>6 at 0  eat

This patch results in:
filter protocol ip pref 99 u32 fh 800::1 order 1 key ht 800 bkt 0 link 1:
  match 00060000/00ff0000 at 8
    offset 0f00>>6 at 0  eat

Signed-off-by Sushma Sitaram: Sushma Sitaram <sushma.sitaram@intel.com>
2016-10-09 18:51:00 -07:00
anuradhak afd3921ea9 bridge: Fix garbled json output seen if a vlan filter is specified
json objects were started but not completed if the fdb vlan did not
match the specified filter vlan.

Sample output:
$ bridge -j fdb show vlan 111
[{
        "mac": "44:38:39:00:69:88",
        "dev": "br0",
        "vlan": 111,
        "master": "br0",
        "state": "permanent"
    }
]
$ bridge -j fdb show vlan 100
[]
$

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2016-10-09 18:49:32 -07:00
Igor Ryzhov 6cf2609ddb fix netlink message length checks
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
2016-10-09 18:48:30 -07:00
Hangbin Liu 22a84711f4 ip: Use specific slave id
The original bond/bridge/vrf and slaves use same id, which make people
confused. Use bond/bridge/vrf_slave as id name will make code more clear.

Acked-by: Phil Sutter <psutter@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2016-09-22 16:39:55 -07:00
Hangbin Liu 77089b583a misc/ss: tcp cwnd should be unsigned
tcp->snd_cwd is a u32, but ss treats it like a signed int. This may
results in negative bandwidth calculations.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Phil Sutter <phil@nwl.cc>
2016-09-22 16:39:08 -07:00
Hangbin Liu d1f338b318 misc/ss: tcp cwnd should be unsigned
tcp->snd_cwd is a u32, but ss treats it like a signed int. This may
results in negative bandwidth calculations.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Phil Sutter <phil@nwl.cc>
2016-09-22 16:38:22 -07:00
Lorenzo Colitti ec75249b14 ss: Support displaying and filtering on socket marks.
This allows the user to dump sockets with a given mark (via
"fwmark = 0x1234/0x1234" or "fwmark = 12345", etc.) , and to
display the socket marks of dumped sockets.

The relevant kernel commits are: d545caca827b ("net: inet: diag:
expose the socket mark to privileged processes.") and
- a52e95abf772 ("net: diag: allow socket bytecode filters to
match socket marks")

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
2016-09-22 16:34:40 -07:00
Alexei Starovoitov 4bfe682536 iptnl: add support for collect_md flag in IPv4 and IPv6 tunnels
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2016-09-21 16:36:24 -07:00
Stephen Hemminger a9c990b6d7 Merge branch 'master' into net-next 2016-09-21 16:35:56 -07:00
Jiri Benc 1f4c51c0e4 tunnels: use macros for IPv6 address comparison
Replace open coded comparison of IPv6 addresses with appropriate macros.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2016-09-21 16:35:05 -07:00
Liping Zhang c44003f7e7 ipmonitor: fix ip monitor can't work when NET_NS is not enabled
In ip monitor, netns_map_init will check getnsid is supported or not.
But when /proc/self/ns/net does not exist, we just print out error
messages and exit. So user cannot use ip monitor anymore when
CONFIG_NET_NS is disabled:
  # ip monitor
  open("/proc/self/ns/net"): No such file or directory

If open "/proc/self/ns/net" failed, set have_rtnl_getnsid to false.

Fixes: d652ccbf81 ("netns: allow to dump and monitor nsid")
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2016-09-21 16:32:44 -07:00
Neal Cardwell 2f0f9aef94 ss: output TCP BBR diag information
Dump useful TCP BBR state information from a struct tcp_bbr_info that
was grabbed using the inet_diag API.

We tolerate info that is shorter or longer than expected, in case the
kernel is older or newer than the ss binary. We simply print the
minimum of what is expected from the kernel and what is provided from
the kernel. We use the same trick as that used for struct tcp_info:
when the info from the kernel is shorter than we hoped, we pad the end
with zeroes, and don't print fields if they are zero.

The BBR output looks like:
  bbr:(bw:1.2Mbps,mrtt:18.965,pacing_gain:2.88672,cwnd_gain:2.88672)

The motivation here is to be consistent with DCTCP, which looks like:
  dctcp(ce_state:23,alpha:23,ab_ecn:23,ab_tot:23)

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
2016-09-21 16:29:35 -07:00
Stephen Hemminger 16c2a51dc4 update bpf.h 2016-09-21 16:28:56 -07:00
Hangbin Liu bffb68b6c2 ip route: check ftell, fseek return value
ftell() may return -1 in error case, which is not handled and
therefore pass a negative offset to fseek(). The return code of
fseek() is also not checked.

Reported-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2016-09-20 09:52:35 -07:00
Stephen Hemminger 36923f4e69 Merge branch 'master' into net-next 2016-09-20 09:50:53 -07:00
Mahesh Bandewar b7c1488034 ip: (ipvlan) introduce L3s mode
The new mode 'l3s' can be set like -

  ip link add link <master> dev <IPvlan-slave> type ipvlan mode l3s

  e.g. ip link add link eth0 dev ipvl0 type ipvlan mode l3s

Also did some trivial code restructuring.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
2016-09-20 09:50:45 -07:00
Davide Caratti f20f5f7990 macsec: fix input range of 'icvlen' parameter
the maximum possible ICV length in a MACsec frame is 16 octects, not 32:
fix get_icvlen() accordingly, so that a proper error message is displayed
in case input 'icvlen' is greater than 16.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
2016-09-20 09:48:26 -07:00
Jiri Benc e2cfe5501f vxlan: group address requires net device
This is now enforced in the kernel, check also in iproute to get a better
error message.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2016-09-20 09:46:41 -07:00
Davide Caratti 087dec7fcf tc: don't accept qdisc 'handle' greater than ffff
since get_qdisc_handle() truncates the input value to 16 bit, return an
error and prompt "invalid qdisc ID" in case input 'handle' parameter needs
more than 16 bit to be stored.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
2016-09-20 09:44:59 -07:00
Phil Sutter 003f0fde69 iproute: fix documentation for ip rule scan order
Looks like the real issue is missing definition of priority.
2016-09-20 09:36:45 -07:00
Stephen Hemminger e8a67bc4cf update kernel headers from net-next 2016-09-20 09:31:42 -07:00
Stephen Hemminger f3af3074fd tipc: cleanup style issues
Fix style issues reported by checkpatch.
2016-09-20 09:25:42 -07:00
Parthasarathy Bhuvaragan 76fee71bf3 tipc: update man page for link monitor
Add description for the new link monitor commands.

Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
2016-09-20 09:13:09 -07:00
Parthasarathy Bhuvaragan 5b748f094b tipc: add link monitor list
In this commit, we list the monitor attributes. By default it lists
the attributes for all bearers, otherwise the specified bearer.

A sample usage is shown below:
$ tipc link monitor list

bearer eth:data0
node          status monitored generation applied_node_status [non_applied_node:status]
1.1.1         up     direct    16         UU []
1.1.2         up     direct    16         UU []
1.1.3         up     direct    16         UU []

bearer eth:data1
node          status monitored generation applied_node_status [non_applied_node:status]
1.1.1         up     direct    2          UU []
1.1.2         up     direct    3          UU []
1.1.3         up     direct    3          UU []

$ tipc link monitor list media eth device data0

bearer eth:data0
node          status monitored generation applied_node_status [non_applied_node:status]
1.1.1         up     direct    16         UU []
1.1.2         up     direct    16         UU []
1.1.3         up     direct    16         UU []

$ tipc link monitor list -h
Usage: tipc monitor list [ media MEDIA ARGS...]

MEDIA
 udp                   - User Datagram Protocol
 ib                    - Infiniband
 eth                   - Ethernet

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
2016-09-20 09:13:09 -07:00
Parthasarathy Bhuvaragan d2ba0b0bbb tipc: refractor bearer to facilitate link monitor
In this commit, we:
1. Export print_bearer_media()
2. Move the bearer name handling from nl_add_bearer_name() into
   a new function cmd_get_unique_bearer_name().

These exported functions will be used by link monitor used in
subsequent commits.

Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
2016-09-20 09:13:09 -07:00
Parthasarathy Bhuvaragan 80e9807dff tipc: add link monitor summary
The monitor summary command prints the basic attributes
specific to the local node.
A sample usage is shown below:
$ tipc link monitor summary
bearer eth:data0
    table_generation 15
    cluster_size 8
    algorithm overlapping-ring

bearer eth:data1
    table_generation 15
    cluster_size 8
    algorithm overlapping-ring

$ tipc link monitor summary -h
Usage: tipc monitor summary

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
2016-09-20 09:13:09 -07:00
Parthasarathy Bhuvaragan 7da7ef9bd8 tipc: add link monitor get threshold
The command prints the monitor activation threshold.
A sample usage is shown below:
$ tipc link monitor get threshold
32

$ tipc link monitor get -h
Usage: tipc monitor get PPROPERTY

PROPERTIES
 threshold      - Get monitor activation threshold

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
2016-09-20 09:13:09 -07:00
Parthasarathy Bhuvaragan b33a69005e tipc: add link monitor set threshold
The command sets the activation threshold for the new
cluster ring supervision.
A sample usage is shown below:
$ tipc link monitor set threshold 4

$ tipc link monitor set -h
Usage: tipc monitor set PPROPERTY

PROPERTIES
 threshold SIZE - Set activation threshold for monitor

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
2016-09-20 09:13:09 -07:00
Parthasarathy Bhuvaragan 5f944e47ea tipc: remove dead code
remove dead code and a newline.

Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
2016-09-20 09:13:09 -07:00
Stephen Hemminger 6831acc8ef Merge branch 'master' into net-next 2016-09-20 09:13:03 -07:00
Phil Sutter 31a29009c5 iproute: fix documentation for ip rule scan order
Hi,

On Thu, Sep 08, 2016 at 11:59:55AM +0200, Michal Kubecek wrote:
> On Thu, Sep 01, 2016 at 09:04:54AM -0700, Stephen Hemminger wrote:
> > On Tue, 30 Aug 2016 17:32:52 -0700
> > Iskren Chernev <iskren@imo.im> wrote:
> >
> > > From 416f45b62f33017d19a9b14e7b0179807c993cbe Mon Sep 17 00:00:00 2001
> > > From: Iskren Chernev <iskren@imo.im>
> > > Date: Tue, 30 Aug 2016 17:08:54 -0700
> > > Subject: [PATCH bug-fix] iproute: fix documentation for ip rule scan order
> > >
> > > ---
> > >  man/man8/ip-rule.8 | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/man/man8/ip-rule.8 b/man/man8/ip-rule.8
> > > index 1774ae3..3508d80 100644
> > > --- a/man/man8/ip-rule.8
> > > +++ b/man/man8/ip-rule.8
> > > @@ -93,7 +93,7 @@ Each policy routing rule consists of a
> > >  .B selector
> > >  and an
> > >  .B action predicate.
> > > -The RPDB is scanned in order of decreasing priority. The selector
> > > +The RPDB is scanned in order of increasing priority. The selector
> > >  of each rule is applied to {source address, destination address,
> > > incoming
> > >  interface, tos, fwmark} and, if the selector matches the packet,
> > >  the action is performed. The action predicate may return with success.
> > > --
> > > 2.4.5
> >
> > Applied
>
> I'm sorry I didn't notice before but this just reverts the change done
> by commit 4957250166 ("iproute2: clarification of various man8 pages").
> IMHO the problem is that both versions are equally confusing as the word
> "priority" can be understood in two different senses.
>
> How about more explicit formulation, e.g.
>
>   ... in order of decreasing logical priority (i.e. increasing numeric
>   values).
>
> Would that be better?

Looks like the real issue is missing definition of priority. What about
this:
2016-09-20 09:08:56 -07:00
Thomas Graf 113fab78e4 tuntap: Add name attribute to usage text
Signed-off-by: Thomas Graf <tgraf@suug.ch>
2016-09-08 14:31:33 -07:00
Hangbin Liu 12f92e2e4f gitignore: Ignore 'tags' file generated by ctags
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2016-09-08 14:30:44 -07:00
Hangbin Liu 45a0dc164a nstat: add sctp snmp support
SCTP module was not load by default. But this should be OK since we will not
load table if fdopen() failed, also opening the proc file won't load SCTP
kernel module.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2016-09-08 14:29:36 -07:00
Stephen Hemminger 88ba11bc08 Merge branch 'master' into net-next 2016-09-01 09:11:10 -07:00
Stephen Hemminger 3cad6e5f25 update kernel headers from 4.8-rc4 2016-09-01 09:10:43 -07:00
Davide Caratti 0330f49ea0 macsec: fix byte ordering on input/display of 'sci'
use get_be64() in place of get_u64() when parsing input 'sci' parameter,
so that 'sci' can be entered using network byte order regardless the
endianness of target system; use ntohll() when printing out 'sci'. While
at it, improve documentation of 'sci' in ip-link.8.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2016-09-01 09:08:50 -07:00
Davide Caratti d0baa1389f man: ip.8: add missing 'macsec' item to OBJECT list
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2016-09-01 09:08:50 -07:00
Davide Caratti 5898bd667a macsec: fix input of 'port', improve documentation of 'address'
remove hardcoded base 10 parsing of 'port' parameter, update man page
and fix usage() functions as well. Fix misleading line in man page that
theoretically allowed specifying 'port' keyword right after 'sci' keyword.
Provide documentation of 'address' parameter in man pages and in usage()
functions as well.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2016-09-01 09:08:50 -07:00
Stephen Hemminger cc28aad1e6 ip: iptuntap cleanup
Minor whitespace changes
2016-09-01 09:03:40 -07:00
Stephen Hemminger ae810982cc remove useless return statement
Get rid of:
void foo() {
...
	return;
}
2016-09-01 08:44:20 -07:00
Iskren Chernev 4a564d914d iproute: fix documentation for ip rule scan order 2016-09-01 08:41:37 -07:00
Andrey Jr. Melnikov 67a990b811 iproute: disallow ip rule del without parameters
Disallow run `ip rule del` without any parameter to avoid delete any first
rule from table.

Signed-off-by: Andrey Jr. Melnikov <temnota.am@gmail.com>
2016-09-01 08:41:37 -07:00
Hannes Frederic Sowa 567e696072 iptuntap: show processes using tuntap interface
Show which processes are using which tun/tap devices, e.g.:

$ ip -d tuntap
tun0: tun
	Attached to processes: vpnc(9531)
vnet0: tap vnet_hdr
	Attached to processes: qemu-system-x86(10442)
virbr0-nic: tap UNKNOWN_FLAGS:800
	Attached to processes:

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
2016-09-01 08:41:37 -07:00
Nikolay Aleksandrov 56e3eb4c34 ip: route: fix multicast route dumps
If we have multicast routes and do ip route show table all we'll get the
following output:
 ...
 multicast ???/32 from ???/32  table default  proto static  iif eth0
The "???" are because the rtm_family is set to RTNL_FAMILY_IPMR instead
(or RTNL_FAMILY_IP6MR for ipv6). Add a simple workaround that returns the
real family based on the rtm_type (always RTN_MULTICAST for ipmr routes)
and the rtm_family. Similar workaround is already used in ipmroute, and
we can use this helper there as well.

After the patch the output is:
multicast 239.10.10.10/32 from 0.0.0.0/32  table default  proto static  iif eth0

Also fix a minor whitespace error and switch to tabs.

Reported-by: Satish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-09-01 08:41:37 -07:00
Stephen Hemminger 98a2af1d40 Merge branch 'master' into net-next 2016-09-01 08:39:15 -07:00
Hadar Hen Zion 0e43ed9dea tc: m_vlan: Add priority option to push vlan action
The current vlan push action supports only vid and protocol options.
Add priority option.

Example script that adds vlan push action with vid and priority:

tc filter add dev veth0 protocol ip parent ffff: \
	flower \
	indev veth0 \
	action vlan push id 100 priority 5

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
2016-09-01 08:38:41 -07:00
Hadar Hen Zion 745d917260 tc: flower: Introduce vlan support
Classification according to vlan id and vlan priority.

Example script that adds vlan filter:

 # add ingress qdisc
 tc qdisc add dev ens4f0 ingress

 # add a flower filter with vlan id and priority classification
 tc filter add dev ens4f0 protocol 802.1Q parent ffff: \
	flower \
		indev ens4f0 \
		vlan_ethtype ipv4 \
		vlan_id 100 \
		vlan_prio 3 \
	action vlan pop

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
2016-09-01 08:38:41 -07:00
Yotam Gigi 0501294bca tc: man: Add man entry for the matchall classifier.
In addition to providing information about the mathcall filter and its
configurations, the man entry contains examples for creating port
mirorring entries.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-09-01 08:37:01 -07:00
Yotam Gigi d5cbf3ff05 tc: Add support for the matchall traffic classifier.
The matchall classifier matches every packet and allows the user to apply
actions on it. In addition, it supports the skip_sw and skip_hw (as can
be found on u32 and flower filter) that direct the kernel to skip the
software/hardware processing of the actions.

This filter is very useful in usecases where every packet should be
matched. For example, packet mirroring (SPAN) can be setup very easily
using that filter.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-09-01 08:37:01 -07:00
Richard Alpe ed81deabf2 tipc: add the ability to get UDP bearer options
In this patch we introduce the ability to get UDP specific bearer
options such as remoteip, remoteport, localip and localport.

After some discussions on tipc-discussion on how to handle media
specific options we agreed to pass them after the media.

For media generic bearer options we already do:
$ tipc bearer get OPTION media MEDIA name|device NAME|DEVICE

For the UDP media specific bearer options we introduce in this path:
$ tipc bearer get media udp name NAME OPTION
such as
$ tipc bearer get media udp name NAME remoteip

This allows bash-completion to tab complete only appropriate options,
it makes more logical sense and it scales better. Even though it might
look a little different to the user.

In order to use the existing option parsing framework to do this we
add a flag (OPT_KEY) to the option parsing function.

If the UDP bearer has multiple remoteip addresses associated with it
(replicast) we handle the TIPC_NLA_UDP_MULTI_REMOTEIP flag and send
a TIPC_NL_UDP_GET_REMOTEIP query transparently to the user.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
2016-09-01 08:34:35 -07:00
Richard Alpe f1f40cf77d tipc: introduce bearer add for remoteip
Introduce the ability to add remote IP addresses to an existing UDP
bearer. On the kernel side, adding a "remoteip" to an existing bearer
puts the bearer in "replicast" mode where TIPC multicast messages are
send out to each configured remoteip using unicast. This is required
for TIPC UDP bearers to work in environments where IP multicast is
disabled.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
2016-09-01 08:34:35 -07:00
Stephen Hemminger 3cc0b954b0 Merge branch 'master' into net-next 2016-08-29 11:19:03 -07:00
Stephen Hemminger 7c55d7700f devlink: whitespace cleanup
Break long lines
2016-08-29 11:17:38 -07:00
Or Gerlitz f57856fab2 devlink: Add e-switch support
Implement kernel devlink e-switch interface. Currently we allow
to get and set the device e-switch mode.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
2016-08-29 11:15:54 -07:00
Stephen Hemminger 60ba41ad7f update TIPC headers 2016-08-29 11:06:02 -07:00
Nikolay Aleksandrov 7abf5de677 bridge: vlan: add support to display per-vlan statistics
This patch adds support for the stats argument to the bridge
vlan command which will display the per-vlan statistics and the device
each vlan belongs to with its flags. The supported command filtering
options are dev and vid. Also the man page is updated to explain the new
option.
The patch uses the new RTM_GETSTATS interface with a filter_mask to dump
all bridges and ports vlans. Later we can add support for using the
per-device dump and filter it in the kernel instead.

Example:
$ bridge -s vlan show
port             vlan id
br0               1 Egress Untagged
                    RX: 2536 bytes 20 packets
                    TX: 2536 bytes 20 packets
                  101
                    RX: 43158 bytes 50 packets
                    TX: 43158 bytes 50 packets
eth1              1 Egress Untagged
                    RX: 2536 bytes 20 packets
                    TX: 2536 bytes 20 packets
                  100
                    RX: 0 bytes 0 packets
                    TX: 0 bytes 0 packets
                  101
                    RX: 43158 bytes 50 packets
                    TX: 43158 bytes 50 packets
                  102
                    RX: 16897 bytes 93 packets
                    TX: 0 bytes 0 packets

The format is the same as bridge vlan show but with stats, even though
under the hood the calls done to the kernel are different.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-08-29 10:58:40 -07:00
Stephen Hemminger f7708201f8 Merge branch 'master' into net-next 2016-08-29 10:57:02 -07:00
Roman Mashak 27d2b08e23 police: bug fix man page
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-08-29 10:54:40 -07:00
Roman Mashak 3de88c4b47 police: improve usage message
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-08-29 10:54:40 -07:00
Roman Mashak cef49e514a police: add extra space to improve police result printing
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-08-29 10:54:40 -07:00
Phil Sutter 7cc7cb8a88 ip-route: Prevent some double spaces in output
The code is a bit messy, as it starts with space after text and at some
point switches to space before text. But either way, printing space
before *and* after text almost certainly leads to printing more
whitespace than necessary.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-08-29 10:35:33 -07:00
Richard Alpe 535194a172 tipc: add peer remove functionality
This enables a user to remove an offline peer from the kernel data
structures. This could for example be useful when deliberately scaling
in peer nodes in a cloud environment.

This functionality was first merged in:
f9dec657e4 (Richard Alpe tipc: add peer remove functionality)

And later backed out (as the kernel counterpart was held up) in:
385caeb13b (Stephen Hemminger Revert "tipc: add peer remove functionality")

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
2016-08-29 10:33:24 -07:00
Stephen Hemminger 380656f8c4 update headers to 4.8-rc2 net-next 2016-08-25 08:49:07 -07:00
Stephen Hemminger 9f9e2bb88e update BPF headers 2016-08-25 08:46:25 -07:00
Jamal Hadi Salim 06be01f75d tc classifiers: Modernize tcindex classifier
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-08-22 10:08:00 -07:00
Eric Dumazet 1acd208c0b ip: report IFLA_GSO_MAX_SIZE and IFLA_GSO_MAX_SEGS
kernel support for these attributes was added in linux-4.6

Signed-off-by: Eric Dumazet <edumazet@google.com>
2016-08-22 10:03:57 -07:00
Gustavo Zacarias 6b376ebd6e ss: fix build with musl libc
UINT_MAX usage requires limits.h, so include it.

Signed-off-by: Gustavo Zacarias <gustavo@zacarias.com.ar>
2016-08-22 10:02:12 -07:00
Xin Long c85703bb9f ip route: restore_handler should check tb[RTA_PREFSRC] for local networks
Prior to this patch, If one route entry's RTA_PREFSRC and RTA_GATEWAY
both were NULL, it was supposed to be restored ONLY as a local address.

But as it didn't check tb[RTA_PREFSRC] when restoring local networks,
rtattr_cmp would return a success if it was NULL, this route entry would
be restored again as a local network.

This patch is to add tb[RTA_PREFSRC] check when restoring local networks.

Fixes: 74af8dd962 ("ip route: restore route entries in correct order")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Tested-by: Phil Sutter <phil@nwl.cc>
2016-08-18 14:54:08 -07:00
Sabrina Dubroca 9423a324bf ila: show usage even if the module is not available
Currently, the `ip ila` command tries to initialize a genl context
even when we just want to see the help for the command, which doesn't
require to talk to the kernel at all.

Delay genl initialization, which can fail if the module isn't loaded,
until the point where we will actually need it.

Fixes: ec71cae0bb ("ila: Support for configuring ila to use netfilter hook")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
2016-08-17 14:00:28 -07:00
Sabrina Dubroca d240a0e174 fou: show usage even if the module is not available
Currently, the `ip fou` command tries to initialize a genl context even
when we just want to see the help for the command, which doesn't require
to talk to the kernel at all.

Delay genl initialization, which can fail if the module isn't loaded,
until the point where we will actually need it.

Fixes: 6928747b6e ("ip fou: Support to configure foo-over-udp RX")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
2016-08-17 14:00:22 -07:00
Sabrina Dubroca 688f9aa4f2 macsec: show usage even if the module is not available
Currently, the `ip macsec` command tries to initialize a genl context
even when we just want to see the help for the command, which doesn't
require to talk to the kernel at all.

Delay genl initialization, which can fail if the module isn't loaded,
until the point where we will actually need it.

Fixes: b26fc590ce ("ip: add MACsec support")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
2016-08-17 13:59:52 -07:00
Sabrina Dubroca 2b68cb77cd libgenl: introduce genl_init_handle
All users of genl have the same code to open a genl socket and resolve
the family for their specific protocol.  Introduce a helper to initialize
the handle, and use it in all the genl code.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
2016-08-17 13:59:21 -07:00
Phil Sutter 08c0466b11 ip-link: add missing {min,max}_tx_rate to help text
These vf options are described in man page already, they're just missing
in help output.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-08-17 13:54:31 -07:00
Richard Alpe 50afc4dbf8 tipc: refactor bearer identification
Introduce a generic function (nl_add_bearer_name()) that identifies a
bearer and adds it to an existing netlink message. This reduces code
complexity and makes the code a little bit easier to maintain.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
2016-08-17 13:52:07 -07:00
Richard Alpe ff77557957 tipc: fix UDP bearer synopsis
Local ip is not required to identify a UDP bearer and shouldn't be
passed to bearer disable, set or get. In this patch we remove the
localip entry from the synopsis of these functions.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
2016-08-17 13:52:07 -07:00
Tom Herbert 2d01b393f4 ipila: Fixed unitialized variables
Initialize locator and locator_match to zero and only do
addattr if they have been set.

Signed-off-by: Tom Herbert <tom@herbertland.com>
2016-08-17 13:51:18 -07:00
Phil Sutter 7e33b09331 man: ip-link.8: Document missing geneve options
This adds missing documentation of geneve type options:

- dstport
- external
- udpcsum
- udp6zerocsumtx
- udp6zerocsumrx

The bits for the last three was just copy and pasted from vxlan section.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-08-12 12:58:53 -07:00
Tom Herbert 8bd31d8db5 fou: Allowing configuring IPv6 listener
Signed-off-by: Tom Herbert <tom@herbertland.com>
2016-08-12 12:51:18 -07:00
Tom Herbert 0b2fbb7358 gre6: Support for fou encapsulation
Signed-off-by: Tom Herbert <tom@herbertland.com>
2016-08-12 12:51:18 -07:00
Tom Herbert 73516e128a ip6tnl: Support for fou encapsulation
Signed-off-by: Tom Herbert <tom@herbertland.com>
2016-08-12 12:51:18 -07:00
Tom Herbert ec71cae0bb ila: Support for configuring ila to use netfilter hook
Signed-off-by: Tom Herbert <tom@herbertland.com>
2016-08-12 12:50:15 -07:00
Tom Herbert ed67f83806 ila: Support for checksum neutral translation
Add configuration of ila LWT tunnels for checksum mode including
checksum neutral translation.

Signed-off-by: Tom Herbert <tom@herbertland.com>
2016-08-12 12:49:30 -07:00
WANG Cong 6fcf36c9c6 tc: fix a misleading failure
Before this patch:

 # ./tc/tc actions add action drop index 11
 RTNETLINK answers: File exists
 We have an error talking to the kernel
 Command "(null)" is unknown, try "tc actions help".

After this patch:

 # ./tc/tc actions add action drop index 11
 RTNETLINK answers: File exists
 We have an error talking to the kernel

Cc: Stephen Hemminger <shemming@brocade.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
2016-08-09 11:18:14 -07:00
Stephen Hemminger 592990ebf4 Merge branch 'net-next' 2016-08-09 11:14:47 -07:00
Roopa Prabhu e40d6b2b90 bridge: print_vlan: add missing check for json instance
Also initialize vlan_flags

Fixes: d82a49ce85 ("bridge: add json support for bridge vlan show")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2016-08-08 10:51:12 -07:00
Stephen Hemminger 80c4bc6a3e Merge branch 'master' into net-next 2016-08-08 09:27:28 -07:00
Stephen Hemminger 1b2594935e Merge branch 'master' into net-next 2016-08-08 08:57:22 -07:00
Stephen Hemminger dc00db9e84 update kernel headers 2016-08-08 08:51:22 -07:00
Roopa Prabhu 39c64df1c7 bridge: print_vlan: add missing check for json instance
Also initialize vlan_flags

Fixes: d82a49ce85 ("bridge: add json support for bridge vlan show")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2016-08-08 08:44:16 -07:00
Stephen Hemminger 6d54c41580 Merge branch 'master' into net-next 2016-08-08 08:44:07 -07:00
Stephen Hemminger 79f5bf17a5 Merge branch 'master' into net-next 2016-07-25 08:21:00 -07:00
Stephen Hemminger ba91cd9d86 include: update net-next XDP headers 2016-07-20 12:24:59 -07:00
Stephen Hemminger ac75d5cd36 Merge branch 'master' into net-next 2016-07-20 12:21:42 -07:00
Stephen Hemminger a951428058 update headers files to current net-next 2016-07-15 11:55:14 -07:00
Stephen Hemminger ba5783cbf3 Merge branch 'master' into net-next 2016-07-15 11:49:41 -07:00
Stephen Hemminger d5b62e6439 Merge branch 'master' into net-next 2016-07-06 21:29:32 -07:00
Stephen Hemminger 2a5855706a Merge branch 'master' into net-next 2016-07-06 21:23:26 -07:00
Jamal Hadi Salim 1d1e0fd29b actions: skbedit add support for mod-ing skb pkt_type
I'll make a formal submission sans the header when the kernel patches
makes it in. This version is for someone who wants to play around with
the net-next kernel patches i sent

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-07-06 21:15:44 -07:00
Stephen Hemminger 4824bb4151 update kernel header (4.7 net-next) 2016-07-06 21:14:57 -07:00
Stephen Hemminger f62f952fad Merge branch 'master' into net-next 2016-06-30 17:31:37 -07:00
Stephen Hemminger 131351086e Merge branch 'master' into net-next 2016-06-27 11:30:06 -07:00
Stephen Hemminger 3b6d9ab2e2 update kernel headers (net-next) 2016-06-21 11:29:20 -07:00
Stephen Hemminger 7d057fc292 Merge branch 'master' into net-next 2016-06-21 11:28:32 -07:00
Stephen Hemminger 76d0aa1f3b Merge branch 'master' into net-next 2016-06-16 09:43:07 -07:00
Stephen Hemminger 74951b2d07 Merge branch 'master' into net-next 2016-06-14 18:05:49 -07:00
Stephen Hemminger d831cc7c00 iprule: whitespace cleanup
Cleanup long lines, and indentation issues.
Use rta_getattru32 rather than cast to unsigned.
2016-06-14 17:20:02 -07:00
David Ahern 8c92e12277 ip rule: Add support for l3mdev rules
Kernel commit 96c63fa7393d ("net: Add l3mdev rule") added support for
the FRA_L3MDEV attribute. The attribute enables use of l3mdev rules
which mean 'get table id from l3 master device'. This patch adds
support to iproute2 to show, add and delete rules with this attribute.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-06-14 16:53:20 -07:00
Stephen Hemminger ec09118749 fib_rules.h update header file
Add new L3MDEV (clone from net-next)
2016-06-14 16:33:48 -07:00
Stephen Hemminger 5c05c1d4a2 Merge branch 'master' into net-next 2016-06-14 16:33:24 -07:00
Simon Horman 6f1aded9d0 iproute2: correct port in FOU/GRE example
This resolves what appears to be a typo.

Cc: Tom Herbert <tom@herbertland.com>
Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2016-06-08 09:40:07 -07:00
Stephen Hemminger c68780826d minor header update from net-next 2016-06-08 09:39:03 -07:00
703 changed files with 100375 additions and 31897 deletions

130
.clang-format Normal file
View File

@ -0,0 +1,130 @@
# SPDX-License-Identifier: GPL-2.0
#
# clang-format configuration file. Intended for clang-format >= 4.
#
# For more information, see:
#
# Documentation/process/clang-format.rst
# https://clang.llvm.org/docs/ClangFormat.html
# https://clang.llvm.org/docs/ClangFormatStyleOptions.html
#
---
AccessModifierOffset: -4
AlignAfterOpenBracket: Align
AlignConsecutiveAssignments: false
AlignConsecutiveDeclarations: false
#AlignEscapedNewlines: Left # Unknown to clang-format-4.0
AlignOperands: true
AlignTrailingComments: false
AllowAllParametersOfDeclarationOnNextLine: false
AllowShortBlocksOnASingleLine: false
AllowShortCaseLabelsOnASingleLine: false
AllowShortFunctionsOnASingleLine: None
AllowShortIfStatementsOnASingleLine: false
AllowShortLoopsOnASingleLine: false
AlwaysBreakAfterDefinitionReturnType: None
AlwaysBreakAfterReturnType: None
AlwaysBreakBeforeMultilineStrings: false
AlwaysBreakTemplateDeclarations: false
BinPackArguments: true
BinPackParameters: true
BraceWrapping:
AfterClass: false
AfterControlStatement: false
AfterEnum: false
AfterFunction: true
AfterNamespace: true
AfterObjCDeclaration: false
AfterStruct: false
AfterUnion: false
#AfterExternBlock: false # Unknown to clang-format-5.0
BeforeCatch: false
BeforeElse: false
IndentBraces: false
#SplitEmptyFunction: true # Unknown to clang-format-4.0
#SplitEmptyRecord: true # Unknown to clang-format-4.0
#SplitEmptyNamespace: true # Unknown to clang-format-4.0
BreakBeforeBinaryOperators: None
BreakBeforeBraces: Custom
#BreakBeforeInheritanceComma: false # Unknown to clang-format-4.0
BreakBeforeTernaryOperators: false
BreakConstructorInitializersBeforeComma: false
#BreakConstructorInitializers: BeforeComma # Unknown to clang-format-4.0
BreakAfterJavaFieldAnnotations: false
BreakStringLiterals: false
ColumnLimit: 80
CommentPragmas: '^ IWYU pragma:'
#CompactNamespaces: false # Unknown to clang-format-4.0
ConstructorInitializerAllOnOneLineOrOnePerLine: false
ConstructorInitializerIndentWidth: 8
ContinuationIndentWidth: 8
Cpp11BracedListStyle: false
DerivePointerAlignment: false
DisableFormat: false
ExperimentalAutoDetectBinPacking: false
#FixNamespaceComments: false # Unknown to clang-format-4.0
# Taken from:
# git grep -h '^#define [^[:space:]]*for_each[^[:space:]]*(' include/ \
# | sed "s,^#define \([^[:space:]]*for_each[^[:space:]]*\)(.*$, - '\1'," \
# | sort | uniq
ForEachMacros:
- 'list_for_each_entry'
- 'list_for_each_entry_safe'
- 'mnl_attr_for_each_nested'
- 'hlist_for_each'
- 'hlist_for_each_safe'
- 'hlist_for_each_entry'
#IncludeBlocks: Preserve # Unknown to clang-format-5.0
IncludeCategories:
- Regex: '.*'
Priority: 1
IncludeIsMainRegex: '(Test)?$'
IndentCaseLabels: false
#IndentPPDirectives: None # Unknown to clang-format-5.0
IndentWidth: 8
IndentWrappedFunctionNames: false
JavaScriptQuotes: Leave
JavaScriptWrapImports: true
KeepEmptyLinesAtTheStartOfBlocks: false
MacroBlockBegin: ''
MacroBlockEnd: ''
MaxEmptyLinesToKeep: 1
NamespaceIndentation: Inner
#ObjCBinPackProtocolList: Auto # Unknown to clang-format-5.0
ObjCBlockIndentWidth: 8
ObjCSpaceAfterProperty: true
ObjCSpaceBeforeProtocolList: true
# Taken from git's rules
#PenaltyBreakAssignment: 10 # Unknown to clang-format-4.0
PenaltyBreakBeforeFirstCallParameter: 30
PenaltyBreakComment: 10
PenaltyBreakFirstLessLess: 0
PenaltyBreakString: 10
PenaltyExcessCharacter: 100
PenaltyReturnTypeOnItsOwnLine: 60
PointerAlignment: Right
ReflowComments: false
SortIncludes: false
#SortUsingDeclarations: false # Unknown to clang-format-4.0
SpaceAfterCStyleCast: false
SpaceAfterTemplateKeyword: true
SpaceBeforeAssignmentOperators: true
#SpaceBeforeCtorInitializerColon: true # Unknown to clang-format-5.0
#SpaceBeforeInheritanceColon: true # Unknown to clang-format-5.0
SpaceBeforeParens: ControlStatements
#SpaceBeforeRangeBasedForLoopColon: true # Unknown to clang-format-5.0
SpaceInEmptyParentheses: false
SpacesBeforeTrailingComments: 1
SpacesInAngles: false
SpacesInContainerLiterals: false
SpacesInCStyleCastParentheses: false
SpacesInParentheses: false
SpacesInSquareBrackets: false
Standard: Cpp03
TabWidth: 8
UseTab: Always
...

16
.gitignore vendored
View File

@ -1,6 +1,7 @@
# locally generated
Config
static-syms.h
config.*
Config
*.o
*.a
*.so
@ -10,6 +11,7 @@ Config
# cscope
cscope.*
ncscope.*
tags
TAGS
# git files that we don't want to ignore even it they are dot-files
@ -35,13 +37,5 @@ series
# tests
testsuite/results
testsuite/iproute2/iproute2-this
# doc files generated at runtime
doc/*.aux
doc/*.log
doc/*.toc
doc/*.ps
doc/*.dvi
doc/*.html
doc/*.pdf
doc/*.out
testsuite/tools/generate_nlmsg
testsuite/tests/ip/link/dev_wo_vf_rate.nl

22
.mailmap Normal file
View File

@ -0,0 +1,22 @@
#
# This list is used by git-shortlog to fix a few botched name translations
# in the git archive, either because the author's full name was messed up
# and/or not always written the same way, making contributions from the
# same person appearing not to be so or badly displayed.
#
# Format
# Full name <goodaddress> <badaddress>
Steve Wise <larrystevenwise@gmail.com> <swise@opengridcomputing.com>
Steve Wise <larrystevenwise@gmail.com> <swise@chelsio.com>
Stephen Hemminger <stephen@networkplumber.org> <sthemmin@microsoft.com>
Stephen Hemminger <stephen@networkplumber.org> <shemming@brocade.com>
Stephen Hemminger <stephen@networkplumber.org> <stephen.hemminger@vyatta.com>
Stephen Hemminger <stephen@networkplumber.org> <shemminger@vyatta.com>
Stephen Hemminger <stephen@networkplumber.org> <shemminger>
Stephen Hemminger <stephen@networkplumber.org> <shemminger@linux-foundation.org>
Stephen Hemminger <stephen@networkplumber.org> <shemminger@osdl.org>
Stephen Hemminger <stephen@networkplumber.org> <osdl.org!shemminger>
Stephen Hemminger <stephen@networkplumber.org> <osdl.net!shemminger>
David Ahern <dsahern@gmail.com> <dsa@cumulusnetworks.com>

101
Makefile
View File

@ -1,12 +1,26 @@
# SPDX-License-Identifier: GPL-2.0
# Top level Makefile for iproute2
-include config.mk
ifeq ("$(origin V)", "command line")
VERBOSE = $(V)
endif
ifndef VERBOSE
VERBOSE = 0
endif
ifeq ($(VERBOSE),0)
MAKEFLAGS += --no-print-directory
endif
PREFIX?=/usr
LIBDIR?=$(PREFIX)/lib
SBINDIR?=/sbin
CONFDIR?=/etc/iproute2
NETNS_RUN_DIR?=/var/run/netns
NETNS_ETC_DIR?=/etc/netns
DATADIR?=$(PREFIX)/share
HDRDIR?=$(PREFIX)/include/iproute2
DOCDIR?=$(DATADIR)/doc/iproute2
MANDIR?=$(DATADIR)/man
ARPDDIR?=/var/lib/arpd
@ -23,72 +37,101 @@ ifneq ($(SHARED_LIBS),y)
DEFINES+= -DNO_SHARED_LIBS
endif
DEFINES+=-DCONFDIR=\"$(CONFDIR)\"
DEFINES+=-DCONFDIR=\"$(CONFDIR)\" \
-DNETNS_RUN_DIR=\"$(NETNS_RUN_DIR)\" \
-DNETNS_ETC_DIR=\"$(NETNS_ETC_DIR)\"
#options for decnet
ADDLIB+=dnet_ntop.o dnet_pton.o
#options for AX.25
ADDLIB+=ax25_ntop.o
#options for ipx
ADDLIB+=ipx_ntop.o ipx_pton.o
#options for AX.25
ADDLIB+=rose_ntop.o
#options for mpls
ADDLIB+=mpls_ntop.o mpls_pton.o
#options for NETROM
ADDLIB+=netrom_ntop.o
CC := gcc
HOSTCC ?= $(CC)
DEFINES += -D_GNU_SOURCE
# Turn on transparent support for LFS
DEFINES += -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE
CCOPTS = -O2
CCOPTS = -O2 -pipe
WFLAGS := -Wall -Wstrict-prototypes -Wmissing-prototypes
WFLAGS += -Wmissing-declarations -Wold-style-definition -Wformat=2
CFLAGS := $(WFLAGS) $(CCOPTS) -I../include $(DEFINES) $(CFLAGS)
CFLAGS := $(WFLAGS) $(CCOPTS) -I../include -I../include/uapi $(DEFINES) $(CFLAGS)
YACCFLAGS = -d -t -v
SUBDIRS=lib ip tc bridge misc netem genl tipc devlink man
SUBDIRS=lib ip tc bridge misc netem genl tipc devlink rdma dcb man vdpa
LIBNETLINK=../lib/libnetlink.a ../lib/libutil.a
LIBNETLINK=../lib/libutil.a ../lib/libnetlink.a
LDLIBS += $(LIBNETLINK)
all: Config
all: config.mk
@set -e; \
for i in $(SUBDIRS); \
do echo; echo $$i; $(MAKE) $(MFLAGS) -C $$i; done
do echo; echo $$i; $(MAKE) -C $$i; done
Config:
sh configure $(KERNEL_INCLUDE)
.PHONY: clean clobber distclean check cscope version
help:
@echo "Make Targets:"
@echo " all - build binaries"
@echo " clean - remove products of build"
@echo " distclean - remove configuration and build"
@echo " install - install binaries on local machine"
@echo " check - run tests"
@echo " cscope - build cscope database"
@echo " version - update version"
@echo ""
@echo "Make Arguments:"
@echo " V=[0|1] - set build verbosity level"
config.mk:
@if [ ! -f config.mk -o configure -nt config.mk ]; then \
sh configure $(KERNEL_INCLUDE); \
fi
install: all
install -m 0755 -d $(DESTDIR)$(SBINDIR)
install -m 0755 -d $(DESTDIR)$(CONFDIR)
install -m 0755 -d $(DESTDIR)$(ARPDDIR)
install -m 0755 -d $(DESTDIR)$(DOCDIR)/examples
install -m 0755 -d $(DESTDIR)$(DOCDIR)/examples/diffserv
install -m 0644 README.iproute2+tc $(shell find examples -maxdepth 1 -type f) \
$(DESTDIR)$(DOCDIR)/examples
install -m 0644 $(shell find examples/diffserv -maxdepth 1 -type f) \
$(DESTDIR)$(DOCDIR)/examples/diffserv
@for i in $(SUBDIRS) doc; do $(MAKE) -C $$i install; done
install -m 0755 -d $(DESTDIR)$(HDRDIR)
@for i in $(SUBDIRS); do $(MAKE) -C $$i install; done
install -m 0644 $(shell find etc/iproute2 -maxdepth 1 -type f) $(DESTDIR)$(CONFDIR)
install -m 0755 -d $(DESTDIR)$(BASH_COMPDIR)
install -m 0644 bash-completion/tc $(DESTDIR)$(BASH_COMPDIR)
install -m 0644 bash-completion/devlink $(DESTDIR)$(BASH_COMPDIR)
install -m 0644 include/bpf_elf.h $(DESTDIR)$(HDRDIR)
snapshot:
echo "static const char SNAPSHOT[] = \""`date +%y%m%d`"\";" \
> include/SNAPSHOT.h
version:
echo "static const char version[] = \""`git describe --tags --long`"\";" \
> include/version.h
clean:
@for i in $(SUBDIRS) doc; \
do $(MAKE) $(MFLAGS) -C $$i clean; done
@for i in $(SUBDIRS) testsuite; \
do $(MAKE) -C $$i clean; done
clobber:
touch Config
$(MAKE) $(MFLAGS) clean
rm -f Config cscope.*
touch config.mk
$(MAKE) clean
rm -f config.mk cscope.*
distclean: clobber
check: all
$(MAKE) -C testsuite
$(MAKE) -C testsuite alltests
@if command -v man >/dev/null 2>&1; then \
echo "Checking manpages for syntax errors..."; \
$(MAKE) -C man check; \
else \
echo "man not installed, skipping checks for syntax errors."; \
fi
cscope:
cscope -b -q -R -Iinclude -sip -slib -smisc -snetem -stc

33
README
View File

@ -1,40 +1,39 @@
This is a set of utilities for Linux networking.
Information:
http://www.linuxfoundation.org/collaborate/workgroups/networking/iproute2
https://wiki.linuxfoundation.org/networking/iproute2
Download:
http://www.kernel.org/pub/linux/utils/net/iproute2/
Repository:
git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git
Stable version repository:
git://git.kernel.org/pub/scm/network/iproute2/iproute2.git
Development repository:
git://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git
How to compile this.
--------------------
1. libdbm
arpd needs to have the db4 development libraries. For Debian
users this is the package with a name like libdb4.x-dev.
arpd needs to have the berkeleydb development libraries. For Debian
users this is the package with a name like libdbX.X-dev.
DBM_INCLUDE points to the directory with db_185.h which
is the include file used by arpd to get to the old format Berkeley
database routines. Often this is in the db-devel package.
2. make
The makefile will automatically build a Config file which
contains whether or not ATM is available, etc.
The makefile will automatically build a config.mk file which
contains definitions of libraries that may or may not be available
on the system such as: ATM, ELF, MNL, and SELINUX.
3. To make documentation, cd to doc/ directory , then
look at start of Makefile and set correct values for
PAGESIZE=a4 , ie: a4 , letter ... (string)
PAGESPERPAGE=2 , ie: 1 , 2 ... (numeric)
and make there. It assumes, that latex, dvips and psnup
are in your path.
3. include/uapi
4. This package includes matching sanitized kernel headers because
the build environment may not have up to date versions. See Makefile
if you have special requirements and need to point at different
kernel include files.
This package includes matching sanitized kernel headers because
the build environment may not have up to date versions. See Makefile
if you have special requirements and need to point at different
kernel include files.
Stephen Hemminger
stephen@networkplumber.org

View File

@ -1,33 +0,0 @@
Here are a few quick points about DECnet support...
o iproute2 is the tool of choice for configuring the DECnet support for
Linux. For many features, it is the only tool which can be used to
configure them.
o No name resolution is available as yet, all addresses must be
entered numerically.
o Remember to set the hardware address of the interface using:
ip link set ethX address xx:xx:xx:xx:xx:xx
(where xx:xx:xx:xx:xx:xx is the MAC address for your DECnet node
address)
if your Ethernet card won't listen to more than one unicast
mac address at once. If the Linux DECnet stack doesn't talk to
any other DECnet nodes, then check this with tcpdump and if its
a problem, change the mac address (but do this _before_ starting
any other network protocol on the interface)
o Whilst you can use ip addr add to add more than one DECnet address to an
interface, don't expect addresses which are not the same as the
kernels node address to work properly with 2.4 kernels. This should
be fine with 2.6 kernels as the routing code has been extensively
modified and improved.
o The DECnet support is currently self contained. It does not depend on
the libdnet library.
Steve Whitehouse <steve@chygwyn.com>

View File

@ -4,12 +4,15 @@ development. Most new features require a kernel and a utility component.
Please submit both to the Linux networking mailing list
<netdev@vger.kernel.org>
The current source is in the git repository:
git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git
The current source for the stable version is in the git repository:
git://git.kernel.org/pub/scm/network/iproute2/iproute2.git
The master branch contains the source corresponding to the current
code in the mainline Linux kernel (ie follows Linus). The net-next
branch is a temporary branch that tracks the code intended for the
next release; it corresponds with networking development branch in
the kernel.
The development git repository is available at the following address:
git://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git
The stable repository contains the source corresponding to the
current code in the Linux networking tree (net), which in turn is
aligned on the mainline Linux kernel (ie follows Linus).
The iproute2-next repository tracks the code intended for the next
release; it corresponds with networking development tree (net-next)
in the kernel.

View File

@ -1,95 +0,0 @@
I. About the distribution tables
The table used for "synthesizing" the distribution is essentially a scaled,
translated, inverse to the cumulative distribution function.
Here's how to think about it: Let F() be the cumulative distribution
function for a probability distribution X. We'll assume we've scaled
things so that X has mean 0 and standard deviation 1, though that's not
so important here. Then:
F(x) = P(X <= x) = \int_{-inf}^x f
where f is the probability density function.
F is monotonically increasing, so has an inverse function G, with range
0 to 1. Here, G(t) = the x such that P(X <= x) = t. (In general, G may
have singularities if X has point masses, i.e., points x such that
P(X = x) > 0.)
Now we create a tabular representation of G as follows: Choose some table
size N, and for the ith entry, put in G(i/N). Let's call this table T.
The claim now is, I can create a (discrete) random variable Y whose
distribution has the same approximate "shape" as X, simply by letting
Y = T(U), where U is a discrete uniform random variable with range 1 to N.
To see this, it's enough to show that Y's cumulative distribution function,
(let's call it H), is a discrete approximation to F. But
H(x) = P(Y <= x)
= (# of entries in T <= x) / N -- as Y chosen uniformly from T
= i/N, where i is the largest integer such that G(i/N) <= x
= i/N, where i is the largest integer such that i/N <= F(x)
-- since G and F are inverse functions (and F is
increasing)
= floor(N*F(x))/N
as desired.
II. How to create distribution tables (in theory)
How can we create this table in practice? In some cases, F may have a
simple expression which allows evaluating its inverse directly. The
Pareto distribution is one example of this. In other cases, and
especially for matching an experimentally observed distribution, it's
easiest simply to create a table for F and "invert" it. Here, we give
a concrete example, namely how the new "experimental" distribution was
created.
1. Collect enough data points to characterize the distribution. Here, I
collected 25,000 "ping" roundtrip times to a "distant" point (time.nist.gov).
That's far more data than is really necessary, but it was fairly painless to
collect it, so...
2. Normalize the data so that it has mean 0 and standard deviation 1.
3. Determine the cumulative distribution. The code I wrote creates a table
covering the range -10 to +10, with granularity .00005. Obviously, this
is absurdly over-precise, but since it's a one-time only computation, I
figured it hardly mattered.
4. Invert the table: for each table entry F(x) = y, make the y*TABLESIZE
(here, 4096) entry be x*TABLEFACTOR (here, 8192). This creates a table
for the ("normalized") inverse of size TABLESIZE, covering its domain 0
to 1 with granularity 1/TABLESIZE. Note that even with the granularity
used in creating the table for F, it's possible not all the entries in
the table for G will be filled in. So, make a pass through the
inverse's table, filling in any missing entries by linear interpolation.
III. How to create distribution tables (in practice)
If you want to do all this yourself, I've provided several tools to help:
1. maketable does the steps 2-4 above, and then generates the appropriate
header file. So if you have your own time distribution, you can generate
the header simply by:
maketable < time.values > header.h
2. As explained in the other README file, the somewhat sleazy way I have
of generating correlated values needs correction. You can generate your
own correction tables by compiling makesigtable and makemutable with
your header file. Check the Makefile to see how this is done.
3. Warning: maketable, makesigtable and especially makemutable do
enormous amounts of floating point arithmetic. Don't try running
these on an old 486. (NIST Net itself will run fine on such a
system, since in operation, it just needs to do a few simple integral
calculations. But getting there takes some work.)
4. The tables produced are all normalized for mean 0 and standard
deviation 1. How do you know what values to use for real? Here, I've
provided a simple "stats" utility. Give it a series of floating point
values, and it will return their mean (mu), standard deviation (sigma),
and correlation coefficient (rho). You can then plug these values
directly into NIST Net.

View File

@ -1,123 +0,0 @@
iproute2+tc*
It's the first release of Linux traffic control engine.
NOTES.
* csz scheduler is inoperational at the moment, and probably
never will be repaired but replaced with h-pfq scheduler.
* To use "fw" classifier you will need ipfwchains patch.
* No manual available. Ask me, if you have problems (only try to guess
answer yourself at first 8)).
Micro-manual how to start it the first time
-------------------------------------------
A. Attach CBQ to eth1:
tc qdisc add dev eth1 root handle 1: cbq bandwidth 10Mbit allot 1514 cell 8 \
avpkt 1000 mpu 64
B. Add root class:
tc class add dev eth1 parent 1:0 classid 1:1 cbq bandwidth 10Mbit rate 10Mbit \
allot 1514 cell 8 weight 1Mbit prio 8 maxburst 20 avpkt 1000
C. Add default interactive class:
tc class add dev eth1 parent 1:1 classid 1:2 cbq bandwidth 10Mbit rate 1Mbit \
allot 1514 cell 8 weight 100Kbit prio 3 maxburst 20 avpkt 1000 split 1:0 \
defmap c0
D. Add default class:
tc class add dev eth1 parent 1:1 classid 1:3 cbq bandwidth 10Mbit rate 8Mbit \
allot 1514 cell 8 weight 800Kbit prio 7 maxburst 20 avpkt 1000 split 1:0 \
defmap 3f
etc. etc. etc. Well, it is enough to start 8) The rest can be guessed 8)
Look also at more elaborated example, ready to start rsvpd,
in rsvp/cbqinit.eth1.
Terminology and advices about setting CBQ parameters may be found in Sally Floyd
papers.
Pairs X:Y are class handles, X:0 are qdisc handles.
weight should be proportional to rate for leaf classes
(I choosed it ten times less, but it is not necessary)
defmap is bitmap of logical priorities served by this class.
E. Another qdiscs are simpler. F.e. let's join TBF on class 1:2
tc qdisc add dev eth1 parent 1:2 tbf rate 64Kbit buffer 5Kb/8 limit 10Kb
F. Look at all that we created:
tc qdisc ls dev eth1
tc class ls dev eth1
G. Install "route" classifier on root of cbq and map destination from realm
1 to class 1:2
tc filter add dev eth1 parent 1:0 protocol ip prio 100 route to 1 classid 1:2
H. Assign routes to 10.11.12.0/24 to realm 1
ip route add 10.11.12.0/24 dev eth1 via whatever realm 1
etc. The same thing can be made with rules.
I still did not test ipchains, but they should work too.
Setup and code example of BPF classifier and action can be found under
examples/bpf/, which should explain everything for getting started.
Setup of rsvp and u32 classifiers is more hairy.
If you read RSVP specs, you will understand how rsvp classifier
works easily. What's about u32... That's example:
#! /bin/sh
TC=/home/root/tc
# Setup classifier root on eth1 root (it is cbq)
$TC filter add dev eth1 parent 1:0 prio 5 protocol ip u32
# Create hash table of 256 slots with ID 1:
$TC filter add dev eth1 parent 1:0 prio 5 handle 1: u32 divisor 256
# Add to 6th slot of hash table rule to select tcp/telnet to 193.233.7.75
# direct it to class 1:4 and prescribe to fall to best effort,
# if traffic violate TBF (32kbit,5K)
$TC filter add dev eth1 parent 1:0 prio 5 u32 ht 1:6: \
match ip dst 193.233.7.75 \
match tcp dst 0x17 0xffff \
flowid 1:4 \
police rate 32kbit buffer 5kb/8 mpu 64 mtu 1514 index 1
# Add to 1th slot of hash table rule to select icmp to 193.233.7.75
# direct it to class 1:4 and prescribe to fall to best effort,
# if traffic violate TBF (10kbit,5K)
$TC filter add dev eth1 parent 1:0 prio 5 u32 ht 1:: \
sample ip protocol 1 0xff \
match ip dst 193.233.7.75 \
flowid 1:4 \
police rate 10kbit buffer 5kb/8 mpu 64 mtu 1514 index 2
# Lookup hash table, if it is not fragmented frame
# Use protocol as hash key
$TC filter add dev eth1 parent 1:0 prio 5 handle ::1 u32 ht 800:: \
match ip nofrag \
offset mask 0x0F00 shift 6 \
hashkey mask 0x00ff0000 at 8 \
link 1:
Alexey Kuznetsov
kuznet@ms2.inr.ac.ru

View File

@ -1,81 +0,0 @@
lnstat - linux networking statistics
(C) 2004 Harald Welte <laforge@gnumonks.org
======================================================================
This tool is a generalized and more feature-complete replacement for the old
'rtstat' program.
In addition to routing cache statistics, it supports any kind of statistics
the linux kernel exports via a file in /proc/net/stat. In a stock 2.6.9
kernel, this is
per-protocol neighbour cache statistics
(ipv4, ipv6, atm, decnet)
routing cache statistics
(ipv4)
connection tracking statistics
(ipv4)
Please note that lnstat will adopt to any additional statistics that might be
added to the kernel at some later point
I personally always like examples more than any reference documentation, so I
list the following examples. If somebody wants to do a manpage, feel free
to send me a patch :)
EXAMPLES:
In order to get a list of supported statistics files, you can run
lnstat -d
It will display something like
/proc/net/stat/arp_cache:
1: entries
2: allocs
3: destroys
[...]
/proc/net/stat/rt_cache:
1: entries
2: in_hit
3: in_slow_tot
You can now select the files/keys you are interested by something like
lnstat -k arp_cache:entries,rt_cache:in_hit,arp_cache:destroys
arp_cach|rt_cache|arp_cach|
entries| in_hit|destroys|
6| 6| 0|
6| 0| 0|
6| 2| 0|
You can specify the interval (e.g. 10 seconds) by:
lnstat -i 10
You can specify to only use one particular statistics file:
lnstat -f ip_conntrack
You can specify individual field widths
lnstat -k arp_cache:entries,rt_cache:entries -w 20,8
You can specify not to print a header at all
lnstat -s 0
You can specify to print a header only at start of the program
lnstat -s 1
You can specify to print a header at start and every 20 lines:
lnstat -s 20
You can specify the number of samples you want to take (e.g. 5):
lnstat -c 5

1006
bash-completion/devlink Normal file

File diff suppressed because it is too large Load Diff

View File

@ -2,6 +2,12 @@
# Copyright 2016 6WIND S.A.
# Copyright 2016 Quentin Monnet <quentin.monnet@6wind.com>
QDISC_KIND=' choke codel bfifo pfifo pfifo_head_drop fq fq_codel gred hhf \
mqprio multiq netem pfifo_fast pie fq_pie red rr sfb sfq tbf atm \
cbq drr dsmark hfsc htb prio qfq '
FILTER_KIND=' basic bpf cgroup flow flower fw route rsvp tcindex u32 matchall '
ACTION_KIND=' gact mirred bpf sample '
# Takes a list of words in argument; each one of them is added to COMPREPLY if
# it is not already present on the command line. Returns no value.
_tc_once_attr()
@ -20,6 +26,26 @@ _tc_once_attr()
done
}
# Takes a list of words in argument; each one of them is added to COMPREPLY if
# it is not already present on the command line from the provided index. Returns
# no value.
_tc_once_attr_from()
{
local w subcword found from=$1
shift
for w in $*; do
found=0
for (( subcword=$from; subcword < ${#words[@]}-1; subcword++ )); do
if [[ $w == ${words[subcword]} ]]; then
found=1
break
fi
done
[[ $found -eq 0 ]] && \
COMPREPLY+=( $( compgen -W "$w" -- "$cur" ) )
done
}
# Takes a list of words in argument; adds them all to COMPREPLY if none of them
# is already present on the command line. Returns no value.
_tc_one_of_list()
@ -33,6 +59,21 @@ _tc_one_of_list()
COMPREPLY+=( $( compgen -W "$*" -- "$cur" ) )
}
# Takes a list of words in argument; adds them all to COMPREPLY if none of them
# is already present on the command line from the provided index. Returns no
# value.
_tc_one_of_list_from()
{
local w subcword from=$1
shift
for w in $*; do
for (( subcword=$from; subcword < ${#words[@]}-1; subcword++ )); do
[[ $w == ${words[subcword]} ]] && return 1
done
done
COMPREPLY+=( $( compgen -W "$*" -- "$cur" ) )
}
# Returns "$cur ${cur}arg1 ${cur}arg2 ..."
_tc_expand_units()
{
@ -261,7 +302,7 @@ _tc_qdisc_options()
;;
gred)
_tc_once_attr 'setup vqs default grio vq prio limit min max avpkt \
burst probability bandwidth'
burst probability bandwidth ecn harddrop'
return 0
;;
hhf)
@ -282,6 +323,15 @@ _tc_qdisc_options()
_tc_once_attr 'limit target tupdate alpha beta'
_tc_one_of_list 'bytemode nobytemode'
_tc_one_of_list 'ecn noecn'
_tc_one_of_list 'dq_rate_estimator no_dq_rate_estimator'
return 0
;;
fq_pie)
_tc_once_attr 'limit flows target tupdate \
alpha beta quantum memory_limit ecn_prob'
_tc_one_of_list 'ecn noecn'
_tc_one_of_list 'bytemode nobytemode'
_tc_one_of_list 'dq_rate_estimator no_dq_rate_estimator'
return 0
;;
red)
@ -345,11 +395,44 @@ _tc_bpf_options()
return 0
}
# Complete with options names for filter actions.
# This function is recursive, thus allowing multiple actions statement to be
# parsed.
# Returns 0 is completion should stop after running this function, 1 otherwise.
_tc_filter_action_options()
{
for ((acwd=$1; acwd < ${#words[@]}-1; acwd++));
do
if [[ action == ${words[acwd]} ]]; then
_tc_filter_action_options $((acwd+1)) && return 0
fi
done
local action acwd
for ((acwd=$1; acwd < ${#words[@]}-1; acwd++)); do
if [[ $ACTION_KIND =~ ' '${words[acwd]}' ' ]]; then
_tc_one_of_list_from $acwd action
_tc_action_options $acwd && return 0
fi
done
_tc_one_of_list_from $acwd $ACTION_KIND
return 0
}
# Complete with options names for filters.
# Returns 0 is completion should stop after running this function, 1 otherwise.
_tc_filter_options()
{
case $1 in
for ((acwd=$1; acwd < ${#words[@]}-1; acwd++));
do
if [[ action == ${words[acwd]} ]]; then
_tc_filter_action_options $((acwd+1)) && return 0
fi
done
filter=${words[$1]}
case $filter in
basic)
_tc_once_attr 'match action classid'
return 0
@ -375,6 +458,10 @@ _tc_filter_options()
_tc_once_attr 'map hash divisor baseclass match action'
return 0
;;
matchall)
_tc_once_attr 'action classid skip_sw skip_hw'
return 0
;;
flower)
_tc_once_attr 'action classid indev dst_mac src_mac eth_type \
ip_proto dst_ip src_ip dst_port src_port'
@ -419,20 +506,28 @@ _tc_filter_options()
# Returns 0 is completion should stop after running this function, 1 otherwise.
_tc_action_options()
{
case $1 in
local from=$1
local action=${words[from]}
case $action in
bpf)
_tc_bpf_options
return 0
;;
mirred)
_tc_one_of_list 'ingress egress'
_tc_one_of_list 'mirror redirect'
_tc_once_attr 'index dev'
_tc_one_of_list_from $from 'ingress egress'
_tc_one_of_list_from $from 'mirror redirect'
_tc_once_attr_from $from 'index dev'
return 0
;;
sample)
_tc_once_attr_from $from 'rate'
_tc_once_attr_from $from 'trunc'
_tc_once_attr_from $from 'group'
return 0
;;
gact)
_tc_one_of_list 'reclassify drop continue pass'
_tc_once_attr 'random'
_tc_one_of_list_from $from 'reclassify drop continue pass'
_tc_once_attr_from $from 'random'
return 0
;;
esac
@ -562,10 +657,7 @@ _tc()
COMPREPLY=( $( compgen -W 'dev' -- "$cur" ) )
return 0
fi
local qdisc qdwd QDISC_KIND=' choke codel bfifo pfifo \
pfifo_head_drop fq fq_codel gred hhf mqprio multiq \
netem pfifo_fast pie red rr sfb sfq tbf atm cbq drr \
dsmark hfsc htb prio qfq '
local qdisc qdwd
for ((qdwd=$subcword; qdwd < ${#words[@]}-1; qdwd++)); do
if [[ $QDISC_KIND =~ ' '${words[qdwd]}' ' ]]; then
qdisc=${words[qdwd]}
@ -600,10 +692,7 @@ _tc()
COMPREPLY=( $( compgen -W 'dev' -- "$cur" ) )
return 0
fi
local qdisc qdwd QDISC_KIND=' choke codel bfifo pfifo \
pfifo_head_drop fq fq_codel gred hhf mqprio multiq \
netem pfifo_fast pie red rr sfb sfq tbf atm cbq drr \
dsmark hfsc htb prio qfq '
local qdisc qdwd
for ((qdwd=$subcword; qdwd < ${#words[@]}-1; qdwd++)); do
if [[ $QDISC_KIND =~ ' '${words[qdwd]}' ' ]]; then
qdisc=${words[qdwd]}
@ -638,13 +727,11 @@ _tc()
COMPREPLY=( $( compgen -W 'dev' -- "$cur" ) )
return 0
fi
local filter fltwd FILTER_KIND=' basic bpf cgroup flow \
flower fw route rsvp tcindex u32 '
local filter fltwd
for ((fltwd=$subcword; fltwd < ${#words[@]}-1; fltwd++));
do
if [[ $FILTER_KIND =~ ' '${words[fltwd]}' ' ]]; then
filter=${words[fltwd]}
_tc_filter_options $filter && return 0
_tc_filter_options $fltwd && return 0
fi
done
_tc_one_of_list $FILTER_KIND
@ -671,11 +758,10 @@ _tc()
action)
case $subcmd in
add|change|replace)
local action acwd ACTION_KIND=' gact mirred bpf '
local action acwd
for ((acwd=$subcword; acwd < ${#words[@]}-1; acwd++)); do
if [[ $ACTION_KIND =~ ' '${words[acwd]}' ' ]]; then
action=${words[acwd]}
_tc_action_options $action && return 0
_tc_action_options $acwd && return 0
fi
done
_tc_one_of_list $ACTION_KIND

View File

@ -1,14 +1,11 @@
# SPDX-License-Identifier: GPL-2.0
BROBJ = bridge.o fdb.o monitor.o link.o mdb.o vlan.o
include ../Config
ifeq ($(IP_CONFIG_SETNS),y)
CFLAGS += -DHAVE_SETNS
endif
include ../config.mk
all: bridge
bridge: $(BROBJ) $(LIBNETLINK)
bridge: $(BROBJ) $(LIBNETLINK)
$(QUIET_LINK)$(CC) $^ $(LDFLAGS) $(LDLIBS) -o $@
install: all
@ -16,4 +13,3 @@ install: all
clean:
rm -f $(BROBJ) bridge

View File

@ -1,27 +1,31 @@
/* SPDX-License-Identifier: GPL-2.0 */
#define MDB_RTA(r) \
((struct rtattr *)(((char *)(r)) + RTA_ALIGN(sizeof(struct br_mdb_entry))))
#define MDB_RTR_RTA(r) \
((struct rtattr *)(((char *)(r)) + RTA_ALIGN(sizeof(__u32))))
extern int print_linkinfo(const struct sockaddr_nl *who,
struct nlmsghdr *n,
void *arg);
extern int print_fdb(const struct sockaddr_nl *who,
struct nlmsghdr *n, void *arg);
extern int print_mdb(const struct sockaddr_nl *who,
struct nlmsghdr *n, void *arg);
void print_vlan_info(struct rtattr *tb, int ifindex);
int print_linkinfo(struct nlmsghdr *n, void *arg);
int print_mdb_mon(struct nlmsghdr *n, void *arg);
int print_fdb(struct nlmsghdr *n, void *arg);
void print_stp_state(__u8 state);
int parse_stp_state(const char *arg);
int print_vlan_rtm(struct nlmsghdr *n, void *arg, bool monitor,
bool global_only);
void br_print_router_port_stats(struct rtattr *pattr);
extern int do_fdb(int argc, char **argv);
extern int do_mdb(int argc, char **argv);
extern int do_monitor(int argc, char **argv);
extern int do_vlan(int argc, char **argv);
extern int do_link(int argc, char **argv);
int do_fdb(int argc, char **argv);
int do_mdb(int argc, char **argv);
int do_monitor(int argc, char **argv);
int do_vlan(int argc, char **argv);
int do_link(int argc, char **argv);
extern int preferred_family;
extern int show_stats;
extern int show_details;
extern int timestamp;
extern int compress_vlans;
extern int json_output;
extern int json;
extern struct rtnl_handle rth;

View File

@ -1,3 +1,4 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Get/set/delete bridge with netlink
*
@ -11,23 +12,23 @@
#include <string.h>
#include <errno.h>
#include "SNAPSHOT.h"
#include "version.h"
#include "utils.h"
#include "br_common.h"
#include "namespace.h"
#include "color.h"
struct rtnl_handle rth = { .fd = -1 };
int preferred_family = AF_UNSPEC;
int resolve_hosts;
int oneline;
int show_stats;
int show_details;
static int color;
int compress_vlans;
int json_output;
int json;
int timestamp;
char *batch_file;
static const char *batch_file;
int force;
const char *_SL_;
static void usage(void) __attribute__((noreturn));
@ -36,10 +37,10 @@ static void usage(void)
fprintf(stderr,
"Usage: bridge [ OPTIONS ] OBJECT { COMMAND | help }\n"
" bridge [ -force ] -batch filename\n"
"where OBJECT := { link | fdb | mdb | vlan | monitor }\n"
" OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] |\n"
" -o[neline] | -t[imestamp] | -n[etns] name |\n"
" -c[ompressvlans] -j{son} }\n");
"where OBJECT := { link | fdb | mdb | vlan | monitor }\n"
" OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] |\n"
" -o[neline] | -t[imestamp] | -n[etns] name |\n"
" -c[ompressvlans] -color -p[retty] -j[son] }\n");
exit(-1);
}
@ -76,45 +77,23 @@ static int do_cmd(const char *argv0, int argc, char **argv)
return -1;
}
static int br_batch_cmd(int argc, char *argv[], void *data)
{
return do_cmd(argv[0], argc, argv);
}
static int batch(const char *name)
{
char *line = NULL;
size_t len = 0;
int ret = EXIT_SUCCESS;
if (name && strcmp(name, "-") != 0) {
if (freopen(name, "r", stdin) == NULL) {
fprintf(stderr,
"Cannot open file \"%s\" for reading: %s\n",
name, strerror(errno));
return EXIT_FAILURE;
}
}
int ret;
if (rtnl_open(&rth, 0) < 0) {
fprintf(stderr, "Cannot open rtnetlink\n");
return EXIT_FAILURE;
}
cmdlineno = 0;
while (getcmdline(&line, &len, stdin) != -1) {
char *largv[100];
int largc;
rtnl_set_strict_dump(&rth);
largc = makeargs(line, largv, 100);
if (largc == 0)
continue; /* blank line */
if (do_cmd(largv[0], largc, largv)) {
fprintf(stderr, "Command failed %s:%d\n",
name, cmdlineno);
ret = EXIT_FAILURE;
if (!force)
break;
}
}
if (line)
free(line);
ret = do_batch(name, force, br_batch_cmd, NULL);
rtnl_close(&rth);
return ret;
@ -138,7 +117,7 @@ main(int argc, char **argv)
if (matches(opt, "-help") == 0) {
usage();
} else if (matches(opt, "-Version") == 0) {
printf("bridge utility, 0.0\n");
printf("bridge utility, %s\n", version);
exit(0);
} else if (matches(opt, "-stats") == 0 ||
matches(opt, "-statistics") == 0) {
@ -170,12 +149,15 @@ main(int argc, char **argv)
NEXT_ARG();
if (netns_switch(argv[1]))
exit(-1);
} else if (matches_color(opt, &color)) {
} else if (matches(opt, "-compressvlans") == 0) {
++compress_vlans;
} else if (matches(opt, "-force") == 0) {
++force;
} else if (matches(opt, "-json") == 0) {
++json_output;
++json;
} else if (matches(opt, "-pretty") == 0) {
++pretty;
} else if (matches(opt, "-batch") == 0) {
argc--;
argv++;
@ -193,12 +175,16 @@ main(int argc, char **argv)
_SL_ = oneline ? "\\" : "\n";
check_enable_color(color, json);
if (batch_file)
return batch(batch_file);
if (rtnl_open(&rth, 0) < 0)
exit(1);
rtnl_set_strict_dump(&rth);
if (argc > 1)
return do_cmd(argv[1], argc-1, argv+1);

View File

@ -1,3 +1,4 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Get/set/delete fdb table with netlink
*
@ -21,25 +22,29 @@
#include <linux/neighbour.h>
#include <string.h>
#include <limits.h>
#include <json_writer.h>
#include <stdbool.h>
#include "json_print.h"
#include "libnetlink.h"
#include "br_common.h"
#include "rt_names.h"
#include "utils.h"
static unsigned int filter_index, filter_vlan;
json_writer_t *jw_global;
static unsigned int filter_index, filter_dynamic, filter_master,
filter_state, filter_vlan;
static void usage(void)
{
fprintf(stderr, "Usage: bridge fdb { add | append | del | replace } ADDR dev DEV\n"
" [ self ] [ master ] [ use ] [ router ]\n"
" [ local | static | dynamic ] [ dst IPADDR ] [ vlan VID ]\n"
" [ port PORT] [ vni VNI ] [ via DEV ]\n");
fprintf(stderr, " bridge fdb [ show [ br BRDEV ] [ brport DEV ] [ vlan VID ] ]\n");
fprintf(stderr,
"Usage: bridge fdb { add | append | del | replace } ADDR dev DEV\n"
" [ self ] [ master ] [ use ] [ router ] [ extern_learn ]\n"
" [ sticky ] [ local | static | dynamic ] [ vlan VID ]\n"
" { [ dst IPADDR ] [ port PORT] [ vni VNI ] | [ nhid NHID ] }\n"
" [ via DEV ] [ src_vni VNI ]\n"
" bridge fdb [ show [ br BRDEV ] [ brport DEV ] [ vlan VID ]\n"
" [ state STATE ] [ dynamic ] ]\n"
" bridge fdb get [ to ] LLADDR [ br BRDEV ] { brport | dev } DEV\n"
" [ vlan VID ] [ vni VNI ] [ self ] [ master ] [ dynamic ]\n");
exit(-1);
}
@ -59,28 +64,83 @@ static const char *state_n2a(unsigned int s)
if (s & NUD_REACHABLE)
return "";
sprintf(buf, "state=%#x", s);
if (is_json_context())
sprintf(buf, "%#x", s);
else
sprintf(buf, "state=%#x", s);
return buf;
}
static void start_json_fdb_flags_array(bool *fdb_flags)
static int state_a2n(unsigned int *s, const char *arg)
{
if (*fdb_flags)
return;
jsonw_name(jw_global, "flags");
jsonw_start_array(jw_global);
*fdb_flags = true;
if (matches(arg, "permanent") == 0)
*s = NUD_PERMANENT;
else if (matches(arg, "static") == 0 || matches(arg, "temp") == 0)
*s = NUD_NOARP;
else if (matches(arg, "stale") == 0)
*s = NUD_STALE;
else if (matches(arg, "reachable") == 0 || matches(arg, "dynamic") == 0)
*s = NUD_REACHABLE;
else if (strcmp(arg, "all") == 0)
*s = ~0;
else if (get_unsigned(s, arg, 0))
return -1;
return 0;
}
int print_fdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
static void fdb_print_flags(FILE *fp, unsigned int flags)
{
open_json_array(PRINT_JSON,
is_json_context() ? "flags" : "");
if (flags & NTF_SELF)
print_string(PRINT_ANY, NULL, "%s ", "self");
if (flags & NTF_ROUTER)
print_string(PRINT_ANY, NULL, "%s ", "router");
if (flags & NTF_EXT_LEARNED)
print_string(PRINT_ANY, NULL, "%s ", "extern_learn");
if (flags & NTF_OFFLOADED)
print_string(PRINT_ANY, NULL, "%s ", "offload");
if (flags & NTF_MASTER)
print_string(PRINT_ANY, NULL, "%s ", "master");
if (flags & NTF_STICKY)
print_string(PRINT_ANY, NULL, "%s ", "sticky");
close_json_array(PRINT_JSON, NULL);
}
static void fdb_print_stats(FILE *fp, const struct nda_cacheinfo *ci)
{
static int hz;
if (!hz)
hz = get_user_hz();
if (is_json_context()) {
print_uint(PRINT_JSON, "used", NULL,
ci->ndm_used / hz);
print_uint(PRINT_JSON, "updated", NULL,
ci->ndm_updated / hz);
} else {
fprintf(fp, "used %d/%d ", ci->ndm_used / hz,
ci->ndm_updated / hz);
}
}
int print_fdb(struct nlmsghdr *n, void *arg)
{
FILE *fp = arg;
struct ndmsg *r = NLMSG_DATA(n);
int len = n->nlmsg_len;
struct rtattr *tb[NDA_MAX+1];
__u16 vid = 0;
bool fdb_flags = false;
const char *state_s;
if (n->nlmsg_type != RTM_NEWNEIGH && n->nlmsg_type != RTM_DELNEIGH) {
fprintf(stderr, "Not RTM_NEWNEIGH: %08x %08x %08x\n",
@ -100,10 +160,8 @@ int print_fdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
if (filter_index && filter_index != r->ndm_ifindex)
return 0;
if (jw_global) {
jsonw_pretty(jw_global, 1);
jsonw_start_object(jw_global);
}
if (filter_state && !(r->ndm_state & filter_state))
return 0;
parse_rtattr(tb, NDA_MAX, NDA_RTA(r),
n->nlmsg_len - NLMSG_LENGTH(sizeof(*r)));
@ -114,170 +172,143 @@ int print_fdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
if (filter_vlan && filter_vlan != vid)
return 0;
if (n->nlmsg_type == RTM_DELNEIGH) {
if (jw_global)
jsonw_string_field(jw_global, "opCode", "deleted");
else
fprintf(fp, "Deleted ");
}
if (filter_dynamic && (r->ndm_state & NUD_PERMANENT))
return 0;
open_json_object(NULL);
if (n->nlmsg_type == RTM_DELNEIGH)
print_bool(PRINT_ANY, "deleted", "Deleted ", true);
if (tb[NDA_LLADDR]) {
const char *lladdr;
SPRINT_BUF(b1);
ll_addr_n2a(RTA_DATA(tb[NDA_LLADDR]),
RTA_PAYLOAD(tb[NDA_LLADDR]),
ll_index_to_type(r->ndm_ifindex),
b1, sizeof(b1));
if (jw_global)
jsonw_string_field(jw_global, "mac", b1);
else
fprintf(fp, "%s ", b1);
lladdr = ll_addr_n2a(RTA_DATA(tb[NDA_LLADDR]),
RTA_PAYLOAD(tb[NDA_LLADDR]),
ll_index_to_type(r->ndm_ifindex),
b1, sizeof(b1));
print_color_string(PRINT_ANY, COLOR_MAC,
"mac", "%s ", lladdr);
}
if (!filter_index && r->ndm_ifindex) {
if (jw_global)
jsonw_string_field(jw_global, "dev",
ll_index_to_name(r->ndm_ifindex));
else
fprintf(fp, "dev %s ",
ll_index_to_name(r->ndm_ifindex));
print_string(PRINT_FP, NULL, "dev ", NULL);
print_color_string(PRINT_ANY, COLOR_IFNAME,
"ifname", "%s ",
ll_index_to_name(r->ndm_ifindex));
}
if (tb[NDA_DST]) {
int family = AF_INET;
const char *abuf_s;
const char *dst;
if (RTA_PAYLOAD(tb[NDA_DST]) == sizeof(struct in6_addr))
family = AF_INET6;
abuf_s = format_host(family,
RTA_PAYLOAD(tb[NDA_DST]),
RTA_DATA(tb[NDA_DST]));
if (jw_global)
jsonw_string_field(jw_global, "dst", abuf_s);
else
fprintf(fp, "dst %s ", abuf_s);
dst = format_host(family,
RTA_PAYLOAD(tb[NDA_DST]),
RTA_DATA(tb[NDA_DST]));
print_string(PRINT_FP, NULL, "dst ", NULL);
print_color_string(PRINT_ANY,
ifa_family_color(family),
"dst", "%s ", dst);
}
if (vid) {
if (jw_global)
jsonw_uint_field(jw_global, "vlan", vid);
else
fprintf(fp, "vlan %hu ", vid);
}
if (vid)
print_uint(PRINT_ANY,
"vlan", "vlan %hu ", vid);
if (tb[NDA_PORT]) {
if (jw_global)
jsonw_uint_field(jw_global, "port",
ntohs(rta_getattr_u16(tb[NDA_PORT])));
else
fprintf(fp, "port %d ",
ntohs(rta_getattr_u16(tb[NDA_PORT])));
}
if (tb[NDA_PORT])
print_uint(PRINT_ANY,
"port", "port %u ",
rta_getattr_be16(tb[NDA_PORT]));
if (tb[NDA_VNI]) {
if (jw_global)
jsonw_uint_field(jw_global, "vni",
rta_getattr_u32(tb[NDA_VNI]));
else
fprintf(fp, "vni %d ",
rta_getattr_u32(tb[NDA_VNI]));
}
if (tb[NDA_VNI])
print_uint(PRINT_ANY,
"vni", "vni %u ",
rta_getattr_u32(tb[NDA_VNI]));
if (tb[NDA_SRC_VNI])
print_uint(PRINT_ANY,
"src_vni", "src_vni %u ",
rta_getattr_u32(tb[NDA_SRC_VNI]));
if (tb[NDA_IFINDEX]) {
unsigned int ifindex = rta_getattr_u32(tb[NDA_IFINDEX]);
if (ifindex) {
char ifname[IF_NAMESIZE];
if (!tb[NDA_LINK_NETNSID] &&
if_indextoname(ifindex, ifname)) {
if (jw_global)
jsonw_string_field(jw_global, "viaIf",
ifname);
else
fprintf(fp, "via %s ", ifname);
} else {
if (jw_global)
jsonw_uint_field(jw_global, "viaIfIndex",
ifindex);
else
fprintf(fp, "via ifindex %u ", ifindex);
}
}
}
if (tb[NDA_LINK_NETNSID]) {
if (jw_global)
jsonw_uint_field(jw_global, "linkNetNsId",
rta_getattr_u32(tb[NDA_LINK_NETNSID]));
if (tb[NDA_LINK_NETNSID])
print_uint(PRINT_ANY,
"viaIfIndex", "via ifindex %u ",
ifindex);
else
fprintf(fp, "link-netnsid %d ",
rta_getattr_u32(tb[NDA_LINK_NETNSID]));
print_string(PRINT_ANY,
"viaIf", "via %s ",
ll_index_to_name(ifindex));
}
if (show_stats && tb[NDA_CACHEINFO]) {
struct nda_cacheinfo *ci = RTA_DATA(tb[NDA_CACHEINFO]);
int hz = get_user_hz();
if (tb[NDA_NH_ID])
print_uint(PRINT_ANY, "nhid", "nhid %u ",
rta_getattr_u32(tb[NDA_NH_ID]));
if (jw_global) {
jsonw_uint_field(jw_global, "used",
ci->ndm_used/hz);
jsonw_uint_field(jw_global, "updated",
ci->ndm_updated/hz);
} else {
fprintf(fp, "used %d/%d ", ci->ndm_used/hz,
ci->ndm_updated/hz);
}
if (tb[NDA_LINK_NETNSID])
print_uint(PRINT_ANY,
"linkNetNsId", "link-netnsid %d ",
rta_getattr_u32(tb[NDA_LINK_NETNSID]));
if (show_stats && tb[NDA_CACHEINFO])
fdb_print_stats(fp, RTA_DATA(tb[NDA_CACHEINFO]));
fdb_print_flags(fp, r->ndm_flags);
if (tb[NDA_MASTER])
print_string(PRINT_ANY, "master", "master %s ",
ll_index_to_name(rta_getattr_u32(tb[NDA_MASTER])));
print_string(PRINT_ANY, "state", "%s\n",
state_n2a(r->ndm_state));
close_json_object();
fflush(fp);
return 0;
}
static int fdb_linkdump_filter(struct nlmsghdr *nlh, int reqlen)
{
int err;
if (filter_index) {
struct ifinfomsg *ifm = NLMSG_DATA(nlh);
ifm->ifi_index = filter_index;
}
if (jw_global) {
if (r->ndm_flags & NTF_SELF) {
start_json_fdb_flags_array(&fdb_flags);
jsonw_string(jw_global, "self");
}
if (r->ndm_flags & NTF_ROUTER) {
start_json_fdb_flags_array(&fdb_flags);
jsonw_string(jw_global, "router");
}
if (r->ndm_flags & NTF_EXT_LEARNED) {
start_json_fdb_flags_array(&fdb_flags);
jsonw_string(jw_global, "offload");
}
if (r->ndm_flags & NTF_MASTER)
jsonw_string(jw_global, "master");
if (fdb_flags)
jsonw_end_array(jw_global);
if (tb[NDA_MASTER])
jsonw_string_field(jw_global,
"master",
ll_index_to_name(rta_getattr_u32(tb[NDA_MASTER])));
} else {
if (r->ndm_flags & NTF_SELF)
fprintf(fp, "self ");
if (r->ndm_flags & NTF_ROUTER)
fprintf(fp, "router ");
if (r->ndm_flags & NTF_EXT_LEARNED)
fprintf(fp, "offload ");
if (tb[NDA_MASTER]) {
fprintf(fp, "master %s ",
ll_index_to_name(rta_getattr_u32(tb[NDA_MASTER])));
} else if (r->ndm_flags & NTF_MASTER) {
fprintf(fp, "master ");
}
if (filter_master) {
err = addattr32(nlh, reqlen, IFLA_MASTER, filter_master);
if (err)
return err;
}
state_s = state_n2a(r->ndm_state);
if (jw_global) {
if (state_s[0])
jsonw_string_field(jw_global, "state", state_s);
return 0;
}
jsonw_end_object(jw_global);
} else {
fprintf(fp, "%s\n", state_s);
static int fdb_dump_filter(struct nlmsghdr *nlh, int reqlen)
{
int err;
fflush(fp);
if (filter_index) {
struct ndmsg *ndm = NLMSG_DATA(nlh);
ndm->ndm_ifindex = filter_index;
}
if (filter_master) {
err = addattr32(nlh, reqlen, NDA_MASTER, filter_master);
if (err)
return err;
}
return 0;
@ -285,18 +316,9 @@ int print_fdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
static int fdb_show(int argc, char **argv)
{
struct {
struct nlmsghdr n;
struct ifinfomsg ifm;
char buf[256];
} req = {
.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
.ifm.ifi_family = PF_BRIDGE,
};
char *filter_dev = NULL;
char *br = NULL;
int msg_size = sizeof(struct ifinfomsg);
int rc;
while (argc > 0) {
if ((strcmp(*argv, "brport") == 0) || strcmp(*argv, "dev") == 0) {
@ -310,6 +332,15 @@ static int fdb_show(int argc, char **argv)
if (filter_vlan)
duparg("vlan", *argv);
filter_vlan = atoi(*argv);
} else if (strcmp(*argv, "state") == 0) {
unsigned int state;
NEXT_ARG();
if (state_a2n(&state, *argv))
invarg("invalid state", *argv);
filter_state |= state;
} else if (strcmp(*argv, "dynamic") == 0) {
filter_dynamic = 1;
} else {
if (matches(*argv, "help") == 0)
usage();
@ -324,42 +355,32 @@ static int fdb_show(int argc, char **argv)
fprintf(stderr, "Cannot find bridge device \"%s\"\n", br);
return -1;
}
addattr32(&req.n, sizeof(req), IFLA_MASTER, br_ifindex);
msg_size += RTA_LENGTH(4);
filter_master = br_ifindex;
}
/*we'll keep around filter_dev for older kernels */
if (filter_dev) {
filter_index = if_nametoindex(filter_dev);
if (filter_index == 0) {
fprintf(stderr, "Cannot find device \"%s\"\n",
filter_dev);
return -1;
}
req.ifm.ifi_index = filter_index;
filter_index = ll_name_to_index(filter_dev);
if (!filter_index)
return nodev(filter_dev);
}
if (rtnl_dump_request(&rth, RTM_GETNEIGH, &req.ifm, msg_size) < 0) {
if (rth.flags & RTNL_HANDLE_F_STRICT_CHK)
rc = rtnl_neighdump_req(&rth, PF_BRIDGE, fdb_dump_filter);
else
rc = rtnl_fdb_linkdump_req_filter_fn(&rth, fdb_linkdump_filter);
if (rc < 0) {
perror("Cannot send dump request");
exit(1);
}
if (json_output) {
jw_global = jsonw_new(stdout);
if (!jw_global) {
fprintf(stderr, "Error allocation json object\n");
exit(1);
}
jsonw_start_array(jw_global);
}
new_json_obj(json);
if (rtnl_dump_filter(&rth, print_fdb, stdout) < 0) {
fprintf(stderr, "Dump terminated\n");
exit(1);
}
if (jw_global) {
jsonw_end_array(jw_global);
jsonw_destroy(&jw_global);
}
delete_json_obj();
fflush(stdout);
return 0;
}
@ -384,9 +405,11 @@ static int fdb_modify(int cmd, int flags, int argc, char **argv)
inet_prefix dst;
unsigned long port = 0;
unsigned long vni = ~0;
unsigned long src_vni = ~0;
unsigned int via = 0;
char *endptr;
short vid = -1;
__u32 nhid = 0;
while (argc > 0) {
if (strcmp(*argv, "dev") == 0) {
@ -398,6 +421,10 @@ static int fdb_modify(int cmd, int flags, int argc, char **argv)
duparg2("dst", *argv);
get_addr(&dst, *argv, preferred_family);
dst_ok = 1;
} else if (strcmp(*argv, "nhid") == 0) {
NEXT_ARG();
if (get_u32(&nhid, *argv, 0))
invarg("\"id\" value is invalid\n", *argv);
} else if (strcmp(*argv, "port") == 0) {
NEXT_ARG();
@ -417,11 +444,17 @@ static int fdb_modify(int cmd, int flags, int argc, char **argv)
if ((endptr && *endptr) ||
(vni >> 24) || vni == ULONG_MAX)
invarg("invalid VNI\n", *argv);
} else if (strcmp(*argv, "src_vni") == 0) {
NEXT_ARG();
src_vni = strtoul(*argv, &endptr, 0);
if ((endptr && *endptr) ||
(src_vni >> 24) || src_vni == ULONG_MAX)
invarg("invalid src VNI\n", *argv);
} else if (strcmp(*argv, "via") == 0) {
NEXT_ARG();
via = if_nametoindex(*argv);
if (via == 0)
invarg("invalid device\n", *argv);
via = ll_name_to_index(*argv);
if (!via)
exit(nodev(*argv));
} else if (strcmp(*argv, "self") == 0) {
req.ndm.ndm_flags |= NTF_SELF;
} else if (matches(*argv, "master") == 0) {
@ -444,10 +477,14 @@ static int fdb_modify(int cmd, int flags, int argc, char **argv)
vid = atoi(*argv);
} else if (matches(*argv, "use") == 0) {
req.ndm.ndm_flags |= NTF_USE;
} else if (matches(*argv, "extern_learn") == 0) {
req.ndm.ndm_flags |= NTF_EXT_LEARNED;
} else if (matches(*argv, "sticky") == 0) {
req.ndm.ndm_flags |= NTF_STICKY;
} else {
if (strcmp(*argv, "to") == 0) {
if (strcmp(*argv, "to") == 0)
NEXT_ARG();
}
if (matches(*argv, "help") == 0)
usage();
if (addr)
@ -462,6 +499,11 @@ static int fdb_modify(int cmd, int flags, int argc, char **argv)
return -1;
}
if (nhid && (dst_ok || port || vni != ~0)) {
fprintf(stderr, "dst, port, vni are mutually exclusive with nhid\n");
return -1;
}
/* Assume self */
if (!(req.ndm.ndm_flags&(NTF_SELF|NTF_MASTER)))
req.ndm.ndm_flags |= NTF_SELF;
@ -483,6 +525,8 @@ static int fdb_modify(int cmd, int flags, int argc, char **argv)
if (vid >= 0)
addattr16(&req.n, sizeof(req), NDA_VLAN, vid);
if (nhid > 0)
addattr32(&req.n, sizeof(req), NDA_NH_ID, nhid);
if (port) {
unsigned short dport;
@ -492,17 +536,132 @@ static int fdb_modify(int cmd, int flags, int argc, char **argv)
}
if (vni != ~0)
addattr32(&req.n, sizeof(req), NDA_VNI, vni);
if (src_vni != ~0)
addattr32(&req.n, sizeof(req), NDA_SRC_VNI, src_vni);
if (via)
addattr32(&req.n, sizeof(req), NDA_IFINDEX, via);
req.ndm.ndm_ifindex = ll_name_to_index(d);
if (req.ndm.ndm_ifindex == 0) {
fprintf(stderr, "Cannot find device \"%s\"\n", d);
if (!req.ndm.ndm_ifindex)
return nodev(d);
if (rtnl_talk(&rth, &req.n, NULL) < 0)
return -1;
return 0;
}
static int fdb_get(int argc, char **argv)
{
struct {
struct nlmsghdr n;
struct ndmsg ndm;
char buf[1024];
} req = {
.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg)),
.n.nlmsg_flags = NLM_F_REQUEST,
.n.nlmsg_type = RTM_GETNEIGH,
.ndm.ndm_family = AF_BRIDGE,
};
char *d = NULL, *br = NULL;
struct nlmsghdr *answer;
unsigned long vni = ~0;
char abuf[ETH_ALEN];
int br_ifindex = 0;
char *addr = NULL;
short vlan = -1;
char *endptr;
while (argc > 0) {
if ((strcmp(*argv, "brport") == 0) || strcmp(*argv, "dev") == 0) {
NEXT_ARG();
d = *argv;
} else if (strcmp(*argv, "br") == 0) {
NEXT_ARG();
br = *argv;
} else if (strcmp(*argv, "dev") == 0) {
NEXT_ARG();
d = *argv;
} else if (strcmp(*argv, "vni") == 0) {
NEXT_ARG();
vni = strtoul(*argv, &endptr, 0);
if ((endptr && *endptr) ||
(vni >> 24) || vni == ULONG_MAX)
invarg("invalid VNI\n", *argv);
} else if (strcmp(*argv, "self") == 0) {
req.ndm.ndm_flags |= NTF_SELF;
} else if (matches(*argv, "master") == 0) {
req.ndm.ndm_flags |= NTF_MASTER;
} else if (matches(*argv, "vlan") == 0) {
if (vlan >= 0)
duparg2("vlan", *argv);
NEXT_ARG();
vlan = atoi(*argv);
} else if (matches(*argv, "dynamic") == 0) {
filter_dynamic = 1;
} else {
if (strcmp(*argv, "to") == 0)
NEXT_ARG();
if (matches(*argv, "help") == 0)
usage();
if (addr)
duparg2("to", *argv);
addr = *argv;
}
argc--; argv++;
}
if ((d == NULL && br == NULL) || addr == NULL) {
fprintf(stderr, "Device or master and address are required arguments.\n");
return -1;
}
if (rtnl_talk(&rth, &req.n, NULL, 0) < 0)
if (sscanf(addr, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
abuf, abuf+1, abuf+2,
abuf+3, abuf+4, abuf+5) != 6) {
fprintf(stderr, "Invalid mac address %s\n", addr);
return -1;
}
addattr_l(&req.n, sizeof(req), NDA_LLADDR, abuf, ETH_ALEN);
if (vlan >= 0)
addattr16(&req.n, sizeof(req), NDA_VLAN, vlan);
if (vni != ~0)
addattr32(&req.n, sizeof(req), NDA_VNI, vni);
if (d) {
req.ndm.ndm_ifindex = ll_name_to_index(d);
if (!req.ndm.ndm_ifindex) {
fprintf(stderr, "Cannot find device \"%s\"\n", d);
return -1;
}
}
if (br) {
br_ifindex = ll_name_to_index(br);
if (!br_ifindex) {
fprintf(stderr, "Cannot find bridge device \"%s\"\n", br);
return -1;
}
addattr32(&req.n, sizeof(req), NDA_MASTER, br_ifindex);
}
if (rtnl_talk(&rth, &req.n, &answer) < 0)
return -2;
/*
* Initialize a json_writer and open an array object
* if -json was specified.
*/
new_json_obj(json);
if (print_fdb(answer, stdout) < 0) {
fprintf(stderr, "An error :-)\n");
return -1;
}
delete_json_obj();
return 0;
}
@ -520,6 +679,8 @@ int do_fdb(int argc, char **argv)
return fdb_modify(RTM_NEWNEIGH, NLM_F_CREATE|NLM_F_REPLACE, argc-1, argv+1);
if (matches(*argv, "delete") == 0)
return fdb_modify(RTM_DELNEIGH, 0, argc-1, argv+1);
if (matches(*argv, "get") == 0)
return fdb_get(argc-1, argv+1);
if (matches(*argv, "show") == 0 ||
matches(*argv, "lst") == 0 ||
matches(*argv, "list") == 0)

View File

@ -1,3 +1,4 @@
/* SPDX-License-Identifier: GPL-2.0 */
#include <stdio.h>
#include <stdlib.h>
@ -11,13 +12,14 @@
#include <string.h>
#include <stdbool.h>
#include "json_print.h"
#include "libnetlink.h"
#include "utils.h"
#include "br_common.h"
static unsigned int filter_index;
static const char *port_states[] = {
static const char *stp_states[] = {
[BR_STATE_DISABLED] = "disabled",
[BR_STATE_LISTENING] = "listening",
[BR_STATE_LEARNING] = "learning",
@ -25,17 +27,21 @@ static const char *port_states[] = {
[BR_STATE_BLOCKING] = "blocking",
};
extern char *if_indextoname(unsigned int __ifindex, char *__ifname);
static const char *hw_mode[] = {
"VEB", "VEPA"
};
static void print_link_flags(FILE *fp, unsigned int flags)
static void print_link_flags(FILE *fp, unsigned int flags, unsigned int mdown)
{
fprintf(fp, "<");
open_json_array(PRINT_ANY, is_json_context() ? "flags" : "<");
if (flags & IFF_UP && !(flags & IFF_RUNNING))
fprintf(fp, "NO-CARRIER%s", flags ? "," : "");
print_string(PRINT_ANY, NULL,
flags ? "%s," : "%s", "NO-CARRIER");
flags &= ~IFF_RUNNING;
#define _PF(f) if (flags&IFF_##f) { \
flags &= ~IFF_##f ; \
fprintf(fp, #f "%s", flags ? "," : ""); }
#define _PF(f) if (flags&IFF_##f) { \
flags &= ~IFF_##f ; \
print_string(PRINT_ANY, NULL, flags ? "%s," : "%s", #f); }
_PF(LOOPBACK);
_PF(BROADCAST);
_PF(POINTOPOINT);
@ -56,54 +62,152 @@ static void print_link_flags(FILE *fp, unsigned int flags)
_PF(ECHO);
#undef _PF
if (flags)
fprintf(fp, "%x", flags);
fprintf(fp, "> ");
print_hex(PRINT_ANY, NULL, "%x", flags);
if (mdown)
print_string(PRINT_ANY, NULL, ",%s", "M-DOWN");
close_json_array(PRINT_ANY, "> ");
}
static const char *oper_states[] = {
"UNKNOWN", "NOTPRESENT", "DOWN", "LOWERLAYERDOWN",
"TESTING", "DORMANT", "UP"
};
static const char *hw_mode[] = {"VEB", "VEPA"};
static void print_operstate(FILE *f, __u8 state)
{
if (state >= ARRAY_SIZE(oper_states))
fprintf(f, "state %#x ", state);
else
fprintf(f, "state %s ", oper_states[state]);
}
static void print_portstate(FILE *f, __u8 state)
void print_stp_state(__u8 state)
{
if (state <= BR_STATE_BLOCKING)
fprintf(f, "state %s ", port_states[state]);
print_string(PRINT_ANY, "state",
"state %s ", stp_states[state]);
else
fprintf(f, "state (%d) ", state);
print_uint(PRINT_ANY, "state",
"state (%d) ", state);
}
static void print_onoff(FILE *f, char *flag, __u8 val)
int parse_stp_state(const char *arg)
{
fprintf(f, "%s %s ", flag, val ? "on" : "off");
size_t nstates = ARRAY_SIZE(stp_states);
int state;
for (state = 0; state < nstates; state++)
if (strcmp(stp_states[state], arg) == 0)
break;
if (state == nstates)
state = -1;
return state;
}
static void print_hwmode(FILE *f, __u16 mode)
static void print_hwmode(__u16 mode)
{
if (mode >= ARRAY_SIZE(hw_mode))
fprintf(f, "hwmode %#hx ", mode);
print_0xhex(PRINT_ANY, "hwmode",
"hwmode %#llx ", mode);
else
fprintf(f, "hwmode %s ", hw_mode[mode]);
print_string(PRINT_ANY, "hwmode",
"hwmode %s ", hw_mode[mode]);
}
int print_linkinfo(const struct sockaddr_nl *who,
struct nlmsghdr *n, void *arg)
static void print_protinfo(FILE *fp, struct rtattr *attr)
{
if (attr->rta_type & NLA_F_NESTED) {
struct rtattr *prtb[IFLA_BRPORT_MAX + 1];
parse_rtattr_nested(prtb, IFLA_BRPORT_MAX, attr);
if (prtb[IFLA_BRPORT_STATE])
print_stp_state(rta_getattr_u8(prtb[IFLA_BRPORT_STATE]));
if (prtb[IFLA_BRPORT_PRIORITY])
print_uint(PRINT_ANY, "priority",
"priority %u ",
rta_getattr_u16(prtb[IFLA_BRPORT_PRIORITY]));
if (prtb[IFLA_BRPORT_COST])
print_uint(PRINT_ANY, "cost",
"cost %u ",
rta_getattr_u32(prtb[IFLA_BRPORT_COST]));
if (!show_details)
return;
if (!is_json_context())
fprintf(fp, "%s ", _SL_);
if (prtb[IFLA_BRPORT_MODE])
print_on_off(PRINT_ANY, "hairpin", "hairpin %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_MODE]));
if (prtb[IFLA_BRPORT_GUARD])
print_on_off(PRINT_ANY, "guard", "guard %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_GUARD]));
if (prtb[IFLA_BRPORT_PROTECT])
print_on_off(PRINT_ANY, "root_block", "root_block %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_PROTECT]));
if (prtb[IFLA_BRPORT_FAST_LEAVE])
print_on_off(PRINT_ANY, "fastleave", "fastleave %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_FAST_LEAVE]));
if (prtb[IFLA_BRPORT_LEARNING])
print_on_off(PRINT_ANY, "learning", "learning %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_LEARNING]));
if (prtb[IFLA_BRPORT_LEARNING_SYNC])
print_on_off(PRINT_ANY, "learning_sync", "learning_sync %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_LEARNING_SYNC]));
if (prtb[IFLA_BRPORT_UNICAST_FLOOD])
print_on_off(PRINT_ANY, "flood", "flood %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_UNICAST_FLOOD]));
if (prtb[IFLA_BRPORT_MCAST_FLOOD])
print_on_off(PRINT_ANY, "mcast_flood", "mcast_flood %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_MCAST_FLOOD]));
if (prtb[IFLA_BRPORT_MCAST_TO_UCAST])
print_on_off(PRINT_ANY, "mcast_to_unicast", "mcast_to_unicast %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_MCAST_TO_UCAST]));
if (prtb[IFLA_BRPORT_NEIGH_SUPPRESS])
print_on_off(PRINT_ANY, "neigh_suppress", "neigh_suppress %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_NEIGH_SUPPRESS]));
if (prtb[IFLA_BRPORT_VLAN_TUNNEL])
print_on_off(PRINT_ANY, "vlan_tunnel", "vlan_tunnel %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_VLAN_TUNNEL]));
if (prtb[IFLA_BRPORT_BACKUP_PORT]) {
int ifidx;
ifidx = rta_getattr_u32(prtb[IFLA_BRPORT_BACKUP_PORT]);
print_string(PRINT_ANY,
"backup_port", "backup_port %s ",
ll_index_to_name(ifidx));
}
if (prtb[IFLA_BRPORT_ISOLATED])
print_on_off(PRINT_ANY, "isolated", "isolated %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_ISOLATED]));
} else
print_stp_state(rta_getattr_u8(attr));
}
/*
* This is reported by HW devices that have some bridging
* capabilities.
*/
static void print_af_spec(struct rtattr *attr, int ifindex)
{
struct rtattr *aftb[IFLA_BRIDGE_MAX+1];
parse_rtattr_nested(aftb, IFLA_BRIDGE_MAX, attr);
if (aftb[IFLA_BRIDGE_MODE])
print_hwmode(rta_getattr_u16(aftb[IFLA_BRIDGE_MODE]));
if (!show_details)
return;
if (aftb[IFLA_BRIDGE_VLAN_INFO])
print_vlan_info(aftb[IFLA_BRIDGE_VLAN_INFO], ifindex);
}
int print_linkinfo(struct nlmsghdr *n, void *arg)
{
FILE *fp = arg;
int len = n->nlmsg_len;
struct ifinfomsg *ifi = NLMSG_DATA(n);
struct rtattr *tb[IFLA_MAX+1];
char b1[IFNAMSIZ];
unsigned int m_flag = 0;
int len = n->nlmsg_len;
const char *name;
len -= NLMSG_LENGTH(sizeof(*ifi));
if (len < 0) {
@ -119,136 +223,65 @@ int print_linkinfo(const struct sockaddr_nl *who,
parse_rtattr_flags(tb, IFLA_MAX, IFLA_RTA(ifi), len, NLA_F_NESTED);
if (tb[IFLA_IFNAME] == NULL) {
fprintf(stderr, "BUG: nil ifname\n");
name = get_ifname_rta(ifi->ifi_index, tb[IFLA_IFNAME]);
if (!name)
return -1;
}
open_json_object(NULL);
if (n->nlmsg_type == RTM_DELLINK)
fprintf(fp, "Deleted ");
print_bool(PRINT_ANY, "deleted", "Deleted ", true);
fprintf(fp, "%d: %s ", ifi->ifi_index,
tb[IFLA_IFNAME] ? rta_getattr_str(tb[IFLA_IFNAME]) : "<nil>");
if (tb[IFLA_OPERSTATE])
print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
if (tb[IFLA_LINK]) {
SPRINT_BUF(b1);
int iflink = rta_getattr_u32(tb[IFLA_LINK]);
if (iflink == 0)
fprintf(fp, "@NONE: ");
else
fprintf(fp, "@%s: ",
if_indextoname(iflink, b1));
} else
fprintf(fp, ": ");
print_link_flags(fp, ifi->ifi_flags);
print_int(PRINT_ANY, "ifindex", "%d: ", ifi->ifi_index);
m_flag = print_name_and_link("%s: ", name, tb);
print_link_flags(fp, ifi->ifi_flags, m_flag);
if (tb[IFLA_MTU])
fprintf(fp, "mtu %u ", rta_getattr_u32(tb[IFLA_MTU]));
print_int(PRINT_ANY,
"mtu", "mtu %u ",
rta_getattr_u32(tb[IFLA_MTU]));
if (tb[IFLA_MASTER])
fprintf(fp, "master %s ",
if_indextoname(rta_getattr_u32(tb[IFLA_MASTER]), b1));
if (tb[IFLA_MASTER]) {
int master = rta_getattr_u32(tb[IFLA_MASTER]);
if (tb[IFLA_PROTINFO]) {
if (tb[IFLA_PROTINFO]->rta_type & NLA_F_NESTED) {
struct rtattr *prtb[IFLA_BRPORT_MAX+1];
parse_rtattr_nested(prtb, IFLA_BRPORT_MAX,
tb[IFLA_PROTINFO]);
if (prtb[IFLA_BRPORT_STATE])
print_portstate(fp,
rta_getattr_u8(prtb[IFLA_BRPORT_STATE]));
if (prtb[IFLA_BRPORT_PRIORITY])
fprintf(fp, "priority %hu ",
rta_getattr_u16(prtb[IFLA_BRPORT_PRIORITY]));
if (prtb[IFLA_BRPORT_COST])
fprintf(fp, "cost %u ",
rta_getattr_u32(prtb[IFLA_BRPORT_COST]));
if (show_details) {
fprintf(fp, "%s ", _SL_);
if (prtb[IFLA_BRPORT_MODE])
print_onoff(fp, "hairpin",
rta_getattr_u8(prtb[IFLA_BRPORT_MODE]));
if (prtb[IFLA_BRPORT_GUARD])
print_onoff(fp, "guard",
rta_getattr_u8(prtb[IFLA_BRPORT_GUARD]));
if (prtb[IFLA_BRPORT_PROTECT])
print_onoff(fp, "root_block",
rta_getattr_u8(prtb[IFLA_BRPORT_PROTECT]));
if (prtb[IFLA_BRPORT_FAST_LEAVE])
print_onoff(fp, "fastleave",
rta_getattr_u8(prtb[IFLA_BRPORT_FAST_LEAVE]));
if (prtb[IFLA_BRPORT_LEARNING])
print_onoff(fp, "learning",
rta_getattr_u8(prtb[IFLA_BRPORT_LEARNING]));
if (prtb[IFLA_BRPORT_LEARNING_SYNC])
print_onoff(fp, "learning_sync",
rta_getattr_u8(prtb[IFLA_BRPORT_LEARNING_SYNC]));
if (prtb[IFLA_BRPORT_UNICAST_FLOOD])
print_onoff(fp, "flood",
rta_getattr_u8(prtb[IFLA_BRPORT_UNICAST_FLOOD]));
}
} else
print_portstate(fp, rta_getattr_u8(tb[IFLA_PROTINFO]));
print_string(PRINT_ANY, "master", "master %s ",
ll_index_to_name(master));
}
if (tb[IFLA_AF_SPEC]) {
/* This is reported by HW devices that have some bridging
* capabilities.
*/
struct rtattr *aftb[IFLA_BRIDGE_MAX+1];
if (tb[IFLA_PROTINFO])
print_protinfo(fp, tb[IFLA_PROTINFO]);
parse_rtattr_nested(aftb, IFLA_BRIDGE_MAX, tb[IFLA_AF_SPEC]);
if (tb[IFLA_AF_SPEC])
print_af_spec(tb[IFLA_AF_SPEC], ifi->ifi_index);
if (aftb[IFLA_BRIDGE_MODE])
print_hwmode(fp, rta_getattr_u16(aftb[IFLA_BRIDGE_MODE]));
}
fprintf(fp, "\n");
print_string(PRINT_FP, NULL, "%s", "\n");
close_json_object();
fflush(fp);
return 0;
}
static void usage(void)
{
fprintf(stderr, "Usage: bridge link set dev DEV [ cost COST ] [ priority PRIO ] [ state STATE ]\n");
fprintf(stderr, " [ guard {on | off} ]\n");
fprintf(stderr, " [ hairpin {on | off} ]\n");
fprintf(stderr, " [ fastleave {on | off} ]\n");
fprintf(stderr, " [ root_block {on | off} ]\n");
fprintf(stderr, " [ learning {on | off} ]\n");
fprintf(stderr, " [ learning_sync {on | off} ]\n");
fprintf(stderr, " [ flood {on | off} ]\n");
fprintf(stderr, " [ hwmode {vepa | veb} ]\n");
fprintf(stderr, " [ self ] [ master ]\n");
fprintf(stderr, " bridge link show [dev DEV]\n");
fprintf(stderr,
"Usage: bridge link set dev DEV [ cost COST ] [ priority PRIO ] [ state STATE ]\n"
" [ guard {on | off} ]\n"
" [ hairpin {on | off} ]\n"
" [ fastleave {on | off} ]\n"
" [ root_block {on | off} ]\n"
" [ learning {on | off} ]\n"
" [ learning_sync {on | off} ]\n"
" [ flood {on | off} ]\n"
" [ mcast_flood {on | off} ]\n"
" [ mcast_to_unicast {on | off} ]\n"
" [ neigh_suppress {on | off} ]\n"
" [ vlan_tunnel {on | off} ]\n"
" [ isolated {on | off} ]\n"
" [ hwmode {vepa | veb} ]\n"
" [ backup_port DEVICE ] [ nobackup_port ]\n"
" [ self ] [ master ]\n"
" bridge link show [dev DEV]\n");
exit(-1);
}
static bool on_off(char *arg, __s8 *attr, char *val)
{
if (strcmp(val, "on") == 0)
*attr = 1;
else if (strcmp(val, "off") == 0)
*attr = 0;
else {
fprintf(stderr,
"Error: argument of \"%s\" must be \"on\" or \"off\"\n",
arg);
return false;
}
return true;
}
static int brlink_modify(int argc, char **argv)
{
struct {
@ -262,9 +295,15 @@ static int brlink_modify(int argc, char **argv)
.ifm.ifi_family = PF_BRIDGE,
};
char *d = NULL;
int backup_port_idx = -1;
__s8 neigh_suppress = -1;
__s8 learning = -1;
__s8 learning_sync = -1;
__s8 flood = -1;
__s8 vlan_tunnel = -1;
__s8 mcast_flood = -1;
__s8 mcast_to_unicast = -1;
__s8 isolated = -1;
__s8 hairpin = -1;
__s8 bpdu_guard = -1;
__s8 fast_leave = -1;
@ -275,6 +314,7 @@ static int brlink_modify(int argc, char **argv)
__s16 mode = -1;
__u16 flags = 0;
struct rtattr *nest;
int ret;
while (argc > 0) {
if (strcmp(*argv, "dev") == 0) {
@ -282,32 +322,49 @@ static int brlink_modify(int argc, char **argv)
d = *argv;
} else if (strcmp(*argv, "guard") == 0) {
NEXT_ARG();
if (!on_off("guard", &bpdu_guard, *argv))
return -1;
bpdu_guard = parse_on_off("guard", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "hairpin") == 0) {
NEXT_ARG();
if (!on_off("hairping", &hairpin, *argv))
return -1;
hairpin = parse_on_off("hairpin", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "fastleave") == 0) {
NEXT_ARG();
if (!on_off("fastleave", &fast_leave, *argv))
return -1;
fast_leave = parse_on_off("fastleave", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "root_block") == 0) {
NEXT_ARG();
if (!on_off("root_block", &root_block, *argv))
return -1;
root_block = parse_on_off("root_block", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "learning") == 0) {
NEXT_ARG();
if (!on_off("learning", &learning, *argv))
return -1;
learning = parse_on_off("learning", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "learning_sync") == 0) {
NEXT_ARG();
if (!on_off("learning_sync", &learning_sync, *argv))
return -1;
learning_sync = parse_on_off("learning_sync", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "flood") == 0) {
NEXT_ARG();
if (!on_off("flood", &flood, *argv))
return -1;
flood = parse_on_off("flood", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "mcast_flood") == 0) {
NEXT_ARG();
mcast_flood = parse_on_off("mcast_flood", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "mcast_to_unicast") == 0) {
NEXT_ARG();
mcast_to_unicast = parse_on_off("mcast_to_unicast", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "cost") == 0) {
NEXT_ARG();
cost = atoi(*argv);
@ -317,14 +374,11 @@ static int brlink_modify(int argc, char **argv)
} else if (strcmp(*argv, "state") == 0) {
NEXT_ARG();
char *endptr;
size_t nstates = ARRAY_SIZE(port_states);
state = strtol(*argv, &endptr, 10);
if (!(**argv != '\0' && *endptr == '\0')) {
for (state = 0; state < nstates; state++)
if (strcmp(port_states[state], *argv) == 0)
break;
if (state == nstates) {
state = parse_stp_state(*argv);
if (state == -1) {
fprintf(stderr,
"Error: invalid STP port state\n");
return -1;
@ -346,6 +400,31 @@ static int brlink_modify(int argc, char **argv)
flags |= BRIDGE_FLAGS_SELF;
} else if (strcmp(*argv, "master") == 0) {
flags |= BRIDGE_FLAGS_MASTER;
} else if (strcmp(*argv, "neigh_suppress") == 0) {
NEXT_ARG();
neigh_suppress = parse_on_off("neigh_suppress", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "vlan_tunnel") == 0) {
NEXT_ARG();
vlan_tunnel = parse_on_off("vlan_tunnel", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "isolated") == 0) {
NEXT_ARG();
isolated = parse_on_off("isolated", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "backup_port") == 0) {
NEXT_ARG();
backup_port_idx = ll_name_to_index(*argv);
if (!backup_port_idx) {
fprintf(stderr, "Error: device %s does not exist\n",
*argv);
return -1;
}
} else if (strcmp(*argv, "nobackup_port") == 0) {
backup_port_idx = 0;
} else {
usage();
}
@ -380,6 +459,12 @@ static int brlink_modify(int argc, char **argv)
addattr8(&req.n, sizeof(req), IFLA_BRPORT_PROTECT, root_block);
if (flood >= 0)
addattr8(&req.n, sizeof(req), IFLA_BRPORT_UNICAST_FLOOD, flood);
if (mcast_flood >= 0)
addattr8(&req.n, sizeof(req), IFLA_BRPORT_MCAST_FLOOD,
mcast_flood);
if (mcast_to_unicast >= 0)
addattr8(&req.n, sizeof(req), IFLA_BRPORT_MCAST_TO_UCAST,
mcast_to_unicast);
if (learning >= 0)
addattr8(&req.n, sizeof(req), IFLA_BRPORT_LEARNING, learning);
if (learning_sync >= 0)
@ -395,6 +480,19 @@ static int brlink_modify(int argc, char **argv)
if (state >= 0)
addattr8(&req.n, sizeof(req), IFLA_BRPORT_STATE, state);
if (neigh_suppress != -1)
addattr8(&req.n, sizeof(req), IFLA_BRPORT_NEIGH_SUPPRESS,
neigh_suppress);
if (vlan_tunnel != -1)
addattr8(&req.n, sizeof(req), IFLA_BRPORT_VLAN_TUNNEL,
vlan_tunnel);
if (isolated != -1)
addattr8(&req.n, sizeof(req), IFLA_BRPORT_ISOLATED, isolated);
if (backup_port_idx != -1)
addattr32(&req.n, sizeof(req), IFLA_BRPORT_BACKUP_PORT,
backup_port_idx);
addattr_nest_end(&req.n, nest);
/* IFLA_AF_SPEC nested attribute. Contains IFLA_BRIDGE_FLAGS that
@ -414,7 +512,7 @@ static int brlink_modify(int argc, char **argv)
addattr_nest_end(&req.n, nest);
}
if (rtnl_talk(&rth, &req.n, NULL, 0) < 0)
if (rtnl_talk(&rth, &req.n, NULL) < 0)
return -1;
return 0;
@ -435,22 +533,34 @@ static int brlink_show(int argc, char **argv)
}
if (filter_dev) {
if ((filter_index = ll_name_to_index(filter_dev)) == 0) {
fprintf(stderr, "Cannot find device \"%s\"\n",
filter_dev);
return -1;
filter_index = ll_name_to_index(filter_dev);
if (!filter_index)
return nodev(filter_dev);
}
if (show_details) {
if (rtnl_linkdump_req_filter(&rth, PF_BRIDGE,
(compress_vlans ?
RTEXT_FILTER_BRVLAN_COMPRESSED :
RTEXT_FILTER_BRVLAN)) < 0) {
perror("Cannon send dump request");
exit(1);
}
} else {
if (rtnl_linkdump_req(&rth, PF_BRIDGE) < 0) {
perror("Cannon send dump request");
exit(1);
}
}
if (rtnl_wilddump_request(&rth, PF_BRIDGE, RTM_GETLINK) < 0) {
perror("Cannon send dump request");
exit(1);
}
new_json_obj(json);
if (rtnl_dump_filter(&rth, print_linkinfo, stdout) < 0) {
fprintf(stderr, "Dump terminated\n");
exit(1);
}
delete_json_obj();
fflush(stdout);
return 0;
}

View File

@ -1,3 +1,4 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Get mdb table with netlink
*/
@ -15,9 +16,10 @@
#include <arpa/inet.h>
#include "libnetlink.h"
#include "utils.h"
#include "br_common.h"
#include "rt_names.h"
#include "utils.h"
#include "json_print.h"
#ifndef MDBA_RTA
#define MDBA_RTA(r) \
@ -28,8 +30,9 @@ static unsigned int filter_index, filter_vlan;
static void usage(void)
{
fprintf(stderr, "Usage: bridge mdb { add | del } dev DEV port PORT grp GROUP [permanent | temp] [vid VID]\n");
fprintf(stderr, " bridge mdb {show} [ dev DEV ] [ vid VID ]\n");
fprintf(stderr,
"Usage: bridge mdb { add | del } dev DEV port PORT grp GROUP [src SOURCE] [permanent | temp] [vid VID]\n"
" bridge mdb {show} [ dev DEV ] [ vid VID ]\n");
exit(-1);
}
@ -38,81 +41,213 @@ static bool is_temp_mcast_rtr(__u8 type)
return type == MDB_RTR_TYPE_TEMP_QUERY || type == MDB_RTR_TYPE_TEMP;
}
static void __print_router_port_stats(FILE *f, struct rtattr *pattr)
static const char *format_timer(__u32 ticks, int align)
{
struct timeval tv;
static char tbuf[32];
__jiffies_to_tv(&tv, ticks);
if (align)
snprintf(tbuf, sizeof(tbuf), "%4lu.%.2lu",
(unsigned long)tv.tv_sec,
(unsigned long)tv.tv_usec / 10000);
else
snprintf(tbuf, sizeof(tbuf), "%lu.%.2lu",
(unsigned long)tv.tv_sec,
(unsigned long)tv.tv_usec / 10000);
return tbuf;
}
void br_print_router_port_stats(struct rtattr *pattr)
{
struct rtattr *tb[MDBA_ROUTER_PATTR_MAX + 1];
struct timeval tv;
__u8 type;
parse_rtattr(tb, MDBA_ROUTER_PATTR_MAX, MDB_RTR_RTA(RTA_DATA(pattr)),
RTA_PAYLOAD(pattr) - RTA_ALIGN(sizeof(uint32_t)));
if (tb[MDBA_ROUTER_PATTR_TIMER]) {
__jiffies_to_tv(&tv,
rta_getattr_u32(tb[MDBA_ROUTER_PATTR_TIMER]));
fprintf(f, " %4i.%.2i",
(int)tv.tv_sec, (int)tv.tv_usec/10000);
__u32 timer = rta_getattr_u32(tb[MDBA_ROUTER_PATTR_TIMER]);
print_string(PRINT_ANY, "timer", " %s",
format_timer(timer, 1));
}
if (tb[MDBA_ROUTER_PATTR_TYPE]) {
type = rta_getattr_u8(tb[MDBA_ROUTER_PATTR_TYPE]);
fprintf(f, " %s",
is_temp_mcast_rtr(type) ? "temp" : "permanent");
__u8 type = rta_getattr_u8(tb[MDBA_ROUTER_PATTR_TYPE]);
print_string(PRINT_ANY, "type", " %s",
is_temp_mcast_rtr(type) ? "temp" : "permanent");
}
}
static void br_print_router_ports(FILE *f, struct rtattr *attr, __u32 brifidx)
static void br_print_router_ports(FILE *f, struct rtattr *attr,
const char *brifname)
{
uint32_t *port_ifindex;
int rem = RTA_PAYLOAD(attr);
struct rtattr *i;
int rem;
if (!show_stats)
fprintf(f, "router ports on %s: ", ll_index_to_name(brifidx));
if (is_json_context())
open_json_array(PRINT_JSON, brifname);
else if (!show_stats)
fprintf(f, "router ports on %s: ", brifname);
rem = RTA_PAYLOAD(attr);
for (i = RTA_DATA(attr); RTA_OK(i, rem); i = RTA_NEXT(i, rem)) {
port_ifindex = RTA_DATA(i);
if (show_stats) {
uint32_t *port_ifindex = RTA_DATA(i);
const char *port_ifname = ll_index_to_name(*port_ifindex);
if (is_json_context()) {
open_json_object(NULL);
print_string(PRINT_JSON, "port", NULL, port_ifname);
if (show_stats)
br_print_router_port_stats(i);
close_json_object();
} else if (show_stats) {
fprintf(f, "router ports on %s: %s",
ll_index_to_name(brifidx),
ll_index_to_name(*port_ifindex));
__print_router_port_stats(f, i);
brifname, port_ifname);
br_print_router_port_stats(i);
fprintf(f, "\n");
} else {
fprintf(f, "%s ", ll_index_to_name(*port_ifindex));
fprintf(f, "%s ", port_ifname);
}
}
if (!show_stats)
fprintf(f, "\n");
print_nl();
close_json_array(PRINT_JSON, NULL);
}
static void print_mdb_entry(FILE *f, int ifindex, struct br_mdb_entry *e,
static void print_src_entry(struct rtattr *src_attr, int af, const char *sep)
{
struct rtattr *stb[MDBA_MDB_SRCATTR_MAX + 1];
SPRINT_BUF(abuf);
const char *addr;
__u32 timer_val;
parse_rtattr_nested(stb, MDBA_MDB_SRCATTR_MAX, src_attr);
if (!stb[MDBA_MDB_SRCATTR_ADDRESS] || !stb[MDBA_MDB_SRCATTR_TIMER])
return;
addr = inet_ntop(af, RTA_DATA(stb[MDBA_MDB_SRCATTR_ADDRESS]), abuf,
sizeof(abuf));
if (!addr)
return;
timer_val = rta_getattr_u32(stb[MDBA_MDB_SRCATTR_TIMER]);
open_json_object(NULL);
print_string(PRINT_FP, NULL, "%s", sep);
print_color_string(PRINT_ANY, ifa_family_color(af),
"address", "%s", addr);
print_string(PRINT_ANY, "timer", "/%s", format_timer(timer_val, 0));
close_json_object();
}
static void print_mdb_entry(FILE *f, int ifindex, const struct br_mdb_entry *e,
struct nlmsghdr *n, struct rtattr **tb)
{
const void *grp, *src;
const char *addr;
SPRINT_BUF(abuf);
const void *src;
const char *dev;
int af;
if (filter_vlan && e->vid != filter_vlan)
return;
af = e->addr.proto == htons(ETH_P_IP) ? AF_INET : AF_INET6;
src = af == AF_INET ? (const void *)&e->addr.u.ip4 :
(const void *)&e->addr.u.ip6;
if (n->nlmsg_type == RTM_DELMDB)
fprintf(f, "Deleted ");
fprintf(f, "dev %s port %s grp %s %s %s", ll_index_to_name(ifindex),
ll_index_to_name(e->ifindex),
inet_ntop(af, src, abuf, sizeof(abuf)),
(e->state & MDB_PERMANENT) ? "permanent" : "temp",
(e->flags & MDB_FLAGS_OFFLOAD) ? "offload" : "");
if (e->vid)
fprintf(f, " vid %hu", e->vid);
if (show_stats && tb && tb[MDBA_MDB_EATTR_TIMER]) {
struct timeval tv;
__jiffies_to_tv(&tv, rta_getattr_u32(tb[MDBA_MDB_EATTR_TIMER]));
fprintf(f, "%4i.%.2i", (int)tv.tv_sec, (int)tv.tv_usec/10000);
if (!e->addr.proto) {
af = AF_PACKET;
grp = &e->addr.u.mac_addr;
} else if (e->addr.proto == htons(ETH_P_IP)) {
af = AF_INET;
grp = &e->addr.u.ip4;
} else {
af = AF_INET6;
grp = &e->addr.u.ip6;
}
fprintf(f, "\n");
dev = ll_index_to_name(ifindex);
open_json_object(NULL);
print_int(PRINT_JSON, "index", NULL, ifindex);
print_color_string(PRINT_ANY, COLOR_IFNAME, "dev", "dev %s", dev);
print_string(PRINT_ANY, "port", " port %s",
ll_index_to_name(e->ifindex));
/* The ETH_ALEN argument is ignored for all cases but AF_PACKET */
addr = rt_addr_n2a_r(af, ETH_ALEN, grp, abuf, sizeof(abuf));
if (!addr)
return;
print_color_string(PRINT_ANY, ifa_family_color(af),
"grp", " grp %s", addr);
if (tb && tb[MDBA_MDB_EATTR_SOURCE]) {
src = (const void *)RTA_DATA(tb[MDBA_MDB_EATTR_SOURCE]);
print_color_string(PRINT_ANY, ifa_family_color(af),
"src", " src %s",
inet_ntop(af, src, abuf, sizeof(abuf)));
}
print_string(PRINT_ANY, "state", " %s",
(e->state & MDB_PERMANENT) ? "permanent" : "temp");
if (show_details && tb) {
if (tb[MDBA_MDB_EATTR_GROUP_MODE]) {
__u8 mode = rta_getattr_u8(tb[MDBA_MDB_EATTR_GROUP_MODE]);
print_string(PRINT_ANY, "filter_mode", " filter_mode %s",
mode == MCAST_INCLUDE ? "include" :
"exclude");
}
if (tb[MDBA_MDB_EATTR_SRC_LIST]) {
struct rtattr *i, *attr = tb[MDBA_MDB_EATTR_SRC_LIST];
const char *sep = " ";
int rem;
open_json_array(PRINT_ANY, is_json_context() ?
"source_list" :
" source_list");
rem = RTA_PAYLOAD(attr);
for (i = RTA_DATA(attr); RTA_OK(i, rem);
i = RTA_NEXT(i, rem)) {
print_src_entry(i, af, sep);
sep = ",";
}
close_json_array(PRINT_JSON, NULL);
}
if (tb[MDBA_MDB_EATTR_RTPROT]) {
__u8 rtprot = rta_getattr_u8(tb[MDBA_MDB_EATTR_RTPROT]);
SPRINT_BUF(rtb);
print_string(PRINT_ANY, "protocol", " proto %s ",
rtnl_rtprot_n2a(rtprot, rtb, sizeof(rtb)));
}
}
open_json_array(PRINT_JSON, "flags");
if (e->flags & MDB_FLAGS_OFFLOAD)
print_string(PRINT_ANY, NULL, " %s", "offload");
if (e->flags & MDB_FLAGS_FAST_LEAVE)
print_string(PRINT_ANY, NULL, " %s", "fast_leave");
if (e->flags & MDB_FLAGS_STAR_EXCL)
print_string(PRINT_ANY, NULL, " %s", "added_by_star_ex");
if (e->flags & MDB_FLAGS_BLOCKED)
print_string(PRINT_ANY, NULL, " %s", "blocked");
close_json_array(PRINT_JSON, NULL);
if (e->vid)
print_uint(PRINT_ANY, "vid", " vid %u", e->vid);
if (show_stats && tb && tb[MDBA_MDB_EATTR_TIMER]) {
__u32 timer = rta_getattr_u32(tb[MDBA_MDB_EATTR_TIMER]);
print_string(PRINT_ANY, "timer", " %s",
format_timer(timer, 1));
}
print_nl();
close_json_object();
}
static void br_print_mdb_entry(FILE *f, int ifindex, struct rtattr *attr,
@ -126,21 +261,61 @@ static void br_print_mdb_entry(FILE *f, int ifindex, struct rtattr *attr,
rem = RTA_PAYLOAD(attr);
for (i = RTA_DATA(attr); RTA_OK(i, rem); i = RTA_NEXT(i, rem)) {
e = RTA_DATA(i);
parse_rtattr(etb, MDBA_MDB_EATTR_MAX, MDB_RTA(RTA_DATA(i)),
RTA_PAYLOAD(i) - RTA_ALIGN(sizeof(*e)));
parse_rtattr_flags(etb, MDBA_MDB_EATTR_MAX, MDB_RTA(RTA_DATA(i)),
RTA_PAYLOAD(i) - RTA_ALIGN(sizeof(*e)),
NLA_F_NESTED);
print_mdb_entry(f, ifindex, e, n, etb);
}
}
int print_mdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
static void print_mdb_entries(FILE *fp, struct nlmsghdr *n,
int ifindex, struct rtattr *mdb)
{
int rem = RTA_PAYLOAD(mdb);
struct rtattr *i;
for (i = RTA_DATA(mdb); RTA_OK(i, rem); i = RTA_NEXT(i, rem))
br_print_mdb_entry(fp, ifindex, i, n);
}
static void print_router_entries(FILE *fp, struct nlmsghdr *n,
int ifindex, struct rtattr *router)
{
const char *brifname = ll_index_to_name(ifindex);
if (n->nlmsg_type == RTM_GETMDB) {
if (show_details)
br_print_router_ports(fp, router, brifname);
} else {
struct rtattr *i = RTA_DATA(router);
uint32_t *port_ifindex = RTA_DATA(i);
const char *port_name = ll_index_to_name(*port_ifindex);
if (is_json_context()) {
open_json_array(PRINT_JSON, brifname);
open_json_object(NULL);
print_string(PRINT_JSON, "port", NULL,
port_name);
close_json_object();
close_json_array(PRINT_JSON, NULL);
} else {
fprintf(fp, "router port dev %s master %s\n",
port_name, brifname);
}
}
}
static int __parse_mdb_nlmsg(struct nlmsghdr *n, struct rtattr **tb)
{
FILE *fp = arg;
struct br_port_msg *r = NLMSG_DATA(n);
int len = n->nlmsg_len;
struct rtattr *tb[MDBA_MAX+1], *i;
if (n->nlmsg_type != RTM_GETMDB && n->nlmsg_type != RTM_NEWMDB && n->nlmsg_type != RTM_DELMDB) {
fprintf(stderr, "Not RTM_GETMDB, RTM_NEWMDB or RTM_DELMDB: %08x %08x %08x\n",
if (n->nlmsg_type != RTM_GETMDB &&
n->nlmsg_type != RTM_NEWMDB &&
n->nlmsg_type != RTM_DELMDB) {
fprintf(stderr,
"Not RTM_GETMDB, RTM_NEWMDB or RTM_DELMDB: %08x %08x %08x\n",
n->nlmsg_len, n->nlmsg_type, n->nlmsg_flags);
return 0;
@ -157,32 +332,62 @@ int print_mdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
parse_rtattr(tb, MDBA_MAX, MDBA_RTA(r), n->nlmsg_len - NLMSG_LENGTH(sizeof(*r)));
if (tb[MDBA_MDB]) {
int rem = RTA_PAYLOAD(tb[MDBA_MDB]);
return 1;
}
for (i = RTA_DATA(tb[MDBA_MDB]); RTA_OK(i, rem); i = RTA_NEXT(i, rem))
br_print_mdb_entry(fp, r->ifindex, i, n);
}
static int print_mdbs(struct nlmsghdr *n, void *arg)
{
struct br_port_msg *r = NLMSG_DATA(n);
struct rtattr *tb[MDBA_MAX+1];
FILE *fp = arg;
int ret;
if (tb[MDBA_ROUTER]) {
if (n->nlmsg_type == RTM_GETMDB) {
if (show_details)
br_print_router_ports(fp, tb[MDBA_ROUTER],
r->ifindex);
} else {
uint32_t *port_ifindex;
ret = __parse_mdb_nlmsg(n, tb);
if (ret != 1)
return ret;
i = RTA_DATA(tb[MDBA_ROUTER]);
port_ifindex = RTA_DATA(i);
if (n->nlmsg_type == RTM_DELMDB)
fprintf(fp, "Deleted ");
fprintf(fp, "router port dev %s master %s\n",
ll_index_to_name(*port_ifindex),
ll_index_to_name(r->ifindex));
}
}
if (tb[MDBA_MDB])
print_mdb_entries(fp, n, r->ifindex, tb[MDBA_MDB]);
fflush(fp);
return 0;
}
static int print_rtrs(struct nlmsghdr *n, void *arg)
{
struct br_port_msg *r = NLMSG_DATA(n);
struct rtattr *tb[MDBA_MAX+1];
FILE *fp = arg;
int ret;
ret = __parse_mdb_nlmsg(n, tb);
if (ret != 1)
return ret;
if (tb[MDBA_ROUTER])
print_router_entries(fp, n, r->ifindex, tb[MDBA_ROUTER]);
return 0;
}
int print_mdb_mon(struct nlmsghdr *n, void *arg)
{
struct br_port_msg *r = NLMSG_DATA(n);
struct rtattr *tb[MDBA_MAX+1];
FILE *fp = arg;
int ret;
ret = __parse_mdb_nlmsg(n, tb);
if (ret != 1)
return ret;
if (n->nlmsg_type == RTM_DELMDB)
print_bool(PRINT_ANY, "deleted", "Deleted ", true);
if (tb[MDBA_MDB])
print_mdb_entries(fp, n, r->ifindex, tb[MDBA_MDB]);
if (tb[MDBA_ROUTER])
print_router_entries(fp, n, r->ifindex, tb[MDBA_ROUTER]);
return 0;
}
@ -207,27 +412,66 @@ static int mdb_show(int argc, char **argv)
}
if (filter_dev) {
filter_index = if_nametoindex(filter_dev);
if (filter_index == 0) {
fprintf(stderr, "Cannot find device \"%s\"\n",
filter_dev);
return -1;
}
filter_index = ll_name_to_index(filter_dev);
if (!filter_index)
return nodev(filter_dev);
}
if (rtnl_wilddump_request(&rth, PF_BRIDGE, RTM_GETMDB) < 0) {
new_json_obj(json);
open_json_object(NULL);
/* get mdb entries */
if (rtnl_mdbdump_req(&rth, PF_BRIDGE) < 0) {
perror("Cannot send dump request");
return -1;
}
if (rtnl_dump_filter(&rth, print_mdb, stdout) < 0) {
open_json_array(PRINT_JSON, "mdb");
if (rtnl_dump_filter(&rth, print_mdbs, stdout) < 0) {
fprintf(stderr, "Dump terminated\n");
return -1;
}
close_json_array(PRINT_JSON, NULL);
/* get router ports */
if (rtnl_mdbdump_req(&rth, PF_BRIDGE) < 0) {
perror("Cannot send dump request");
return -1;
}
open_json_object("router");
if (rtnl_dump_filter(&rth, print_rtrs, stdout) < 0) {
fprintf(stderr, "Dump terminated\n");
return -1;
}
close_json_object();
close_json_object();
delete_json_obj();
fflush(stdout);
return 0;
}
static int mdb_parse_grp(const char *grp, struct br_mdb_entry *e)
{
if (inet_pton(AF_INET, grp, &e->addr.u.ip4)) {
e->addr.proto = htons(ETH_P_IP);
return 0;
}
if (inet_pton(AF_INET6, grp, &e->addr.u.ip6)) {
e->addr.proto = htons(ETH_P_IPV6);
return 0;
}
if (ll_addr_a2n((char *)e->addr.u.mac_addr, sizeof(e->addr.u.mac_addr),
grp) == ETH_ALEN) {
e->addr.proto = 0;
return 0;
}
return -1;
}
static int mdb_modify(int cmd, int flags, int argc, char **argv)
{
struct {
@ -240,8 +484,8 @@ static int mdb_modify(int cmd, int flags, int argc, char **argv)
.n.nlmsg_type = cmd,
.bpm.family = PF_BRIDGE,
};
char *d = NULL, *p = NULL, *grp = NULL, *src = NULL;
struct br_mdb_entry entry = {};
char *d = NULL, *p = NULL, *grp = NULL;
short vid = 0;
while (argc > 0) {
@ -262,6 +506,9 @@ static int mdb_modify(int cmd, int flags, int argc, char **argv)
} else if (strcmp(*argv, "vid") == 0) {
NEXT_ARG();
vid = atoi(*argv);
} else if (strcmp(*argv, "src") == 0) {
NEXT_ARG();
src = *argv;
} else {
if (matches(*argv, "help") == 0)
usage();
@ -275,30 +522,40 @@ static int mdb_modify(int cmd, int flags, int argc, char **argv)
}
req.bpm.ifindex = ll_name_to_index(d);
if (req.bpm.ifindex == 0) {
fprintf(stderr, "Cannot find device \"%s\"\n", d);
return -1;
}
if (!req.bpm.ifindex)
return nodev(d);
entry.ifindex = ll_name_to_index(p);
if (entry.ifindex == 0) {
fprintf(stderr, "Cannot find device \"%s\"\n", p);
if (!entry.ifindex)
return nodev(p);
if (mdb_parse_grp(grp, &entry)) {
fprintf(stderr, "Invalid address \"%s\"\n", grp);
return -1;
}
if (!inet_pton(AF_INET, grp, &entry.addr.u.ip4)) {
if (!inet_pton(AF_INET6, grp, &entry.addr.u.ip6)) {
fprintf(stderr, "Invalid address \"%s\"\n", grp);
return -1;
} else
entry.addr.proto = htons(ETH_P_IPV6);
} else
entry.addr.proto = htons(ETH_P_IP);
entry.vid = vid;
addattr_l(&req.n, sizeof(req), MDBA_SET_ENTRY, &entry, sizeof(entry));
if (src) {
struct rtattr *nest = addattr_nest(&req.n, sizeof(req),
MDBA_SET_ENTRY_ATTRS);
struct in6_addr src_ip6;
__be32 src_ip4;
if (rtnl_talk(&rth, &req.n, NULL, 0) < 0)
nest->rta_type |= NLA_F_NESTED;
if (!inet_pton(AF_INET, src, &src_ip4)) {
if (!inet_pton(AF_INET6, src, &src_ip6)) {
fprintf(stderr, "Invalid source address \"%s\"\n", src);
return -1;
}
addattr_l(&req.n, sizeof(req), MDBE_ATTR_SOURCE, &src_ip6, sizeof(src_ip6));
} else {
addattr32(&req.n, sizeof(req), MDBE_ATTR_SOURCE, src_ip4);
}
addattr_nest_end(&req.n, nest);
}
if (rtnl_talk(&rth, &req.n, NULL) < 0)
return -1;
return 0;

View File

@ -27,16 +27,15 @@
static void usage(void) __attribute__((noreturn));
int prefix_banner;
static int prefix_banner;
static void usage(void)
{
fprintf(stderr, "Usage: bridge monitor [file | link | fdb | mdb | all]\n");
fprintf(stderr, "Usage: bridge monitor [file | link | fdb | mdb | vlan | all]\n");
exit(-1);
}
static int accept_msg(const struct sockaddr_nl *who,
struct rtnl_ctrl_data *ctrl,
static int accept_msg(struct rtnl_ctrl_data *ctrl,
struct nlmsghdr *n, void *arg)
{
FILE *fp = arg;
@ -50,24 +49,30 @@ static int accept_msg(const struct sockaddr_nl *who,
if (prefix_banner)
fprintf(fp, "[LINK]");
return print_linkinfo(who, n, arg);
return print_linkinfo(n, arg);
case RTM_NEWNEIGH:
case RTM_DELNEIGH:
if (prefix_banner)
fprintf(fp, "[NEIGH]");
return print_fdb(who, n, arg);
return print_fdb(n, arg);
case RTM_NEWMDB:
case RTM_DELMDB:
if (prefix_banner)
fprintf(fp, "[MDB]");
return print_mdb(who, n, arg);
return print_mdb_mon(n, arg);
case NLMSG_TSTAMP:
print_nlmsg_timestamp(fp, n);
return 0;
case RTM_NEWVLAN:
case RTM_DELVLAN:
if (prefix_banner)
fprintf(fp, "[VLAN]");
return print_vlan_rtm(n, arg, true, false);
default:
return 0;
}
@ -80,6 +85,7 @@ int do_monitor(int argc, char **argv)
int llink = 0;
int lneigh = 0;
int lmdb = 0;
int lvlan = 0;
rtnl_close(&rth);
@ -96,8 +102,12 @@ int do_monitor(int argc, char **argv)
} else if (matches(*argv, "mdb") == 0) {
lmdb = 1;
groups = 0;
} else if (matches(*argv, "vlan") == 0) {
lvlan = 1;
groups = 0;
} else if (strcmp(*argv, "all") == 0) {
groups = ~RTMGRP_TC;
lvlan = 1;
prefix_banner = 1;
} else if (matches(*argv, "help") == 0) {
usage();
@ -135,6 +145,12 @@ int do_monitor(int argc, char **argv)
if (rtnl_open(&rth, groups) < 0)
exit(1);
if (lvlan && rtnl_add_nl_group(&rth, RTNLGRP_BRVLAN) < 0) {
fprintf(stderr, "Failed to add bridge vlan group to list\n");
exit(1);
}
ll_init_map(&rth);
if (rtnl_listen(&rth, accept_msg, stdout) < 0)

File diff suppressed because it is too large Load Diff

450
configure vendored
View File

@ -1,38 +1,28 @@
#! /bin/bash
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
# This is not an autoconf generated configure
#
INCLUDE=${1:-"$PWD/include"}
INCLUDE="$PWD/include"
PREFIX="/usr"
LIBDIR="\${prefix}/lib"
# Output file which is input to Makefile
CONFIG=config.mk
# Make a temp directory in build tree.
TMPDIR=$(mktemp -d config.XXXXXX)
trap 'status=$?; rm -rf $TMPDIR; exit $status' EXIT HUP INT QUIT TERM
check_prog()
{
echo -n "$2"
command -v $1 >/dev/null 2>&1 && (echo "$3:=y" >> Config; echo "yes") || (echo "no"; return 1)
}
check_docs()
{
if check_prog latex " latex: " HAVE_LATEX; then
check_prog pdflatex " pdflatex: " HAVE_PDFLATEX || echo " WARNING: no PDF docs can be built from LaTeX files"
check_prog sgml2latex " sgml2latex: " HAVE_SGML2LATEX || echo " WARNING: no LaTeX files can be build from SGML files"
else
echo " WARNING: no docs can be built from LaTeX files"
fi
check_prog sgml2html " sgml2html: " HAVE_SGML2HTML || echo " WARNING: no HTML docs can be built from SGML"
}
check_toolchain()
{
: ${PKG_CONFIG:=pkg-config}
: ${AR=ar}
: ${CC=gcc}
echo "PKG_CONFIG:=${PKG_CONFIG}" >>Config
echo "AR:=${AR}" >>Config
echo "CC:=${CC}" >>Config
: ${YACC=bison}
echo "PKG_CONFIG:=${PKG_CONFIG}" >>$CONFIG
echo "AR:=${AR}" >>$CONFIG
echo "CC:=${CC}" >>$CONFIG
echo "YACC:=${YACC}" >>$CONFIG
}
check_atm()
@ -46,10 +36,8 @@ int main(int argc, char **argv) {
}
EOF
$CC -I$INCLUDE -o $TMPDIR/atmtest $TMPDIR/atmtest.c -latm >/dev/null 2>&1
if [ $? -eq 0 ]
then
echo "TC_CONFIG_ATM:=y" >>Config
if $CC -I$INCLUDE -o $TMPDIR/atmtest $TMPDIR/atmtest.c -latm >/dev/null 2>&1; then
echo "TC_CONFIG_ATM:=y" >>$CONFIG
echo yes
else
echo no
@ -57,6 +45,13 @@ EOF
rm -f $TMPDIR/atmtest.c $TMPDIR/atmtest
}
check_xtables()
{
if ! ${PKG_CONFIG} xtables --exists; then
echo "TC_CONFIG_NO_XT:=y" >>$CONFIG
fi
}
check_xt()
{
#check if we have xtables from iptables >= 1.4.5.
@ -80,9 +75,8 @@ int main(int argc, char **argv)
EOF
if $CC -I$INCLUDE $IPTC -o $TMPDIR/ipttest $TMPDIR/ipttest.c $IPTL \
$(${PKG_CONFIG} xtables --cflags --libs) -ldl >/dev/null 2>&1
then
echo "TC_CONFIG_XT:=y" >>Config
$(${PKG_CONFIG} xtables --cflags --libs) -ldl >/dev/null 2>&1; then
echo "TC_CONFIG_XT:=y" >>$CONFIG
echo "using xtables"
fi
rm -f $TMPDIR/ipttest.c $TMPDIR/ipttest
@ -90,13 +84,10 @@ EOF
check_xt_old()
{
# bail if previous XT checks has already succeded.
if grep -q TC_CONFIG_XT Config
then
return
fi
# bail if previous XT checks has already succeeded.
grep -q TC_CONFIG_XT $CONFIG && return
#check if we dont need our internal header ..
#check if we don't need our internal header ..
cat >$TMPDIR/ipttest.c <<EOF
#include <xtables.h>
char *lib_dir;
@ -118,10 +109,8 @@ int main(int argc, char **argv) {
EOF
$CC -I$INCLUDE $IPTC -o $TMPDIR/ipttest $TMPDIR/ipttest.c $IPTL -ldl >/dev/null 2>&1
if [ $? -eq 0 ]
then
echo "TC_CONFIG_XT_OLD:=y" >>Config
if $CC -I$INCLUDE $IPTC -o $TMPDIR/ipttest $TMPDIR/ipttest.c $IPTL -ldl >/dev/null 2>&1; then
echo "TC_CONFIG_XT_OLD:=y" >>$CONFIG
echo "using old xtables (no need for xt-internal.h)"
fi
rm -f $TMPDIR/ipttest.c $TMPDIR/ipttest
@ -129,11 +118,8 @@ EOF
check_xt_old_internal_h()
{
# bail if previous XT checks has already succeded.
if grep -q TC_CONFIG_XT Config
then
return
fi
# bail if previous XT checks has already succeeded.
grep -q TC_CONFIG_XT $CONFIG && return
#check if we need our own internal.h
cat >$TMPDIR/ipttest.c <<EOF
@ -157,20 +143,25 @@ int main(int argc, char **argv) {
}
EOF
$CC -I$INCLUDE $IPTC -o $TMPDIR/ipttest $TMPDIR/ipttest.c $IPTL -ldl >/dev/null 2>&1
if [ $? -eq 0 ]
then
if $CC -I$INCLUDE $IPTC -o $TMPDIR/ipttest $TMPDIR/ipttest.c $IPTL -ldl >/dev/null 2>&1; then
echo "using old xtables with xt-internal.h"
echo "TC_CONFIG_XT_OLD_H:=y" >>Config
echo "TC_CONFIG_XT_OLD_H:=y" >>$CONFIG
fi
rm -f $TMPDIR/ipttest.c $TMPDIR/ipttest
}
check_lib_dir()
{
LIBDIR=$(echo $LIBDIR | sed "s|\${prefix}|$PREFIX|")
echo -n "lib directory: "
echo "$LIBDIR"
echo "LIBDIR:=$LIBDIR" >> $CONFIG
}
check_ipt()
{
if ! grep TC_CONFIG_XT Config > /dev/null
then
if ! grep TC_CONFIG_XT $CONFIG > /dev/null; then
echo "using iptables"
fi
}
@ -180,16 +171,16 @@ check_ipt_lib_dir()
IPT_LIB_DIR=$(${PKG_CONFIG} --variable=xtlibdir xtables)
if [ -n "$IPT_LIB_DIR" ]; then
echo $IPT_LIB_DIR
echo "IPT_LIB_DIR:=$IPT_LIB_DIR" >> Config
echo "IPT_LIB_DIR:=$IPT_LIB_DIR" >> $CONFIG
return
fi
for dir in /lib /usr/lib /usr/local/lib
do
for file in $dir/{xtables,iptables}/lib*t_*so ; do
for dir in /lib /usr/lib /usr/local/lib; do
for file in "xtables" "iptables"; do
file="$dir/$file/lib*t_*so"
if [ -f $file ]; then
echo ${file%/*}
echo "IPT_LIB_DIR:=${file%/*}" >> Config
echo "IPT_LIB_DIR:=${file%/*}" >> $CONFIG
return
fi
done
@ -207,17 +198,41 @@ int main(int argc, char **argv)
return 0;
}
EOF
$CC -I$INCLUDE -o $TMPDIR/setnstest $TMPDIR/setnstest.c >/dev/null 2>&1
if [ $? -eq 0 ]
then
echo "IP_CONFIG_SETNS:=y" >>Config
if $CC -I$INCLUDE -o $TMPDIR/setnstest $TMPDIR/setnstest.c >/dev/null 2>&1; then
echo "IP_CONFIG_SETNS:=y" >>$CONFIG
echo "yes"
echo "CFLAGS += -DHAVE_SETNS" >>$CONFIG
else
echo "no"
fi
rm -f $TMPDIR/setnstest.c $TMPDIR/setnstest
}
check_name_to_handle_at()
{
cat >$TMPDIR/name_to_handle_at_test.c <<EOF
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int main(int argc, char **argv)
{
struct file_handle *fhp;
int mount_id, flags, dirfd;
char *pathname;
name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
return 0;
}
EOF
if $CC -I$INCLUDE -o $TMPDIR/name_to_handle_at_test $TMPDIR/name_to_handle_at_test.c >/dev/null 2>&1; then
echo "yes"
echo "CFLAGS += -DHAVE_HANDLE_AT" >>$CONFIG
else
echo "no"
fi
rm -f $TMPDIR/name_to_handle_at_test.c $TMPDIR/name_to_handle_at_test
}
check_ipset()
{
cat >$TMPDIR/ipsettest.c <<EOF
@ -229,7 +244,7 @@ typedef unsigned short ip_set_id_t;
#include <linux/netfilter/xt_set.h>
struct xt_set_info info;
#if IPSET_PROTOCOL == 6
#if IPSET_PROTOCOL == 6 || IPSET_PROTOCOL == 7
int main(void)
{
return IPSET_MAXNAMELEN;
@ -239,9 +254,8 @@ int main(void)
#endif
EOF
if $CC -I$INCLUDE -o $TMPDIR/ipsettest $TMPDIR/ipsettest.c >/dev/null 2>&1
then
echo "TC_CONFIG_IPSET:=y" >>Config
if $CC -I$INCLUDE -o $TMPDIR/ipsettest $TMPDIR/ipsettest.c >/dev/null 2>&1; then
echo "TC_CONFIG_IPSET:=y" >>$CONFIG
echo "yes"
else
echo "no"
@ -251,34 +265,131 @@ EOF
check_elf()
{
cat >$TMPDIR/elftest.c <<EOF
#include <libelf.h>
#include <gelf.h>
int main(void)
{
Elf_Scn *scn;
GElf_Shdr shdr;
return elf_version(EV_CURRENT);
}
EOF
if $CC -I$INCLUDE -o $TMPDIR/elftest $TMPDIR/elftest.c -lelf >/dev/null 2>&1
then
echo "TC_CONFIG_ELF:=y" >>Config
if ${PKG_CONFIG} libelf --exists; then
echo "HAVE_ELF:=y" >>$CONFIG
echo "yes"
echo 'CFLAGS += -DHAVE_ELF' `${PKG_CONFIG} libelf --cflags` >> $CONFIG
echo 'LDLIBS += ' `${PKG_CONFIG} libelf --libs` >>$CONFIG
else
echo "no"
fi
rm -f $TMPDIR/elftest.c $TMPDIR/elftest
}
have_libbpf_basic()
{
cat >$TMPDIR/libbpf_test.c <<EOF
#include <bpf/libbpf.h>
int main(int argc, char **argv) {
bpf_program__set_autoload(NULL, false);
bpf_map__ifindex(NULL);
bpf_map__set_pin_path(NULL, NULL);
bpf_object__open_file(NULL, NULL);
return 0;
}
EOF
$CC -o $TMPDIR/libbpf_test $TMPDIR/libbpf_test.c $LIBBPF_CFLAGS $LIBBPF_LDLIBS >/dev/null 2>&1
local ret=$?
rm -f $TMPDIR/libbpf_test.c $TMPDIR/libbpf_test
return $ret
}
have_libbpf_sec_name()
{
cat >$TMPDIR/libbpf_sec_test.c <<EOF
#include <bpf/libbpf.h>
int main(int argc, char **argv) {
void *ptr;
bpf_program__section_name(NULL);
return 0;
}
EOF
$CC -o $TMPDIR/libbpf_sec_test $TMPDIR/libbpf_sec_test.c $LIBBPF_CFLAGS $LIBBPF_LDLIBS >/dev/null 2>&1
local ret=$?
rm -f $TMPDIR/libbpf_sec_test.c $TMPDIR/libbpf_sec_test
return $ret
}
check_force_libbpf_on()
{
# if set LIBBPF_FORCE=on but no libbpf support, just exist the config
# process to make sure we don't build without libbpf.
if [ "$LIBBPF_FORCE" = on ]; then
echo " LIBBPF_FORCE=on set, but couldn't find a usable libbpf"
exit 1
fi
}
check_libbpf()
{
# if set LIBBPF_FORCE=off, disable libbpf entirely
if [ "$LIBBPF_FORCE" = off ]; then
echo "no"
return
fi
if ! ${PKG_CONFIG} libbpf --exists && [ -z "$LIBBPF_DIR" ] ; then
echo "no"
check_force_libbpf_on
return
fi
if [ $(uname -m) = x86_64 ]; then
local LIBBPF_LIBDIR="${LIBBPF_DIR}/usr/lib64"
else
local LIBBPF_LIBDIR="${LIBBPF_DIR}/usr/lib"
fi
if [ -n "$LIBBPF_DIR" ]; then
LIBBPF_CFLAGS="-I${LIBBPF_DIR}/usr/include"
LIBBPF_LDLIBS="${LIBBPF_LIBDIR}/libbpf.a -lz -lelf"
LIBBPF_VERSION=$(PKG_CONFIG_LIBDIR=${LIBBPF_LIBDIR}/pkgconfig ${PKG_CONFIG} libbpf --modversion)
else
LIBBPF_CFLAGS=$(${PKG_CONFIG} libbpf --cflags)
LIBBPF_LDLIBS=$(${PKG_CONFIG} libbpf --libs)
LIBBPF_VERSION=$(${PKG_CONFIG} libbpf --modversion)
fi
if ! have_libbpf_basic; then
echo "no"
echo " libbpf version $LIBBPF_VERSION is too low, please update it to at least 0.1.0"
check_force_libbpf_on
return
else
echo "HAVE_LIBBPF:=y" >> $CONFIG
echo 'CFLAGS += -DHAVE_LIBBPF ' $LIBBPF_CFLAGS >> $CONFIG
echo "CFLAGS += -DLIBBPF_VERSION=\\\"$LIBBPF_VERSION\\\"" >> $CONFIG
echo 'LDLIBS += ' $LIBBPF_LDLIBS >> $CONFIG
if [ -z "$LIBBPF_DIR" ]; then
echo "CFLAGS += -DLIBBPF_DYNAMIC" >> $CONFIG
fi
fi
# bpf_program__title() is deprecated since libbpf 0.2.0, use
# bpf_program__section_name() instead if we support
if have_libbpf_sec_name; then
echo "HAVE_LIBBPF_SECTION_NAME:=y" >> $CONFIG
echo 'CFLAGS += -DHAVE_LIBBPF_SECTION_NAME ' >> $CONFIG
fi
echo "yes"
echo " libbpf version $LIBBPF_VERSION"
}
check_selinux()
# SELinux is a compile time option in the ss utility
{
if ${PKG_CONFIG} libselinux --exists
then
echo "HAVE_SELINUX:=y" >>Config
if ${PKG_CONFIG} libselinux --exists; then
echo "HAVE_SELINUX:=y" >>$CONFIG
echo "yes"
echo 'LDLIBS +=' `${PKG_CONFIG} --libs libselinux` >>$CONFIG
echo 'CFLAGS += -DHAVE_SELINUX' `${PKG_CONFIG} --cflags libselinux` >>$CONFIG
else
echo "no"
fi
@ -286,10 +397,12 @@ check_selinux()
check_mnl()
{
if ${PKG_CONFIG} libmnl --exists
then
echo "HAVE_MNL:=y" >>Config
if ${PKG_CONFIG} libmnl --exists; then
echo "HAVE_MNL:=y" >>$CONFIG
echo "yes"
echo 'CFLAGS += -DHAVE_LIBMNL' `${PKG_CONFIG} libmnl --cflags` >>$CONFIG
echo 'LDLIBS +=' `${PKG_CONFIG} libmnl --libs` >> $CONFIG
else
echo "no"
fi
@ -306,10 +419,8 @@ int main(int argc, char **argv) {
return 0;
}
EOF
$CC -I$INCLUDE -o $TMPDIR/dbtest $TMPDIR/dbtest.c -ldb >/dev/null 2>&1
if [ $? -eq 0 ]
then
echo "HAVE_BERKELEY_DB:=y" >>Config
if $CC -I$INCLUDE -o $TMPDIR/dbtest $TMPDIR/dbtest.c -ldb >/dev/null 2>&1; then
echo "HAVE_BERKELEY_DB:=y" >>$CONFIG
echo "yes"
else
echo "no"
@ -317,6 +428,44 @@ EOF
rm -f $TMPDIR/dbtest.c $TMPDIR/dbtest
}
check_strlcpy()
{
cat >$TMPDIR/strtest.c <<EOF
#include <string.h>
int main(int argc, char **argv) {
char dst[10];
strlcpy(dst, "test", sizeof(dst));
return 0;
}
EOF
if $CC -I$INCLUDE -o $TMPDIR/strtest $TMPDIR/strtest.c >/dev/null 2>&1; then
echo "no"
else
if ${PKG_CONFIG} libbsd --exists; then
echo 'CFLAGS += -DHAVE_LIBBSD' `${PKG_CONFIG} libbsd --cflags` >>$CONFIG
echo 'LDLIBS +=' `${PKG_CONFIG} libbsd --libs` >> $CONFIG
echo "no"
else
echo 'CFLAGS += -DNEED_STRLCPY' >>$CONFIG
echo "yes"
fi
fi
rm -f $TMPDIR/strtest.c $TMPDIR/strtest
}
check_cap()
{
if ${PKG_CONFIG} libcap --exists; then
echo "HAVE_CAP:=y" >>$CONFIG
echo "yes"
echo 'CFLAGS += -DHAVE_LIBCAP' `${PKG_CONFIG} libcap --cflags` >>$CONFIG
echo 'LDLIBS +=' `${PKG_CONFIG} libcap --libs` >> $CONFIG
else
echo "no"
fi
}
quiet_config()
{
cat <<EOF
@ -343,8 +492,78 @@ endif
EOF
}
echo "# Generated config based on" $INCLUDE >Config
quiet_config >> Config
usage()
{
cat <<EOF
Usage: $0 [OPTIONS]
--include_dir <dir> Path to iproute2 include dir
--libdir <dir> Path to iproute2 lib dir
--libbpf_dir <dir> Path to libbpf DESTDIR
--libbpf_force <on|off> Enable/disable libbpf by force. Available options:
on: require link against libbpf, quit config if no libbpf support
off: disable libbpf probing
--prefix <dir> Path prefix of the lib files to install
-h | --help Show this usage info
EOF
exit $1
}
# Compat with the old INCLUDE path setting method.
if [ $# -eq 1 ] && [ "$(echo $1 | cut -c 1)" != '-' ]; then
INCLUDE="$1"
else
while [ "$#" -gt 0 ]; do
case "$1" in
--include_dir)
shift
INCLUDE="$1" ;;
--include_dir=*)
INCLUDE="${1#*=}" ;;
--libdir)
shift
LIBDIR="$1" ;;
--libdir=*)
LIBDIR="${1#*=}" ;;
--libbpf_dir)
shift
LIBBPF_DIR="$1" ;;
--libbpf_dir=*)
LIBBPF_DIR="${1#*=}" ;;
--libbpf_force)
shift
LIBBPF_FORCE="$1" ;;
--libbpf_force=*)
LIBBPF_FORCE="${1#*=}" ;;
--prefix)
shift
PREFIX="$1" ;;
--prefix=*)
PREFIX="${1#*=}" ;;
-h | --help)
usage 0 ;;
--*)
;;
*)
usage 1 ;;
esac
[ "$#" -gt 0 ] && shift
done
fi
[ -d "$INCLUDE" ] || usage 1
if [ "${LIBBPF_DIR-unused}" != "unused" ]; then
[ -d "$LIBBPF_DIR" ] || usage 1
fi
if [ "${LIBBPF_FORCE-unused}" != "unused" ]; then
if [ "$LIBBPF_FORCE" != 'on' ] && [ "$LIBBPF_FORCE" != 'off' ]; then
usage 1
fi
fi
[ -z "$PREFIX" ] && usage 1
[ -z "$LIBDIR" ] && usage 1
echo "# Generated config based on" $INCLUDE >$CONFIG
quiet_config >> $CONFIG
check_toolchain
@ -353,25 +572,37 @@ echo "TC schedulers"
echo -n " ATM "
check_atm
echo -n " IPT "
check_xt
check_xt_old
check_xt_old_internal_h
check_ipt
check_xtables
if ! grep -q TC_CONFIG_NO_XT $CONFIG; then
echo -n " IPT "
check_xt
check_xt_old
check_xt_old_internal_h
check_ipt
echo -n " IPSET "
check_ipset
echo -n " IPSET "
check_ipset
fi
echo
echo -n "iptables modules directory: "
check_ipt_lib_dir
check_lib_dir
if ! grep -q TC_CONFIG_NO_XT $CONFIG; then
echo -n "iptables modules directory: "
check_ipt_lib_dir
fi
echo -n "libc has setns: "
check_setns
echo -n "libc has name_to_handle_at: "
check_name_to_handle_at
echo -n "SELinux support: "
check_selinux
echo -n "libbpf support: "
check_libbpf
echo -n "ELF support: "
check_elf
@ -381,11 +612,12 @@ check_mnl
echo -n "Berkeley DB: "
check_berkeley_db
echo
echo -n "docs:"
check_docs
echo
echo -n "need for strlcpy: "
check_strlcpy
echo >> Config
echo "%.o: %.c" >> Config
echo ' $(QUIET_CC)$(CC) $(CFLAGS) $(EXTRA_CFLAGS) -c -o $@ $<' >> Config
echo -n "libcap support: "
check_cap
echo >> $CONFIG
echo "%.o: %.c" >> $CONFIG
echo ' $(QUIET_CC)$(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(CPPFLAGS) -c -o $@ $<' >> $CONFIG

1
dcb/.gitignore vendored Normal file
View File

@ -0,0 +1 @@
dcb

31
dcb/Makefile Normal file
View File

@ -0,0 +1,31 @@
# SPDX-License-Identifier: GPL-2.0
include ../config.mk
TARGETS :=
ifeq ($(HAVE_MNL),y)
DCBOBJ = dcb.o \
dcb_app.o \
dcb_buffer.o \
dcb_dcbx.o \
dcb_ets.o \
dcb_maxrate.o \
dcb_pfc.o
TARGETS += dcb
LDLIBS += -lm
endif
all: $(TARGETS) $(LIBS)
dcb: $(DCBOBJ) $(LIBNETLINK)
$(QUIET_LINK)$(CC) $^ $(LDFLAGS) $(LDLIBS) -o $@
install: all
for i in $(TARGETS); \
do install -m 0755 $$i $(DESTDIR)$(SBINDIR); \
done
clean:
rm -f $(DCBOBJ) $(TARGETS)

611
dcb/dcb.c Normal file
View File

@ -0,0 +1,611 @@
// SPDX-License-Identifier: GPL-2.0+
#include <inttypes.h>
#include <stdio.h>
#include <linux/dcbnl.h>
#include <libmnl/libmnl.h>
#include <getopt.h>
#include "dcb.h"
#include "mnl_utils.h"
#include "namespace.h"
#include "utils.h"
#include "version.h"
static int dcb_init(struct dcb *dcb)
{
dcb->buf = malloc(MNL_SOCKET_BUFFER_SIZE);
if (dcb->buf == NULL) {
perror("Netlink buffer allocation");
return -1;
}
dcb->nl = mnlu_socket_open(NETLINK_ROUTE);
if (dcb->nl == NULL) {
perror("Open netlink socket");
goto err_socket_open;
}
new_json_obj_plain(dcb->json_output);
return 0;
err_socket_open:
free(dcb->buf);
return -1;
}
static void dcb_fini(struct dcb *dcb)
{
delete_json_obj_plain();
mnl_socket_close(dcb->nl);
free(dcb->buf);
}
static struct dcb *dcb_alloc(void)
{
struct dcb *dcb;
dcb = calloc(1, sizeof(*dcb));
if (!dcb)
return NULL;
return dcb;
}
static void dcb_free(struct dcb *dcb)
{
free(dcb);
}
struct dcb_get_attribute {
struct dcb *dcb;
int attr;
void *payload;
__u16 payload_len;
};
static int dcb_get_attribute_attr_ieee_cb(const struct nlattr *attr, void *data)
{
struct dcb_get_attribute *ga = data;
if (mnl_attr_get_type(attr) != ga->attr)
return MNL_CB_OK;
ga->payload = mnl_attr_get_payload(attr);
ga->payload_len = mnl_attr_get_payload_len(attr);
return MNL_CB_STOP;
}
static int dcb_get_attribute_attr_cb(const struct nlattr *attr, void *data)
{
if (mnl_attr_get_type(attr) != DCB_ATTR_IEEE)
return MNL_CB_OK;
return mnl_attr_parse_nested(attr, dcb_get_attribute_attr_ieee_cb, data);
}
static int dcb_get_attribute_cb(const struct nlmsghdr *nlh, void *data)
{
return mnl_attr_parse(nlh, sizeof(struct dcbmsg), dcb_get_attribute_attr_cb, data);
}
static int dcb_get_attribute_bare_cb(const struct nlmsghdr *nlh, void *data)
{
/* Bare attributes (e.g. DCB_ATTR_DCBX) are not wrapped inside an IEEE
* container, so this does not have to go through unpacking in
* dcb_get_attribute_attr_cb().
*/
return mnl_attr_parse(nlh, sizeof(struct dcbmsg),
dcb_get_attribute_attr_ieee_cb, data);
}
struct dcb_set_attribute_response {
int response_attr;
};
static int dcb_set_attribute_attr_cb(const struct nlattr *attr, void *data)
{
struct dcb_set_attribute_response *resp = data;
uint16_t len;
uint8_t err;
if (mnl_attr_get_type(attr) != resp->response_attr)
return MNL_CB_OK;
len = mnl_attr_get_payload_len(attr);
if (len != 1) {
fprintf(stderr, "Response attribute expected to have size 1, not %d\n", len);
return MNL_CB_ERROR;
}
err = mnl_attr_get_u8(attr);
if (err) {
fprintf(stderr, "Error when attempting to set attribute: %s\n",
strerror(err));
return MNL_CB_ERROR;
}
return MNL_CB_STOP;
}
static int dcb_set_attribute_cb(const struct nlmsghdr *nlh, void *data)
{
return mnl_attr_parse(nlh, sizeof(struct dcbmsg), dcb_set_attribute_attr_cb, data);
}
static int dcb_talk(struct dcb *dcb, struct nlmsghdr *nlh, mnl_cb_t cb, void *data)
{
int ret;
ret = mnl_socket_sendto(dcb->nl, nlh, nlh->nlmsg_len);
if (ret < 0) {
perror("mnl_socket_sendto");
return -1;
}
return mnlu_socket_recv_run(dcb->nl, nlh->nlmsg_seq, dcb->buf, MNL_SOCKET_BUFFER_SIZE,
cb, data);
}
static struct nlmsghdr *dcb_prepare(struct dcb *dcb, const char *dev,
uint32_t nlmsg_type, uint8_t dcb_cmd)
{
struct dcbmsg dcbm = {
.cmd = dcb_cmd,
};
struct nlmsghdr *nlh;
nlh = mnlu_msg_prepare(dcb->buf, nlmsg_type, NLM_F_REQUEST, &dcbm, sizeof(dcbm));
mnl_attr_put_strz(nlh, DCB_ATTR_IFNAME, dev);
return nlh;
}
static int __dcb_get_attribute(struct dcb *dcb, int command,
const char *dev, int attr,
void **payload_p, __u16 *payload_len_p,
int (*get_attribute_cb)(const struct nlmsghdr *nlh,
void *data))
{
struct dcb_get_attribute ga;
struct nlmsghdr *nlh;
int ret;
nlh = dcb_prepare(dcb, dev, RTM_GETDCB, command);
ga = (struct dcb_get_attribute) {
.dcb = dcb,
.attr = attr,
.payload = NULL,
};
ret = dcb_talk(dcb, nlh, get_attribute_cb, &ga);
if (ret) {
perror("Attribute read");
return ret;
}
if (ga.payload == NULL) {
perror("Attribute not found");
return -ENOENT;
}
*payload_p = ga.payload;
*payload_len_p = ga.payload_len;
return 0;
}
int dcb_get_attribute_va(struct dcb *dcb, const char *dev, int attr,
void **payload_p, __u16 *payload_len_p)
{
return __dcb_get_attribute(dcb, DCB_CMD_IEEE_GET, dev, attr,
payload_p, payload_len_p,
dcb_get_attribute_cb);
}
int dcb_get_attribute_bare(struct dcb *dcb, int cmd, const char *dev, int attr,
void **payload_p, __u16 *payload_len_p)
{
return __dcb_get_attribute(dcb, cmd, dev, attr,
payload_p, payload_len_p,
dcb_get_attribute_bare_cb);
}
int dcb_get_attribute(struct dcb *dcb, const char *dev, int attr, void *data, size_t data_len)
{
__u16 payload_len;
void *payload;
int ret;
ret = dcb_get_attribute_va(dcb, dev, attr, &payload, &payload_len);
if (ret)
return ret;
if (payload_len != data_len) {
fprintf(stderr, "Wrong len %d, expected %zd\n", payload_len, data_len);
return -EINVAL;
}
memcpy(data, payload, data_len);
return 0;
}
static int __dcb_set_attribute(struct dcb *dcb, int command, const char *dev,
int (*cb)(struct dcb *, struct nlmsghdr *, void *),
void *data, int response_attr)
{
struct dcb_set_attribute_response resp = {
.response_attr = response_attr,
};
struct nlmsghdr *nlh;
int ret;
nlh = dcb_prepare(dcb, dev, RTM_SETDCB, command);
ret = cb(dcb, nlh, data);
if (ret)
return ret;
ret = dcb_talk(dcb, nlh, dcb_set_attribute_cb, &resp);
if (ret) {
perror("Attribute write");
return ret;
}
return 0;
}
struct dcb_set_attribute_ieee_cb {
int (*cb)(struct dcb *dcb, struct nlmsghdr *nlh, void *data);
void *data;
};
static int dcb_set_attribute_ieee_cb(struct dcb *dcb, struct nlmsghdr *nlh, void *data)
{
struct dcb_set_attribute_ieee_cb *ieee_data = data;
struct nlattr *nest;
int ret;
nest = mnl_attr_nest_start(nlh, DCB_ATTR_IEEE);
ret = ieee_data->cb(dcb, nlh, ieee_data->data);
if (ret)
return ret;
mnl_attr_nest_end(nlh, nest);
return 0;
}
int dcb_set_attribute_va(struct dcb *dcb, int command, const char *dev,
int (*cb)(struct dcb *dcb, struct nlmsghdr *nlh, void *data),
void *data)
{
struct dcb_set_attribute_ieee_cb ieee_data = {
.cb = cb,
.data = data,
};
return __dcb_set_attribute(dcb, command, dev,
&dcb_set_attribute_ieee_cb, &ieee_data,
DCB_ATTR_IEEE);
}
struct dcb_set_attribute {
int attr;
const void *data;
size_t data_len;
};
static int dcb_set_attribute_put(struct dcb *dcb, struct nlmsghdr *nlh, void *data)
{
struct dcb_set_attribute *dsa = data;
mnl_attr_put(nlh, dsa->attr, dsa->data_len, dsa->data);
return 0;
}
int dcb_set_attribute(struct dcb *dcb, const char *dev, int attr, const void *data, size_t data_len)
{
struct dcb_set_attribute dsa = {
.attr = attr,
.data = data,
.data_len = data_len,
};
return dcb_set_attribute_va(dcb, DCB_CMD_IEEE_SET, dev,
&dcb_set_attribute_put, &dsa);
}
int dcb_set_attribute_bare(struct dcb *dcb, int command, const char *dev,
int attr, const void *data, size_t data_len,
int response_attr)
{
struct dcb_set_attribute dsa = {
.attr = attr,
.data = data,
.data_len = data_len,
};
return __dcb_set_attribute(dcb, command, dev,
&dcb_set_attribute_put, &dsa, response_attr);
}
void dcb_print_array_u8(const __u8 *array, size_t size)
{
SPRINT_BUF(b);
size_t i;
for (i = 0; i < size; i++) {
snprintf(b, sizeof(b), "%zd:%%d ", i);
print_uint(PRINT_ANY, NULL, b, array[i]);
}
}
void dcb_print_array_u64(const __u64 *array, size_t size)
{
SPRINT_BUF(b);
size_t i;
for (i = 0; i < size; i++) {
snprintf(b, sizeof(b), "%zd:%%" PRIu64 " ", i);
print_u64(PRINT_ANY, NULL, b, array[i]);
}
}
void dcb_print_array_on_off(const __u8 *array, size_t size)
{
SPRINT_BUF(b);
size_t i;
for (i = 0; i < size; i++) {
snprintf(b, sizeof(b), "%zd:%%s ", i);
print_on_off(PRINT_ANY, NULL, b, array[i]);
}
}
void dcb_print_array_kw(const __u8 *array, size_t array_size,
const char *const kw[], size_t kw_size)
{
SPRINT_BUF(b);
size_t i;
for (i = 0; i < array_size; i++) {
__u8 emt = array[i];
snprintf(b, sizeof(b), "%zd:%%s ", i);
if (emt < kw_size && kw[emt])
print_string(PRINT_ANY, NULL, b, kw[emt]);
else
print_string(PRINT_ANY, NULL, b, "???");
}
}
void dcb_print_named_array(const char *json_name, const char *fp_name,
const __u8 *array, size_t size,
void (*print_array)(const __u8 *, size_t))
{
open_json_array(PRINT_JSON, json_name);
print_string(PRINT_FP, NULL, "%s ", fp_name);
print_array(array, size);
close_json_array(PRINT_JSON, json_name);
}
int dcb_parse_mapping(const char *what_key, __u32 key, __u32 max_key,
const char *what_value, __u64 value, __u64 max_value,
void (*set_array)(__u32 index, __u64 value, void *data),
void *set_array_data)
{
bool is_all = key == (__u32) -1;
if (!is_all && key > max_key) {
fprintf(stderr, "In %s:%s mapping, %s is expected to be 0..%d\n",
what_key, what_value, what_key, max_key);
return -EINVAL;
}
if (value > max_value) {
fprintf(stderr, "In %s:%s mapping, %s is expected to be 0..%llu\n",
what_key, what_value, what_value, max_value);
return -EINVAL;
}
if (is_all) {
for (key = 0; key <= max_key; key++)
set_array(key, value, set_array_data);
} else {
set_array(key, value, set_array_data);
}
return 0;
}
void dcb_set_u8(__u32 key, __u64 value, void *data)
{
__u8 *array = data;
array[key] = value;
}
void dcb_set_u32(__u32 key, __u64 value, void *data)
{
__u32 *array = data;
array[key] = value;
}
void dcb_set_u64(__u32 key, __u64 value, void *data)
{
__u64 *array = data;
array[key] = value;
}
int dcb_cmd_parse_dev(struct dcb *dcb, int argc, char **argv,
int (*and_then)(struct dcb *dcb, const char *dev,
int argc, char **argv),
void (*help)(void))
{
const char *dev;
if (!argc || matches(*argv, "help") == 0) {
help();
return 0;
} else if (matches(*argv, "dev") == 0) {
NEXT_ARG();
dev = *argv;
if (check_ifname(dev)) {
invarg("not a valid ifname", *argv);
return -EINVAL;
}
NEXT_ARG_FWD();
return and_then(dcb, dev, argc, argv);
} else {
fprintf(stderr, "Expected `dev DEV', not `%s'", *argv);
help();
return -EINVAL;
}
}
static void dcb_help(void)
{
fprintf(stderr,
"Usage: dcb [ OPTIONS ] OBJECT { COMMAND | help }\n"
" dcb [ -f | --force ] { -b | --batch } filename [ -n | --netns ] netnsname\n"
"where OBJECT := { app | buffer | dcbx | ets | maxrate | pfc }\n"
" OPTIONS := [ -V | --Version | -i | --iec | -j | --json\n"
" | -N | --Numeric | -p | --pretty\n"
" | -s | --statistics | -v | --verbose]\n");
}
static int dcb_cmd(struct dcb *dcb, int argc, char **argv)
{
if (!argc || matches(*argv, "help") == 0) {
dcb_help();
return 0;
} else if (matches(*argv, "app") == 0) {
return dcb_cmd_app(dcb, argc - 1, argv + 1);
} else if (matches(*argv, "buffer") == 0) {
return dcb_cmd_buffer(dcb, argc - 1, argv + 1);
} else if (matches(*argv, "dcbx") == 0) {
return dcb_cmd_dcbx(dcb, argc - 1, argv + 1);
} else if (matches(*argv, "ets") == 0) {
return dcb_cmd_ets(dcb, argc - 1, argv + 1);
} else if (matches(*argv, "maxrate") == 0) {
return dcb_cmd_maxrate(dcb, argc - 1, argv + 1);
} else if (matches(*argv, "pfc") == 0) {
return dcb_cmd_pfc(dcb, argc - 1, argv + 1);
}
fprintf(stderr, "Object \"%s\" is unknown\n", *argv);
return -ENOENT;
}
static int dcb_batch_cmd(int argc, char *argv[], void *data)
{
struct dcb *dcb = data;
return dcb_cmd(dcb, argc, argv);
}
static int dcb_batch(struct dcb *dcb, const char *name, bool force)
{
return do_batch(name, force, dcb_batch_cmd, dcb);
}
int main(int argc, char **argv)
{
static const struct option long_options[] = {
{ "Version", no_argument, NULL, 'V' },
{ "force", no_argument, NULL, 'f' },
{ "batch", required_argument, NULL, 'b' },
{ "iec", no_argument, NULL, 'i' },
{ "json", no_argument, NULL, 'j' },
{ "Numeric", no_argument, NULL, 'N' },
{ "pretty", no_argument, NULL, 'p' },
{ "statistics", no_argument, NULL, 's' },
{ "netns", required_argument, NULL, 'n' },
{ "help", no_argument, NULL, 'h' },
{ NULL, 0, NULL, 0 }
};
const char *batch_file = NULL;
bool force = false;
struct dcb *dcb;
int opt;
int err;
int ret;
dcb = dcb_alloc();
if (!dcb) {
fprintf(stderr, "Failed to allocate memory for dcb\n");
return EXIT_FAILURE;
}
while ((opt = getopt_long(argc, argv, "b:fhijn:psvNV",
long_options, NULL)) >= 0) {
switch (opt) {
case 'V':
printf("dcb utility, iproute2-%s\n", version);
ret = EXIT_SUCCESS;
goto dcb_free;
case 'f':
force = true;
break;
case 'b':
batch_file = optarg;
break;
case 'j':
dcb->json_output = true;
break;
case 'N':
dcb->numeric = true;
break;
case 'p':
pretty = true;
break;
case 's':
dcb->stats = true;
break;
case 'n':
if (netns_switch(optarg)) {
ret = EXIT_FAILURE;
goto dcb_free;
}
break;
case 'i':
dcb->use_iec = true;
break;
case 'h':
dcb_help();
ret = EXIT_SUCCESS;
goto dcb_free;
default:
fprintf(stderr, "Unknown option.\n");
dcb_help();
ret = EXIT_FAILURE;
goto dcb_free;
}
}
argc -= optind;
argv += optind;
err = dcb_init(dcb);
if (err) {
ret = EXIT_FAILURE;
goto dcb_free;
}
if (batch_file)
err = dcb_batch(dcb, batch_file, force);
else
err = dcb_cmd(dcb, argc, argv);
if (err) {
ret = EXIT_FAILURE;
goto dcb_fini;
}
ret = EXIT_SUCCESS;
dcb_fini:
dcb_fini(dcb);
dcb_free:
dcb_free(dcb);
return ret;
}

81
dcb/dcb.h Normal file
View File

@ -0,0 +1,81 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __DCB_H__
#define __DCB_H__ 1
#include <libmnl/libmnl.h>
#include <stdbool.h>
#include <stddef.h>
/* dcb.c */
struct dcb {
char *buf;
struct mnl_socket *nl;
bool json_output;
bool stats;
bool use_iec;
bool numeric;
};
int dcb_parse_mapping(const char *what_key, __u32 key, __u32 max_key,
const char *what_value, __u64 value, __u64 max_value,
void (*set_array)(__u32 index, __u64 value, void *data),
void *set_array_data);
int dcb_cmd_parse_dev(struct dcb *dcb, int argc, char **argv,
int (*and_then)(struct dcb *dcb, const char *dev,
int argc, char **argv),
void (*help)(void));
void dcb_set_u8(__u32 key, __u64 value, void *data);
void dcb_set_u32(__u32 key, __u64 value, void *data);
void dcb_set_u64(__u32 key, __u64 value, void *data);
int dcb_get_attribute(struct dcb *dcb, const char *dev, int attr,
void *data, size_t data_len);
int dcb_set_attribute(struct dcb *dcb, const char *dev, int attr,
const void *data, size_t data_len);
int dcb_get_attribute_va(struct dcb *dcb, const char *dev, int attr,
void **payload_p, __u16 *payload_len_p);
int dcb_set_attribute_va(struct dcb *dcb, int command, const char *dev,
int (*cb)(struct dcb *dcb, struct nlmsghdr *nlh, void *data),
void *data);
int dcb_get_attribute_bare(struct dcb *dcb, int cmd, const char *dev, int attr,
void **payload_p, __u16 *payload_len_p);
int dcb_set_attribute_bare(struct dcb *dcb, int command, const char *dev,
int attr, const void *data, size_t data_len,
int response_attr);
void dcb_print_named_array(const char *json_name, const char *fp_name,
const __u8 *array, size_t size,
void (*print_array)(const __u8 *, size_t));
void dcb_print_array_u8(const __u8 *array, size_t size);
void dcb_print_array_u64(const __u64 *array, size_t size);
void dcb_print_array_on_off(const __u8 *array, size_t size);
void dcb_print_array_kw(const __u8 *array, size_t array_size,
const char *const kw[], size_t kw_size);
/* dcb_app.c */
int dcb_cmd_app(struct dcb *dcb, int argc, char **argv);
/* dcb_buffer.c */
int dcb_cmd_buffer(struct dcb *dcb, int argc, char **argv);
/* dcb_dcbx.c */
int dcb_cmd_dcbx(struct dcb *dcb, int argc, char **argv);
/* dcb_ets.c */
int dcb_cmd_ets(struct dcb *dcb, int argc, char **argv);
/* dcb_maxrate.c */
int dcb_cmd_maxrate(struct dcb *dcb, int argc, char **argv);
/* dcb_pfc.c */
int dcb_cmd_pfc(struct dcb *dcb, int argc, char **argv);
#endif /* __DCB_H__ */

795
dcb/dcb_app.c Normal file
View File

@ -0,0 +1,795 @@
// SPDX-License-Identifier: GPL-2.0+
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <libmnl/libmnl.h>
#include <linux/dcbnl.h>
#include "dcb.h"
#include "utils.h"
#include "rt_names.h"
static void dcb_app_help_add(void)
{
fprintf(stderr,
"Usage: dcb app { add | del | replace } dev STRING\n"
" [ default-prio PRIO ]\n"
" [ ethtype-prio ET:PRIO ]\n"
" [ stream-port-prio PORT:PRIO ]\n"
" [ dgram-port-prio PORT:PRIO ]\n"
" [ port-prio PORT:PRIO ]\n"
" [ dscp-prio INTEGER:PRIO ]\n"
"\n"
" where PRIO := { 0 .. 7 }\n"
" ET := { 0x600 .. 0xffff }\n"
" PORT := { 1 .. 65535 }\n"
" DSCP := { 0 .. 63 }\n"
"\n"
);
}
static void dcb_app_help_show_flush(void)
{
fprintf(stderr,
"Usage: dcb app { show | flush } dev STRING\n"
" [ default-prio ]\n"
" [ ethtype-prio ]\n"
" [ stream-port-prio ]\n"
" [ dgram-port-prio ]\n"
" [ port-prio ]\n"
" [ dscp-prio ]\n"
"\n"
);
}
static void dcb_app_help(void)
{
fprintf(stderr,
"Usage: dcb app help\n"
"\n"
);
dcb_app_help_show_flush();
dcb_app_help_add();
}
struct dcb_app_table {
struct dcb_app *apps;
size_t n_apps;
};
static void dcb_app_table_fini(struct dcb_app_table *tab)
{
free(tab->apps);
}
static int dcb_app_table_push(struct dcb_app_table *tab, struct dcb_app *app)
{
struct dcb_app *apps = realloc(tab->apps, (tab->n_apps + 1) * sizeof(*tab->apps));
if (apps == NULL) {
perror("Cannot allocate APP table");
return -ENOMEM;
}
tab->apps = apps;
tab->apps[tab->n_apps++] = *app;
return 0;
}
static void dcb_app_table_remove_existing(struct dcb_app_table *a,
const struct dcb_app_table *b)
{
size_t ia, ja;
size_t ib;
for (ia = 0, ja = 0; ia < a->n_apps; ia++) {
struct dcb_app *aa = &a->apps[ia];
bool found = false;
for (ib = 0; ib < b->n_apps; ib++) {
const struct dcb_app *ab = &b->apps[ib];
if (aa->selector == ab->selector &&
aa->protocol == ab->protocol &&
aa->priority == ab->priority) {
found = true;
break;
}
}
if (!found)
a->apps[ja++] = *aa;
}
a->n_apps = ja;
}
static void dcb_app_table_remove_replaced(struct dcb_app_table *a,
const struct dcb_app_table *b)
{
size_t ia, ja;
size_t ib;
for (ia = 0, ja = 0; ia < a->n_apps; ia++) {
struct dcb_app *aa = &a->apps[ia];
bool present = false;
bool found = false;
for (ib = 0; ib < b->n_apps; ib++) {
const struct dcb_app *ab = &b->apps[ib];
if (aa->selector == ab->selector &&
aa->protocol == ab->protocol)
present = true;
else
continue;
if (aa->priority == ab->priority) {
found = true;
break;
}
}
/* Entries that remain in A will be removed, so keep in the
* table only APP entries whose sel/pid is mentioned in B,
* but that do not have the full sel/pid/prio match.
*/
if (present && !found)
a->apps[ja++] = *aa;
}
a->n_apps = ja;
}
static int dcb_app_table_copy(struct dcb_app_table *a,
const struct dcb_app_table *b)
{
size_t i;
int ret;
for (i = 0; i < b->n_apps; i++) {
ret = dcb_app_table_push(a, &b->apps[i]);
if (ret != 0)
return ret;
}
return 0;
}
static int dcb_app_cmp(const struct dcb_app *a, const struct dcb_app *b)
{
if (a->protocol < b->protocol)
return -1;
if (a->protocol > b->protocol)
return 1;
return a->priority - b->priority;
}
static int dcb_app_cmp_cb(const void *a, const void *b)
{
return dcb_app_cmp(a, b);
}
static void dcb_app_table_sort(struct dcb_app_table *tab)
{
qsort(tab->apps, tab->n_apps, sizeof(*tab->apps), dcb_app_cmp_cb);
}
struct dcb_app_parse_mapping {
__u8 selector;
struct dcb_app_table *tab;
int err;
};
static void dcb_app_parse_mapping_cb(__u32 key, __u64 value, void *data)
{
struct dcb_app_parse_mapping *pm = data;
struct dcb_app app = {
.selector = pm->selector,
.priority = value,
.protocol = key,
};
if (pm->err)
return;
pm->err = dcb_app_table_push(pm->tab, &app);
}
static int dcb_app_parse_mapping_ethtype_prio(__u32 key, char *value, void *data)
{
__u8 prio;
if (key < 0x600) {
fprintf(stderr, "Protocol IDs < 0x600 are reserved for EtherType\n");
return -EINVAL;
}
if (get_u8(&prio, value, 0))
return -EINVAL;
return dcb_parse_mapping("ETHTYPE", key, 0xffff,
"PRIO", prio, IEEE_8021QAZ_MAX_TCS - 1,
dcb_app_parse_mapping_cb, data);
}
static int dcb_app_parse_dscp(__u32 *key, const char *arg)
{
if (parse_mapping_num_all(key, arg) == 0)
return 0;
if (rtnl_dsfield_a2n(key, arg) != 0)
return -1;
if (*key & 0x03) {
fprintf(stderr, "The values `%s' uses non-DSCP bits.\n", arg);
return -1;
}
/* Unshift the value to convert it from dsfield to DSCP. */
*key >>= 2;
return 0;
}
static int dcb_app_parse_mapping_dscp_prio(__u32 key, char *value, void *data)
{
__u8 prio;
if (get_u8(&prio, value, 0))
return -EINVAL;
return dcb_parse_mapping("DSCP", key, 63,
"PRIO", prio, IEEE_8021QAZ_MAX_TCS - 1,
dcb_app_parse_mapping_cb, data);
}
static int dcb_app_parse_mapping_port_prio(__u32 key, char *value, void *data)
{
__u8 prio;
if (key == 0) {
fprintf(stderr, "Port ID of 0 is invalid\n");
return -EINVAL;
}
if (get_u8(&prio, value, 0))
return -EINVAL;
return dcb_parse_mapping("PORT", key, 0xffff,
"PRIO", prio, IEEE_8021QAZ_MAX_TCS - 1,
dcb_app_parse_mapping_cb, data);
}
static int dcb_app_parse_default_prio(int *argcp, char ***argvp, struct dcb_app_table *tab)
{
int argc = *argcp;
char **argv = *argvp;
int ret = 0;
while (argc > 0) {
struct dcb_app app;
__u8 prio;
if (get_u8(&prio, *argv, 0)) {
ret = 1;
break;
}
app = (struct dcb_app){
.selector = IEEE_8021QAZ_APP_SEL_ETHERTYPE,
.protocol = 0,
.priority = prio,
};
ret = dcb_app_table_push(tab, &app);
if (ret != 0)
break;
argc--, argv++;
}
*argcp = argc;
*argvp = argv;
return ret;
}
static bool dcb_app_is_ethtype(const struct dcb_app *app)
{
return app->selector == IEEE_8021QAZ_APP_SEL_ETHERTYPE &&
app->protocol != 0;
}
static bool dcb_app_is_default(const struct dcb_app *app)
{
return app->selector == IEEE_8021QAZ_APP_SEL_ETHERTYPE &&
app->protocol == 0;
}
static bool dcb_app_is_dscp(const struct dcb_app *app)
{
return app->selector == IEEE_8021QAZ_APP_SEL_DSCP;
}
static bool dcb_app_is_stream_port(const struct dcb_app *app)
{
return app->selector == IEEE_8021QAZ_APP_SEL_STREAM;
}
static bool dcb_app_is_dgram_port(const struct dcb_app *app)
{
return app->selector == IEEE_8021QAZ_APP_SEL_DGRAM;
}
static bool dcb_app_is_port(const struct dcb_app *app)
{
return app->selector == IEEE_8021QAZ_APP_SEL_ANY;
}
static int dcb_app_print_key_dec(__u16 protocol)
{
return print_uint(PRINT_ANY, NULL, "%d:", protocol);
}
static int dcb_app_print_key_hex(__u16 protocol)
{
return print_uint(PRINT_ANY, NULL, "%x:", protocol);
}
static int dcb_app_print_key_dscp(__u16 protocol)
{
const char *name = rtnl_dsfield_get_name(protocol << 2);
if (!is_json_context() && name != NULL)
return print_string(PRINT_FP, NULL, "%s:", name);
return print_uint(PRINT_ANY, NULL, "%d:", protocol);
}
static void dcb_app_print_filtered(const struct dcb_app_table *tab,
bool (*filter)(const struct dcb_app *),
int (*print_key)(__u16 protocol),
const char *json_name,
const char *fp_name)
{
bool first = true;
size_t i;
for (i = 0; i < tab->n_apps; i++) {
struct dcb_app *app = &tab->apps[i];
if (!filter(app))
continue;
if (first) {
open_json_array(PRINT_JSON, json_name);
print_string(PRINT_FP, NULL, "%s ", fp_name);
first = false;
}
open_json_array(PRINT_JSON, NULL);
print_key(app->protocol);
print_uint(PRINT_ANY, NULL, "%d ", app->priority);
close_json_array(PRINT_JSON, NULL);
}
if (!first) {
close_json_array(PRINT_JSON, json_name);
print_nl();
}
}
static void dcb_app_print_ethtype_prio(const struct dcb_app_table *tab)
{
dcb_app_print_filtered(tab, dcb_app_is_ethtype, dcb_app_print_key_hex,
"ethtype_prio", "ethtype-prio");
}
static void dcb_app_print_dscp_prio(const struct dcb *dcb,
const struct dcb_app_table *tab)
{
dcb_app_print_filtered(tab, dcb_app_is_dscp,
dcb->numeric ? dcb_app_print_key_dec
: dcb_app_print_key_dscp,
"dscp_prio", "dscp-prio");
}
static void dcb_app_print_stream_port_prio(const struct dcb_app_table *tab)
{
dcb_app_print_filtered(tab, dcb_app_is_stream_port, dcb_app_print_key_dec,
"stream_port_prio", "stream-port-prio");
}
static void dcb_app_print_dgram_port_prio(const struct dcb_app_table *tab)
{
dcb_app_print_filtered(tab, dcb_app_is_dgram_port, dcb_app_print_key_dec,
"dgram_port_prio", "dgram-port-prio");
}
static void dcb_app_print_port_prio(const struct dcb_app_table *tab)
{
dcb_app_print_filtered(tab, dcb_app_is_port, dcb_app_print_key_dec,
"port_prio", "port-prio");
}
static void dcb_app_print_default_prio(const struct dcb_app_table *tab)
{
bool first = true;
size_t i;
for (i = 0; i < tab->n_apps; i++) {
if (!dcb_app_is_default(&tab->apps[i]))
continue;
if (first) {
open_json_array(PRINT_JSON, "default_prio");
print_string(PRINT_FP, NULL, "default-prio ", NULL);
first = false;
}
print_uint(PRINT_ANY, NULL, "%d ", tab->apps[i].priority);
}
if (!first) {
close_json_array(PRINT_JSON, "default_prio");
print_nl();
}
}
static void dcb_app_print(const struct dcb *dcb, const struct dcb_app_table *tab)
{
dcb_app_print_ethtype_prio(tab);
dcb_app_print_default_prio(tab);
dcb_app_print_dscp_prio(dcb, tab);
dcb_app_print_stream_port_prio(tab);
dcb_app_print_dgram_port_prio(tab);
dcb_app_print_port_prio(tab);
}
static int dcb_app_get_table_attr_cb(const struct nlattr *attr, void *data)
{
struct dcb_app_table *tab = data;
struct dcb_app *app;
int ret;
if (mnl_attr_get_type(attr) != DCB_ATTR_IEEE_APP) {
fprintf(stderr, "Unknown attribute in DCB_ATTR_IEEE_APP_TABLE: %d\n",
mnl_attr_get_type(attr));
return MNL_CB_OK;
}
if (mnl_attr_get_payload_len(attr) < sizeof(struct dcb_app)) {
fprintf(stderr, "DCB_ATTR_IEEE_APP payload expected to have size %zd, not %d\n",
sizeof(struct dcb_app), mnl_attr_get_payload_len(attr));
return MNL_CB_OK;
}
app = mnl_attr_get_payload(attr);
ret = dcb_app_table_push(tab, app);
if (ret != 0)
return MNL_CB_ERROR;
return MNL_CB_OK;
}
static int dcb_app_get(struct dcb *dcb, const char *dev, struct dcb_app_table *tab)
{
uint16_t payload_len;
void *payload;
int ret;
ret = dcb_get_attribute_va(dcb, dev, DCB_ATTR_IEEE_APP_TABLE, &payload, &payload_len);
if (ret != 0)
return ret;
ret = mnl_attr_parse_payload(payload, payload_len, dcb_app_get_table_attr_cb, tab);
if (ret != MNL_CB_OK)
return -EINVAL;
return 0;
}
struct dcb_app_add_del {
const struct dcb_app_table *tab;
bool (*filter)(const struct dcb_app *app);
};
static int dcb_app_add_del_cb(struct dcb *dcb, struct nlmsghdr *nlh, void *data)
{
struct dcb_app_add_del *add_del = data;
struct nlattr *nest;
size_t i;
nest = mnl_attr_nest_start(nlh, DCB_ATTR_IEEE_APP_TABLE);
for (i = 0; i < add_del->tab->n_apps; i++) {
const struct dcb_app *app = &add_del->tab->apps[i];
if (add_del->filter == NULL || add_del->filter(app))
mnl_attr_put(nlh, DCB_ATTR_IEEE_APP, sizeof(*app), app);
}
mnl_attr_nest_end(nlh, nest);
return 0;
}
static int dcb_app_add_del(struct dcb *dcb, const char *dev, int command,
const struct dcb_app_table *tab,
bool (*filter)(const struct dcb_app *))
{
struct dcb_app_add_del add_del = {
.tab = tab,
.filter = filter,
};
if (tab->n_apps == 0)
return 0;
return dcb_set_attribute_va(dcb, command, dev, dcb_app_add_del_cb, &add_del);
}
static int dcb_cmd_app_parse_add_del(struct dcb *dcb, const char *dev,
int argc, char **argv, struct dcb_app_table *tab)
{
struct dcb_app_parse_mapping pm = {
.tab = tab,
};
int ret;
if (!argc) {
dcb_app_help_add();
return 0;
}
do {
if (matches(*argv, "help") == 0) {
dcb_app_help_add();
return 0;
} else if (matches(*argv, "ethtype-prio") == 0) {
NEXT_ARG();
pm.selector = IEEE_8021QAZ_APP_SEL_ETHERTYPE;
ret = parse_mapping(&argc, &argv, false,
&dcb_app_parse_mapping_ethtype_prio,
&pm);
} else if (matches(*argv, "default-prio") == 0) {
NEXT_ARG();
ret = dcb_app_parse_default_prio(&argc, &argv, pm.tab);
if (ret != 0) {
fprintf(stderr, "Invalid default priority %s\n", *argv);
return ret;
}
} else if (matches(*argv, "dscp-prio") == 0) {
NEXT_ARG();
pm.selector = IEEE_8021QAZ_APP_SEL_DSCP;
ret = parse_mapping_gen(&argc, &argv,
&dcb_app_parse_dscp,
&dcb_app_parse_mapping_dscp_prio,
&pm);
} else if (matches(*argv, "stream-port-prio") == 0) {
NEXT_ARG();
pm.selector = IEEE_8021QAZ_APP_SEL_STREAM;
ret = parse_mapping(&argc, &argv, false,
&dcb_app_parse_mapping_port_prio,
&pm);
} else if (matches(*argv, "dgram-port-prio") == 0) {
NEXT_ARG();
pm.selector = IEEE_8021QAZ_APP_SEL_DGRAM;
ret = parse_mapping(&argc, &argv, false,
&dcb_app_parse_mapping_port_prio,
&pm);
} else if (matches(*argv, "port-prio") == 0) {
NEXT_ARG();
pm.selector = IEEE_8021QAZ_APP_SEL_ANY;
ret = parse_mapping(&argc, &argv, false,
&dcb_app_parse_mapping_port_prio,
&pm);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_app_help_add();
return -EINVAL;
}
if (ret != 0) {
fprintf(stderr, "Invalid mapping %s\n", *argv);
return ret;
}
if (pm.err)
return pm.err;
} while (argc > 0);
return 0;
}
static int dcb_cmd_app_add(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct dcb_app_table tab = {};
int ret;
ret = dcb_cmd_app_parse_add_del(dcb, dev, argc, argv, &tab);
if (ret != 0)
return ret;
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_SET, &tab, NULL);
dcb_app_table_fini(&tab);
return ret;
}
static int dcb_cmd_app_del(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct dcb_app_table tab = {};
int ret;
ret = dcb_cmd_app_parse_add_del(dcb, dev, argc, argv, &tab);
if (ret != 0)
return ret;
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_DEL, &tab, NULL);
dcb_app_table_fini(&tab);
return ret;
}
static int dcb_cmd_app_show(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct dcb_app_table tab = {};
int ret;
ret = dcb_app_get(dcb, dev, &tab);
if (ret != 0)
return ret;
dcb_app_table_sort(&tab);
open_json_object(NULL);
if (!argc) {
dcb_app_print(dcb, &tab);
goto out;
}
do {
if (matches(*argv, "help") == 0) {
dcb_app_help_show_flush();
goto out;
} else if (matches(*argv, "ethtype-prio") == 0) {
dcb_app_print_ethtype_prio(&tab);
} else if (matches(*argv, "dscp-prio") == 0) {
dcb_app_print_dscp_prio(dcb, &tab);
} else if (matches(*argv, "stream-port-prio") == 0) {
dcb_app_print_stream_port_prio(&tab);
} else if (matches(*argv, "dgram-port-prio") == 0) {
dcb_app_print_dgram_port_prio(&tab);
} else if (matches(*argv, "port-prio") == 0) {
dcb_app_print_port_prio(&tab);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_app_help_show_flush();
ret = -EINVAL;
goto out;
}
NEXT_ARG_FWD();
} while (argc > 0);
out:
close_json_object();
dcb_app_table_fini(&tab);
return ret;
}
static int dcb_cmd_app_flush(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct dcb_app_table tab = {};
int ret;
ret = dcb_app_get(dcb, dev, &tab);
if (ret != 0)
return ret;
if (!argc) {
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_DEL, &tab, NULL);
goto out;
}
do {
if (matches(*argv, "help") == 0) {
dcb_app_help_show_flush();
goto out;
} else if (matches(*argv, "ethtype-prio") == 0) {
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_DEL, &tab,
&dcb_app_is_ethtype);
if (ret != 0)
goto out;
} else if (matches(*argv, "default-prio") == 0) {
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_DEL, &tab,
&dcb_app_is_default);
if (ret != 0)
goto out;
} else if (matches(*argv, "dscp-prio") == 0) {
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_DEL, &tab,
&dcb_app_is_dscp);
if (ret != 0)
goto out;
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_app_help_show_flush();
ret = -EINVAL;
goto out;
}
NEXT_ARG_FWD();
} while (argc > 0);
out:
dcb_app_table_fini(&tab);
return ret;
}
static int dcb_cmd_app_replace(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct dcb_app_table orig = {};
struct dcb_app_table tab = {};
struct dcb_app_table new = {};
int ret;
ret = dcb_app_get(dcb, dev, &orig);
if (ret != 0)
return ret;
ret = dcb_cmd_app_parse_add_del(dcb, dev, argc, argv, &tab);
if (ret != 0)
goto out;
/* Attempts to add an existing entry would be rejected, so drop
* these entries from tab.
*/
ret = dcb_app_table_copy(&new, &tab);
if (ret != 0)
goto out;
dcb_app_table_remove_existing(&new, &orig);
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_SET, &new, NULL);
if (ret != 0) {
fprintf(stderr, "Could not add new APP entries\n");
goto out;
}
/* Remove the obsolete entries. */
dcb_app_table_remove_replaced(&orig, &tab);
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_DEL, &orig, NULL);
if (ret != 0) {
fprintf(stderr, "Could not remove replaced APP entries\n");
goto out;
}
out:
dcb_app_table_fini(&new);
dcb_app_table_fini(&tab);
dcb_app_table_fini(&orig);
return 0;
}
int dcb_cmd_app(struct dcb *dcb, int argc, char **argv)
{
if (!argc || matches(*argv, "help") == 0) {
dcb_app_help();
return 0;
} else if (matches(*argv, "show") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_app_show, dcb_app_help_show_flush);
} else if (matches(*argv, "flush") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_app_flush, dcb_app_help_show_flush);
} else if (matches(*argv, "add") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_app_add, dcb_app_help_add);
} else if (matches(*argv, "del") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_app_del, dcb_app_help_add);
} else if (matches(*argv, "replace") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_app_replace, dcb_app_help_add);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_app_help();
return -EINVAL;
}
}

235
dcb/dcb_buffer.c Normal file
View File

@ -0,0 +1,235 @@
// SPDX-License-Identifier: GPL-2.0+
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <linux/dcbnl.h>
#include "dcb.h"
#include "utils.h"
static void dcb_buffer_help_set(void)
{
fprintf(stderr,
"Usage: dcb buffer set dev STRING\n"
" [ prio-buffer PRIO-MAP ]\n"
" [ buffer-size SIZE-MAP ]\n"
"\n"
" where PRIO-MAP := [ PRIO-MAP ] PRIO-MAPPING\n"
" PRIO-MAPPING := { all | PRIO }:BUFFER\n"
" SIZE-MAP := [ SIZE-MAP ] SIZE-MAPPING\n"
" SIZE-MAPPING := { all | BUFFER }:INTEGER\n"
" PRIO := { 0 .. 7 }\n"
" BUFFER := { 0 .. 7 }\n"
"\n"
);
}
static void dcb_buffer_help_show(void)
{
fprintf(stderr,
"Usage: dcb buffer show dev STRING\n"
" [ prio-buffer ] [ buffer-size ] [ total-size ]\n"
"\n"
);
}
static void dcb_buffer_help(void)
{
fprintf(stderr,
"Usage: dcb buffer help\n"
"\n"
);
dcb_buffer_help_show();
dcb_buffer_help_set();
}
static int dcb_buffer_parse_mapping_prio_buffer(__u32 key, char *value, void *data)
{
struct dcbnl_buffer *buffer = data;
__u8 buf;
if (get_u8(&buf, value, 0))
return -EINVAL;
return dcb_parse_mapping("PRIO", key, IEEE_8021Q_MAX_PRIORITIES - 1,
"BUFFER", buf, DCBX_MAX_BUFFERS - 1,
dcb_set_u8, buffer->prio2buffer);
}
static int dcb_buffer_parse_mapping_buffer_size(__u32 key, char *value, void *data)
{
struct dcbnl_buffer *buffer = data;
unsigned int size;
if (get_size(&size, value)) {
fprintf(stderr, "%d:%s: Illegal value for buffer size\n", key, value);
return -EINVAL;
}
return dcb_parse_mapping("BUFFER", key, DCBX_MAX_BUFFERS - 1,
"INTEGER", size, -1,
dcb_set_u32, buffer->buffer_size);
}
static void dcb_buffer_print_total_size(const struct dcbnl_buffer *buffer)
{
print_size(PRINT_ANY, "total_size", "total-size %s ", buffer->total_size);
}
static void dcb_buffer_print_prio_buffer(const struct dcbnl_buffer *buffer)
{
dcb_print_named_array("prio_buffer", "prio-buffer",
buffer->prio2buffer, ARRAY_SIZE(buffer->prio2buffer),
dcb_print_array_u8);
}
static void dcb_buffer_print_buffer_size(const struct dcbnl_buffer *buffer)
{
size_t size = ARRAY_SIZE(buffer->buffer_size);
SPRINT_BUF(b);
size_t i;
open_json_array(PRINT_JSON, "buffer_size");
print_string(PRINT_FP, NULL, "buffer-size ", NULL);
for (i = 0; i < size; i++) {
snprintf(b, sizeof(b), "%zd:%%s ", i);
print_size(PRINT_ANY, NULL, b, buffer->buffer_size[i]);
}
close_json_array(PRINT_JSON, "buffer_size");
}
static void dcb_buffer_print(const struct dcbnl_buffer *buffer)
{
dcb_buffer_print_prio_buffer(buffer);
print_nl();
dcb_buffer_print_buffer_size(buffer);
print_nl();
dcb_buffer_print_total_size(buffer);
print_nl();
}
static int dcb_buffer_get(struct dcb *dcb, const char *dev, struct dcbnl_buffer *buffer)
{
return dcb_get_attribute(dcb, dev, DCB_ATTR_DCB_BUFFER, buffer, sizeof(*buffer));
}
static int dcb_buffer_set(struct dcb *dcb, const char *dev, const struct dcbnl_buffer *buffer)
{
return dcb_set_attribute(dcb, dev, DCB_ATTR_DCB_BUFFER, buffer, sizeof(*buffer));
}
static int dcb_cmd_buffer_set(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct dcbnl_buffer buffer;
int ret;
if (!argc) {
dcb_buffer_help_set();
return 0;
}
ret = dcb_buffer_get(dcb, dev, &buffer);
if (ret)
return ret;
do {
if (matches(*argv, "help") == 0) {
dcb_buffer_help_set();
return 0;
} else if (matches(*argv, "prio-buffer") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true,
&dcb_buffer_parse_mapping_prio_buffer, &buffer);
if (ret) {
fprintf(stderr, "Invalid priority mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "buffer-size") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true,
&dcb_buffer_parse_mapping_buffer_size, &buffer);
if (ret) {
fprintf(stderr, "Invalid buffer size mapping %s\n", *argv);
return ret;
}
continue;
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_buffer_help_set();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
return dcb_buffer_set(dcb, dev, &buffer);
}
static int dcb_cmd_buffer_show(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct dcbnl_buffer buffer;
int ret;
ret = dcb_buffer_get(dcb, dev, &buffer);
if (ret)
return ret;
open_json_object(NULL);
if (!argc) {
dcb_buffer_print(&buffer);
goto out;
}
do {
if (matches(*argv, "help") == 0) {
dcb_buffer_help_show();
return 0;
} else if (matches(*argv, "prio-buffer") == 0) {
dcb_buffer_print_prio_buffer(&buffer);
print_nl();
} else if (matches(*argv, "buffer-size") == 0) {
dcb_buffer_print_buffer_size(&buffer);
print_nl();
} else if (matches(*argv, "total-size") == 0) {
dcb_buffer_print_total_size(&buffer);
print_nl();
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_buffer_help_show();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
out:
close_json_object();
return 0;
}
int dcb_cmd_buffer(struct dcb *dcb, int argc, char **argv)
{
if (!argc || matches(*argv, "help") == 0) {
dcb_buffer_help();
return 0;
} else if (matches(*argv, "show") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_buffer_show, dcb_buffer_help_show);
} else if (matches(*argv, "set") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_buffer_set, dcb_buffer_help_set);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_buffer_help();
return -EINVAL;
}
}

192
dcb/dcb_dcbx.c Normal file
View File

@ -0,0 +1,192 @@
// SPDX-License-Identifier: GPL-2.0+
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <linux/dcbnl.h>
#include "dcb.h"
#include "utils.h"
static void dcb_dcbx_help_set(void)
{
fprintf(stderr,
"Usage: dcb dcbx set dev STRING\n"
" [ host | lld-managed ]\n"
" [ cee | ieee ] [ static ]\n"
"\n"
);
}
static void dcb_dcbx_help_show(void)
{
fprintf(stderr,
"Usage: dcb dcbx show dev STRING\n"
"\n"
);
}
static void dcb_dcbx_help(void)
{
fprintf(stderr,
"Usage: dcb dcbx help\n"
"\n"
);
dcb_dcbx_help_show();
dcb_dcbx_help_set();
}
struct dcb_dcbx_flag {
__u8 value;
const char *key_fp;
const char *key_json;
};
static struct dcb_dcbx_flag dcb_dcbx_flags[] = {
{DCB_CAP_DCBX_HOST, "host"},
{DCB_CAP_DCBX_LLD_MANAGED, "lld-managed", "lld_managed"},
{DCB_CAP_DCBX_VER_CEE, "cee"},
{DCB_CAP_DCBX_VER_IEEE, "ieee"},
{DCB_CAP_DCBX_STATIC, "static"},
};
static void dcb_dcbx_print(__u8 dcbx)
{
int bit;
int i;
while ((bit = ffs(dcbx))) {
bool found = false;
bit--;
for (i = 0; i < ARRAY_SIZE(dcb_dcbx_flags); i++) {
struct dcb_dcbx_flag *flag = &dcb_dcbx_flags[i];
if (flag->value == 1 << bit) {
print_bool(PRINT_JSON, flag->key_json ?: flag->key_fp,
NULL, true);
print_string(PRINT_FP, NULL, "%s ", flag->key_fp);
found = true;
break;
}
}
if (!found)
fprintf(stderr, "Unknown DCBX bit %#x.\n", 1 << bit);
dcbx &= ~(1 << bit);
}
print_nl();
}
static int dcb_dcbx_get(struct dcb *dcb, const char *dev, __u8 *dcbx)
{
__u16 payload_len;
void *payload;
int err;
err = dcb_get_attribute_bare(dcb, DCB_CMD_IEEE_GET, dev, DCB_ATTR_DCBX,
&payload, &payload_len);
if (err != 0)
return err;
if (payload_len != 1) {
fprintf(stderr, "DCB_ATTR_DCBX payload has size %d, expected 1.\n",
payload_len);
return -EINVAL;
}
*dcbx = *(__u8 *) payload;
return 0;
}
static int dcb_dcbx_set(struct dcb *dcb, const char *dev, __u8 dcbx)
{
return dcb_set_attribute_bare(dcb, DCB_CMD_SDCBX, dev, DCB_ATTR_DCBX,
&dcbx, 1, DCB_ATTR_DCBX);
}
static int dcb_cmd_dcbx_set(struct dcb *dcb, const char *dev, int argc, char **argv)
{
__u8 dcbx = 0;
__u8 i;
if (!argc) {
dcb_dcbx_help_set();
return 0;
}
do {
if (matches(*argv, "help") == 0) {
dcb_dcbx_help_set();
return 0;
}
for (i = 0; i < ARRAY_SIZE(dcb_dcbx_flags); i++) {
struct dcb_dcbx_flag *flag = &dcb_dcbx_flags[i];
if (matches(*argv, flag->key_fp) == 0) {
dcbx |= flag->value;
NEXT_ARG_FWD();
goto next;
}
}
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_dcbx_help_set();
return -EINVAL;
next:
;
} while (argc > 0);
return dcb_dcbx_set(dcb, dev, dcbx);
}
static int dcb_cmd_dcbx_show(struct dcb *dcb, const char *dev, int argc, char **argv)
{
__u8 dcbx;
int ret;
ret = dcb_dcbx_get(dcb, dev, &dcbx);
if (ret != 0)
return ret;
while (argc > 0) {
if (matches(*argv, "help") == 0) {
dcb_dcbx_help_show();
return 0;
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_dcbx_help_show();
return -EINVAL;
}
NEXT_ARG_FWD();
}
open_json_object(NULL);
dcb_dcbx_print(dcbx);
close_json_object();
return 0;
}
int dcb_cmd_dcbx(struct dcb *dcb, int argc, char **argv)
{
if (!argc || matches(*argv, "help") == 0) {
dcb_dcbx_help();
return 0;
} else if (matches(*argv, "show") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_dcbx_show, dcb_dcbx_help_show);
} else if (matches(*argv, "set") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_dcbx_set, dcb_dcbx_help_set);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_dcbx_help();
return -EINVAL;
}
}

435
dcb/dcb_ets.c Normal file
View File

@ -0,0 +1,435 @@
// SPDX-License-Identifier: GPL-2.0+
#include <errno.h>
#include <stdio.h>
#include <linux/dcbnl.h>
#include "dcb.h"
#include "utils.h"
static void dcb_ets_help_set(void)
{
fprintf(stderr,
"Usage: dcb ets set dev STRING\n"
" [ willing { on | off } ]\n"
" [ { tc-tsa | reco-tc-tsa } TSA-MAP ]\n"
" [ { pg-bw | tc-bw | reco-tc-bw } BW-MAP ]\n"
" [ { prio-tc | reco-prio-tc } PRIO-MAP ]\n"
"\n"
" where TSA-MAP := [ TSA-MAP ] TSA-MAPPING\n"
" TSA-MAPPING := { all | TC }:{ strict | cbs | ets | vendor }\n"
" BW-MAP := [ BW-MAP ] BW-MAPPING\n"
" BW-MAPPING := { all | TC }:INTEGER\n"
" PRIO-MAP := [ PRIO-MAP ] PRIO-MAPPING\n"
" PRIO-MAPPING := { all | PRIO }:TC\n"
" TC := { 0 .. 7 }\n"
" PRIO := { 0 .. 7 }\n"
"\n"
);
}
static void dcb_ets_help_show(void)
{
fprintf(stderr,
"Usage: dcb ets show dev STRING\n"
" [ willing ] [ ets-cap ] [ cbs ] [ tc-tsa ]\n"
" [ reco-tc-tsa ] [ pg-bw ] [ tc-bw ] [ reco-tc-bw ]\n"
" [ prio-tc ] [ reco-prio-tc ]\n"
"\n"
);
}
static void dcb_ets_help(void)
{
fprintf(stderr,
"Usage: dcb ets help\n"
"\n"
);
dcb_ets_help_show();
dcb_ets_help_set();
}
static const char *const tsa_names[] = {
[IEEE_8021QAZ_TSA_STRICT] = "strict",
[IEEE_8021QAZ_TSA_CB_SHAPER] = "cbs",
[IEEE_8021QAZ_TSA_ETS] = "ets",
[IEEE_8021QAZ_TSA_VENDOR] = "vendor",
};
static int dcb_ets_parse_mapping_tc_tsa(__u32 key, char *value, void *data)
{
__u8 tsa;
int ret;
tsa = parse_one_of("TSA", value, tsa_names, ARRAY_SIZE(tsa_names), &ret);
if (ret)
return ret;
return dcb_parse_mapping("TC", key, IEEE_8021QAZ_MAX_TCS - 1,
"TSA", tsa, -1U,
dcb_set_u8, data);
}
static int dcb_ets_parse_mapping_tc_bw(__u32 key, char *value, void *data)
{
__u8 bw;
if (get_u8(&bw, value, 0))
return -EINVAL;
return dcb_parse_mapping("TC", key, IEEE_8021QAZ_MAX_TCS - 1,
"BW", bw, 100,
dcb_set_u8, data);
}
static int dcb_ets_parse_mapping_prio_tc(unsigned int key, char *value, void *data)
{
__u8 tc;
if (get_u8(&tc, value, 0))
return -EINVAL;
return dcb_parse_mapping("PRIO", key, IEEE_8021QAZ_MAX_TCS - 1,
"TC", tc, IEEE_8021QAZ_MAX_TCS - 1,
dcb_set_u8, data);
}
static void dcb_print_array_tsa(const __u8 *array, size_t size)
{
dcb_print_array_kw(array, size, tsa_names, ARRAY_SIZE(tsa_names));
}
static void dcb_ets_print_willing(const struct ieee_ets *ets)
{
print_on_off(PRINT_ANY, "willing", "willing %s ", ets->willing);
}
static void dcb_ets_print_ets_cap(const struct ieee_ets *ets)
{
print_uint(PRINT_ANY, "ets_cap", "ets-cap %d ", ets->ets_cap);
}
static void dcb_ets_print_cbs(const struct ieee_ets *ets)
{
print_on_off(PRINT_ANY, "cbs", "cbs %s ", ets->cbs);
}
static void dcb_ets_print_tc_bw(const struct ieee_ets *ets)
{
dcb_print_named_array("tc_bw", "tc-bw",
ets->tc_tx_bw, ARRAY_SIZE(ets->tc_tx_bw),
dcb_print_array_u8);
}
static void dcb_ets_print_pg_bw(const struct ieee_ets *ets)
{
dcb_print_named_array("pg_bw", "pg-bw",
ets->tc_rx_bw, ARRAY_SIZE(ets->tc_rx_bw),
dcb_print_array_u8);
}
static void dcb_ets_print_tc_tsa(const struct ieee_ets *ets)
{
dcb_print_named_array("tc_tsa", "tc-tsa",
ets->tc_tsa, ARRAY_SIZE(ets->tc_tsa),
dcb_print_array_tsa);
}
static void dcb_ets_print_prio_tc(const struct ieee_ets *ets)
{
dcb_print_named_array("prio_tc", "prio-tc",
ets->prio_tc, ARRAY_SIZE(ets->prio_tc),
dcb_print_array_u8);
}
static void dcb_ets_print_reco_tc_bw(const struct ieee_ets *ets)
{
dcb_print_named_array("reco_tc_bw", "reco-tc-bw",
ets->tc_reco_bw, ARRAY_SIZE(ets->tc_reco_bw),
dcb_print_array_u8);
}
static void dcb_ets_print_reco_tc_tsa(const struct ieee_ets *ets)
{
dcb_print_named_array("reco_tc_tsa", "reco-tc-tsa",
ets->tc_reco_tsa, ARRAY_SIZE(ets->tc_reco_tsa),
dcb_print_array_tsa);
}
static void dcb_ets_print_reco_prio_tc(const struct ieee_ets *ets)
{
dcb_print_named_array("reco_prio_tc", "reco-prio-tc",
ets->reco_prio_tc, ARRAY_SIZE(ets->reco_prio_tc),
dcb_print_array_u8);
}
static void dcb_ets_print(const struct ieee_ets *ets)
{
dcb_ets_print_willing(ets);
dcb_ets_print_ets_cap(ets);
dcb_ets_print_cbs(ets);
print_nl();
dcb_ets_print_tc_bw(ets);
print_nl();
dcb_ets_print_pg_bw(ets);
print_nl();
dcb_ets_print_tc_tsa(ets);
print_nl();
dcb_ets_print_prio_tc(ets);
print_nl();
dcb_ets_print_reco_tc_bw(ets);
print_nl();
dcb_ets_print_reco_tc_tsa(ets);
print_nl();
dcb_ets_print_reco_prio_tc(ets);
print_nl();
}
static int dcb_ets_get(struct dcb *dcb, const char *dev, struct ieee_ets *ets)
{
return dcb_get_attribute(dcb, dev, DCB_ATTR_IEEE_ETS, ets, sizeof(*ets));
}
static int dcb_ets_validate_bw(const __u8 bw[], const __u8 tsa[], const char *what)
{
bool has_ets = false;
unsigned int total = 0;
unsigned int tc;
for (tc = 0; tc < IEEE_8021QAZ_MAX_TCS; tc++) {
if (tsa[tc] == IEEE_8021QAZ_TSA_ETS) {
has_ets = true;
break;
}
}
/* TC bandwidth is only intended for ETS, but 802.1Q-2018 only requires
* that the sum be 100, and individual entries 0..100. It explicitly
* notes that non-ETS TCs can have non-0 TC bandwidth during
* reconfiguration.
*/
for (tc = 0; tc < IEEE_8021QAZ_MAX_TCS; tc++) {
if (bw[tc] > 100) {
fprintf(stderr, "%d%% for TC %d of %s is not a valid bandwidth percentage, expected 0..100%%\n",
bw[tc], tc, what);
return -EINVAL;
}
total += bw[tc];
}
/* This is what 802.1Q-2018 requires. */
if (total == 100)
return 0;
/* But this requirement does not make sense for all-strict
* configurations. Anything else than 0 does not make sense: either BW
* has not been reconfigured for the all-strict allocation yet, at which
* point we expect sum of 100. Or it has already been reconfigured, at
* which point accept 0.
*/
if (!has_ets && total == 0)
return 0;
fprintf(stderr, "Bandwidth percentages in %s sum to %d%%, expected %d%%\n",
what, total, has_ets ? 100 : 0);
return -EINVAL;
}
static int dcb_ets_set(struct dcb *dcb, const char *dev, const struct ieee_ets *ets)
{
/* Do not validate pg-bw, which is not standard and has unclear
* meaning.
*/
if (dcb_ets_validate_bw(ets->tc_tx_bw, ets->tc_tsa, "tc-bw") ||
dcb_ets_validate_bw(ets->tc_reco_bw, ets->tc_reco_tsa, "reco-tc-bw"))
return -EINVAL;
return dcb_set_attribute(dcb, dev, DCB_ATTR_IEEE_ETS, ets, sizeof(*ets));
}
static int dcb_cmd_ets_set(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct ieee_ets ets;
int ret;
if (!argc) {
dcb_ets_help_set();
return 1;
}
ret = dcb_ets_get(dcb, dev, &ets);
if (ret)
return ret;
do {
if (matches(*argv, "help") == 0) {
dcb_ets_help_set();
return 0;
} else if (matches(*argv, "willing") == 0) {
NEXT_ARG();
ets.willing = parse_on_off("willing", *argv, &ret);
if (ret)
return ret;
} else if (matches(*argv, "tc-tsa") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true, &dcb_ets_parse_mapping_tc_tsa,
ets.tc_tsa);
if (ret) {
fprintf(stderr, "Invalid tc-tsa mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "reco-tc-tsa") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true, &dcb_ets_parse_mapping_tc_tsa,
ets.tc_reco_tsa);
if (ret) {
fprintf(stderr, "Invalid reco-tc-tsa mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "tc-bw") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true, &dcb_ets_parse_mapping_tc_bw,
ets.tc_tx_bw);
if (ret) {
fprintf(stderr, "Invalid tc-bw mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "pg-bw") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true, &dcb_ets_parse_mapping_tc_bw,
ets.tc_rx_bw);
if (ret) {
fprintf(stderr, "Invalid pg-bw mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "reco-tc-bw") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true, &dcb_ets_parse_mapping_tc_bw,
ets.tc_reco_bw);
if (ret) {
fprintf(stderr, "Invalid reco-tc-bw mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "prio-tc") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true, &dcb_ets_parse_mapping_prio_tc,
ets.prio_tc);
if (ret) {
fprintf(stderr, "Invalid prio-tc mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "reco-prio-tc") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true, &dcb_ets_parse_mapping_prio_tc,
ets.reco_prio_tc);
if (ret) {
fprintf(stderr, "Invalid reco-prio-tc mapping %s\n", *argv);
return ret;
}
continue;
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_ets_help_set();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
return dcb_ets_set(dcb, dev, &ets);
}
static int dcb_cmd_ets_show(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct ieee_ets ets;
int ret;
ret = dcb_ets_get(dcb, dev, &ets);
if (ret)
return ret;
open_json_object(NULL);
if (!argc) {
dcb_ets_print(&ets);
goto out;
}
do {
if (matches(*argv, "help") == 0) {
dcb_ets_help_show();
return 0;
} else if (matches(*argv, "willing") == 0) {
dcb_ets_print_willing(&ets);
print_nl();
} else if (matches(*argv, "ets-cap") == 0) {
dcb_ets_print_ets_cap(&ets);
print_nl();
} else if (matches(*argv, "cbs") == 0) {
dcb_ets_print_cbs(&ets);
print_nl();
} else if (matches(*argv, "tc-tsa") == 0) {
dcb_ets_print_tc_tsa(&ets);
print_nl();
} else if (matches(*argv, "reco-tc-tsa") == 0) {
dcb_ets_print_reco_tc_tsa(&ets);
print_nl();
} else if (matches(*argv, "tc-bw") == 0) {
dcb_ets_print_tc_bw(&ets);
print_nl();
} else if (matches(*argv, "pg-bw") == 0) {
dcb_ets_print_pg_bw(&ets);
print_nl();
} else if (matches(*argv, "reco-tc-bw") == 0) {
dcb_ets_print_reco_tc_bw(&ets);
print_nl();
} else if (matches(*argv, "prio-tc") == 0) {
dcb_ets_print_prio_tc(&ets);
print_nl();
} else if (matches(*argv, "reco-prio-tc") == 0) {
dcb_ets_print_reco_prio_tc(&ets);
print_nl();
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_ets_help_show();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
out:
close_json_object();
return 0;
}
int dcb_cmd_ets(struct dcb *dcb, int argc, char **argv)
{
if (!argc || matches(*argv, "help") == 0) {
dcb_ets_help();
return 0;
} else if (matches(*argv, "show") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv, dcb_cmd_ets_show, dcb_ets_help_show);
} else if (matches(*argv, "set") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv, dcb_cmd_ets_set, dcb_ets_help_set);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_ets_help();
return -EINVAL;
}
}

182
dcb/dcb_maxrate.c Normal file
View File

@ -0,0 +1,182 @@
// SPDX-License-Identifier: GPL-2.0+
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <linux/dcbnl.h>
#include "dcb.h"
#include "utils.h"
static void dcb_maxrate_help_set(void)
{
fprintf(stderr,
"Usage: dcb maxrate set dev STRING\n"
" [ tc-maxrate RATE-MAP ]\n"
"\n"
" where RATE-MAP := [ RATE-MAP ] RATE-MAPPING\n"
" RATE-MAPPING := { all | TC }:RATE\n"
" TC := { 0 .. 7 }\n"
"\n"
);
}
static void dcb_maxrate_help_show(void)
{
fprintf(stderr,
"Usage: dcb [ -i ] maxrate show dev STRING\n"
" [ tc-maxrate ]\n"
"\n"
);
}
static void dcb_maxrate_help(void)
{
fprintf(stderr,
"Usage: dcb maxrate help\n"
"\n"
);
dcb_maxrate_help_show();
dcb_maxrate_help_set();
}
static int dcb_maxrate_parse_mapping_tc_maxrate(__u32 key, char *value, void *data)
{
__u64 rate;
if (get_rate64(&rate, value))
return -EINVAL;
return dcb_parse_mapping("TC", key, IEEE_8021QAZ_MAX_TCS - 1,
"RATE", rate, -1,
dcb_set_u64, data);
}
static void dcb_maxrate_print_tc_maxrate(struct dcb *dcb, const struct ieee_maxrate *maxrate)
{
size_t size = ARRAY_SIZE(maxrate->tc_maxrate);
SPRINT_BUF(b);
size_t i;
open_json_array(PRINT_JSON, "tc_maxrate");
print_string(PRINT_FP, NULL, "tc-maxrate ", NULL);
for (i = 0; i < size; i++) {
snprintf(b, sizeof(b), "%zd:%%s ", i);
print_rate(dcb->use_iec, PRINT_ANY, NULL, b, maxrate->tc_maxrate[i]);
}
close_json_array(PRINT_JSON, "tc_maxrate");
}
static void dcb_maxrate_print(struct dcb *dcb, const struct ieee_maxrate *maxrate)
{
dcb_maxrate_print_tc_maxrate(dcb, maxrate);
print_nl();
}
static int dcb_maxrate_get(struct dcb *dcb, const char *dev, struct ieee_maxrate *maxrate)
{
return dcb_get_attribute(dcb, dev, DCB_ATTR_IEEE_MAXRATE, maxrate, sizeof(*maxrate));
}
static int dcb_maxrate_set(struct dcb *dcb, const char *dev, const struct ieee_maxrate *maxrate)
{
return dcb_set_attribute(dcb, dev, DCB_ATTR_IEEE_MAXRATE, maxrate, sizeof(*maxrate));
}
static int dcb_cmd_maxrate_set(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct ieee_maxrate maxrate;
int ret;
if (!argc) {
dcb_maxrate_help_set();
return 0;
}
ret = dcb_maxrate_get(dcb, dev, &maxrate);
if (ret)
return ret;
do {
if (matches(*argv, "help") == 0) {
dcb_maxrate_help_set();
return 0;
} else if (matches(*argv, "tc-maxrate") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true,
&dcb_maxrate_parse_mapping_tc_maxrate, &maxrate);
if (ret) {
fprintf(stderr, "Invalid mapping %s\n", *argv);
return ret;
}
continue;
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_maxrate_help_set();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
return dcb_maxrate_set(dcb, dev, &maxrate);
}
static int dcb_cmd_maxrate_show(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct ieee_maxrate maxrate;
int ret;
ret = dcb_maxrate_get(dcb, dev, &maxrate);
if (ret)
return ret;
open_json_object(NULL);
if (!argc) {
dcb_maxrate_print(dcb, &maxrate);
goto out;
}
do {
if (matches(*argv, "help") == 0) {
dcb_maxrate_help_show();
return 0;
} else if (matches(*argv, "tc-maxrate") == 0) {
dcb_maxrate_print_tc_maxrate(dcb, &maxrate);
print_nl();
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_maxrate_help_show();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
out:
close_json_object();
return 0;
}
int dcb_cmd_maxrate(struct dcb *dcb, int argc, char **argv)
{
if (!argc || matches(*argv, "help") == 0) {
dcb_maxrate_help();
return 0;
} else if (matches(*argv, "show") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_maxrate_show, dcb_maxrate_help_show);
} else if (matches(*argv, "set") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_maxrate_set, dcb_maxrate_help_set);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_maxrate_help();
return -EINVAL;
}
}

286
dcb/dcb_pfc.c Normal file
View File

@ -0,0 +1,286 @@
// SPDX-License-Identifier: GPL-2.0+
#include <errno.h>
#include <stdio.h>
#include <linux/dcbnl.h>
#include "dcb.h"
#include "utils.h"
static void dcb_pfc_help_set(void)
{
fprintf(stderr,
"Usage: dcb pfc set dev STRING\n"
" [ prio-pfc PFC-MAP ]\n"
" [ macsec-bypass { on | off } ]\n"
" [ delay INTEGER ]\n"
"\n"
" where PFC-MAP := [ PFC-MAP ] PFC-MAPPING\n"
" PFC-MAPPING := { all | TC }:PFC\n"
" TC := { 0 .. 7 }\n"
" PFC := { on | off }\n"
"\n"
);
}
static void dcb_pfc_help_show(void)
{
fprintf(stderr,
"Usage: dcb [ -s ] pfc show dev STRING\n"
" [ pfc-cap ] [ prio-pfc ] [ macsec-bypass ]\n"
" [ delay ] [ requests ] [ indications ]\n"
"\n"
);
}
static void dcb_pfc_help(void)
{
fprintf(stderr,
"Usage: dcb pfc help\n"
"\n"
);
dcb_pfc_help_show();
dcb_pfc_help_set();
}
static void dcb_pfc_to_array(__u8 array[IEEE_8021QAZ_MAX_TCS], __u8 pfc_en)
{
int i;
for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++)
array[i] = !!(pfc_en & (1 << i));
}
static void dcb_pfc_from_array(__u8 array[IEEE_8021QAZ_MAX_TCS], __u8 *pfc_en_p)
{
__u8 pfc_en = 0;
int i;
for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
if (array[i])
pfc_en |= 1 << i;
}
*pfc_en_p = pfc_en;
}
static int dcb_pfc_parse_mapping_prio_pfc(__u32 key, char *value, void *data)
{
struct ieee_pfc *pfc = data;
__u8 pfc_en[IEEE_8021QAZ_MAX_TCS];
bool enabled;
int ret;
dcb_pfc_to_array(pfc_en, pfc->pfc_en);
enabled = parse_on_off("PFC", value, &ret);
if (ret)
return ret;
ret = dcb_parse_mapping("PRIO", key, IEEE_8021QAZ_MAX_TCS - 1,
"PFC", enabled, -1,
dcb_set_u8, pfc_en);
if (ret)
return ret;
dcb_pfc_from_array(pfc_en, &pfc->pfc_en);
return 0;
}
static void dcb_pfc_print_pfc_cap(const struct ieee_pfc *pfc)
{
print_uint(PRINT_ANY, "pfc_cap", "pfc-cap %d ", pfc->pfc_cap);
}
static void dcb_pfc_print_macsec_bypass(const struct ieee_pfc *pfc)
{
print_on_off(PRINT_ANY, "macsec_bypass", "macsec-bypass %s ", pfc->mbc);
}
static void dcb_pfc_print_delay(const struct ieee_pfc *pfc)
{
print_uint(PRINT_ANY, "delay", "delay %d ", pfc->delay);
}
static void dcb_pfc_print_prio_pfc(const struct ieee_pfc *pfc)
{
__u8 pfc_en[IEEE_8021QAZ_MAX_TCS];
dcb_pfc_to_array(pfc_en, pfc->pfc_en);
dcb_print_named_array("prio_pfc", "prio-pfc",
pfc_en, ARRAY_SIZE(pfc_en), &dcb_print_array_on_off);
}
static void dcb_pfc_print_requests(const struct ieee_pfc *pfc)
{
open_json_array(PRINT_JSON, "requests");
print_string(PRINT_FP, NULL, "requests ", NULL);
dcb_print_array_u64(pfc->requests, ARRAY_SIZE(pfc->requests));
close_json_array(PRINT_JSON, "requests");
}
static void dcb_pfc_print_indications(const struct ieee_pfc *pfc)
{
open_json_array(PRINT_JSON, "indications");
print_string(PRINT_FP, NULL, "indications ", NULL);
dcb_print_array_u64(pfc->indications, ARRAY_SIZE(pfc->indications));
close_json_array(PRINT_JSON, "indications");
}
static void dcb_pfc_print(const struct dcb *dcb, const struct ieee_pfc *pfc)
{
dcb_pfc_print_pfc_cap(pfc);
dcb_pfc_print_macsec_bypass(pfc);
dcb_pfc_print_delay(pfc);
print_nl();
dcb_pfc_print_prio_pfc(pfc);
print_nl();
if (dcb->stats) {
dcb_pfc_print_requests(pfc);
print_nl();
dcb_pfc_print_indications(pfc);
print_nl();
}
}
static int dcb_pfc_get(struct dcb *dcb, const char *dev, struct ieee_pfc *pfc)
{
return dcb_get_attribute(dcb, dev, DCB_ATTR_IEEE_PFC, pfc, sizeof(*pfc));
}
static int dcb_pfc_set(struct dcb *dcb, const char *dev, const struct ieee_pfc *pfc)
{
return dcb_set_attribute(dcb, dev, DCB_ATTR_IEEE_PFC, pfc, sizeof(*pfc));
}
static int dcb_cmd_pfc_set(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct ieee_pfc pfc;
int ret;
if (!argc) {
dcb_pfc_help_set();
return 0;
}
ret = dcb_pfc_get(dcb, dev, &pfc);
if (ret)
return ret;
do {
if (matches(*argv, "help") == 0) {
dcb_pfc_help_set();
return 0;
} else if (matches(*argv, "prio-pfc") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true,
&dcb_pfc_parse_mapping_prio_pfc, &pfc);
if (ret) {
fprintf(stderr, "Invalid pfc mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "macsec-bypass") == 0) {
NEXT_ARG();
pfc.mbc = parse_on_off("macsec-bypass", *argv, &ret);
if (ret)
return ret;
} else if (matches(*argv, "delay") == 0) {
NEXT_ARG();
/* Do not support the size notations for delay.
* Delay is specified in "bit times", not bits, so
* it is not applicable. At the same time it would
* be confusing that 10Kbit does not mean 10240,
* but 1280.
*/
if (get_u16(&pfc.delay, *argv, 0)) {
fprintf(stderr, "Invalid delay `%s', expected an integer 0..65535\n",
*argv);
return -EINVAL;
}
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_pfc_help_set();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
return dcb_pfc_set(dcb, dev, &pfc);
}
static int dcb_cmd_pfc_show(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct ieee_pfc pfc;
int ret;
ret = dcb_pfc_get(dcb, dev, &pfc);
if (ret)
return ret;
open_json_object(NULL);
if (!argc) {
dcb_pfc_print(dcb, &pfc);
goto out;
}
do {
if (matches(*argv, "help") == 0) {
dcb_pfc_help_show();
return 0;
} else if (matches(*argv, "prio-pfc") == 0) {
dcb_pfc_print_prio_pfc(&pfc);
print_nl();
} else if (matches(*argv, "pfc-cap") == 0) {
dcb_pfc_print_pfc_cap(&pfc);
print_nl();
} else if (matches(*argv, "macsec-bypass") == 0) {
dcb_pfc_print_macsec_bypass(&pfc);
print_nl();
} else if (matches(*argv, "delay") == 0) {
dcb_pfc_print_delay(&pfc);
print_nl();
} else if (matches(*argv, "requests") == 0) {
dcb_pfc_print_requests(&pfc);
print_nl();
} else if (matches(*argv, "indications") == 0) {
dcb_pfc_print_indications(&pfc);
print_nl();
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_pfc_help_show();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
out:
close_json_object();
return 0;
}
int dcb_cmd_pfc(struct dcb *dcb, int argc, char **argv)
{
if (!argc || matches(*argv, "help") == 0) {
dcb_pfc_help();
return 0;
} else if (matches(*argv, "show") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_pfc_show, dcb_pfc_help_show);
} else if (matches(*argv, "set") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_pfc_set, dcb_pfc_help_set);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_pfc_help();
return -EINVAL;
}
}

View File

@ -1,21 +1,25 @@
include ../Config
# SPDX-License-Identifier: GPL-2.0
include ../config.mk
TARGETS :=
ifeq ($(HAVE_MNL),y)
DEVLINKOBJ = devlink.o mnlg.o
TARGETS=devlink
CFLAGS += $(shell $(PKG_CONFIG) libmnl --cflags)
LDLIBS += $(shell $(PKG_CONFIG) libmnl --libs)
TARGETS += devlink
LDLIBS += -lm
endif
all: $(TARGETS) $(LIBS)
devlink: $(DEVLINKOBJ)
devlink: $(DEVLINKOBJ) $(LIBNETLINK)
$(QUIET_LINK)$(CC) $^ $(LDFLAGS) $(LDLIBS) -o $@
install: all
install -m 0755 $(TARGETS) $(DESTDIR)$(SBINDIR)
for i in $(TARGETS); \
do install -m 0755 $$i $(DESTDIR)$(SBINDIR); \
done
clean:
rm -f $(DEVLINKOBJ) $(TARGETS)

File diff suppressed because it is too large Load Diff

View File

@ -14,10 +14,12 @@
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <time.h>
#include <libmnl/libmnl.h>
#include <linux/genetlink.h>
#include "libnetlink.h"
#include "mnl_utils.h"
#include "utils.h"
#include "mnlg.h"
struct mnlg_socket {
@ -26,56 +28,13 @@ struct mnlg_socket {
uint32_t id;
uint8_t version;
unsigned int seq;
unsigned int portid;
};
static struct nlmsghdr *__mnlg_msg_prepare(struct mnlg_socket *nlg, uint8_t cmd,
uint16_t flags, uint32_t id,
uint8_t version)
{
struct nlmsghdr *nlh;
struct genlmsghdr *genl;
nlh = mnl_nlmsg_put_header(nlg->buf);
nlh->nlmsg_type = id;
nlh->nlmsg_flags = flags;
nlg->seq = time(NULL);
nlh->nlmsg_seq = nlg->seq;
genl = mnl_nlmsg_put_extra_header(nlh, sizeof(struct genlmsghdr));
genl->cmd = cmd;
genl->version = version;
return nlh;
}
struct nlmsghdr *mnlg_msg_prepare(struct mnlg_socket *nlg, uint8_t cmd,
uint16_t flags)
{
return __mnlg_msg_prepare(nlg, cmd, flags, nlg->id, nlg->version);
}
int mnlg_socket_send(struct mnlg_socket *nlg, const struct nlmsghdr *nlh)
int mnlg_socket_send(struct mnlu_gen_socket *nlg, const struct nlmsghdr *nlh)
{
return mnl_socket_sendto(nlg->nl, nlh, nlh->nlmsg_len);
}
int mnlg_socket_recv_run(struct mnlg_socket *nlg, mnl_cb_t data_cb, void *data)
{
int err;
do {
err = mnl_socket_recvfrom(nlg->nl, nlg->buf,
MNL_SOCKET_BUFFER_SIZE);
if (err <= 0)
break;
err = mnl_cb_run(nlg->buf, err, nlg->seq, nlg->portid,
data_cb, data);
} while (err > 0);
return err;
}
struct group_info {
bool found;
uint32_t id;
@ -155,15 +114,17 @@ static int get_group_id_cb(const struct nlmsghdr *nlh, void *data)
return MNL_CB_OK;
}
int mnlg_socket_group_add(struct mnlg_socket *nlg, const char *group_name)
int mnlg_socket_group_add(struct mnlu_gen_socket *nlg, const char *group_name)
{
struct nlmsghdr *nlh;
struct group_info group_info;
int err;
nlh = __mnlg_msg_prepare(nlg, CTRL_CMD_GETFAMILY,
NLM_F_REQUEST | NLM_F_ACK, GENL_ID_CTRL, 1);
mnl_attr_put_u32(nlh, CTRL_ATTR_FAMILY_ID, nlg->id);
nlh = _mnlu_gen_socket_cmd_prepare(nlg, CTRL_CMD_GETFAMILY,
NLM_F_REQUEST | NLM_F_ACK,
GENL_ID_CTRL, 1);
mnl_attr_put_u16(nlh, CTRL_ATTR_FAMILY_ID, nlg->family);
err = mnlg_socket_send(nlg, nlh);
if (err < 0)
@ -171,7 +132,7 @@ int mnlg_socket_group_add(struct mnlg_socket *nlg, const char *group_name)
group_info.found = false;
group_info.name = group_name;
err = mnlg_socket_recv_run(nlg, get_group_id_cb, &group_info);
err = mnlu_gen_socket_recv_run(nlg, get_group_id_cb, &group_info);
if (err < 0)
return err;
@ -188,87 +149,7 @@ int mnlg_socket_group_add(struct mnlg_socket *nlg, const char *group_name)
return 0;
}
static int get_family_id_attr_cb(const struct nlattr *attr, void *data)
int mnlg_socket_get_fd(struct mnlu_gen_socket *nlg)
{
const struct nlattr **tb = data;
int type = mnl_attr_get_type(attr);
if (mnl_attr_type_valid(attr, CTRL_ATTR_MAX) < 0)
return MNL_CB_ERROR;
if (type == CTRL_ATTR_FAMILY_ID &&
mnl_attr_validate(attr, MNL_TYPE_U16) < 0)
return MNL_CB_ERROR;
tb[type] = attr;
return MNL_CB_OK;
}
static int get_family_id_cb(const struct nlmsghdr *nlh, void *data)
{
uint32_t *p_id = data;
struct nlattr *tb[CTRL_ATTR_MAX + 1] = {};
struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
mnl_attr_parse(nlh, sizeof(*genl), get_family_id_attr_cb, tb);
if (!tb[CTRL_ATTR_FAMILY_ID])
return MNL_CB_ERROR;
*p_id = mnl_attr_get_u16(tb[CTRL_ATTR_FAMILY_ID]);
return MNL_CB_OK;
}
struct mnlg_socket *mnlg_socket_open(const char *family_name, uint8_t version)
{
struct mnlg_socket *nlg;
struct nlmsghdr *nlh;
int err;
nlg = malloc(sizeof(*nlg));
if (!nlg)
return NULL;
nlg->buf = malloc(MNL_SOCKET_BUFFER_SIZE);
if (!nlg->buf)
goto err_buf_alloc;
nlg->nl = mnl_socket_open(NETLINK_GENERIC);
if (!nlg->nl)
goto err_mnl_socket_open;
err = mnl_socket_bind(nlg->nl, 0, MNL_SOCKET_AUTOPID);
if (err < 0)
goto err_mnl_socket_bind;
nlg->portid = mnl_socket_get_portid(nlg->nl);
nlh = __mnlg_msg_prepare(nlg, CTRL_CMD_GETFAMILY,
NLM_F_REQUEST | NLM_F_ACK, GENL_ID_CTRL, 1);
mnl_attr_put_strz(nlh, CTRL_ATTR_FAMILY_NAME, family_name);
err = mnlg_socket_send(nlg, nlh);
if (err < 0)
goto err_mnlg_socket_send;
err = mnlg_socket_recv_run(nlg, get_family_id_cb, &nlg->id);
if (err < 0)
goto err_mnlg_socket_recv_run;
nlg->version = version;
return nlg;
err_mnlg_socket_recv_run:
err_mnlg_socket_send:
err_mnl_socket_bind:
mnl_socket_close(nlg->nl);
err_mnl_socket_open:
free(nlg->buf);
err_buf_alloc:
free(nlg);
return NULL;
}
void mnlg_socket_close(struct mnlg_socket *nlg)
{
mnl_socket_close(nlg->nl);
free(nlg->buf);
free(nlg);
return mnl_socket_get_fd(nlg->nl);
}

View File

@ -14,14 +14,10 @@
#include <libmnl/libmnl.h>
struct mnlg_socket;
struct mnlu_gen_socket;
struct nlmsghdr *mnlg_msg_prepare(struct mnlg_socket *nlg, uint8_t cmd,
uint16_t flags);
int mnlg_socket_send(struct mnlg_socket *nlg, const struct nlmsghdr *nlh);
int mnlg_socket_recv_run(struct mnlg_socket *nlg, mnl_cb_t data_cb, void *data);
int mnlg_socket_group_add(struct mnlg_socket *nlg, const char *group_name);
struct mnlg_socket *mnlg_socket_open(const char *family_name, uint8_t version);
void mnlg_socket_close(struct mnlg_socket *nlg);
int mnlg_socket_send(struct mnlu_gen_socket *nlg, const struct nlmsghdr *nlh);
int mnlg_socket_group_add(struct mnlu_gen_socket *nlg, const char *group_name);
int mnlg_socket_get_fd(struct mnlu_gen_socket *nlg);
#endif /* _MNLG_H_ */

View File

@ -1,73 +0,0 @@
PSFILES=ip-cref.ps ip-tunnels.ps api-ip6-flowlabels.ps ss.ps nstat.ps arpd.ps rtstat.ps tc-filters.ps
# tc-cref.ps
# api-rtnl.tex api-pmtudisc.tex api-news.tex
# iki-netdev.ps iki-neighdst.ps
LATEX=latex
DVIPS=dvips
SGML2DVI=sgml2latex
SGML2HTML=sgml2html -s 0
LPR=lpr -Zsduplex
SHELL=bash
PAGESIZE=a4
PAGESPERPAGE=2
HTMLFILES=$(subst .sgml,.html,$(shell echo *.sgml))
DVIFILES=$(subst .ps,.dvi,$(PSFILES))
PDFFILES=$(subst .ps,.pdf,$(PSFILES))
all: pstwocol
pstwocol: $(PSFILES)
html: $(HTMLFILES)
dvi: $(DVIFILES)
pdf: $(PDFFILES)
print: $(PSFILES)
$(LPR) $(PSFILES)
%.tex: %.sgml
$(SGML2DVI) --output=tex $<
%.dvi: %.sgml
$(SGML2DVI) --output=dvi $<
%.dvi: %.tex
@set -e; pass=2; echo "Running LaTeX $<"; \
while [ `$(LATEX) $< </dev/null 2>&1 | \
grep -c '^\(LaTeX Warning: Label(s) may\|No file \|! Emergency stop\)'` -ge 1 ]; do \
if [ $$pass -gt 3 ]; then \
echo "Seems, something is wrong. Try by hands." ; exit 1 ; \
fi; \
echo "Re-running LaTeX $<, $${pass}d pass"; pass=$$[$$pass + 1]; \
done
%.pdf: %.tex
@set -e; pass=2; echo "Running pdfLaTeX $<"; \
while [ `pdflatex $< </dev/null 2>&1 | \
grep -c '^\(LaTeX Warning: Label(s) may\|No file \|! Emergency stop\)'` -ge 1 ]; do \
if [ $$pass -gt 3 ]; then \
echo "Seems, something is wrong. Try by hands." ; exit 1 ; \
fi; \
echo "Re-running pdfLaTeX $<, $${pass}d pass"; pass=$$[$$pass + 1]; \
done
#%.pdf: %.ps
# ps2pdf $<
%.ps: %.dvi
$(DVIPS) $< -o $@
%.html: %.sgml
$(SGML2HTML) $<
install:
install -m 0644 $(shell echo *.tex) $(DESTDIR)$(DOCDIR)
install -m 0644 $(shell echo *.sgml) $(DESTDIR)$(DOCDIR)
clean:
rm -f *.aux *.log *.toc $(PSFILES) $(DVIFILES) *.html *.pdf

View File

@ -1,16 +0,0 @@
Partially finished work.
1. User Reference manuals.
1.1 IP Command reference (ip-cref.tex, published)
1.2 TC Command reference (tc-cref.tex)
1.3 IP tunnels (ip-tunnels.tex, published)
2. Linux-2.2 Networking API
2.1 RTNETLINK (api-rtnl.tex)
2.2 Path MTU Discovery (api-pmtudisc.tex)
2.3 IPv6 Flow Labels (api-ip6-flowlabels.tex, published)
2.4 Miscellaneous extensions (api-misc.tex)
3. Linux-2.2 Networking Intra-Kernel Interfaces
3.1 NetDev --- Networking Devices and netdev... (iki-netdev.tex)
3.2 Neighbour cache and destination cache. (iki-neighdst.tex)

View File

@ -1 +0,0 @@
\def\Draft{020116}

View File

@ -6,8 +6,8 @@ What is it?
-----------
An extension to the filtering/classification architecture of Linux Traffic
Control.
Up to 2.6.8 the only action that could be "attached" to a filter was policing.
Control.
Up to 2.6.8 the only action that could be "attached" to a filter was policing.
i.e you could say something like:
-----
@ -17,11 +17,11 @@ tc filter add dev lo parent ffff: protocol ip prio 10 u32 match ip src \
which implies "if a packet is seen on the ingress of the lo device with
a source IP address of 127.0.0.1/32 we give it a classification id of 1:1 and
we execute a policing action which rate limits its bandwidth utilization
we execute a policing action which rate limits its bandwidth utilization
to 1.5Mbps".
The new extensions allow for more than just policing actions to be added.
They are also fully backward compatible. If you have a kernel that doesnt
They are also fully backward compatible. If you have a kernel that doesn't
understand them, then the effect is null i.e if you have a newer tc
but older kernel, the actions are not installed. Likewise if you
have a newer kernel but older tc, obviously the tc will use current
@ -29,9 +29,9 @@ syntax which will work fine. Of course to get the required effect you need
both newer tc and kernel. If you are reading this you have the
right tc ;->
A side effect is that we can now get stateless firewalling to work with tc.
A side effect is that we can now get stateless firewalling to work with tc.
Essentially this is now an alternative to iptables.
I wont go into details of my dislike for iptables at times, but
I won't go into details of my dislike for iptables at times, but
scalability is one of the main issues; however, if you need stateful
classification - use netfilter (for now).
@ -61,7 +61,7 @@ tc filter add dev lo parent 1:0 protocol ip prio 10 u32 \
match ip src 127.0.0.1/32 flowid 1:1 \
action police mtu 4000 rate 1500kbit burst 90k
" generic Actions" (gact) at the moment are:
" generic Actions" (gact) at the moment are:
{ drop, pass, reclassify, continue}
(If you have others, no listed here give me a reason and we will add them)
+drop says to drop the packet
@ -77,7 +77,7 @@ iptable target. I have only tested with mangler targets up to now.
In terms of hooks:
*ingress is mapped to pre-routing hook
*egress is mapped to post-routing hook
I dont see much value in the other hooks, if you see it and email me good
I don't see much value in the other hooks, if you see it and email me good
reasons, the addition is trivial.
Example syntax for iptables targets usage becomes:
@ -93,43 +93,43 @@ decimal 12, then use flowid 1:c.
3) A feature i call pipe
The motivation is derived from Unix pipe mechanism but applied to packets.
Essentially take a matching packet and pass it through
Essentially take a matching packet and pass it through
action1 | action2 | action3 etc.
You could do something similar to this with the tc policer and the "continue"
operator but this rather restricts it to just the policer and requires
multiple rules (and lookups, hence quiet inefficient);
operator but this rather restricts it to just the policer and requires
multiple rules (and lookups, hence quiet inefficient);
as an example -- and please note that this is just an example _not_ The
as an example -- and please note that this is just an example _not_ The
Word Youve Been Waiting For (yes i have had problems giving examples
which ended becoming dogma in documents and people modifying them a little
to look clever);
to look clever);
i selected the metering rates to be small so that i can show better how
i selected the metering rates to be small so that i can show better how
things work.
The script below does the following:
- an incoming packet from 10.0.0.21 is first given a firewall mark of 1.
- It is then metered to make sure it does not exceed its allocated rate of
1Kbps. If it doesnt exceed rate, this is where we terminate action execution.
The script below does the following:
- an incoming packet from 10.0.0.21 is first given a firewall mark of 1.
- If it does exceed its rate, its "color" changes to a mark of 2 and it is
- It is then metered to make sure it does not exceed its allocated rate of
1Kbps. If it doesn't exceed rate, this is where we terminate action execution.
- If it does exceed its rate, its "color" changes to a mark of 2 and it is
then passed through a second meter.
-The second meter is shared across all flows on that device [i am suprised
that this seems to be not a well know feature of the policer; Bert was telling
-The second meter is shared across all flows on that device [i am surpised
that this seems to be not a well know feature of the policer; Bert was telling
me that someone was writing a qdisc just to do sharing across multiple devices;
it must be the summer heat again; weve had someone doing that every year around
summer -- the key to sharing is to use a operator "index" in your policer
rules (example "index 20"). All your rules have to use the same index to
summer -- the key to sharing is to use a operator "index" in your policer
rules (example "index 20"). All your rules have to use the same index to
share.]
-If the second meter is exceeded the color of the flow changes further to 3.
-We then pass the packet to another meter which is shared across all devices
in the system. If this meter is exceeded we drop the packet.
Note the mark can be used further up the system to do things like policy
Note the mark can be used further up the system to do things like policy
or more interesting things on the egress.
------------------ cut here -------------------------------
@ -145,7 +145,7 @@ u32 match ip src 10.0.0.21/32 flowid 1:15 \
action ipt -j mark --set-mark 1 index 2 \
#
# then pass it through a policer which allows 1kbps; if the flow
# doesnt exceed that rate, this is where we stop, if it exceeds we
# doesn't exceed that rate, this is where we stop, if it exceeds we
# pipe the packet to the next action
action police rate 1kbit burst 9k pipe \
#
@ -161,31 +161,31 @@ action ipt -j mark --set-mark 3 \
# and then attempt to borrow from a meter used by all devices in the
# system. Should this be exceeded, drop the packet on the floor.
action police index 20 mtu 5000 rate 1kbit burst 90k drop
---------------------------------
---------------------------------
Now lets see the actions installed with
Now lets see the actions installed with
"tc filter show parent ffff: dev eth0"
-------- output -----------
jroot# tc filter show parent ffff: dev eth0
filter protocol ip pref 1 u32
filter protocol ip pref 1 u32 fh 800: ht divisor 1
filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:15
filter protocol ip pref 1 u32
filter protocol ip pref 1 u32 fh 800: ht divisor 1
filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:15
action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING
action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x1 index 2
action order 2: police 1 action pipe rate 1Kbit burst 9Kb mtu 2Kb
action order 2: police 1 action pipe rate 1Kbit burst 9Kb mtu 2Kb
action order 3: tablename: mangle hook: NF_IP_PRE_ROUTING
action order 3: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x2 index 1
action order 4: police 30 action pipe rate 1Kbit burst 10Kb mtu 5000b
action order 4: police 30 action pipe rate 1Kbit burst 10Kb mtu 5000b
action order 5: tablename: mangle hook: NF_IP_PRE_ROUTING
action order 5: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x3 index 3
action order 6: police 20 action drop rate 1Kbit burst 90Kb mtu 5000b
action order 6: police 20 action drop rate 1Kbit burst 90Kb mtu 5000b
match 0a000015/ffffffff at 12
-------------------------------
@ -209,31 +209,31 @@ Now lets take a look at the stats with "tc -s filter show parent ffff: dev eth0"
--------------
jroot# tc -s filter show parent ffff: dev eth0
filter protocol ip pref 1 u32
filter protocol ip pref 1 u32 fh 800: ht divisor 1
filter protocol ip pref 1 u32
filter protocol ip pref 1 u32 fh 800: ht divisor 1
filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1
5
5
action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING
action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x1 index 2
Sent 188832 bytes 2248 pkts (dropped 0, overlimits 0)
Sent 188832 bytes 2248 pkts (dropped 0, overlimits 0)
action order 2: police 1 action pipe rate 1Kbit burst 9Kb mtu 2Kb
Sent 188832 bytes 2248 pkts (dropped 0, overlimits 2122)
action order 2: police 1 action pipe rate 1Kbit burst 9Kb mtu 2Kb
Sent 188832 bytes 2248 pkts (dropped 0, overlimits 2122)
action order 3: tablename: mangle hook: NF_IP_PRE_ROUTING
action order 3: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x2 index 1
Sent 178248 bytes 2122 pkts (dropped 0, overlimits 0)
Sent 178248 bytes 2122 pkts (dropped 0, overlimits 0)
action order 4: police 30 action pipe rate 1Kbit burst 10Kb mtu 5000b
Sent 178248 bytes 2122 pkts (dropped 0, overlimits 1945)
action order 4: police 30 action pipe rate 1Kbit burst 10Kb mtu 5000b
Sent 178248 bytes 2122 pkts (dropped 0, overlimits 1945)
action order 5: tablename: mangle hook: NF_IP_PRE_ROUTING
action order 5: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x3 index 3
Sent 163380 bytes 1945 pkts (dropped 0, overlimits 0)
Sent 163380 bytes 1945 pkts (dropped 0, overlimits 0)
action order 6: police 20 action drop rate 1Kbit burst 90Kb mtu 5000b
Sent 163380 bytes 1945 pkts (dropped 0, overlimits 437)
action order 6: police 20 action drop rate 1Kbit burst 90Kb mtu 5000b
Sent 163380 bytes 1945 pkts (dropped 0, overlimits 437)
match 0a000015/ffffffff at 12
-------------------------------
@ -241,7 +241,7 @@ filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1
Neat, eh?
Wanna write an action module?
Want to write an action module?
------------------------------
Its easy. Either look at the code or send me email. I will document at
some point; will also accept documentation.
@ -254,4 +254,3 @@ At the moment the focus has been on getting the architecture in place.
Expect new things in the spurious time i have to work on this
(particularly around end of year when i have typically get time off
from work).

View File

@ -1,16 +1,16 @@
gact <ACTION> [RAND] [INDEX]
Where:
ACTION := reclassify | drop | continue | pass | ok
Where:
ACTION := reclassify | drop | continue | pass | ok
RAND := random <RANDTYPE> <ACTION> <VAL>
RANDTYPE := netrand | determ
VAL : = value not exceeding 10000
INDEX := index value used
ACTION semantics
- pass and ok are equivalent to accept
- continue allows to restart classification lookup
- continue allows one to restart classification lookup
- drop drops packets
- reclassify implies continue classification where we left off
@ -42,14 +42,14 @@ filter u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:16 (rule hit 32 suc
random type none pass val 0
index 1 ref 1 bind 1 installed 59 sec used 35 sec
Sent 1680 bytes 20 pkts (dropped 20, overlimits 0 )
----
# example 2
#allow 1 out 10 randomly using the netrand generator
tc filter add dev eth0 parent ffff: protocol ip prio 6 u32 match ip src \
10.0.0.9/32 flowid 1:16 action drop random netrand ok 10
ping -c 20 10.0.0.9
----
@ -59,14 +59,14 @@ filter protocol ip pref 6 u32 filter protocol ip pref 6 u32 fh 800: ht divisor 1
random type netrand pass val 10
index 5 ref 1 bind 1 installed 49 sec used 25 sec
Sent 1680 bytes 20 pkts (dropped 16, overlimits 0 )
--------
#alternative: deterministically accept every second packet
tc filter add dev eth0 parent ffff: protocol ip prio 6 u32 match ip src \
10.0.0.9/32 flowid 1:16 action drop random determ ok 2
ping -c 20 10.0.0.9
tc -s filter show parent ffff: dev eth0
-----
filter protocol ip pref 6 u32 filter protocol ip pref 6 u32 fh 800: ht divisor 1filter protocol ip pref 6 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:16 (rule hit 20 success 20)
@ -76,4 +76,3 @@ filter protocol ip pref 6 u32 filter protocol ip pref 6 u32 fh 800: ht divisor 1
index 4 ref 1 bind 1 installed 118 sec used 82 sec
Sent 1680 bytes 20 pkts (dropped 10, overlimits 0 )
-----

View File

@ -6,47 +6,47 @@ with a _lot_ less code.
Known IMQ/IFB USES
------------------
As far as i know the reasons listed below is why people use IMQ.
As far as i know the reasons listed below is why people use IMQ.
It would be nice to know of anything else that i missed.
1) qdiscs/policies that are per device as opposed to system wide.
IFB allows for sharing.
2) Allows for queueing incoming traffic for shaping instead of
dropping. I am not aware of any study that shows policing is
dropping. I am not aware of any study that shows policing is
worse than shaping in achieving the end goal of rate control.
I would be interested if anyone is experimenting.
3) Very interesting use: if you are serving p2p you may wanna give
preference to your own localy originated traffic (when responses come back)
3) Very interesting use: if you are serving p2p you may want to give
preference to your own locally originated traffic (when responses come back)
vs someone using your system to do bittorent. So QoSing based on state
comes in as the solution. What people did to achive this was stick
comes in as the solution. What people did to achieve this was stick
the IMQ somewhere prelocal hook.
I think this is a pretty neat feature to have in Linux in general.
(i.e not just for IMQ).
But i wont go back to putting netfilter hooks in the device to satisfy
this. I also dont think its worth it hacking ifb some more to be
But i won't go back to putting netfilter hooks in the device to satisfy
this. I also don't think its worth it hacking ifb some more to be
aware of say L3 info and play ip rule tricks to achieve this.
--> Instead the plan is to have a contrack related action. This action will
selectively either query/create contrack state on incoming packets.
Packets could then be redirected to ifb based on what happens -> eg
on incoming packets; if we find they are of known state we could send to
a different queue than one which didnt have existing state. This
--> Instead the plan is to have a conntrack related action. This action will
selectively either query/create conntrack state on incoming packets.
Packets could then be redirected to ifb based on what happens -> eg
on incoming packets; if we find they are of known state we could send to
a different queue than one which didn't have existing state. This
all however is dependent on whatever rules the admin enters.
At the moment this 3rd function does not exist yet. I have decided that
instead of sitting on the patch for another year, to release it and then
if theres pressure i will add this feature.
instead of sitting on the patch for another year, to release it and then
if there is pressure i will add this feature.
An example, to provide functionality that most people use IMQ for below:
--------
export TC="/sbin/tc"
$TC qdisc add dev ifb0 root handle 1: prio
$TC qdisc add dev ifb0 root handle 1: prio
$TC qdisc add dev ifb0 parent 1:1 handle 10: sfq
$TC qdisc add dev ifb0 parent 1:2 handle 20: tbf rate 20kbit buffer 1600 limit 3000
$TC qdisc add dev ifb0 parent 1:3 handle 30: sfq
$TC qdisc add dev ifb0 parent 1:3 handle 30: sfq
$TC filter add dev ifb0 protocol ip pref 1 parent 1: handle 1 fw classid 1:1
$TC filter add dev ifb0 protocol ip pref 2 parent 1: handle 2 fw classid 1:2
@ -54,7 +54,7 @@ ifconfig ifb0 up
$TC qdisc add dev eth0 ingress
# redirect all IP packets arriving in eth0 to ifb0
# redirect all IP packets arriving in eth0 to ifb0
# use mark 1 --> puts them onto class 1:1
$TC filter add dev eth0 parent ffff: protocol ip prio 10 u32 \
match u32 0 0 flowid 1:1 \
@ -77,44 +77,44 @@ PING 10.22 (10.0.0.22): 56 data bytes
--- 10.22 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.6/1.3/2.8 ms
[root@jzny action-tests]#
[root@jzny action-tests]#
-----
Now look at some stats:
---
[root@jmandrake]:~# $TC -s filter show parent ffff: dev eth0
filter protocol ip pref 10 u32
filter protocol ip pref 10 u32 fh 800: ht divisor 1
filter protocol ip pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1
filter protocol ip pref 10 u32
filter protocol ip pref 10 u32 fh 800: ht divisor 1
filter protocol ip pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1
match 00000000/00000000 at 0
action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x1
index 1 ref 1 bind 1 installed 4195sec used 27sec
Sent 252 bytes 3 pkts (dropped 0, overlimits 0)
action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x1
index 1 ref 1 bind 1 installed 4195sec used 27sec
Sent 252 bytes 3 pkts (dropped 0, overlimits 0)
action order 2: mirred (Egress Redirect to device ifb0) stolen
index 1 ref 1 bind 1 installed 165 sec used 27 sec
Sent 252 bytes 3 pkts (dropped 0, overlimits 0)
Sent 252 bytes 3 pkts (dropped 0, overlimits 0)
[root@jmandrake]:~# $TC -s qdisc
qdisc sfq 30: dev ifb0 limit 128p quantum 1514b
Sent 0 bytes 0 pkts (dropped 0, overlimits 0)
qdisc tbf 20: dev ifb0 rate 20Kbit burst 1575b lat 2147.5s
Sent 210 bytes 3 pkts (dropped 0, overlimits 0)
qdisc sfq 10: dev ifb0 limit 128p quantum 1514b
Sent 294 bytes 3 pkts (dropped 0, overlimits 0)
qdisc sfq 30: dev ifb0 limit 128p quantum 1514b
Sent 0 bytes 0 pkts (dropped 0, overlimits 0)
qdisc tbf 20: dev ifb0 rate 20Kbit burst 1575b lat 2147.5s
Sent 210 bytes 3 pkts (dropped 0, overlimits 0)
qdisc sfq 10: dev ifb0 limit 128p quantum 1514b
Sent 294 bytes 3 pkts (dropped 0, overlimits 0)
qdisc prio 1: dev ifb0 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 504 bytes 6 pkts (dropped 0, overlimits 0)
qdisc ingress ffff: dev eth0 ----------------
Sent 308 bytes 5 pkts (dropped 0, overlimits 0)
Sent 504 bytes 6 pkts (dropped 0, overlimits 0)
qdisc ingress ffff: dev eth0 ----------------
Sent 308 bytes 5 pkts (dropped 0, overlimits 0)
[root@jmandrake]:~# ifconfig ifb0
ifb0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
ifb0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
UP BROADCAST RUNNING NOARP MTU:1500 Metric:1
RX packets:6 errors:0 dropped:3 overruns:0 frame:0
TX packets:3 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:32
collisions:0 txqueuelen:32
RX bytes:504 (504.0 b) TX bytes:252 (252.0 b)
-----

View File

@ -7,10 +7,10 @@ flow to be mirrored. High end switches typically can select based
on more than just a port (eg a 5 tuple classifier). They may also be
capable of redirecting.
Usage:
Usage:
mirred <DIRECTION> <ACTION> [index INDEX] <dev DEVICENAME>
where:
mirred <DIRECTION> <ACTION> [index INDEX] <dev DEVICENAME>
where:
DIRECTION := <ingress | egress>
ACTION := <mirror | redirect>
INDEX is the specific policy instance id
@ -18,7 +18,7 @@ DEVICENAME is the devicename
Direction:
- Ingress is not supported at the moment. It will be in the
future as well as mirror/redirecting to a socket.
future as well as mirror/redirecting to a socket.
Action:
- Mirror takes a copy of the packet and sends it to specified
@ -26,17 +26,17 @@ dev ("port" in ethernet switch/bridging terminology)
- redirect
steals the packet and redirects to specified destination dev.
What NOT to do if you dont want your machine to crash:
What NOT to do if you don't want your machine to crash:
------------------------------------------------------
Do not create loops!
Do not create loops!
Loops are not hard to create in the egress qdiscs.
Here are simple rules to follow if you dont want to get
Here are simple rules to follow if you don't want to get
hurt:
A) Do not have the same packet go to same netdevice twice
in a single graph of policies. Your machine will just hang!
This is design intent _not a bug_ to teach you some lessons.
This is design intent _not a bug_ to teach you some lessons.
In the future if there are easy ways to do this in the kernel
without affecting other packets not interested in this feature
@ -51,7 +51,7 @@ B) Do not redirect from one IFB device to another.
Remember that IFB is a very specialized case of packet redirecting
device. Instead of redirecting it puts packets at the exact spot
on the stack it found them from.
Redirecting from ifbX->ifbY will actually not crash your machine but your
Redirecting from ifbX->ifbY will actually not crash your machine but your
packets will all be dropped (this is much simpler to detect
and resolve and is only affecting users of ifb as opposed to the
whole stack).
@ -64,7 +64,7 @@ Some examples:
1) Mirror all packets arriving on eth0 to be sent out on eth1.
You may have a sniffer or some accounting box hooked up on eth1.
---
tc qdisc add dev eth0 ingress
tc filter add dev eth0 parent ffff: protocol ip prio 10 u32 \
@ -100,7 +100,7 @@ stack (i.e ping would work).
3) Even more funky example:
#
#allow 1 out 10 packets on ingress of lo to randomly make it to the
#allow 1 out 10 packets on ingress of lo to randomly make it to the
# host A (Randomness uses the netrand generator)
#
---
@ -111,9 +111,9 @@ action mirred egress mirror dev eth0
---
4)
# for packets from 10.0.0.9 going out on eth0 (could be local
# IP or something # we are forwarding) -
# if exceeding a 100Kbps rate, then redirect to eth1
# for packets from 10.0.0.9 going out on eth0 (could be local
# IP or something # we are forwarding) -
# if exceeding a 100Kbps rate, then redirect to eth1
#
---
@ -129,7 +129,7 @@ so you could tcpdump them (dummy by defaults drops all packets it sees).
This is a very useful debug feature.
Lets say you are policing packets from alias 192.168.200.200/32
you dont want those to exceed 100kbps going out.
you don't want those to exceed 100kbps going out.
---
tc qdisc add dev eth0 handle 1:0 root prio
@ -158,7 +158,7 @@ Essentially a good debugging/logging interface (sort of like
BSDs speacialized log device does without needing one).
If you replace mirror with redirect, those packets will be
blackholed and will never make it out.
blackholed and will never make it out.
cheers,
jamal

View File

@ -1,429 +0,0 @@
\documentstyle[12pt,twoside]{article}
\def\TITLE{IPv6 Flow Labels}
\input preamble
\begin{center}
\Large\bf IPv6 Flow Labels in Linux-2.2.
\end{center}
\begin{center}
{ \large Alexey~N.~Kuznetsov } \\
\em Institute for Nuclear Research, Moscow \\
\verb|kuznet@ms2.inr.ac.ru| \\
\rm April 11, 1999
\end{center}
\vspace{5mm}
\tableofcontents
\section{Introduction.}
Every IPv6 packet carries 28 bits of flow information. RFC2460 splits
these bits to two fields: 8 bits of traffic class (or DS field, if you
prefer this term) and 20 bits of flow label. Currently there exist
no well-defined API to manage IPv6 flow information. In this document
I describe an attempt to design the API for Linux-2.2 IPv6 stack.
\vskip 1mm
The API must solve the following tasks:
\begin{enumerate}
\item To allow user to set traffic class bits.
\item To allow user to read traffic class bits of received packets.
This feature is not so useful as the first one, however it will be
necessary f.e.\ to implement ECN [RFC2481] for datagram oriented services
or to implement receiver side of SRP or another end-to-end protocol
using traffic class bits.
\item To assign flow labels to packets sent by user.
\item To get flow labels of received packets. I do not know
any applications of this feature, but it is possible that receiver will
want to use flow labels to distinguish sub-flows.
\item To allocate flow labels in the way, compliant to RFC2460. Namely:
\begin{itemize}
\item
Flow labels must be uniformly distributed (pseudo-)random numbers,
so that any subset of 20 bits can be used as hash key.
\item
Flows with coinciding source address and flow label must have identical
destination address and not-fragmentable extensions headers (i.e.\
hop by hop options and all the headers up to and including routing header,
if it is present.)
\begin{NB}
There is a hole in specs: some hop-by-hop options can be
defined only on per-packet base (f.e.\ jumbo payload option).
Essentially, it means that such options cannot present in packets
with flow labels.
\end{NB}
\begin{NB}
NB notes here and below reflect only my personal opinion,
they should be read with smile or should not be read at all :-).
\end{NB}
\item
Flow labels have finite lifetime and source is not allowed to reuse
flow label for another flow within the maximal lifetime has expired,
so that intermediate nodes will be able to invalidate flow state before
the label is taken over by another flow.
Flow state, including lifetime, is propagated along datagram path
by some application specific methods
(f.e.\ in RSVP PATH messages or in some hop-by-hop option).
\end{itemize}
\end{enumerate}
\section{Sending/receiving flow information.}
\paragraph{Discussion.}
\addcontentsline{toc}{subsection}{Discussion}
It was proposed (Where? I do not remember any explicit statement)
to solve the first four tasks using
\verb|sin6_flowinfo| field added to \verb|struct| \verb|sockaddr_in6|
(see RFC2553).
\begin{NB}
This method is difficult to consider as reasonable, because it
puts additional overhead to all the services, despite of only
very small subset of them (none, to be more exact) really use it.
It contradicts both to IETF spirit and the letter. Before RFC2553
one justification existed, IPv6 address alignment left 4 byte
hole in \verb|sockaddr_in6| in any case. Now it has no justification.
\end{NB}
We have two problems with this method. The first one is common for all OSes:
if \verb|recvmsg()| initializes \verb|sin6_flowinfo| to flow info
of received packet, we loose one very important property of BSD socket API,
namely, we are not allowed to use received address for reply directly
and have to mangle it, even if we are not interested in flowinfo subtleties.
\begin{NB}
RFC2553 adds new requirement: to clear \verb|sin6_flowinfo|.
Certainly, it is not solution but rather attempt to force applications
to make unnecessary work. Well, as usually, one mistake in design
is followed by attempts to patch the hole and more mistakes...
\end{NB}
Another problem is Linux specific. Historically Linux IPv6 did not
initialize \verb|sin6_flowinfo| at all, so that, if kernel does not
support flow labels, this field is not zero, but a random number.
Some applications also did not take care about it.
\begin{NB}
Following RFC2553 such applications can be considered as broken,
but I still think that they are right: clearing all the address
before filling known fields is robust but stupid solution.
Useless wasting CPU cycles and
memory bandwidth is not a good idea. Such patches are acceptable
as temporary hacks, but not as standard of the future.
\end{NB}
\paragraph{Implementation.}
\addcontentsline{toc}{subsection}{Implementation}
By default Linux IPv6 does not read \verb|sin6_flowinfo| field
assuming that common applications are not obliged to initialize it
and are permitted to consider it as pure alignment padding.
In order to tell kernel that application
is aware of this field, it is necessary to set socket option
\verb|IPV6_FLOWINFO_SEND|.
\begin{verbatim}
int on = 1;
setsockopt(sock, SOL_IPV6, IPV6_FLOWINFO_SEND,
(void*)&on, sizeof(on));
\end{verbatim}
Linux kernel never fills \verb|sin6_flowinfo| field, when passing
message to user space, though the kernels which support flow labels
initialize it to zero. If user wants to get received flowinfo, he
will set option \verb|IPV6_FLOWINFO| and after this he will receive
flowinfo as ancillary data object of type \verb|IPV6_FLOWINFO|
(cf.\ RFC2292).
\begin{verbatim}
int on = 1;
setsockopt(sock, SOL_IPV6, IPV6_FLOWINFO, (void*)&on, sizeof(on));
\end{verbatim}
Flowinfo received and latched by a connected TCP socket also may be fetched
with \verb|getsockopt()| \verb|IPV6_PKTOPTIONS| together with
another optional information.
Besides that, in the spirit of RFC2292 the option \verb|IPV6_FLOWINFO|
may be used as alternative way to send flowinfo with \verb|sendmsg()| or
to latch it with \verb|IPV6_PKTOPTIONS|.
\paragraph{Note about IPv6 options and destination address.}
\addcontentsline{toc}{subsection}{IPv6 options and destination address}
If \verb|sin6_flowinfo| does contain not zero flow label,
destination address in \verb|sin6_addr| and non-fragmentable
extension headers are ignored. Instead, kernel uses the values
cached at flow setup (see below). However, for connected sockets
kernel prefers the values set at connection time.
\paragraph{Example.}
\addcontentsline{toc}{subsection}{Example}
After setting socket option \verb|IPV6_FLOWINFO|
flowlabel and DS field are received as ancillary data object
of type \verb|IPV6_FLOWINFO| and level \verb|SOL_IPV6|.
In the cases when it is convenient to use \verb|recvfrom(2)|,
it is possible to replace library variant with your own one,
sort of:
\begin{verbatim}
#include <sys/socket.h>
#include <netinet/in6.h>
size_t recvfrom(int fd, char *buf, size_t len, int flags,
struct sockaddr *addr, int *addrlen)
{
size_t cc;
char cbuf[128];
struct cmsghdr *c;
struct iovec iov = { buf, len };
struct msghdr msg = { addr, *addrlen,
&iov, 1,
cbuf, sizeof(cbuf),
0 };
cc = recvmsg(fd, &msg, flags);
if (cc < 0)
return cc;
((struct sockaddr_in6*)addr)->sin6_flowinfo = 0;
*addrlen = msg.msg_namelen;
for (c=CMSG_FIRSTHDR(&msg); c; c = CMSG_NEXTHDR(&msg, c)) {
if (c->cmsg_level != SOL_IPV6 ||
c->cmsg_type != IPV6_FLOWINFO)
continue;
((struct sockaddr_in6*)addr)->sin6_flowinfo = *(__u32*)CMSG_DATA(c);
}
return cc;
}
\end{verbatim}
\section{Flow label management.}
\paragraph{Discussion.}
\addcontentsline{toc}{subsection}{Discussion}
Requirements of RFC2460 are pretty tough. Particularly, lifetimes
longer than boot time require to store allocated labels at stable
storage, so that the full implementation necessarily includes user space flow
label manager. There are at least three different approaches:
\begin{enumerate}
\item {\bf ``Cooperative''. } We could leave flow label allocation wholly
to user space. When user needs label he requests manager directly. The approach
is valid, but as any ``cooperative'' approach it suffers of security problems.
\begin{NB}
One idea is to disallow not privileged user to allocate flow
labels, but instead to pass the socket to manager via \verb|SCM_RIGHTS|
control message, so that it will allocate label and assign it to socket
itself. Hmm... the idea is interesting.
\end{NB}
\item {\bf ``Indirect''.} Kernel redirects requests to user level daemon
and does not install label until the daemon acknowledged the request.
The approach is the most promising, it is especially pleasant to recognize
parallel with IPsec API [RFC2367,Craig]. Actually, it may share API with
IPsec.
\item {\bf ``Stupid''.} To allocate labels in kernel space. It is the simplest
method, but it suffers of two serious flaws: the first,
we cannot lease labels with lifetimes longer than boot time, the second,
it is sensitive to DoS attacks. Kernel have to remember all the obsolete
labels until their expiration and malicious user may fastly eat all the
flow label space.
\end{enumerate}
Certainly, I choose the most ``stupid'' method. It is the cheapest one
for implementor (i.e.\ me), and taking into account that flow labels
still have no serious applications it is not useful to work on more
advanced API, especially, taking into account that eventually we
will get it for no fee together with IPsec.
\paragraph{Implementation.}
\addcontentsline{toc}{subsection}{Implementation}
Socket option \verb|IPV6_FLOWLABEL_MGR| allows to
request flow label manager to allocate new flow label, to reuse
already allocated one or to delete old flow label.
Its argument is \verb|struct| \verb|in6_flowlabel_req|:
\begin{verbatim}
struct in6_flowlabel_req
{
struct in6_addr flr_dst;
__u32 flr_label;
__u8 flr_action;
__u8 flr_share;
__u16 flr_flags;
__u16 flr_expires;
__u16 flr_linger;
__u32 __flr_reserved;
/* Options in format of IPV6_PKTOPTIONS */
};
\end{verbatim}
\begin{itemize}
\item \verb|dst| is IPv6 destination address associated with the label.
\item \verb|label| is flow label value in network byte order. If it is zero,
kernel will allocate new pseudo-random number. Otherwise, kernel will try
to lease flow label ordered by user. In this case, it is user task to provide
necessary flow label randomness.
\item \verb|action| is requested operation. Currently, only three operations
are defined:
\begin{verbatim}
#define IPV6_FL_A_GET 0 /* Get flow label */
#define IPV6_FL_A_PUT 1 /* Release flow label */
#define IPV6_FL_A_RENEW 2 /* Update expire time */
\end{verbatim}
\item \verb|flags| are optional modifiers. Currently
only \verb|IPV6_FL_A_GET| has modifiers:
\begin{verbatim}
#define IPV6_FL_F_CREATE 1 /* Allowed to create new label */
#define IPV6_FL_F_EXCL 2 /* Do not create new label */
\end{verbatim}
\item \verb|share| defines who is allowed to reuse the same flow label.
\begin{verbatim}
#define IPV6_FL_S_NONE 0 /* Not defined */
#define IPV6_FL_S_EXCL 1 /* Label is private */
#define IPV6_FL_S_PROCESS 2 /* May be reused by this process */
#define IPV6_FL_S_USER 3 /* May be reused by this user */
#define IPV6_FL_S_ANY 255 /* Anyone may reuse it */
\end{verbatim}
\item \verb|linger| is time in seconds. After the last user releases flow
label, it will not be reused with different destination and options at least
during this time. If \verb|share| is not \verb|IPV6_FL_S_EXCL| the label
still can be shared by another sockets. Current implementation does not allow
unprivileged user to set linger longer than 60 sec.
\item \verb|expires| is time in seconds. Flow label will be kept at least
for this time, but it will not be destroyed before user released it explicitly
or closed all the sockets using it. Current implementation does not allow
unprivileged user to set timeout longer than 60 sec. Proviledged applications
MAY set longer lifetimes, but in this case they MUST save allocated
labels at stable storage and restore them back after reboot before the first
application allocates new flow.
\end{itemize}
This structure is followed by optional extension headers associated
with this flow label in format of \verb|IPV6_PKTOPTIONS|. Only
\verb|IPV6_HOPOPTS|, \verb|IPV6_RTHDR| and, if \verb|IPV6_RTHDR| presents,
\verb|IPV6_DSTOPTS| are allowed.
\paragraph{Example.}
\addcontentsline{toc}{subsection}{Example}
The function \verb|get_flow_label| allocates
private flow label.
\begin{verbatim}
int get_flow_label(int fd, struct sockaddr_in6 *dst, __u32 fl)
{
int on = 1;
struct in6_flowlabel_req freq;
memset(&freq, 0, sizeof(freq));
freq.flr_label = htonl(fl);
freq.flr_action = IPV6_FL_A_GET;
freq.flr_flags = IPV6_FL_F_CREATE | IPV6_FL_F_EXCL;
freq.flr_share = IPV6_FL_S_EXCL;
memcpy(&freq.flr_dst, &dst->sin6_addr, 16);
if (setsockopt(fd, SOL_IPV6, IPV6_FLOWLABEL_MGR,
&freq, sizeof(freq)) == -1) {
perror ("can't lease flowlabel");
return -1;
}
dst->sin6_flowinfo |= freq.flr_label;
if (setsockopt(fd, SOL_IPV6, IPV6_FLOWINFO_SEND,
&on, sizeof(on)) == -1) {
perror ("can't send flowinfo");
freq.flr_action = IPV6_FL_A_PUT;
setsockopt(fd, SOL_IPV6, IPV6_FLOWLABEL_MGR,
&freq, sizeof(freq));
return -1;
}
return 0;
}
\end{verbatim}
A bit more complicated example using routing header can be found
in \verb|ping6| utility (\verb|iputils| package). Linux rsvpd backend
contains an example of using operation \verb|IPV6_FL_A_RENEW|.
\paragraph{Listing flow labels.}
\addcontentsline{toc}{subsection}{Listing flow labels}
List of currently allocated
flow labels may be read from \verb|/proc/net/ip6_flowlabel|.
\begin{verbatim}
Label S Owner Users Linger Expires Dst Opt
A1BE5 1 0 0 6 3 3ffe2400000000010a0020fffe71fb30 0
\end{verbatim}
\begin{itemize}
\item \verb|Label| is hexadecimal flow label value.
\item \verb|S| is sharing style.
\item \verb|Owner| is ID of creator, it is zero, pid or uid, depending on
sharing style.
\item \verb|Users| is number of applications using the label now.
\item \verb|Linger| is \verb|linger| of this label in seconds.
\item \verb|Expires| is time until expiration of the label in seconds. It may
be negative, if the label is in use.
\item \verb|Dst| is IPv6 destination address.
\item \verb|Opt| is length of options, associated with the label. Option
data are not accessible.
\end{itemize}
\paragraph{Flow labels and RSVP.}
\addcontentsline{toc}{subsection}{Flow labels and RSVP}
RSVP daemon supports IPv6 flow labels
without any modifications to standard ISI RAPI. Sender must allocate
flow label, fill corresponding sender template and submit it to local rsvp
daemon. rsvpd will check the label and start to announce it in PATH
messages. Rsvpd on sender node will renew the flow label, so that it will not
be reused before path state expires and all the intermediate
routers and receiver purge flow state.
\verb|rtap| utility is modified to parse flow labels. F.e.\ if user allocated
flow label \verb|0xA1234|, he may write:
\begin{verbatim}
RTAP> sender 3ffe:2400::1/FL0xA1234 <Tspec>
\end{verbatim}
Receiver makes reservation with command:
\begin{verbatim}
RTAP> reserve ff 3ffe:2400::1/FL0xA1234 <Flowspec>
\end{verbatim}
\end{document}

View File

@ -1,130 +0,0 @@
<!doctype linuxdoc system>
<article>
<title>ARPD Daemon
<author>Alexey Kuznetsov, <tt/kuznet@ms2.inr.ac.ru/
<date>some_negative_number, 20 Sep 2001
<abstract>
<tt/arpd/ is daemon collecting gratuitous ARP information, saving
it on local disk and feeding it to kernel on demand to avoid
redundant broadcasting due to limited size of kernel ARP cache.
</abstract>
<p><bf/Description/
<p>The format of the command is:
<tscreen><verb>
arpd OPTIONS [ INTERFACE [ INTERFACE ... ] ]
</verb></tscreen>
<p> <tt/OPTIONS/ are:
<itemize>
<item><tt/-l/ - dump <tt/arpd/ database to stdout and exit. Output consists
of three columns: interface index, IP address and MAC address.
Negative entries for dead hosts are also shown, in this case MAC address
is replaced by word <tt/FAILED/ followed by colon and time when the fact
that host is dead was proven the last time.
<item><tt/-f FILE/ - read and load <tt/arpd/ database from <tt/FILE/
in text format similar dumped by option <tt/-l/. Exit after load,
probably listing resulting database, if option <tt/-l/ is also given.
If <tt/FILE/ is <tt/-/, <tt/stdin/ is read to get ARP table.
<item><tt/-b DATABASE/ - location of database file. Default location is
<tt>/var/lib/arpd/arpd.db</tt>.
<item><tt/-a NUMBER/ - <tt/arpd/ not only passively listens ARP on wire, but
also send brodcast queries itself. <tt/NUMBER/ is number of such queries
to make before destination is considered as dead. When <tt/arpd/ is started
as kernel helper (i.e. with <tt/app_solicit/ enabled in <tt/sysctl/
or even with option <tt/-k/) without this option and still did not learn enough
information, you can observe 1 second gaps in service. Not fatal, but
not good.
<item><tt/-k/ - suppress sending broadcast queries by kernel. It takes
sense together with option <tt/-a/.
<item><tt/-n TIME/ - timeout of negative cache. When resolution fails <tt/arpd/
suppresses further attempts to resolve for this period. It makes sense
only together with option <tt/-k/. This timeout should not be too much
longer than boot time of a typical host not supporting gratuitous ARP.
Default value is 60 seconds.
<item><tt/-R RATE/ - maximal steady rate of broadcasts sent by <tt/arpd/
in packets per second. Default value is 1.
<item><tt/-B NUMBER/ - number of broadcasts sent by <tt/arpd/ back to back.
Default value is 3. Together with option <tt/-R/ this option allows
to police broadcasting not to exceed <tt/B+R*T/ over any interval
of time <tt/T/.
</itemize>
<p><tt/INTERFACE/ is name of networking inteface to watch.
If no interfaces given, <tt/arpd/ monitors all the interfaces.
In this case <tt/arpd/ does not adjust <tt/sysctl/ parameters,
it is supposed user does this himself after <tt/arpd/ is started.
<p> Signals
<p> <tt/arpd/ exits gracefully syncing database and restoring adjusted
<tt/sysctl/ parameters, when receives <tt/SIGINT/ or <tt/SIGTERM/.
<tt/SIGHUP/ syncs database to disk. <tt/SIGUSR1/ sends some statistics
to <tt/syslog/. Effect of another signals is undefined, they may corrupt
database and leave <tt/sysctl/ parameters in an unpredictable state.
<p> Note
<p> In order to <tt/arpd/ be able to serve as ARP resolver, kernel must be
compiled with the option <tt/CONFIG_ARPD/ and, in the case when interface list
is not given on command line, variable <tt/app_solicit/
on interfaces of interest should be set in <tt>/proc/sys/net/ipv4/neigh/*</tt>.
If this is not made <tt/arpd/ still collects gratuitous ARP information
in its database.
<p> Examples
<enum>
<item> Start <tt/arpd/ to collect gratuitous ARP, but not messing
with kernel functionality:
<tscreen><verb>
arpd -b /var/tmp/arpd.db
</verb></tscreen>
<item> Look at result after some time:
<tscreen><verb>
killall arpd
arpd -l -b /var/tmp/arpd.db
</verb></tscreen>
<item> To enable kernel helper, leaving leading role to kernel:
<tscreen><verb>
arpd -b /var/tmp/arpd.db -a 1 eth0 eth1
</verb></tscreen>
<item> Completely replace kernel resolution on interfaces <tt/eth0/
and <tt/eth1/. In this case kernel still does unicast probing to
validate entries, but all the broadcast activity is suppressed
and made under authority of <tt/arpd/:
<tscreen><verb>
arpd -b /var/tmp/arpd.db -a 3 -k eth0 eth1
</verb></tscreen>
This is mode which <tt/arpd/ is supposed to work normally.
It is not default just to prevent occasional enabling of too aggressive
mode occasionally.
</enum>
</article>

View File

@ -1,16 +0,0 @@
#! /bin/bash
# $1 = Temporary file . "string"
# $2 = File to process . "string"
# $3 = Page size . ie: a4 , letter ... "string"
# $4 = Number of pages to fit on a single sheet . "numeric"
if type psnup >&/dev/null; then
echo "psnup -$4 -p$3 $1 $2"
psnup -$4 -p$3 $1 $2
elif type psmulti >&/dev/null; then
echo "psmulti $1 > $2"
psmulti $1 > $2
else
echo "cp $1 $2"
cp $1 $2
fi

File diff suppressed because it is too large Load Diff

View File

@ -1,469 +0,0 @@
\documentstyle[12pt,twoside]{article}
\def\TITLE{Tunnels over IP}
\input preamble
\begin{center}
\Large\bf Tunnels over IP in Linux-2.2
\end{center}
\begin{center}
{ \large Alexey~N.~Kuznetsov } \\
\em Institute for Nuclear Research, Moscow \\
\verb|kuznet@ms2.inr.ac.ru| \\
\rm March 17, 1999
\end{center}
\vspace{5mm}
\tableofcontents
\section{Instead of introduction: micro-FAQ.}
\begin{itemize}
\item
Q: In linux-2.0.36 I used:
\begin{verbatim}
ifconfig tunl1 10.0.0.1 pointopoint 193.233.7.65
\end{verbatim}
to create tunnel. It does not work in 2.2.0!
A: You are right, it does not work. The command written above is split to two commands.
\begin{verbatim}
ip tunnel add MY-TUNNEL mode ipip remote 193.233.7.65
\end{verbatim}
will create tunnel device with name \verb|MY-TUNNEL|. Now you may configure
it with:
\begin{verbatim}
ifconfig MY-TUNNEL 10.0.0.1
\end{verbatim}
Certainly, if you prefer name \verb|tunl1| to \verb|MY-TUNNEL|,
you still may use it.
\item
Q: In linux-2.0.36 I used:
\begin{verbatim}
ifconfig tunl0 10.0.0.1
route add -net 10.0.0.0 gw 193.233.7.65 dev tunl0
\end{verbatim}
to tunnel net 10.0.0.0 via router 193.233.7.65. It does not
work in 2.2.0! Moreover, \verb|route| prints a funny error sort of
``network unreachable'' and after this I found a strange direct route
to 10.0.0.0 via \verb|tunl0| in routing table.
A: Yes, in 2.2 the rule that {\em normal} gateway must reside on directly
connected network has not any exceptions. You may tell kernel, that
this particular route is {\em abnormal}:
\begin{verbatim}
ifconfig tunl0 10.0.0.1 netmask 255.255.255.255
ip route add 10.0.0.0/8 via 193.233.7.65 dev tunl0 onlink
\end{verbatim}
Note keyword \verb|onlink|, it is the magic key that orders kernel
not to check for consistency of gateway address.
Probably, after this explanation you have already guessed another method
to cheat kernel:
\begin{verbatim}
ifconfig tunl0 10.0.0.1 netmask 255.255.255.255
route add -host 193.233.7.65 dev tunl0
route add -net 10.0.0.0 netmask 255.0.0.0 gw 193.233.7.65
route del -host 193.233.7.65 dev tunl0
\end{verbatim}
Well, if you like such tricks, nobody may prohibit you to use them.
Only do not forget
that between \verb|route add| and \verb|route del| host 193.233.7.65 is
unreachable.
\item
Q: In 2.0.36 I used to load \verb|tunnel| device module and \verb|ipip| module.
I cannot find any \verb|tunnel| in 2.2!
A: Linux-2.2 has single module \verb|ipip| for both directions of tunneling
and for all IPIP tunnel devices.
\item
Q: \verb|traceroute| does not work over tunnel! Well, stop... It works,
only skips some number of hops.
A: Yes. By default tunnel driver copies \verb|ttl| value from
inner packet to outer one. It means that path traversed by tunneled
packets to another endpoint is not hidden. If you dislike this, or if you
are going to use some routing protocol expecting that packets
with ttl 1 will reach peering host (f.e.\ RIP, OSPF or EBGP)
and you are not afraid of
tunnel loops, you may append option \verb|ttl 64|, when creating tunnel
with \verb|ip tunnel add|.
\item
Q: ... Well, list of things, which 2.0 was able to do finishes.
\end{itemize}
\paragraph{Summary of differences between 2.2 and 2.0.}
\begin{itemize}
\item {\bf In 2.0} you could compile tunnel device into kernel
and got set of 4 devices \verb|tunl0| ... \verb|tunl3| or,
alternatively, compile it as module and load new module
for each new tunnel. Also, module \verb|ipip| was necessary
to receive tunneled packets.
{\bf 2.2} has {\em one\/} module \verb|ipip|. Loading it you get base
tunnel device \verb|tunl0| and another tunnels may be created with command
\verb|ip tunnel add|. These new devices may have arbitrary names.
\item {\bf In 2.0} you set remote tunnel endpoint address with
the command \verb|ifconfig| ... \verb|pointopoint A|.
{\bf In 2.2} this command has the same semantics on all
the interfaces, namely it sets not tunnel endpoint,
but address of peering host, which is directly reachable
via this tunnel,
rather than via Internet. Actual tunnel endpoint address \verb|A|
should be set with \verb|ip tunnel add ... remote A|.
\item {\bf In 2.0} you create tunnel routes with the command:
\begin{verbatim}
route add -net 10.0.0.0 gw A dev tunl0
\end{verbatim}
{\bf 2.2} interprets this command equally for all device
kinds and gateway is required to be directly reachable via this tunnel,
rather than via Internet. You still may use \verb|ip route add ... onlink|
to override this behaviour.
\end{itemize}
\section{Tunnel setup: basics}
Standard Linux-2.2 kernel supports three flavor of tunnels,
listed in the following table:
\vspace{2mm}
\begin{tabular}{lll}
\vrule depth 0.8ex width 0pt\relax
Mode & Description & Base device \\
ipip & IP over IP & tunl0 \\
sit & IPv6 over IP & sit0 \\
gre & ANY over GRE over IP & gre0
\end{tabular}
\vspace{2mm}
\noindent All the kinds of tunnels are created with one command:
\begin{verbatim}
ip tunnel add <NAME> mode <MODE> [ local <S> ] [ remote <D> ]
\end{verbatim}
This command creates new tunnel device with name \verb|<NAME>|.
The \verb|<NAME>| is an arbitrary string. Particularly,
it may be even \verb|eth0|. The rest of parameters set
different tunnel characteristics.
\begin{itemize}
\item
\verb|mode <MODE>| sets tunnel mode. Three modes are available now
\verb|ipip|, \verb|sit| and \verb|gre|.
\item
\verb|remote <D>| sets remote endpoint of the tunnel to IP
address \verb|<D>|.
\item
\verb|local <S>| sets fixed local address for tunneled
packets. It must be an address on another interface of this host.
\end{itemize}
\let\thefootnote\oldthefootnote
Both \verb|remote| and \verb|local| may be omitted. In this case we
say that they are zero or wildcard. Two tunnels of one mode cannot
have the same \verb|remote| and \verb|local|. Particularly it means
that base device or fallback tunnel cannot be replicated.\footnote{
This restriction is relaxed for keyed GRE tunnels.}
Tunnels are divided to two classes: {\bf pointopoint} tunnels, which
have some not wildcard \verb|remote| address and deliver all the packets
to this destination, and {\bf NBMA} (i.e. Non-Broadcast Multi-Access) tunnels,
which have no \verb|remote|. Particularly, base devices (f.e.\ \verb|tunl0|)
are NBMA, because they have neither \verb|remote| nor
\verb|local| addresses.
After tunnel device is created you should configure it as you did
it with another devices. Certainly, the configuration of tunnels has
some features related to the fact that they work over existing Internet
routing infrastructure and simultaneously create new virtual links,
which changes this infrastructure. The danger that not enough careful
tunnel setup will result in formation of tunnel loops,
collapse of routing or flooding network with exponentially
growing number of tunneled fragments is very real.
Protocol setup on pointopoint tunnels does not differ of configuration
of another devices. You should set a protocol address with \verb|ifconfig|
and add routes with \verb|route| utility.
NBMA tunnels are different. To route something via NBMA tunnel
you have to explain to driver, where it should deliver packets to.
The only way to make it is to create special routes with gateway
address pointing to desired endpoint. F.e.\
\begin{verbatim}
ip route add 10.0.0.0/24 via <A> dev tunl0 onlink
\end{verbatim}
It is important to use option \verb|onlink|, otherwise
kernel will refuse request to create route via gateway not directly
reachable over device \verb|tunl0|. With IPv6 the situation is much simpler:
when you start device \verb|sit0|, it automatically configures itself
with all IPv4 addresses mapped to IPv6 space, so that all IPv4
Internet is {\em really reachable} via \verb|sit0|! Excellent, the command
\begin{verbatim}
ip route add 3FFE::/16 via ::193.233.7.65 dev sit0
\end{verbatim}
will route \verb|3FFE::/16| via \verb|sit0|, sending all the packets
destined to this prefix to 193.233.7.65.
\section{Tunnel setup: options}
Command \verb|ip tunnel add| has several additional options.
\begin{itemize}
\item \verb|ttl N| --- set fixed TTL \verb|N| on tunneled packets.
\verb|N| is number in the range 1--255. 0 is special value,
meaning that packets inherit TTL value.
Default value is: \verb|inherit|.
\item \verb|tos T| --- set fixed tos \verb|T| on tunneled packets.
Default value is: \verb|inherit|.
\item \verb|dev DEV| --- bind tunnel to device \verb|DEV|, so that
tunneled packets will be routed only via this device and will
not be able to escape to another device, when route to endpoint changes.
\item \verb|nopmtudisc| --- disable Path MTU Discovery on this tunnel.
It is enabled by default. Note that fixed ttl is incompatible
with this option: tunnels with fixed ttl always make pmtu discovery.
\end{itemize}
\verb|ipip| and \verb|sit| tunnels have no more options. \verb|gre|
tunnels are more complicated:
\begin{itemize}
\item \verb|key K| --- use keyed GRE with key \verb|K|. \verb|K| is
either number or IP address-like dotted quad.
\item \verb|csum| --- checksum tunneled packets.
\item \verb|seq| --- serialize packets.
\begin{NB}
I think this option does not
work. At least, I did not test it, did not debug it and
even do not understand, how it is supposed to work and for what
purpose Cisco planned to use it.
\end{NB}
\end{itemize}
Actually, these GRE options can be set separately for input and
output directions by prefixing corresponding keywords with letter
\verb|i| or \verb|o|. F.e.\ \verb|icsum| orders to accept only
packets with correct checksum and \verb|ocsum| means, that
our host will calculate and send checksum.
Command \verb|ip tunnel add| is not the only operation,
which can be made with tunnels. Certainly, you may get short help page
with:
\begin{verbatim}
ip tunnel help
\end{verbatim}
Besides that, you may view list of installed tunnels with the help of command:
\begin{verbatim}
ip tunnel ls
\end{verbatim}
Also you may look at statistics:
\begin{verbatim}
ip -s tunnel ls Cisco
\end{verbatim}
where \verb|Cisco| is name of tunnel device. Command
\begin{verbatim}
ip tunnel del Cisco
\end{verbatim}
destroys tunnel \verb|Cisco|. And, finally,
\begin{verbatim}
ip tunnel change Cisco mode sit local ME remote HE ttl 32
\end{verbatim}
changes its parameters.
\section{Differences 2.2 and 2.0 tunnels revisited.}
Now we can discuss more subtle differences between tunneling in 2.0
and 2.2.
\begin{itemize}
\item In 2.0 all tunneled packets were received promiscuously
as soon as you loaded module \verb|ipip|. 2.2 tries to select the best
tunnel device and packet looks as received on this. F.e.\ if host
received \verb|ipip| packet from host \verb|D| destined to our
local address \verb|S|, kernel searches for matching tunnels
in order:
\begin{tabular}{ll}
1 & \verb|remote| is \verb|D| and \verb|local| is \verb|S| \\
2 & \verb|remote| is \verb|D| and \verb|local| is wildcard \\
3 & \verb|remote| is wildcard and \verb|local| is \verb|S| \\
4 & \verb|tunl0|
\end{tabular}
If tunnel exists, but it is not in \verb|UP| state, the tunnel is ignored.
Note, that if \verb|tunl0| is \verb|UP| it receives all the IPIP packets,
not acknowledged by more specific tunnels.
Be careful, it means that without carefully installed firewall rules
anyone on the Internet may inject to your network any packets with
source addresses indistinguishable from local ones. It is not so bad idea
to design tunnels in the way enforcing maximal route symmetry
and to enable reversed path filter (\verb|rp_filter| sysctl option) on
tunnel devices.
\item In 2.2 you can monitor and debug tunnels with \verb|tcpdump|.
F.e.\ \verb|tcpdump| \verb|-i Cisco| \verb|-nvv| will dump packets,
which kernel output, via tunnel \verb|Cisco| and the packets received on it
from kernel viewpoint.
\end{itemize}
\section{Linux and Cisco IOS tunnels.}
Among another tunnels Cisco IOS supports IPIP and GRE.
Essentially, Cisco setup is subset of options, available for Linux.
Let us consider the simplest example:
\begin{verbatim}
interface Tunnel0
tunnel mode gre ip
tunnel source 10.10.14.1
tunnel destination 10.10.13.2
\end{verbatim}
This command set translates to:
\begin{verbatim}
ip tunnel add Tunnel0 \
mode gre \
local 10.10.14.1 \
remote 10.10.13.2
\end{verbatim}
Any questions? No questions.
\section{Interaction IPIP tunnels and DVMRP.}
DVMRP exploits IPIP tunnels to route multicasts via Internet.
\verb|mrouted| creates
IPIP tunnels listed in its configuration file automatically.
From kernel and user viewpoints there are no differences between
tunnels, created in this way, and tunnels created by \verb|ip tunnel|.
I.e.\ if \verb|mrouted| created some tunnel, it may be used to
route unicast packets, provided appropriate routes are added.
And vice versa, if administrator has already created a tunnel,
it will be reused by \verb|mrouted|, if it requests DVMRP
tunnel with the same local and remote addresses.
Do not wonder, if your manually configured tunnel is
destroyed, when mrouted exits.
\section{Broadcast GRE ``tunnels''.}
It is possible to set \verb|remote| for GRE tunnel to a multicast
address. Such tunnel becomes {\bf broadcast} tunnel (though word
tunnel is not quite appropriate in this case, it is rather virtual network).
\begin{verbatim}
ip tunnel add Universe local 193.233.7.65 \
remote 224.66.66.66 ttl 16
ip addr add 10.0.0.1/16 dev Universe
ip link set Universe up
\end{verbatim}
This tunnel is true broadcast network and broadcast packets are
sent to multicast group 224.66.66.66. By default such tunnel starts
to resolve both IP and IPv6 addresses via ARP/NDISC, so that
if multicast routing is supported in surrounding network, all GRE nodes
will find one another automatically and will form virtual Ethernet-like
broadcast network. If multicast routing does not work, it is unpleasant
but not fatal flaw. The tunnel becomes NBMA rather than broadcast network.
You may disable dynamic ARPing by:
\begin{verbatim}
echo 0 > /proc/sys/net/ipv4/neigh/Universe/mcast_solicit
\end{verbatim}
and to add required information to ARP tables manually:
\begin{verbatim}
ip neigh add 10.0.0.2 lladdr 128.6.190.2 dev Universe nud permanent
\end{verbatim}
In this case packets sent to 10.0.0.2 will be encapsulated in GRE
and sent to 128.6.190.2. It is possible to facilitate address resolution
using methods typical for another NBMA networks f.e.\ to start user
level \verb|arpd| daemon, which will maintain database of hosts attached
to GRE virtual network or ask for information
dedicated ARP or NHRP server.
Actually, such setup is the most natural for tunneling,
it is really flexible, scalable and easily managable, so that
it is strongly recommended to be used with GRE tunnels instead of ugly
hack with NBMA mode and \verb|onlink| modifier. Unfortunately,
by historical reasons broadcast mode is not supported by IPIP tunnels,
but this probably will change in future.
\section{Traffic control issues.}
Tunnels are devices, hence all the power of Linux traffic control
applies to them. The simplest (and the most useful in practice)
example is limiting tunnel bandwidth. The following command:
\begin{verbatim}
tc qdisc add dev tunl0 root tbf \
rate 128Kbit burst 4K limit 10K
\end{verbatim}
will limit tunneled traffic to 128Kbit with maximal burst size of 4K
and queuing not more than 10K.
However, you should remember, that tunnels are {\em virtual} devices
implemented in software and true queue management is impossible for them
just because they have no queues. Instead, it is better to create classes
on real physical interfaces and to map tunneled packets to them.
In general case of dynamic routing you should create such classes
on all outgoing interfaces, or, alternatively,
to use option \verb|dev DEV| to bind tunnel to a fixed physical device.
In the last case packets will be routed only via specified device
and you need to setup corresponding classes only on it.
Though you have to pay for this convenience,
if routing will change, your tunnel will fail.
Suppose that CBQ class \verb|1:ABC| has been created on device \verb|eth0|
specially for tunnel \verb|Cisco| with endpoints \verb|S| and \verb|D|.
Now you can select IPIP packets with addresses \verb|S| and \verb|D|
with some classifier and map them to class \verb|1:ABC|. F.e.\
it is easy to make with \verb|rsvp| classifier:
\begin{verbatim}
tc filter add dev eth0 pref 100 proto ip rsvp \
session D ipproto ipip filter S \
classid 1:ABC
\end{verbatim}
If you want to make more detailed classification of sub-flows
transmitted via tunnel, you can build CBQ subtree,
rooted at \verb|1:ABC| and attach to subroot set of rules parsing
IPIP packets more deeply.
\end{document}

View File

@ -1,110 +0,0 @@
<!doctype linuxdoc system>
<article>
<title>NSTAT, IFSTAT and RTACCT Utilities
<author>Alexey Kuznetsov, <tt/kuznet@ms2.inr.ac.ru/
<date>some_negative_number, 20 Sep 2001
<abstract>
<tt/nstat/, <tt/ifstat/ and <tt/rtacct/ are simple tools helping
to monitor kernel snmp counters and network interface statistics.
</abstract>
<p> These utilities are very similar, so that I describe
them simultaneously, using name <tt/Xstat/ in the places which apply
to all of them.
<p>The format of the command is:
<tscreen><verb>
Xstat [ OPTIONS ] [ PATTERN [ PATTERN ... ] ]
</verb></tscreen>
<p>
<tt/PATTERN/ is shell style pattern, selecting identifier
of SNMP variables or interfaces to show. Variable is displayed
if one of patterns matches its name. If no patterns are given,
<tt/Xstat/ assumes that user wants to see all the variables.
<p> <tt/OPTIONS/ is list of single letter options, using common unix
conventions.
<itemize>
<item><tt/-h/ - show help page
<item><tt/-?/ - the same, of course
<item><tt/-v/, <tt/-V/ - print version of <tt/Xstat/ and exit
<item><tt/-z/ - dump zero counters too. By default they are not shown.
<item><tt/-a/ - dump absolute values of counters. By default <tt/Xstat/
calculates increments since the previous use.
<item><tt/-s/ - do not update history, so that the next time you will
see counters including values accumulated to the moment
of this measurement too.
<item><tt/-n/ - do not display anything, only update history.
<item><tt/-r/ - reset history.
<item><tt/-d INTERVAL/ - <tt/Xstat/ is run in daemon mode collecting
statistics. <tt/INTERVAL/ is interval between measurements
in seconds.
<item><tt/-t INTERVAL/ - time interval to average rates. Default value
is 60 seconds.
<item><tt/-e/ - display extended information about errors (<tt/ifstat/ only).
</itemize>
<p>
History is just dump saved in file <tt>/tmp/.Xstat.uUID</tt>
or in file given by environment variables <tt/NSTAT_HISTORY/,
<tt/IFSTAT_HISTORY/ and <tt/RTACCT_HISTORY/.
Each time when you use <tt/Xstat/ values there are updated.
If you use patterns, only the values which you _really_ see
are updated. If you want to skip an unintersting period,
use option <tt/-n/, or just output to <tt>/dev/null</tt>.
<p>
<tt/Xstat/ understands when history is invalidated by system reboot
or source of information switched between different instances
of daemonic <tt/Xstat/ and kernel SNMP tables and does not
use invalid history.
<p> Beware, <tt/Xstat/ will not produce sane output,
when many processes use it simultaneously. If several processes
under single user need this utility they should use environment
variables to put their history in safe places
or to use it with options <tt/-a -s/.
<p>
Well, that's all. The utility is very simple, but nevertheless
very handy.
<p> <bf/Output of XSTAT/
<p> The first line of output is <tt/#/ followed by identifier
of source of information, it may be word <tt/kernel/, when <tt/Xstat/
gets information from kernel or some dotted decimal number followed
by parameters, when it obtains information from running <tt/Xstat/ daemon.
<p>In the case of <tt/nstat/ the rest of output consists of three columns:
SNMP MIB identifier,
its value (or increment since previous measurement) and average
rate of increase of the counter per second. <tt/ifstat/ outputs
interface name followed by pairs of counter and rate of its change.
<p> <bf/Daemonic Xstat/
<p> <tt/Xstat/ may be started as daemon by any user. This makes sense
to avoid wrapped counters and to obtain reasonable long counters
for large time. Also <tt/Xstat/ daemon calculates average rates.
For the first goal sampling interval (option <tt/-d/) may be large enough,
f.e. for gigabit rates byte counters overflow not more frequently than
each 40 seconds and you may select interval of 20 seconds.
From the other hand, when <tt/Xstat/ is used for estimating rates
interval should be less than averaging period (option <tt/-t/), otherwise
estimation loses in quality.
Client <tt/Xstat/, before trying to get information from the kernel,
contacts daemon started by this user, then it tries system wide
daemon, which is supposed to be started by superuser. And only if
none of them replied it gets information from kernel.
<p> <bf/Environment/
<p> <tt/NSTAT_HISTORY/ - name of history file for <tt/nstat/.
<p> <tt/IFSTAT_HISTORY/ - name of history file for <tt/ifstat/.
<p> <tt/RTACCT_HISTORY/ - name of history file for <tt/rtacct/.
</article>

View File

@ -1,26 +0,0 @@
\textwidth 6.0in
\textheight 8.5in
\input SNAPSHOT
\pagestyle{myheadings}
\markboth{\protect\TITLE}{}
\markright{{\protect\sc iproute2-ss\Draft}}
% To print it in compact form: both sides on one sheet (psnup -2)
\evensidemargin=\oddsidemargin
\newenvironment{NB}{\bgroup \vskip 1mm\leftskip 1cm \footnotesize \noindent NB.
}{\par\egroup \vskip 1mm}
\def\threeonly{[2.3.15+ only] }
\begin{document}
\makeatletter
\renewcommand{\@oddhead}{{\protect\sc iproute2-ss\Draft} \hfill \protect\arabic{page}}
\makeatother
\let\oldthefootnote\thefootnote
\def\thefootnote{}
\footnotetext{Copyright \copyright~1999 A.N.Kuznetsov}

View File

@ -1,52 +0,0 @@
<!doctype linuxdoc system>
<article>
<title>RTACCT Utility
<author>Robert Olsson
<date>some_negative_number, 20 Dec 2001
<p>
Here is some code for monitoring the route cache. For systems handling high
network load, servers, routers, firewalls etc the route cache and its garbage
collection is crucial. Linux has a solid implementation.
<p>
The kernel patch (not required since linux-2.4.7) adds statistics counters
from route cache process into
/proc/net/rt_cache_stat. A companion user mode program presents the statistics
in a vmstat or iostat manner. The ratio between cache hits and misses gives
the flow length.
<p>
Hopefully it can help understanding performance and DoS and other related
issues.
<p> An URL where newer versions of this utility can be (probably) found
is ftp://robur.slu.se/pub/Linux/net-development/rt_cache_stat/
<p><bf/Description/
<p>The format of the command is:
<tscreen><verb>
rtstat [ OPTIONS ]
</verb></tscreen>
<p> <tt/OPTIONS/ are:
<itemize>
<item><tt/-h/, <tt/-help/ - show help page and version of the utility.
<item><tt/-i INTERVAL/ - interval between snapshots, default value is
2 seconds.
<item><tt/-s NUMBER/ - whether to print header line. 0 inhibits header line,
1 prescribes to print it once and 2 (this is default setting) forces header
line each 20 lines.
</itemize>
</article>

View File

@ -1,525 +0,0 @@
<!doctype linuxdoc system>
<article>
<title>SS Utility: Quick Intro
<author>Alexey Kuznetsov, <tt/kuznet@ms2.inr.ac.ru/
<date>some_negative_number, 20 Sep 2001
<abstract>
<tt/ss/ is one another utility to investigate sockets.
Functionally it is NOT better than <tt/netstat/ combined
with some perl/awk scripts and though it is surely faster
it is not enough to make it much better. :-)
So, stop reading this now and do not waste your time.
Well, certainly, it proposes some functionality, which current
netstat is still not able to do, but surely will soon.
</abstract>
<sect>Why?
<p> <tt>/proc</tt> interface is inadequate, unfortunately.
When amount of sockets is enough large, <tt/netstat/ or even
plain <tt>cat /proc/net/tcp/</tt> cause nothing but pains and curses.
In linux-2.4 the desease became worse: even if amount
of sockets is small reading <tt>/proc/net/tcp/</tt> is slow enough.
This utility presents a new approach, which is supposed to scale
well. I am not going to describe technical details here and
will concentrate on description of the command.
The only important thing to say is that it is not so bad idea
to load module <tt/tcp_diag/, which can be found in directory
<tt/Modules/ of <tt/iproute2/. If you do not make this <tt/ss/
will work, but it falls back to <tt>/proc</tt> and becomes slow
like <tt/netstat/, well, a bit faster yet (see section "Some numbers").
<sect>Old news
<p>
In the simplest form <tt/ss/ is equivalent to netstat
with some small deviations.
<itemize>
<item><tt/ss -t -a/ dumps all TCP sockets
<item><tt/ss -u -a/ dumps all UDP sockets
<item><tt/ss -w -a/ dumps all RAW sockets
<item><tt/ss -x -a/ dumps all UNIX sockets
</itemize>
<p>
Option <tt/-o/ shows TCP timers state.
Option <tt/-e/ shows some extended information.
Etc. etc. etc. Seems, all the options of netstat related to sockets
are supported. Though not AX.25 and other bizarres. :-)
If someone wants, he can make support for decnet and ipx.
Some rudimentary support for them is already present in iproute2 libutils,
and I will be glad to see these new members.
<p>
However, standard functionality is a bit different:
<p>
The first: without option <tt/-a/ sockets in states
<tt/TIME-WAIT/ and <tt/SYN-RECV/ are skipped too.
It is more reasonable default, I think.
<p>
The second: format of UNIX sockets is different. It coincides
with tcp/udp. Though standard kernel still does not allow to
see write/read queues and peer address of connected UNIX sockets,
the patch doing this exists.
<p>
The third: default is to dump only TCP sockets, rather than all of the types.
<p>
The next: by default it does not resolve numeric host addresses (like <tt/ip/)!
Resolving is enabled with option <tt/-r/. Service names, usually stored
in local files, are resolved by default. Also, if service database
does not contain references to a port, <tt/ss/ queries system
<tt/rpcbind/. RPC services are prefixed with <tt/rpc./
Resolution of services may be suppressed with option <tt/-n/.
<p>
It does not accept "long" options (I dislike them, sorry).
So, address family is given with family identifier following
option <tt/-f/ to be algined to iproute2 conventions.
Mostly, it is to allow option parser to parse
addresses correctly, but as side effect it really limits dumping
to sockets supporting only given family. Option <tt/-A/ followed
by list of socket tables to dump is also supported.
Logically, id of socket table is different of _address_ family, which is
another point of incompatibility. So, id is one of
<tt/all/, <tt/tcp/, <tt/udp/,
<tt/raw/, <tt/inet/, <tt/unix/, <tt/packet/, <tt/netlink/. See?
Well, <tt/inet/ is just abbreviation for <tt/tcp|udp|raw/
and it is not difficult to guess that <tt/packet/ allows
to look at packet sockets. Actually, there are also some other abbreviations,
f.e. <tt/unix_dgram/ selects only datagram UNIX sockets.
<p>
The next: well, I still do not know. :-)
<sect>Time to talk about new functionality.
<p>It is builtin filtering of socket lists.
<sect1> Filtering by state.
<p>
<tt/ss/ allows to filter socket states, using keywords
<tt/state/ and <tt/exclude/, followed by some state
identifier.
<p>
State identifier are standard TCP state names (not listed,
they are useless for you if you already do not know them)
or abbreviations:
<itemize>
<item><tt/all/ - for all the states
<item><tt/bucket/ - for TCP minisockets (<tt/TIME-WAIT|SYN-RECV/)
<item><tt/big/ - all except for minisockets
<item><tt/connected/ - not closed and not listening
<item><tt/synchronized/ - connected and not <tt/SYN-SENT/
</itemize>
<p>
F.e. to dump all tcp sockets except <tt/SYN-RECV/:
<tscreen><verb>
ss exclude SYN-RECV
</verb></tscreen>
<p>
If neither <tt/state/ nor <tt/exclude/ directives
are present,
state filter defaults to <tt/all/ with option <tt/-a/
or to <tt/all/,
excluding listening, syn-recv, time-wait and closed sockets.
<sect1> Filtering by addresses and ports.
<p>
Option list may contain address/port filter.
It is boolean expression which consists of boolean operation
<tt/or/, <tt/and/, <tt/not/ and predicates.
Actually, all the flavors of names for boolean operations are eaten:
<tt/&amp/, <tt/&amp&amp/, <tt/|/, <tt/||/, <tt/!/, but do not forget
about special sense given to these symbols by unix shells and escape
them correctly, when used from command line.
<p>
Predicates may be of the folowing kinds:
<itemize>
<item>A. Address/port match, where address is checked against mask
and port is either wildcard or exact. It is one of:
<tscreen><verb>
dst prefix:port
src prefix:port
src unix:STRING
src link:protocol:ifindex
src nl:channel:pid
</verb></tscreen>
Both prefix and port may be absent or replaced with <tt/*/,
which means wildcard. UNIX socket use more powerful scheme
matching to socket names by shell wildcards. Also, prefixes
unix: and link: may be omitted, if address family is evident
from context (with option <tt/-x/ or with <tt/-f unix/
or with <tt/unix/ keyword)
<p>
F.e.
<tscreen><verb>
dst 10.0.0.1
dst 10.0.0.1:
dst 10.0.0.1/32:
dst 10.0.0.1:*
</verb></tscreen>
are equivalent and mean socket connected to
any port on host 10.0.0.1
<tscreen><verb>
dst 10.0.0.0/24:22
</verb></tscreen>
sockets connected to port 22 on network
10.0.0.0...255.
<p>
Note that port separated of address with colon, which creates
troubles with IPv6 addresses. Generally, we interpret the last
colon as splitting port. To allow to give IPv6 addresses,
trick like used in IPv6 HTTP URLs may be used:
<tscreen><verb>
dst [::1]
</verb></tscreen>
are sockets connected to ::1 on any port
<p>
Another way is <tt/dst ::1/128/. / helps to understand that
colon is part of IPv6 address.
<p>
Now we can add another alias for <tt/dst 10.0.0.1/:
<tt/dst [10.0.0.1]/. :-)
<p> Address may be a DNS name. In this case all the addresses are looked
up (in all the address families, if it is not limited by option <tt/-f/
or special address prefix <tt/inet:/, <tt/inet6/) and resulting
expression is <tt/or/ over all of them.
<item> B. Port expressions:
<tscreen><verb>
dport &gt= :1024
dport != :22
sport &lt :32000
</verb></tscreen>
etc.
All the relations: <tt/&lt/, <tt/&gt/, <tt/=/, <tt/>=/, <tt/=/, <tt/==/,
<tt/!=/, <tt/eq/, <tt/ge/, <tt/lt/, <tt/ne/...
Use variant which you like more, but not forget to escape special
characters when typing them in command line. :-)
Note that port number syntactically coincides to the case A!
You may even add an IP address, but it will not participate
incomparison, except for <tt/==/ and <tt/!=/, which are equivalent
to corresponding predicates of type A. F.e.
<p>
<tt/dst 10.0.0.1:22/
is equivalent to <tt/dport eq 10.0.0.1:22/
and
<tt/not dst 10.0.0.1:22/ is equivalent to
<tt/dport neq 10.0.0.1:22/
<item>C. Keyword <tt/autobound/. It matches to sockets bound automatically
on local system.
</itemize>
<sect> Examples
<p>
<itemize>
<item>1. List all the tcp sockets in state <tt/FIN-WAIT-1/ for our apache
to network 193.233.7/24 and look at their timers:
<tscreen><verb>
ss -o state fin-wait-1 \( sport = :http or sport = :https \) \
dst 193.233.7/24
</verb></tscreen>
Oops, forgot to say that missing logical operation is
equivalent to <tt/and/.
<item> 2. Well, now look at the rest...
<tscreen><verb>
ss -o excl fin-wait-1
ss state fin-wait-1 \( sport neq :http and sport neq :https \) \
or not dst 193.233.7/24
</verb></tscreen>
Note that we have to do _two_ calls of ss to do this.
State match is always anded to address/port match.
The reason for this is purely technical: ss does fast skip of
not matching states before parsing addresses and I consider the
ability to skip fastly gobs of time-wait and syn-recv sockets
as more important than logical generality.
<item> 3. So, let's look at all our sockets using autobound ports:
<tscreen><verb>
ss -a -A all autobound
</verb></tscreen>
<item> 4. And eventually find all the local processes connected
to local X servers:
<tscreen><verb>
ss -xp dst "/tmp/.X11-unix/*"
</verb></tscreen>
Pardon, this does not work with current kernel, patching is required.
But we still can look at server side:
<tscreen><verb>
ss -x src "/tmp/.X11-unix/*"
</verb></tscreen>
</itemize>
<sect> Returning to ground: real manual
<p>
<sect1> Command arguments
<p> General format of arguments to <tt/ss/ is:
<tscreen><verb>
ss [ OPTIONS ] [ STATE-FILTER ] [ ADDRESS-FILTER ]
</verb></tscreen>
<sect2><tt/OPTIONS/
<p> <tt/OPTIONS/ is list of single letter options, using common unix
conventions.
<itemize>
<item><tt/-h/ - show help page
<item><tt/-?/ - the same, of course
<item><tt/-v/, <tt/-V/ - print version of <tt/ss/ and exit
<item><tt/-s/ - print summary statistics. This option does not parse
socket lists obtaining summary from various sources. It is useful
when amount of sockets is so huge that parsing <tt>/proc/net/tcp</tt>
is painful.
<item><tt/-D FILE/ - do not display anything, just dump raw information
about TCP sockets to <tt/FILE/ after applying filters. If <tt/FILE/ is <tt/-/
<tt/stdout/ is used.
<item><tt/-F FILE/ - read continuation of filter from <tt/FILE/.
Each line of <tt/FILE/ is interpreted like single command line option.
If <tt/FILE/ is <tt/-/ <tt/stdin/ is used.
<item><tt/-r/ - try to resolve numeric address/ports
<item><tt/-n/ - do not try to resolve ports
<item><tt/-o/ - show some optional information, f.e. TCP timers
<item><tt/-i/ - show some infomration specific to TCP (RTO, congestion
window, slow start threshould etc.)
<item><tt/-e/ - show even more optional information
<item><tt/-m/ - show extended information on memory used by the socket.
It is available only with <tt/tcp_diag/ enabled.
<item><tt/-p/ - show list of processes owning the socket
<item><tt/-f FAMILY/ - default address family used for parsing addresses.
Also this option limits listing to sockets supporting
given address family. Currently the following families
are supported: <tt/unix/, <tt/inet/, <tt/inet6/, <tt/link/,
<tt/netlink/.
<item><tt/-4/ - alias for <tt/-f inet/
<item><tt/-6/ - alias for <tt/-f inet6/
<item><tt/-0/ - alias for <tt/-f link/
<item><tt/-A LIST-OF-TABLES/ - list of socket tables to dump, separated
by commas. The following identifiers are understood:
<tt/all/, <tt/inet/, <tt/tcp/, <tt/udp/, <tt/raw/,
<tt/unix/, <tt/packet/, <tt/netlink/, <tt/unix_dgram/,
<tt/unix_stream/, <tt/packet_raw/, <tt/packet_dgram/.
<item><tt/-x/ - alias for <tt/-A unix/
<item><tt/-t/ - alias for <tt/-A tcp/
<item><tt/-u/ - alias for <tt/-A udp/
<item><tt/-w/ - alias for <tt/-A raw/
<item><tt/-a/ - show sockets of all the states. By default sockets
in states <tt/LISTEN/, <tt/TIME-WAIT/, <tt/SYN_RECV/
and <tt/CLOSE/ are skipped.
<item><tt/-l/ - show only sockets in state <tt/LISTEN/
</itemize>
<sect2><tt/STATE-FILTER/
<p><tt/STATE-FILTER/ allows to construct arbitrary set of
states to match. Its syntax is sequence of keywords <tt/state/
and <tt/exclude/ followed by identifier of state.
Available identifiers are:
<p>
<itemize>
<item> All standard TCP states: <tt/established/, <tt/syn-sent/,
<tt/syn-recv/, <tt/fin-wait-1/, <tt/fin-wait-2/, <tt/time-wait/,
<tt/closed/, <tt/close-wait/, <tt/last-ack/, <tt/listen/ and <tt/closing/.
<item><tt/all/ - for all the states
<item><tt/connected/ - all the states except for <tt/listen/ and <tt/closed/
<item><tt/synchronized/ - all the <tt/connected/ states except for
<tt/syn-sent/
<item><tt/bucket/ - states, which are maintained as minisockets, i.e.
<tt/time-wait/ and <tt/syn-recv/.
<item><tt/big/ - opposite to <tt/bucket/
</itemize>
<sect2><tt/ADDRESS_FILTER/
<p><tt/ADDRESS_FILTER/ is boolean expression with operations <tt/and/, <tt/or/
and <tt/not/, which can be abbreviated in C style f.e. as <tt/&amp/,
<tt/&amp&amp/.
<p>
Predicates check socket addresses, both local and remote.
There are the following kinds of predicates:
<itemize>
<item> <tt/dst ADDRESS_PATTERN/ - matches remote address and port
<item> <tt/src ADDRESS_PATTERN/ - matches local address and port
<item> <tt/dport RELOP PORT/ - compares remote port to a number
<item> <tt/sport RELOP PORT/ - compares local port to a number
<item> <tt/autobound/ - checks that socket is bound to an ephemeral
port
</itemize>
<p><tt/RELOP/ is some of <tt/&lt=/, <tt/&gt=/, <tt/==/ etc.
To make this more convinient for use in unix shell, alphabetic
FORTRAN-like notations <tt/le/, <tt/gt/ etc. are accepted as well.
<p>The format and semantics of <tt/ADDRESS_PATTERN/ depends on address
family.
<itemize>
<item><tt/inet/ - <tt/ADDRESS_PATTERN/ consists of IP prefix, optionally
followed by colon and port. If prefix or port part is absent or replaced
with <tt/*/, this means wildcard match.
<item><tt/inet6/ - The same as <tt/inet/, only prefix refers to an IPv6
address. Unlike <tt/inet/ colon becomes ambiguous, so that <tt/ss/ allows
to use scheme, like used in URLs, where address is suppounded with
<tt/[/ ... <tt/]/.
<item><tt/unix/ - <tt/ADDRESS_PATTERN/ is shell-style wildcard.
<item><tt/packet/ - format looks like <tt/inet/, only interface index
stays instead of port and link layer protocol id instead of address.
<item><tt/netlink/ - format looks like <tt/inet/, only socket pid
stays instead of port and netlink channel instead of address.
</itemize>
<p><tt/PORT/ is syntactically <tt/ADDRESS_PATTERN/ with wildcard
address part. Certainly, it is undefined for UNIX sockets.
<sect1> Environment variables
<p>
<tt/ss/ allows to change source of information using various
environment variables:
<p>
<itemize>
<item> <tt/PROC_SLABINFO/ to override <tt>/proc/slabinfo</tt>
<item> <tt/PROC_NET_TCP/ to override <tt>/proc/net/tcp</tt>
<item> <tt/PROC_NET_UDP/ to override <tt>/proc/net/udp</tt>
<item> etc.
</itemize>
<p>
Variable <tt/PROC_ROOT/ allows to change root of all the <tt>/proc/</tt>
hierarchy.
<p>
Variable <tt/TCPDIAG_FILE/ prescribes to open a file instead of
requesting kernel to dump information about TCP sockets.
<p> This option is used mainly to investigate bug reports,
when dumps of files usually found in <tt>/proc/</tt> are recevied
by e-mail.
<sect1> Output format
<p>Six columns. The first is <tt/Netid/, it denotes socket type and
transport protocol, when it is ambiguous: <tt/tcp/, <tt/udp/, <tt/raw/,
<tt/u_str/ is abbreviation for <tt/unix_stream/, <tt/u_dgr/ for UNIX
datagram sockets, <tt/nl/ for netlink, <tt/p_raw/ and <tt/p_dgr/ for
raw and datagram packet sockets. This column is optional, it will
be hidden, if filter selects an unique netid.
<p>
The second column is <tt/State/. Socket state is displayed here.
The names are standard TCP names, except for <tt/UNCONN/, which
cannot happen for TCP, but normal for not connected sockets
of another types. Again, this column can be hidden.
<p>
Then two columns (<tt/Recv-Q/ and <tt/Send-Q/) showing amount of data
queued for receive and transmit.
<p>
And the last two columns display local address and port of the socket
and its peer address, if the socket is connected.
<p>
If options <tt/-o/, <tt/-e/ or <tt/-p/ were given, options are
displayed not in fixed positions but separated by spaces pairs:
<tt/option:value/. If value is not a single number, it is presented
as list of values, enclosed to <tt/(/ ... <tt/)/ and separated with
commas. F.e.
<tscreen><verb>
timer:(keepalive,111min,0)
</verb></tscreen>
is typical format for TCP timer (option <tt/-o/).
<tscreen><verb>
users:((X,113,3))
</verb></tscreen>
is typical for list of users (option <tt/-p/).
<sect>Some numbers
<p>
Well, let us use <tt/pidentd/ and a tool <tt/ibench/ to measure
its performance. It is 30 requests per second here. Nothing to test,
it is too slow. OK, let us patch pidentd with patch from directory
Patches. After this it handles about 4300 requests per second
and becomes handy tool to pollute socket tables with lots of timewait
buckets.
<p>
So, each test starts from pollution tables with 30000 sockets
and then doing full dump of the table piped to wc and measuring
timings with time:
<p>Results:
<itemize>
<item> <tt/netstat -at/ - 15.6 seconds
<item> <tt/ss -atr/, but without <tt/tcp_diag/ - 5.4 seconds
<item> <tt/ss -atr/ with <tt/tcp_diag/ - 0.47 seconds
</itemize>
No comments. Though one comment is necessary, most of time
without <tt/tcp_diag/ is wasted inside kernel with completely
blocked networking. More than 10 seconds, yes. <tt/tcp_diag/
does the same work for 100 milliseconds of system time.
</article>

View File

@ -1,514 +0,0 @@
\documentclass[12pt,twoside]{article}
\usepackage[hidelinks]{hyperref} % \url
\usepackage{booktabs} % nicer tabulars
\usepackage{fancyvrb}
\usepackage{fullpage}
\usepackage{float}
\newcommand{\iface}{\textit}
\newcommand{\cmd}{\texttt}
\newcommand{\man}{\textit}
\newcommand{\qdisc}{\texttt}
\newcommand{\filter}{\texttt}
\begin{document}
\title{QoS in Linux with TC and Filters}
\author{Phil Sutter (phil@nwl.cc)}
\date{January 2016}
\maketitle
Standard practice when transmitting packets over a medium which may block (due
to congestion, e.g.) is to use a queue which temporarily holds these packets. In
Linux, this queueing approach is where QoS happens: A Queueing Discipline
(qdisc) holds multiple packet queues with different priorities for dequeueing to
the network driver. The classification (i.e. deciding which queue a packet
should go into) is typically done based on Type Of Service (IPv4) or Traffic
Class (IPv6) header fields but depending on qdisc implementation, might be
controlled by the user as well.
Qdiscs come in two flavors, classful or classless. While classless qdiscs are
not as flexible as classful ones, they also require much less customizing. Often
it is enough to just attach them to an interface, without exact knowledge of
what is done internally. Classful qdiscs are the exact opposite: flexible in
application, they are often not even usable without insightful configuration.
As the name implies, classful qdiscs provide configurable classes to sort
traffic into. In it's basic form, this is not much different than, say, the
classless \qdisc{pfifo\_fast} which holds three queues and classifies per
packet upon priority field. Though typically classes go beyond that by
supporting nesting and additional characteristics like e.g. maximum traffic
rate or quantum.
When it comes to controlling the classification process, filters come into play.
They attach to the parent of a set of classes (i.e. either the qdisc itself or
a parent class) and specify how a packet (or it's associated flow) has to look
like in order to suit a given class. To overcome this simplification, it is
possible to attach multiple filters to the same parent, which then consults each
of them in row until the first one accepts the packet.
Before getting into detail about what filters there are and how to use them, a
simple setup of a qdisc with classes is necessary:
\begin{figure}[H]
\begin{Verbatim}
.-------------------------------------------------------.
| |
| HTB |
| |
| .----------------------------------------------------.|
| | ||
| | Class 1:1 ||
| | ||
| | .---------------..---------------..---------------.||
| | | || || |||
| | | Class 1:10 || Class 1:20 || Class 1:30 |||
| | | || || |||
| | | .------------.|| .------------.|| .------------.|||
| | | | ||| | ||| | ||||
| | | | fq_codel ||| | fq_codel ||| | fq_codel ||||
| | | | ||| | ||| | ||||
| | | '------------'|| '------------'|| '------------'|||
| | '---------------''---------------''---------------'||
| '----------------------------------------------------'|
'-------------------------------------------------------'
\end{Verbatim}
\end{figure}
\noindent
The following commands establish the basic setup shown:
\begin{Verbatim}
(1) # tc qdisc replace dev eth0 root handle 1: htb default 30
(2) # tc class add dev eth0 parent 1: classid 1:1 htb rate 95mbit
(3) # alias tclass='tc class add dev eth0 parent 1:1'
(4) # tclass classid 1:10 htb rate 1mbit ceil 20mbit prio 1
(4) # tclass classid 1:20 htb rate 90mbit ceil 95mbit prio 2
(4) # tclass classid 1:30 htb rate 1mbit ceil 95mbit prio 3
(5) # tc qdisc add dev eth0 parent 1:10 fq_codel
(5) # tc qdisc add dev eth0 parent 1:20 fq_codel
(5) # tc qdisc add dev eth0 parent 1:30 fq_codel
\end{Verbatim}
A little explanation for the unfamiliar reader:
\begin{enumerate}
\item Replace the root qdisc of \iface{eth0} by an instance of \qdisc{HTB}.
Specifying the handle is necessary so it can be referenced in consecutive
calls to \cmd{tc}. The default class for unclassified traffic is set to
30.
\item Create a single top-level class with handle 1:1 which limits the total
bandwidth allowed to 95mbit/s. It is assumed that \iface{eth0} is a 100mbit/s link,
staying a little below that helps to keep the main point of enqueueing in
the qdisc layer instead of the interface hardware queue or at another
bottleneck in the network.
\item Define an alias for the common part of the remaining three calls in order
to improve readability. This means all remaining classes are attached to the
common parent class from (2).
\item Create three child classes for different uses: Class 1:10 has highest
priority but is tightly limited in bandwidth - fine for interactive
connections. Class 1:20 has mid priority and high guaranteed bandwidth, for
high priority bulk traffic. Finally, there's the default class 1:30 with
lowest priority, low guaranteed bandwidth and the ability to use the full
link in case it's unused otherwise. This should be fine for uninteresting
traffic not explicitly taken care of.
\item Attach a leaf qdisc to each of the child classes created in (4). Since
\qdisc{HTB} by default attaches \qdisc{pfifo} as leaf qdisc, this step is optional. Still,
the fairness between different flows provided by the classless \qdisc{fq\_codel} is
worth the effort.
\end{enumerate}
More information about the qdiscs and fine-tuning parameters can be found in
\man{tc-htb(8)} and \man{tc-fq\_codel(8)}.
Without any additional setup done, now all traffic leaving \iface{eth0} is shaped to
95mbit/s and directed through class 1:30. This can be verified by looking at the
\texttt{Sent} field of the class statistics printed via \cmd{tc -s class show dev eth0}:
Only the root class 1:1 and it's child 1:30 should show any traffic.
\section*{Finally time to start filtering!}
Let's begin with a simple one, i.e. reestablishing what \qdisc{pfifo\_fast} did
automatically based on TOS/Priority field. Linux internally translates the
header field into the priority field of struct skbuff, which
\qdisc{pfifo\_fast} uses for
classification. \man{tc-prio(8)} contains a table listing the priority (and
ultimately, \qdisc{pfifo\_fast} queue index) each TOS value is being translated into.
Here is a shorter version:
\begin{center}
\begin{tabular}{lll}
TOS Values & Linux Priority (Number) & Queue Index \\
\midrule
0x0 - 0x6 & Best Effort (0) & 1 \\
0x8 - 0xe & Bulk (2) & 2 \\
0x10 - 0x16 & Interactive (6) & 0 \\
0x18 - 0x1e & Interactive Bulk (4) & 1 \\
\end{tabular}
\end{center}
Using the \filter{basic} filter, it is possible to match packets based on that skbuff
field, which has the added benefit of being IP version agnostic. Since the
\qdisc{HTB} setup above defaults to class ID 1:30, the Bulk priority can be
ignored. The \filter{basic} filter allows to combine matches, therefore we get along
with only two filters:
\begin{Verbatim}
# tc filter add dev eth0 parent 1: basic \
match 'meta(priority eq 6)' classid 1:10
# tc filter add dev eth0 parent 1: basic \
match 'meta(priority eq 0)' \
or 'meta(priority eq 4)' classid 1:20
\end{Verbatim}
A detailed description of the \filter{basic} filter and the ematch syntax it uses can be
found in \man{tc-basic(8)} and \man{tc-ematch(8)}.
Obviously, this first example cries for optimization. A simple one would be to
just change the default class from 1:30 to 1:20, so filters are only needed for
Bulk and Interactive priorities:
\begin{Verbatim}
# tc filter add dev eth0 parent 1: basic \
match 'meta(priority eq 6)' classid 1:10
# tc filter add dev eth0 parent 1: basic \
match 'meta(priority eq 2)' classid 1:20
\end{Verbatim}
Given that class IDs are random, choosing them wisely allows for a direct
mapping. So first, recreate the qdisc and classes configuration:
\begin{Verbatim}
# tc qdisc replace dev eth0 root handle 1: htb default 10
# tc class add dev eth0 parent 1: classid 1:1 htb rate 95mbit
# alias tclass='tc class add dev eth0 parent 1:1'
# tclass classid 1:16 htb rate 1mbit ceil 20mbit prio 1
# tclass classid 1:10 htb rate 90mbit ceil 95mbit prio 2
# tclass classid 1:12 htb rate 1mbit ceil 95mbit prio 3
# tc qdisc add dev eth0 parent 1:16 fq_codel
# tc qdisc add dev eth0 parent 1:10 fq_codel
# tc qdisc add dev eth0 parent 1:12 fq_codel
\end{Verbatim}
This is basically identical to above, but with changed leaf class IDs and the
second priority class being the default. Using the \filter{flow} filter with it's \texttt{map}
functionality, a single filter command is enough:
\begin{Verbatim}
# tc filter add dev eth0 parent 1: handle 0x1337 flow \
map key priority baseclass 1:10
\end{Verbatim}
The \filter{flow} filter now uses the priority value to construct a destination class ID
by adding it to the value of \texttt{baseclass}. While this works for priority values of
0, 2 and 6, it will result in non-existent class ID 1:14 for Interactive Bulk
traffic. In that case, the \qdisc{HTB} default applies so that traffic goes into class
ID 1:10 just as intended. Please note that specifying a handle is a mandatory
requirement by the \filter{flow} filter, although I didn't see where one would use that
later. For more information about \filter{flow}, see \man{tc-flow(8)}.
While \filter{flow} and \filter{basic} filters are relatively easy to apply and understand, they
are as well quite limited to their intended purpose. A more flexible option is
the \filter{u32} filter, which allows to match on arbitrary parts of the packet data -
yet only on that, not any meta data associated to it by the kernel (with the
exception of firewall mark value). So in order to continue this little
exercise with \filter{u32}, we have to base classification directly upon the actual TOS
value. An intuitive attempt might look like this:
\begin{Verbatim}
# alias tcfilter='tc filter add dev eth0 parent 1:'
# tcfilter u32 match ip dsfield 0x10 0x1e classid 1:16
# tcfilter u32 match ip dsfield 0x12 0x1e classid 1:16
# tcfilter u32 match ip dsfield 0x14 0x1e classid 1:16
# tcfilter u32 match ip dsfield 0x16 0x1e classid 1:16
# tcfilter u32 match ip dsfield 0x8 0x1e classid 1:12
# tcfilter u32 match ip dsfield 0xa 0x1e classid 1:12
# tcfilter u32 match ip dsfield 0xc 0x1e classid 1:12
# tcfilter u32 match ip dsfield 0xe 0x1e classid 1:12
\end{Verbatim}
The obvious drawback here is the amount of filters needed. And without the
default class, eight more filters would be necessary. This also has performance
implications: A packet with TOS value 0xe will be checked eight times in total
in order to determine it's destination class. While there's not much to be done
about the number of filters, at least the performance problem can be eliminated
by using \filter{u32}'s hash table support:
\begin{Verbatim}
# tc filter add dev eth0 parent 1: prio 99 handle 1: u32 divisor 16
\end{Verbatim}
This creates a hash table with 16 buckets. The table size is arbitrary, but not
random: Since the first bit of the TOS field is not interesting, it can be
ignored and therefore the range of values to consider is just [0;15], i.e. a
number of 16 different values. The next step is to populate the hash table:
\begin{Verbatim}
# alias tcfilter='tc filter add dev eth0 parent 1: prio 99'
# tcfilter u32 match u8 0 0 ht 1:0: classid 1:16
# tcfilter u32 match u8 0 0 ht 1:1: classid 1:16
# tcfilter u32 match u8 0 0 ht 1:2: classid 1:16
# tcfilter u32 match u8 0 0 ht 1:3: classid 1:16
# tcfilter u32 match u8 0 0 ht 1:4: classid 1:12
# tcfilter u32 match u8 0 0 ht 1:5: classid 1:12
# tcfilter u32 match u8 0 0 ht 1:6: classid 1:12
# tcfilter u32 match u8 0 0 ht 1:7: classid 1:12
# tcfilter u32 match u8 0 0 ht 1:8: classid 1:16
# tcfilter u32 match u8 0 0 ht 1:9: classid 1:16
# tcfilter u32 match u8 0 0 ht 1:a: classid 1:16
# tcfilter u32 match u8 0 0 ht 1:b: classid 1:16
# tcfilter u32 match u8 0 0 ht 1:c: classid 1:10
# tcfilter u32 match u8 0 0 ht 1:d: classid 1:10
# tcfilter u32 match u8 0 0 ht 1:e: classid 1:10
# tcfilter u32 match u8 0 0 ht 1:f: classid 1:10
\end{Verbatim}
The parameter \texttt{ht} denotes the hash table and bucket the filter should be added
to. Since the first TOS bit is ignored, it's value has to be divided by two in
order to get to the bucket it maps to. E.g. a TOS value of 0x10 will therefore
map to bucket 0x8. For the sake of completeness, all possible values are mapped
and therefore a configurable default class is not required. Note that the used
match expression is not necessary, but mandatory. Therefore anything that
matches any packet will suffice. Finally, a filter which links to the defined
hash table is needed:
\begin{Verbatim}
# tc filter add dev eth0 parent 1: prio 1 protocol ip u32 \
link 1: hashkey mask 0x001e0000 match u8 0 0
\end{Verbatim}
Here again, the actual match statement is not necessary, but syntactically
required. All the magic lies within the \texttt{hashkey} parameter, which defines which
part of the packet should be used directly as hash key. Here's a drawing of the
first four bytes of the IPv4 header, with the area selected by \texttt{hashkey mask}
highlighted:
\begin{figure}[H]
\begin{Verbatim}
0 1 2 3
.-----------------------------------------------------------------.
| | | ######## | | |
| Version| IHL | #DSCP### | ECN| Total Length |
| | | ######## | | |
'-----------------------------------------------------------------'
\end{Verbatim}
\end{figure}
\noindent
This may look confusing at first, but keep in mind that bit- as well as
byte-ordering here is LSB while the mask value is written in MSB we humans use.
Therefore reading the mask is done like so, starting from left:
\begin{enumerate}
\item Skip the first byte (which contains Version and IHL fields).
\item Skip the lowest bit of the second byte (0x1e is even).
\item Mark the four following bits (0x1e is 11110 in binary).
\item Skip the remaining three bits of the second byte as well as the remaining two
bytes.
\end{enumerate}
Before doing the lookup, the kernel right-shifts the masked value by the amount
of zero-bits in \texttt{mask}, which implicitly also does the division by two which the
hash table depends on. With this setup, every packet has to pass exactly two
filters to be classified. Note that this filter is limited to IPv4 packets: Due
to the related Traffic Class field being at a different offset in the packet, it
would not work for IPv6. To use the same setup for IPv6 as well, a second
entry-level filter is necessary:
\begin{Verbatim}
# tc filter add dev eth0 parent 1: prio 2 protocol ipv6 u32 \
link 1: hashkey mask 0x01e00000 match u8 0 0
\end{Verbatim}
For illustration purposes, here again is a drawing of the first four bytes of
the IPv6 header, again with masked area highlighted:
\begin{figure}[H]
\begin{Verbatim}
0 1 2 3
.-----------------------------------------------------------------.
| | ######## | |
| Version| #Traffic Class| Flow Label |
| | ######## | |
'-----------------------------------------------------------------'
\end{Verbatim}
\end{figure}
\noindent
Reading the mask value is analogous to IPv4 with the added complexity that
Traffic Class spans over two bytes. Yet, for comparison there's a simple trick:
IPv6 has the interesting field shifted by four bits to the left, and the new
mask's value is shifted by the same amount. For further information about
\filter{u32} and what can be done with it, consult it's man page
\man{tc-u32(8)}.
Of course, the kernel provides many more filters than just \filter{basic},
\filter{flow} and \filter{u32} which have been presented above. As of now, the
remaining ones are:
\begin{description}
\item[bpf]
Filtering using Berkeley Packet Filter programs. The program's return
code determines the packet's destination class ID.
\item[cgroup]
Filter packets based on control groups. This is only useful for packets
originating from the local host, as control groups only exist in that
scope.
\item[flower]
An extended variant of the flow filter.
\item[fw]
Matches on firewall mark values previously assigned to the packet by
netfilter (or a filter action, see below for details). This allows to
export the classification algorithm into netfilter, which is very
convenient if appropriate rules exist on the same system in there
already.
\item[route]
Filter packets based on matching routing table entry. Basically
equivalent to the \texttt{fw} filter above, to make use of an already existing
extensive routing table setup.
\item[rsvp, rsvp6]
Implementation of the Resource Reservation Protocol in Linux, to react
upon requests sent by an RSVP daemon.
\item[tcindex]
Match packets based on tcindex value, which is usually set by the dsmark
qdisc. This is part of an approach to support Differentiated Services in
Linux, which is another topic on it's own.
\end{description}
\section*{Filter Actions}
The tc filter framework provides the infrastructure to another extensible set of
tools as well, namely tc actions. As the name suggests, they allow to do things
with packets (or associated data). (The list of) Actions are part of a given
filter. If it matches, each action it contains is executed in order before
returning the classification result. Since the action has direct access to the
latter, it is in theory possible for an action to react upon or even change the
filtering result - as long as the packet matched, of course. Yet none of the
currently in-tree actions make use of this.
The Generic Actions framework originally evolved out of the filters' ability to
police traffic to a given maximum bandwidth. One common use case for that is to
limit ingress traffic, dropping packets which exceed the threshold. A classic
setup example is like so:
\begin{Verbatim}
# tc qdisc add dev eth0 handle ffff: ingress
# tc filter add dev eth0 parent ffff: u32 \
match u32 0 0
police rate 1mbit burst 100k
\end{Verbatim}
The ingress qdisc is not a real one, but merely a point of reference for filters
to attach to which should get applied to incoming traffic. The \filter{u32} filter added
above matches on any packet and therefore limits the total incoming bandwidth to
1mbit/s, allowing bursts of up to 100kbytes. Using the new syntax, the filter
command changes slightly:
\begin{Verbatim}
# tc filter add dev eth0 parent ffff: u32 \
match u32 0 0 \
action police rate 1mbit burst 100k
\end{Verbatim}
The important detail is that this syntax allows to define multiple actions.
E.g. for testing purposes, it is possible to redirect exceeding traffic to the
loopback interface instead of dropping it:
\begin{Verbatim}
# tc filter add dev eth0 parent ffff: u32 \
match u32 0 0 \
action police rate 1mbit burst 100k conform-exceed pipe \
action mirred egress redirect dev lo
\end{Verbatim}
The added parameter \texttt{conform-exceed pipe} tells the police action to allow for
further actions to handle the exceeding packet.
Apart from \texttt{police} and \texttt{mirred} actions, there are a few more. Here's a full
list of the currently implemented ones:
\begin{description}
\item[bpf]
Apply a Berkeley Packet Filter program to the packet.
\item[connmark]
Set the packet's firewall mark to that of it's connection. This works by
searching the conntrack table for a matching entry. If found, the mark
is restored.
\item[csum]
Trigger recalculation of packet checksums. The supported protocols are:
IPv4, ICMP, IGMP, TCP, UDP and UDPLite.
\item[ipt]
Pass the packet to an iptables target. This allows to use iptables
extensions directly instead of having to go the extra mile via setting
an arbitrary firewall mark and matching on that from within netfilter.
\item[mirred]
Mirror or redirect packets. This is often combined with the ifb pseudo
device to share a common QoS setup between multiple interfaces or even
ingress traffic.
\item[nat]
Perform stateless Native Address Translation. This is certainly not
complete and therefore inferior to NAT using iptables: Although the
kernel module decides between TCP, UDP and ICMP traffic, it does not
handle typical problematic protocols such as active FTP or SIP.
\item[pedit]
Generic packet editing. This allows to alter arbitrary bytes of the
packet, either by specifying an offset into the packet or by naming a
packet header and field name to change. Currently, the latter is
implemented only for IPv4 yet.
\item[police]
Apply a bandwidth rate limiting policy. Packets exceeding it are dropped
by default, but may optionally be handled differently.
\item[simple]
This is rather an example than real action. All it does is print a
user-defined string together with a packet counter. Useful maybe for
debugging when filter statistics are not available or too complicated.
\item[skbedit]
Edit associated packet data, supports changing queue mapping, priority
field and firewall mark value.
\item[vlan]
Add/remove a VLAN header to/from the packet. This might serve as
alternative to using 802.1Q pseudo-interfaces in combination with
routing rules when e.g. packets for a given destination need to be
encapsulated.
\end{description}
\section*{Intermediate Functional Block}
The Intermediate Functional Block (\texttt{ifb}) pseudo network interface acts as a QoS
concentrator for multiple different sources of traffic. Packets from or to other
interfaces have to be redirected to it using the \texttt{mirred} action in order to be
handled, regularly routed traffic will be dropped. This way, a single stack of
qdiscs, classes and filters can be shared between multiple interfaces.
Here's a simple example to feed incoming traffic from multiple interfaces
through a Stochastic Fairness Queue (\qdisc{sfq}):
\begin{Verbatim}
(1) # modprobe ifb
(2) # ip link set ifb0 up
(3) # tc qdisc add dev ifb0 root sfq
\end{Verbatim}
The first step is to load the \texttt{ifb} kernel module (1). By default, this will
create two ifb devices: \iface{ifb0} and \iface{ifb1}. After setting
\iface{ifb0} up in (2), the root
qdisc is replaced by \qdisc{sfq} in (3). Finally, one can start redirecting ingress
traffic to \iface{ifb0}, e.g. from \iface{eth0}:
\begin{Verbatim}
# tc qdisc add dev eth0 handle ffff: ingress
# tc filter add dev eth0 parent ffff: u32 \
match u32 0 0 \
action mirred egress redirect dev ifb0
\end{Verbatim}
The same can be done for other interfaces, just replacing \iface{eth0} in the two
commands above. One thing to keep in mind here is the asymmetrical routing this
creates within the host doing the QoS: Incoming packets enter the system via
\iface{ifb0}, while corresponding replies leave directly via \iface{eth0}. This can be observed
using \cmd{tcpdump} on \iface{ifb0}, which shows the input part of the traffic only. What's
more confusing is that \cmd{tcpdump} on \iface{eth0} shows both incoming and outgoing traffic,
but the redirection is still effective - a simple prove is setting
\iface{ifb0} down,
which will interrupt the communication. Obviously \cmd{tcpdump} catches the packets to
dump before they enter the ingress qdisc, which is why it sees them while the
kernel itself doesn't.
\section*{Conclusion}
Once the steep learning curve has been mastered, the conglomerate of (classful)
qdiscs, filters and actions provides a highly sophisticated and flexible
infrastructure to perform QoS, which plays nicely along with routing and
firewalling setups.
\section*{Further Reading}
A good starting point for novice users and experienced ones diving into unknown
areas is the extensive HOWTO at \url{http://lartc.org}. The iproute2 package ships
some examples (usually in /usr/share/doc/, depending on distribution) as well as
man pages for \cmd{tc} in general, qdiscs and filters. The latter have been added
just recently though, so if your distribution does not ship iproute2 version
4.3.0 yet, these are not in there. Apart from that, the internet is a spring of
HOWTOs and scripts people wrote - though these should be taken with a grain of
salt: The complexity of the matter often leads to copying others' solutions
without much validation, which allows for less optimal or even obsolete
implementations to survive much longer than desired.
\end{document}

View File

@ -5,3 +5,4 @@
4 meta
7 canid
8 ipset
9 ipt

View File

@ -12,7 +12,7 @@
9 audit
10 fiblookup
11 connector
12 nft
12 nft
13 ip6fw
14 dec-rt
15 uevent
@ -20,4 +20,4 @@
18 scsi-trans
19 ecryptfs
20 rdma
21 crypto
21 crypto

View File

@ -14,18 +14,12 @@
13 dnrouted
14 xorp
15 ntk
16 dhcp
16 dhcp
18 keepalived
42 babel
#
# Used by me for gated
#
254 gated/aggr
253 gated/bgp
252 gated/ospf
251 gated/ospfase
250 gated/rip
249 gated/static
248 gated/conn
247 gated/inet
246 gated/default
99 openr
186 bgp
187 isis
188 ospf
189 rip
192 eigrp

View File

@ -0,0 +1,2 @@
Each file in this directory is an rt_protos configuration file. iproute2
commands scan this directory processing all files that end in '.conf'.

View File

@ -1,3 +1,2 @@
Each file in this directory is an rt_tables configuration file. iproute2
commands scan this directory processing all files that end in '.conf'.

View File

@ -1,122 +0,0 @@
# CHANGES
# -------
# v0.3a2- fixed bug in "if" operator. Thanks kad@dgtu.donetsk.ua.
# v0.3a- added TIME parameter. Example:
# TIME=00:00-19:00;64Kbit/6Kbit
# So, between 00:00 and 19:00 RATE will be 64Kbit.
# Just start "cbq.init timecheck" periodically from cron (every 10
# minutes for example).
# !!! Anyway you MUST start "cbq.init start" for CBQ initialize.
# v0.2 - Some cosmetique changes. Now it more compatible with
# old bash version. Thanks to Stanislav V. Voronyi
# <stas@cnti.uanet.kharkov.ua>.
# v0.1 - First public release
#
# README
# ------
#
# First of all - this is just a SIMPLE EXAMPLE of CBQ power.
# Don't ask me "why" and "how" :)
#
# This is an example of using CBQ (Class Based Queueing) and policy-based
# filter for building smart ethernet shapers. All CBQ parameters are
# correct only for ETHERNET (eth0,1,2..) linux interfaces. It works for
# ARCNET too (just set bandwidth parameter to 2Mbit). It was tested
# on 2.1.125-2.1.129 linux kernels (KSI linux, Nostromo version) and
# ip-route utility by A.Kuznetsov (iproute2-ss981101 version).
# You can download ip-route from ftp://ftp.inr.ac.ru/ip-routing or
# get iproute2*.rpm (compiled with glibc) from ftp.ksi-linux.com.
#
#
# HOW IT WORKS
#
# Each shaper must be described by config file in $CBQ_PATH
# (/etc/sysconfig/cbq/) directory - one config file for each CBQ shaper.
#
# Some words about config file name:
# Each shaper has its personal ID - two byte HEX number. Really ID is
# CBQ class.
# So, filename looks like:
#
# cbq-1280.My_first_shaper
# ^^^ ^^^ ^^^^^^^^^^^^^
# | | |______ Shaper name - any word
# | |___________________ ID (0000-FFFF), let ID looks like shaper's rate
# |______________________ Filename must begin from "cbq-"
#
#
# Config file describes shaper parameters and source[destination]
# address[port].
# For example let's prepare /etc/sysconfig/cbq/cbq-1280.My_first_shaper:
#
# ----------8<---------------------
# DEVICE=eth0,10Mbit,1Mbit
# RATE=128Kbit
# WEIGHT=10Kbit
# PRIO=5
# RULE=192.168.1.0/24
# ----------8<---------------------
#
# This is minimal configuration, where:
# DEVICE: eth0 - device where we do control our traffic
# 10Mbit - REAL ethernet card bandwidth
# 1Mbit - "weight" of :1 class (parent for all shapers for eth0),
# as a rule of thumb weight=batdwidth/10.
# 100Mbit adapter's example: DEVICE=eth0,100Mbit,10Mbit
# *** If you want to build more than one shaper per device it's
# enough to describe bandwidth and weight once - cbq.init
# is smart :) You can put only 'DEVICE=eth0' into cbq-*
# config file for eth0.
#
# RATE: Shaper's speed - Kbit,Mbit or bps (bytes per second)
#
# WEIGHT: "weight" of shaper (CBQ class). Like for DEVICE - approx. RATE/10
#
# PRIO: shaper's priority from 1 to 8 where 1 is the highest one.
# I do always use "5" for all my shapers.
#
# RULE: [source addr][:source port],[dest addr][:dest port]
# Some examples:
# RULE=10.1.1.0/24:80 - all traffic for network 10.1.1.0 to port 80
# will be shaped.
# RULE=10.2.2.5 - shaper works only for IP address 10.2.2.5
# RULE=:25,10.2.2.128/25:5000 - all traffic from any address and port 25 to
# address 10.2.2.128 - 10.2.2.255 and port 5000
# will be shaped.
# RULE=10.5.5.5:80, - shaper active only for traffic from port 80 of
# address 10.5.5.5
# Multiple RULE fields per one config file are allowed. For example:
# RULE=10.1.1.2:80
# RULE=10.1.1.2:25
# RULE=10.1.1.2:110
#
# *** ATTENTION!!!
# All shapers do work only for outgoing traffic!
# So, if you want to build bidirectional shaper you must set it up for
# both ethernet card. For example let's build shaper for our linux box like:
#
# --------- 192.168.1.1
# BACKBONE -----eth0-| linux |-eth1------*[our client]
# ---------
#
# Let all traffic from backbone to client will be shaped at 28Kbit and
# traffic from client to backbone - at 128Kbit. We need two config files:
#
# ---8<-----/etc/sysconfig/cbq/cbq-28.client-out----
# DEVICE=eth1,10Mbit,1Mbit
# RATE=28Kbit
# WEIGHT=2Kbit
# PRIO=5
# RULE=192.168.1.1
# ---8<---------------------------------------------
#
# ---8<-----/etc/sysconfig/cbq/cbq-128.client-in----
# DEVICE=eth0,10Mbit,1Mbit
# RATE=128Kbit
# WEIGHT=10Kbit
# PRIO=5
# RULE=192.168.1.1,
# ---8<---------------------------------------------
# ^pay attention to "," - this is source address!
#
# Enjoy.

View File

@ -1,49 +0,0 @@
#! /bin/sh -x
#
# sample script on using the ingress capabilities
# this script shows how one can rate limit incoming SYNs
# Useful for TCP-SYN attack protection. You can use
# IPchains to have more powerful additions to the SYN (eg
# in addition the subnet)
#
#path to various utilities;
#change to reflect yours.
#
IPROUTE=/root/DS-6-beta/iproute2-990530-dsing
TC=$IPROUTE/tc/tc
IP=$IPROUTE/ip/ip
IPCHAINS=/root/DS-6-beta/ipchains-1.3.9/ipchains
INDEV=eth2
#
# tag all incoming SYN packets through $INDEV as mark value 1
############################################################
$IPCHAINS -A input -i $INDEV -y -m 1
############################################################
#
# install the ingress qdisc on the ingress interface
############################################################
$TC qdisc add dev $INDEV handle ffff: ingress
############################################################
#
#
# SYN packets are 40 bytes (320 bits) so three SYNs equals
# 960 bits (approximately 1kbit); so we rate limit below
# the incoming SYNs to 3/sec (not very sueful really; but
#serves to show the point - JHS
############################################################
$TC filter add dev $INDEV parent ffff: protocol ip prio 50 handle 1 fw \
police rate 1kbit burst 40 mtu 9k drop flowid :1
############################################################
#
echo "---- qdisc parameters Ingress ----------"
$TC qdisc ls dev $INDEV
echo "---- Class parameters Ingress ----------"
$TC class ls dev $INDEV
echo "---- filter parameters Ingress ----------"
$TC filter ls dev $INDEV parent ffff:
#deleting the ingress qdisc
#$TC qdisc del $INDEV ingress

View File

@ -1,13 +1,18 @@
eBPF toy code examples (running in kernel) to familiarize yourself
with syntax and features:
- bpf_prog.c -> Classifier examples with using maps
- bpf_shared.c -> Ingress/egress map sharing example
- bpf_tailcall.c -> Using tail call chains
- bpf_cyclic.c -> Simple cycle as tail calls
- BTF defined map examples
- bpf_graft.c -> Demo on altering runtime behaviour
- bpf_shared.c -> Ingress/egress map sharing example
- bpf_map_in_map.c -> Using map in map example
User space code example:
- legacy struct bpf_elf_map defined map examples
- legacy/bpf_shared.c -> Ingress/egress map sharing example
- legacy/bpf_tailcall.c -> Using tail call chains
- legacy/bpf_cyclic.c -> Simple cycle as tail calls
- legacy/bpf_graft.c -> Demo on altering runtime behaviour
- legacy/bpf_map_in_map.c -> Using map in map example
- bpf_agent.c -> Counterpart to bpf_prog.c for user
space to transfer/read out map data
Note: Users should use new BTF way to defined the maps, the examples
in legacy folder which is using struct bpf_elf_map defined maps is not
recommanded.

View File

@ -1,258 +0,0 @@
/*
* eBPF user space agent part
*
* Simple, _self-contained_ user space agent for the eBPF kernel
* ebpf_prog.c program, which gets all map fds passed from tc via unix
* domain socket in one transaction and can thus keep referencing
* them from user space in order to read out (or possibly modify)
* map data. Here, just as a minimal example to display counters.
*
* The agent only uses the bpf(2) syscall API to read or possibly
* write to eBPF maps, it doesn't need to be aware of the low-level
* bytecode parts and/or ELF parsing bits.
*
* ! For more details, see header comment in bpf_prog.c !
*
* gcc bpf_agent.c -o bpf_agent -Wall -O2
*
* For example, a more complex user space agent could run on each
* host, reading and writing into eBPF maps used by tc classifier
* and actions. It would thus allow for implementing a distributed
* tc architecture, for example, which would push down central
* policies into eBPF maps, and thus altering run-time behaviour.
*
* -- Happy eBPF hacking! ;)
*/
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <stdint.h>
#include <assert.h>
#include <sys/un.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/socket.h>
/* Just some misc macros as min(), offsetof(), etc. */
#include "../../include/utils.h"
/* Common code from fd passing. */
#include "../../include/bpf_scm.h"
/* Common, shared definitions with ebpf_prog.c */
#include "bpf_shared.h"
/* Mini syscall wrapper */
#include "bpf_sys.h"
static void bpf_dump_drops(int fd)
{
int cpu, max;
max = sysconf(_SC_NPROCESSORS_ONLN);
printf(" `- number of drops:");
for (cpu = 0; cpu < max; cpu++) {
long drops;
assert(bpf_lookup_elem(fd, &cpu, &drops) == 0);
printf("\tcpu%d: %5ld", cpu, drops);
}
printf("\n");
}
static void bpf_dump_queue(int fd)
{
/* Just for the same of the example. */
int max_queue = 4, i;
printf(" | nic queues:");
for (i = 0; i < max_queue; i++) {
struct count_queue cq;
int ret;
memset(&cq, 0, sizeof(cq));
ret = bpf_lookup_elem(fd, &i, &cq);
assert(ret == 0 || (ret < 0 && errno == ENOENT));
printf("\tq%d:[pkts: %ld, mis: %ld]",
i, cq.total, cq.mismatch);
}
printf("\n");
}
static void bpf_dump_proto(int fd)
{
uint8_t protos[] = { IPPROTO_TCP, IPPROTO_UDP, IPPROTO_ICMP };
char *names[] = { "tcp", "udp", "icmp" };
int i;
printf(" ` protos:");
for (i = 0; i < ARRAY_SIZE(protos); i++) {
struct count_tuple ct;
int ret;
memset(&ct, 0, sizeof(ct));
ret = bpf_lookup_elem(fd, &protos[i], &ct);
assert(ret == 0 || (ret < 0 && errno == ENOENT));
printf("\t%s:[pkts: %ld, bytes: %ld]",
names[i], ct.packets, ct.bytes);
}
printf("\n");
}
static void bpf_dump_map_data(int *tfd)
{
int i;
for (i = 0; i < 30; i++) {
const int period = 5;
printf("data, period: %dsec\n", period);
bpf_dump_drops(tfd[BPF_MAP_ID_DROPS]);
bpf_dump_queue(tfd[BPF_MAP_ID_QUEUE]);
bpf_dump_proto(tfd[BPF_MAP_ID_PROTO]);
sleep(period);
}
}
static void bpf_info_loop(int *fds, struct bpf_map_aux *aux)
{
int i, tfd[BPF_MAP_ID_MAX];
printf("ver: %d\nobj: %s\ndev: %lu\nino: %lu\nmaps: %u\n",
aux->uds_ver, aux->obj_name, aux->obj_st.st_dev,
aux->obj_st.st_ino, aux->num_ent);
for (i = 0; i < aux->num_ent; i++) {
printf("map%d:\n", i);
printf(" `- fd: %u\n", fds[i]);
printf(" | serial: %u\n", aux->ent[i].id);
printf(" | type: %u\n", aux->ent[i].type);
printf(" | max elem: %u\n", aux->ent[i].max_elem);
printf(" | size key: %u\n", aux->ent[i].size_key);
printf(" ` size val: %u\n", aux->ent[i].size_value);
tfd[aux->ent[i].id] = fds[i];
}
bpf_dump_map_data(tfd);
}
static void bpf_map_get_from_env(int *tfd)
{
char key[64], *val;
int i;
for (i = 0; i < BPF_MAP_ID_MAX; i++) {
memset(key, 0, sizeof(key));
snprintf(key, sizeof(key), "BPF_MAP%d", i);
val = getenv(key);
assert(val != NULL);
tfd[i] = atoi(val);
}
}
static int bpf_map_set_recv(int fd, int *fds, struct bpf_map_aux *aux,
unsigned int entries)
{
struct bpf_map_set_msg msg;
int *cmsg_buf, min_fd, i;
char *amsg_buf, *mmsg_buf;
cmsg_buf = bpf_map_set_init(&msg, NULL, 0);
amsg_buf = (char *)msg.aux.ent;
mmsg_buf = (char *)&msg.aux;
for (i = 0; i < entries; i += min_fd) {
struct cmsghdr *cmsg;
int ret;
min_fd = min(BPF_SCM_MAX_FDS * 1U, entries - i);
bpf_map_set_init_single(&msg, min_fd);
ret = recvmsg(fd, &msg.hdr, 0);
if (ret <= 0)
return ret ? : -1;
cmsg = CMSG_FIRSTHDR(&msg.hdr);
if (!cmsg || cmsg->cmsg_type != SCM_RIGHTS)
return -EINVAL;
if (msg.hdr.msg_flags & MSG_CTRUNC)
return -EIO;
min_fd = (cmsg->cmsg_len - sizeof(*cmsg)) / sizeof(fd);
if (min_fd > entries || min_fd <= 0)
return -1;
memcpy(&fds[i], cmsg_buf, sizeof(fds[0]) * min_fd);
memcpy(&aux->ent[i], amsg_buf, sizeof(aux->ent[0]) * min_fd);
memcpy(aux, mmsg_buf, offsetof(struct bpf_map_aux, ent));
if (i + min_fd == aux->num_ent)
break;
}
return 0;
}
int main(int argc, char **argv)
{
int fds[BPF_SCM_MAX_FDS];
struct bpf_map_aux aux;
struct sockaddr_un addr;
int fd, ret, i;
/* When arguments are being passed, we take it as a path
* to a Unix domain socket, otherwise we grab the fds
* from the environment to demonstrate both possibilities.
*/
if (argc == 1) {
int tfd[BPF_MAP_ID_MAX];
bpf_map_get_from_env(tfd);
bpf_dump_map_data(tfd);
return 0;
}
fd = socket(AF_UNIX, SOCK_DGRAM, 0);
if (fd < 0) {
fprintf(stderr, "Cannot open socket: %s\n",
strerror(errno));
exit(1);
}
memset(&addr, 0, sizeof(addr));
addr.sun_family = AF_UNIX;
strncpy(addr.sun_path, argv[argc - 1], sizeof(addr.sun_path));
ret = bind(fd, (struct sockaddr *)&addr, sizeof(addr));
if (ret < 0) {
fprintf(stderr, "Cannot bind to socket: %s\n",
strerror(errno));
exit(1);
}
memset(fds, 0, sizeof(fds));
memset(&aux, 0, sizeof(aux));
ret = bpf_map_set_recv(fd, fds, &aux, BPF_SCM_MAX_FDS);
if (ret >= 0)
bpf_info_loop(fds, &aux);
for (i = 0; i < aux.num_ent; i++)
close(fds[i]);
close(fd);
return 0;
}

View File

@ -33,13 +33,13 @@
* [...]
*/
struct bpf_elf_map __section_maps jmp_tc = {
.type = BPF_MAP_TYPE_PROG_ARRAY,
.size_key = sizeof(uint32_t),
.size_value = sizeof(uint32_t),
.pinning = PIN_GLOBAL_NS,
.max_elem = 1,
};
struct {
__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
__uint(key_size, sizeof(uint32_t));
__uint(value_size, sizeof(uint32_t));
__uint(max_entries, 1);
__uint(pinning, LIBBPF_PIN_BY_NAME);
} jmp_tc __section(".maps");
__section("aaa")
int cls_aaa(struct __sk_buff *skb)

View File

@ -0,0 +1,55 @@
#include "../../include/bpf_api.h"
struct inner_map {
__uint(type, BPF_MAP_TYPE_ARRAY);
__uint(key_size, sizeof(uint32_t));
__uint(value_size, sizeof(uint32_t));
__uint(max_entries, 1);
} map_inner __section(".maps");
struct {
__uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
__uint(key_size, sizeof(uint32_t));
__uint(value_size, sizeof(uint32_t));
__uint(max_entries, 1);
__uint(pinning, LIBBPF_PIN_BY_NAME);
__array(values, struct inner_map);
} map_outer __section(".maps") = {
.values = {
[0] = &map_inner,
},
};
__section("egress")
int emain(struct __sk_buff *skb)
{
struct bpf_elf_map *map_inner;
int key = 0, *val;
map_inner = map_lookup_elem(&map_outer, &key);
if (map_inner) {
val = map_lookup_elem(map_inner, &key);
if (val)
lock_xadd(val, 1);
}
return BPF_H_DEFAULT;
}
__section("ingress")
int imain(struct __sk_buff *skb)
{
struct bpf_elf_map *map_inner;
int key = 0, *val;
map_inner = map_lookup_elem(&map_outer, &key);
if (map_inner) {
val = map_lookup_elem(map_inner, &key);
if (val)
printt("map val: %d\n", *val);
}
return BPF_H_DEFAULT;
}
BPF_LICENSE("GPL");

View File

@ -1,501 +0,0 @@
/*
* eBPF kernel space program part
*
* Toy eBPF program for demonstration purposes, some parts derived from
* kernel tree's samples/bpf/sockex2_kern.c example.
*
* More background on eBPF, kernel tree: Documentation/networking/filter.txt
*
* Note, this file is rather large, and most classifier and actions are
* likely smaller to accomplish one specific use-case and are tailored
* for high performance. For performance reasons, you might also have the
* classifier and action already merged inside the classifier.
*
* In order to show various features it serves as a bigger programming
* example, which you should feel free to rip apart and experiment with.
*
* Compilation, configuration example:
*
* Note: as long as the BPF backend in LLVM is still experimental,
* you need to build LLVM with LLVM with --enable-experimental-targets=BPF
* Also, make sure your 4.1+ kernel is compiled with CONFIG_BPF_SYSCALL=y,
* and you have libelf.h and gelf.h headers and can link tc against -lelf.
*
* In case you need to sync kernel headers, go to your kernel source tree:
* # make headers_install INSTALL_HDR_PATH=/usr/
*
* $ export PATH=/home/<...>/llvm/Debug+Asserts/bin/:$PATH
* $ clang -O2 -emit-llvm -c bpf_prog.c -o - | llc -march=bpf -filetype=obj -o bpf.o
* $ objdump -h bpf.o
* [...]
* 3 classifier 000007f8 0000000000000000 0000000000000000 00000040 2**3
* CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
* 4 action-mark 00000088 0000000000000000 0000000000000000 00000838 2**3
* CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
* 5 action-rand 00000098 0000000000000000 0000000000000000 000008c0 2**3
* CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
* 6 maps 00000030 0000000000000000 0000000000000000 00000958 2**2
* CONTENTS, ALLOC, LOAD, DATA
* 7 license 00000004 0000000000000000 0000000000000000 00000988 2**0
* CONTENTS, ALLOC, LOAD, DATA
* [...]
* # echo 1 > /proc/sys/net/core/bpf_jit_enable
* $ gcc bpf_agent.c -o bpf_agent -Wall -O2
* # ./bpf_agent /tmp/bpf-uds (e.g. on a different terminal)
* # tc filter add dev em1 parent 1: bpf obj bpf.o exp /tmp/bpf-uds flowid 1:1 \
* action bpf obj bpf.o sec action-mark \
* action bpf obj bpf.o sec action-rand ok
* # tc filter show dev em1
* filter parent 1: protocol all pref 49152 bpf
* filter parent 1: protocol all pref 49152 bpf handle 0x1 flowid 1:1 bpf.o:[classifier]
* action order 1: bpf bpf.o:[action-mark] default-action pipe
* index 52 ref 1 bind 1
*
* action order 2: bpf bpf.o:[action-rand] default-action pipe
* index 53 ref 1 bind 1
*
* action order 3: gact action pass
* random type none pass val 0
* index 38 ref 1 bind 1
*
* The same program can also be installed on ingress side (as opposed to above
* egress configuration), e.g.:
*
* # tc qdisc add dev em1 handle ffff: ingress
* # tc filter add dev em1 parent ffff: bpf obj ...
*
* Notes on BPF agent:
*
* In the above example, the bpf_agent creates the unix domain socket
* natively. "tc exec" can also spawn a shell and hold the socktes there:
*
* # tc exec bpf imp /tmp/bpf-uds
* # tc filter add dev em1 parent 1: bpf obj bpf.o exp /tmp/bpf-uds flowid 1:1 \
* action bpf obj bpf.o sec action-mark \
* action bpf obj bpf.o sec action-rand ok
* sh-4.2# (shell spawned from tc exec)
* sh-4.2# bpf_agent
* [...]
*
* This will read out fds over environment and produce the same data dump
* as below. This has the advantage that the spawned shell owns the fds
* and thus if the agent is restarted, it can reattach to the same fds, also
* various programs can easily read/modify the data simultaneously from user
* space side.
*
* If the shell is unnecessary, the agent can also just be spawned directly
* via tc exec:
*
* # tc exec bpf imp /tmp/bpf-uds run bpf_agent
* # tc filter add dev em1 parent 1: bpf obj bpf.o exp /tmp/bpf-uds flowid 1:1 \
* action bpf obj bpf.o sec action-mark \
* action bpf obj bpf.o sec action-rand ok
*
* BPF agent example output:
*
* ver: 1
* obj: bpf.o
* dev: 64770
* ino: 6045133
* maps: 3
* map0:
* `- fd: 4
* | serial: 1
* | type: 1
* | max elem: 256
* | size key: 1
* ` size val: 16
* map1:
* `- fd: 5
* | serial: 2
* | type: 1
* | max elem: 1024
* | size key: 4
* ` size val: 16
* map2:
* `- fd: 6
* | serial: 3
* | type: 2
* | max elem: 64
* | size key: 4
* ` size val: 8
* data, period: 5sec
* `- number of drops: cpu0: 0 cpu1: 0 cpu2: 0 cpu3: 0
* | nic queues: q0:[pkts: 0, mis: 0] q1:[pkts: 0, mis: 0] q2:[pkts: 0, mis: 0] q3:[pkts: 0, mis: 0]
* ` protos: tcp:[pkts: 0, bytes: 0] udp:[pkts: 0, bytes: 0] icmp:[pkts: 0, bytes: 0]
* data, period: 5sec
* `- number of drops: cpu0: 5 cpu1: 0 cpu2: 0 cpu3: 1
* | nic queues: q0:[pkts: 0, mis: 0] q1:[pkts: 0, mis: 0] q2:[pkts: 24, mis: 14] q3:[pkts: 0, mis: 0]
* ` protos: tcp:[pkts: 13, bytes: 1989] udp:[pkts: 10, bytes: 710] icmp:[pkts: 0, bytes: 0]
* data, period: 5sec
* `- number of drops: cpu0: 5 cpu1: 0 cpu2: 3 cpu3: 3
* | nic queues: q0:[pkts: 0, mis: 0] q1:[pkts: 0, mis: 0] q2:[pkts: 39, mis: 21] q3:[pkts: 0, mis: 0]
* ` protos: tcp:[pkts: 20, bytes: 3549] udp:[pkts: 18, bytes: 1278] icmp:[pkts: 0, bytes: 0]
* [...]
*
* This now means, the below classifier and action pipeline has been loaded
* as eBPF bytecode into the kernel, the kernel has verified that the
* execution of the bytecode is "safe", and it has JITed the programs
* afterwards, so that upon invocation they're running on native speed. tc
* has transferred all map file descriptors to the bpf_agent via IPC and
* even after tc exits, the agent can read out or modify all map data.
*
* Note that the export to the uds is done only once in the classifier and
* not in the action. It's enough to export the (here) shared descriptors
* once.
*
* If you need to disassemble the generated JIT image (echo with 2), the
* kernel tree has under tools/net/ a small helper, you can invoke e.g.
* `bpf_jit_disasm -o`.
*
* Please find in the code below further comments.
*
* -- Happy eBPF hacking! ;)
*/
#include <stdint.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <asm/types.h>
#include <linux/in.h>
#include <linux/if.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/ipv6.h>
#include <linux/if_tunnel.h>
#include <linux/filter.h>
#include <linux/bpf.h>
/* Common, shared definitions with ebpf_agent.c. */
#include "bpf_shared.h"
/* BPF helper functions for our example. */
#include "../../include/bpf_api.h"
/* Could be defined here as well, or included from the header. */
#define TC_ACT_UNSPEC (-1)
#define TC_ACT_OK 0
#define TC_ACT_RECLASSIFY 1
#define TC_ACT_SHOT 2
#define TC_ACT_PIPE 3
#define TC_ACT_STOLEN 4
#define TC_ACT_QUEUED 5
#define TC_ACT_REPEAT 6
/* Other, misc stuff. */
#define IP_MF 0x2000
#define IP_OFFSET 0x1FFF
/* eBPF map definitions, all placed in section "maps". */
struct bpf_elf_map __section("maps") map_proto = {
.type = BPF_MAP_TYPE_HASH,
.id = BPF_MAP_ID_PROTO,
.size_key = sizeof(uint8_t),
.size_value = sizeof(struct count_tuple),
.max_elem = 256,
.flags = BPF_F_NO_PREALLOC,
};
struct bpf_elf_map __section("maps") map_queue = {
.type = BPF_MAP_TYPE_HASH,
.id = BPF_MAP_ID_QUEUE,
.size_key = sizeof(uint32_t),
.size_value = sizeof(struct count_queue),
.max_elem = 1024,
.flags = BPF_F_NO_PREALLOC,
};
struct bpf_elf_map __section("maps") map_drops = {
.type = BPF_MAP_TYPE_ARRAY,
.id = BPF_MAP_ID_DROPS,
.size_key = sizeof(uint32_t),
.size_value = sizeof(long),
.max_elem = 64,
};
/* Helper functions and definitions for the flow dissector used by the
* example classifier. This resembles the kernel's flow dissector to
* some extend and is just used as an example to show what's possible
* with eBPF.
*/
struct sockaddr;
struct vlan_hdr {
__be16 h_vlan_TCI;
__be16 h_vlan_encapsulated_proto;
};
struct flow_keys {
__u32 src;
__u32 dst;
union {
__u32 ports;
__u16 port16[2];
};
__s32 th_off;
__u8 ip_proto;
};
static __inline__ int flow_ports_offset(__u8 ip_proto)
{
switch (ip_proto) {
case IPPROTO_TCP:
case IPPROTO_UDP:
case IPPROTO_DCCP:
case IPPROTO_ESP:
case IPPROTO_SCTP:
case IPPROTO_UDPLITE:
default:
return 0;
case IPPROTO_AH:
return 4;
}
}
static __inline__ bool flow_is_frag(struct __sk_buff *skb, int nh_off)
{
return !!(load_half(skb, nh_off + offsetof(struct iphdr, frag_off)) &
(IP_MF | IP_OFFSET));
}
static __inline__ int flow_parse_ipv4(struct __sk_buff *skb, int nh_off,
__u8 *ip_proto, struct flow_keys *flow)
{
__u8 ip_ver_len;
if (unlikely(flow_is_frag(skb, nh_off)))
*ip_proto = 0;
else
*ip_proto = load_byte(skb, nh_off + offsetof(struct iphdr,
protocol));
if (*ip_proto != IPPROTO_GRE) {
flow->src = load_word(skb, nh_off + offsetof(struct iphdr, saddr));
flow->dst = load_word(skb, nh_off + offsetof(struct iphdr, daddr));
}
ip_ver_len = load_byte(skb, nh_off + 0 /* offsetof(struct iphdr, ihl) */);
if (likely(ip_ver_len == 0x45))
nh_off += 20;
else
nh_off += (ip_ver_len & 0xF) << 2;
return nh_off;
}
static __inline__ __u32 flow_addr_hash_ipv6(struct __sk_buff *skb, int off)
{
__u32 w0 = load_word(skb, off);
__u32 w1 = load_word(skb, off + sizeof(w0));
__u32 w2 = load_word(skb, off + sizeof(w0) * 2);
__u32 w3 = load_word(skb, off + sizeof(w0) * 3);
return w0 ^ w1 ^ w2 ^ w3;
}
static __inline__ int flow_parse_ipv6(struct __sk_buff *skb, int nh_off,
__u8 *ip_proto, struct flow_keys *flow)
{
*ip_proto = load_byte(skb, nh_off + offsetof(struct ipv6hdr, nexthdr));
flow->src = flow_addr_hash_ipv6(skb, nh_off + offsetof(struct ipv6hdr, saddr));
flow->dst = flow_addr_hash_ipv6(skb, nh_off + offsetof(struct ipv6hdr, daddr));
return nh_off + sizeof(struct ipv6hdr);
}
static __inline__ bool flow_dissector(struct __sk_buff *skb,
struct flow_keys *flow)
{
int poff, nh_off = BPF_LL_OFF + ETH_HLEN;
__be16 proto = skb->protocol;
__u8 ip_proto;
/* TODO: check for skb->vlan_tci, skb->vlan_proto first */
if (proto == htons(ETH_P_8021AD)) {
proto = load_half(skb, nh_off +
offsetof(struct vlan_hdr, h_vlan_encapsulated_proto));
nh_off += sizeof(struct vlan_hdr);
}
if (proto == htons(ETH_P_8021Q)) {
proto = load_half(skb, nh_off +
offsetof(struct vlan_hdr, h_vlan_encapsulated_proto));
nh_off += sizeof(struct vlan_hdr);
}
if (likely(proto == htons(ETH_P_IP)))
nh_off = flow_parse_ipv4(skb, nh_off, &ip_proto, flow);
else if (proto == htons(ETH_P_IPV6))
nh_off = flow_parse_ipv6(skb, nh_off, &ip_proto, flow);
else
return false;
switch (ip_proto) {
case IPPROTO_GRE: {
struct gre_hdr {
__be16 flags;
__be16 proto;
};
__u16 gre_flags = load_half(skb, nh_off +
offsetof(struct gre_hdr, flags));
__u16 gre_proto = load_half(skb, nh_off +
offsetof(struct gre_hdr, proto));
if (gre_flags & (GRE_VERSION | GRE_ROUTING))
break;
nh_off += 4;
if (gre_flags & GRE_CSUM)
nh_off += 4;
if (gre_flags & GRE_KEY)
nh_off += 4;
if (gre_flags & GRE_SEQ)
nh_off += 4;
if (gre_proto == ETH_P_8021Q) {
gre_proto = load_half(skb, nh_off +
offsetof(struct vlan_hdr,
h_vlan_encapsulated_proto));
nh_off += sizeof(struct vlan_hdr);
}
if (gre_proto == ETH_P_IP)
nh_off = flow_parse_ipv4(skb, nh_off, &ip_proto, flow);
else if (gre_proto == ETH_P_IPV6)
nh_off = flow_parse_ipv6(skb, nh_off, &ip_proto, flow);
else
return false;
break;
}
case IPPROTO_IPIP:
nh_off = flow_parse_ipv4(skb, nh_off, &ip_proto, flow);
break;
case IPPROTO_IPV6:
nh_off = flow_parse_ipv6(skb, nh_off, &ip_proto, flow);
default:
break;
}
nh_off += flow_ports_offset(ip_proto);
flow->ports = load_word(skb, nh_off);
flow->th_off = nh_off;
flow->ip_proto = ip_proto;
return true;
}
static __inline__ void cls_update_proto_map(const struct __sk_buff *skb,
const struct flow_keys *flow)
{
uint8_t proto = flow->ip_proto;
struct count_tuple *ct, _ct;
ct = map_lookup_elem(&map_proto, &proto);
if (likely(ct)) {
lock_xadd(&ct->packets, 1);
lock_xadd(&ct->bytes, skb->len);
return;
}
/* No hit yet, we need to create a new entry. */
_ct.packets = 1;
_ct.bytes = skb->len;
map_update_elem(&map_proto, &proto, &_ct, BPF_ANY);
}
static __inline__ void cls_update_queue_map(const struct __sk_buff *skb)
{
uint32_t queue = skb->queue_mapping;
struct count_queue *cq, _cq;
bool mismatch;
mismatch = skb->queue_mapping != get_smp_processor_id();
cq = map_lookup_elem(&map_queue, &queue);
if (likely(cq)) {
lock_xadd(&cq->total, 1);
if (mismatch)
lock_xadd(&cq->mismatch, 1);
return;
}
/* No hit yet, we need to create a new entry. */
_cq.total = 1;
_cq.mismatch = mismatch ? 1 : 0;
map_update_elem(&map_queue, &queue, &_cq, BPF_ANY);
}
/* eBPF program definitions, placed in various sections, which can
* have custom section names. If custom names are in use, it's
* required to point tc to the correct section, e.g.
*
* tc filter add [...] bpf obj cls.o sec cls-tos [...]
*
* in case the program resides in __section("cls-tos").
*
* Default section for cls_bpf is: "classifier", for act_bpf is:
* "action". Naturally, if for example multiple actions are present
* in the same file, they need to have distinct section names.
*
* It is however not required to have multiple programs sharing
* a file.
*/
__section("classifier")
int cls_main(struct __sk_buff *skb)
{
struct flow_keys flow;
if (!flow_dissector(skb, &flow))
return 0; /* No match in cls_bpf. */
cls_update_proto_map(skb, &flow);
cls_update_queue_map(skb);
return flow.ip_proto;
}
static __inline__ void act_update_drop_map(void)
{
uint32_t *count, cpu = get_smp_processor_id();
count = map_lookup_elem(&map_drops, &cpu);
if (count)
/* Only this cpu is accessing this element. */
(*count)++;
}
__section("action-mark")
int act_mark_main(struct __sk_buff *skb)
{
/* You could also mangle skb data here with the helper function
* BPF_FUNC_skb_store_bytes, etc. Or, alternatively you could
* do that already in the classifier itself as a merged combination
* of classifier'n'action model.
*/
if (skb->mark == 0xcafe) {
act_update_drop_map();
return TC_ACT_SHOT;
}
/* Default configured tc opcode. */
return TC_ACT_UNSPEC;
}
__section("action-rand")
int act_rand_main(struct __sk_buff *skb)
{
/* Sorry, we're near event horizon ... */
if ((get_prandom_u32() & 3) == 0) {
act_update_drop_map();
return TC_ACT_SHOT;
}
return TC_ACT_UNSPEC;
}
/* Last but not least, the file contains a license. Some future helper
* functions may only be available with a GPL license.
*/
BPF_LICENSE("GPL");

View File

@ -18,13 +18,13 @@
* instance is being created.
*/
struct bpf_elf_map __section_maps map_sh = {
.type = BPF_MAP_TYPE_ARRAY,
.size_key = sizeof(uint32_t),
.size_value = sizeof(uint32_t),
.pinning = PIN_OBJECT_NS, /* or PIN_GLOBAL_NS, or PIN_NONE */
.max_elem = 1,
};
struct {
__uint(type, BPF_MAP_TYPE_ARRAY);
__uint(key_size, sizeof(uint32_t));
__uint(value_size, sizeof(uint32_t));
__uint(max_entries, 1);
__uint(pinning, LIBBPF_PIN_BY_NAME); /* or LIBBPF_PIN_NONE */
} map_sh __section(".maps");
__section("egress")
int emain(struct __sk_buff *skb)

View File

@ -1,22 +0,0 @@
#ifndef __BPF_SHARED__
#define __BPF_SHARED__
enum {
BPF_MAP_ID_PROTO,
BPF_MAP_ID_QUEUE,
BPF_MAP_ID_DROPS,
__BPF_MAP_ID_MAX,
#define BPF_MAP_ID_MAX __BPF_MAP_ID_MAX
};
struct count_tuple {
long packets; /* type long for lock_xadd() */
long bytes;
};
struct count_queue {
long total;
long mismatch;
};
#endif /* __BPF_SHARED__ */

View File

@ -1,23 +0,0 @@
#ifndef __BPF_SYS__
#define __BPF_SYS__
#include <sys/syscall.h>
#include <linux/bpf.h>
static inline __u64 bpf_ptr_to_u64(const void *ptr)
{
return (__u64) (unsigned long) ptr;
}
static inline int bpf_lookup_elem(int fd, void *key, void *value)
{
union bpf_attr attr = {
.map_fd = fd,
.key = bpf_ptr_to_u64(key),
.value = bpf_ptr_to_u64(value),
};
return syscall(__NR_bpf, BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
}
#endif /* __BPF_SYS__ */

View File

@ -1,4 +1,4 @@
#include "../../include/bpf_api.h"
#include "../../../include/bpf_api.h"
/* Cyclic dependency example to test the kernel's runtime upper
* bound on loops. Also demonstrates on how to use direct-actions,

View File

@ -0,0 +1,66 @@
#include "../../../include/bpf_api.h"
/* This example demonstrates how classifier run-time behaviour
* can be altered with tail calls. We start out with an empty
* jmp_tc array, then add section aaa to the array slot 0, and
* later on atomically replace it with section bbb. Note that
* as shown in other examples, the tc loader can prepopulate
* tail called sections, here we start out with an empty one
* on purpose to show it can also be done this way.
*
* tc filter add dev foo parent ffff: bpf obj graft.o
* tc exec bpf dbg
* [...]
* Socket Thread-20229 [001] ..s. 138993.003923: : fallthrough
* <idle>-0 [001] ..s. 138993.202265: : fallthrough
* Socket Thread-20229 [001] ..s. 138994.004149: : fallthrough
* [...]
*
* tc exec bpf graft m:globals/jmp_tc key 0 obj graft.o sec aaa
* tc exec bpf dbg
* [...]
* Socket Thread-19818 [002] ..s. 139012.053587: : aaa
* <idle>-0 [002] ..s. 139012.172359: : aaa
* Socket Thread-19818 [001] ..s. 139012.173556: : aaa
* [...]
*
* tc exec bpf graft m:globals/jmp_tc key 0 obj graft.o sec bbb
* tc exec bpf dbg
* [...]
* Socket Thread-19818 [002] ..s. 139022.102967: : bbb
* <idle>-0 [002] ..s. 139022.155640: : bbb
* Socket Thread-19818 [001] ..s. 139022.156730: : bbb
* [...]
*/
struct bpf_elf_map __section_maps jmp_tc = {
.type = BPF_MAP_TYPE_PROG_ARRAY,
.size_key = sizeof(uint32_t),
.size_value = sizeof(uint32_t),
.pinning = PIN_GLOBAL_NS,
.max_elem = 1,
};
__section("aaa")
int cls_aaa(struct __sk_buff *skb)
{
printt("aaa\n");
return TC_H_MAKE(1, 42);
}
__section("bbb")
int cls_bbb(struct __sk_buff *skb)
{
printt("bbb\n");
return TC_H_MAKE(1, 43);
}
__section_cls_entry
int cls_entry(struct __sk_buff *skb)
{
tail_call(skb, &jmp_tc, 0);
printt("fallthrough\n");
return BPF_H_DEFAULT;
}
BPF_LICENSE("GPL");

View File

@ -0,0 +1,56 @@
#include "../../../include/bpf_api.h"
#define MAP_INNER_ID 42
struct bpf_elf_map __section_maps map_inner = {
.type = BPF_MAP_TYPE_ARRAY,
.size_key = sizeof(uint32_t),
.size_value = sizeof(uint32_t),
.id = MAP_INNER_ID,
.inner_idx = 0,
.pinning = PIN_GLOBAL_NS,
.max_elem = 1,
};
struct bpf_elf_map __section_maps map_outer = {
.type = BPF_MAP_TYPE_ARRAY_OF_MAPS,
.size_key = sizeof(uint32_t),
.size_value = sizeof(uint32_t),
.inner_id = MAP_INNER_ID,
.pinning = PIN_GLOBAL_NS,
.max_elem = 1,
};
__section("egress")
int emain(struct __sk_buff *skb)
{
struct bpf_elf_map *map_inner;
int key = 0, *val;
map_inner = map_lookup_elem(&map_outer, &key);
if (map_inner) {
val = map_lookup_elem(map_inner, &key);
if (val)
lock_xadd(val, 1);
}
return BPF_H_DEFAULT;
}
__section("ingress")
int imain(struct __sk_buff *skb)
{
struct bpf_elf_map *map_inner;
int key = 0, *val;
map_inner = map_lookup_elem(&map_outer, &key);
if (map_inner) {
val = map_lookup_elem(map_inner, &key);
if (val)
printt("map val: %d\n", *val);
}
return BPF_H_DEFAULT;
}
BPF_LICENSE("GPL");

View File

@ -0,0 +1,53 @@
#include "../../../include/bpf_api.h"
/* Minimal, stand-alone toy map pinning example:
*
* clang -target bpf -O2 [...] -o bpf_shared.o -c bpf_shared.c
* tc filter add dev foo parent 1: bpf obj bpf_shared.o sec egress
* tc filter add dev foo parent ffff: bpf obj bpf_shared.o sec ingress
*
* Both classifier will share the very same map instance in this example,
* so map content can be accessed from ingress *and* egress side!
*
* This example has a pinning of PIN_OBJECT_NS, so it's private and
* thus shared among various program sections within the object.
*
* A setting of PIN_GLOBAL_NS would place it into a global namespace,
* so that it can be shared among different object files. A setting
* of PIN_NONE (= 0) means no sharing, so each tc invocation a new map
* instance is being created.
*/
struct bpf_elf_map __section_maps map_sh = {
.type = BPF_MAP_TYPE_ARRAY,
.size_key = sizeof(uint32_t),
.size_value = sizeof(uint32_t),
.pinning = PIN_OBJECT_NS, /* or PIN_GLOBAL_NS, or PIN_NONE */
.max_elem = 1,
};
__section("egress")
int emain(struct __sk_buff *skb)
{
int key = 0, *val;
val = map_lookup_elem(&map_sh, &key);
if (val)
lock_xadd(val, 1);
return BPF_H_DEFAULT;
}
__section("ingress")
int imain(struct __sk_buff *skb)
{
int key = 0, *val;
val = map_lookup_elem(&map_sh, &key);
if (val)
printt("map val: %d\n", *val);
return BPF_H_DEFAULT;
}
BPF_LICENSE("GPL");

View File

@ -1,4 +1,5 @@
#include "../../include/bpf_api.h"
/* SPDX-License-Identifier: GPL-2.0 */
#include "../../../include/bpf_api.h"
#define ENTRY_INIT 3
#define ENTRY_0 0

View File

@ -1,983 +0,0 @@
#!/bin/bash
#
# cbq.init v0.7.3
# Copyright (C) 1999 Pavel Golubev <pg@ksi-linux.com>
# Copyright (C) 2001-2004 Lubomir Bulej <pallas@kadan.cz>
#
# chkconfig: 2345 11 89
# description: sets up CBQ-based traffic control
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, see <http://www.gnu.org/licenses/>.
#
# To get the latest version, check on Freshmeat for actual location:
#
# http://freshmeat.net/projects/cbq.init
#
#
# VERSION HISTORY
# ---------------
# v0.7.3- Deepak Singhal <singhal at users.sourceforge.net>
# - fix timecheck to not ignore regular TIME rules after
# encountering a TIME rule that spans over midnight
# - Nathan Shafer <nicodemus at users.sourceforge.net>
# - allow symlinks to class files
# - Seth J. Blank <antifreeze at users.sourceforge.net>
# - replace hardcoded ip/tc location with variables
# - Mark Davis <mark.davis at gmx.de>
# - allow setting of PRIO_{MARK,RULE,REALM} in class file
# - Fernando Sanch <toptnc at users.sourceforge.net>
# - allow underscores in interface names
# v0.7.2- Paulo Sedrez
# - fix time2abs to allow hours with leading zero in TIME rules
# - Svetlin Simeonov <zvero at yahoo.com>
# - fix cbq_device_list to allow VLAN interfaces
# - Mark Davis <mark.davis at gmx.de>
# - ignore *~ backup files when looking for classes
# - Mike Boyer <boyer at administrative.com>
# - fix to allow arguments to be passed to "restart" command
# v0.7.1- Lubomir Bulej <pallas at kadan.cz>
# - default value for PERTURB
# - fixed small bug in RULE parser to correctly parse rules with
# identical source and destination fields
# - faster initial scanning of DEVICE fields
# v0.7 - Lubomir Bulej <pallas at kadan.cz>
# - lots of various cleanups and reorganizations; the parsing is now
# some 40% faster, but the class ID must be in range 0x0002-0xffff
# (again). Because of the number of internal changes and the above
# class ID restriction, I bumped the version to 0.7 to indicate
# something might have got broken :)
# - changed PRIO_{U32,FW,ROUTE} to PRIO_{RULE,MARK,REALM}
# for consistency with filter keywords
# - exposed "compile" command
# - Catalin Petrescu <taz at dntis.ro>
# - support for port masks in RULE (u32) filter
# - Jordan Vrtanoski <obeliks at mt.net.mk>
# - support for week days in TIME rules
# v0.6.4- Lubomir Bulej <pallas at kadan.cz>
# - added PRIO_* variables to allow easy control of filter priorities
# - added caching to speed up CBQ start, the cache is invalidated
# whenever any of the configuration files changes
# - updated the readme section + some cosmetic fixes
# v0.6.3- Lubomir Bulej <pallas at kadan.cz>
# - removed setup of (unnecessary) class 1:1 - all classes
# now use qdisc's default class 1:0 as their parent
# - minor fix in the timecheck branch - classes
# without leaf qdisc were not updated
# - minor fix to avoid timecheck failure when run
# at time with minutes equal to 08 or 09
# - respect CBQ_PATH setting in environment
# - made PRIO=5 default, rendering it optional in configs
# - added support for route filter, see notes about REALM keyword
# - added support for fw filter, see notes about MARK keyword
# - added filter display to "list" and "stats" commands
# - readme section update + various cosmetic fixes
# v0.6.2- Catalin Petrescu <taz at dntis.ro>
# - added tunnels interface handling
# v0.6.1- Pavel Golubev <pg at ksi-linux.com>
# - added sch_prio module loading
# (thanks johan at iglo.virtual.or.id for reminding)
# - resolved errors resulting from stricter syntax checking in bash2
# - Lubomir Bulej <pallas at kadan.cz>
# - various cosmetic fixes
# v0.6 - Lubomir Bulej <pallas at kadan.cz>
# - attempt to limit number of spawned processes by utilizing
# more of sed power (use sed instead of grep+cut)
# - simplified TIME parser, using bash builtins
# - added initial support for SFQ as leaf qdisc
# - reworked the documentation part a little
# - incorporated pending patches and ideas submitted by
# following people for versions 0.3 into version 0.6
# - Miguel Freitas <miguel at cetuc.puc-rio.br>
# - in case of overlapping TIME parameters, the last match is taken
# - Juanjo Ciarlante <jjo at mendoza.gov.ar>
# - chkconfig tags, list + stats startup parameters
# - optional tc & ip command logging (into /var/run/cbq-*)
# - Rafal Maszkowski <rzm at icm.edu.pl>
# - PEAK parameter for setting TBF's burst peak rate
# - fix for many config files (use find instead of ls)
# v0.5.1- Lubomir Bulej <pallas at kadan.cz>
# - fixed little but serious bug in RULE parser
# v0.5 - Lubomir Bulej <pallas at kadan.cz>
# - added options PARENT, LEAF, ISOLATED and BOUNDED. This allows
# (with some attention to config file ordering) for creating
# hierarchical structures of shapers with classes able (or unable)
# to borrow bandwidth from their parents.
# - class ID check allows hexadecimal numbers
# - rewritten & simplified RULE parser
# - cosmetic changes to improve readability
# - reorganization to avoid duplicate code (timecheck etc.)
# - timecheck doesn't check classes without TIME fields anymore
# v0.4 - Lubomir Bulej <pallas at kadan.cz>
# - small bugfix in RULE parsing code
# - simplified configuration parsing code
# - several small cosmetic changes
# - TIME parameter can be now specified more than once allowing you to
# differentiate RATE throughout the whole day. Time overlapping is
# not checked, first match is taken. Midnight wrap (eg. 20:00-6:00)
# is allowed and taken care of.
# v0.3a4- fixed small bug in IF operator. Thanks to
# Rafal Maszkowski <rzm at icm.edu.pl>
# v0.3a3- fixed grep bug when using more than 10 eth devices. Thanks to David
# Trcka <trcka at poda.cz>.
# v0.3a2- fixed bug in "if" operator. Thanks kad at dgtu.donetsk.ua.
# v0.3a - added TIME parameter. Example: TIME=00:00-19:00;64Kbit/6Kbit
# So, between 00:00 and 19:00 the RATE will be 64Kbit.
# Just start "cbq.init timecheck" periodically from cron
# (every 10 minutes for example). DON'T FORGET though, to run
# "cbq.init start" for CBQ to initialize.
# v0.2 - Some cosmetic changes. Now it is more compatible with old bash
# version. Thanks to Stanislav V. Voronyi <stas at cnti.uanet.kharkov.ua>.
# v0.1 - First public release
#
#
# README
# ------
#
# First of all - this is just a SIMPLE EXAMPLE of CBQ power.
# Don't ask me "why" and "how" :)
#
# This script is meant to simplify setup and management of relatively simple
# CBQ-based traffic control on Linux. Access to advanced networking features
# of Linux kernel is provided by "ip" and "tc" utilities from A. Kuznetsov's
# iproute2 package, available at ftp://ftp.inr.ac.ru/ip-routing. Because the
# utilities serve primarily to translate user wishes to RTNETLINK commands,
# their interface is rather spartan, intolerant and requires quite a lot of
# typing. And typing is what this script attempts to reduce :)
#
# The advanced networking stuff in Linux is pretty flexible and this script
# aims to bring some of its features to the not-so-hard-core Linux users. Of
# course, there is a tradeoff between simplicity and flexibility and you may
# realize that the flexibility suffered too much for your needs -- time to
# face "ip" and "tc" interface.
#
# To speed up the "start" command, simple caching was introduced in version
# 0.6.4. The caching works so that the sequence of "tc" commands for given
# configuration is stored in a file (/var/cache/cbq.init by default) which
# is used next time the "start" command is run to avoid repeated parsing of
# configuration files. This cache is invalidated whenever any of the CBQ
# configuration files changes. If you want to run "cbq.init start" without
# caching, run it as "cbq.init start nocache". If you want to force cache
# invalidation, run it as "cbq.init start invalidate". Caching is disabled
# if you have logging enabled (ie. CBQ_DEBUG is not empty).
#
# If you only want cqb.init to translate your configuration to "tc" commands,
# use "compile" command which will output "tc" commands required to build
# your configuration. Bear in mind that "compile" does not check if the "tc"
# commands were successful - this is done (in certain places) only when the
# "start nocache" command is used, which is also useful when creating the
# configuration to check whether it is completely valid.
#
# All CBQ parameters are valid for Ethernet interfaces only, The script was
# tested on various Linux kernel versions from series 2.1 to 2.4 and several
# distributions with KSI Linux (Nostromo version) as the premier one.
#
#
# HOW DOES IT WORK?
# -----------------
#
# Every traffic class must be described by a file in the $CBQ_PATH directory
# (/etc/sysconfig/cbq by default) - one file per class.
#
# The config file names must obey mandatory format: cbq-<clsid>.<name> where
# <clsid> is two-byte hexadecimal number in range <0002-FFFF> (which in fact
# is a CBQ class ID) and <name> is the name of the class -- anything to help
# you distinguish the configuration files. For small amount of classes it is
# often possible (and convenient) to let <clsid> resemble bandwidth of the
# class.
#
# Example of valid config name:
# cbq-1280.My_first_shaper
#
#
# The configuration file may contain the following parameters:
#
### Device parameters
#
# DEVICE=<ifname>,<bandwidth>[,<weight>] mandatory
# DEVICE=eth0,10Mbit,1Mbit
#
# <ifname> is the name of the interface you want to control
# traffic on, e.g. eth0
# <bandwidth> is the physical bandwidth of the device, e.g. for
# ethernet 10Mbit or 100Mbit, for arcnet 2Mbit
# <weight> is tuning parameter that should be proportional to
# <bandwidth>. As a rule of thumb: <weight> = <bandwidth> / 10
#
# When you have more classes on one interface, it is enough to specify
# <bandwidth> [and <weight>] only once, therefore in other files you only
# need to set DEVICE=<ifname>.
#
### Class parameters
#
# RATE=<speed> mandatory
# RATE=5Mbit
#
# Bandwidth allocated to the class. Traffic going through the class is
# shaped to conform to specified rate. You can use Kbit, Mbit or bps,
# Kbps and Mbps as suffices. If you don't specify any unit, bits/sec
# are used. Also note that "bps" means "bytes per second", not bits.
#
# WEIGHT=<speed> mandatory
# WEIGHT=500Kbit
#
# Tuning parameter that should be proportional to RATE. As a rule
# of thumb, use WEIGHT ~= RATE / 10.
#
# PRIO=<1-8> optional, default 5
# PRIO=5
#
# Priority of class traffic. The higher the number, the lesser
# the priority. Priority of 5 is just fine.
#
# PARENT=<clsid> optional, default not set
# PARENT=1280
#
# Specifies ID of the parent class to which you want this class be
# attached. You might want to use LEAF=none for the parent class as
# mentioned below. By using this parameter and carefully ordering the
# configuration files, it is possible to create simple hierarchical
# structures of CBQ classes. The ordering is important so that parent
# classes are constructed prior to their children.
#
# LEAF=none|tbf|sfq optional, default "tbf"
#
# Tells the script to attach specified leaf queueing discipline to CBQ
# class. By default, TBF is used. Note that attaching TBF to CBQ class
# shapes the traffic to conform to TBF parameters and prevents the class
# from borrowing bandwidth from its parent even if you have BOUNDED set
# to "no". To allow the class to borrow bandwith (provided it is not
# bounded), you must set LEAF to "none" or "sfq".
#
# If you want to ensure (approximately) fair sharing of bandwidth among
# several hosts in the same class, you might want to specify LEAF=sfq to
# attach SFQ as leaf queueing discipline to that class.
#
# BOUNDED=yes|no optional, default "yes"
#
# If set to "yes", the class is not allowed to borrow bandwidth from
# its parent class in overlimit situation. If set to "no", the class
# will be allowed to borrow bandwidth from its parent.
#
# Note: Don't forget to set LEAF to "none" or "sfq", otherwise the class will
# have TBF attached to itself and will not be able to borrow unused
# bandwith from its parent.
#
# ISOLATED=yes|no optional, default "no"
#
# If set to "yes", the class will not lend unused bandwidth to
# its children.
#
### TBF qdisc parameters
#
# BUFFER=<bytes>[/<bytes>] optional, default "10Kb/8"
#
# This parameter controls the depth of the token bucket. In other
# words it represents the maximal burst size the class can send.
# The optional part of parameter is used to determine the length
# of intervals in packet sizes, for which the transmission times
# are kept.
#
# LIMIT=<bytes> optional, default "15Kb"
#
# This parameter determines the maximal length of backlog. If
# the queue contains more data than specified by LIMIT, the
# newly arriving packets are dropped. The length of backlog
# determines queue latency in case of congestion.
#
# PEAK=<speed> optional, default not set
#
# Maximal peak rate for short-term burst traffic. This allows you
# to control the absolute peak rate the class can send at, because
# single TBF that allows 256Kbit/s would of course allow rate of
# 512Kbit for half a second or 1Mbit for a quarter of second.
#
# MTU=<bytes> optional, default "1500"
#
# Maximum number of bytes that can be sent at once over the
# physical medium. This parameter is required when you specify
# PEAK parameter. It defaults to MTU of ethernet - for other
# media types you might want to change it.
#
# Note: Setting TBF as leaf qdisc will effectively prevent the class from
# borrowing bandwidth from the ancestor class, because even if the
# class allows more traffic to pass through, it is then shaped to
# conform to TBF.
#
### SFQ qdisc parameters
#
# The SFQ queueing discipline is a cheap way for sharing class bandwidth
# among several hosts. As it is stochastic, the fairness is approximate but
# it will do the job in most cases. If you want real fairness, you should
# probably use WRR (weighted round robin) or WFQ queueing disciplines. Note
# that SFQ does not do any traffic shaping - the shaping is done by the CBQ
# class the SFQ is attached to.
#
# QUANTUM=<bytes> optional, default not set
#
# This parameter should not be set lower than link MTU, for ethernet
# it is 1500b, or (with MAC header) 1514b which is the value used
# in Alexey Kuznetsov's examples.
#
# PERTURB=<seconds> optional, default "10"
#
# Period of hash function perturbation. If unset, hash reconfiguration
# will never take place which is what you probably don't want. The
# default value of 10 seconds is probably a good one.
#
### Filter parameters
#
# RULE=[[saddr[/prefix]][:port[/mask]],][daddr[/prefix]][:port[/mask]]
#
# These parameters make up "u32" filter rules that select traffic for
# each of the classes. You can use multiple RULE fields per config.
#
# The optional port mask should only be used by advanced users who
# understand how the u32 filter works.
#
# Some examples:
#
# RULE=10.1.1.0/24:80
# selects traffic going to port 80 in network 10.1.1.0
#
# RULE=10.2.2.5
# selects traffic going to any port on single host 10.2.2.5
#
# RULE=10.2.2.5:20/0xfffe
# selects traffic going to ports 20 and 21 on host 10.2.2.5
#
# RULE=:25,10.2.2.128/26:5000
# selects traffic going from anywhere on port 50 to
# port 5000 in network 10.2.2.128
#
# RULE=10.5.5.5:80,
# selects traffic going from port 80 of single host 10.5.5.5
#
#
#
# REALM=[srealm,][drealm]
#
# These parameters make up "route" filter rules that classify traffic
# according to packet source/destination realms. For information about
# realms, see Alexey Kuznetsov's IP Command Reference. This script
# does not define any realms, it justs builds "tc filter" commands
# for you if you need to classify traffic this way.
#
# Realm is either a decimal number or a string referencing entry in
# /etc/iproute2/rt_realms (usually).
#
# Some examples:
#
# REALM=russia,internet
# selects traffic going from realm "russia" to realm "internet"
#
# REALM=freenet,
# selects traffic going from realm "freenet"
#
# REALM=10
# selects traffic going to realm 10
#
#
#
# MARK=<mark>
#
# These parameters make up "fw" filter rules that select traffic for
# each of the classes accoring to firewall "mark". Mark is a decimal
# number packets are tagged with if firewall rules say so. You can
# use multiple MARK fields per config.
#
#
# Note: Rules for different filter types can be combined. Attention must be
# paid to the priority of filter rules, which can be set below using
# PRIO_{RULE,MARK,REALM} variables.
#
### Time ranging parameters
#
# TIME=[<dow>,<dow>, ...,<dow>/]<from>-<till>;<rate>/<weight>[/<peak>]
# TIME=0,1,2,5/18:00-06:00;256Kbit/25Kbit
# TIME=60123/18:00-06:00;256Kbit/25Kbit
# TIME=18:00-06:00;256Kbit/25Kbit
#
# This parameter allows you to differentiate the class bandwidth
# throughout the day. You can specify multiple TIME parameters, if
# the times overlap, last match is taken. The fields <rate>, <weight>
# and <peak> correspond to parameters RATE, WEIGHT and PEAK (which
# is optional and applies to TBF leaf qdisc only).
#
# You can also specify days of week when the TIME rule applies. <dow>
# is numeric, 0 corresponds to sunday, 1 corresponds to monday, etc.
#
###
#
# Sample configuration file: cbq-1280.My_first_shaper
#
# --------------------------------------------------------------------------
# DEVICE=eth0,10Mbit,1Mbit
# RATE=128Kbit
# WEIGHT=10Kbit
# PRIO=5
# RULE=192.128.1.0/24
# --------------------------------------------------------------------------
#
# The configuration says that we will control traffic on 10Mbit ethernet
# device eth0 and the traffic going to network 192.168.1.0 will be
# processed with priority 5 and shaped to rate of 128Kbit.
#
# Note that you can control outgoing traffic only. If you want to control
# traffic in both directions, you must set up CBQ for both interfaces.
#
# Consider the following example:
#
# +---------+ 192.168.1.1
# BACKBONE -----eth0-| linux |-eth1------*-[client]
# +---------+
#
# Imagine you want to shape traffic from backbone to the client to 28Kbit
# and traffic in the opposite direction to 128Kbit. You need to setup CBQ
# on both eth0 and eth1 interfaces, thus you need two config files:
#
# cbq-028.backbone-client
# --------------------------------------------------------------------------
# DEVICE=eth1,10Mbit,1Mbit
# RATE=28Kbit
# WEIGHT=2Kbit
# PRIO=5
# RULE=192.168.1.1
# --------------------------------------------------------------------------
#
# cbq-128.client-backbone
# --------------------------------------------------------------------------
# DEVICE=eth0,10Mbit,1Mbit
# RATE=128Kbit
# WEIGHT=10Kbit
# PRIO=5
# RULE=192.168.1.1,
# --------------------------------------------------------------------------
#
# Pay attention to comma "," in the RULE field - it denotes source address!
#
# Enjoy.
#
#############################################################################
export LC_ALL=C
### Command locations
TC=/sbin/tc
IP=/sbin/ip
MP=/sbin/modprobe
### Default filter priorities (must be different)
PRIO_RULE_DEFAULT=${PRIO_RULE:-100}
PRIO_MARK_DEFAULT=${PRIO_MARK:-200}
PRIO_REALM_DEFAULT=${PRIO_REALM:-300}
### Default CBQ_PATH & CBQ_CACHE settings
CBQ_PATH=${CBQ_PATH:-/etc/sysconfig/cbq}
CBQ_CACHE=${CBQ_CACHE:-/var/cache/cbq.init}
### Uncomment to enable logfile for debugging
#CBQ_DEBUG="/var/run/cbq-$1"
### Modules to probe for. Uncomment the last CBQ_PROBE
### line if you have QoS support compiled into kernel
CBQ_PROBE="sch_cbq sch_tbf sch_sfq sch_prio"
CBQ_PROBE="$CBQ_PROBE cls_fw cls_u32 cls_route"
#CBQ_PROBE=""
### Keywords required for qdisc & class configuration
CBQ_WORDS="DEVICE|RATE|WEIGHT|PRIO|PARENT|LEAF|BOUNDED|ISOLATED"
CBQ_WORDS="$CBQ_WORDS|PRIO_MARK|PRIO_RULE|PRIO_REALM|BUFFER"
CBQ_WORDS="$CBQ_WORDS|LIMIT|PEAK|MTU|QUANTUM|PERTURB"
### Source AVPKT if it exists
[ -r /etc/sysconfig/cbq/avpkt ] && . /etc/sysconfig/cbq/avpkt
AVPKT=${AVPKT:-3000}
#############################################################################
############################# SUPPORT FUNCTIONS #############################
#############################################################################
### Get list of network devices
cbq_device_list () {
ip link show| sed -n "/^[0-9]/ \
{ s/^[0-9]\+: \([a-z0-9._]\+\)[:@].*/\1/; p; }"
} # cbq_device_list
### Remove root class from device $1
cbq_device_off () {
tc qdisc del dev $1 root 2> /dev/null
} # cbq_device_off
### Remove CBQ from all devices
cbq_off () {
for dev in `cbq_device_list`; do
cbq_device_off $dev
done
} # cbq_off
### Prefixed message
cbq_message () {
echo -e "**CBQ: $@"
} # cbq_message
### Failure message
cbq_failure () {
cbq_message "$@"
exit 1
} # cbq_failure
### Failure w/ cbq-off
cbq_fail_off () {
cbq_message "$@"
cbq_off
exit 1
} # cbq_fail_off
### Convert time to absolute value
cbq_time2abs () {
local min=${1##*:}; min=${min##0}
local hrs=${1%%:*}; hrs=${hrs##0}
echo $[hrs*60 + min]
} # cbq_time2abs
### Display CBQ setup
cbq_show () {
for dev in `cbq_device_list`; do
[ `tc qdisc show dev $dev| wc -l` -eq 0 ] && continue
echo -e "### $dev: queueing disciplines\n"
tc $1 qdisc show dev $dev; echo
[ `tc class show dev $dev| wc -l` -eq 0 ] && continue
echo -e "### $dev: traffic classes\n"
tc $1 class show dev $dev; echo
[ `tc filter show dev $dev| wc -l` -eq 0 ] && continue
echo -e "### $dev: filtering rules\n"
tc $1 filter show dev $dev; echo
done
} # cbq_show
### Check configuration and load DEVICES, DEVFIELDS and CLASSLIST from $1
cbq_init () {
### Get a list of configured classes
CLASSLIST=`find $1 -maxdepth 1 \( -type f -or -type l \) -name 'cbq-*' \
-not -name '*~' -printf "%f\n"| sort`
[ -z "$CLASSLIST" ] &&
cbq_failure "no configuration files found in $1!"
### Gather all DEVICE fields from $1/cbq-*
DEVFIELDS=`find $1 -maxdepth 1 \( -type f -or -type l \) -name 'cbq-*' \
-not -name '*~' | xargs sed -n 's/#.*//; \
s/[[:space:]]//g; /^DEVICE=[^,]*,[^,]*\(,[^,]*\)\?/ \
{ s/.*=//; p; }'| sort -u`
[ -z "$DEVFIELDS" ] &&
cbq_failure "no DEVICE field found in $1/cbq-*!"
### Check for different DEVICE fields for the same device
DEVICES=`echo "$DEVFIELDS"| sed 's/,.*//'| sort -u`
[ `echo "$DEVICES"| wc -l` -ne `echo "$DEVFIELDS"| wc -l` ] &&
cbq_failure "different DEVICE fields for single device!\n$DEVFIELDS"
} # cbq_init
### Load class configuration from $1/$2
cbq_load_class () {
CLASS=`echo $2| sed 's/^cbq-0*//; s/^\([0-9a-fA-F]\+\).*/\1/'`
CFILE=`sed -n 's/#.*//; s/[[:space:]]//g; /^[[:alnum:]_]\+=[[:alnum:].,:;/*@-_]\+$/ p' $1/$2`
### Check class number
IDVAL=`/usr/bin/printf "%d" 0x$CLASS 2> /dev/null`
[ $? -ne 0 -o $IDVAL -lt 2 -o $IDVAL -gt 65535 ] &&
cbq_fail_off "class ID of $2 must be in range <0002-FFFF>!"
### Set defaults & load class
RATE=""; WEIGHT=""; PARENT=""; PRIO=5
LEAF=tbf; BOUNDED=yes; ISOLATED=no
BUFFER=10Kb/8; LIMIT=15Kb; MTU=1500
PEAK=""; PERTURB=10; QUANTUM=""
PRIO_RULE=$PRIO_RULE_DEFAULT
PRIO_MARK=$PRIO_MARK_DEFAULT
PRIO_REALM=$PRIO_REALM_DEFAULT
eval `echo "$CFILE"| grep -E "^($CBQ_WORDS)="`
### Require RATE/WEIGHT
[ -z "$RATE" -o -z "$WEIGHT" ] &&
cbq_fail_off "missing RATE or WEIGHT in $2!"
### Class device
DEVICE=${DEVICE%%,*}
[ -z "$DEVICE" ] && cbq_fail_off "missing DEVICE field in $2!"
BANDWIDTH=`echo "$DEVFIELDS"| sed -n "/^$DEVICE,/ \
{ s/[^,]*,\([^,]*\).*/\1/; p; q; }"`
### Convert to "tc" options
PEAK=${PEAK:+peakrate $PEAK}
PERTURB=${PERTURB:+perturb $PERTURB}
QUANTUM=${QUANTUM:+quantum $QUANTUM}
[ "$BOUNDED" = "no" ] && BOUNDED="" || BOUNDED="bounded"
[ "$ISOLATED" = "yes" ] && ISOLATED="isolated" || ISOLATED=""
} # cbq_load_class
#############################################################################
#################################### INIT ###################################
#############################################################################
### Check for presence of ip-route2 in usual place
[ -x $TC -a -x $IP ] ||
cbq_failure "ip-route2 utilities not installed or executable!"
### ip/tc wrappers
if [ "$1" = "compile" ]; then
### no module probing
CBQ_PROBE=""
ip () {
$IP "$@"
} # ip
### echo-only version of "tc" command
tc () {
echo "$TC $@"
} # tc
elif [ -n "$CBQ_DEBUG" ]; then
echo -e "# `date`" > $CBQ_DEBUG
### Logging version of "ip" command
ip () {
echo -e "\n# ip $@" >> $CBQ_DEBUG
$IP "$@" 2>&1 | tee -a $CBQ_DEBUG
} # ip
### Logging version of "tc" command
tc () {
echo -e "\n# tc $@" >> $CBQ_DEBUG
$TC "$@" 2>&1 | tee -a $CBQ_DEBUG
} # tc
else
### Default wrappers
ip () {
$IP "$@"
} # ip
tc () {
$TC "$@"
} # tc
fi # ip/tc wrappers
case "$1" in
#############################################################################
############################### START/COMPILE ###############################
#############################################################################
start|compile)
### Probe QoS modules (start only)
for module in $CBQ_PROBE; do
$MP $module || cbq_failure "failed to load module $module"
done
### If we are in compile/nocache/logging mode, don't bother with cache
if [ "$1" != "compile" -a "$2" != "nocache" -a -z "$CBQ_DEBUG" ]; then
VALID=1
### validate the cache
[ "$2" = "invalidate" -o ! -f $CBQ_CACHE ] && VALID=0
if [ $VALID -eq 1 ]; then
[ `find $CBQ_PATH -maxdepth 1 -newer $CBQ_CACHE| \
wc -l` -gt 0 ] && VALID=0
fi
### compile the config if the cache is invalid
if [ $VALID -ne 1 ]; then
$0 compile > $CBQ_CACHE ||
cbq_fail_off "failed to compile CBQ configuration!"
fi
### run the cached commands
exec /bin/sh $CBQ_CACHE 2> /dev/null
fi
### Load DEVICES, DEVFIELDS and CLASSLIST
cbq_init $CBQ_PATH
### Setup root qdisc on all configured devices
for dev in $DEVICES; do
### Retrieve device bandwidth and, optionally, weight
DEVTEMP=`echo "$DEVFIELDS"| sed -n "/^$dev,/ { s/$dev,//; p; q; }"`
DEVBWDT=${DEVTEMP%%,*}; DEVWGHT=${DEVTEMP##*,}
[ "$DEVBWDT" = "$DEVWGHT" ] && DEVWGHT=""
### Device bandwidth is required
if [ -z "$DEVBWDT" ]; then
cbq_message "could not determine bandwidth for device $dev!"
cbq_failure "please set up the DEVICE fields properly!"
fi
### Check if the device is there
ip link show $dev &> /dev/null ||
cbq_fail_off "device $dev not found!"
### Remove old root qdisc from device
cbq_device_off $dev
### Setup root qdisc + class for device
tc qdisc add dev $dev root handle 1 cbq \
bandwidth $DEVBWDT avpkt $AVPKT cell 8
### Set weight of the root class if set
[ -n "$DEVWGHT" ] &&
tc class change dev $dev root cbq weight $DEVWGHT allot 1514
[ "$1" = "compile" ] && echo
done # dev
### Setup traffic classes
for classfile in $CLASSLIST; do
cbq_load_class $CBQ_PATH $classfile
### Create the class
tc class add dev $DEVICE parent 1:$PARENT classid 1:$CLASS cbq \
bandwidth $BANDWIDTH rate $RATE weight $WEIGHT prio $PRIO \
allot 1514 cell 8 maxburst 20 avpkt $AVPKT $BOUNDED $ISOLATED ||
cbq_fail_off "failed to add class $CLASS with parent $PARENT on $DEVICE!"
### Create leaf qdisc if set
if [ "$LEAF" = "tbf" ]; then
tc qdisc add dev $DEVICE parent 1:$CLASS handle $CLASS tbf \
rate $RATE buffer $BUFFER limit $LIMIT mtu $MTU $PEAK
elif [ "$LEAF" = "sfq" ]; then
tc qdisc add dev $DEVICE parent 1:$CLASS handle $CLASS sfq \
$PERTURB $QUANTUM
fi
### Create fw filter for MARK fields
for mark in `echo "$CFILE"| sed -n '/^MARK/ { s/.*=//; p; }'`; do
### Attach fw filter to root class
tc filter add dev $DEVICE parent 1:0 protocol ip \
prio $PRIO_MARK handle $mark fw classid 1:$CLASS
done ### mark
### Create route filter for REALM fields
for realm in `echo "$CFILE"| sed -n '/^REALM/ { s/.*=//; p; }'`; do
### Split realm into source & destination realms
SREALM=${realm%%,*}; DREALM=${realm##*,}
[ "$SREALM" = "$DREALM" ] && SREALM=""
### Convert asterisks to empty strings
SREALM=${SREALM#\*}; DREALM=${DREALM#\*}
### Attach route filter to the root class
tc filter add dev $DEVICE parent 1:0 protocol ip \
prio $PRIO_REALM route ${SREALM:+from $SREALM} \
${DREALM:+to $DREALM} classid 1:$CLASS
done ### realm
### Create u32 filter for RULE fields
for rule in `echo "$CFILE"| sed -n '/^RULE/ { s/.*=//; p; }'`; do
### Split rule into source & destination
SRC=${rule%%,*}; DST=${rule##*,}
[ "$SRC" = "$rule" ] && SRC=""
### Split destination into address, port & mask fields
DADDR=${DST%%:*}; DTEMP=${DST##*:}
[ "$DADDR" = "$DST" ] && DTEMP=""
DPORT=${DTEMP%%/*}; DMASK=${DTEMP##*/}
[ "$DPORT" = "$DTEMP" ] && DMASK="0xffff"
### Split up source (if specified)
SADDR=""; SPORT=""
if [ -n "$SRC" ]; then
SADDR=${SRC%%:*}; STEMP=${SRC##*:}
[ "$SADDR" = "$SRC" ] && STEMP=""
SPORT=${STEMP%%/*}; SMASK=${STEMP##*/}
[ "$SPORT" = "$STEMP" ] && SMASK="0xffff"
fi
### Convert asterisks to empty strings
SADDR=${SADDR#\*}; DADDR=${DADDR#\*}
### Compose u32 filter rules
u32_s="${SPORT:+match ip sport $SPORT $SMASK}"
u32_s="${SADDR:+match ip src $SADDR} $u32_s"
u32_d="${DPORT:+match ip dport $DPORT $DMASK}"
u32_d="${DADDR:+match ip dst $DADDR} $u32_d"
### Uncomment the following if you want to see parsed rules
#echo "$rule: $u32_s $u32_d"
### Attach u32 filter to the appropriate class
tc filter add dev $DEVICE parent 1:0 protocol ip \
prio $PRIO_RULE u32 $u32_s $u32_d classid 1:$CLASS
done ### rule
[ "$1" = "compile" ] && echo
done ### classfile
;;
#############################################################################
################################# TIME CHECK ################################
#############################################################################
timecheck)
### Get time + weekday
TIME_TMP=`date +%w/%k:%M`
TIME_DOW=${TIME_TMP%%/*}
TIME_NOW=${TIME_TMP##*/}
### Load DEVICES, DEVFIELDS and CLASSLIST
cbq_init $CBQ_PATH
### Run through all classes
for classfile in $CLASSLIST; do
### Gather all TIME rules from class config
TIMESET=`sed -n 's/#.*//; s/[[:space:]]//g; /^TIME/ { s/.*=//; p; }' \
$CBQ_PATH/$classfile`
[ -z "$TIMESET" ] && continue
MATCH=0; CHANGE=0
for timerule in $TIMESET; do
TIME_ABS=`cbq_time2abs $TIME_NOW`
### Split TIME rule to pieces
TIMESPEC=${timerule%%;*}; PARAMS=${timerule##*;}
WEEKDAYS=${TIMESPEC%%/*}; INTERVAL=${TIMESPEC##*/}
BEG_TIME=${INTERVAL%%-*}; END_TIME=${INTERVAL##*-}
### Check the day-of-week (if present)
[ "$WEEKDAYS" != "$INTERVAL" -a \
-n "${WEEKDAYS##*$TIME_DOW*}" ] && continue
### Compute interval boundaries
BEG_ABS=`cbq_time2abs $BEG_TIME`
END_ABS=`cbq_time2abs $END_TIME`
### Midnight wrap fixup
if [ $BEG_ABS -gt $END_ABS ]; then
[ $TIME_ABS -le $END_ABS ] &&
TIME_ABS=$[TIME_ABS + 24*60]
END_ABS=$[END_ABS + 24*60]
fi
### If the time matches, remember params and set MATCH flag
if [ $TIME_ABS -ge $BEG_ABS -a $TIME_ABS -lt $END_ABS ]; then
TMP_RATE=${PARAMS%%/*}; PARAMS=${PARAMS#*/}
TMP_WGHT=${PARAMS%%/*}; TMP_PEAK=${PARAMS##*/}
[ "$TMP_PEAK" = "$TMP_WGHT" ] && TMP_PEAK=""
TMP_PEAK=${TMP_PEAK:+peakrate $TMP_PEAK}
MATCH=1
fi
done ### timerule
cbq_load_class $CBQ_PATH $classfile
### Get current RATE of CBQ class
RATE_NOW=`tc class show dev $DEVICE| sed -n \
"/cbq 1:$CLASS / { s/.*rate //; s/ .*//; p; q; }"`
[ -z "$RATE_NOW" ] && continue
### Time interval matched
if [ $MATCH -ne 0 ]; then
### Check if there is any change in class RATE
if [ "$RATE_NOW" != "$TMP_RATE" ]; then
NEW_RATE="$TMP_RATE"
NEW_WGHT="$TMP_WGHT"
NEW_PEAK="$TMP_PEAK"
CHANGE=1
fi
### Match not found, reset to default RATE if necessary
elif [ "$RATE_NOW" != "$RATE" ]; then
NEW_WGHT="$WEIGHT"
NEW_RATE="$RATE"
NEW_PEAK="$PEAK"
CHANGE=1
fi
### If there are no changes, go for next class
[ $CHANGE -eq 0 ] && continue
### Replace CBQ class
tc class replace dev $DEVICE classid 1:$CLASS cbq \
bandwidth $BANDWIDTH rate $NEW_RATE weight $NEW_WGHT prio $PRIO \
allot 1514 cell 8 maxburst 20 avpkt $AVPKT $BOUNDED $ISOLATED
### Replace leaf qdisc (if any)
if [ "$LEAF" = "tbf" ]; then
tc qdisc replace dev $DEVICE handle $CLASS tbf \
rate $NEW_RATE buffer $BUFFER limit $LIMIT mtu $MTU $NEW_PEAK
fi
cbq_message "$TIME_NOW: class $CLASS on $DEVICE changed rate ($RATE_NOW -> $NEW_RATE)"
done ### class file
;;
#############################################################################
################################## THE REST #################################
#############################################################################
stop)
cbq_off
;;
list)
cbq_show
;;
stats)
cbq_show -s
;;
restart)
shift
$0 stop
$0 start "$@"
;;
*)
echo "Usage: `basename $0` {start|compile|stop|restart|timecheck|list|stats}"
esac

View File

@ -1,76 +0,0 @@
#! /bin/sh
TC=/home/root/tc
IP=/home/root/ip
DEVICE=eth1
BANDWIDTH="bandwidth 10Mbit"
# Attach CBQ on $DEVICE. It will have handle 1:.
# $BANDWIDTH is real $DEVICE bandwidth (10Mbit).
# avpkt is average packet size.
# mpu is minimal packet size.
$TC qdisc add dev $DEVICE root handle 1: cbq \
$BANDWIDTH avpkt 1000 mpu 64
# Create root class with classid 1:1. This step is not necessary.
# bandwidth is the same as on CBQ itself.
# rate == all the bandwidth
# allot is MTU + MAC header
# maxburst measure allowed class burstiness (please,read S.Floyd and VJ papers)
# est 1sec 8sec means, that kernel will evaluate average rate
# on this class with period 1sec and time constant 8sec.
# This rate is viewed with "tc -s class ls dev $DEVICE"
$TC class add dev $DEVICE parent 1:0 classid :1 est 1sec 8sec cbq \
$BANDWIDTH rate 10Mbit allot 1514 maxburst 50 avpkt 1000
# Bulk.
# New parameters are:
# weight, which is set to be proportional to
# "rate". It is not necessary, weight=1 will work as well.
# defmap and split say that best effort ttraffic, not classfied
# by another means will fall to this class.
$TC class add dev $DEVICE parent 1:1 classid :2 est 1sec 8sec cbq \
$BANDWIDTH rate 4Mbit allot 1514 weight 500Kbit \
prio 6 maxburst 50 avpkt 1000 split 1:0 defmap ff3d
# OPTIONAL.
# Attach "sfq" qdisc to this class, quantum is MTU, perturb
# gives period of hash function perturbation in seconds.
#
$TC qdisc add dev $DEVICE parent 1:2 sfq quantum 1514b perturb 15
# Interactive-burst class
$TC class add dev $DEVICE parent 1:1 classid :3 est 2sec 16sec cbq \
$BANDWIDTH rate 1Mbit allot 1514 weight 100Kbit \
prio 2 maxburst 100 avpkt 1000 split 1:0 defmap c0
$TC qdisc add dev $DEVICE parent 1:3 sfq quantum 1514b perturb 15
# Background.
$TC class add dev $DEVICE parent 1:1 classid :4 est 1sec 8sec cbq \
$BANDWIDTH rate 100Kbit allot 1514 weight 10Mbit \
prio 7 maxburst 10 avpkt 1000 split 1:0 defmap 2
$TC qdisc add dev $DEVICE parent 1:4 sfq quantum 1514b perturb 15
# Realtime class for RSVP
$TC class add dev $DEVICE parent 1:1 classid 1:7FFE cbq \
rate 5Mbit $BANDWIDTH allot 1514b avpkt 1000 \
maxburst 20
# Reclassified realtime traffic
#
# New element: split is not 1:0, but 1:7FFE. It means,
# that only real-time packets, which violated policing filters
# or exceeded reshaping buffers will fall to it.
$TC class add dev $DEVICE parent 1:7FFE classid 1:7FFF est 4sec 32sec cbq \
rate 1Mbit $BANDWIDTH allot 1514b avpkt 1000 weight 10Kbit \
prio 6 maxburst 10 split 1:7FFE defmap ffff

View File

@ -1,446 +0,0 @@
#!/bin/bash
#
# dhclient-script for Linux.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version
# 2 of the License, or (at your option) any later version.
#
# Authors: Alexey Kuznetsov, <kuznet@ms2.inr.ac.ru>
#
# Probably, I did not understand, what this funny feature as "alias"
# means exactly. For now I suppose, that it is a static address, which
# we should install and preserve.
#
exec >> /var/log/DHS.log 2>&1
echo dhc-script $* reason=$reason
set | grep "^\(old_\|new_\|check_\)"
LOG () {
echo LOG $* ;
}
# convert 8bit mask to length
# arg: $1 = mask
#
Mask8ToLen() {
local l=0;
while [ $l -le 7 ]; do
if [ $[ ( 1 << $l ) + $1 ] -eq 256 ]; then
return $[ 8 - $l ]
fi
l=$[ $l + 1 ]
done
return 0;
}
# convert inet dotted quad mask to length
# arg: $1 = dotquad mask
#
MaskToLen() {
local masklen=0
local mask8=$1
case $1 in
0.0.0.0)
return 0;
;;
255.*.0.0)
masklen=8
mask8=${mask8#255.}
mask8=${mask8%.0.0}
;;
255.255.*.0)
masklen=16
mask8=${mask8#255.255.}
mask8=${mask8%.0}
;;
255.255.255.*)
masklen=24
mask8=${mask8#255.255.255.}
;;
*)
return 255
;;
esac
Mask8ToLen $mask8
return $[ $? + $masklen ]
}
# calculate ABC "natural" mask
# arg: $1 = dotquad address
#
ABCMask () {
local class;
class=${1%%.*}
if [ "$1" = "255.255.255.255" ]; then
echo $1
elif [ "$1" = "0.0.0.0" ]; then
echo $1
elif [ $class -ge 224 ]; then
echo 240.0.0.0
elif [ $class -ge 192 ]; then
echo 255.255.255.0
elif [ $class -ge 128 ]; then
echo 255.255.0.0
else
echo 255.0.0.0
fi
}
# calculate ABC "natural" mask length
# arg: $1 = dotquad address
#
ABCMaskLen () {
local class;
class=${1%%.*}
if [ "$1" = "255.255.255.255" ]; then
return 32
elif [ "$1" = "0.0.0.0" ]; then
return 0
elif [ $class -ge 224 ]; then
return 4;
elif [ $class -ge 192 ]; then
return 24;
elif [ $class -ge 128 ]; then
return 16;
else
return 8;
fi
}
# Delete IP address
# args: $1 = interface
# $2 = address
# $3 = mask
# $4 = broadcast
# $5 = label
#
DelINETAddr () {
local masklen=32
local addrid=$1
LOG DelINETAddr $*
if [ "$5" ]; then
addrid=$addrid:$5
fi
LOG ifconfig $addrid down
ifconfig $addrid down
}
# Add IP address
# args: $1 = interface
# $2 = address
# $3 = mask
# $4 = broadcast
# $5 = label
#
AddINETAddr () {
local mask_arg
local brd_arg
local addrid=$1
LOG AddINETAddr $*
if [ "$5" ]; then
addrid=$addrid:$5
fi
if [ "$3" ]; then
mask_arg="netmask $3"
fi
if [ "$4" ]; then
brd_arg="broadcast $4"
fi
LOG ifconfig $addrid $2 $mask_arg $brd_arg up
ifconfig $addrid $2 $mask_arg $brd_arg up
}
# Add default routes
# args: $1 = routers list
#
AddDefaultRoutes() {
local router
if [ "$1" ]; then
LOG AddDefaultRoutes $*
for router in $1; do
LOG route add default gw $router
route add default gw $router
done ;
fi
}
# Delete default routes
# args: $1 = routers list
#
DelDefaultRoutes() {
local router
if [ "$1" ]; then
LOG DelDefaultRoutes $*
for router in $1; do
LOG route del default gw $router
route del default gw $router
done
fi
}
# ping a host
# args: $1 = dotquad address of the host
#
PingNode() {
LOG PingNode $*
if ping -q -c 1 -w 2 $1 ; then
return 0;
fi
return 1;
}
# Check (and add route, if alive) default routers
# args: $1 = routers list
# returns: 0 if at least one router is alive.
#
CheckRouterList() {
local router
local succeed=1
LOG CheckRouterList $*
for router in $1; do
if PingNode $router ; then
succeed=0
route add default gw $router
fi
done
return $succeed
}
# Delete/create static routes.
# args: $1 = operation (del/add)
# $2 = routes list in format "dst1 nexthop1 dst2 ..."
#
# BEWARE: this feature of DHCP is obsolete, because does not
# support subnetting.
#
X-StaticRouteList() {
local op=$1
local lst="$2"
local masklen
LOG X-StaticRouteList $*
if [ "$lst" ]; then
set $lst
while [ $# -gt 1 ]; do
route $op -net $1 netmask `ABCMask "$1"` gw $2
shift; shift;
done
fi
}
# Create static routes.
# arg: $1 = routes list in format "dst1 nexthop1 dst2 ..."
#
AddStaticRouteList() {
LOG AddStaticRouteList $*
X-StaticRouteList add "$1"
}
# Delete static routes.
# arg: $1 = routes list in format "dst1 nexthop1 dst2 ..."
#
DelStaticRouteList() {
LOG DelStaticRouteList $*
X-StaticRouteList del "$1"
}
# Broadcast unsolicited ARP to update neighbours' caches.
# args: $1 = interface
# $2 = address
#
UnsolicitedARP() {
if [ -f /sbin/arping ]; then
/sbin/arping -A -c 1 -I "$1" "$2" &
(sleep 2 ; /sbin/arping -U -c 1 -I "$1" "$2" ) &
fi
}
# Duplicate address detection.
# args: $1 = interface
# $2 = test address
# returns: 0, if DAD succeeded.
DAD() {
if [ -f /sbin/arping ]; then
/sbin/arping -c 2 -w 3 -D -I "$1" "$2"
return $?
fi
return 0
}
# Setup resolver.
# args: NO
# domain and nameserver list are passed in global variables.
#
# NOTE: we try to be careful and not to break user supplied resolv.conf.
# The script mangles it, only if it has dhcp magic signature.
#
UpdateDNS() {
local nameserver
local idstring="#### Generated by DHCPCD"
LOG UpdateDNS $*
if [ "$new_domain_name" = "" -a "$new_domain_name_servers" = "" ]; then
return 0;
fi
echo $idstring > /etc/resolv.conf.dhcp
if [ "$new_domain_name" ]; then
echo search $new_domain_name >> /etc/resolv.conf.dhcp
fi
echo options ndots:1 >> /etc/resolv.conf.dhcp
if [ "$new_domain_name_servers" ]; then
for nameserver in $new_domain_name_servers; do
echo nameserver $nameserver >> /etc/resolv.conf.dhcp
done
else
echo nameserver 127.0.0.1 >> /etc/resolv.conf.dhcp
fi
if [ -f /etc/resolv.conf ]; then
if [ "`head -1 /etc/resolv.conf`" != "$idstring" ]; then
return 0
fi
if [ "$old_domain_name" = "$new_domain_name" -a
"$new_domain_name_servers" = "$old_domain_name_servers" ]; then
return 0
fi
fi
mv /etc/resolv.conf.dhcp /etc/resolv.conf
}
case $reason in
NBI)
exit 1
;;
MEDIUM)
exit 0
;;
PREINIT)
ifconfig $interface:dhcp down
ifconfig $interface:dhcp1 down
if [ -d /proc/sys/net/ipv4/conf/$interface ]; then
ifconfig $interface:dhcp 10.10.10.10 netmask 255.255.255.255
ifconfig $interface:dhcp down
if [ -d /proc/sys/net/ipv4/conf/$interface ]; then
LOG The interface $interface already configured.
fi
fi
ifconfig $interface:dhcp up
exit 0
;;
ARPSEND)
exit 0
;;
ARPCHECK)
if DAD "$interface" "$check_ip_address" ; then
exit 0
fi
exit 1
;;
BOUND|RENEW|REBIND|REBOOT)
if [ "$old_ip_address" -a "$alias_ip_address" -a \
"$alias_ip_address" != "$old_ip_address" ]; then
DelINETAddr "$interface" "$alias_ip_address" "$alias_subnet_mask" "$alias_broadcast_address" dhcp1
fi
if [ "$old_ip_address" -a "$old_ip_address" != "$new_ip_address" ]; then
DelINETAddr "$interface" "$old_ip_address" "$old_subnet_mask" "$old_broadcast_address" dhcp
DelDefaultRoutes "$old_routers"
DelStaticRouteList "$old_static_routes"
fi
if [ "$old_ip_address" = "" -o "$old_ip_address" != "$new_ip_address" -o \
"$reason" = "BOUND" -o "$reason" = "REBOOT" ]; then
AddINETAddr "$interface" "$new_ip_address" "$new_subnet_mask" "$new_broadcast_address" dhcp
AddStaticRouteList "$new_static_routes"
AddDefaultRoutes "$new_routers"
UnsolicitedARP "$interface" "$new_ip_address"
fi
if [ "$new_ip_address" != "$alias_ip_address" -a "$alias_ip_address" ]; then
AddINETAddr "$interface" "$alias_ip_address" "$alias_subnet_mask" "$alias_broadcast_address" dhcp1
fi
UpdateDNS
exit 0
;;
EXPIRE|FAIL)
if [ "$alias_ip_address" ]; then
DelINETAddr "$interface" "$alias_ip_address" "$alias_subnet_mask" "$alias_broadcast_address" dhcp1
fi
if [ "$old_ip_address" ]; then
DelINETAddr "$interface" "$old_ip_address" "$old_subnet_mask" "$old_broadcast_address" dhcp
DelDefaultRoutes "$old_routers"
DelStaticRouteList "$old_static_routes"
fi
if [ "$alias_ip_address" ]; then
AddINETAddr "$interface" "$alias_ip_address" "$alias_subnet_mask" "$alias_broadcast_address" dhcp1
fi
exit 0
;;
TIMEOUT)
if [ "$alias_ip_address" ]; then
DelINETAddr "$interface" "$alias_ip_address" "$alias_subnet_mask" "$alias_broadcast_address" dhcp1
fi
# Seems, <null address> means, that no more old leases found.
# Or does it mean bug in dhcpcd? 8) Fail for now.
if [ "$new_ip_address" = "<null address>" ]; then
if [ "$old_ip_address" ]; then
DelINETAddr "$interface" "$old_ip_address" "$old_subnet_mask" "$old_broadcast_address" dhcp
fi
if [ "$alias_ip_address" ]; then
AddINETAddr "$interface" "$alias_ip_address" "$alias_subnet_mask" "$alias_broadcast_address" dhcp1
fi
exit 1
fi
if DAD "$interface" "$new_ip_address" ; then
AddINETAddr "$interface" "$new_ip_address" "$new_subnet_mask" "$new_broadcast_address" dhcp
UnsolicitedARP "$interface" "$new_ip_address"
if [ "$alias_ip_address" -a "$alias_ip_address" != "$new_ip_address" ]; then
AddINETAddr "$interface" "$alias_ip_address" "$alias_subnet_mask" "$alias_broadcast_address" dhcp1
UnsolicitedARP "$interface" "$alias_ip_address"
fi
if CheckRouterList "$new_routers" ; then
AddStaticRouteList "$new_static_routes"
UpdateDNS
exit 0
fi
fi
DelINETAddr "$interface" "$new_ip_address" "$new_subnet_mask" "$new_broadcast_address" dhcp
DelDefaultRoutes "$old_routers"
DelStaticRouteList "$old_static_routes"
if [ "$alias_ip_address" ]; then
AddINETAddr "$interface" "$alias_ip_address" "$alias_subnet_mask" "$alias_broadcast_address" dhcp1
fi
exit 1
;;
esac
exit 0

View File

@ -1,68 +0,0 @@
#! /bin/sh -x
#
# sample script on using the ingress capabilities
# This script just tags on the ingress interfac using Ipchains
# the result is used for fast classification and re-marking
# on the egress interface
#
#path to various utilities;
#change to reflect yours.
#
IPROUTE=/root/DS-6-beta/iproute2-990530-dsing
TC=$IPROUTE/tc/tc
IP=$IPROUTE/ip/ip
IPCHAINS=/root/DS-6-beta/ipchains-1.3.9/ipchains
INDEV=eth2
EGDEV="dev eth1"
#
# tag all incoming packets from host 10.2.0.24 to value 1
# tag all incoming packets from host 10.2.0.3 to value 2
# tag the rest of incoming packets from subnet 10.2.0.0/24 to value 3
#These values are used in the egress
#
############################################################
$IPCHAINS -A input -s 10.2.0.4/24 -m 3
$IPCHAINS -A input -i $INDEV -s 10.2.0.24 -m 1
$IPCHAINS -A input -i $INDEV -s 10.2.0.3 -m 2
######################## Egress side ########################
# attach a dsmarker
#
$TC qdisc add $EGDEV handle 1:0 root dsmark indices 64 set_tc_index
#
# values of the DSCP to change depending on the class
#
#becomes EF
$TC class change $EGDEV classid 1:1 dsmark mask 0x3 \
value 0xb8
#becomes AF11
$TC class change $EGDEV classid 1:2 dsmark mask 0x3 \
value 0x28
#becomes AF21
$TC class change $EGDEV classid 1:3 dsmark mask 0x3 \
value 0x48
#
#
# The class mapping
#
$TC filter add $EGDEV parent 1:0 protocol ip prio 4 handle 1 fw classid 1:1
$TC filter add $EGDEV parent 1:0 protocol ip prio 4 handle 2 fw classid 1:2
$TC filter add $EGDEV parent 1:0 protocol ip prio 4 handle 3 fw classid 1:3
#
#
echo "---- qdisc parameters Ingress ----------"
$TC qdisc ls dev $INDEV
echo "---- Class parameters Ingress ----------"
$TC class ls dev $INDEV
echo "---- filter parameters Ingress ----------"
$TC filter ls dev $INDEV parent 1:0
echo "---- qdisc parameters Egress ----------"
$TC qdisc ls $EGDEV
echo "---- Class parameters Egress ----------"
$TC class ls $EGDEV
echo "---- filter parameters Egress ----------"
$TC filter ls $EGDEV parent 1:0

View File

@ -1,87 +0,0 @@
#! /bin/sh -x
#
# sample script on using the ingress capabilities
# This script tags the fwmark on the ingress interface using IPchains
# the result is used first for policing on the Ingress interface then
# for fast classification and re-marking
# on the egress interface
#
#path to various utilities;
#change to reflect yours.
#
IPROUTE=/root/DS-6-beta/iproute2-990530-dsing
TC=$IPROUTE/tc/tc
IP=$IPROUTE/ip/ip
IPCHAINS=/root/DS-6-beta/ipchains-1.3.9/ipchains
INDEV=eth2
EGDEV="dev eth1"
#
# tag all incoming packets from host 10.2.0.24 to value 1
# tag all incoming packets from host 10.2.0.3 to value 2
# tag the rest of incoming packets from subnet 10.2.0.0/24 to value 3
#These values are used in the egress
############################################################
$IPCHAINS -A input -s 10.2.0.0/24 -m 3
$IPCHAINS -A input -i $INDEV -s 10.2.0.24 -m 1
$IPCHAINS -A input -i $INDEV -s 10.2.0.3 -m 2
############################################################
#
# install the ingress qdisc on the ingress interface
############################################################
$TC qdisc add dev $INDEV handle ffff: ingress
############################################################
#
# attach a fw classifier to the ingress which polices anything marked
# by ipchains to tag value 3 (The rest of the subnet packets -- not
# tag 1 or 2) to not go beyond 1.5Mbps
# Allow up to at least 60 packets to burst (assuming maximum packet
# size of # 1.5 KB) in the long run and upto about 6 packets in the
# shot run
############################################################
$TC filter add dev $INDEV parent ffff: protocol ip prio 50 handle 3 fw \
police rate 1500kbit burst 90k mtu 9k drop flowid :1
############################################################
######################## Egress side ########################
# attach a dsmarker
#
$TC qdisc add $EGDEV handle 1:0 root dsmark indices 64
#
# values of the DSCP to change depending on the class
#
$TC class change $EGDEV classid 1:1 dsmark mask 0x3 \
value 0xb8
$TC class change $EGDEV classid 1:2 dsmark mask 0x3 \
value 0x28
$TC class change $EGDEV classid 1:3 dsmark mask 0x3 \
value 0x48
#
#
# The class mapping
#
$TC filter add $EGDEV parent 1:0 protocol ip prio 4 handle 1 fw classid 1:1
$TC filter add $EGDEV parent 1:0 protocol ip prio 4 handle 2 fw classid 1:2
$TC filter add $EGDEV parent 1:0 protocol ip prio 4 handle 3 fw classid 1:3
#
#
echo "---- qdisc parameters Ingress ----------"
$TC qdisc ls dev $INDEV
echo "---- Class parameters Ingress ----------"
$TC class ls dev $INDEV
echo "---- filter parameters Ingress ----------"
$TC filter ls dev $INDEV parent ffff:
echo "---- qdisc parameters Egress ----------"
$TC qdisc ls $EGDEV
echo "---- Class parameters Egress ----------"
$TC class ls $EGDEV
echo "---- filter parameters Egress ----------"
$TC filter ls $EGDEV parent 1:0
#
#deleting the ingress qdisc
#$TC qdisc del $DEV ingress

View File

@ -1,170 +0,0 @@
#! /bin/sh -x
#
# sample script on using the ingress capabilities using u32 classifier
# This script tags tcindex based on metering on the ingress
# interface the result is used for fast classification and re-marking
# on the egress interface
# This is an example of a color aware mode marker with PIR configured
# based on draft-wahjak-mcm-00.txt (section 3.1)
#
# The colors are defined using the Diffserv Fields
#path to various utilities;
#change to reflect yours.
#
IPROUTE=/usr/src/iproute2-current
TC=$IPROUTE/tc/tc
IP=$IPROUTE/ip/ip
INDEV=eth0
EGDEV="dev eth1"
CIR1=1500kbit
CIR2=1000kbit
#The CBS is about 60 MTU sized packets
CBS1=90k
CBS2=90k
############################################################
#
# install the ingress qdisc on the ingress interface
$TC qdisc add dev $INDEV handle ffff: ingress
############################################################
#
# Create u32 filters
$TC filter add dev $INDEV parent ffff: protocol ip prio 4 handle 1: u32 \
divisor 1
############################################################
# The meters: Note that we have shared meters in this case as identified
# by the index parameter
meter1=" police index 1 rate $CIR1 burst $CBS1 "
meter2=" police index 2 rate $CIR2 burst $CBS1 "
meter3=" police index 3 rate $CIR2 burst $CBS2 "
meter4=" police index 4 rate $CIR1 burst $CBS2 "
meter5=" police index 5 rate $CIR1 burst $CBS2 "
# All packets are marked with a tcindex value which is used on the egress
# tcindex 1 maps to AF41, 2->AF42, 3->AF43, 4->BE
# *********************** AF41 ***************************
#AF41 (DSCP 0x22) is passed on with a tcindex value 1
#if it doesnt exceed its CIR/CBS
#policer 1 is used.
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 4 u32 \
match ip tos 0x88 0xfc \
$meter1 \
continue flowid :1
#
# if it exceeds the above but not the extra rate/burst below, it gets a
# tcindex value of 2
# policer 2 is used
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 5 u32 \
match ip tos 0x88 0xfc \
$meter2 \
continue flowid :2
#
# if it exceeds the above but not the rule below, it gets a tcindex value
# of 3 (policer 3)
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 6 u32 \
match ip tos 0x88 0xfc \
$meter3 \
drop flowid :3
#
# *********************** AF42 ***************************
#AF42 (DSCP 0x24) from is passed on with a tcindex value 2
#if it doesnt exceed its CIR/CBS
#policer 2 is used. Note that this is shared with the AF41
#
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 5 u32 \
match ip tos 0x90 0xfc \
$meter2 \
continue flowid :2
#
# if it exceeds the above but not the rule below, it gets a tcindex value
# of 3 (policer 3)
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 6 u32 \
match ip tos 0x90 0xfc \
$meter3 \
drop flowid :3
#
# *********************** AF43 ***************************
#
#AF43 (DSCP 0x26) from is passed on with a tcindex value 3
#if it doesnt exceed its CIR/CBS
#policer 3 is used. Note that this is shared with the AF41 and AF42
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 6 u32 \
match ip tos 0x98 0xfc \
$meter3 \
drop flowid :3
#
# *********************** BE ***************************
#
# Anything else (not from the AF4*) gets discarded if it
# exceeds 1Mbps and by default goes to BE if it doesnt
# Note that the BE class is also used by the AF4* in the worst
# case
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 7 u32 \
match ip src 0/0\
$meter4 \
drop flowid :4
######################## Egress side ########################
# attach a dsmarker
#
$TC qdisc add $EGDEV handle 1:0 root dsmark indices 64
#
# values of the DSCP to change depending on the class
#note that the ECN bits are masked out
#
#AF41 (0x88 is 0x22 shifted to the right by two bits)
#
$TC class change $EGDEV classid 1:1 dsmark mask 0x3 \
value 0x88
#AF42
$TC class change $EGDEV classid 1:2 dsmark mask 0x3 \
value 0x90
#AF43
$TC class change $EGDEV classid 1:3 dsmark mask 0x3 \
value 0x98
#BE
$TC class change $EGDEV classid 1:3 dsmark mask 0x3 \
value 0x0
#
#
# The class mapping
#
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 1 tcindex classid 1:1
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 2 tcindex classid 1:2
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 3 tcindex classid 1:3
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 4 tcindex classid 1:4
#
#
echo "---- qdisc parameters Ingress ----------"
$TC qdisc ls dev $INDEV
echo "---- Class parameters Ingress ----------"
$TC class ls dev $INDEV
echo "---- filter parameters Ingress ----------"
$TC filter ls dev $INDEV parent ffff:
echo "---- qdisc parameters Egress ----------"
$TC qdisc ls $EGDEV
echo "---- Class parameters Egress ----------"
$TC class ls $EGDEV
echo "---- filter parameters Egress ----------"
$TC filter ls $EGDEV parent 1:0
#
#deleting the ingress qdisc
#$TC qdisc del $INDEV ingress

View File

@ -1,132 +0,0 @@
#! /bin/sh -x
#
# sample script on using the ingress capabilities
# This script fwmark tags(IPchains) based on metering on the ingress
# interface the result is used for fast classification and re-marking
# on the egress interface
# This is an example of a color blind mode marker with no PIR configured
# based on draft-wahjak-mcm-00.txt (section 3.1)
#
#path to various utilities;
#change to reflect yours.
#
IPROUTE=/root/DS-6-beta/iproute2-990530-dsing
TC=$IPROUTE/tc/tc
IP=$IPROUTE/ip/ip
IPCHAINS=/root/DS-6-beta/ipchains-1.3.9/ipchains
INDEV=eth2
EGDEV="dev eth1"
CIR1=1500kbit
CIR2=1000kbit
#The CBS is about 60 MTU sized packets
CBS1=90k
CBS2=90k
meter1="police rate $CIR1 burst $CBS1 "
meter2="police rate $CIR1 burst $CBS2 "
meter3="police rate $CIR2 burst $CBS1 "
meter4="police rate $CIR2 burst $CBS2 "
meter5="police rate $CIR2 burst $CBS2 "
#
# tag the rest of incoming packets from subnet 10.2.0.0/24 to fw value 1
# tag all incoming packets from any other subnet to fw tag 2
############################################################
$IPCHAINS -A input -i $INDEV -s 0/0 -m 2
$IPCHAINS -A input -i $INDEV -s 10.2.0.0/24 -m 1
#
############################################################
# install the ingress qdisc on the ingress interface
$TC qdisc add dev $INDEV handle ffff: ingress
#
############################################################
# All packets are marked with a tcindex value which is used on the egress
# tcindex 1 maps to AF41, 2->AF42, 3->AF43, 4->BE
#
############################################################
#
# anything with fw tag of 1 is passed on with a tcindex value 1
#if it doesnt exceed its allocated rate (CIR/CBS)
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 4 handle 1 fw \
$meter1 \
continue flowid 4:1
#
# if it exceeds the above but not the extra rate/burst below, it gets a
#tcindex value of 2
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 5 handle 1 fw \
$meter2 \
continue flowid 4:2
#
# if it exceeds the above but not the rule below, it gets a tcindex value
# of 3
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 6 handle 1 fw \
$meter3 \
drop flowid 4:3
#
# Anything else (not from the subnet 10.2.0.24/24) gets discarded if it
# exceeds 1Mbps and by default goes to BE if it doesnt
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 6 handle 2 fw \
$meter5 \
drop flowid 4:4
######################## Egress side ########################
# attach a dsmarker
#
$TC qdisc add $EGDEV handle 1:0 root dsmark indices 64
#
# values of the DSCP to change depending on the class
#note that the ECN bits are masked out
#
#AF41 (0x88 is 0x22 shifted to the right by two bits)
#
$TC class change $EGDEV classid 1:1 dsmark mask 0x3 \
value 0x88
#AF42
$TC class change $EGDEV classid 1:2 dsmark mask 0x3 \
value 0x90
#AF43
$TC class change $EGDEV classid 1:3 dsmark mask 0x3 \
value 0x98
#BE
$TC class change $EGDEV classid 1:4 dsmark mask 0x3 \
value 0x0
#
#
# The class mapping (using tcindex; could easily have
# replaced it with the fw classifier instead)
#
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 1 tcindex classid 1:1
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 2 tcindex classid 1:2
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 3 tcindex classid 1:3
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 4 tcindex classid 1:4
#
#
echo "---- qdisc parameters Ingress ----------"
$TC qdisc ls dev $INDEV
echo "---- Class parameters Ingress ----------"
$TC class ls dev $INDEV
echo "---- filter parameters Ingress ----------"
$TC filter ls dev $INDEV parent ffff:
echo "---- qdisc parameters Egress ----------"
$TC qdisc ls $EGDEV
echo "---- Class parameters Egress ----------"
$TC class ls $EGDEV
echo "---- filter parameters Egress ----------"
$TC filter ls $EGDEV parent 1:0
#
#deleting the ingress qdisc
#$TC qdisc del $INDEV ingress

View File

@ -1,198 +0,0 @@
#! /bin/sh -x
#
# sample script on using the ingress capabilities using u32 classifier
# This script tags tcindex based on metering on the ingress
# interface the result is used for fast classification and re-marking
# on the egress interface
# This is an example of a color aware mode marker with PIR configured
# based on draft-wahjak-mcm-00.txt (section 3.2)
#
# The colors are defined using the Diffserv Fields
#path to various utilities;
#change to reflect yours.
#
IPROUTE=/root/DS-6-beta/iproute2-990530-dsing
TC=$IPROUTE/tc/tc
IP=$IPROUTE/ip/ip
IPCHAINS=/root/DS-6-beta/ipchains-1.3.9/ipchains
INDEV=eth2
EGDEV="dev eth1"
CIR1=1000kbit
CIR2=500kbit
# the PIR is what is in excess of the CIR
PIR1=1000kbit
PIR2=500kbit
#The CBS is about 60 MTU sized packets
CBS1=90k
CBS2=90k
#the EBS is about 20 max sized packets
EBS1=30k
EBS2=30k
# The meters: Note that we have shared meters in this case as identified
# by the index parameter
meter1=" police index 1 rate $CIR1 burst $CBS1 "
meter1a=" police index 2 rate $PIR1 burst $EBS1 "
meter2=" police index 3 rate $CIR2 burst $CBS1 "
meter2a=" police index 4 rate $PIR2 burst $EBS1 "
meter3=" police index 5 rate $CIR2 burst $CBS2 "
meter3a=" police index 6 rate $PIR2 burst $EBS2 "
meter4=" police index 7 rate $CIR1 burst $CBS2 "
############################################################
#
# install the ingress qdisc on the ingress interface
$TC qdisc add dev $INDEV handle ffff: ingress
############################################################
#
# All packets are marked with a tcindex value which is used on the egress
# tcindex 1 maps to AF41, 2->AF42, 3->AF43, 4->BE
#
# *********************** AF41 ***************************
#AF41 (DSCP 0x22) from is passed on with a tcindex value 1
#if it doesnt exceed its CIR/CBS + PIR/EBS
#policer 1 is used.
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 1 u32 \
match ip tos 0x88 0xfc \
$meter1 \
continue flowid :1
$TC filter add dev $INDEV parent ffff: protocol ip prio 2 u32 \
match ip tos 0x88 0xfc \
$meter1a \
continue flowid :1
#
# if it exceeds the above but not the extra rate/burst below, it gets a
# tcindex value of 2
# policer 2 is used
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 3 u32 \
match ip tos 0x88 0xfc \
$meter2 \
continue flowid :2
$TC filter add dev $INDEV parent ffff: protocol ip prio 4 u32 \
match ip tos 0x88 0xfc \
$meter2a \
continue flowid :2
#
# if it exceeds the above but not the rule below, it gets a tcindex value
# of 3 (policer 3)
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 5 u32 \
match ip tos 0x88 0xfc \
$meter3 \
continue flowid :3
$TC filter add dev $INDEV parent ffff: protocol ip prio 6 u32 \
match ip tos 0x88 0xfc \
$meter3a \
drop flowid :3
#
# *********************** AF42 ***************************
#AF42 (DSCP 0x24) from is passed on with a tcindex value 2
#if it doesnt exceed its CIR/CBS + PIR/EBS
#policer 2 is used. Note that this is shared with the AF41
#
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 8 u32 \
match ip tos 0x90 0xfc \
$meter2 \
continue flowid :2
$TC filter add dev $INDEV parent ffff: protocol ip prio 9 u32 \
match ip tos 0x90 0xfc \
$meter2a \
continue flowid :2
#
# if it exceeds the above but not the rule below, it gets a tcindex value
# of 3 (policer 3)
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 10 u32 \
match ip tos 0x90 0xfc \
$meter3 \
continue flowid :3
$TC filter add dev $INDEV parent ffff: protocol ip prio 11 u32 \
match ip tos 0x90 0xfc \
$meter3a \
drop flowid :3
#
# *********************** AF43 ***************************
#
#AF43 (DSCP 0x26) from is passed on with a tcindex value 3
#if it doesnt exceed its CIR/CBS + PIR/EBS
#policer 3 is used. Note that this is shared with the AF41 and AF42
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 13 u32 \
match ip tos 0x98 0xfc \
$meter3 \
continue flowid :3
$TC filter add dev $INDEV parent ffff: protocol ip prio 14 u32 \
match ip tos 0x98 0xfc \
$meter3a \
drop flowid :3
#
## *********************** BE ***************************
##
## Anything else (not from the AF4*) gets discarded if it
## exceeds 1Mbps and by default goes to BE if it doesnt
## Note that the BE class is also used by the AF4* in the worst
## case
##
$TC filter add dev $INDEV parent ffff: protocol ip prio 16 u32 \
match ip src 0/0\
$meter4 \
drop flowid :4
######################## Egress side ########################
# attach a dsmarker
#
$TC qdisc add $EGDEV handle 1:0 root dsmark indices 64
#
# values of the DSCP to change depending on the class
#note that the ECN bits are masked out
#
#AF41 (0x88 is 0x22 shifted to the right by two bits)
#
$TC class change $EGDEV classid 1:1 dsmark mask 0x3 \
value 0x88
#AF42
$TC class change $EGDEV classid 1:2 dsmark mask 0x3 \
value 0x90
#AF43
$TC class change $EGDEV classid 1:3 dsmark mask 0x3 \
value 0x98
#BE
$TC class change $EGDEV classid 1:3 dsmark mask 0x3 \
value 0x0
#
#
# The class mapping
#
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 1 tcindex classid 1:1
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 2 tcindex classid 1:2
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 3 tcindex classid 1:3
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 4 tcindex classid 1:4
#
#
echo "---- qdisc parameters Ingress ----------"
$TC qdisc ls dev $INDEV
echo "---- Class parameters Ingress ----------"
$TC class ls dev $INDEV
echo "---- filter parameters Ingress ----------"
$TC filter ls dev $INDEV parent ffff:
echo "---- qdisc parameters Egress ----------"
$TC qdisc ls $EGDEV
echo "---- Class parameters Egress ----------"
$TC class ls $EGDEV
echo "---- filter parameters Egress ----------"
$TC filter ls $EGDEV parent 1:0
#
#deleting the ingress qdisc
#$TC qdisc del $INDEV ingress

View File

@ -1,144 +0,0 @@
#! /bin/sh -x
#
# sample script on using the ingress capabilities
# This script fwmark tags(IPchains) based on metering on the ingress
# interface the result is used for fast classification and re-marking
# on the egress interface
# This is an example of a color blind mode marker with no PIR configured
# based on draft-wahjak-mcm-00.txt (section 3.1)
#
#path to various utilities;
#change to reflect yours.
#
IPROUTE=/root/DS-6-beta/iproute2-990530-dsing
TC=$IPROUTE/tc/tc
IP=$IPROUTE/ip/ip
IPCHAINS=/root/DS-6-beta/ipchains-1.3.9/ipchains
INDEV=eth2
EGDEV="dev eth1"
CIR1=1500kbit
CIR2=500kbit
#The CBS is about 60 MTU sized packets
CBS1=90k
CBS2=90k
meter1="police rate $CIR1 burst $CBS1 "
meter1a="police rate $CIR2 burst $CBS1 "
meter2="police rate $CIR1 burst $CBS2 "
meter2a="police rate $CIR2 burst $CBS2 "
meter3="police rate $CIR2 burst $CBS1 "
meter3a="police rate $CIR2 burst $CBS1 "
meter4="police rate $CIR2 burst $CBS2 "
meter5="police rate $CIR1 burst $CBS2 "
#
# tag the rest of incoming packets from subnet 10.2.0.0/24 to fw value 1
# tag all incoming packets from any other subnet to fw tag 2
############################################################
$IPCHAINS -A input -i $INDEV -s 0/0 -m 2
$IPCHAINS -A input -i $INDEV -s 10.2.0.0/24 -m 1
#
############################################################
# install the ingress qdisc on the ingress interface
$TC qdisc add dev $INDEV handle ffff: ingress
#
############################################################
# All packets are marked with a tcindex value which is used on the egress
# tcindex 1 maps to AF41, 2->AF42, 3->AF43, 4->BE
#
############################################################
#
# anything with fw tag of 1 is passed on with a tcindex value 1
#if it doesnt exceed its allocated rate (CIR/CBS)
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 1 handle 1 fw \
$meter1 \
continue flowid 4:1
$TC filter add dev $INDEV parent ffff: protocol ip prio 2 handle 1 fw \
$meter1a \
continue flowid 4:1
#
# if it exceeds the above but not the extra rate/burst below, it gets a
#tcindex value of 2
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 3 handle 1 fw \
$meter2 \
continue flowid 4:2
$TC filter add dev $INDEV parent ffff: protocol ip prio 4 handle 1 fw \
$meter2a \
continue flowid 4:2
#
# if it exceeds the above but not the rule below, it gets a tcindex value
# of 3
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 5 handle 1 fw \
$meter3 \
continue flowid 4:3
$TC filter add dev $INDEV parent ffff: protocol ip prio 6 handle 1 fw \
$meter3a \
drop flowid 4:3
#
# Anything else (not from the subnet 10.2.0.24/24) gets discarded if it
# exceeds 1Mbps and by default goes to BE if it doesnt
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 7 handle 2 fw \
$meter5 \
drop flowid 4:4
######################## Egress side ########################
# attach a dsmarker
#
$TC qdisc add $EGDEV handle 1:0 root dsmark indices 64
#
# values of the DSCP to change depending on the class
#note that the ECN bits are masked out
#
#AF41 (0x88 is 0x22 shifted to the right by two bits)
#
$TC class change $EGDEV classid 1:1 dsmark mask 0x3 \
value 0x88
#AF42
$TC class change $EGDEV classid 1:2 dsmark mask 0x3 \
value 0x90
#AF43
$TC class change $EGDEV classid 1:3 dsmark mask 0x3 \
value 0x98
#BE
$TC class change $EGDEV classid 1:4 dsmark mask 0x3 \
value 0x0
#
#
# The class mapping (using tcindex; could easily have
# replaced it with the fw classifier instead)
#
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 1 tcindex classid 1:1
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 2 tcindex classid 1:2
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 3 tcindex classid 1:3
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 4 tcindex classid 1:4
#
#
echo "---- qdisc parameters Ingress ----------"
$TC qdisc ls dev $INDEV
echo "---- Class parameters Ingress ----------"
$TC class ls dev $INDEV
echo "---- filter parameters Ingress ----------"
$TC filter ls dev $INDEV parent ffff:
echo "---- qdisc parameters Egress ----------"
$TC qdisc ls $EGDEV
echo "---- Class parameters Egress ----------"
$TC class ls $EGDEV
echo "---- filter parameters Egress ----------"
$TC filter ls $EGDEV parent 1:0
#
#deleting the ingress qdisc
#$TC qdisc del $INDEV ingress

View File

@ -1,145 +0,0 @@
#! /bin/sh
#
# sample script on using the ingress capabilities using u32 classifier
# This script tags tcindex based on metering on the ingress
# interface the result is used for fast classification and re-marking
# on the egress interface
# This is an example of a color blind mode marker with PIR configured
# based on draft-wahjak-mcm-00.txt (section 3.2)
#
#path to various utilities;
#change to reflect yours.
#
IPROUTE=/root/DS-6-beta/iproute2-990530-dsing
TC=$IPROUTE/tc/tc
IP=$IPROUTE/ip/ip
INDEV=eth2
EGDEV="dev eth1"
CIR1=1000kbit
CIR2=1000kbit
# The PIR is the excess (in addition to the CIR i.e if always
# going to the PIR --> average rate is CIR+PIR)
PIR1=1000kbit
PIR2=500kbit
#The CBS is about 60 MTU sized packets
CBS1=90k
CBS2=90k
#the EBS is about 10 max sized packets
EBS1=15k
EBS2=15k
# The meters
meter1=" police rate $CIR1 burst $CBS1 "
meter1a=" police rate $PIR1 burst $EBS1 "
meter2=" police rate $CIR2 burst $CBS1 "
meter2a="police rate $PIR2 burst $CBS1 "
meter3=" police rate $CIR2 burst $CBS2 "
meter3a=" police rate $PIR2 burst $EBS2 "
meter4=" police rate $CIR1 burst $CBS2 "
meter5=" police rate $CIR1 burst $CBS2 "
# install the ingress qdisc on the ingress interface
############################################################
$TC qdisc add dev $INDEV handle ffff: ingress
############################################################
#
############################################################
# All packets are marked with a tcindex value which is used on the egress
# NOTE: tcindex 1 maps to AF41, 2->AF42, 3->AF43, 4->BE
#
#anything from subnet 10.2.0.2/24 is passed on with a tcindex value 1
#if it doesnt exceed its CIR/CBS + PIR/EBS
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 1 u32 \
match ip src 10.2.0.0/24 $meter1 \
continue flowid :1
$TC filter add dev $INDEV parent ffff: protocol ip prio 2 u32 \
match ip src 10.2.0.0/24 $meter1a \
continue flowid :1
#
# if it exceeds the above but not the extra rate/burst below, it gets a
#tcindex value of 2
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 3 u32 \
match ip src 10.2.0.0/24 $meter2 \
continue flowid :2
$TC filter add dev $INDEV parent ffff: protocol ip prio 4 u32 \
match ip src 10.2.0.0/24 $meter2a \
continue flowid :2
#
# if it exceeds the above but not the rule below, it gets a tcindex value
# of 3
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 5 u32 \
match ip src 10.2.0.0/24 $meter3 \
continue flowid :3
$TC filter add dev $INDEV parent ffff: protocol ip prio 6 u32 \
match ip src 10.2.0.0/24 $meter3a \
drop flowid :3
#
#
# Anything else (not from the subnet 10.2.0.24/24) gets discarded if it
# exceeds 1Mbps and by default goes to BE if it doesnt
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 7 u32 \
match ip src 0/0 $meter5 \
drop flowid :4
######################## Egress side ########################
# attach a dsmarker
#
$TC qdisc add $EGDEV handle 1:0 root dsmark indices 64
#
# values of the DSCP to change depending on the class
#note that the ECN bits are masked out
#
#AF41 (0x88 is 0x22 shifted to the right by two bits)
#
$TC class change $EGDEV classid 1:1 dsmark mask 0x3 \
value 0x88
#AF42
$TC class change $EGDEV classid 1:2 dsmark mask 0x3 \
value 0x90
#AF43
$TC class change $EGDEV classid 1:3 dsmark mask 0x3 \
value 0x98
#BE
$TC class change $EGDEV classid 1:3 dsmark mask 0x3 \
value 0x0
#
#
# The class mapping
#
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 1 tcindex classid 1:1
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 2 tcindex classid 1:2
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 3 tcindex classid 1:3
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 4 tcindex classid 1:4
#
#
echo "---- qdisc parameters Ingress ----------"
$TC qdisc ls dev $INDEV
echo "---- Class parameters Ingress ----------"
$TC class ls dev $INDEV
echo "---- filter parameters Ingress ----------"
$TC filter ls dev $INDEV parent ffff:
echo "---- qdisc parameters Egress ----------"
$TC qdisc ls $EGDEV
echo "---- Class parameters Egress ----------"
$TC class ls $EGDEV
echo "---- filter parameters Egress ----------"
$TC filter ls $EGDEV parent 1:0
#
#deleting the ingress qdisc
#$TC qdisc del $INDEV ingress

View File

@ -1,98 +0,0 @@
Note all these are mere examples which can be customized to your needs
AFCBQ
-----
AF PHB built using CBQ, DSMARK,GRED (default in GRIO mode) ,RED for BE
and the tcindex classifier with some algorithmic mapping
EFCBQ
-----
EF PHB built using CBQ (for rate control and prioritization),
DSMARK( to remark DSCPs), tcindex classifier and RED for the BE
traffic.
EFPRIO
------
EF PHB using the PRIO scheduler, Token Bucket to rate control EF,
tcindex classifier, DSMARK to remark, and RED for the BE traffic
EDGE scripts
==============
CB-3(1|2)-(u32/chains)
======================
The major differences are that the classifier is u32 on -u32 extension
and IPchains on the chains extension. CB stands for color Blind
and 31 is for the mode where only a CIR and CBS are defined whereas
32 stands for a mode where a CIR/CBS + PIR/EBS are defined.
Color Blind (CB)
==========-----=
We look at one special subnet that we are interested in for simplicty
reasons to demonstrate the capability. We send the packets from that
subnet to AF4*, BE or end up dropping depending on the metering results.
The algorithm overview is as follows:
*classify:
**case: subnet X
----------------
if !exceed meter1 tag as AF41
else
if !exceed meter2 tag as AF42
else
if !exceed meter 3 tag as AF43
else
drop
default case: Any other subnet
-------------------------------
if !exceed meter 5 tag as AF43
else
drop
One Egress side change the DSCPs of the packets to reflect AF4* and BE
based on the tags from the ingress.
-------------------------------------------------------------
Color Aware
===========
Define some meters with + policing and give them IDs eg
meter1=police index 1 rate $CIR1 burst $CBS1
meter2=police index 2 rate $CIR2 burst $CBS2 etc
General overview:
classify based on the DSCPs and use the policer ids to decide tagging
*classify on ingress:
switch (dscp) {
case AF41: /* tos&0xfc == 0x88 */
if (!exceed meter1) break;
case AF42: /* tos&0xfc == 0x90 */
if (!exceed meter2) {
tag as AF42;
break;
}
case AF43: /* tos&0xfc == 0x98 */
if (!exceed meter3) {
tag as AF43;
break;
} else
drop;
default:
if (!exceed meter4) tag as BE;
else drop;
}
On the Egress side mark the proper AF tags

View File

@ -1,105 +0,0 @@
#!/usr/bin/perl
#
#
# AF using CBQ for a single interface eth0
# 4 AF classes using GRED and one BE using RED
# Things you might want to change:
# - the device bandwidth (set at 10Mbits)
# - the bandwidth allocated for each AF class and the BE class
# - the drop probability associated with each AF virtual queue
#
# AF DSCP values used (based on AF draft 04)
# -----------------------------------------
# AF DSCP values
# AF1 1. 0x0a 2. 0x0c 3. 0x0e
# AF2 1. 0x12 2. 0x14 3. 0x16
# AF3 1. 0x1a 2. 0x1c 3. 0x1e
# AF4 1. 0x22 2. 0x24 3. 0x26
#
#
# A simple DSCP-class relationship formula used to generate
# values in the for loop of this script; $drop stands for the
# DP
# $dscp = ($class*8+$drop*2)
#
# if you use GRIO buffer sharing, then GRED priority is set as follows:
# $gprio=$drop+1;
#
$TC = "/usr/src/iproute2-current/tc/tc";
$DEV = "dev lo";
$DEV = "dev eth1";
$DEV = "dev eth0";
# the BE-class number
$beclass = "5";
#GRIO buffer sharing on or off?
$GRIO = "";
$GRIO = "grio";
# The bandwidth of your device
$linerate="10Mbit";
# The BE and AF rates
%rate_table=();
$berate="1500Kbit";
$rate_table{"AF1rate"}="1500Kbit";
$rate_table{"AF2rate"}="1500Kbit";
$rate_table{"AF3rate"}="1500Kbit";
$rate_table{"AF4rate"}="1500Kbit";
#
#
#
print "\n# --- General setup ---\n";
print "$TC qdisc add $DEV handle 1:0 root dsmark indices 64 set_tc_index\n";
print "$TC filter add $DEV parent 1:0 protocol ip prio 1 tcindex mask 0xfc " .
"shift 2 pass_on\n";
#"shift 2\n";
print "$TC qdisc add $DEV parent 1:0 handle 2:0 cbq bandwidth $linerate ".
"cell 8 avpkt 1000 mpu 64\n";
print "$TC filter add $DEV parent 2:0 protocol ip prio 1 tcindex ".
"mask 0xf0 shift 4 pass_on\n";
for $class (1..4) {
print "\n# --- AF Class $class specific setup---\n";
$AFrate=sprintf("AF%drate",$class);
print "$TC class add $DEV parent 2:0 classid 2:$class cbq ".
"bandwidth $linerate rate $rate_table{$AFrate} avpkt 1000 prio ".
(6-$class)." bounded allot 1514 weight 1 maxburst 21\n";
print "$TC filter add $DEV parent 2:0 protocol ip prio 1 handle $class ".
"tcindex classid 2:$class\n";
print "$TC qdisc add $DEV parent 2:$class gred setup DPs 3 default 2 ".
"$GRIO\n";
#
# per DP setup
#
for $drop (1..3) {
print "\n# --- AF Class $class DP $drop---\n";
$dscp = $class*8+$drop*2;
$tcindex = sprintf("1%x%x",$class,$drop);
print "$TC filter add $DEV parent 1:0 protocol ip prio 1 ".
"handle $dscp tcindex classid 1:$tcindex\n";
$prob = $drop*0.02;
if ($GRIO) {
$gprio = $drop+1;
print "$TC qdisc change $DEV parent 2:$class gred limit 60KB min 15KB ".
"max 45KB burst 20 avpkt 1000 bandwidth $linerate DP $drop ".
"probability $prob ".
"prio $gprio\n";
} else {
print "$TC qdisc change $DEV parent 2:$class gred limit 60KB min 15KB ".
"max 45KB burst 20 avpkt 1000 bandwidth $linerate DP $drop ".
"probability $prob \n";
}
}
}
#
#
print "\n#------BE Queue setup------\n";
print "$TC filter add $DEV parent 1:0 protocol ip prio 2 ".
"handle 0 tcindex mask 0 classid 1:1\n";
print "$TC class add $DEV parent 2:0 classid 2:$beclass cbq ".
"bandwidth $linerate rate $berate avpkt 1000 prio 6 " .
"bounded allot 1514 weight 1 maxburst 21 \n";
print "$TC filter add $DEV parent 2:0 protocol ip prio 1 handle 0 tcindex ".
"classid 2:5\n";
print "$TC qdisc add $DEV parent 2:5 red limit 60KB min 15KB max 45KB ".
"burst 20 avpkt 1000 bandwidth $linerate probability 0.4\n";

View File

@ -1,25 +0,0 @@
#!/usr/bin/perl
$TC = "/root/DS-6-beta/iproute2-990530-dsing/tc/tc";
$DEV = "dev eth1";
$efrate="1.5Mbit";
$MTU="1.5kB";
print "$TC qdisc add $DEV handle 1:0 root dsmark indices 64 set_tc_index\n";
print "$TC filter add $DEV parent 1:0 protocol ip prio 1 tcindex ".
"mask 0xfc shift 2\n";
print "$TC qdisc add $DEV parent 1:0 handle 2:0 prio\n";
#
# EF class: Maximum about one MTU sized packet allowed on the queue
#
print "$TC qdisc add $DEV parent 2:1 tbf rate $efrate burst $MTU limit 1.6kB\n";
print "$TC filter add $DEV parent 2:0 protocol ip prio 1 ".
"handle 0x2e tcindex classid 2:1 pass_on\n";
#
# BE class
#
print "#BE class(2:2) \n";
print "$TC qdisc add $DEV parent 2:2 red limit 60KB ".
"min 15KB max 45KB burst 20 avpkt 1000 bandwidth 10Mbit ".
"probability 0.4\n";
#
print "$TC filter add $DEV parent 2:0 protocol ip prio 2 ".
"handle 0 tcindex mask 0 classid 2:2 pass_on\n";

View File

@ -1,31 +0,0 @@
#!/usr/bin/perl
#
$TC = "/root/DS-6-beta/iproute2-990530-dsing/tc/tc";
$DEV = "dev eth1";
print "$TC qdisc add $DEV handle 1:0 root dsmark indices 64 set_tc_index\n";
print "$TC filter add $DEV parent 1:0 protocol ip prio 1 tcindex ".
"mask 0xfc shift 2\n";
print "$TC qdisc add $DEV parent 1:0 handle 2:0 cbq bandwidth ".
"10Mbit cell 8 avpkt 1000 mpu 64\n";
#
# EF class
#
print "$TC class add $DEV parent 2:0 classid 2:1 cbq bandwidth ".
"10Mbit rate 1500Kbit avpkt 1000 prio 1 bounded isolated ".
"allot 1514 weight 1 maxburst 10 \n";
# packet fifo for EF?
print "$TC qdisc add $DEV parent 2:1 pfifo limit 5\n";
print "$TC filter add $DEV parent 2:0 protocol ip prio 1 ".
"handle 0x2e tcindex classid 2:1 pass_on\n";
#
# BE class
#
print "#BE class(2:2) \n";
print "$TC class add $DEV parent 2:0 classid 2:2 cbq bandwidth ".
"10Mbit rate 5Mbit avpkt 1000 prio 7 allot 1514 weight 1 ".
"maxburst 21 borrow split 2:0 defmap 0xffff \n";
print "$TC qdisc add $DEV parent 2:2 red limit 60KB ".
"min 15KB max 45KB burst 20 avpkt 1000 bandwidth 10Mbit ".
"probability 0.4\n";
print "$TC filter add $DEV parent 2:0 protocol ip prio 2 ".
"handle 0 tcindex mask 0 classid 2:2 pass_on\n";

View File

@ -1,125 +0,0 @@
These were the tests done to validate the Diffserv scripts.
This document will be updated continously. If you do more
thorough validation testing please post the details to the
diffserv mailing list.
Nevertheless, these tests should serve for basic validation.
AFCBQ, EFCBQ, EFPRIO
----------------------
generate all possible DSCPs and observe that they
get sent to the proper classes. In the case of AF also
to the correct Virtual Queues.
Edge1
-----
generate TOS values 0x0,0x10,0xbb each with IP addresses
10.2.0.24 (mark 1), 10.2.0.3 (mark2) and 10.2.0.30 (mark 3)
and observe that they get marked as expected.
Edge2
-----
-Repeat the tests in Edge1
-ftp with data direction from 10.2.0.2
*observe that the metering/policing works correctly (and the marking
as well). In this case the mark used will be 3
Edge31-cb-chains
----------------
-ftp with data direction from 10.2.0.2
*observe that the metering/policing works correctly (and the marking
as well). In this case the mark used will be 1.
Metering: The data throughput should not exceed 2*CIR1 + 2*CIR2
which is roughly: 5mbps
Marking: the should be a variation of marked packets:
AF41(TOS=0x88) AF42(0x90) AF43(0x98) and BE (0x0)
More tests required to see the interaction of several sources (other
than subnet 10.2.0.0/24).
Edge31-ca-u32
--------------
Generate data using modified tcpblast from 10.2.0.2 (behind eth2) to the
discard port of 10.1.0.2 (behind eth1)
1) generate with src tos = 0x88
Metering: Allocated throughput should not exceed 2*CIR1 + 2*CIR2
approximately 5mbps
Marking: Should vary between 0x88,0x90,0x98 and 0x0
2) generate with src tos = 0x90
Metering: Allocated throughput should not exceed CIR1 + 2*CIR2
approximately 3.5mbps
Marking: Should vary between 0x90,0x98 and 0x0
3) generate with src tos = 0x98
Metering: Allocated throughput should not exceed CIR1 + CIR2
approximately 2.5mbps
Marking: Should vary between 0x98 and 0x0
4) generate with src tos any other than the above
Metering: Allocated throughput should not exceed CIR1
approximately 1.5mbps
Marking: Should be consistent at 0x0
TODO: Testing on how each color shares when all 4 types of packets
are going through the edge device
Edge32-cb-u32, Edge32-cb-chains
-------------------------------
-ftp with data direction from 10.2.0.2
*observe that the metering/policing works correctly (and the marking
as well).
Metering:
The data throughput should not exceed 2*CIR1 + 2*CIR2
+ 2*PIR2 + PIR1 for u32 which is roughly: 6mbps
The data throughput should not exceed 2*CIR1 + 5*CIR2
for chains which is roughly: 6mbps
Marking: the should be a variation of marked packets:
AF41(TOS=0x88) AF42(0x90) AF43(0x98) and BE (0x0)
TODO:
-More tests required to see the interaction of several sources (other
than subnet 10.2.0.0/24).
-More tests needed to capture stats on how many times the CIR was exceeded
but the data was not remarked etc.
Edge32-ca-u32
--------------
Generate data using modified tcpblast from 10.2.0.2 (behind eth2) to the
discard port of 10.1.0.2 (behind eth1)
1) generate with src tos = 0x88
Metering: Allocated throughput should not exceed 2*CIR1 + 2*CIR2
+PIR1 -- approximately 4mbps
Marking: Should vary between 0x88,0x90,0x98 and 0x0
2) generate with src tos = 0x90
Metering: Allocated throughput should not exceed CIR1 + 2*CIR2
+ 2* PIR2 approximately 3mbps
Marking: Should vary between 0x90,0x98 and 0x0
3) generate with src tos = 0x98
Metering: Allocated throughput should not exceed PIR1+ CIR1 + CIR2
approximately 2.5mbps
Marking: Should vary between 0x98 and 0x0
4) generate with src tos any other than the above
Metering: Allocated throughput should not exceed CIR1
approximately 1mbps
Marking: Should be consistent at 0x0
TODO: Testing on how each color shares when all 4 types of packets
are going through the edge device

View File

@ -1,134 +0,0 @@
#!/bin/sh
#
# Setup address label from /etc/gai.conf
#
# Written by YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>, 2010.
#
IP=ip
DEFAULT_GAICONF=/etc/gai.conf
verbose=
debug=
function run ()
{
if [ x"$verbose" != x"" ]; then
echo "$@"
fi
if [ x"$debug" = x"" ]; then
"$@"
fi
}
function do_load_config ()
{
file=$1; shift
flush=1
cat $file | while read command prefix label; do
if [ x"$command" = x"#label" ]; then
if [ ${flush} = 1 ]; then
run ${IP} -6 addrlabel flush
flush=0
fi
run ${IP} -6 addrlabel add prefix $prefix label $label
fi
done
}
function do_list_config ()
{
${IP} -6 addrlabel list | while read p pfx l lbl; do
echo label ${pfx} ${lbl}
done
}
function help ()
{
echo "Usage: $0 [-v] {--list | --config [ ${DEFAULT_GAICONF} ] | --default}"
exit 1
}
TEMP=`getopt -o c::dlv -l config::,default,list,verbose -n gaiconf -- "$@"`
if [ $? != 0 ]; then
echo "Terminating..." >&2
exit 1
fi
TEMPFILE=`mktemp`
eval set -- "$TEMP"
while true ; do
case "$1" in
-c|--config)
if [ x"$cmd" != x"" ]; then
help
fi
case "$2" in
"") gai_conf="${DEFAULT_GAICONF}"
shift 2
;;
*) gai_conf="$2"
shift 2
esac
cmd=config
;;
-d|--default)
if [ x"$cmd" != x"" ]; then
help
fi
gai_conf=${TEMPFILE}
cmd=config
;;
-l|--list)
if [ x"$cmd" != x"" ]; then
help
fi
cmd=list
shift
;;
-v)
verbose=1
shift
;;
--)
shift;
break
;;
*)
echo "Internal error!" >&2
exit 1
;;
esac
done
case "$cmd" in
config)
if [ x"$gai_conf" = x"${TEMPFILE}" ]; then
sed -e 's/^[[:space:]]*//' <<END_OF_DEFAULT >${TEMPFILE}
label ::1/128 0
label ::/0 1
label 2002::/16 2
label ::/96 3
label ::ffff:0:0/96 4
label fec0::/10 5
label fc00::/7 6
label 2001:0::/32 7
END_OF_DEFAULT
fi
do_load_config "$gai_conf"
;;
list)
do_list_config
;;
*)
help
;;
esac
rm -f "${TEMPFILE}"
exit 0

View File

@ -1,6 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
GENLOBJ=genl.o
include ../Config
include ../config.mk
SHARED_LIBS ?= y
CFLAGS += -fno-strict-aliasing

View File

@ -13,7 +13,6 @@
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <syslog.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <netinet/in.h>
@ -29,84 +28,18 @@
static int usage(void)
{
fprintf(stderr,"Usage: ctrl <CMD>\n" \
"CMD := get <PARMS> | list | monitor\n" \
"CMD := get <PARMS> | list | monitor | policy <PARMS>\n" \
"PARMS := name <name> | id <id>\n" \
"Examples:\n" \
"\tctrl ls\n" \
"\tctrl monitor\n" \
"\tctrl get name foobar\n" \
"\tctrl get id 0xF\n");
"\tctrl get id 0xF\n"
"\tctrl policy name foobar\n"
"\tctrl policy id 0xF\n");
return -1;
}
int genl_ctrl_resolve_family(const char *family)
{
struct rtnl_handle rth;
int ret = 0;
struct {
struct nlmsghdr n;
struct genlmsghdr g;
char buf[4096];
} req = {
.n.nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN),
.n.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
.n.nlmsg_type = GENL_ID_CTRL,
.g.cmd = CTRL_CMD_GETFAMILY,
};
struct nlmsghdr *nlh = &req.n;
struct genlmsghdr *ghdr = &req.g;
if (rtnl_open_byproto(&rth, 0, NETLINK_GENERIC) < 0) {
fprintf(stderr, "Cannot open generic netlink socket\n");
exit(1);
}
addattr_l(nlh, 128, CTRL_ATTR_FAMILY_NAME, family, strlen(family) + 1);
if (rtnl_talk(&rth, nlh, nlh, sizeof(req)) < 0) {
fprintf(stderr, "Error talking to the kernel\n");
goto errout;
}
{
struct rtattr *tb[CTRL_ATTR_MAX + 1];
int len = nlh->nlmsg_len;
struct rtattr *attrs;
if (nlh->nlmsg_type != GENL_ID_CTRL) {
fprintf(stderr, "Not a controller message, nlmsg_len=%d "
"nlmsg_type=0x%x\n", nlh->nlmsg_len, nlh->nlmsg_type);
goto errout;
}
if (ghdr->cmd != CTRL_CMD_NEWFAMILY) {
fprintf(stderr, "Unknown controller command %d\n", ghdr->cmd);
goto errout;
}
len -= NLMSG_LENGTH(GENL_HDRLEN);
if (len < 0) {
fprintf(stderr, "wrong controller message len %d\n", len);
return -1;
}
attrs = (struct rtattr *) ((char *) ghdr + GENL_HDRLEN);
parse_rtattr(tb, CTRL_ATTR_MAX, attrs, len);
if (tb[CTRL_ATTR_FAMILY_ID] == NULL) {
fprintf(stderr, "Missing family id TLV\n");
goto errout;
}
ret = rta_getattr_u16(tb[CTRL_ATTR_FAMILY_ID]);
}
errout:
rtnl_close(&rth);
return ret;
}
static void print_ctrl_cmd_flags(FILE *fp, __u32 fl)
{
fprintf(fp, "\n\t\tCapabilities (0x%x):\n ", fl);
@ -172,8 +105,7 @@ static int print_ctrl_grp(FILE *fp, struct rtattr *arg, __u32 ctrl_ver)
/*
* The controller sends one nlmsg per family
*/
static int print_ctrl(const struct sockaddr_nl *who,
struct rtnl_ctrl_data *ctrl,
static int print_ctrl(struct rtnl_ctrl_data *ctrl,
struct nlmsghdr *n, void *arg)
{
struct rtattr *tb[CTRL_ATTR_MAX + 1];
@ -193,7 +125,8 @@ static int print_ctrl(const struct sockaddr_nl *who,
ghdr->cmd != CTRL_CMD_DELFAMILY &&
ghdr->cmd != CTRL_CMD_NEWFAMILY &&
ghdr->cmd != CTRL_CMD_NEWMCAST_GRP &&
ghdr->cmd != CTRL_CMD_DELMCAST_GRP) {
ghdr->cmd != CTRL_CMD_DELMCAST_GRP &&
ghdr->cmd != CTRL_CMD_GETPOLICY) {
fprintf(stderr, "Unknown controller command %d\n", ghdr->cmd);
return 0;
}
@ -206,7 +139,7 @@ static int print_ctrl(const struct sockaddr_nl *who,
}
attrs = (struct rtattr *) ((char *) ghdr + GENL_HDRLEN);
parse_rtattr(tb, CTRL_ATTR_MAX, attrs, len);
parse_rtattr_flags(tb, CTRL_ATTR_MAX, attrs, len, NLA_F_NESTED);
if (tb[CTRL_ATTR_FAMILY_NAME]) {
char *name = RTA_DATA(tb[CTRL_ATTR_FAMILY_NAME]);
@ -229,6 +162,36 @@ static int print_ctrl(const struct sockaddr_nl *who,
__u32 *ma = RTA_DATA(tb[CTRL_ATTR_MAXATTR]);
fprintf(fp, " max attribs: %d ",*ma);
}
if (tb[CTRL_ATTR_OP_POLICY]) {
const struct rtattr *pos;
rtattr_for_each_nested(pos, tb[CTRL_ATTR_OP_POLICY]) {
struct rtattr *ptb[CTRL_ATTR_POLICY_DUMP_MAX + 1];
struct rtattr *pattrs = RTA_DATA(pos);
int plen = RTA_PAYLOAD(pos);
parse_rtattr_flags(ptb, CTRL_ATTR_POLICY_DUMP_MAX,
pattrs, plen, NLA_F_NESTED);
fprintf(fp, " op %d policies:",
pos->rta_type & ~NLA_F_NESTED);
if (ptb[CTRL_ATTR_POLICY_DO]) {
__u32 *v = RTA_DATA(ptb[CTRL_ATTR_POLICY_DO]);
fprintf(fp, " do=%d", *v);
}
if (ptb[CTRL_ATTR_POLICY_DUMP]) {
__u32 *v = RTA_DATA(ptb[CTRL_ATTR_POLICY_DUMP]);
fprintf(fp, " dump=%d", *v);
}
}
}
if (tb[CTRL_ATTR_POLICY])
nl_print_policy(tb[CTRL_ATTR_POLICY], fp);
/* end of family definitions .. */
fprintf(fp,"\n");
if (tb[CTRL_ATTR_OPS]) {
@ -277,10 +240,9 @@ static int print_ctrl(const struct sockaddr_nl *who,
return 0;
}
static int print_ctrl2(const struct sockaddr_nl *who,
struct nlmsghdr *n, void *arg)
static int print_ctrl2(struct nlmsghdr *n, void *arg)
{
return print_ctrl(who, NULL, n, arg);
return print_ctrl(NULL, n, arg);
}
static int ctrl_list(int cmd, int argc, char **argv)
@ -299,13 +261,16 @@ static int ctrl_list(int cmd, int argc, char **argv)
.g.cmd = CTRL_CMD_GETFAMILY,
};
struct nlmsghdr *nlh = &req.n;
struct nlmsghdr *answer = NULL;
if (rtnl_open_byproto(&rth, 0, NETLINK_GENERIC) < 0) {
fprintf(stderr, "Cannot open generic netlink socket\n");
exit(1);
}
if (cmd == CTRL_CMD_GETFAMILY) {
if (cmd == CTRL_CMD_GETFAMILY || cmd == CTRL_CMD_GETPOLICY) {
req.g.cmd = cmd;
if (argc != 2) {
fprintf(stderr, "Wrong number of params\n");
return -1;
@ -313,7 +278,7 @@ static int ctrl_list(int cmd, int argc, char **argv)
if (matches(*argv, "name") == 0) {
NEXT_ARG();
strncpy(d, *argv, sizeof (d) - 1);
strlcpy(d, *argv, sizeof(d));
addattr_l(nlh, 128, CTRL_ATTR_FAMILY_NAME,
d, strlen(d) + 1);
} else if (matches(*argv, "id") == 0) {
@ -330,20 +295,22 @@ static int ctrl_list(int cmd, int argc, char **argv)
fprintf(stderr, "Wrong params\n");
goto ctrl_done;
}
}
if (rtnl_talk(&rth, nlh, nlh, sizeof(req)) < 0) {
if (cmd == CTRL_CMD_GETFAMILY) {
if (rtnl_talk(&rth, nlh, &answer) < 0) {
fprintf(stderr, "Error talking to the kernel\n");
goto ctrl_done;
}
if (print_ctrl2(NULL, nlh, (void *) stdout) < 0) {
if (print_ctrl2(answer, (void *) stdout) < 0) {
fprintf(stderr, "Dump terminated\n");
goto ctrl_done;
}
}
if (cmd == CTRL_CMD_UNSPEC) {
if (cmd == CTRL_CMD_UNSPEC || cmd == CTRL_CMD_GETPOLICY) {
nlh->nlmsg_flags = NLM_F_ROOT|NLM_F_MATCH|NLM_F_REQUEST;
nlh->nlmsg_seq = rth.dump = ++rth.seq;
@ -358,6 +325,7 @@ static int ctrl_list(int cmd, int argc, char **argv)
ret = 0;
ctrl_done:
free(answer);
rtnl_close(&rth);
return ret;
}
@ -393,6 +361,8 @@ static int parse_ctrl(struct genl_util *a, int argc, char **argv)
matches(*argv, "show") == 0 ||
matches(*argv, "lst") == 0)
return ctrl_list(CTRL_CMD_UNSPEC, argc-1, argv+1);
if (matches(*argv, "policy") == 0)
return ctrl_list(CTRL_CMD_GETPOLICY, argc-1, argv+1);
if (matches(*argv, "help") == 0)
return usage();

View File

@ -13,7 +13,6 @@
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <syslog.h>
#include <fcntl.h>
#include <dlfcn.h>
#include <sys/socket.h>
@ -23,21 +22,19 @@
#include <errno.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h> /* until we put our own header */
#include "SNAPSHOT.h"
#include "version.h"
#include "utils.h"
#include "genl_utils.h"
int show_stats = 0;
int show_details = 0;
int show_raw = 0;
int resolve_hosts = 0;
int show_stats;
int show_details;
int show_raw;
static void *BODY;
static struct genl_util * genl_list;
static struct genl_util *genl_list;
static int print_nofopt(const struct sockaddr_nl *who, struct nlmsghdr *n,
void *arg)
static int print_nofopt(struct nlmsghdr *n, void *arg)
{
fprintf((FILE *) arg, "unknown genl type ..\n");
return 0;
@ -46,8 +43,9 @@ static int print_nofopt(const struct sockaddr_nl *who, struct nlmsghdr *n,
static int parse_nofopt(struct genl_util *f, int argc, char **argv)
{
if (argc) {
fprintf(stderr, "Unknown genl \"%s\", hence option \"%s\" "
"is unparsable\n", f->name, *argv);
fprintf(stderr,
"Unknown genl \"%s\", hence option \"%s\" is unparsable\n",
f->name, *argv);
return -1;
}
@ -100,9 +98,10 @@ static void usage(void) __attribute__((noreturn));
static void usage(void)
{
fprintf(stderr, "Usage: genl [ OPTIONS ] OBJECT | help }\n"
"where OBJECT := { ctrl etc }\n"
" OPTIONS := { -s[tatistics] | -d[etails] | -r[aw] }\n");
fprintf(stderr,
"Usage: genl [ OPTIONS ] OBJECT [help] }\n"
"where OBJECT := { ctrl etc }\n"
" OPTIONS := { -s[tatistics] | -d[etails] | -r[aw] | -V[ersion] | -h[elp] }\n");
exit(-1);
}
@ -119,24 +118,26 @@ int main(int argc, char **argv)
} else if (matches(argv[1], "-raw") == 0) {
++show_raw;
} else if (matches(argv[1], "-Version") == 0) {
printf("genl utility, iproute2-ss%s\n", SNAPSHOT);
printf("genl utility, iproute2-%s\n", version);
exit(0);
} else if (matches(argv[1], "-help") == 0) {
usage();
} else {
fprintf(stderr, "Option \"%s\" is unknown, try "
"\"genl -help\".\n", argv[1]);
fprintf(stderr,
"Option \"%s\" is unknown, try \"genl -help\".\n",
argv[1]);
exit(-1);
}
argc--; argv++;
}
if (argc > 1) {
struct genl_util *a;
int ret;
struct genl_util *a = NULL;
a = get_genl_kind(argv[1]);
if (!a) {
fprintf(stderr,"bad genl %s\n", argv[1]);
fprintf(stderr, "bad genl %s\n", argv[1]);
exit(-1);
}

View File

@ -1,17 +1,15 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _TC_UTIL_H_
#define _TC_UTIL_H_ 1
#include <linux/genetlink.h>
#include "utils.h"
#include "linux/genetlink.h"
struct genl_util
{
struct genl_util {
struct genl_util *next;
char name[16];
int (*parse_genlopt)(struct genl_util *fu, int argc, char **argv);
int (*print_genlopt)(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg);
int (*print_genlopt)(struct nlmsghdr *n, void *arg);
};
extern int genl_ctrl_resolve_family(const char *family);
#endif

View File

@ -1,3 +1,4 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* This file creates a dummy version of dynamic loading
* for environments where dynamic linking

View File

@ -1 +0,0 @@
static const char SNAPSHOT[] = "160808";

View File

@ -1,3 +1,4 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __BPF_API__
#define __BPF_API__
@ -18,6 +19,19 @@
#include "bpf_elf.h"
/** libbpf pin type. */
enum libbpf_pin_type {
LIBBPF_PIN_NONE,
/* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
LIBBPF_PIN_BY_NAME,
};
/** Type helper macros. */
#define __uint(name, val) int (*name)[val]
#define __type(name, val) typeof(val) *name
#define __array(name, val) typeof(val) *name[]
/** Misc macros. */
#ifndef __stringify
@ -72,6 +86,11 @@
__section(__stringify(ID) "/" __stringify(KEY))
#endif
#ifndef __section_xdp_entry
# define __section_xdp_entry \
__section(ELF_SECTION_PROG)
#endif
#ifndef __section_cls_entry
# define __section_cls_entry \
__section(ELF_SECTION_CLASSIFIER)
@ -82,6 +101,11 @@
__section(ELF_SECTION_ACTION)
#endif
#ifndef __section_lwt_entry
# define __section_lwt_entry \
__section(ELF_SECTION_PROG)
#endif
#ifndef __section_license
# define __section_license \
__section(ELF_SECTION_LICENSE)
@ -107,9 +131,14 @@
/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
#ifndef __BPF_FUNC
# define __BPF_FUNC(NAME, ...) \
(* NAME)(__VA_ARGS__) __maybe_unused
#endif
#ifndef BPF_FUNC
# define BPF_FUNC(NAME, ...) \
(* NAME)(__VA_ARGS__) __maybe_unused = (void *) BPF_FUNC_##NAME
__BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
#endif
/* Map access/manipulation */
@ -147,10 +176,15 @@ static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
/* System helpers */
static uint32_t BPF_FUNC(get_smp_processor_id);
static uint32_t BPF_FUNC(get_numa_node_id);
/* Packet misc meta data */
static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
/* Packet redirection */
static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
@ -169,6 +203,20 @@ static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
uint32_t from, uint32_t to, uint32_t flags);
static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
const void *to, uint32_t to_size, uint32_t seed);
static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
uint32_t flags);
static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
uint32_t flags);
static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
/* Event notification */
static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
uint64_t index, const void *data, uint32_t size) =
(void *) BPF_FUNC_perf_event_output;
/* Packet vlan encap/decap */
static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,

View File

@ -1,3 +1,4 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __BPF_ELF__
#define __BPF_ELF__
@ -15,6 +16,7 @@
/* ELF section names, etc */
#define ELF_SECTION_LICENSE "license"
#define ELF_SECTION_MAPS "maps"
#define ELF_SECTION_PROG "prog"
#define ELF_SECTION_CLASSIFIER "classifier"
#define ELF_SECTION_ACTION "action"
@ -35,6 +37,17 @@ struct bpf_elf_map {
__u32 flags;
__u32 id;
__u32 pinning;
__u32 inner_id;
__u32 inner_idx;
};
#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val) \
struct ____btf_map_##name { \
type_key key; \
type_val value; \
}; \
struct ____btf_map_##name \
__attribute__ ((section(".maps." #name), used)) \
____btf_map_##name = { }
#endif /* __BPF_ELF__ */

View File

@ -1,8 +1,10 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __BPF_SCM__
#define __BPF_SCM__
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include "utils.h"
#include "bpf_elf.h"

327
include/bpf_util.h Normal file
View File

@ -0,0 +1,327 @@
/*
* bpf_util.h BPF common code
*
* This program is free software; you can distribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version
* 2 of the License, or (at your option) any later version.
*
* Authors: Daniel Borkmann <daniel@iogearbox.net>
* Jiri Pirko <jiri@resnulli.us>
*/
#ifndef __BPF_UTIL__
#define __BPF_UTIL__
#include <linux/bpf.h>
#include <linux/btf.h>
#include <linux/filter.h>
#include <linux/magic.h>
#include <linux/elf-em.h>
#include <linux/if_alg.h>
#include "utils.h"
#include "bpf_scm.h"
#define BPF_ENV_UDS "TC_BPF_UDS"
#define BPF_ENV_MNT "TC_BPF_MNT"
#ifndef BPF_MAX_LOG
# define BPF_MAX_LOG 4096
#endif
#define BPF_DIR_GLOBALS "globals"
#ifndef BPF_FS_MAGIC
# define BPF_FS_MAGIC 0xcafe4a11
#endif
#define BPF_DIR_MNT "/sys/fs/bpf"
#ifndef TRACEFS_MAGIC
# define TRACEFS_MAGIC 0x74726163
#endif
#define TRACE_DIR_MNT "/sys/kernel/tracing"
#ifndef AF_ALG
# define AF_ALG 38
#endif
#ifndef EM_BPF
# define EM_BPF 247
#endif
struct bpf_cfg_ops {
void (*cbpf_cb)(void *nl, const struct sock_filter *ops, int ops_len);
void (*ebpf_cb)(void *nl, int fd, const char *annotation);
};
enum bpf_mode {
CBPF_BYTECODE,
CBPF_FILE,
EBPF_OBJECT,
EBPF_PINNED,
BPF_MODE_MAX,
};
struct bpf_cfg_in {
const char *object;
const char *section;
const char *uds;
enum bpf_prog_type type;
enum bpf_mode mode;
__u32 ifindex;
bool verbose;
int argc;
char **argv;
struct sock_filter opcodes[BPF_MAXINSNS];
union {
int n_opcodes;
int prog_fd;
};
};
/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
#define BPF_ALU64_REG(OP, DST, SRC) \
((struct bpf_insn) { \
.code = BPF_ALU64 | BPF_OP(OP) | BPF_X, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = 0, \
.imm = 0 })
#define BPF_ALU32_REG(OP, DST, SRC) \
((struct bpf_insn) { \
.code = BPF_ALU | BPF_OP(OP) | BPF_X, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = 0, \
.imm = 0 })
/* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */
#define BPF_ALU64_IMM(OP, DST, IMM) \
((struct bpf_insn) { \
.code = BPF_ALU64 | BPF_OP(OP) | BPF_K, \
.dst_reg = DST, \
.src_reg = 0, \
.off = 0, \
.imm = IMM })
#define BPF_ALU32_IMM(OP, DST, IMM) \
((struct bpf_insn) { \
.code = BPF_ALU | BPF_OP(OP) | BPF_K, \
.dst_reg = DST, \
.src_reg = 0, \
.off = 0, \
.imm = IMM })
/* Short form of mov, dst_reg = src_reg */
#define BPF_MOV64_REG(DST, SRC) \
((struct bpf_insn) { \
.code = BPF_ALU64 | BPF_MOV | BPF_X, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = 0, \
.imm = 0 })
#define BPF_MOV32_REG(DST, SRC) \
((struct bpf_insn) { \
.code = BPF_ALU | BPF_MOV | BPF_X, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = 0, \
.imm = 0 })
/* Short form of mov, dst_reg = imm32 */
#define BPF_MOV64_IMM(DST, IMM) \
((struct bpf_insn) { \
.code = BPF_ALU64 | BPF_MOV | BPF_K, \
.dst_reg = DST, \
.src_reg = 0, \
.off = 0, \
.imm = IMM })
#define BPF_MOV32_IMM(DST, IMM) \
((struct bpf_insn) { \
.code = BPF_ALU | BPF_MOV | BPF_K, \
.dst_reg = DST, \
.src_reg = 0, \
.off = 0, \
.imm = IMM })
/* BPF_LD_IMM64 macro encodes single 'load 64-bit immediate' insn */
#define BPF_LD_IMM64(DST, IMM) \
BPF_LD_IMM64_RAW(DST, 0, IMM)
#define BPF_LD_IMM64_RAW(DST, SRC, IMM) \
((struct bpf_insn) { \
.code = BPF_LD | BPF_DW | BPF_IMM, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = 0, \
.imm = (__u32) (IMM) }), \
((struct bpf_insn) { \
.code = 0, /* zero is reserved opcode */ \
.dst_reg = 0, \
.src_reg = 0, \
.off = 0, \
.imm = ((__u64) (IMM)) >> 32 })
#ifndef BPF_PSEUDO_MAP_FD
# define BPF_PSEUDO_MAP_FD 1
#endif
/* pseudo BPF_LD_IMM64 insn used to refer to process-local map_fd */
#define BPF_LD_MAP_FD(DST, MAP_FD) \
BPF_LD_IMM64_RAW(DST, BPF_PSEUDO_MAP_FD, MAP_FD)
/* Direct packet access, R0 = *(uint *) (skb->data + imm32) */
#define BPF_LD_ABS(SIZE, IMM) \
((struct bpf_insn) { \
.code = BPF_LD | BPF_SIZE(SIZE) | BPF_ABS, \
.dst_reg = 0, \
.src_reg = 0, \
.off = 0, \
.imm = IMM })
/* Memory load, dst_reg = *(uint *) (src_reg + off16) */
#define BPF_LDX_MEM(SIZE, DST, SRC, OFF) \
((struct bpf_insn) { \
.code = BPF_LDX | BPF_SIZE(SIZE) | BPF_MEM, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = OFF, \
.imm = 0 })
/* Memory store, *(uint *) (dst_reg + off16) = src_reg */
#define BPF_STX_MEM(SIZE, DST, SRC, OFF) \
((struct bpf_insn) { \
.code = BPF_STX | BPF_SIZE(SIZE) | BPF_MEM, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = OFF, \
.imm = 0 })
/* Memory store, *(uint *) (dst_reg + off16) = imm32 */
#define BPF_ST_MEM(SIZE, DST, OFF, IMM) \
((struct bpf_insn) { \
.code = BPF_ST | BPF_SIZE(SIZE) | BPF_MEM, \
.dst_reg = DST, \
.src_reg = 0, \
.off = OFF, \
.imm = IMM })
/* Conditional jumps against registers, if (dst_reg 'op' src_reg) goto pc + off16 */
#define BPF_JMP_REG(OP, DST, SRC, OFF) \
((struct bpf_insn) { \
.code = BPF_JMP | BPF_OP(OP) | BPF_X, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = OFF, \
.imm = 0 })
/* Conditional jumps against immediates, if (dst_reg 'op' imm32) goto pc + off16 */
#define BPF_JMP_IMM(OP, DST, IMM, OFF) \
((struct bpf_insn) { \
.code = BPF_JMP | BPF_OP(OP) | BPF_K, \
.dst_reg = DST, \
.src_reg = 0, \
.off = OFF, \
.imm = IMM })
/* Raw code statement block */
#define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM) \
((struct bpf_insn) { \
.code = CODE, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = OFF, \
.imm = IMM })
/* Program exit */
#define BPF_EXIT_INSN() \
((struct bpf_insn) { \
.code = BPF_JMP | BPF_EXIT, \
.dst_reg = 0, \
.src_reg = 0, \
.off = 0, \
.imm = 0 })
int bpf_parse_common(struct bpf_cfg_in *cfg, const struct bpf_cfg_ops *ops);
int bpf_load_common(struct bpf_cfg_in *cfg, const struct bpf_cfg_ops *ops,
void *nl);
int bpf_parse_and_load_common(struct bpf_cfg_in *cfg,
const struct bpf_cfg_ops *ops, void *nl);
const char *bpf_prog_to_default_section(enum bpf_prog_type type);
int bpf_graft_map(const char *map_path, uint32_t *key, int argc, char **argv);
int bpf_trace_pipe(void);
void bpf_print_ops(struct rtattr *bpf_ops, __u16 len);
int bpf_prog_load_dev(enum bpf_prog_type type, const struct bpf_insn *insns,
size_t size_insns, const char *license, __u32 ifindex,
char *log, size_t size_log);
int bpf_program_load(enum bpf_prog_type type, const struct bpf_insn *insns,
size_t size_insns, const char *license, char *log,
size_t size_log);
int bpf_prog_attach_fd(int prog_fd, int target_fd, enum bpf_attach_type type);
int bpf_prog_detach_fd(int target_fd, enum bpf_attach_type type);
int bpf_program_attach(int prog_fd, int target_fd, enum bpf_attach_type type);
int bpf_dump_prog_info(FILE *f, uint32_t id);
#ifdef HAVE_ELF
int bpf_send_map_fds(const char *path, const char *obj);
int bpf_recv_map_fds(const char *path, int *fds, struct bpf_map_aux *aux,
unsigned int entries);
#ifdef HAVE_LIBBPF
int iproute2_bpf_elf_ctx_init(struct bpf_cfg_in *cfg);
int iproute2_bpf_fetch_ancillary(void);
int iproute2_get_root_path(char *root_path, size_t len);
bool iproute2_is_pin_map(const char *libbpf_map_name, char *pathname);
bool iproute2_is_map_in_map(const char *libbpf_map_name, struct bpf_elf_map *imap,
struct bpf_elf_map *omap, char *omap_name);
int iproute2_find_map_name_by_id(unsigned int map_id, char *name);
int iproute2_load_libbpf(struct bpf_cfg_in *cfg);
#endif /* HAVE_LIBBPF */
#else
static inline int bpf_send_map_fds(const char *path, const char *obj)
{
return 0;
}
static inline int bpf_recv_map_fds(const char *path, int *fds,
struct bpf_map_aux *aux,
unsigned int entries)
{
return -1;
}
#ifdef HAVE_LIBBPF
static inline int iproute2_load_libbpf(struct bpf_cfg_in *cfg)
{
fprintf(stderr, "No ELF library support compiled in.\n");
return -1;
}
#endif /* HAVE_LIBBPF */
#endif /* HAVE_ELF */
const char *get_libbpf_version(void);
#endif /* __BPF_UTIL__ */

6
include/cg_map.h Normal file
View File

@ -0,0 +1,6 @@
#ifndef __CG_MAP_H__
#define __CG_MAP_H__
const char *cg_id_to_path(__u64 id);
#endif /* __CG_MAP_H__ */

View File

@ -1,6 +1,9 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __COLOR_H__
#define __COLOR_H__ 1
#include <stdbool.h>
enum color_attr {
COLOR_IFNAME,
COLOR_MAC,
@ -8,10 +11,17 @@ enum color_attr {
COLOR_INET6,
COLOR_OPERSTATE_UP,
COLOR_OPERSTATE_DOWN,
COLOR_CLEAR
COLOR_NONE
};
void enable_color(void);
enum color_opt {
COLOR_OPT_NEVER = 0,
COLOR_OPT_AUTO = 1,
COLOR_OPT_ALWAYS = 2
};
bool check_enable_color(int color, int json);
bool matches_color(const char *arg, int *val);
int color_fprintf(FILE *fp, enum color_attr attr, const char *fmt, ...);
enum color_attr ifa_family_color(__u8 ifa_family);
enum color_attr oper_state_color(__u8 state);

Some files were not shown because too many files have changed in this diff Show More