Compare commits

...

4700 Commits

Author SHA1 Message Date
Paul Blakey 73590d9573 tc: flower: Fix buffer overflow on large labels
Buffer is 64 bytes, but printing a label in hex can take 66 bytes,
so it overflows when the string terminator ('\0') is written.

Fix that by increasing the print buffer size.

Example of overflowing ct_label:
ct_label 11111111111111111111111111111111/11111111111111111111111111111111

Fixes: 2fffb1c030 ("tc: flower: Add matching on conntrack info")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-12-06 13:44:50 -08:00
Stephen Hemminger 3f77bc6253 uapi: update to if_ether.h
Merged from 5.16-rc3

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-12-03 12:20:02 -08:00
Maxim Petrov 5f8bb902e1 ip/ipnexthop: fix unsigned overflow in parse_nh_group_type_res()
~0UL has type 'unsigned long', which is likely to be 64 bits on modern machines. At
the same time, the '{idle,unbalanced}_timer' variables are declared as u32, so
when 'unsigned long' is 64 bits they can never be greater than '~0UL / 100'. In
that case a value can pass the check and still overflow later, when the timers
are multiplied by 100 in 'addattr32'.

Fix the possible overflow by changing '~0UL' to 'UINT32_MAX'.

Fixes: 9167671822 ("nexthop: Add support for resilient nexthop groups")
Signed-off-by: Maxim Petrov <mmrmaximuzz@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-18 15:01:48 -08:00
Maxim Petrov 3184de3797 lib/bpf_legacy: remove always-true check
The 'name' field of 'struct bpf_prog_info' is a plain C array. Thus, the
logical condition in bpf_dump_prog_info() is useless: the address of an array
is never NULL, so the condition is always true. Just remove it.

Signed-off-by: Maxim Petrov <mmrmaximuzz@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-18 15:01:04 -08:00
Stephen Hemminger 79026c1262 rdma: update uapi headers
Update the RDMA uapi headers from 5.16.0-rc1

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-18 10:00:19 -08:00
Stephen Hemminger fa58de9b0c vdpa: align uapi headers
Update vdpa headers based on 5.16.0-rc1 and remove redundant
copy.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-18 09:56:57 -08:00
jiangheng be31c26484 lnstat: fix buffer overflow in header output
Running lnstat causes a core dump by reading past the end of an array.

Segmentation fault (core dumped)

The maximum value of th.num_lines is HDR_LINES (10). The index h must never reach th.num_lines; otherwise th.hdr may be accessed out of bounds.

Signed-off-by: jiangheng <jiangheng12@huawei.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-17 13:41:10 -08:00
Maxim Petrov 0e94972590 tc/m_vlan: fix print_vlan() conditional on TCA_VLAN_ACT_PUSH_ETH
Fix the misplaced bracket in the if clause, which made the condition evaluate incorrectly.

Fixes: d61167dd88 ("m_vlan: add pop_eth and push_eth actions")
Signed-off-by: Maxim Petrov <mmrmaximuzz@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-17 11:13:12 -08:00
Davide Caratti 9bd5ab0f09 mptcp: fix JSON output when dumping endpoints by id
iproute ignores the '-j' command line argument when dumping endpoints by id:

 [dcaratti@dcaratti iproute2]$ ./ip/ip -j mptcp endpoint show
 [{"address":"1.2.3.4","id":42,"signal":true,"backup":true}]
 [dcaratti@dcaratti iproute2]$ ./ip/ip -j mptcp endpoint show id 42
 1.2.3.4 id 42 signal backup

Fix mptcp_addr_show() to use the proper JSON helpers.

Fixes: 7e0767cd86 ("add support for mptcp netlink interface")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-11 10:07:26 -08:00
Anssi Hannula a787d9ae10 man: tc-u32: Fix page to match new firstfrag behavior
Commit 690b11f4a6 ("tc: u32: Fix firstfrag filter.") applied in 2012
changed the "ip firstfrag" selector so that it no longer matches
non-fragmented packets.

However, the documentation added in f15a23966f ("tc: add a man page
for u32 filter") in 2015 includes an example that relies on the previous
behavior (non-fragmented packet counted as first fragment).

Due to this, the example does not work correctly and does not actually
classify regular SSH packets.

Modify the example to use a raw u16 selector on the fragment offset to
make it work, and also make the firstfrag description more clear about
the current behavior.

Fixes: f15a23966f ("tc: add a man page for u32 filter")
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Cc: Phil Sutter <phil@nwl.cc>
Cc: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-09 10:46:17 -08:00
Luca Boccassi af96c7b5dd Fix some typos detected by Lintian in manpages
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-09 10:45:44 -08:00
Stephen Hemminger 35c81b18c4 uapi: update vdpa.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-09 10:40:40 -08:00
David Ahern 50b668bdbf Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-04 09:45:31 -06:00
David Ahern 9c56d693f6 Merge branch 'can-tdc-plus-cleanups' into next
Vincent Mailhol says:

====================

The main purpose is to add command line support for Transmitter Delay
Compensation (TDC) in iproute. Other issues found during the
development of this feature are also addressed.

This patch series contains five patches which respectively:

  1. Correct the bittiming ranges in the print_usage function and add
  the units to give more clarity: some parameters are in milliseconds,
  some in nanoseconds, some in time quanta, and the new TDC
  parameters introduced in this series are in clock periods.

  2. Do some code refactoring on function print_ctrlmode().

  3. factorize the many print_*(PRINT_JSON, ...) and fprintf
  occurrences into a single print_*(PRINT_ANY, ...) call and fix the
  signedness while doing that.

  4. report the value of the bitrate prescalers (brp and dbrp).

  5. add command line support for TDC in iproute; this goes together
  with the following kernel series:
  https://lore.kernel.org/linux-can/20210814091750.73931-1-mailhol.vincent@wanadoo.fr/T/#t

** Changelog **

From RFC v5 to v6:
  * Dropped the RFC tag because the related patch series on the kernel
    side were pulled into net-next.
  * Remove the changes in include/uapi/linux/can/netlink.h because
    these should be pulled separately.
  * Add another patch (the second of this series) to do some cleanup
    on function print_ctrlmode().
  * Minor fixes in the patch comments (grammar, rephrasing).

From RFC v4 to RFC v5:
  * Add the units (bps, tq, ns or ms) in print_usage()
  * Rewrote can_print_timing_min_max() to better factorize the
    code.
  * Rewrote the commit messages of the last two patches (those related
    to TDC) to either add clarifications or fix inaccuracies.

From v3 to RFC v4:
  * Reflect the changes made on the kernel side.

From RFC v2 to v3:
  * Dropped the RFC tag. Now that the kernel patch has reached the testing
    branch, I am finally ready.
  * Regression fix: configuring a link with only nominal bittiming
    returned -EOPNOTSUPP
  * Added two more patches to the series:
      - iplink_can: fix configuration ranges in print_usage()
      - iplink_can: print brp and dbrp bittiming variables
  * Other small fixes on formatting.

From RFC v1 to RFC v2:
  * Add an additional patch to the series to fix the issues reported
    by Stephen Hemminger
    Ref: https://lore.kernel.org/linux-can/20210506112007.1666738-1-mailhol.vincent@wanadoo.fr/T/#t

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-04 09:44:56 -06:00
Vincent Mailhol 0c263d7c36 iplink_can: add new CAN FD bittiming parameters: Transmitter Delay Compensation (TDC)
At high bit rates, the propagation delay from the TX pin to the RX pin
of the transceiver causes measurement errors: the sample point on the
RX pin might occur on the previous bit.

This issue is addressed in ISO 11898-1 section 11.3.3 "Transmitter
delay compensation" (TDC).

This patch adds command line support for nine TDC parameters which
were recently added to the kernel's CAN netlink interface in order to
implement TDC:
  - IFLA_CAN_TDC_TDCV_MIN: Transmitter Delay Compensation Value
    minimum value
  - IFLA_CAN_TDC_TDCV_MAX: Transmitter Delay Compensation Value
    maximum value
  - IFLA_CAN_TDC_TDCO_MIN: Transmitter Delay Compensation Offset
    minimum value
  - IFLA_CAN_TDC_TDCO_MAX: Transmitter Delay Compensation Offset
    maximum value
  - IFLA_CAN_TDC_TDCF_MIN: Transmitter Delay Compensation Filter
    window minimum value
  - IFLA_CAN_TDC_TDCF_MAX: Transmitter Delay Compensation Filter
    window maximum value
  - IFLA_CAN_TDC_TDCV: Transmitter Delay Compensation Value
  - IFLA_CAN_TDC_TDCO: Transmitter Delay Compensation Offset
  - IFLA_CAN_TDC_TDCF: Transmitter Delay Compensation Filter window

All those new parameters are nested together into the attribute
IFLA_CAN_TDC.

The TDC parameters extend the FD parameters. As such, the TDC
parameters must be specified together with the "fd on" flag.

When the "fd on" flag is provided, a tdc-mode parameter specifies
how TDC operates. Valid options for tdc-mode are:

  * auto: the transmitter dynamically measures TDCV for each of the
    transmitted frames. As such, TDCV cannot be manually provided. In
    this mode, the user must specify TDCO and may also specify TDCF if
    supported.

  * manual: use a static TDCV provided by the user. In this mode, the
    user must specify both TDCV and TDCO and may also specify TDCF if
    supported.

  * off: TDC is explicitly disabled.

  * tdc-mode parameter omitted (default mode): the kernel decides
    whether TDC should be enabled or not and if so, it calculates the
    TDC values. TDC parameters are an expert option and the average
    user is not expected to provide those, hence this
    "default mode".

If the fd flag is omitted, all the FD values (including TDC values)
remain unchanged.

If "fd off" flag is specified, all FD values (including TDC values)
are zeroed.

TDCV is always reported in manual mode. In auto mode, TDCV is reported
only if the value is available. In particular, the TDCV might not be
available if the controller has no way to report it or if the
value is not yet available (e.g. no data has been sent yet, so no
measurement has occurred).

TDCF is reported only if tdcf_max is not zero (i.e. if supported by
the controller).

For reference, here are a few samples of the output:

| $ ip link set can0 type can bitrate 1000000 dbitrate 8000000 fd on tdco 7 tdcf 8 tdc-mode auto

| $ ip --details link show can0
| 1:  can0: <NOARP,ECHO> mtu 72 qdisc noop state DOWN mode DEFAULT group default qlen 10
|     link/can  promiscuity 0 minmtu 0 maxmtu 0
|     can <FD,TDC-AUTO> state STOPPED (berr-counter tx 0 rx 0) restart-ms 0
| 	  bitrate 1000000 sample-point 0.750
| 	  tq 12 prop-seg 29 phase-seg1 30 phase-seg2 20 sjw 1 brp 1
| 	  ES582.1/ES584.1: tseg1 2..256 tseg2 2..128 sjw 1..128 brp 1..512 brp_inc 1
| 	  dbitrate 8000000 dsample-point 0.700
| 	  dtq 12 dprop-seg 3 dphase-seg1 3 dphase-seg2 3 dsjw 1 dbrp 1
| 	  tdco 7 tdcf 8
| 	  ES582.1/ES584.1: dtseg1 2..32 dtseg2 1..16 dsjw 1..8 dbrp 1..32 dbrp_inc 1
| 	  tdco 0..127 tdcf 0..127
| 	  clock 80000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

| $ ip --details --json --pretty link show can0
| [ {
|         "ifindex": 1,
|         "ifname": "can0",
|         "flags": [ "NOARP","ECHO" ],
|         "mtu": 72,
|         "qdisc": "noop",
|         "operstate": "DOWN",
|         "linkmode": "DEFAULT",
|         "group": "default",
|         "txqlen": 10,
|         "link_type": "can",
|         "promiscuity": 0,
|         "min_mtu": 0,
|         "max_mtu": 0,
|         "linkinfo": {
|             "info_kind": "can",
|             "info_data": {
|                 "ctrlmode": [ "FD","TDC-AUTO" ],
|                 "state": "STOPPED",
|                 "berr_counter": {
|                     "tx": 0,
|                     "rx": 0
|                 },
|                 "restart_ms": 0,
|                 "bittiming": {
|                     "bitrate": 1000000,
|                     "sample_point": "0.750",
|                     "tq": 12,
|                     "prop_seg": 29,
|                     "phase_seg1": 30,
|                     "phase_seg2": 20,
|                     "sjw": 1,
|                     "brp": 1
|                 },
|                 "bittiming_const": {
|                     "name": "ES582.1/ES584.1",
|                     "tseg1": {
|                         "min": 2,
|                         "max": 256
|                     },
|                     "tseg2": {
|                         "min": 2,
|                         "max": 128
|                     },
|                     "sjw": {
|                         "min": 1,
|                         "max": 128
|                     },
|                     "brp": {
|                         "min": 1,
|                         "max": 512
|                     },
|                     "brp_inc": 1
|                 },
|                 "data_bittiming": {
|                     "bitrate": 8000000,
|                     "sample_point": "0.700",
|                     "tq": 12,
|                     "prop_seg": 3,
|                     "phase_seg1": 3,
|                     "phase_seg2": 3,
|                     "sjw": 1,
|                     "brp": 1,
|                     "tdc": {
|                         "tdco": 7,
|                         "tdcf": 8
|                     }
|                 },
|                 "data_bittiming_const": {
|                     "name": "ES582.1/ES584.1",
|                     "tseg1": {
|                         "min": 2,
|                         "max": 32
|                     },
|                     "tseg2": {
|                         "min": 1,
|                         "max": 16
|                     },
|                     "sjw": {
|                         "min": 1,
|                         "max": 8
|                     },
|                     "brp": {
|                         "min": 1,
|                         "max": 32
|                     },
|                     "brp_inc": 1,
|                     "tdc": {
|                         "tdco": {
|                             "min": 0,
|                             "max": 127
|                         },
|                         "tdcf": {
|                             "min": 0,
|                             "max": 127
|                         }
|                     }
|                 },
|                 "clock": 80000000
|             }
|         },
|         "num_tx_queues": 1,
|         "num_rx_queues": 1,
|         "gso_max_size": 65536,
|         "gso_max_segs": 65535
|     } ]

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-04 09:43:10 -06:00
Vincent Mailhol 0f7bb8d842 iplink_can: print brp and dbrp bittiming variables
Report the value of the bit-rate prescaler (brp) for both the nominal
and the data bittiming.

Currently, only the constant brp values (brp_{min,max,inc}) are being
reported. Also, brp is the only member of struct can_bittiming not
being reported.

Note that brp could be calculated by hand from the other bittiming
parameters with the formula below:

        brp = clock * tq / 1000000000

with clock in hertz and tq in nanoseconds (hence the factor of one
billion to convert back to seconds).

But because the formula above is not trivial to remember and is
subject to rounding errors, it makes sense to output {d,}brp
directly.

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-04 09:42:54 -06:00
Vincent Mailhol 67f3c7a5cc iplink_can: use PRINT_ANY to factorize code and fix signedness
The current implementation relies heavily on "if (is_json_context())"
switches to decide the context, then calls print_*(PRINT_JSON, ...)
in JSON context and fprintf(...) otherwise.

Furthermore, the current implementation uses either print_int() or the
conversion specifier %d to print unsigned integers.

This patch factorizes each pair of print_*(PRINT_JSON, ...) and
fprintf() into a single print_*(PRINT_ANY, ...) call. While doing this
replacement, it uses the proper unsigned function print_uint() as well as
the conversion specifier %u when the parameter is an unsigned integer.

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-04 09:42:50 -06:00
Vincent Mailhol fd5e958c49 iplink_can: code refactoring of print_ctrlmode()
This patch only does cleanup and does not introduce any functional
changes.

We do some code refactoring of print_ctrlmode() in preparation for the
upcoming patch:

  - remove the first argument of print_ctrlmode(). It is a pointer to
    FILE and is never used.

  - add a new function argument: enum output_type t in order to
    specify the output type (i.e. PRINT_{FP,JSON,ANY}).

  - add a new function argument: const char *key in order to specify
    the name of the json array (e.g. "ctrlmode").

  - replace the _PF() macro with the print_flag() function to increase
    readability.

  - directly return if none of the flags are set (previously, this
    check was done before calling the function).

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-04 09:42:44 -06:00
Vincent Mailhol 8316df6e6d iplink_can: fix configuration ranges in print_usage() and add unit
The configuration ranges in print_usage() are taken from "Table 8 -
Time segments' minimum configuration ranges" in section 11.3.1.2
"Configuration of the bit time parameters" of ISO 11898-1.

The standard clearly specifies that "implementations may allow time
segments that exceed the minimum required configuration ranges
specified in Table 8".

Because no maximum ranges are given in the standard, all given ranges
{ a..b } are simply replaced with { NUMBER }.

The actual ranges are specific to each device and can be confirmed
by running:

$ ip --details link show can0
1: can0: <NOARP,ECHO> mtu 16 qdisc noop state DOWN mode DEFAULT group default qlen 10
    link/can  promiscuity 0 minmtu 0 maxmtu 0
    can state STOPPED restart-ms 0
	  ES582.1/ES584.1: tseg1 2..256 tseg2 2..128 sjw 1..128 brp 1..512 brp-inc 1
	  ES582.1/ES584.1: dtseg1 2..32 dtseg2 1..16 dsjw 1..8 dbrp 1..32 dbrp-inc 1
	  clock 80000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

Finally, the units (bps, tq, ns or ms) are given. The rationale for adding
the units is that the TDC parameters (that will be introduced in the
upcoming patches) are measured in a different unit than the other
bittiming parameters: clock period (a.k.a. minimum time quantum)
instead of time quantum. Adding the units disambiguates things.

For reference, before the change:
$ ip link set can0 type can help
Usage: ip link set DEVICE type can
	[ bitrate BITRATE [ sample-point SAMPLE-POINT] ] |
	[ tq TQ prop-seg PROP_SEG phase-seg1 PHASE-SEG1
 	  phase-seg2 PHASE-SEG2 [ sjw SJW ] ]

	[ dbitrate BITRATE [ dsample-point SAMPLE-POINT] ] |
	[ dtq TQ dprop-seg PROP_SEG dphase-seg1 PHASE-SEG1
 	  dphase-seg2 PHASE-SEG2 [ dsjw SJW ] ]

	[ loopback { on | off } ]
	[ listen-only { on | off } ]
	[ triple-sampling { on | off } ]
	[ one-shot { on | off } ]
	[ berr-reporting { on | off } ]
	[ fd { on | off } ]
	[ fd-non-iso { on | off } ]
	[ presume-ack { on | off } ]

	[ restart-ms TIME-MS ]
	[ restart ]

	[ termination { 0..65535 } ]

	Where: BITRATE	:= { 1..1000000 }
		  SAMPLE-POINT	:= { 0.000..0.999 }
		  TQ		:= { NUMBER }
		  PROP-SEG	:= { 1..8 }
		  PHASE-SEG1	:= { 1..8 }
		  PHASE-SEG2	:= { 1..8 }
		  SJW		:= { 1..4 }
		  RESTART-MS	:= { 0 | NUMBER }

...and after it:
$ ip link set can0 type can help
Usage: ip link set DEVICE type can
	[ bitrate BITRATE [ sample-point SAMPLE-POINT] ] |
	[ tq TQ prop-seg PROP_SEG phase-seg1 PHASE-SEG1
 	  phase-seg2 PHASE-SEG2 [ sjw SJW ] ]

	[ dbitrate BITRATE [ dsample-point SAMPLE-POINT] ] |
	[ dtq TQ dprop-seg PROP_SEG dphase-seg1 PHASE-SEG1
 	  dphase-seg2 PHASE-SEG2 [ dsjw SJW ] ]

	[ loopback { on | off } ]
	[ listen-only { on | off } ]
	[ triple-sampling { on | off } ]
	[ one-shot { on | off } ]
	[ berr-reporting { on | off } ]
	[ fd { on | off } ]
	[ fd-non-iso { on | off } ]
	[ presume-ack { on | off } ]
	[ cc-len8-dlc { on | off } ]

	[ restart-ms TIME-MS ]
	[ restart ]

	[ termination { 0..65535 } ]

	Where: BITRATE	:= { NUMBER in bps }
		  SAMPLE-POINT	:= { 0.000..0.999 }
		  TQ		:= { NUMBER in ns }
		  PROP-SEG	:= { NUMBER in tq }
		  PHASE-SEG1	:= { NUMBER in tq }
		  PHASE-SEG2	:= { NUMBER in tq }
		  SJW		:= { NUMBER in tq }
		  RESTART-MS	:= { 0 | NUMBER in ms }

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-04 09:42:23 -06:00
Taehee Yoo 6e15d27aae ip: add AMT support
Add basic support for Automatic Multicast Tunneling (AMT) network devices.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
2021-11-03 13:24:13 -06:00
David Ahern 9cae1de564 Import amt.h
Import amt.h uapi from the last kernel sync point

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-03 13:23:38 -06:00
David Ahern 258e350ca9 Update kernel headers
Update kernel headers to commit:
    cc0356d6a02e ("Merge tag 'x86_core_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-03 13:22:15 -06:00
Moshe Shemesh 047e9ae516 devlink: Fix cmd_dev_param_set() to check configuration mode
This patch fixes a bug: when a param set user command includes a
configuration mode which is not supported, the tool may not respond
with an error if the requested value is 0. In that case,
cmd_dev_param_set_cb() does not find the requested configuration mode and
leaves ctx->value at its initial value (0). cmd_dev_param_set() may then
find that the requested value equals the current value and return success.

Fix the bug by adding a flag, cmode_found, which is set only if
cmd_dev_param_set_cb() finds the requested configuration mode.

Fixes: 13925ae9eb ("devlink: Add param command support")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-02 08:34:33 -07:00
Stephen Hemminger 7a8b7573a4 v5.15.0 2021-11-01 16:41:02 -07:00
Neta Ostrovsky ad3a118f88 rdma: Fix SRQ resource tracking information json
Fix the JSON output for the QPs that are associated with the SRQ:
the QPN ranges are now displayed as separate elements of a JSON array.

Sample output before the fix:
$ rdma res show srq lqpn 126-141 -j -p
[ {
        "ifindex":0,
	"ifname":"ibp8s0f0",
	"srqn":4,
	"type":"BASIC",
	"lqpn":["126-128,130-140"],
	"pdn":9,
	"pid":3581,
	"comm":"ibv_srq_pingpon"
    },{
	"ifindex":0,
	"ifname":"ibp8s0f0",
	"srqn":5,
	"type":"BASIC",
	"lqpn":["141"],
	"pdn":10,
	"pid":3584,
	"comm":"ibv_srq_pingpon"
    } ]

Sample output after the fix:
$ rdma res show srq lqpn 126-141 -j -p
[ {
        "ifindex":0,
	"ifname":"ibp8s0f0",
	"srqn":4,
	"type":"BASIC",
	"lqpn":["126-128","130-140"],
	"pdn":9,
	"pid":3581,
	"comm":"ibv_srq_pingpon"
    },{
	"ifindex":0,
	"ifname":"ibp8s0f0",
	"srqn":5,
	"type":"BASIC",
	"lqpn":["141"],
	"pdn":10,
	"pid":3584,
	"comm":"ibv_srq_pingpon"
    } ]

Fixes: 9b272e138d ("rdma: Add SRQ resource tracking information")
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-29 15:04:45 -07:00
Antoine Tenart 7a235a101b man: devlink-port: fix pfnum for devlink port add
When configuring a devlink PCI port, the PF number is specified
using 'pfnum', not 'pcipf' as stated in the man page. Fix this.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-29 15:03:44 -07:00
David Ahern e2947f6fd8 Merge branch 'managed-neighbor' into next
Daniel Borkmann says:

====================

iproute2 patches to add support for managed neighbor entries as per recent
net-next commits:

  2ed08b5ead3c ("Merge branch 'Managed-Neighbor-Entries'")
  c47fedba94bc ("Merge branch 'minor-managed-neighbor-follow-ups'")

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-28 09:00:26 -06:00
Daniel Borkmann 9e009e78e7 ip, neigh: Add NTF_EXT_MANAGED support
Currently, ip neigh does not support the NTF_EXT_MANAGED flag. Add cmdline
support.

Usage example:

  # ./ip/ip n replace 192.168.178.30 dev enp5s0 managed extern_learn
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a managed extern_learn REACHABLE
  [...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-28 08:59:03 -06:00
Daniel Borkmann 040e52526c ip, neigh: Add missing NTF_USE support
Currently, ip neigh does not support the NTF_USE flag. Similar to other flags
such as extern_learn, add cmdline support. The flag dump support is explicitly
missing here, since the kernel does not propagate the flag back to user space.

Usage example:

  # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a extern_learn REACHABLE
  [...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-28 08:58:55 -06:00
Daniel Borkmann c76a3849ec ip, neigh: Fix up spacing in netlink dump
Fix up spacing to consistently add a single ' ' after an attribute has
been printed. Currently, the space is added sometimes before and sometimes
after an attribute, which can lead to double spaces being printed.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-28 08:58:50 -06:00
Nicolas Dichtel 76b30805f9 xfrm: enable to manage default policies
Two new commands to manage default policies:
 - ip xfrm policy setdefault
 - ip xfrm policy getdefault

And the corresponding part in 'ip xfrm monitor'.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-28 08:58:28 -06:00
David Ahern 2be7d99960 Merge branch 'rdma-optional-stats' into next
Mark Zhang says:

====================

This is the supplementary part of kernel series [1], which extends
the rdma statistics tool to set or list optional counters dynamically,
using netlink.

Thanks

[1] https://www.spinics.net/lists/linux-rdma/msg106283.html

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-16 12:52:02 -06:00
Stephen Hemminger 229eaba507 uapi: pickup fix for xfrm ABI breakage
See kernel commit 844f7eaaed9 ("include/uapi/linux/xfrm.h: Fix XFRM_MSG_MAPPING ABI breakage")

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-15 17:40:30 -07:00
Nicolas Dichtel 95cd2a6204 iplink: enable to specify index when changing netns
When an interface is moved to another netns, it's possible to specify a
new ifindex. Let's add this support.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=eeb85a14ee34
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 18:05:09 -06:00
David Ahern a936a73fc2 Merge branch 'config-libdir' into next
Andrea Claudi says:

====================

This series adds support for the libdir parameter in iproute2 configure
script. The idea is to make use of the fact that packaging systems may
assume that 'configure' comes from autotools allowing a syntax similar
to the autotools one, and using it to tell iproute2 where the distro
expects to find its lib files.

Patches 1-2 fix a parsing issue on current configure options, that may
trigger an endless loop when no value is provided with some options;

Patch 3 fixes a parsing issue bailing out when more than one value is
provided for a single option;

Patch 4 simplifies options parsing, moving semantic checks out of the
while loop processing options;

Patch 5 introduces support for the --opt=value style on current options,
for uniformity;

Patch 6 adds the --prefix option, which may be used by some packaging
systems when calling the configure script;

Patch 7 finally adds the --libdir option, and also drops the static
LIBDIR var from the Makefile.

Changelog:
----------
v4 -> v5
  - bail out when multiple values are provided with a single option
  - simplify option parsing and reduce code duplication, as suggested
    by Phil Sutter
  - remove a nasty eval on libdir option processing

v3 -> v4
  - fix parsing issue on '--include_dir' and '--libbpf_dir'
  - split '--opt value' and '--opt=value' use cases, avoid code
    duplication moving semantic checks on value to dedicated functions

v2 -> v3
  - fix parsing error on prefix and libdir options.

v1 -> v2
  - consolidate '--opt value' and '--opt=value' use cases, as suggested
    by David Ahern.
  - added patch 2 to manage the --prefix option, used by the Debian
    packaging system, as reported by Luca Boccassi, and use it when
    setting lib directory.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:59:33 -06:00
Andrea Claudi cee0cf84bd configure: add the --libdir option
This commit allows users/packagers to choose a lib directory to store
iproute2 lib files.

At the moment iproute2 ships lib files in /usr/lib and offers no way to
modify this setting. However, according to the FHS, distros may choose
"one or more variants of the /lib directory on systems which support
more than one binary format" (e.g. /usr/lib64 on Fedora).

As Luca states in commit a3272b9372 ("configure: restore backward
compatibility"), packaging systems may assume that 'configure' is from
autotools, and try to pass it some parameters.

By allowing the '--libdir=/path/to/libdir' syntax, we can use this to our
advantage and let the distro packaging system choose the lib directory.

Note that LIBDIR uses "\${prefix}/lib" as its default value because autoconf
allows this to be expanded to the --prefix value at configure runtime.
"\${prefix}" is replaced with the PREFIX value in check_lib_dir().

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:57:20 -06:00
Andrea Claudi 0ee1950b5c configure: add the --prefix option
This commit adds the '--prefix' option to the iproute2 configure script.

This mimics the '--prefix' option that autotools configure provides, and
will be used later to allow users or packagers to set the lib directory.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:57:17 -06:00
Andrea Claudi 4b8bca5f9e configure: support --param=value style
This commit makes it possible to specify values for configure params
using the common autotools configure syntax '--param=value'.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:57:05 -06:00
Andrea Claudi 99245d1741 configure: simplify options parsing
This commit simplifies options parsing by moving all the code not related to
parsing out of the case statement.

- The conditional shift after the assignments is moved right after the
  case, reducing code duplication.
- The semantic checks on the LIBBPF_FORCE value are moved after the loop,
  like we already did for INCLUDE and LIBBPF_DIR.
- Finally, the loop condition is changed to check remaining arguments, thus
  making it possible to get rid of the null string case break.

As a bonus, now the help message states that on or off should follow
--libbpf_force

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:57:02 -06:00
Andrea Claudi c330d09794 configure: fix parsing issue with more than one value per option
With commit a9c3d70d90 ("configure: add options ability") users are no
longer able to provide wrong command lines like:

$ ./configure --include_dir foo bar

The script simply bails out when the user provides more than one value for
a single option. However, in doing so, it breaks backward compatibility
with some packaging systems, which expect unknown options to be ignored.

Commit a3272b9372 ("configure: restore backward compatibility") fixes this
issue, but makes it possible again for users to provide wrong command lines
such as the one above.

This fixes the issue by simply ignoring autoconf-like options such as
'--opt=value'.

Fixes: a3272b9372 ("configure: restore backward compatibility")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:56:57 -06:00
Andrea Claudi 48c379bc2a configure: fix parsing issue on libbpf_dir option
configure is stuck in an endless loop if '--libbpf_dir' option is used
without a value:

$ ./configure --libbpf_dir
./configure: line 515: shift: 2: shift count out of range
./configure: line 515: shift: 2: shift count out of range
[...]

Fix it by splitting 'shift 2' into two consecutive shifts, and making the
second one conditional on the number of remaining arguments.

A check is also added after the while loop to verify that the libbpf dir
exists. Also, as LIBBPF_DIR does not have a default value, configure bails
out if the user does not specify a value after --libbpf_dir, thus avoiding
an erroneous configuration.
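The fix translates to C roughly as follows. This is a hypothetical sketch (parse_libbpf_dir() is not an iproute2 function): the value is consumed only if an argument remains, a missing value is an error instead of an endless loop, and the directory is checked after the loop.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical C analogue of the fixed configure option loop. */
static int parse_libbpf_dir(int argc, char **argv, const char **libbpf_dir)
{
	struct stat st;
	int i;

	*libbpf_dir = NULL;
	for (i = 0; i < argc; i++) {
		if (strcmp(argv[i], "--libbpf_dir") != 0)
			continue;		/* other options ignored here */
		if (i + 1 >= argc) {	/* no value left: bail out */
			fprintf(stderr, "--libbpf_dir requires a value\n");
			return -1;
		}
		*libbpf_dir = argv[++i];	/* the conditional second shift */
	}

	/* post-loop check: the given directory must exist */
	if (*libbpf_dir &&
	    (stat(*libbpf_dir, &st) != 0 || !S_ISDIR(st.st_mode))) {
		fprintf(stderr, "%s: not a directory\n", *libbpf_dir);
		return -1;
	}
	return 0;
}
```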

Fixes: 7ae2585b86 ("configure: convert LIBBPF environment variables to command-line options")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:56:53 -06:00
Andrea Claudi 1d819dcc74 configure: fix parsing issue on include_dir option
configure is stuck in an endless loop if '--include_dir' option is used
without a value:

$ ./configure --include_dir
./configure: line 506: shift: 2: shift count out of range
./configure: line 506: shift: 2: shift count out of range
[...]

Fix it by splitting 'shift 2' into two consecutive shifts, and making the
second one conditional on the number of remaining arguments.

A check is also added after the while loop to verify that the include dir
exists; this avoids producing an erroneous configuration.

Fixes: a9c3d70d90 ("configure: add options ability")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:56:48 -06:00
Neta Ostrovsky 19ba785f16 rdma: Add optional-counters set/unset support
This patch extends the rdma statistics tool so that optional
counters can be set and unset dynamically, using new netlink
commands.
Note that the optional counter statistics implementation is
driver-specific and may impact performance.

Examples:
To enable a set of optional counters on link rocep8s0f0/1:
    $ sudo rdma statistic set link rocep8s0f0/1 optional-counters cc_rx_ce_pkts,cc_rx_cnp_pkts
To disable all optional counters on link rocep8s0f0/1:
    $ sudo rdma statistic unset link rocep8s0f0/1 optional-counters
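As a sketch of how such a comma-separated counter list can be split in C (split_counters() and MAX_COUNTERS are hypothetical; how rdma actually encodes the counters into netlink attributes is not shown):

```c
#include <assert.h>
#include <string.h>

#define MAX_COUNTERS 16	/* arbitrary limit for this sketch */

/*
 * Split "cc_rx_ce_pkts,cc_rx_cnp_pkts" in place into counter names.
 * Empty entries (e.g. "a,,b") are skipped. Returns the number of names,
 * or -1 if there are more than max.
 */
static int split_counters(char *list, char **names, int max)
{
	int n = 0;
	char *p = list;

	while (*p) {
		char *comma = strchr(p, ',');

		if (comma)
			*comma = '\0';	/* terminate the current name */
		if (*p) {
			if (n == max)
				return -1;
			names[n++] = p;
		}
		if (!comma)
			break;
		p = comma + 1;
	}
	return n;
}
```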

Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:52:57 -06:00
Neta Ostrovsky 7d5cb70e94 rdma: Add stat "mode" support
This patch introduces the "mode" command, which presents the enabled or
supported (when the "supported" argument is given) optional
counters.

An optional counter is a vendor-specific counter that may be
dynamically enabled or disabled. This enhancement of hwcounters allows
exposing counters that are, for example, mutually exclusive and cannot
be enabled at the same time, counters that might degrade performance,
optional debug counters, etc.

Examples:
To present currently enabled optional counters on link rocep8s0f0/1:
    $ rdma statistic mode link rocep8s0f0/1
    link rocep8s0f0/1 optional-counters cc_rx_ce_pkts

To present supported optional counters on link rocep8s0f0/1:
    $ rdma statistic mode supported link rocep8s0f0/1
    link rocep8s0f0/1 supported optional-counters cc_rx_ce_pkts,cc_rx_cnp_pkts,cc_tx_cnp_pkts

Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:52:53 -06:00
Neta Ostrovsky d480cb71f5 rdma: Update uapi headers
Update the rdma_netlink.h file up to kernel commit 7301d0a9834c
("RDMA/nldev: Add support to get status of all counters")

Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:52:47 -06:00
David Ahern e4ca6a4965 Update kernel headers
Update kernel headers to commit:
    295711fa8fec ("Merge branch 'dpaa2-irq-coalescing'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:49:19 -06:00
Stephen Hemminger a31e7b7967 mptcp: cleanup include section.
David reported that ipmptcp badly breaks the build when updating the
relevant kernel headers.

We should be more careful in the header section, explicitly
including all the required dependencies and respecting the usual order
between system and local headers.

Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:48:36 -06:00
Paul Chaignon a500c5ac87 lib/bpf: fix map-in-map creation without prepopulation
When creating map-in-maps, the outer map can be prepopulated using the
inner_idx field of inner maps. That field defines the index of the inner
map in the outer map. It is ignored if set to -1.

Commit 6d61a2b557 ("lib: add libbpf support") however started using
that field to identify inner maps. While iterating over all maps looking
for inner maps, maps with inner_idx set to -1 are erroneously skipped.
As a result, trying to create a map-in-map with prepopulation disabled
fails because the inner_id of the outer map is not correctly set.

This bug can be observed with strace -ebpf (notice the zero inner_map_fd
for the outer map creation):

    bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_ARRAY, key_size=4, value_size=130996, max_entries=1, map_flags=0, inner_map_fd=0, map_name="maglev_inner", map_ifindex=0, btf_fd=0, btf_key_type_id=0, btf_value_type_id=0}, 128) = 32
    bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH_OF_MAPS, key_size=2, value_size=4, max_entries=65536, map_flags=BPF_F_NO_PREALLOC, inner_map_fd=0, map_name="maglev_outer", map_ifindex=0, btf_fd=0, btf_key_type_id=0, btf_value_type_id=0}, 128) = -1 EINVAL (Invalid argument)
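The following simplified model illustrates the bug. struct elf_map and both lookup functions are hypothetical stand-ins, not the lib/bpf code: skipping entries with inner_idx == -1 means an inner map with prepopulation disabled is never matched.

```c
#include <assert.h>

/* Hypothetical, simplified map description. */
struct elf_map {
	int id;
	int inner_id;	/* outer maps: id of the inner map template */
	int inner_idx;	/* inner maps: prepopulation slot, or -1 */
	int fd;
};

/* Buggy lookup: treats inner_idx == -1 as "not an inner map". */
static int inner_fd_buggy(const struct elf_map *maps, int n, int inner_id)
{
	for (int i = 0; i < n; i++) {
		if (maps[i].inner_idx == -1)
			continue;
		if (maps[i].id == inner_id)
			return maps[i].fd;
	}
	return -1;	/* not found */
}

/* Fixed lookup: match on id only; inner_idx matters for prepopulation. */
static int inner_fd_fixed(const struct elf_map *maps, int n, int inner_id)
{
	for (int i = 0; i < n; i++)
		if (maps[i].id == inner_id)
			return maps[i].fd;
	return -1;	/* not found */
}
```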

Fixes: 6d61a2b557 ("lib: add libbpf support")
Signed-off-by: Paul Chaignon <paul@isovalent.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-14 14:37:51 -07:00
Antoine Tenart 7c032cac10 man: devlink-port: remove extra .br
.br requests were added between options of the same command. They are not
needed and make the output span 3 lines for no particular reason.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-11 19:27:12 -07:00
Antoine Tenart 04ee8e6f06 man: devlink-port: fix style
Values should use .I, square brackets should be used for optional values,
and curly brackets for lists. Follow this in the devlink-port man page.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-11 19:27:12 -07:00
Antoine Tenart 14802d84d3 man: devlink-port: fix the devlink port add synopsis
When configuring a devlink PCI SF port, the sfnumber can be specified
using 'sfnum' and not 'pcisf' as stated in the man page. Fix this.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-11 19:27:12 -07:00
David Ahern 8cd517a805 Merge branch 'main' into next
Conflicts:
	ip/ipneigh.c

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-09 17:47:47 -06:00
David Ahern 763fd793fe Merge branch 'ioam-encap-modes' into next
Justin Iurman says:

====================

Following the series applied to net-next (see [1]), here are the corresponding
changes to iproute2.

In the current implementation, IOAM can only be inserted directly (i.e., only
inside packets generated locally) by default, to be compliant with RFC8200.

This patch adds support for in-transit packets and provides the ip6ip6
encapsulation of IOAM (RFC8200 compliant). Therefore, three ioam6 encap modes
are defined:

 - inline: directly inserts IOAM inside packets (by default).

 - encap:  ip6ip6 encapsulation of IOAM inside packets.

 - auto:   either inline mode for packets generated locally or encap mode for
           in-transit packets.

With the current iproute2 implementation, it is configured this way:

$ ip -6 r [...] encap ioam6 trace prealloc [...]

The old syntax does not change (for backwards compatibility) and implicitly uses
the inline mode. With the new syntax, an encap mode can be specified:

(inline mode)
$ ip -6 r [...] encap ioam6 mode inline trace prealloc [...]

(encap mode)
$ ip -6 r [...] encap ioam6 mode encap tundst fc00::2 trace prealloc [...]

(auto mode)
$ ip -6 r [...] encap ioam6 mode auto tundst fc00::2 trace prealloc [...]

A tunnel destination address must be configured when using the encap mode or the
auto mode.
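A minimal C sketch of the mode keyword handling and the tundst requirement stated above. The enum names are hypothetical placeholders, not the kernel uapi constants; omitting "mode" (the old syntax) would correspond to MODE_INLINE.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical names for this sketch only. */
enum ioam6_mode { MODE_INLINE, MODE_ENCAP, MODE_AUTO };

/* Map the "mode" keyword to an enum value; returns -1 if unknown. */
static int parse_ioam6_mode(const char *s, enum ioam6_mode *mode)
{
	if (strcmp(s, "inline") == 0)
		*mode = MODE_INLINE;
	else if (strcmp(s, "encap") == 0)
		*mode = MODE_ENCAP;
	else if (strcmp(s, "auto") == 0)
		*mode = MODE_AUTO;
	else
		return -1;
	return 0;
}

/* encap and auto need a tunnel destination (tundst); inline does not. */
static bool ioam6_mode_valid(enum ioam6_mode mode, bool have_tundst)
{
	return mode == MODE_INLINE || have_tundst;
}
```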

  [1] https://lore.kernel.org/netdev/163335001045.30570.12527451523558030753.git-patchwork-notify@kernel.org/T/#m3b428d4142ee3a414ec803466c211dfdec6e0c09

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-09 17:37:12 -06:00
Justin Iurman 41020eb0fd Update documentation
This patch updates the IOAM documentation (ip-route man page) to reflect the
three encap modes that were introduced.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-09 17:35:54 -06:00
Justin Iurman 8fb522cde3 Add support for IOAM encap modes
This patch adds support for the three IOAM encap modes that were introduced:
inline, encap and auto.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-09 17:35:29 -06:00
Frank Villaro-Dixon 897772a735 cmd: use spaces instead of tabs for usage indentation
Fix rogue "tab after spaces" used for indentation of the documentation.
This causes rendering issues on terminals using a non-standard tab width.

Signed-off-by: Frank Villaro-Dixon <frank.villaro@infomaniak.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-06 10:00:49 -07:00
Nikolay Aleksandrov b840c620fe ip: nexthop: keep cache netlink socket open
Since we use the cache netlink socket for each nexthop, we can keep it open
instead of opening and closing it on every add call. The socket is opened
once, on the first add call, and then reused for the rest.
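The lazy-open pattern can be sketched as below. open_cache_socket() is a hypothetical stand-in for opening a netlink handle (e.g. with rtnl_open()); here it only returns a fake descriptor and counts calls.

```c
#include <assert.h>

static int open_count;	/* how often the "socket" was opened */

/* Hypothetical stand-in for opening the cache netlink socket. */
static int open_cache_socket(void)
{
	return 100 + open_count++;	/* fake descriptor */
}

/* Open the cache socket on first use and reuse it afterwards. */
static int get_cache_socket(void)
{
	static int fd = -1;

	if (fd < 0)
		fd = open_cache_socket();
	return fd;
}
```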

Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-05 08:34:29 -06:00
Jacob Keller b90174354d devlink: print maximum number of snapshots if available
Recently the kernel gained ability to report the maximum number of
snapshots a region can have. Print this value out if it was reported.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-05 08:31:30 -06:00
David Ahern 6448ed373c Update kernel headers
Update kernel headers to commit:
    49ed8dde3715 ("net: usb: use eth_hw_addr_set() for dev->addr_len cases")

The update to linux/mptcp.h is dropped because it breaks compilation
of ipmptcp.c in a nontrivial way.

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-05 08:28:28 -06:00
Davide Caratti e7a98a96f0 mptcp: unbreak JSON endpoint list
The following command:

 # ip -j mptcp endpoint show

prints a JSON array that is missing the terminating bracket. Fix this by
calling delete_json_obj() to balance the call to new_json_obj().

Fixes: 7e0767cd86 ("add support for mptcp netlink interface")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Andrea Claudi <aclaudi@redhat.com>
2021-10-04 14:07:09 -07:00
David Ahern ec703e0629 Merge branch 'nexthop-cache' into next
Nikolay Aleksandrov says:

====================

This set tries to help with an old ask that we've had for some time
which is to print nexthop information while monitoring or dumping routes.
The core problem is that people cannot follow nexthop changes while
monitoring route changes, by the time they check the nexthop it could be
deleted or updated to something else. In order to help them out I've
added a nexthop cache which is populated (only used if -d / show_details
is specified) while decoding routes and kept up to date while monitoring.
The nexthop information is printed on its own line starting with the
"nh_info" attribute, and it is embedded inside that attribute when
printing JSON. To cache the nexthop entries I parse them into structures;
in order to reuse most of the code, the print helpers have been altered
so they rely on prepared structures. Nexthops are now always parsed into
a structure, even if they won't be cached; that structure is later used
to print the nexthop and is destroyed if it is not going to be cached.
New nexthops (not found in the cache) are retrieved from the kernel using
a private netlink socket so they don't disrupt an ongoing dump, similar
to how interfaces are retrieved and cached.

I have tested the set with the kernel forwarding selftests and also by
stressing it with nexthop create/update/delete in loops while monitoring.

Comments are very welcome as usual. :)

Changes since RFC:
 - reordered parse/print splits, in order to do that I have to parse
   resilient groups first, then add nh entry parsing so code has been
   reordered as well and patch order has changed, but there have been
   no functional changes (as before refactoring of old code is done in
   the first 8 patches and then patches 9-12 add the new cache and use it)
 - re-run all tests above

Patch breakdown:
Patches 1-2: update current route helpers to take parsed arguments so we
             can directly pass them from the nh_entry structure later
Patch     3: adds new nha_res_grp structure which describes a resilient
             nexthop group
Patch     4: splits print_nh_res_group into parse and print parts
             which use the new nha_res_grp structure
Patch     5: adds new nh_entry structure which describes a nexthop
Patch     6: factors out print_nexthop's attribute parsing into nh_entry
             structure used before printing
Patch     7: factors out print_nexthop's nh_entry structure printing
Patch     8: factors out ipnh_get's rtnl talk part and allows to use a
             different rt handle for the communication
Patch     9: adds nexthop cache and helpers to manage it, it uses the
             new __ipnh_get to retrieve nexthops
Patch    10: adds a new helper print_cache_nexthop_id that prints nexthop
             information from its id, if the nexthop is not found in the
             cache it fetches it
Patch    11: the new print_cache_nexthop_id helper is used when printing
             routes with show_details (-d) to output detailed nexthop
             information, the format after nh_info is the same as
             ip nexthop show
Patch    12: changes print_nexthop into print_cache_nexthop which always
             outputs the nexthop information and can also update the cache
             (based on process_cache argument), it's used to keep the
             cache up to date while monitoring

Example outputs (monitor):
[NEXTHOP]id 101 via 169.254.2.22 dev veth2 scope link proto unspec
[NEXTHOP]id 102 via 169.254.3.23 dev veth4 scope link proto unspec
[NEXTHOP]id 103 group 101/102 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec
[ROUTE]unicast 192.0.2.0/24 nhid 203 table 4 proto boot scope global
	nh_info id 203 group 201/202 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec
	nexthop via 169.254.2.12 dev veth3 weight 1
	nexthop via 169.254.3.13 dev veth5 weight 1

[NEXTHOP]id 204 via fe80:2::12 dev veth3 scope link proto unspec
[NEXTHOP]id 205 via fe80:3::13 dev veth5 scope link proto unspec
[NEXTHOP]id 206 group 204/205 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec
[ROUTE]unicast 2001:db8:1::/64 nhid 206 table 4 proto boot scope global metric 1024 pref medium
	nh_info id 206 group 204/205 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec
	nexthop via fe80:2::12 dev veth3 weight 1
	nexthop via fe80:3::13 dev veth5 weight 1

[NEXTHOP]id 2  encap mpls  200/300 via 10.1.1.1 dev ens20 scope link proto unspec onlink
[ROUTE]unicast 2.3.4.10 nhid 2 table main proto boot scope global
	nh_info id 2  encap mpls  200/300 via 10.1.1.1 dev ens20 scope link proto unspec onlink

JSON:
 {
        "type": "unicast",
        "dst": "198.51.100.0/24",
        "nhid": 103,
        "table": "3",
        "protocol": "boot",
        "scope": "global",
        "flags": [ ],
        "nh_info": {
            "id": 103,
            "group": [ {
                    "id": 101,
                    "weight": 11
                },{
                    "id": 102,
                    "weight": 45
                } ],
            "type": "resilient",
            "resilient_args": {
                "buckets": 512,
                "idle_timer": 0,
                "unbalanced_timer": 0,
                "unbalanced_time": 0
            },
            "scope": "global",
            "protocol": "unspec",
            "flags": [ ]
        },
        "nexthops": [ {
                "gateway": "169.254.2.22",
                "dev": "veth2",
                "weight": 11,
                "flags": [ ]
            },{
                "gateway": "169.254.3.23",
                "dev": "veth4",
                "weight": 45,
                "flags": [ ]
            } ]
  }

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:31:44 -06:00
Nikolay Aleksandrov 7ca868a7aa ip: nexthop: add print_cache_nexthop which prints and manages the nh cache
Add a new helper print_cache_nexthop, replacing print_nexthop, which can
update the nexthop cache if the process_cache argument is true. It is
used when monitoring netlink messages to keep the nexthop cache up to
date with nexthop changes. Old callers, and anyone who is just dumping
nexthops, use its _nocache version, which is a wrapper around
print_cache_nexthop.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:59 -06:00
Nikolay Aleksandrov 5d5dc549ce ip: route: print and cache detailed nexthop information when requested
If -d (show_details) is used when printing/monitoring routes then print
detailed nexthop information in the field "nh_info". The nexthop is also
cached for future searches.

Output looks like:
 unicast 198.51.100.0/24 nhid 103 table 3 proto boot scope global
	 nh_info id 103 group 101/102 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec
	 nexthop via 169.254.2.22 dev veth2 weight 1
	 nexthop via 169.254.3.23 dev veth4 weight 1

The nh_info field has the same format that ip -d nexthop show would
produce for the same nexthop id.

For completeness the JSON version looks like:
 {
        "type": "unicast",
        "dst": "198.51.100.0/24",
        "nhid": 103,
        "table": "3",
        "protocol": "boot",
        "scope": "global",
        "flags": [ ],
        "nh_info": {
            "id": 103,
            "group": [ {
                    "id": 101
                },{
                    "id": 102
                } ],
            "type": "resilient",
            "resilient_args": {
                "buckets": 512,
                "idle_timer": 0,
                "unbalanced_timer": 0,
                "unbalanced_time": 0
            },
            "scope": "global",
            "protocol": "unspec",
            "flags": [ ]
        },
        "nexthops": [ {
                "gateway": "169.254.2.22",
                "dev": "veth2",
                "weight": 1,
                "flags": [ ]
            },{
                "gateway": "169.254.3.23",
                "dev": "veth4",
                "weight": 1,
                "flags": [ ]
            } ]
 }

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:55 -06:00
Nikolay Aleksandrov cb3d18c29e ip: nexthop: add a helper which retrieves and prints cached nh entry
Add a helper which looks for a nexthop in the cache and if not found
reads the entry from the kernel and caches it. Finally the entry is
printed.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:51 -06:00
Nikolay Aleksandrov 60a9703032 ip: nexthop: add cache helpers
Add a static nexthop cache in a hash with 1024 buckets and helpers to
manage it (link, unlink, find, add nexthop, del nexthop). Adding new
nexthops is done by creating a new rtnl handle and using it to retrieve
the nexthop so the helper is safe to use while already reading a
response (i.e. using the global rth).
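A minimal sketch of such a bucketed cache with find, link and unlink helpers. The struct layout, hash function and function names are assumptions for illustration; fetching a missing entry from the kernel through a separate rtnl handle is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NH_CACHE_SIZE 1024	/* bucket count stated in the commit */

/* Hypothetical cache entry; a real entry holds the parsed nexthop. */
struct nh_entry {
	uint32_t id;
	struct nh_entry *next;	/* bucket chain */
};

static struct nh_entry *nh_cache[NH_CACHE_SIZE];

static unsigned int nh_bucket(uint32_t id)
{
	return id % NH_CACHE_SIZE;	/* simplistic hash for the sketch */
}

static struct nh_entry *nh_cache_find(uint32_t id)
{
	struct nh_entry *e;

	for (e = nh_cache[nh_bucket(id)]; e; e = e->next)
		if (e->id == id)
			return e;
	return NULL;
}

/* Insert at the head of the entry's bucket chain. */
static void nh_cache_link(struct nh_entry *e)
{
	unsigned int b = nh_bucket(e->id);

	e->next = nh_cache[b];
	nh_cache[b] = e;
}

/* Remove e from its bucket chain, if present. */
static void nh_cache_unlink(struct nh_entry *e)
{
	struct nh_entry **pp = &nh_cache[nh_bucket(e->id)];

	for (; *pp; pp = &(*pp)->next) {
		if (*pp == e) {
			*pp = e->next;
			return;
		}
	}
}
```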

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:41 -06:00
Nikolay Aleksandrov 53d7c43bd3 ip: nexthop: factor out ipnh_get_id rtnl talk into a helper
Factor out ipnh_get_id's rtnl talk portion into a separate helper which
will be reused later to retrieve nexthops for caching.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:36 -06:00
Nikolay Aleksandrov a2ca431215 ip: nexthop: factor out print_nexthop's nh entry printing
Factor out nexthop entry structure printing from print_nexthop,
effectively splitting it into parse and print parts.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:32 -06:00
Nikolay Aleksandrov 945c26db68 ip: nexthop: parse attributes into nh entry structure before printing
Factor out the nexthop attribute parsing and parse attributes into a
nexthop entry structure which is then used to print.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:28 -06:00
Nikolay Aleksandrov 7ec1cee630 ip: nexthop: add nh entry structure
Add a structure which describes a nexthop, it will be later used to
parse, print and cache nexthops.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:24 -06:00
Nikolay Aleksandrov 60a7515b89 ip: nexthop: split print_nh_res_group into parse and print parts
Now that we have a resilient group structure, split print_nh_res_group
into parse and print functions. print_nexthop calls the parse function
first to parse the attributes into the structure and then uses the print
function to print the parsed structure.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:19 -06:00
Nikolay Aleksandrov cfb0a8729e ip: nexthop: add resilient group structure
Add a structure which describes a resilient nexthop group. It will be
later used for parsing.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:11 -06:00
Nikolay Aleksandrov 371e889da7 ip: export print_rta_gateway version which outputs prepared gateway string
Export a new __print_rta_gateway that takes a prepared gateway string to
print; it is also used by print_rta_gateway for a consistent format.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:06 -06:00
Nikolay Aleksandrov f72789965e ip: print_rta_if takes ifindex as device argument instead of attribute
We need print_rta_if() to take ifindex directly so later we can use it
with cached converted nexthop objects.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:23:31 -06:00
David Ahern c8c9111a4c Merge branch 'ax.25-netrom-rose' into next
Ralf Baechle says:

====================

net-tools contain support for these three protocols but are deprecated and
no longer installed by default by many distributions.  Iproute2, on the
other hand, has no support at all and dumps the addresses of these
protocols, which are actually quite human-readable, as hex numbers:

 # ip link show dev bpq0
3: bpq0: <UP,LOWER_UP> mtu 256 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ax25 88:98:60:a0:92:40:02 brd a2:a6:a8:40:40:40:00
 # ip link show dev nr0
4: nr0: <NOARP,UP,LOWER_UP> mtu 236 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/netrom 88:98:60:a0:92:40:0a brd 00:00:00:00:00:00:00
 # ip link show dev rose0
8: rose0: <NOARP,UP,LOWER_UP> mtu 249 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/rose 65:09:33:30:00 brd 00:00:00:00:00

This series adds basic support for the three protocols to print addresses:

 # ip link show dev bpq0
3: bpq0: <UP,LOWER_UP> mtu 256 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ax25 DL0PI-1 brd QST-0
 # ip link show dev nr0
4: nr0: <NOARP,UP,LOWER_UP> mtu 236 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/netrom DL0PI-5 brd *
 # ip link show dev rose0
8: rose0: <NOARP,UP,LOWER_UP> mtu 249 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/rose 6509333000 brd 0000000000

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-23 20:03:11 -06:00
Ralf Baechle e2cc9840ea ROSE: Print decoded addresses rather than hex numbers.
ROSE is an OSI layer 3 protocol sitting on top of AX.25.  It uses
BCD-encoded 10 digit telephone numbers as addresses.  Without this, ip
will print ROSE addresses like

  link/rose 12:34:56:78:90 brd 00:00:00:00:00

which is readable but ugly.  With this applied, ROSE addresses will be
printed as

  link/rose 1234567890 brd 0000000000
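Decoding the BCD address amounts to emitting the two nibbles of each octet as decimal digits, high nibble first. A sketch (rose_addr_str() is a hypothetical name, not the rose_ntop() signature; it assumes every nibble is a valid BCD digit):

```c
#include <assert.h>
#include <string.h>

/* Decode a 5-octet BCD ROSE address; buf must hold at least 11 bytes. */
static const char *rose_addr_str(const unsigned char addr[5], char *buf)
{
	for (int i = 0; i < 5; i++) {
		buf[2 * i] = '0' + ((addr[i] >> 4) & 0x0f);	/* high nibble */
		buf[2 * i + 1] = '0' + (addr[i] & 0x0f);	/* low nibble */
	}
	buf[10] = '\0';
	return buf;
}
```

Applied to the rose0 address from the cover letter, 65:09:33:30:00, this gives "6509333000".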

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-23 20:02:51 -06:00
Ralf Baechle 26c5782fab ROSE: Add rose_ntop implementation.
ROSE addresses are ten digit numbers, basically like North American
telephone numbers.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-23 20:02:45 -06:00
Ralf Baechle fd4c1c8168 NETROM: Print decoded addresses rather than hex numbers.
NETROM is an OSI layer 3 protocol sitting on top of AX.25.  It also uses
AX.25 addresses.  Without this commit, ip will print NETROM addresses like

  link/generic 98:92:9c:aa:b0:40:02 brd 00:00:00:00:00:00:00

while with this commit the decoded result

  link/generic LINUX-1 brd *

is much more eye-friendly.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-23 20:02:40 -06:00
Ralf Baechle c63b769ad4 NETROM: Add netrom_ntop implementation.
NETROM uses AX.25 addresses, so this is a simple wrapper around ax25_ntop1.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-23 20:02:37 -06:00
Ralf Baechle 399ae00af5 AX.25: Print decoded addresses rather than hex numbers.
Before this, ip printed the default addresses configured for an AX.25
interface as:

  link/ax25 98:92:9c:aa:b0:40:02 brd a2:a6:a8:40:40:40:00

which is pretty unreadable.  With this commit ip will decode AX.25
addresses like

  link/ax25 LINUX-1 brd QST-0

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-23 20:02:34 -06:00
Ralf Baechle 3a92669b3a AX.25: Add ax25_ntop implementation.
AX.25 addresses are based on amateur radio callsigns followed by an SSID,
like XXXXXX-SS, where the callsign is up to 6 characters, each either a
letter or a digit, and the SSID is a decimal number in the range 0..15.
Amateur radio callsigns are assigned by each country's relevant
authorities and are 3..6 characters long, though a few countries have
assigned callsigns longer than that.  AX.25 is not able to handle such
longer callsigns.

Being based on HDLC, AX.25 encodes addresses by shifting them one bit
left, thus zeroing bit 0, which is the HDLC extension bit, in all but the
last octet of a packet's address field.  For our purposes here we ignore
the HDLC extension bit, that is, it will always be zero.

Linux's internal representation of AX.25 addresses is very similar to the
on-air or on-the-wire format.  The callsign is padded to 6 octets with
spaces and followed by the SSID octet; then all 7 octets are left-shifted
by one bit.

This for example turns "LINUX-1" where the callsign is LINUX and SSID is 1
into 98:92:9c:aa:b0:40:02.
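Reversing that encoding gives the decoded form. The sketch below (ax25_addr_str() is a hypothetical name, not the ax25_ntop() signature) shifts each callsign octet right by one bit, stops at the space padding, and takes the SSID from bits 1..4 of the last octet:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Decode a 7-octet AX.25 address; buf must hold at least 10 bytes. */
static const char *ax25_addr_str(const unsigned char addr[7], char *buf)
{
	int n = 0;

	for (int i = 0; i < 6; i++) {
		char c = (char)((addr[i] >> 1) & 0x7f);

		if (c == ' ')	/* callsign is space padded */
			break;
		buf[n++] = c;
	}
	/* SSID lives in bits 1..4 of the last octet */
	sprintf(buf + n, "-%d", (addr[6] >> 1) & 0x0f);
	return buf;
}
```

For example, 98:92:9c:aa:b0:40:02 decodes to "LINUX-1" and a2:a6:a8:40:40:40:00 to "QST-0".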

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-23 20:02:30 -06:00
Andrea Claudi 2f5825cb38 lib: bpf_legacy: fix bpffs mount when /sys/fs/bpf exists
bpf selftests using iproute2 fail with:

$ ip link set dev veth0 xdp object ../bpf/xdp_dummy.o section xdp_dummy
Continuing without mounted eBPF fs. Too old kernel?
mkdir (null)/globals failed: No such file or directory
Unable to load program

This happens when the /sys/fs/bpf directory exists. In this case, mkdir
in bpf_mnt_check_target() fails with errno == EEXIST, and the function
returns -1. Thus bpf_get_work_dir() does not call bpf_mnt_fs() and the
bpffs is not mounted.

Fix this in bpf_mnt_check_target(), returning 0 when the mountpoint
exists.
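In C, the fixed check amounts to treating EEXIST from mkdir() as success. A simplified sketch; check_target() is a hypothetical name, not the actual bpf_mnt_check_target() code:

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create the mount point; an already existing path is not an error. */
static int check_target(const char *target)
{
	if (mkdir(target, S_IRWXU) && errno != EEXIST)
		return -1;
	return 0;
}
```

Note that mkdir() also reports EEXIST when the path exists as a non-directory, which this sketch does not distinguish.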

Fixes: d4fcdbbec9 ("lib/bpf: Fix and simplify bpf_mnt_check_target()")
Reported-by: Mingyu Shi <mshi@redhat.com>
Reported-by: Jiri Benc <jbenc@redhat.com>
Suggested-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-22 17:30:52 -07:00
Puneet Sharma d756c08a3d tc/f_flower: fix port range parsing
Port ranges provided in tc rules are parsed incorrectly: even though the
range is passed as min-max, tc throws an error.

$ tc filter add dev eth0 ingress handle 100 priority 10000 protocol ipv4 flower ip_proto tcp dst_port 10368-61000 action pass
max value should be greater than min value
Illegal "dst_port"
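The commit message does not describe the root cause, so the following is only a sketch of parsing and validating a min-max range in C, not tc's parser. parse_port_range() is hypothetical and accepts ranges only, not single ports.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse "10368-61000" into *min and *max. Returns -1 on malformed input,
 * ports above 65535, or max <= min.
 */
static int parse_port_range(const char *s, uint16_t *min, uint16_t *max)
{
	char *end;
	unsigned long lo, hi;

	errno = 0;
	lo = strtoul(s, &end, 10);
	if (errno || end == s || *end != '-' || lo > UINT16_MAX)
		return -1;

	s = end + 1;
	hi = strtoul(s, &end, 10);
	if (errno || end == s || *end != '\0' || hi > UINT16_MAX)
		return -1;

	if (hi <= lo)	/* "max value should be greater than min value" */
		return -1;

	*min = (uint16_t)lo;
	*max = (uint16_t)hi;
	return 0;
}
```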

Fixes: 8930840e67 ("tc: flower: Classify packets based port ranges")
Signed-off-by: Puneet Sharma <pusharma@akamai.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-22 17:28:48 -07:00
Gokul Sivakumar ebbb701714 lib: bpf_legacy: add prog name, load time, uid and btf id in prog info dump
The BPF program name is included when dumping the BPF program info and the
kernel only stores the first (BPF_PROG_NAME_LEN - 1) bytes for the program
name.

$ sudo ip link show dev docker0
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:4c:df:a4:54 brd ff:ff:ff:ff:ff:ff
    prog/xdp id 789 name xdp_drop_func tag 57cd311f2e27366b jited

The BPF program load time (ns since boottime), UID of the user who loaded
the program and the BTF ID are also included when dumping the BPF program
information when the user expects a detailed ip link info output.

$ sudo ip -details link show dev docker0
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:4c:df:a4:54 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
    bridge forward_delay 1500 hello_time 200 max_age 2000 ageing_time 30000 stp_state 0 priority 32768 vlan_filtering 0 vlan_protocol 802.1Q bridge_id 8000.2:42:4c:df:a4:54 designated_root 8000.2:42:4c:df:a4:54 root_port 0 root_path_cost 0 topology_change 0 topology_change_detected 0 hello_timer    0.00 tcn_timer    0.00 topology_change_timer    0.00 gc_timer  265.36 vlan_default_pvid 1 vlan_stats_enabled 0 vlan_stats_per_port 0 group_fwd_mask 0 group_address 01:80:c2:00:00:00 mcast_snooping 1 mcast_router 1 mcast_query_use_ifaddr 0 mcast_querier 0 mcast_hash_elasticity 16 mcast_hash_max 4096 mcast_last_member_count 2 mcast_startup_query_count 2 mcast_last_member_interval 100 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query_response_interval 1000 mcast_startup_query_interval 3124 mcast_stats_enabled 0 mcast_igmp_version 2 mcast_mld_version 1 nf_call_iptables 0 nf_call_ip6tables 0 nf_call_arptables 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    prog/xdp id 789 name xdp_drop_func tag 57cd311f2e27366b jited load_time 2676682607316255 created_by_uid 0 btf_id 708

Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-21 09:16:32 -06:00
David Ahern 75c5054e7a Merge branch 'main' into next
Conflicts:
	include/uapi/linux/virtio_ids.h

Signed-off-by: David Ahern <dsahern@gmail.com>
2021-09-14 10:46:48 -06:00
Stephen Hemminger 92e32f7791 uapi: updates from 5.15-rc1
Small changes to virtio etc.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-13 15:07:58 -07:00
Lahav Schlesinger 0431e1e724 ip: Support filter links/neighs with no master
Commit d3432bf10f17 ("net: Support filtering interfaces on no master")
in the kernel added support for filtering interfaces/neighbours that
have no master interface.

This patch completes the work by adding the same support to iproute2:
1. ip link show nomaster
2. ip address show nomaster
3. ip neighbour {show | flush} nomaster

Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2021-09-12 11:17:18 -06:00
Lennert Buytenhek 12b3d6a2ad man: ip-macsec: fix gcm-aes-256 formatting issue
The 'ip link add' invocation template at the top of the ip-macsec man
page formats with a pair of extra double quotes:

   ip  link  add  link DEVICE name NAME type macsec [ [ address <lladdr> ]
   port PORT | sci <u64> ]  [  cipher  {  default  |  gcm-aes-128  |  gcm-
   aes-256"}][" icvlen ICVLEN ] [ encrypt { on | off } ] [ send_sci { on |

This is due to missing whitespace around the gcm-aes-256 identifier
in the source file.

Fixes: b16f525323 ("Add support for configuring MACsec gcm-aes-256 cipher type.")
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2021-09-12 11:13:26 -06:00
David Ahern 917d913b2e Merge branch 'main' into next
Conflicts:
	include/uapi/linux/virtio_ids.h

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-08 15:13:49 -06:00
David Ahern d0cba0d1f6 Merge branch 'bridge-mcast_router' into next
Nikolay Aleksandrov says:

====================

This set adds support for the vlan port/bridge multicast router option. It is
similar to the already existing bridge-wide mcast_router control. Patch 01
moves attribute adding and parsing together for vlan option setting,
similar to global vlan option setting. It simplifies adding new options
because we can avoid reserved values and additional checks. Patch 02
adds the new mcast_router option and updates the related man page.

Example:
 # mark port ens16 as a permanent mcast router for vlan 100
 $ bridge vlan set dev ens16 vid 100 mcast_router 2
 # disable mcast router for port ens16 and vlan 200
 $ bridge vlan set dev ens16 vid 200 mcast_router 0
 $ bridge -d vlan show
 port              vlan-id
 ens16             1 PVID Egress Untagged
                     state forwarding mcast_router 1
                   100
                     state forwarding mcast_router 2
                   200
                     state forwarding mcast_router 0

Note that this set depends on the latest kernel uapi headers.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 17:03:58 -06:00
Nikolay Aleksandrov ae895504c6 bridge: vlan: add support for mcast_router option
Add support for setting and dumping per-vlan/interface mcast_router
option. It controls the mcast router mode of a vlan/interface pair.
For bridge devices only modes 0 - 2 are allowed. The possible modes
are:
 0 - disabled
 1 - automatic router presence detection (default)
 2 - permanent router
 3 - temporary router (available only for ports)
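
The mode rules above can be sketched as a small validity check. The enum and function names are hypothetical, and the commit does not say whether iproute2 or the kernel enforces the limit.

```c
#include <stdbool.h>

/* Hypothetical names for the modes listed above. */
enum mcast_router_mode {
	MCAST_ROUTER_DISABLED = 0,
	MCAST_ROUTER_AUTO     = 1,	/* default */
	MCAST_ROUTER_PERM     = 2,
	MCAST_ROUTER_TEMP     = 3,	/* ports only */
};

/* Bridge devices accept modes 0-2; ports additionally accept mode 3. */
static bool mcast_router_mode_valid(unsigned int mode, bool is_bridge_dev)
{
	unsigned int max = is_bridge_dev ? MCAST_ROUTER_PERM : MCAST_ROUTER_TEMP;

	return mode <= max;
}
```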

Example:
 # mark port ens16 as a permanent mcast router for vlan 100
 $ bridge vlan set dev ens16 vid 100 mcast_router 2
 # disable mcast router for port ens16 and vlan 200
 $ bridge vlan set dev ens16 vid 200 mcast_router 0
 $ bridge -d vlan show
 port              vlan-id
 ens16             1 PVID Egress Untagged
                     state forwarding mcast_router 1
                   100
                     state forwarding mcast_router 2
                   200
                     state forwarding mcast_router 0

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 17:00:31 -06:00
Nikolay Aleksandrov 12fbe3e4eb bridge: vlan: set vlan option attributes while parsing
Set vlan option attributes immediately while parsing to simplify the
checks, avoid having reserved values (e.g. -1 for unset var) and have
more limited scope for the variables. This is also similar to how global
vlan options are set. The attribute setting and checks are moved with
option parsing, no functional changes intended.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 17:00:31 -06:00
David Ahern db28c944d8 Update kernel headers
Update kernel headers to commit:
    27151f177827 ("Merge tag 'perf-tools-for-v5.15-2021-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 16:59:38 -06:00
Stephen Hemminger 6d676ad934 ip: rewrite routel in python
Not sure if anyone uses the routel script. The script was
a combination of ip route, shell and awk doing command scraping.
It is now possible to do this much better using the JSON
output formats and python.

Rewriting also fixes a bug where the old script could not parse
the current output format and ended up failing with:
/usr/bin/routel: 48: shift: can't shift that many

The new script also supports IPv6 as an option.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 16:31:24 -06:00
Stephen Hemminger 1eaebad2c5 ip: remove routef script
This script is old and limited to IPv4.
Using the ip route command directly is a better option.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 16:31:23 -06:00
Stephen Hemminger adddf30cd8 ip: remove ifcfg script
This script was from the olden days of ifcfg.
I don't see any distribution using it and it is time to put
it out to pasture.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 16:31:19 -06:00
Stephen Hemminger 2c8110881b ip: remove old rtpr script
This script was a one-off hack for a special case.
Now that ip commands have better formatting, there is no
real reason for it.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 16:30:36 -06:00
David Marchand e7e0e2ce65 iptuntap: fix multi-queue flag display
When creating a tap with the multi_queue flag, this flag is not displayed
when dumping:

$ ip tuntap add tap23 mode tap multi_queue
$ ip tuntap
tap23: tap persist0x100

While at it, add a space between known flags and hexdump of unknown
ones.
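
A simplified sketch of the intended output format: known flag names separated by spaces, then any unknown bits in hex after one more space. The flag values mirror IFF_* from <linux/if_tun.h>; the table and function are hypothetical and are not the iptuntap code.

```c
#include <stdio.h>
#include <string.h>

/* Values copied from IFF_* in <linux/if_tun.h>; redefined locally so
 * the sketch builds without kernel headers. */
#define TT_TUN         0x0001u
#define TT_TAP         0x0002u
#define TT_MULTI_QUEUE 0x0100u
#define TT_PERSIST     0x0800u

static const struct {
	unsigned int bit;
	const char *name;
} tt_names[] = {
	{ TT_TUN, "tun" },
	{ TT_TAP, "tap" },
	{ TT_PERSIST, "persist" },
	{ TT_MULTI_QUEUE, "multi_queue" },
};

/* Write known flag names separated by spaces, then any remaining
 * unknown bits as hex, also preceded by a space. */
static void format_tuntap_flags(unsigned int flags, char *buf, size_t len)
{
	size_t used = 0;
	int n;

	if (len == 0)
		return;
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(tt_names) / sizeof(tt_names[0]); i++) {
		if (!(flags & tt_names[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", tt_names[i].name);
		if (n < 0 || (size_t)n >= len - used)
			return;		/* output truncated; stop here */
		used += (size_t)n;
		flags &= ~tt_names[i].bit;
	}
	if (flags)
		snprintf(buf + used, len - used, "%s0x%x", used ? " " : "", flags);
}
```

With multi_queue known, the example above would read "tap persist multi_queue"; a truly unknown bit such as 0x10000 would print as "tap 0x10000" instead of being glued to the last name.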

Fixes: c41e038f48 ("iptuntap: allow creation of multi-queue tun/tap device")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-02 08:41:17 -07:00
Nikolay Aleksandrov deef844b1e man: ip-link: remove double of
Remove double "of".

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-02 08:40:36 -07:00
Luca Boccassi a3272b9372 configure: restore backward compatibility
Commit a9c3d70d90 broke backward compatibility
by making 'configure' error out if parameters are passed, instead of
ignoring them.
Sometimes packaging systems detect 'configure' and assume it's from
autotools, and pass a bunch of options, e.g.:

 dh_auto_configure
	./configure --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-option-checking --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking

Ignore unknown options again instead of erroring out.

Fixes: a9c3d70d90 ("configure: add options ability")

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-02 08:39:48 -07:00
Luca Boccassi ceba59308d tree-wide: fix some typos found by Lintian
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-02 08:39:48 -07:00
Stephen Hemminger 7a70524270 ip: remove leftovers from IPX and DECnet
Iproute2 has not supported DECnet or IPX since version 5.0.
Some leftover support remained in the ip option flags
and parsing; remove it.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-01 14:03:53 -07:00
Stephen Hemminger 8ab1834e56 uapi: update headers from 5.15 merge
New headers from 5.15 early merge.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-01 14:02:50 -07:00
Hangbin Liu 6d0d35bab9 ip/bond: add lacp active support
lacp_active specifies whether to send LACPDU frames periodically.
If set on, LACPDU frames are sent according to the configured lacp_rate
setting. If set off, LACPDU frames are only sent in response to the
partner ("speak when spoken to").

v2: use strcmp instead of match for new options.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2021-09-01 12:51:44 -07:00
David Ahern 926ad64104 Update kernel headers
Update kernel headers to commit:
    88be32634905 ("Merge branch 'dsa-tagger-helpers'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
Ilya Dmitrichenko c730bd0b11 ip/tunnel: always print all known attributes
Presently, if a Geneve or VXLAN interface was created with 'external',
it's not possible for a user to determine e.g. the value of 'dstport'
after creation. This change fixes that by avoiding early returns.

This change partly reverts commit 00ff4b8e31 ("ip/tunnel: Be consistent
when printing tunnel collect metadata").

Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
Justin Iurman df8912ede2 ipioam6: use print_nl instead of print_null
This patch addresses Stephen's comment:

"""
> +        print_null(PRINT_ANY, "", "\n", NULL);

Use print_nl() since it handles the case of oneline output.
Plus in JSON the newline is meaningless.
"""

It also removes two useless print_null() calls.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
Peilin Ye 7e7270bb1f tc/skbmod: Introduce SKBMOD_F_ECN option
Recently we added SKBMOD_F_ECN option support to the kernel; support it in
the tc-skbmod(8) front end, and update its man page accordingly.

The 2 least significant bits of the IPv4 Type of Service (TOS) field and
the IPv6 Traffic Class field are used to represent different ECN states [1]:

	0b00: "Non ECN-Capable Transport", Non-ECT
	0b10: "ECN Capable Transport", ECT(0)
	0b01: "ECN Capable Transport", ECT(1)
	0b11: "Congestion Encountered", CE

This new option, "ecn", marks ECT(0) and ECT(1) IPv{4,6} packets as CE,
which is useful for ECN-based rate limiting.  For example:

	$ tc filter add dev eth0 parent 1: protocol ip prio 10 \
		u32 match ip protocol 1 0xff flowid 1:2 \
		action skbmod \
		ecn
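
As an illustration of the "ecn" rewrite, the sketch below operates on the TOS/Traffic Class byte only. It is not the kernel's act_skbmod code, and it deliberately leaves out the IPv4 header checksum update that a real rewrite also needs.

```c
#include <stdint.h>

#define ECN_MASK  0x03u
#define ECN_ECT_1 0x01u
#define ECN_ECT_0 0x02u
#define ECN_CE    0x03u

/* Given an IPv4 TOS byte or IPv6 Traffic Class byte, rewrite ECT(0) or
 * ECT(1) to CE; Non-ECT and CE packets are returned unchanged. The upper
 * six (DSCP) bits are always preserved. */
static uint8_t ecn_mark_ce(uint8_t tclass)
{
	uint8_t ecn = tclass & ECN_MASK;

	if (ecn == ECN_ECT_0 || ecn == ECN_ECT_1)
		return (uint8_t)(tclass | ECN_CE);
	return tclass;
}
```

For example, an EF-marked byte carrying ECT(0), 0xba, becomes 0xbb (CE), while the Non-ECT byte 0xb8 is left alone.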

The updated tc-skbmod SYNOPSIS looks like the following:

	tc ... action skbmod { set SETTABLE | swap SWAPPABLE | ecn } ...

Only one of "set", "swap" or "ecn" shall be used in a single tc-skbmod
command.  Trying to use more than one of them at a time is considered
undefined behavior; pipe multiple tc-skbmod commands together instead.
"set" and "swap" only affect Ethernet packets, while "ecn" only affects
IP packets.

Depends on kernel patch "net/sched: act_skbmod: Add SKBMOD_F_ECN option
support", as well as iproute2 patch "tc/skbmod: Remove misinformation
about the swap action".

[1] https://en.wikipedia.org/wiki/Explicit_Congestion_Notification

Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
Justin Iurman 86c596ed91 IOAM man8
This patch provides man8 documentation for IOAM inside ip, ip-ioam and ip-route.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
Justin Iurman 2d83c71082 New IOAM6 encap type for routes
This patch provides a new encap type for routes to insert an IOAM pre-allocated
trace:

$ ip -6 ro ad fc00::1/128 encap ioam6 trace prealloc type 0x800000 ns 1 size 12 dev eth0

where:
 - "trace" and "prealloc" may appear useless but anticipate future
   implementations of other IOAM option types.
 - "type" is a bitfield (=u32) defining the IOAM pre-allocated trace type (see
   the corresponding uapi).
 - "ns" is an IOAM namespace ID attached to the pre-allocated trace.
 - "size" is the pre-allocated trace size in bytes; it must be a multiple of
   4 octets and is bounded by IOAM6_TRACE_DATA_SIZE_MAX.
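
The "size" constraints can be expressed as a simple check. The maximum of 244 and the rejection of zero are assumptions of this sketch; consult IOAM6_TRACE_DATA_SIZE_MAX in the uapi header you build against.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror IOAM6_TRACE_DATA_SIZE_MAX from <linux/ioam6.h>. */
#define TRACE_DATA_SIZE_MAX 244u

/* Accept a pre-allocated trace size only if it is non-zero (assumed),
 * a multiple of 4 octets and no larger than the maximum. */
static bool ioam6_trace_size_ok(uint32_t size)
{
	return size != 0 && size % 4 == 0 && size <= TRACE_DATA_SIZE_MAX;
}
```

The "size 12" used in the example route above passes this check.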

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
Justin Iurman f0b3808afa Add, show, link, remove IOAM namespaces and schemas
This patch provides support for adding, listing and removing IOAM namespaces
and schemas with iproute2. When adding an IOAM namespace, both "data" (=u32)
and "wide" (=u64) are optional. Therefore, you can either have none, one of
them, or both at the same time. When adding an IOAM schema, there is no
restriction on "DATA" except its size (see IOAM6_MAX_SCHEMA_DATA_LEN). By
default, an IOAM namespace has no active IOAM schema (meaning an IOAM namespace
is not linked to an IOAM schema), and an IOAM schema is not considered
as "active" (meaning an IOAM schema is not linked to an IOAM namespace). It is
possible to link an IOAM namespace with an IOAM schema, thanks to the last
command below (meaning the IOAM schema will be considered as "active" for the
specific IOAM namespace).

$ ip ioam
Usage:	ip ioam { COMMAND | help }
	ip ioam namespace show
	ip ioam namespace add ID [ data DATA32 ] [ wide DATA64 ]
	ip ioam namespace del ID
	ip ioam schema show
	ip ioam schema add ID DATA
	ip ioam schema del ID
	ip ioam namespace set ID schema { ID | none }
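
A hypothetical model of the namespace/schema rules described above: "data" and "wide" are optional, and a schema only counts as active for a namespace once it has been linked with "ip ioam namespace set ID schema ID". The struct, function names and field widths are assumptions, not the kernel or iproute2 layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one IOAM namespace. */
struct ioam_ns {
	uint16_t id;
	bool has_data;   uint32_t data;		/* optional "data" (u32) */
	bool has_wide;   uint64_t wide;		/* optional "wide" (u64) */
	bool has_schema; uint32_t schema;	/* no schema linked by default */
};

/* "ip ioam namespace set ID schema SCHEMA_ID" */
static void ns_link_schema(struct ioam_ns *ns, uint32_t schema)
{
	ns->has_schema = true;
	ns->schema = schema;
}

/* "ip ioam namespace set ID schema none" */
static void ns_unlink_schema(struct ioam_ns *ns)
{
	ns->has_schema = false;
}

/* A schema is "active" for a namespace only while it is linked to it. */
static bool schema_active_for(const struct ioam_ns *ns, uint32_t schema)
{
	return ns->has_schema && ns->schema == schema;
}
```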

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
David Ahern acbdef9386 Import ioam6 uapi headers
Import ioam6 uapi headers from kernel headers at last sync commit.

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
David Ahern 2d6fa30bb8 Update kernel headers
Update kernel headers to commit:
    1187c8c4642d ("net: phy: mscc: make some arrays static const, makes object smaller")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
Gokul Sivakumar 508ad89c82 ipneigh: add support to print brief output of neigh cache in tabular format
Make use of the already available brief flag and print the basic details of
the IPv4 or IPv6 neighbour cache in a tabular format for better readability
when the brief output is expected.

$ ip -br neigh
172.16.12.100                           bridge0          b0:fc:36:2f:07:43
172.16.12.174                           bridge0          8c:16:45:2f:bc:1c
172.16.12.250                           bridge0          04:d9:f5:c1:0c:74
fe80::267b:9f70:745e:d54d               bridge0          b0:fc:36:2f:07:43
fd16:a115:6a62:0:8744:efa1:9933:2c4c    bridge0          8c:16:45:2f:bc:1c
fe80::6d9:f5ff:fec1:c74                 bridge0          04:d9:f5:c1:0c:74
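
The tabular layout can be approximated with fixed-width printf fields. The widths (39 and 16) are inferred from the sample output above and are not taken from the ipneigh source; `format_brief_neigh()` is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

/* Format one "ip -br neigh"-style row: destination padded to 39
 * columns, device padded to 16, then the link-layer address. */
static int format_brief_neigh(char *buf, size_t len, const char *dst,
			      const char *dev, const char *lladdr)
{
	return snprintf(buf, len, "%-39s %-16s %s", dst, dev, lladdr);
}
```

With these widths, the device column always starts at offset 40 and the link-layer address at offset 57, so even a full-length IPv6 address stays aligned.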

And add "ip neigh show" to the list of ip sub commands mentioned in the man
page that support the brief output in tabular format.

Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
David Ahern fb843668fb Merge branch 'bridge-vlan-global-mcast' into next
Nikolay Aleksandrov says:

====================

This set adds support for vlan multicast options. The feature is
globally controlled by a new bridge option called mcast_vlan_snooping
which is added by patch 01. Then patches 2-5 add support for dumping
global vlan options and filtering on vlan id. Patch 06 adds support for
setting global vlan options and then patches 07-18 add all the new
global vlan options, finally patch 19 adds support for dumping vlan
multicast router ports. These options are identical in meaning, names and
functionality as the bridge-wide ones.

All the new vlan global commands are under the global keyword:
 $ bridge vlan global show [ vid VID dev DEVICE ]
 $ bridge vlan global set vid VID dev DEVICE ...

I've added command examples in each commit message. The patch-set is a
bit big, but the global options all follow the same pattern, so I don't see
a point in breaking it up. All man page descriptions have been taken from
the corresponding bridge-wide mcast options. The only additional iproute2
change left to do is the per-vlan mcast router control, which
I'll send separately. Note that to properly use this set, you'll need the
updated kernel headers where mcast router was moved from a global option
to a per-vlan/per-device one (changed uapi enum which was in net-next).

Example:
 # enable vlan mcast snooping globally
 $ ip link set dev bridge type bridge mcast_vlan_snooping 1
 # enable mcast querier on vlan 100
 $ bridge vlan global set dev bridge vid 100 mcast_querier 1
 # show vlan 100's global options
 $ bridge -s vlan global show vid 100
port              vlan-id
bridge            100
                    mcast_snooping 1 mcast_querier 1 mcast_igmp_version 2 mcast_mld_version 1 mcast_last_member_count 2 mcast_last_member_interval 100 mcast_startup_query_count 2 mcast_startup_query_interval 3125 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query_response_interval 1000

A following kernel patch-set will add selftests which use these commands.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:32:31 -06:00
Nikolay Aleksandrov 72222cd467 bridge: vlan: add support for dumping router ports
Add dump support for vlan multicast router ports and their details if
requested. If details are requested we print 1 entry per line, otherwise
we print all router ports on a single line similar to how mdb prints
them.

Looks like:
$ bridge vlan global show vid 100
 port              vlan-id
 bridge            100
                     mcast_snooping 1 mcast_querier 0 mcast_igmp_version 2 mcast_mld_version 1 mcast_last_member_count 2 mcast_last_member_interval 100 mcast_startup_query_count 2 mcast_startup_query_interval 3125 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query_response_interval 1000
                     router ports: ens20 ens16

Looks like (with -s):
 $ bridge -s vlan global show vid 100
 port              vlan-id
 bridge            100
                     mcast_snooping 1 mcast_querier 0 mcast_igmp_version 2 mcast_mld_version 1 mcast_last_member_count 2 mcast_last_member_interval 100 mcast_startup_query_count 2 mcast_startup_query_interval 3125 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query_response_interval 1000
                     router ports: ens20   187.57 temp
                                   ens16   118.27 temp

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:29:31 -06:00
Nikolay Aleksandrov 7ad5505bb5 bridge: vlan: add global mcast_querier option
Add control and dump support for the global mcast_querier option which
controls whether the bridge will act as a multicast querier for that vlan.
Syntax: $ bridge vlan global set dev bridge vid 1 mcast_querier 1

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:29:26 -06:00
Nikolay Aleksandrov 061da2e222 bridge: vlan: add global mcast_startup_query_interval option
Add control and dump support for the global mcast_startup_query_interval
option which controls the interval between queries in the startup phase.
To be consistent with the same bridge-wide option the value is reported
with USER_HZ granularity and the same granularity is expected when setting
it.
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_startup_query_interval 15000

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:29:12 -06:00
Nikolay Aleksandrov 60dcd5c318 bridge: vlan: add global mcast_query_response_interval option
Add control and dump support for the global mcast_query_response_interval
option which sets the Max Response Time/Maximum Response Delay for IGMP/MLD
queries sent by the bridge. To be consistent with the same bridge-wide
option the value is reported with USER_HZ granularity and the same
granularity is expected when setting it.
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_query_response_interval 13000

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:28:47 -06:00
Nikolay Aleksandrov 0e4cfa0370 bridge: vlan: add global mcast_query_interval option
Add control and dump support for the global mcast_query_interval
option which controls the interval between queries sent by the bridge
after the end of the startup phase. To be consistent with the same
bridge-wide option the value is reported with USER_HZ granularity and
the same granularity is expected when setting it.
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_query_interval 13000

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:28:29 -06:00
Nikolay Aleksandrov ebcee09ca1 bridge: vlan: add global mcast_querier_interval option
Add control and dump support for the global mcast_querier_interval
option which controls the interval after which, if no queries from another
router are seen, the bridge will start sending its own queries.
To be consistent with the same bridge-wide option the value is reported
with USER_HZ granularity and the same granularity is expected when
setting it.
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_querier_interval 13000

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:28:12 -06:00
Nikolay Aleksandrov 3ae784f589 bridge: vlan: add global mcast_membership_interval option
Add control and dump support for the global mcast_membership_interval
option which controls the interval after which the bridge will leave a
group if no reports have been received for it. To be consistent with the
same bridge-wide option the value is reported with USER_HZ granularity and
the same granularity is expected when setting it.
The default is 26000 (260 seconds).
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_membership_interval 13000
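
Because these values use USER_HZ granularity, converting them to real time requires the tick rate. This sketch assumes USER_HZ is 100, which matches the stated default of 26000 being 260 seconds; the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumption: USER_HZ is 100 ticks per second, consistent with the
 * "26000 (260 seconds)" default quoted above. */
#define ASSUMED_USER_HZ 100u

/* Convert a value in USER_HZ ticks (as shown and accepted by the
 * bridge command) to milliseconds. */
static uint64_t user_hz_to_ms(uint64_t ticks)
{
	return ticks * 1000u / ASSUMED_USER_HZ;
}

/* Convert milliseconds back to USER_HZ ticks (rounding down). */
static uint64_t ms_to_user_hz(uint64_t ms)
{
	return ms * ASSUMED_USER_HZ / 1000u;
}
```

Under that assumption, "mcast_membership_interval 13000" in the syntax example corresponds to 130 seconds.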

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:27:56 -06:00
Nikolay Aleksandrov 2b6cc38d52 bridge: vlan: add global mcast_last_member_interval option
Add control and dump support for the global mcast_last_member_interval
option which controls the interval between queries to find remaining
members of a group after a leave message. To be consistent with the same
bridge-wide option the value is reported with USER_HZ granularity and
the same granularity is expected when setting it.
The default is 100 (1 second).
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_last_member_interval 200

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:27:45 -06:00
Nikolay Aleksandrov 7cc7dbf447 bridge: vlan: add global mcast_startup_query_count option
Add control and dump support for the global mcast_startup_query_count
option which controls the number of queries the bridge will send on the
vlan during the startup phase (default 2).
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_startup_query_count 5

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:27:28 -06:00
Nikolay Aleksandrov 3399c0759f bridge: vlan: add global mcast_last_member_count option
Add control and dump support for the global mcast_last_member_count option
which controls the number of queries the bridge will send on the vlan after
a leave is received (default 2).
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_last_member_count 10

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:26:07 -06:00
Nikolay Aleksandrov a8d7212a4f bridge: vlan: add global mcast_mld_version option
Add control and dump support for the global mcast_mld_version option
which controls the MLD version on the vlan (default 1).
Syntax: $ bridge vlan global set dev bridge vid 1 mcast_mld_version 2

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:25:17 -06:00
Nikolay Aleksandrov 29fada0f41 bridge: vlan: add global mcast_igmp_version option
Add control and dump support for the global mcast_igmp_version option
which controls the IGMP version on the vlan (default 2).
Syntax: $ bridge vlan global set dev bridge vid 1 mcast_igmp_version 3

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:24:09 -06:00
Nikolay Aleksandrov 1f608d590c bridge: vlan: add global mcast_snooping option
Add control and dump support for the global mcast_snooping option which
controls if multicast snooping is enabled or disabled for a single vlan.
Syntax: $ bridge vlan global set dev bridge vid 1 mcast_snooping 1

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:23:26 -06:00
Nikolay Aleksandrov dee5eb05e5 bridge: vlan: add support to set global vlan options
Add support to change global vlan options via a new vlan global
set subcommand similar to the current vlan set subcommand. The man page
and help are updated accordingly. The command works only with bridge
devices. It doesn't support any options yet.

Syntax: $ bridge vlan global set vid VID dev DEV

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:21:13 -06:00
Nikolay Aleksandrov ecf6d8b4a1 bridge: vlan: add support for vlan filtering when dumping options
In order to allow vlan filtering when dumping options we need to move
all print operations into the option dumping functions and add the
filtering after we've parsed the nested attributes so we can extract the
start and end vlan ids.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:21:09 -06:00
Nikolay Aleksandrov 720f8613bd bridge: vlan: add support to show global vlan options
Add support for a new bridge vlan command group called global which
operates on global options. The first command it supports is "show".
To do that we update print_vlan_rtm to recognize the global vlan options
attribute and parse it properly.
Man page and help are also updated with the new command.

Syntax is: $ bridge vlan global show [ vid VID ] [ dev DEV ]

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:21:04 -06:00
Nikolay Aleksandrov d3a961a9b1 bridge: vlan: skip unknown attributes when printing options
Skip unknown attributes when printing vlan options in print_vlan_rtm.
Make sure print_vlan_opts doesn't accept attributes it doesn't understand.
Currently we print only one type, later global vlan options support will
be added.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:21:00 -06:00
Nikolay Aleksandrov 312e22fe79 bridge: vlan: factor out vlan option printing
Factor out the code which prints current per-vlan options from
print_vlan_rtm without any changes, later we'll filter based on the vlan
attribute and add support for global vlan option printing.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:20:53 -06:00
Nikolay Aleksandrov d2eecb9d1d ip: bridge: add support for mcast_vlan_snooping
Add support for mcast_vlan_snooping option which controls per-vlan
multicast snooping, also update the man page.
Syntax: $ ip link set dev bridge type bridge mcast_vlan_snooping 0/1

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:20:03 -06:00
Stephen Hemminger 169f36a0c9 v5.14.0 2021-08-31 11:57:59 -07:00
Jakub Kicinski 85b0e73c77 ss: fix fallback to procfs for raw sockets
Jonas reports that ss -awp does not display any RAW sockets
on a Knoppix 4.4 kernel.

sockdiag_send() diverts to tcpdiag_send() to try the older
netlink interface. tcpdiag_send() works for TCP and DCCP
but not other protocols. Instead of rejecting a list of unsupported
protocols (a list that missed RAW and SCTP), match on the supported ones.
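
Here is a sketch of the allow-list approach described above. `tcpdiag_supports()` is a hypothetical helper, not the function name used in ss.

```c
#include <netinet/in.h>
#include <stdbool.h>

/* Older libc headers may lack IPPROTO_DCCP; 33 is its IANA protocol number. */
#ifndef IPPROTO_DCCP
#define IPPROTO_DCCP 33
#endif

/* Allow-list: only protocols the legacy tcpdiag interface understands.
 * Anything else (RAW, SCTP, ...) is left to the procfs fallback. */
static bool tcpdiag_supports(int protocol)
{
	switch (protocol) {
	case IPPROTO_TCP:
	case IPPROTO_DCCP:
		return true;
	default:
		return false;
	}
}
```

An allow-list fails safe: a protocol nobody thought about is sent to the fallback instead of being silently dropped.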

Link: https://lore.kernel.org/netdev/20210815231738.7b42bad4@mmluhan/
Reported-and-tested-by: Jonas Bechtel <post@jbechtel.de>
Fixes: 41fe6c34de ("ss: Add inet raw sockets information gathering via netlink diag interface")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-18 15:03:46 -07:00
Stephen Hemminger 1afde09498 uapi: update neighbour.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-18 14:09:34 -07:00
Gokul Sivakumar 10ecd12690 man: bridge: fix the typo to change "-c[lor]" into "-c[olor]" in man page
Fixes: 3a1ca9a5b ("bridge: update man page for new color and json changes")
Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-18 14:04:53 -07:00
Gokul Sivakumar 057d3c6d37 bridge: fdb: don't colorize the "dev" & "dst" keywords in "bridge -c fdb"
To be consistent with the colorized output of "ip" command and to increase
readability, stop highlighting the "dev" & "dst" keywords in the colorized
output of "bridge -c fdb" cmd.

Example: in the following "bridge -c fdb" entry, only "00:00:00:00:00:00",
"vxlan100" and "2001:db8:2::1" fields should be highlighted in color.

00:00:00:00:00:00 dev vxlan100 dst 2001:db8:2::1 self permanent

Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-18 14:04:53 -07:00
Gokul Sivakumar 82149efee9 bridge: reorder cmd line arg parsing to let "-c" detected as "color" option
As per the man/man8/bridge.8 page, the shorthand cmd line arg "-c" can be
used to colorize the bridge cmd output. But while parsing the args in while
loop, matches() detects "-c" as "-compressedvlans" instead of "-color", so
fix this by doing the check for "-color" option first before checking for
"-compressedvlans".
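
A sketch of why the order matters, assuming prefix semantics like those of iproute2's matches(), where the argument may be an abbreviation of the option name. prefix_matches() and resolve_option() are illustrative names, not the bridge tool's actual code. The first table entry that the argument is a prefix of wins, so "-color" has to be checked before "-compressedvlans".

```c
#include <stddef.h>
#include <string.h>

/* Simplified prefix test in the spirit of matches(): returns 0 when
 * 'arg' is a (possibly abbreviated) prefix of 'opt'. */
int prefix_matches(const char *arg, const char *opt)
{
	size_t len = strlen(arg);

	if (len > strlen(opt))
		return -1;
	return memcmp(opt, arg, len);
}

/* Hypothetical resolver: the first option that 'arg' abbreviates wins,
 * so ambiguous abbreviations are resolved by table order. */
const char *resolve_option(const char *arg)
{
	static const char *const opts[] = {
		"-color",		/* checked first, so "-c" means color */
		"-compressedvlans",
	};

	for (size_t i = 0; i < sizeof(opts) / sizeof(opts[0]); i++)
		if (prefix_matches(arg, opts[i]) == 0)
			return opts[i];
	return NULL;
}
```

With this order, "-c" and "-co" select "-color", while at least "-com" is needed to reach "-compressedvlans".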

Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-18 14:04:53 -07:00
Hangbin Liu 3a09567f7d ip/bond: add arp_validate filter support
Add arp_validate filter support based on kernel commit 896149ff1b2c
("bonding: extend arp_validate to be able to receive unvalidated arp-only traffic")

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-18 14:02:44 -07:00
Parav Pandit 355c49ffa5 devlink: Show port state values in man page and in the help command
Port function state can have one of two values: active or
inactive. Update the documentation and the help command to tell
the user about these two values.

With the introduction of state, both hw_addr and state are optional.
Hence mark them as optional in the man page, which also aligns with
the help command output.

Fixes: bdfb9f1bd6 ("devlink: Support set of port function state")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-11 15:02:30 -07:00
Hangbin Liu ebaa603b30 ip/bond: add lacp active support
lacp_active specifies whether to send LACPDU frames periodically.
If set to on, LACPDU frames are sent periodically at the configured
lacp_rate. If set to off, LACPDU frames act as "speak when spoken to".

v2: use strcmp instead of match for new options.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2021-08-11 12:26:20 -06:00
David Ahern 8d6134b204 Update kernel headers
Update kernel headers to commit:
    88be32634905 ("Merge branch 'dsa-tagger-helpers'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-11 12:23:33 -06:00
Ilya Dmitrichenko 51d8fc708c ip/tunnel: always print all known attributes
Presently, if a Geneve or VXLAN interface was created with 'external',
it's not possible for a user to determine e.g. the value of 'dstport'
after creation. This change fixes that by avoiding early returns.

This change partly reverts commit 00ff4b8e31 ("ip/tunnel: Be consistent
when printing tunnel collect metadata").

Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-11 12:17:52 -06:00
Justin Iurman 71ba9c18e0 ipioam6: use print_nl instead of print_null
This patch addresses Stephen's comment:

"""
> +        print_null(PRINT_ANY, "", "\n", NULL);

Use print_nl() since it handles the case of oneline output.
Plus in JSON the newline is meaningless.
"""

It also removes two useless print_null's.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-11 12:16:09 -06:00
Phil Sutter 9b7ea92b9e tc: u32: Fix key folding in sample option
Between Linux kernels 2.4 and 2.6, key folding for hash tables changed
in kernel space. When iproute2 dropped support for the older algorithm,
the wrong code was removed and kernel 2.4 folding method remained in
place. To get things functional for recent kernels again, restoring the
old code alone was not sufficient - additional byteorder fixes were
needed.

While at it, make use of ffs() and thereby align the code with how
kernel determines the shift width.
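
A sketch of the bucket computation, assuming the kernel-side rule this message refers to: mask the key, convert it to host byte order, shift it right by the index of the lowest set bit of the hash mask (found with ffs()), and keep as many bits as the divisor allows. u32_bucket() and its parameters are illustrative, not tc's or the kernel's actual functions.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <strings.h>

/* key_be and hmask_be are in network byte order, as they appear in the
 * packet and in the selector; divisor_mask is e.g. 0xff for divisor 256. */
uint32_t u32_bucket(uint32_t key_be, uint32_t hmask_be, uint32_t divisor_mask)
{
	uint32_t hmask = ntohl(hmask_be);
	/* ffs() is 1-based; an all-zero mask means no shift. */
	int fshift = hmask ? ffs((int)hmask) - 1 : 0;
	uint32_t h = ntohl(key_be & hmask_be) >> fshift;

	return h & divisor_mask;
}
```

For example, with hmask 0x0000ff00 (the third octet of an IPv4 address) and divisor 256, the key 10.0.1.5 lands in bucket 1.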

Fixes: 267480f553 ("Backout the 2.4 utsname hash patch.")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-10 20:02:43 -07:00
Andrea Claudi d1eacf12b5 lib: bpf_glue: remove useless assignment
The value of s used inside the loop is the result of strstr(), so this
assignment is useless.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-10 20:01:54 -07:00
Andrea Claudi 50a4127022 lib: bpf_legacy: fix potential NULL-pointer dereference
If bpf_map_fetch_name() returns NULL, strlen() hits a NULL-pointer
dereference on outer_map_name.

Fix this by checking the value of outer_map_name and returning false
when it is NULL, as is already done for inner_map_name.

Fixes: 6d61a2b557 ("lib: add libbpf support")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-10 19:55:12 -07:00
Jacob Keller 954a0077c8 devlink: fix infinite loop on flash update for drivers without status
When processing device flash update, cmd_dev_flash function waits until
the flash process has completed. This requires the following two
conditions to both be true:

a) we've received an exit status from the child process
b) we've received the DEVLINK_CMD_FLASH_UPDATE_END *or*
   we haven't received any status notifications from the driver.

The original devlink flash status monitoring code in 9b13cddfe2
("devlink: implement flash status monitoring") was written assuming that
a driver will either send no status updates, or it will send at least
one DEVLINK_CMD_FLASH_UPDATE_STATUS before DEVLINK_CMD_FLASH_UPDATE_END.

Newer versions of the kernel since commit 52cc5f3a166a ("devlink: move flash
end and begin to core devlink") in v5.10 moved handling of the
DEVLINK_CMD_FLASH_UPDATE_END into the core stack, and will send this
regardless of whether or not the driver sends any of its own status
notifications.

The handling of DEVLINK_CMD_FLASH_UPDATE_END in cmd_dev_flash_status_cb
has an additional condition that it must not be the first message.
Otherwise, it falls back to treating it like
a DEVLINK_CMD_FLASH_UPDATE_STATUS.

This is wrong because it can lead to an infinite loop if a driver does
not send any status updates.

In this case, the kernel will send DEVLINK_CMD_FLASH_UPDATE_END without
any DEVLINK_CMD_FLASH_UPDATE_STATUS. The devlink application will see
that ctx->not_first is false, and will treat this like any other status
message. Thus, ctx->not_first will be set to 1.

The loop condition for exiting the flash update will thus never be
satisfied: ctx->not_first is true and ctx->received_end is false, so
we wait forever.

This leads to the application appearing to process the flash update, but
it will never exit.

Fix this by simply always treating DEVLINK_CMD_FLASH_UPDATE_END the same
regardless of whether it's the first message or not.

This is obviously the correct thing to do: once we've received the
DEVLINK_CMD_FLASH_UPDATE_END the flash update must be finished. For new
kernels this is always true, because we send this message in the core
stack after the driver flash update routine finishes.

For older kernels, some drivers may not have sent any
DEVLINK_CMD_FLASH_UPDATE_STATUS or DEVLINK_CMD_FLASH_UPDATE_END. This is
handled by the while loop conditional that exits if we get a return
value from the child process without having received any status
notifications.
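
The wait logic described above can be modelled as follows. This is a simplified sketch: the struct, enum and function names are hypothetical, and only the not_first and received_end fields mirror the ctx members named in this message. The point of the fix is that END sets received_end whether or not it is the first message.

```c
#include <stdbool.h>

/* Hypothetical, cut-down flash-status context. */
struct flash_ctx {
	bool not_first;		/* at least one STATUS notification seen */
	bool received_end;	/* END notification seen */
	bool flash_done;	/* child process has exited */
};

enum flash_msg { FLASH_STATUS, FLASH_END };

/* Fixed handling: END always marks completion, even as the first message. */
void flash_handle_msg(struct flash_ctx *ctx, enum flash_msg msg)
{
	if (msg == FLASH_END) {
		ctx->received_end = true;
		return;
	}
	ctx->not_first = true;
}

/* Keep waiting until the child exits and, if any status was seen,
 * until END has arrived as well. */
bool flash_keep_waiting(const struct flash_ctx *ctx)
{
	return !ctx->flash_done || (ctx->not_first && !ctx->received_end);
}
```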

An argument could be made that we should exit immediately when we get
either the DEVLINK_CMD_FLASH_UPDATE_END or an exit code from the child
process. However, at a minimum it makes no sense to ever process
DEVLINK_CMD_FLASH_UPDATE_END as if it were a DEVLINK_CMD_FLASH_UPDATE_STATUS.

This is easy to test as it is triggered by the selftests for the
netdevsim driver, which has a test case for both with and without status
notifications.

Fixes: 9b13cddfe2 ("devlink: implement flash status monitoring")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-10 19:54:39 -07:00
Feng Zhou be99929d60 lib/bpf: Fix btf_load error lead to enable debug log
When tc is used without verbose and bpf_btf_attach() fails, the
condition
"if (fd < 0 && (errno == ENOSPC || !ctx->log_size))"
results in ctx->log_size becoming non-zero. bpf_prog_attach() then
sees ctx->log_size != 0 and enables the debug log.
On larger programs the verifier log can be so chatty that
bpf_prog_attach() fails with:
"Log buffer too small to dump verifier log 16777215 bytes (9 tries)!"

A BTF load failure does not affect the program load; the program still
works. So, when the BTF or program load fails, only enlarge log_size
and retry when verbose is enabled.

Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-10 19:53:54 -07:00
Peilin Ye e78411948d tc/skbmod: Introduce SKBMOD_F_ECN option
Recently we added SKBMOD_F_ECN option support to the kernel; support it in
the tc-skbmod(8) front end, and update its man page accordingly.

The 2 least significant bits of the TOS field in IPv4 headers and of the
Traffic Class field in IPv6 headers are used to represent different ECN
states [1]:

	0b00: "Non ECN-Capable Transport", Non-ECT
	0b10: "ECN Capable Transport", ECT(0)
	0b01: "ECN Capable Transport", ECT(1)
	0b11: "Congestion Experienced", CE

This new option, "ecn", marks ECT(0) and ECT(1) IPv{4,6} packets as CE,
which is useful for ECN-based rate limiting.  For example:

	$ tc filter add dev eth0 parent 1: protocol ip prio 10 \
		u32 match ip protocol 1 0xff flowid 1:2 \
		action skbmod \
		ecn
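
A sketch of the marking rule applied to the TOS/Traffic Class byte, for illustration only. ecn_mark_ce() and the macro names are hypothetical. It leaves Non-ECT and CE packets untouched and turns ECT(0) and ECT(1) into CE. A real datapath must also update the IPv4 header checksum, which is omitted here.

```c
#include <stdint.h>

#define ECN_FIELD_MASK	0x03	/* two least significant bits */
#define ECN_NOT_ECT	0x00
#define ECN_CE		0x03

/* Mark ECT(0) (0b10) and ECT(1) (0b01) as CE (0b11); the DSCP bits
 * are preserved, and Non-ECT packets are returned unchanged. */
uint8_t ecn_mark_ce(uint8_t tos)
{
	if ((tos & ECN_FIELD_MASK) == ECN_NOT_ECT)
		return tos;
	return (uint8_t)(tos | ECN_CE);
}
```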

The updated tc-skbmod SYNOPSIS looks like the following:

	tc ... action skbmod { set SETTABLE | swap SWAPPABLE | ecn } ...

Only one of "set", "swap" or "ecn" shall be used in a single tc-skbmod
command.  Trying to use more than one of them at a time is considered
undefined behavior; pipe multiple tc-skbmod commands together instead.
"set" and "swap" only affect Ethernet packets, while "ecn" only affects
IP packets.

Depends on kernel patch "net/sched: act_skbmod: Add SKBMOD_F_ECN option
support", as well as iproute2 patch "tc/skbmod: Remove misinformation
about the swap action".

[1] https://en.wikipedia.org/wiki/Explicit_Congestion_Notification

Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-08 11:56:55 -06:00
David Ahern 09d8ce3db1 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-04 09:24:12 -06:00
David Ahern e8763fc9ab Merge branch 'ipv6-oam' into next
Justin Iurman says:

====================

The IOAM patchset was merged recently (see net-next commits [1,2,3,4,5,6]).
Therefore, this patchset provides support for IOAM inside iproute2, as well as
manpage documentation. Here is a summary of the features added to iproute2.

(1) configure IOAM namespaces and schemas:

$ ip ioam
Usage:  ip ioam { COMMAND | help }
        ip ioam namespace show
        ip ioam namespace add ID [ data DATA32 ] [ wide DATA64 ]
        ip ioam namespace del ID
        ip ioam schema show
        ip ioam schema add ID DATA
        ip ioam schema del ID
        ip ioam namespace set ID schema { ID | none }

(2) provide a new encap type to insert the IOAM pre-allocated trace:

$ ip -6 ro ad fc00::1/128 encap ioam6 trace prealloc type 0x800000 ns 1 size 12 dev eth0

  [1] db67f219fc9365a0c456666ed7c134d43ab0be8a
  [2] 9ee11f0fff205b4b3df9750bff5e94f97c71b6a0
  [3] 8c6f6fa6772696be0c047a711858084b38763728
  [4] 3edede08ff37c6a9370510508d5eeb54890baf47
  [5] de8e80a54c96d2b75377e0e5319a64d32c88c690
  [6] 968691c777af78d2daa2ee87cfaeeae825255a58

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-02 11:34:09 -06:00
Justin Iurman 78832863ef IOAM man8
This patch provides man8 documentation for IOAM in ip, ip-ioam and ip-route.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-02 11:33:35 -06:00
Justin Iurman 32f4969d44 New IOAM6 encap type for routes
This patch provides a new encap type for routes to insert an IOAM pre-allocated
trace:

$ ip -6 ro ad fc00::1/128 encap ioam6 trace prealloc type 0x800000 ns 1 size 12 dev eth0

where:
 - "trace" and "prealloc" may appear useless, but they anticipate future
   implementations of other IOAM option types.
 - "type" is a bitfield (=u32) defining the IOAM pre-allocated trace type (see
   the corresponding uapi).
 - "ns" is an IOAM namespace ID attached to the pre-allocated trace.
 - "size" is the trace pre-allocated size in bytes; must be a 4-octet multiple;
   limited size (see IOAM6_TRACE_DATA_SIZE_MAX).
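
A sketch of the size check described above. The validator name is hypothetical, and the 244-byte limit is an assumption standing in for IOAM6_TRACE_DATA_SIZE_MAX; consult the ioam6 uapi header for the authoritative value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value; the real limit is IOAM6_TRACE_DATA_SIZE_MAX. */
#define TRACE_SIZE_MAX 244

/* Hypothetical check for "encap ioam6 trace prealloc ... size N":
 * N must be a multiple of 4 octets and must not exceed the limit. */
bool ioam6_trace_size_ok(uint32_t size)
{
	return size % 4 == 0 && size <= TRACE_SIZE_MAX;
}
```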

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-02 11:33:31 -06:00
Justin Iurman 2909812583 Add, show, link, remove IOAM namespaces and schemas
This patch provides support for adding, listing and removing IOAM namespaces
and schemas with iproute2. When adding an IOAM namespace, both "data" (=u32)
and "wide" (=u64) are optional. Therefore, you can either have none, one of
them, or both at the same time. When adding an IOAM schema, there is no
restriction on "DATA" except its size (see IOAM6_MAX_SCHEMA_DATA_LEN). By
default, an IOAM namespace has no active IOAM schema (meaning an IOAM namespace
is not linked to an IOAM schema), and an IOAM schema is not considered
as "active" (meaning an IOAM schema is not linked to an IOAM namespace). It is
possible to link an IOAM namespace with an IOAM schema, thanks to the last
command below (meaning the IOAM schema will be considered as "active" for the
specific IOAM namespace).

$ ip ioam
Usage:	ip ioam { COMMAND | help }
	ip ioam namespace show
	ip ioam namespace add ID [ data DATA32 ] [ wide DATA64 ]
	ip ioam namespace del ID
	ip ioam schema show
	ip ioam schema add ID DATA
	ip ioam schema del ID
	ip ioam namespace set ID schema { ID | none }
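
A hypothetical in-memory model of the namespace/schema relationship described above, assuming a schema is active for at most one namespace at a time. None of these types come from iproute2 or the kernel. They only illustrate the optional data/wide fields and the two-way link that "ip ioam namespace set ID schema { ID | none }" establishes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct ioam_schema;

struct ioam_ns {
	uint16_t id;
	bool has_data;			/* "data" is optional */
	uint32_t data;
	bool has_wide;			/* "wide" is optional */
	uint64_t wide;
	struct ioam_schema *schema;	/* NULL: no active schema */
};

struct ioam_schema {
	uint32_t id;
	struct ioam_ns *ns;		/* NULL: schema not active */
};

/* Link ns to sc, or unlink it when sc is NULL ("none"), keeping both
 * sides of the link consistent. */
void ioam_ns_set_schema(struct ioam_ns *ns, struct ioam_schema *sc)
{
	if (ns->schema)
		ns->schema->ns = NULL;	/* old schema becomes inactive */
	if (sc && sc->ns)
		sc->ns->schema = NULL;	/* schema leaves its old namespace */
	ns->schema = sc;
	if (sc)
		sc->ns = ns;
}
```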

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-02 11:33:05 -06:00
David Ahern e53f4cd504 Import ioam6 uapi headers
Import ioam6 uapi headers from kernel headers at last sync commit.

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-02 11:32:26 -06:00
David Ahern 236696e52c Update kernel headers
Update kernel headers to commit:
    1187c8c4642d ("net: phy: mscc: make some arrays static const, makes object smaller")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-02 10:25:09 -06:00
Gokul Sivakumar cf866f0a5a ipneigh: add support to print brief output of neigh cache in tabular format
Make use of the already available brief flag and print the basic details of
the IPv4 or IPv6 neighbour cache in a tabular format for better readability
when the brief output is expected.

$ ip -br neigh
172.16.12.100                           bridge0          b0:fc:36:2f:07:43
172.16.12.174                           bridge0          8c:16:45:2f:bc:1c
172.16.12.250                           bridge0          04:d9:f5:c1:0c:74
fe80::267b:9f70:745e:d54d               bridge0          b0:fc:36:2f:07:43
fd16:a115:6a62:0:8744:efa1:9933:2c4c    bridge0          8c:16:45:2f:bc:1c
fe80::6d9:f5ff:fec1:c74                 bridge0          04:d9:f5:c1:0c:74
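
A sketch of how one brief row could be formatted. The column widths (39 characters for the address, enough for a full IPv6 address, and 16 for the device, each followed by one space) are inferred from the sample output above and are an assumption; format_neigh_brief() is a hypothetical helper.

```c
#include <stdio.h>

/* Left-align each column and pad it to a fixed width so rows line up. */
int format_neigh_brief(char *buf, size_t len, const char *dst,
		       const char *dev, const char *lladdr)
{
	return snprintf(buf, len, "%-39s %-16s %s", dst, dev, lladdr);
}
```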

And add "ip neigh show" to the list of ip sub commands mentioned in the man
page that support the brief output in tabular format.

Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-02 10:14:50 -06:00
Peilin Ye c06d313d86 tc/skbmod: Remove misinformation about the swap action
Currently man 8 tc-skbmod says that "...the swap action will occur after
any smac/dmac substitutions are executed, if they are present."

This is false.  In fact, trying to "set" and "swap" in a single skbmod
command causes the "set" part to be completely ignored.  As an example:

	$ tc filter add dev eth0 parent 1: protocol ip prio 10 \
		matchall action skbmod \
		set dmac AA:AA:AA:AA:AA:AA smac BB:BB:BB:BB:BB:BB \
		swap mac

The above command simply does a "swap", without setting DMAC or SMAC to
AA's or BB's.  The root cause of this is in the kernel, see
net/sched/act_skbmod.c:tcf_skbmod_init():

	parm = nla_data(tb[TCA_SKBMOD_PARMS]);
	index = parm->index;
	if (parm->flags & SKBMOD_F_SWAPMAC)
		lflags = SKBMOD_F_SWAPMAC;
		^^^^^^^^^^^^^^^^^^^^^^^^^^

Doing a "=" instead of "|=" clears all other "set" flags when doing a
"swap".  Discourage using "set" and "swap" in the same command by
documenting it as undefined behavior, and update the "SYNOPSIS" section
as well as tc -help text accordingly.

If one really needs to e.g. "set" DMAC to all AA's then "swap" DMAC and
SMAC, one should do two separate commands and "pipe" them together.
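
A minimal illustration of why the "set" flags disappear. The flag values are assumed to mirror SKBMOD_F_DMAC, SKBMOD_F_SMAC and SKBMOD_F_SWAPMAC from the tc_skbmod uapi header, and both functions are hypothetical. Note that this commit only documents the combination as undefined behavior; it does not change the kernel.

```c
#include <stdint.h>

/* Assumed to mirror the SKBMOD_F_* values in the uapi header. */
#define F_DMAC		0x1
#define F_SMAC		0x2
#define F_SWAPMAC	0x8

/* "=" overwrites any set-flags collected earlier. */
uint32_t flags_assign(uint32_t lflags, uint32_t parm_flags)
{
	if (parm_flags & F_SWAPMAC)
		lflags = F_SWAPMAC;
	return lflags;
}

/* "|=" keeps the earlier flags and adds SWAPMAC. */
uint32_t flags_accumulate(uint32_t lflags, uint32_t parm_flags)
{
	if (parm_flags & F_SWAPMAC)
		lflags |= F_SWAPMAC;
	return lflags;
}
```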

Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-22 15:14:29 -07:00
Roi Dayan 71d36000dc police: Fix normal output back to what it was
The JSON support fix changed the normal output;
set it back to what it was.
Print overhead with print_size().
Print newline before ref.

Fixes: 0d5cf51e0d ("police: Add support for json output")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-17 11:14:30 -07:00
Lahav Schlesinger f760bff328 ipmonitor: Fix recvmsg with ancillary data
A successful call to recvmsg() causes msg.msg_controllen to contain the length
of the received ancillary data. However, the current code in the 'ip' utility
doesn't reset this value after each recvmsg().

This means that if a call to recvmsg() doesn't have ancillary data, then
'msg.msg_controllen' will be set to 0, causing future recvmsg() calls that do
contain ancillary data to get MSG_CTRUNC set in msg.msg_flags.

This fixes 'ip monitor' running with the all-nsid option: with this option the
kernel passes the nsid as ancillary data. If an event on the current netns is
received while 'ip monitor' is running, then no ancillary data will be sent,
causing 'msg.msg_controllen' to be set to 0, which causes 'ip monitor' to
indefinitely print "[nsid current]" instead of the real nsid.
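
A sketch of the fix pattern, not the actual ipmonitor code: when one struct msghdr is reused across recvmsg() calls, reset msg_controllen to the full control buffer size before every call. The rx_state type and the helpers are hypothetical. send_with_fd() uses SCM_RIGHTS on a Unix socket only as a convenient way to generate ancillary data for testing; ip monitor itself receives the nsid as netlink ancillary data.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Receive state reused across calls, as a monitor loop typically does. */
struct rx_state {
	struct msghdr msg;
	struct iovec iov;
	char data[256];
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* keeps the buffer cmsghdr-aligned */
	} ctl;
};

void rx_init(struct rx_state *st)
{
	memset(st, 0, sizeof(*st));
	st->iov.iov_base = st->data;
	st->iov.iov_len = sizeof(st->data);
	st->msg.msg_iov = &st->iov;
	st->msg.msg_iovlen = 1;
	st->msg.msg_control = st->ctl.buf;
}

ssize_t rx_one(int fd, struct rx_state *st)
{
	/* recvmsg() overwrites msg_controllen with the length of the
	 * ancillary data actually received (0 if none). Restore the full
	 * size before every call, or a later message that does carry
	 * ancillary data is truncated (MSG_CTRUNC). */
	st->msg.msg_controllen = sizeof(st->ctl.buf);
	return recvmsg(fd, &st->msg, 0);
}

/* Test helper: send one byte with a file descriptor attached. */
ssize_t send_with_fd(int sock, int fd_to_pass)
{
	char byte = 'x';
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = u.buf,
		.msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);

	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	cm->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cm), &fd_to_pass, sizeof(int));
	return sendmsg(sock, &msg, 0);
}
```

Dropping the msg_controllen reset in rx_one() would reproduce the bug: after a message without ancillary data, the next one carrying a file descriptor would come back with MSG_CTRUNC set.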

Fixes: 449b824ad1 ("ipmonitor: allows to monitor in several netns")
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-17 11:13:36 -07:00
Stephen Hemminger 7a7e9ed98f uapi: headers update
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-17 11:12:47 -07:00
Christian Schürmann 1f2c908d53 man8/ip-tunnel.8: fix typo, 'encaplim' is not a valid option
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-15 09:31:51 -07:00
Alexander Mikhalitsyn 115e987035 libnetlink: check error handler is present before a call
Fix a NULL pointer dereference of the errhndlr member of struct
rtnl_dump_filter_arg in the rtnl_dump_done() and rtnl_dump_error() functions.

Fixes: 459ce6e3d7 ("ip route: ignore ENOENT during save if RT_TABLE_MAIN is being dumped")
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Roi Dayan <roid@nvidia.com>
Cc: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Reported-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-11 10:33:44 -07:00
Stephen Hemminger 0015ada629 libnetlink: cosmetic changes
Don't initialize arguments that are NULL, and format initialization
in a more logical way.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-07 07:39:07 -07:00
Alexander Mikhalitsyn 459ce6e3d7 ip route: ignore ENOENT during save if RT_TABLE_MAIN is being dumped
We started to use in-kernel filtering feature which allows to get only
needed tables (see iproute_dump_filter()). From the kernel side it's
implemented in net/ipv4/fib_frontend.c (inet_dump_fib), net/ipv6/ip6_fib.c
(inet6_dump_fib). The problem is that the behaviour of "ip route save"
changed after
c7e6371bc ("ip route: Add protocol, table id and device to dump request").
If filters are used, the kernel returns an ENOENT error if the requested
table is absent, but in a newly created net namespace even the
RT_TABLE_MAIN table doesn't exist. It is only allocated later, for
instance after issuing "ip l set lo up".

The reproducer is fairly simple:
$ unshare -n ip route save > dump
Error: ipv4: FIB table does not exist.
Dump terminated

The expected result here is an empty dump file (as it was before this
change).
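
A sketch of the suppression rule in isolation. save_ignore_dump_error() is a hypothetical predicate, not the rtnl_dump_filter_errhndlr() interface the commit introduces. It only encodes the condition described here: a netlink error of -ENOENT while the main table is being dumped means an empty result, not a failure.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <linux/rtnetlink.h>

/* Netlink reports errors as negative errno values. */
bool save_ignore_dump_error(int nl_error, uint32_t table)
{
	return nl_error == -ENOENT && table == RT_TABLE_MAIN;
}
```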

v2: reworked, so, now it takes into account NLMSGERR_ATTR_MSG
(see nl_dump_ext_ack_done() function). We want to suppress error messages
in stderr about absent FIB table from kernel too.

v3: reworked to make code clearer. Introduced rtnl_suppressed_errors(),
rtnl_suppress_error() helpers. User may suppress up to 3 errors (may be
easily extended by changing SUPPRESS_ERRORS_INIT macro).

v4: reworked, rtnl_dump_filter_errhndlr() was introduced. Thanks
to Stephen Hemminger for comments and suggestions

v5: space fixes, commit message reformat, empty initializers

Fixes: c7e6371bc ("ip route: Add protocol, table id and device to dump request")
Cc: David Ahern <dsahern@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-07 07:32:56 -07:00
Stephen Hemminger 8f85d085fe uapi: update kernel headers from 5.14-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-06 17:07:24 -07:00
Martynas Pumputis 83d4d61bc9 libbpf: fix attach of prog with multiple sections
When loading BPF programs that consist of multiple executable sections via
iproute2+libbpf (configured with LIBBPF_FORCE=on), we noticed that the
wrong section can be attached to a device. E.g.:

    # tc qdisc replace dev lxc_health clsact
    # tc filter replace dev lxc_health ingress prio 1 \
        handle 1 bpf da obj bpf_lxc.o sec from-container
    # tc filter show dev lxc_health ingress filter protocol all
        pref 1 bpf chain 0 filter protocol all pref 1 bpf chain 0
        handle 0x1 bpf_lxc.o:[__send_drop_notify] <-- WRONG SECTION
        direct-action not_in_hw id 38 tag 7d891814eda6809e jited

After taking a closer look into load_bpf_object() in lib/bpf_libbpf.c,
we noticed that the filter used in the program iterator does not check
whether a program section name matches a requested section name
(cfg->section). This can lead to a wrong prog FD being used to attach
the program.
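
A sketch of the missing check, using a hypothetical array in place of libbpf's program iterator. The requested section name is compared against each program's section, and no program is returned when nothing matches, instead of attaching an unrelated one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a loaded program. */
struct loaded_prog {
	const char *section;
	int fd;
};

/* Return the fd of the program whose section equals 'section', or -1. */
int find_prog_fd(const struct loaded_prog *progs, size_t n, const char *section)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(progs[i].section, section) == 0)
			return progs[i].fd;
	return -1;
}
```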

Fixes: 6d61a2b557 ("lib: add libbpf support")
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Acked-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-06 16:59:39 -07:00
David Ahern 02c06ffc13 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-07-01 14:29:42 +00:00
Stephen Hemminger fc3511962d lib: remove blank line at eof
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-29 13:20:44 -07:00
Stephen Hemminger 0e7ea3e8fe v5.13.0 2021-06-29 11:24:17 -07:00
Ben Hutchings 33cf9306c8 devlink: Fix printf() type mismatches on 32-bit architectures
devlink currently uses "%lu" to format values of type uint64_t,
but on 32-bit architectures uint64_t is defined as unsigned
long long and this does not work correctly.

Fix this by using the standard macro PRIu64 instead.
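
A minimal example of the portable format. format_u64() is a hypothetical helper; PRIu64 comes from <inttypes.h> and expands to the correct length modifier for uint64_t on the target (for example "lu" on typical 64-bit Linux and "llu" on 32-bit).

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format a uint64_t portably; returns the number of characters. */
int format_u64(char *buf, size_t len, uint64_t value)
{
	return snprintf(buf, len, "%" PRIu64, value);
}
```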

Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-29 11:10:14 -07:00
Ben Hutchings 4ac0383a59 utils: Fix BIT() to support up to 64 bits on all architectures
devlink and vdpa use BIT() together with 64-bit flag fields.  devlink
is already using bit numbers greater than 31 and so does not work
correctly on 32-bit architectures.

Fix this by making BIT() use uint64_t instead of unsigned long.
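
A sketch of a 64-bit-safe definition as described, shifting a uint64_t constant rather than an unsigned long; the exact form of the macro in iproute2 may differ.

```c
#include <stdint.h>

/* Bits 32..63 stay valid even where unsigned long is 32 bits wide. */
#define BIT(nr) (UINT64_C(1) << (nr))
```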

Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-29 11:10:14 -07:00
Stephen Hemminger c73fb66070 uapi: update headers to 5.13
Final 5.13 header update

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-28 10:19:08 -07:00
Roi Dayan 6f15c21719 devlink: Fix link errors on some systems
On some systems devlink fails to link because the math library is
missing. Add -lm to devlink.

    LINK     devlink
../lib/libutil.a(utils_math.o): In function `get_rate':
utils_math.c:(.text+0xcc): undefined reference to `floor'
../lib/libutil.a(utils_math.o): In function `get_size':
utils_math.c:(.text+0x384): undefined reference to `floor'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:16: devlink] Error 1
make: *** [Makefile:64: all] Error 2

Fixes: 6c70aca76e ("devlink: Add port func rate support")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-26 14:57:27 -07:00
Asbjørn Sloth Tønnesen 2ff4761db4 tc: pedit: add decrement operation
Implement a decrement operation for ttl and hoplimit.

Since this is just syntactic sugar, it goes that:

  tc filter add ... action pedit ex munge ip ttl dec ...
  tc filter add ... action pedit ex munge ip6 hoplimit dec ...

is just a more readable version of this:

  tc filter add ... action pedit ex munge ip ttl add 0xff ...
  tc filter add ... action pedit ex munge ip6 hoplimit add 0xff ...
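
Why "add 0xff" decrements: the 8-bit field wraps around, so adding 0xff is the same as subtracting 1 modulo 256. ttl_add() is a hypothetical model of that arithmetic only, not of the pedit datapath.

```c
#include <stdint.h>

/* 8-bit addition that wraps, like an update of the TTL/hop limit byte. */
uint8_t ttl_add(uint8_t ttl, uint8_t val)
{
	return (uint8_t)(ttl + val);
}
```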

This feature was suggested by some pseudo tc examples in Mellanox's
documentation[1], but was present in neither their mlnx-iproute2
nor iproute2.

Tested with skip_sw on Mellanox ConnectX-6 Dx.

[1] https://docs.mellanox.com/pages/viewpage.action?pageId=47033989

v3:
   - Use dedicated flags argument in parse_cmd() (David Ahern)
   - Minor rewording of the man page

v2:
   - Fix whitespace issue (Stephen Hemminger)
   - Add to usage info in explain()

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-26 04:45:19 +00:00
Asbjørn Sloth Tønnesen bc5e8473aa tc: pedit: parse_cmd: add flags argument
This patch just prepares the flags argument, so it's
available to the next patch.

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-26 04:44:35 +00:00
Sergey Ryazanov 6acccd52a2 iplink: support for WWAN devices
The WWAN subsystem has been extended to generalize the management of
per-data-channel network interfaces. This change implements support for
handling WWAN links, and makes use of the previously introduced ip-link
capability to specify the parent by its device name.

The WWAN interface for a new data channel should be created with a
command like this:

ip link add dev wwan0-2 parentdev wwan0 type wwan linkid 2

Where: wwan0 is the modem HW device name (should be taken from
/sys/class/wwan) and linkid is an identifier of the opened data
channel.

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-26 04:40:57 +00:00
Sergey Ryazanov 362da458a4 iplink: add support for parent device
Add support for specifying a parent device (struct device) by its name
during the link creation and printing parent name in the links list.
This option will be used to create WWAN links and possibly by other
device classes that do not have a "natural parent netdev".

Add the parent device bus name printing for links list info
completeness. But do not add a corresponding command line argument, as
we do not have a use case for this attribute.

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-26 04:40:22 +00:00
David Ahern 083e2706e1 Import wwan.h uapi file
Import wwan.h uapi file at version from last kernel headers sync.

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-26 04:39:47 +00:00
Stephen Hemminger 8316825a52 man: fix syntax for ip link property
The ip link property add/delete commands require a device, but the
device argument was not shown in the man page.
It is correct in the usage message.

Fixes: 3aa0e51be6 ("ip: add support for alternative name addition/deletion/list")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-24 11:54:04 -07:00
Paolo Lungaroni 3e26254f31 seg6: add support for SRv6 End.DT46 Behavior
We introduce the new "End.DT46" action for supporting the SRv6 End.DT46
Behavior in iproute2.
The SRv6 End.DT46 Behavior, defined in RFC 8986 [1] section 4.8, can be
used to implement L3 VPNs based on Segment Routing over IPv6 networks in
multi-tenant environments, and it is capable of handling both IPv4 and
IPv6 tenant traffic at the same time.
The SRv6 End.DT46 Behavior decapsulates the received packets and it
performs the IPv4 or IPv6 routing lookup in the routing table of the
tenant.
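
A sketch of the per-packet decision, assuming (per RFC 8986) that the choice between the IPv4 and the IPv6 lookup is made from the type of the inner header: 4 (IPPROTO_IPIP) for IPv4 and 41 (IPPROTO_IPV6) for IPv6. The enum and function are hypothetical, and dropping everything else is a simplification of the RFC's error handling.

```c
#include <netinet/in.h>

enum dt46_lookup { DT46_LOOKUP_V4, DT46_LOOKUP_V6, DT46_DROP };

/* Pick the lookup family from the decapsulated inner protocol; both
 * lookups would then use the tenant table bound to the VRF. */
enum dt46_lookup dt46_classify(int inner_proto)
{
	switch (inner_proto) {
	case IPPROTO_IPIP:
		return DT46_LOOKUP_V4;
	case IPPROTO_IPV6:
		return DT46_LOOKUP_V6;
	default:
		return DT46_DROP;
	}
}
```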

As for the End.DT4 and for the End.DT6 in VRF mode, the SRv6 End.DT46
Behavior leverages a VRF device in order to force the routing lookup into
the associated routing table using the "vrftable" attribute.

To make the End.DT46 work properly, it must be guaranteed that the
routing table used for routing lookup operations is bound to one and
only one VRF during the tunnel creation. Such constraint has to be
enforced by enabling the VRF strict_mode sysctl parameter, i.e.:

 $ sysctl -wq net.vrf.strict_mode=1

Note that the same approach is used for the End.DT4 Behavior and for the
End.DT6 Behavior in VRF mode.

An SRv6 End.DT46 Behavior instance can be created as follows:

 $ ip -6 route add 2001:db8::1 encap seg6local action End.DT46 vrftable 100 dev vrf100

Standard Output:
 $ ip -6 route show 2001:db8::1
 2001:db8::1  encap seg6local action End.DT46 vrftable 100 dev vrf100 metric 1024 pref medium

JSON Output:
$ ip -6 -j -p route show 2001:db8::1
[ {
        "dst": "2001:db8::1",
        "encap": "seg6local",
        "action": "End.DT46",
        "vrftable": 100,
        "dev": "vrf100",
        "metric": 1024,
        "flags": [ ],
        "pref": "medium"
} ]

This patch updates the route.8 man page and the ip route help with the
information related to End.DT46.
Considering that the same information was missing for the SRv6 End.DT4 and
the End.DT6 Behaviors, we have also added it.

[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-enddt46-decapsulation-and-s

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Paolo Lungaroni <paolo.lungaroni@uniroma2.it>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-22 15:36:17 +00:00
David Ahern 1d11326a57 Update kernel headers
Update kernel headers to commit:
    ef2c3ddaa4ed ("ibmvnic: Use strscpy() instead of strncpy()")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-22 15:33:45 +00:00
Guillaume Nault f8879e85f0 utils: bump max args number to 512 for batch files
Large tc filters can have many arguments. For example the following
filter matches the first 7 MPLS LSEs, pops all of them, then updates
the Ethernet header and redirects the resulting packet to eth1.

filter add dev eth0 ingress handle 44 priority 100 \
  protocol mpls_uc flower mpls                     \
    lse depth 1 label 1040076 tc 4 bos 0 ttl 175   \
    lse depth 2 label 89648 tc 2 bos 0 ttl 9       \
    lse depth 3 label 63417 tc 5 bos 0 ttl 185     \
    lse depth 4 label 593135 tc 5 bos 0 ttl 67     \
    lse depth 5 label 857021 tc 0 bos 0 ttl 181    \
    lse depth 6 label 239239 tc 1 bos 0 ttl 254    \
    lse depth 7 label 30 tc 7 bos 1 ttl 237        \
  action mpls pop protocol mpls_uc pipe            \
  action mpls pop protocol mpls_uc pipe            \
  action mpls pop protocol mpls_uc pipe            \
  action mpls pop protocol mpls_uc pipe            \
  action mpls pop protocol mpls_uc pipe            \
  action mpls pop protocol mpls_uc pipe            \
  action mpls pop protocol ipv6 pipe               \
  action vlan pop_eth pipe                         \
  action vlan push_eth                             \
    dst_mac 00:00:5e:00:53:7e                      \
    src_mac 00:00:5e:00:53:03 pipe                 \
  action mirred egress redirect dev eth1

This filter has 149 arguments, so it can't be used with tc -batch,
which is limited to 100.

Let's bump the limit to 512. That should leave a lot of room for big
batch commands.
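A minimal C sketch of why the limit matters, assuming a simple whitespace tokenizer. split_args() and MAX_ARGS_SKETCH are hypothetical names, and unlike the real batch parser this sketch ignores quoting. Exceeding the limit is reported as an error rather than silently truncating the command:

```c
#define MAX_ARGS_SKETCH 512	/* hypothetical name mirroring the new limit */

/* Hypothetical sketch: split a batch line in place on spaces, tabs and
 * newlines, storing pointers in argv. Returns argc, or -1 if the line has
 * more than maxargs arguments. Quoting is not handled. */
static int split_args(char *line, char *argv[], int maxargs)
{
	int argc = 0;
	char *p = line;

	for (;;) {
		while (*p == ' ' || *p == '\t' || *p == '\n')
			p++;			/* skip separators */
		if (*p == '\0')
			break;
		if (argc >= maxargs)
			return -1;		/* too many arguments: refuse, don't truncate */
		argv[argc++] = p;
		while (*p && *p != ' ' && *p != '\t' && *p != '\n')
			p++;			/* find the end of this argument */
		if (*p)
			*p++ = '\0';		/* terminate it in place */
	}
	return argc;
}
```

With a limit of 100, the 149-argument filter above would make split_args() return -1; with 512 it parses normally.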

v2:
   -Define the limit in utils.h (Stephen Hemminger)
   -Bump the limit even higher (256 -> 512) (Stephen Hemminger)

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-18 02:57:05 +00:00
Stephen Hemminger e1d3ac755d uapi: update kernel headers to 5.13-rc6
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-17 15:54:05 -07:00
David Ahern d8b3b9d32d Merge branch 'devlink-rate-support' into next
Dmytro Linkin says:

====================

This series implements the following devlink rate commands:
- Dump particular or all rate objects (JSON or non-JSON)
- Add/Delete node rate object
- Set tx rate share/max values for rate object
- Set/Unset parent rate object for other rate object

Examples:

Display all rate objects:

    # devlink port function rate show
    pci/0000:03:00.0/1 type leaf parent some_group
    pci/0000:03:00.0/2 type leaf tx_share 12Mbit
    pci/0000:03:00.0/some_group type node tx_share 1Gbps tx_max 5Gbps

Display leaf rate object bound to the 1st devlink port of the
pci/0000:03:00.0 device:

    # devlink port function rate show pci/0000:03:00.0/1
    pci/0000:03:00.0/1 type leaf

Display node rate object with name some_group of the pci/0000:03:00.0
device:

    # devlink port function rate show pci/0000:03:00.0/some_group
    pci/0000:03:00.0/some_group type node

Display leaf rate object rate values using IEC units:

    # devlink -i port function rate show pci/0000:03:00.0/2
    pci/0000:03:00.0/2 type leaf 11718Kibit

Display pci/0000:03:00.0/2 leaf rate object as pretty JSON output:

    # devlink -jp port function rate show pci/0000:03:00.0/2
    {
        "rate": {
            "pci/0000:03:00.0/2": {
                "type": "leaf",
                "tx_share": 1500000
            }
        }
    }

Create node rate object with name "1st_group" on pci/0000:03:00.0 device:

    # devlink port function rate add pci/0000:03:00.0/1st_group

Create node rate object with specified parameters:

    # devlink port function rate add pci/0000:03:00.0/2nd_group \
        tx_share 10Mbit tx_max 30Mbit parent 1st_group

Set parameters to the specified leaf rate object:

    # devlink port function rate set pci/0000:03:00.0/1 \
        tx_share 2Mbit tx_max 10Mbit

Set leaf's parent to "1st_group":

    # devlink port function rate set pci/0000:03:00.0/1 parent 1st_group

Unset leaf's parent:

    # devlink port function rate set pci/0000:03:00.0/1 noparent

Delete node rate object:

    # devlink port function rate del pci/0000:03:00.0/2nd_group

Rate values can be specified in bits or bytes per second (bit|bps), with
any SI (k, m, g, t) or IEC (ki, mi, gi, ti) prefix. A bare number means
bits per second. Units are also printed in the "show" command output, but
not necessarily the same ones that were specified with the "set" or "add"
command. The -i/--iec switch forces output in IEC units. JSON output
always prints values in bytes per second.
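To make the unit rules above concrete, here is a minimal C sketch of a parser that turns a rate string into bytes per second. It is not devlink's actual parser; rate_to_Bps() is a hypothetical name, units are matched case-insensitively, and it assumes "bit" (or no unit) means bits and "bps" means bytes, as described above:

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical sketch, not devlink's parser: convert "12Mbit", "5MBps",
 * "1Kibit" or a bare "1500" into bytes per second. Prefixes k/m/g/t scale by
 * 1000, ki/mi/gi/ti by 1024; "bit" or no unit means bits, "bps" means bytes.
 * Returns 0 on success, -1 on malformed input. */
static int rate_to_Bps(const char *str, uint64_t *out)
{
	static const char prefixes[] = "kmgt";
	char *end;
	double val = strtod(str, &end);
	double mult = 1.0;

	if (end == str || val < 0)
		return -1;

	if (*end != '\0') {
		/* *end is non-zero here, so strchr() cannot match the terminator */
		const char *p = strchr(prefixes, tolower((unsigned char)*end));

		if (p) {
			int iec = tolower((unsigned char)end[1]) == 'i';
			double base = iec ? 1024.0 : 1000.0;

			for (long i = 0; i <= p - prefixes; i++)
				mult *= base;
			end += iec ? 2 : 1;
		}
	}

	if (*end == '\0' || strcasecmp(end, "bit") == 0)
		mult /= 8.0;		/* bits per second -> bytes per second */
	else if (strcasecmp(end, "bps") != 0)
		return -1;		/* unknown unit */

	*out = (uint64_t)(val * mult);
	return 0;
}
```

For example, "12Mbit" yields 1500000, which matches the tx_share value in the JSON output shown earlier.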

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-12 04:38:34 +00:00
Dmytro Linkin dedf895184 devlink: Add ISO/IEC switch
Add -i/--iec switch to print rate values using binary prefixes.
Update devlink(8) and devlink-rate(8) pages.
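One possible way to choose the printed prefix, sketched in C. The heuristic (keep dividing while the value is an exact multiple of the base, or still at least 1000 times the base) is an assumption modeled on tc-style rate printing, not necessarily what devlink does, and print_rate_bits() is a hypothetical name. For 12000000 bit/s it produces "12Mbit" with SI prefixes and "11718Kibit" with IEC prefixes, matching the examples shown earlier in the merge message:

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: format a rate given in bits per second with SI
 * (K, M, G, T) or, when iec is non-zero, IEC (Ki, Mi, Gi, Ti) prefixes.
 * Assumed heuristic: keep dividing by the base while the value is an exact
 * multiple of it, or while it is still at least 1000 times the base. */
static void print_rate_bits(uint64_t bits, int iec, char *buf, size_t len)
{
	static const char *const si[] = { "", "K", "M", "G", "T" };
	static const char *const bin[] = { "", "Ki", "Mi", "Gi", "Ti" };
	const uint64_t base = iec ? 1024 : 1000;
	size_t i;

	for (i = 0; i < 4; i++) {
		if (bits < base)
			break;
		if (bits % base != 0 && bits < 1000 * base)
			break;
		bits /= base;	/* integer division drops the remainder */
	}
	snprintf(buf, len, "%" PRIu64 "%sbit", bits, iec ? bin[i] : si[i]);
}
```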

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-12 04:38:13 +00:00
Dmytro Linkin 6c70aca76e devlink: Add port func rate support
Implement user commands to manage devlink port func rate objects.
List all rate commands:

    $ devlink port func rate help

or just

    $ devlink port func rate

To list all rate objects or a particular one:

    $ devlink port func rate show
    pci/0000:03:00.0/some_group: type node
    pci/0000:03:00.0/0: type leaf
    pci/0000:03:00.0/1: type leaf

    $ devlink port func rate show pci/0000:03:00.0/1
    pci/0000:03:00.0/1: type leaf

    $ devlink port func rate show pci/0000:03:00.0/some_group
    pci/0000:03:00.0/some_group: type node

A rate object of type "leaf" is created by its driver, and its name is
the name of the corresponding devlink port. A rate object of type "node"
represents a rate group created by the user with:

    $ devlink port func rate add pci/0000:03:00.0/some_group

or, defining tx rate limits at the same time:

    $ devlink port func rate add pci/0000:03:00.0/some_group \
        tx_share 10kbit tx_max 100mbit

NOTE: a node name cannot be a decimal value, because it would conflict
with devlink port indexes.
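A minimal C check for this rule, assuming "decimal value" means a name made only of the digits 0-9. node_name_valid() is a hypothetical name, and rejecting empty names is an additional assumption of this sketch:

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical sketch: a node name made only of decimal digits could not be
 * told apart from a port index such as the "1" in pci/0000:03:00.0/1. */
static bool node_name_valid(const char *name)
{
	if (*name == '\0')
		return false;		/* assumption: empty names are rejected too */
	for (const char *p = name; *p; p++)
		if (!isdigit((unsigned char)*p))
			return true;	/* contains a non-digit, so not an index */
	return false;			/* purely decimal: clashes with port indexes */
}
```

Names such as "1st_group" remain valid because they contain non-digit characters.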

To delete a node object:

    $ devlink port func rate del pci/0000:03:00.0/some_group

Set rate limits of an existing rate object:

    $ devlink port func rate set pci/0000:03:00.0/0 \
        tx_share 5MBps tx_max 25GBps

    $ devlink port func rate set pci/0000:03:00.0/some_group \
        tx_share 0

Both the SET and ADD commands accept any rate units defined in the
IEC 60027-2 standard.

NOTE: a rate value of 0 means that the rate is unlimited. Such a value is
also omitted from the show command output.

NOTE: in the SHOW command output, rate values are printed with suffixes
as well, but in JSON output they are always in units of Bps.

Set or unset the parent of an existing rate object:

    $ devlink port func rate set pci/0000:03:00.0/0 parent some_group

    $ devlink port func rate set pci/0000:03:00.0/0 noparent

NOTE: due to kernel logic, setting the parent to an empty ("") name
unsets the parent; it shouldn't be used, to avoid unexpected parent
unsets.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-12 04:38:06 +00:00
Dmytro Linkin 95339955c5 devlink: Add helper function to validate object handler
Every handler argument is validated in two steps, the first of which,
form checking, expects the identifier to be a few words separated by
slashes. For device and region handlers, it is only checked whether the
identifier has the expected number of slashes.
Add a generic function to do that and make the code cleaner and
consistent.
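A sketch of what such a generic form check could look like in C. ident_has_words() is a hypothetical name, and unlike a plain slash count it also rejects empty parts such as "pci//1", which is a design choice of this sketch rather than something stated in the patch:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch of a generic form check: accept an identifier only if
 * it consists of exactly `words` non-empty parts separated by '/'. */
static bool ident_has_words(const char *ident, unsigned int words)
{
	unsigned int parts = 0;
	const char *p = ident;

	for (;;) {
		size_t len = strcspn(p, "/");	/* length of the current part */

		if (len == 0)
			return false;	/* empty part, e.g. "pci//1" or a trailing '/' */
		parts++;
		p += len;
		if (*p == '\0')
			break;
		p++;			/* skip the separator */
	}
	return parts == words;
}
```

For example, a device handle such as pci/0000:03:00.0 has two parts, while a port or rate handle such as pci/0000:03:00.0/1 has three.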

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-12 04:37:21 +00:00
David Ahern 85903c9a29 Update kernel headers
Update kernel headers to commit:
    76cf404c40ae ("Merge branch 'ipa-mem-2'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-11 02:38:23 +00:00
Parav Pandit fbd4b581cb devlink: Add optional controller user input
A user can optionally provide the external controller number when
creating a devlink port for an external controller.

An example on eswitch system:
$ devlink dev eswitch set pci/0033:01:00.0 mode switchdev

$ devlink port show
pci/0033:01:00.0/196607: type eth netdev enP51p1s0f0np0 flavour physical port 0 splittable false
pci/0033:01:00.0/131072: type eth netdev eth0 flavour pcipf controller 1 pfnum 0 external true splittable false
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port add pci/0033:01:00.0 flavour pcisf pfnum 0 sfnum 77 controller 1
pci/0033:01:00.0/163840: type eth netdev eth1 flavour pcisf controller 1 pfnum 0 sfnum 77 external true splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-11 02:28:49 +00:00
Roi Dayan 0d5cf51e0d police: Add support for json output
Change to use the print wrappers instead of fprintf().

This is an example of the options part of the output before this commit:

        "options": {
            "handle": 1,
            "in_hw": true,
            "actions": [ {
                    "order": 1 police 0x2 ,
                    "control_action": {
                        "type": "drop"
                    },
                    "control_action": {
                        "type": "continue"
                    }overhead 0b linklayer unspec
        ref 1 bind 1
,
                    "used_hw_stats": [ "delayed" ]
                } ]
        }

This is the output of the same dump with this commit:

        "options": {
            "handle": 1,
            "in_hw": true,
            "actions": [ {
                    "order": 1,
                    "kind": "police",
                    "index": 2,
                    "control_action": {
                        "type": "drop"
                    },
                    "control_action": {
                        "type": "continue"
                    },
                    "overhead": 0,
                    "linklayer": "unspec",
                    "ref": 1,
                    "bind": 1,
                    "used_hw_stats": [ "delayed" ]
                } ]
        }

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-11 02:28:36 +00:00
Eric Dumazet 52f136f640 tc: fq: add horizon attributes
Commit 39d010504e6b ("net_sched: sch_fq: add horizon attribute")
added kernel support for horizon attributes in linux-5.8.

$ tc -s -d qd sh dev wlp2s0
qdisc fq 8006: root refcnt 2 limit 10000p flow_limit 100p buckets 1024 orphan_mask 1023 quantum 3028b initial_quantum 15140b low_rate_threshold 550Kbit refill_delay 40ms timer_slack 10us horizon 10s horizon_drop
 Sent 690924 bytes 3234 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  flows 112 (inactive 104 throttled 0)
  gc 0 highprio 0 throttled 2 latency 8.25us

$ tc qd change dev wlp2s0 root fq horizon 500ms horizon_cap

$ tc -s -d qd sh dev wlp2s0
qdisc fq 8006: root refcnt 2 limit 10000p flow_limit 100p buckets 1024 orphan_mask 1023 quantum 3028b initial_quantum 15140b low_rate_threshold 550Kbit refill_delay 40ms timer_slack 10us horizon 500ms horizon_cap
 Sent 831220 bytes 3844 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  flows 122 (inactive 120 throttled 0)
  gc 0 highprio 0 throttled 2 latency 8.25us

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-07 02:56:01 +00:00
Hangbin Liu 7ae2585b86 configure: convert LIBBPF environment variables to command-line options
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-03 03:25:59 +00:00
Hangbin Liu a9c3d70d90 configure: add options ability
More and more global environment variables are being added all over
configure, which makes it hard for users to know which one does what.
Command-line options would make it easier for users to learn and
remember the config options.

This patch converts the INCLUDE variable to a command-line option first.
To stay compatible with the old way of setting the INCLUDE path, check
whether the first argument starts with '-'.

Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-03 03:25:11 +00:00
Roman Mashak 9d9b1a84a5 ss: update ss man page
The '-b' option requests BPF filter opcodes; however, the kernel
currently returns only classic BPF filters, so reflect this in the man
page.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-01 15:55:06 -07:00
Ariel Levkovich 825bd5dacb tc: f_flower: Add missing ct_state flags to usage description
Add the ct_state flags rpl and inv to the command's usage
description.

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-27 14:40:05 +00:00
Ariel Levkovich 7fda6c588a tc: f_flower: Add option to match on related ct state
Add support for matching on ct_state flag related.
The related state indicates a packet is associated with an existing
connection.

Example:
$ tc filter add dev ens1f0_0 ingress prio 1 chain 1 proto ip flower \
  ct_state -est-rel+trk \
  action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_0 ingress prio 1 chain 1 proto ip flower \
  ct_state +rel+trk \
  action mirred egress redirect dev ens1f0_1

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-27 14:39:14 +00:00
Florian Westphal d3740fdc26 libgenl: make genl_add_mcast_grp set errno on error
genl_add_mcast_grp doesn't set errno in all cases.

On kernels that support mptcp but lack event support (all kernels <= 5.11)
MPTCP_PM_EV_GRP_NAME won't be found and ip will exit with

    "can't subscribe to mptcp events: Success"

Set errno to a meaningful value (ENOENT) when the group name isn't found
and also cover other spots where it returns nonzero with errno unset.
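The pattern, sketched in C with hypothetical names (struct grp_entry, find_grp_id()): every failing return sets errno, so a caller printing strerror(errno) reports "No such file or directory" (glibc wording for ENOENT) instead of "Success":

```c
#include <errno.h>
#include <string.h>

struct grp_entry {
	const char *name;
	int id;
};

/* Hypothetical sketch: return the id of the named multicast group, or -1
 * with errno set on every failure path so callers can use strerror(errno). */
static int find_grp_id(const struct grp_entry *tbl, size_t n, const char *name)
{
	if (!tbl || !name) {
		errno = EINVAL;
		return -1;
	}
	for (size_t i = 0; i < n; i++)
		if (strcmp(tbl[i].name, name) == 0)
			return tbl[i].id;
	errno = ENOENT;		/* e.g. the kernel lacks event support */
	return -1;
}
```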

Fixes: ff619e4fd3 ("mptcp: add support for event monitoring")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-05-17 11:59:37 -07:00
Heiko Thiery c5b72cc56b lib/fs: fix issue when {name,open}_to_handle_at() is not implemented
Commit d5e6ee0dac introduced the use of the functions name_to_handle_at()
and open_by_handle_at(). However, these functions are not available in,
e.g., uclibc-ng < 1.0.35. For backward compatibility, check for their
availability in the configure script and, if they are absent, make a
direct syscall.
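A minimal sketch of the direct-syscall fallback in C, assuming a Linux libc that exposes syscall() and the SYS_name_to_handle_at / SYS_open_by_handle_at numbers. struct fh_sketch and the sys_* wrapper names are hypothetical; they are chosen so they do not clash with the libc declarations on systems where those exist:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Local mirror of the kernel's struct file_handle header layout
 * (hypothetical name, to avoid clashing with <fcntl.h>). */
struct fh_sketch {
	unsigned int handle_bytes;
	int handle_type;
	unsigned char f_handle[];
};

/* Fallback sketch: invoke the system calls directly instead of the libc
 * wrappers that, e.g., uclibc-ng < 1.0.35 does not provide. */
static int sys_name_to_handle_at(int dirfd, const char *path,
				 struct fh_sketch *fh, int *mount_id, int flags)
{
	return (int)syscall(SYS_name_to_handle_at, dirfd, path, fh, mount_id, flags);
}

static int sys_open_by_handle_at(int mount_fd, struct fh_sketch *fh, int flags)
{
	return (int)syscall(SYS_open_by_handle_at, mount_fd, fh, flags);
}
```

In a real build, configure would decide whether these wrappers are compiled in or the libc functions are used.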

Fixes: d5e6ee0dac ("ss: introduce cgroup2 cache and helper functions")
Cc: Dmitry Yakunin <zeil@yandex-team.ru>
Cc: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Heiko Thiery <heiko.thiery@gmail.com>
Reviewed-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-17 02:31:29 +00:00
David Ahern 62c88ed940 config.mk: Rerun configure when it is newer than config.mk
config.mk needs to be regenerated any time configure is changed.
Rename the existing make target and add a check that the config.mk
file exists and is newer than the configure script.
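The condition that the make rule expresses can be written out in C as a sketch. needs_regen() is a hypothetical helper for illustration; make itself performs this comparison from the rule's prerequisites:

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical sketch: the generated file must be rebuilt if it is missing
 * or older than the script that generates it. */
static bool needs_regen(const char *generated, const char *generator)
{
	struct stat gen, src;

	if (stat(generated, &gen) != 0)
		return true;			/* config.mk does not exist yet */
	if (stat(generator, &src) != 0)
		return false;			/* nothing to compare against */
	return src.st_mtime > gen.st_mtime;	/* configure is newer */
}
```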

Signed-off-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Petr Vorel <petr.vorel@gmail.com>
Tested-by: Petr Vorel <petr.vorel@gmail.com>
2021-05-17 02:13:56 +00:00
Jakub Kicinski 49437375b6 ip: dynamically size columns when printing stats
This change makes ip -s -s size its output columns automatically.
I often find myself using JSON output because the normal output is
unreadable. Even on a laptop, after 2 days of uptime, the byte and
packet counters almost overflow their columns, let alone on a busy
server.

For maximum readability, switch to right alignment.
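A minimal C sketch of the sizing rule, assuming a column is as wide as the wider of its header and its largest value. num_width(), col_width() and print_cell() are hypothetical helpers, not the code in ip/:

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Number of decimal digits needed to print v. */
static int num_width(uint64_t v)
{
	int w = 1;

	while (v >= 10) {
		v /= 10;
		w++;
	}
	return w;
}

/* Hypothetical sketch: a column is as wide as its header or its widest
 * value, whichever is larger, so big counters never spill into the
 * neighbouring column. */
static int col_width(const char *hdr, const uint64_t *vals, size_t n)
{
	int w = (int)strlen(hdr);

	for (size_t i = 0; i < n; i++)
		if (num_width(vals[i]) > w)
			w = num_width(vals[i]);
	return w;
}

/* Print one right-aligned cell followed by a separating space. */
static void print_cell(uint64_t v, int width)
{
	printf("%*" PRIu64 " ", width, v);
}
```

For the large-counter example below, the "bytes" column evaluates to 15 characters, so every row in it is right-aligned to that width.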

Before:

    RX: bytes  packets  errors  dropped missed  mcast
    8227918473 8617683  0       0       0       0
    RX errors: length   crc     frame   fifo    overrun
               0        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    691937917  4727223  0       0       0       0
    TX errors: aborted  fifo   window heartbeat transns
               0        0       0       0       10

After:

    RX:  bytes packets errors dropped  missed   mcast
    8228633710 8618408      0       0       0       0
    RX errors:  length    crc   frame    fifo overrun
                     0      0       0       0       0
    TX:  bytes packets errors dropped carrier collsns
     692006303 4727740      0       0       0       0
    TX errors: aborted   fifo  window heartbt transns
                     0      0       0       0      10

More importantly, with large values before:

    RX: bytes  packets  errors  dropped overrun mcast
    126570234447969 15016149200 0       0       0       0
    RX errors: length   crc     frame   fifo    missed
               0        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    126570234447969 15016149200 0       0       0       0
    TX errors: aborted  fifo   window heartbeat transns
               0        0       0       0       10

Note that in this case the values are shifted by a full column, e.g.
the value under "dropped" actually belongs to "errors", etc.

After:

    RX:       bytes     packets errors dropped  missed   mcast
    126570234447969 15016149200      0       0       0       0
    RX errors:           length    crc   frame    fifo overrun
                              0      0       0       0       0
    TX:       bytes     packets errors dropped carrier collsns
    126570234447969 15016149200      0       0       0       0
    TX errors:          aborted   fifo  window heartbt transns
                              0      0       0       0      10

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-09 22:51:59 +00:00
Paolo Lungaroni 02ca3aabe9 seg6: add counters support for SRv6 Behaviors
We introduce the "count" optional attribute for supporting counters in SRv6
Behaviors as defined in [1], section 6. For each SRv6 Behavior instance,
counters defined in [1] are:

 - the total number of packets that have been correctly processed;
 - the total amount of traffic in bytes of all packets that have been
   correctly processed.

In addition, we introduce a new counter that counts the number of packets
that have NOT been properly processed (i.e. errors) by an SRv6 Behavior
instance.
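A minimal C sketch of how the three counters relate, assuming a hypothetical struct behavior_counters (not the kernel's data structure): a packet either increments packets and bytes, or only errors:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-instance counters mirroring the three values above. */
struct behavior_counters {
	uint64_t packets;	/* packets correctly processed */
	uint64_t bytes;		/* bytes of correctly processed packets */
	uint64_t errors;	/* packets NOT properly processed */
};

static void counters_account(struct behavior_counters *c, uint32_t len, bool ok)
{
	if (ok) {
		c->packets++;
		c->bytes += len;
	} else {
		c->errors++;	/* failed packets do not add to packets/bytes */
	}
}
```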

Each SRv6 Behavior instance can be configured, at the time of its creation,
to make use of counters by specifying the "count" attribute as follows:

 $ ip -6 route add 2001:db8::1 encap seg6local action End count dev eth0

Per-behavior counters can be shown by adding "-s" to the iproute2 command
line, e.g.:

 $ ip -s -6 route show 2001:db8::1
 2001:db8::1 encap seg6local action End packets 0 bytes 0 errors 0 dev eth0

[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-counters

v2:
 - add help and route.8 man page updates

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Paolo Lungaroni <paolo.lungaroni@uniroma2.it>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-09 22:20:59 +00:00
Andrea Claudi e44786b269 tc: htb: improve burst error messages
When a wrong value is provided for "burst" or "cburst" parameters, the
resulting error message is unclear and can be misleading:

$ tc class add dev dummy0 parent 1: classid 1:1 htb rate 100KBps burst errtrigger
Illegal "buffer"

The message claims an illegal "buffer" is provided, but neither the
inline help nor the man page list "buffer" among the htb parameters, and
the only way to know that "burst", "maxburst" and "buffer" are synonyms
is to look into tc/q_htb.c.

This commit improves this by simply changing the error string to the
parameter name used in the user-given command, clearly pointing out
which value is wrong.

$ tc class add dev dummy0 parent 1: classid 1:1 htb rate 100KBps burst errtrigger
Illegal "burst"

$ tc class add dev dummy0 parent 1: classid 1:1 htb rate 100Kbps maxburst errtrigger
Illegal "maxburst"
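A minimal C sketch of the idea, with a hypothetical parse_size_arg() that only accepts plain decimal numbers (real tc size parsing also accepts units): the error message is built from whichever keyword the user actually typed:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: parse a size value and, on failure, name the keyword
 * the user typed ("burst", "maxburst", "buffer", ...) instead of a fixed
 * synonym. The message is written into errbuf so it can be inspected. */
static int parse_size_arg(const char *keyword, const char *value,
			  unsigned long *out, char *errbuf, size_t errlen)
{
	char *end;
	unsigned long v = strtoul(value, &end, 10);

	if (end == value || *end != '\0') {
		snprintf(errbuf, errlen, "Illegal \"%s\"", keyword);
		return -1;
	}
	*out = v;
	return 0;
}
```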

Reported-by: Sebastian Mitterle <smitterl@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-09 22:13:22 +00:00
Andrea Claudi 28ee49e515 tipc: bail out if key is abnormally long
tipc segfaults when called with an abnormally long key:

$ tipc node set key 0123456789abcdef0123456789abcdef0123456789abcdef
*** buffer overflow detected ***: terminated

Fix this by returning an error if the key length is longer than
TIPC_AEAD_KEYLEN_MAX.
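The fix follows the usual bounded-copy pattern, sketched here in C. KEY_MAX_SKETCH and copy_key() are hypothetical; KEY_MAX_SKETCH stands in for TIPC_AEAD_KEYLEN_MAX, whose real value is not assumed here:

```c
#include <string.h>

#define KEY_MAX_SKETCH 32	/* hypothetical stand-in for TIPC_AEAD_KEYLEN_MAX */

/* Hypothetical sketch: validate the user-supplied length before copying
 * into a fixed-size buffer, instead of letting the copy overflow it. */
static int copy_key(char dst[KEY_MAX_SKETCH], const char *src, size_t len)
{
	if (len > KEY_MAX_SKETCH)
		return -1;	/* abnormally long key: bail out with an error */
	memcpy(dst, src, len);
	return 0;
}
```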

Fixes: 24bee3bf97 ("tipc: add new commands to set TIPC AEAD key")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-09 22:08:47 +00:00
Andrea Claudi 93c267bfb4 tipc: bail out if algname is abnormally long
tipc segfaults when called with an abnormally long algname:

$ tipc node set key 0x1234 algname supercalifragilistichespiralidososupercalifragilistichespiralidoso
*** buffer overflow detected ***: terminated

Fix this by returning an error if the provided algname is longer than
TIPC_AEAD_ALG_NAME.

Fixes: 24bee3bf97 ("tipc: add new commands to set TIPC AEAD key")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-09 22:08:47 +00:00
Hoang Le 459f280813 tipc: call a sub-routine in separate socket
When receiving the result of a first netlink query, we may execute
another query inside the callback. If this sub-routine is called on the
same socket, the result of the previous execution is discarded.
To avoid this, perform the nested query on a separate socket.

Fixes: 2021028306 ("tipc: use the libmnl functions in lib/mnl_utils.c")
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-09 22:08:47 +00:00
Tyson Moore 0d95472a4b tc-cake: update docs to include LE diffserv
Linux kernel commit b8392808eb3fc28e ("sch_cake: add RFC 8622 LE PHB
support to CAKE diffserv handling") added packets with LE diffserv to
the Bulk priority tin. Update the documentation to reflect this change.

Signed-off-by: Tyson Moore <tyson@tyson.me>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-06 14:59:52 +00:00
Andrea Claudi 2d212aae55 dcb: fix memory leak
main() dynamically allocates dcb, but when dcb_help() is called it
returns without freeing it.

Fix this by using a goto, as is already done in the same function.
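The single-exit cleanup pattern referred to above, as a minimal C sketch with hypothetical names (struct tool_ctx, run_tool()):

```c
#include <stdlib.h>

struct tool_ctx {
	int verbose;
};

/* Hypothetical sketch: once ctx is allocated, every exit goes through
 * "out", so the help path can no longer return without freeing it. */
static int run_tool(int want_help)
{
	struct tool_ctx *ctx = calloc(1, sizeof(*ctx));
	int ret = 0;

	if (!ctx)
		return -1;		/* nothing to free yet */

	if (want_help) {
		/* print usage here, then jump to the shared cleanup */
		goto out;
	}

	ctx->verbose = 1;
	/* ... normal command processing would go here ... */
out:
	free(ctx);
	return ret;
}
```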

Fixes: 67033d1c1c ("Add skeleton of a new tool, dcb")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-06 14:48:02 +00:00
Andrea Claudi cfd89a6f8b dcb: fix return value on dcb_cmd_app_show
dcb_cmd_app_show() is supposed to return EINVAL if an incorrect argument
is provided.

Fixes: 8e9bed1493 ("dcb: Add a subtool for the DCB APP object")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-06 14:47:57 +00:00
Andrea Claudi 3296d4fe77 lib: bpf_legacy: avoid to pass invalid argument to close()
In function bpf_obj_open, if bpf_fetch_prog_arg() returns an error, we
end up in the out: path with a negative value for fd, and pass it to
close().

Avoid this by checking that fd is positive.

Fixes: 32e93fb7f6 ("{f,m}_bpf: allow for sharing maps")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-06 14:43:54 +00:00
Andrea Claudi a2f1f66075 tc: q_ets: drop dead code from argument parsing
Checking for nbands to be at least 1 at this point is useless. Indeed:
- ets requires "bands", "quanta" or "strict" to be specified
- if "bands" is specified, nbands cannot be negative, see parse_nbands()
- if "strict" is specified, nstrict cannot be negative, see
  parse_nbands()
- if "quanta" is specified, nquanta cannot be negative, see
  parse_quantum()
- if "bands" is not specified, nbands is set to nstrict+nquanta
- the previous if statement takes care of the case when none of them are
  specified and nbands is 0, terminating execution.

Thus nbands cannot be < 1 at this point and this code cannot be executed.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-06 14:42:44 +00:00
Jakub Kicinski 570d2cf0ec ip: align the name of the 'nohandler' stat
Before:

    RX: bytes  packets  errors  dropped missed  mcast
    8848233056 8548168  0       0       0       0
    RX errors: length   crc     frame   fifo    overrun   nohandler
               0        0       0       0       0       101
    TX: bytes  packets  errors  dropped carrier collsns compressed
    1142925945 4683483  0       0       0       0       101
    TX errors: aborted  fifo   window heartbeat transns
               0        0       0       0       14

After:

    RX: bytes  packets  errors  dropped missed  mcast
    8848297833 8548461  0       0       0       0
    RX errors: length   crc     frame   fifo    overrun nohandler
               0        0       0       0       0       101
    TX: bytes  packets  errors  dropped carrier collsns compressed
    1143049820 4683865  0       0       0       0       101
    TX errors: aborted  fifo   window heartbeat transns
               0        0       0       0       14

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-06 14:41:19 +00:00
David Ahern c3f852754f Update kernel headers
Update kernel headers to commit:
    8621436671f3 ("smc: disallow TCP_ULP in smc_setsockopt()")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-06 14:16:04 +00:00
David Ahern c79fcefaaf Merge branch 'rdma-copy-on-fork' into next
Gal Pressman says:

====================

This is the userspace part for the new copy-on-fork attribute added to
the get sys netlink command.

The new attribute indicates that the kernel copies DMA pages on fork,
hence fork support through madvise and MADV_DONTFORK is not needed.

Kernel series was merged:
https://lore.kernel.org/linux-rdma/20210418121025.66849-1-galpress@amazon.com/

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-03 14:45:19 +00:00
Gal Pressman bce4247869 rdma: Add copy-on-fork to get sys command
The new attribute indicates that the kernel copies DMA pages on fork,
hence fork support through madvise and MADV_DONTFORK is not needed.

If the attribute is not reported (expected on older kernels),
copy-on-fork is disabled.

Example:
$ rdma sys
netns shared copy-on-fork on

Signed-off-by: Gal Pressman <galpress@amazon.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-03 14:43:13 +00:00
Gal Pressman 212e2c1d0c rdma: update uapi headers
Update rdma_netlink.h file up to kernel commit
6cc9e215eb27 ("RDMA/nldev: Add copy-on-fork attribute to get sys command")

Signed-off-by: Gal Pressman <galpress@amazon.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-03 14:43:06 +00:00
Jianguo Wu 7f1d58d1a1 mptcp: make sure flag signal is set when add addr with port
When an address is added with a port, it is meant to be announced to the
remote peer with an ADD_ADDR, so it must have the signal flag set.
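A minimal C sketch of the rule, assuming the check rejects the command rather than setting the flag implicitly (the patch text does not say which). FLAG_SIGNAL_SKETCH and endpoint_flags_ok() are hypothetical names:

```c
#include <stdbool.h>
#include <stdint.h>

#define FLAG_SIGNAL_SKETCH 0x1	/* hypothetical stand-in for the "signal" flag */

/* Hypothetical sketch: an endpoint with a port is meant to be announced to
 * the peer via ADD_ADDR, so accept it only if the signal flag is set. */
static bool endpoint_flags_ok(uint16_t port, uint32_t flags)
{
	return port == 0 || (flags & FLAG_SIGNAL_SKETCH);
}
```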

Fixes: 42fbca91cd ("mptcp: add support for port based endpoint")
Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-30 14:30:24 +00:00
David Ahern e1e089d1f2 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-28 15:48:28 +00:00
Jethro Beekman d56dcd3549 ip: Add nodst option to macvlan type source
The default behavior for source MACVLAN is to duplicate packets to
appropriate type source devices, and then do the normal destination MACVLAN
flow. This patch adds an option to skip destination MACVLAN processing if
any matching source MACVLAN device has the option set.

This allows setting up a "catch all" device for source MACVLAN: create one
or more devices with type source nodst, and one device with e.g. type vepa,
and incoming traffic will be received on exactly one device.

Signed-off-by: Jethro Beekman <kernel@jbeekman.nl>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-28 15:45:59 +00:00
David Ahern e5f1505e53 Merge branch 'rdma-resource-tracking' into next
Leon Romanovsky says:

====================

This is the user space part of a series, already accepted into the
kernel, that extends the RDMA netlink interface to return uverbs context
and SRQ information.

The accepted kernel series can be seen here:
https://lore.kernel.org/linux-rdma/20210422133459.GA2390260@nvidia.com/

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-28 15:37:32 +00:00
Neta Ostrovsky 9b272e138d rdma: Add SRQ resource tracking information
Sample output:

$ rdma res show srq
dev ibp8s0f0 srqn 0 type BASIC pdn 3 comm [ib_ipoib]
dev ibp8s0f0 srqn 4 type BASIC lqpn 125-128,130-140 pdn 9 pid 3581 comm ibv_srq_pingpon
dev ibp8s0f0 srqn 5 type BASIC lqpn 141-156 pdn 10 pid 3584 comm ibv_srq_pingpon
dev ibp8s0f0 srqn 6 type BASIC lqpn 157-172 pdn 11 pid 3590 comm ibv_srq_pingpon
dev ibp8s0f1 srqn 0 type BASIC pdn 3 comm [ib_ipoib]
dev ibp8s0f1 srqn 1 type BASIC lqpn 329-344 pdn 4 pid 3586 comm ibv_srq_pingpon

$ rdma res show srq lqpn 126-141
dev ibp8s0f0 srqn 4 type BASIC lqpn 126-128,130-140 pdn 9 pid 3581 comm ibv_srq_pingpon
dev ibp8s0f0 srqn 5 type BASIC lqpn 141 pdn 10 pid 3584 comm ibv_srq_pingpon

$ rdma res show srq lqpn 127
dev ibp8s0f0 srqn 4 type BASIC lqpn 127 pdn 9 pid 3581 comm ibv_srq_pingpon
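The lqpn column compresses consecutive QP numbers into ranges. A minimal C sketch of that formatting, assuming the QP numbers arrive sorted (format_ranges() is hypothetical, not the rdma tool's code):

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: compress a sorted list of QP numbers into the
 * compact "a-b,c-d,e" form. Returns 0, or -1 if buf is too small. */
static int format_ranges(const unsigned int *ids, size_t n, char *buf, size_t len)
{
	size_t off = 0;

	if (len == 0)
		return -1;
	buf[0] = '\0';
	for (size_t i = 0; i < n;) {
		size_t j = i;
		int w;

		while (j + 1 < n && ids[j + 1] == ids[j] + 1)
			j++;		/* extend the run of consecutive numbers */
		if (j > i)
			w = snprintf(buf + off, len - off, "%s%u-%u",
				     off ? "," : "", ids[i], ids[j]);
		else
			w = snprintf(buf + off, len - off, "%s%u",
				     off ? "," : "", ids[i]);
		if (w < 0 || (size_t)w >= len - off)
			return -1;	/* output truncated */
		off += (size_t)w;
		i = j + 1;
	}
	return 0;
}
```

Applied to QP numbers 125 to 128 and 130 to 140, it yields "125-128,130-140", the form shown in the first sample above.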

Reviewed-by: Ido Kalir <idok@nvidia.com>
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-28 15:37:16 +00:00
Neta Ostrovsky 4278941285 rdma: Add context resource tracking information
Sample output:

$ rdma res show ctx
dev ibp8s0f0 ctxn 0 pid 980 comm ibv_rc_pingpong
dev ibp8s0f0 ctxn 1 pid 981 comm ibv_rc_pingpong
dev ibp8s0f0 ctxn 2 pid 992 comm ibv_rc_pingpong
dev ibp8s0f1 ctxn 0 pid 984 comm ibv_rc_pingpong
dev ibp8s0f1 ctxn 1 pid 987 comm ibv_rc_pingpong

$ rdma res show ctx dev ibp8s0f1
dev ibp8s0f1 ctxn 0 pid 984 comm ibv_rc_pingpong
dev ibp8s0f1 ctxn 1 pid 987 comm ibv_rc_pingpong

Reviewed-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Ido Kalir <idok@nvidia.com>
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-28 15:36:59 +00:00
Neta Ostrovsky 4c61b5b9df rdma: Update uapi headers
Update rdma_netlink.h file up to kernel commit c6c11ad3ab9f
("RDMA/nldev: Add QP numbers to SRQ information")

Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-28 15:36:21 +00:00
David Ahern a5ea744ca2 Update kernel headers
Update kernel headers to commit:
    99ba0ea616aa ("sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-28 15:35:30 +00:00
Stephen Hemminger 2363bc99f9 Merge git://git.kernel.org/pub/scm/network/iproute2/iproute2-next
Required manual fix of devlink/devlink.c

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-27 19:39:39 -07:00
Stephen Hemminger 1fdea28051 v5.12.0 2021-04-27 11:59:09 -07:00
Stephen Hemminger a3fb3fcb7d remove trailing whitespace
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-27 11:55:53 -07:00
Andrea Claudi e1ad689545 lib: bpf_legacy: fix missing socket close when connect() fails
In functions bpf_{send,recv}_map_fds(), when connect() fails after a
socket is successfully opened, we return an error without closing the
socket.

Fix this by closing the socket if it was opened and by using a single
return point in both functions.
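A minimal C sketch of the pattern, with a hypothetical connect_unix() helper: a single exit point closes the descriptor on every error path, and the check is fd >= 0 because 0 is a valid descriptor:

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical sketch: connect to an AF_UNIX stream socket at path and
 * return the connected descriptor, or -1. All paths leave through "out". */
static int connect_unix(const char *path)
{
	struct sockaddr_un addr = { .sun_family = AF_UNIX };
	int fd = -1, ret = -1;

	if (strlen(path) >= sizeof(addr.sun_path))
		goto out;		/* path would not fit */
	strcpy(addr.sun_path, path);

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		goto out;
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto out;		/* without the cleanup below, fd would leak */

	ret = fd;
	fd = -1;			/* ownership passes to the caller */
out:
	if (fd >= 0)			/* 0 is a valid descriptor */
		close(fd);
	return ret;
}
```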

Fixes: 6256f8c9e4 ("tc, bpf: finalize eBPF support for cls and act front-end")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-26 21:05:19 -07:00
Andrea Claudi 92af24c907 lib: bpf_legacy: treat 0 as a valid file descriptor
As stated in the open(2) man page, open() returns a non-negative integer
as a file descriptor. Hence, when checking that its return value is ok,
we should include 0 as a valid value.

This fixes a covscan warning about a missing close() in this function.

Fixes: ecb05c0f99 ("bpf: improve error reporting around tail calls")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-26 21:05:19 -07:00
Andrea Claudi 932fe3453f tc: e_bpf: fix memory leak in parse_bpf()
envp_run is dynamically allocated with malloc() and is not freed on the
out: return path. This commit fixes it.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-26 21:05:19 -07:00
Andrea Claudi 38ef5bb7b4 ip: netns: fix missing netns close on some error paths
In functions netns_pids() and netns_identify_pid(), the netns file is
not closed on some error paths.

Fix this by using a conditional close and a single return point in both
functions.

Fixes: 44b563269e ("ip-nexthop: support flush by id")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-26 21:04:02 -07:00
Nikolay Aleksandrov c72de3713d bridge: vlan: dump port only if there are any vlans
When I added support for new vlan rtm dumping, I made a mistake in the
output format when there are no vlans on the port. This patch fixes it by
not printing ports without vlan entries (similar to the current behaviour).

Example (no vlans):
$ bridge -d vlan show
port              vlan-id

Fixes: e5f87c8341 ("bridge: vlan: add support for the new rtm dump call")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-26 02:32:46 +00:00
Tony Ambardar e705b19d48 ip: drop 2-char command assumption
The 'ip' utility hardcodes the assumption of being a 2-char command, where
any follow-on characters are passed as an argument:

  $ ./ip-full help
  Object "-full" is unknown, try "ip help".

This confusing behaviour isn't seen with 'tc' for example, and was added in
a 2005 commit without documentation. It was noticed during testing of 'ip'
variants built/packaged with different feature sets (e.g. w/o BPF support).

Mitigate the problem by redoing the command without the 2-char assumption
if the follow-on characters fail to parse as a valid command.
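A rough sketch of the fallback logic, assuming a small hypothetical object table. object_from_progname() is illustrative and is not the function in ip/ip.c. A program name like "iplink" still selects an object, while "ip-full" yields NULL so the caller can parse argv normally instead of failing:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of ip objects, for illustration only. */
static const char *const objects[] = { "address", "link", "route", "neighbor" };

/* Return the object implied by the program name (e.g. "iplink" -> "link"),
 * or NULL so the caller falls back to normal argv parsing. Abbreviations
 * are matched by prefix, loosely mirroring ip's matches() behaviour. */
static const char *object_from_progname(const char *argv0)
{
	const char *base = strrchr(argv0, '/');
	const char *suffix;
	size_t i, n;

	base = base ? base + 1 : argv0;
	if (strlen(base) <= 2)
		return NULL;	/* plain "ip": nothing to infer */

	suffix = base + 2;
	n = strlen(suffix);
	for (i = 0; i < sizeof(objects) / sizeof(objects[0]); i++)
		if (strncmp(objects[i], suffix, n) == 0)
			return objects[i];

	return NULL;	/* e.g. "ip-full": not an object, so do not fail */
}
```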

Fixes: 351efcde4e ("Update header files to 2.6.14")
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-26 02:29:42 +00:00
Stephen Hemminger b5a6ed9cc9 uapi: add missing virtio related headers
The build of iproute2 relies on having a correct copy of sanitized
kernel headers. The vdpa utility introduced a dependency on the
vdpa-related headers, but these headers were not present in the
iproute2 repo.

Fixes: c2ecc82b9d ("vdpa: Add vdpa tool")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-23 10:36:17 -07:00
Andrea Claudi 81bfd01a4c lib: move get_task_name() from rdma
The function get_task_name() is used to get the name of a process from
its pid, and its implementation is similar to ip/iptuntap.c:pid_name().

Move it to lib/fs.c to use a single implementation and make it easily
reusable.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:22:16 +00:00
David Ahern 75a35d50a3 Merge branch 'bridge-vlan' into next
Nikolay Aleksandrov says:

====================

From: Nikolay Aleksandrov <nikolay@nvidia.com>

This set extends the bridge vlan code to use the new vlan RTM calls,
which allow dumping detailed per-port, per-vlan information and
manipulating per-vlan options. It also allows monitoring vlan changes
(add/del/option change). The rtm vlan dumps have an extensible format,
which allows us to add new options and attributes easily and to ask the
kernel to filter on different vlan information when dumping. The new
kernel dump code tries to use the compressed vlan format as much as
possible (it includes netlink attributes for vlan start and end) to
reduce the number of generated messages and netlink traffic.
The iproute2 support is activated by using the "-d" flag when showing
vlan information, which makes it use the new rtm dump call and get all
the detailed information; if "-s" is also specified, it dumps per-vlan
statistics as well. Obviously, in that case the vlans cannot be
compressed. To change per-vlan options (currently only STP state is
supported), a new vlan command is added: "set". It can be used to set
options of bridge or port vlans, and vlan ranges can be used. All of the
new vlan option code uses extack to show more understandable errors.
The set adds the first supported per-vlan option: STP state.
Man pages and usage information are updated accordingly.

Example:
 $ bridge -d vlan show
 port              vlan-id
 ens13             1 PVID Egress Untagged
                     state forwarding
 bridge            1 PVID Egress Untagged
                     state forwarding

 $ bridge vlan set vid 1 dev ens13 state blocking
 $ bridge -d vlan show
 port              vlan-id
 ens13             1 PVID Egress Untagged
                     state blocking
 bridge            1 PVID Egress Untagged
                     state forwarding

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:20:13 +00:00
Nikolay Aleksandrov c311404780 bridge: monitor: add support for vlan monitoring
Add support for vlan activity monitoring: vlan notifications are
displayed on vlan add/del/options change. The man page and help text are
also updated accordingly.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:13:39 +00:00
Nikolay Aleksandrov e5f87c8341 bridge: vlan: add support for the new rtm dump call
Use the new bridge vlan rtm dump helper to dump all of the available
vlan information when -details (-d) is used with vlan show. It is also
capable of dumping vlan stats if -statistics (-s) is added.
Currently this is the only interface capable of dumping per-vlan
options. The vlan dump format is compatible with the current vlan show, as
it uses the same helpers to dump vlan information. The new addition is one
line containing the per-vlan options (similar to ip -d link show for
ports). Currently only the vlan STP state is printed.
The call uses the compressed vlan format by default.

Example:
$ bridge -s -d vlan show
port              vlan-id
virbr1            1 PVID Egress Untagged
                    state forwarding

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:13:34 +00:00
Nikolay Aleksandrov 34c14bea22 libnetlink: add bridge vlan dump request helper
Add rtnl bridge vlan dump request helper which will be used to retrieve
bridge vlan information and options.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:13:29 +00:00
Nikolay Aleksandrov 04e2783d5e bridge: vlan: add option set command and state option
Add a new per-vlan option set command. It allows manipulating vlan
options, which can be bridge-wide or per-port depending on which device
is specified. The first option that can be set is the vlan STP state,
which is identical to the bridge port STP state. The man page is also
updated accordingly.

Example:
 $ bridge vlan set vid 10 dev br0 state learning
or a range:
 $ bridge vlan set vid 10-20 dev swp1 state blocking
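As an illustration of the range syntax above, a hypothetical parser for the vid argument. parse_vid_range() is not the bridge tool's actual code, and the 1..4094 bound is the usual 802.1Q limit rather than something stated in this commit:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical parser for a "vid" argument such as "10" or "10-20".
 * Accepts IDs 1..4094 (0 and 4095 are reserved in 802.1Q) and requires
 * start <= end. Returns 0 and fills *start/*end, or -1 on bad input. */
static int parse_vid_range(const char *arg, unsigned int *start,
			   unsigned int *end)
{
	unsigned long lo, hi;
	char *p;

	errno = 0;
	lo = strtoul(arg, &p, 10);
	if (errno || p == arg)
		return -1;

	if (*p == '-') {
		const char *q = p + 1;

		hi = strtoul(q, &p, 10);
		if (errno || p == q)
			return -1;
	} else {
		hi = lo;	/* single vid: range of one */
	}

	if (*p != '\0' || lo < 1 || hi > 4094 || lo > hi)
		return -1;

	*start = (unsigned int)lo;
	*end = (unsigned int)hi;
	return 0;
}
```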

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:13:24 +00:00
Nikolay Aleksandrov f2f52fcabe bridge: add parse_stp_state helper
Add a helper which parses an STP state string to its numeric value.
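A sketch of what such a helper might look like, assuming the state names are indexed by the kernel's BR_STATE_* values. parse_stp_state_sketch() and stp_state_names are illustrative names, not the exact iproute2 code:

```c
#include <string.h>

/* STP state names indexed by the kernel's BR_STATE_* values
 * (DISABLED=0, LISTENING=1, LEARNING=2, FORWARDING=3, BLOCKING=4). */
static const char *const stp_state_names[] = {
	"disabled", "listening", "learning", "forwarding", "blocking",
};

/* Map a state name to its numeric value, or return -1 if unknown. */
static int parse_stp_state_sketch(const char *arg)
{
	size_t i;

	for (i = 0; i < sizeof(stp_state_names) / sizeof(stp_state_names[0]); i++)
		if (strcmp(stp_state_names[i], arg) == 0)
			return (int)i;
	return -1;
}
```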

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:13:20 +00:00
Nikolay Aleksandrov f07516c3b0 bridge: rename and export print_portstate
Rename print_portstate to print_stp_state in preparation for its use by
the vlan code as well (per-vlan state), and export it. To be in line with
the new naming, also rename port_states to stp_states, as they'll be used
for vlans, too.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:13:09 +00:00
Florian Westphal ff619e4fd3 mptcp: add support for event monitoring
This adds iproute2 support for mptcp event monitoring, e.g. creation,
establishment, address announcements from the peer, subflow establishment
and so on.

While the kernel-generated events are primarily aimed at mptcpd (e.g. for
subflow management), this is also useful for debugging.

This adds print support for the existing events.

Sample output of 'ip mptcp monitor':
[       CREATED] token=83f3a692 remid=0 locid=0 saddr4=10.0.1.2 daddr4=10.0.1.1 sport=58710 dport=10011
[   ESTABLISHED] token=83f3a692 remid=0 locid=0 saddr4=10.0.1.2 daddr4=10.0.1.1 sport=58710 dport=10011
[SF_ESTABLISHED] token=83f3a692 remid=0 locid=1 saddr4=10.0.2.2 daddr4=10.0.1.1 sport=40195 dport=10011 backup=0
[        CLOSED] token=83f3a692

Signed-off-by: Florian Westphal <fw@strlen.de>
2021-04-22 05:10:25 +00:00
David Ahern 98040c2dc1 Update kernel headers
Update kernel headers to commit:
    5d869070569a ("net: phy: marvell: don't use empty switch default case")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:09:39 +00:00
Andrea Claudi c8216fabe8 rdma: stat: fix return code
libmnl defines MNL_CB_OK as 1 and MNL_CB_ERROR as -1. rdma uses these
return codes, and stat_qp_show_parse_cb() should do the same.

Fixes: 16ce4d2366 ("rdma: stat: initialize ret in stat_qp_show_parse_cb()")
Reported-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-20 18:08:38 -07:00
Andrea Claudi 16ce4d2366 rdma: stat: initialize ret in stat_qp_show_parse_cb()
In the unlikely case in which the mnl_attr_for_each_nested() loop is
not executed, this function returns an uninitialized value.

Fix this by initializing ret to 0.

Fixes: 5937552b42 ("rdma: Add "stat qp show" support")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-13 19:16:55 -07:00
Andrea Claudi 6a2c51da99 nexthop: fix memory leak in add_nh_group_attr()
grps is dynamically allocated with calloc() and is not freed on one
return path inside the for loop. This commit fixes it.

While at it, make the function use a single return point.
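A sketch of the single-return-point shape for a group parser, assuming the "id[,weight][/id[,weight]...]" syntax of ip nexthop groups. struct nh_grp_entry and parse_nh_group() are hypothetical, and the real add_nh_group_attr() also builds the netlink attribute:

```c
#include <errno.h>
#include <stdlib.h>

struct nh_grp_entry {	/* hypothetical, for illustration */
	unsigned int id;
	unsigned int weight;
};

/* Parse "id[,weight][/id[,weight]...]" into a calloc'd array. Every exit
 * goes through "out", so the array is freed on every error path. Returns
 * the number of entries, or -1 on error (*out_grps is left untouched). */
static int parse_nh_group(const char *spec, struct nh_grp_entry **out_grps)
{
	struct nh_grp_entry *grps = NULL;
	const char *p = spec;
	int count = 1, i, ret = -1;
	char *end;

	for (const char *s = spec; *s; s++)
		if (*s == '/')
			count++;

	grps = calloc(count, sizeof(*grps));
	if (!grps)
		goto out;

	for (i = 0; i < count; i++) {
		errno = 0;
		grps[i].id = strtoul(p, &end, 10);
		if (errno || end == p || grps[i].id == 0)
			goto out;	/* an early return here would leak grps */
		grps[i].weight = 1;	/* default weight */
		if (*end == ',') {
			p = end + 1;
			grps[i].weight = strtoul(p, &end, 10);
			if (errno || end == p || grps[i].weight == 0)
				goto out;
		}
		/* entries are separated by '/', the last one ends the string */
		if (*end != (i == count - 1 ? '\0' : '/'))
			goto out;
		p = end + 1;
	}

	*out_grps = grps;
	grps = NULL;	/* ownership transferred to the caller */
	ret = count;
out:
	free(grps);
	return ret;
}
```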

Fixes: 63df8e8543 ("Add support for nexthop objects")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-13 19:16:55 -07:00
Andrea Claudi 6801ae8273 q_cake: remove useless check on argv
In cake_parse_opt(), *argv is checked to be non-NULL when parsing the
overhead and mpu parameters. However, this check is useless, since *argv
has just been matched against "overhead" or "mpu".

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-13 19:16:55 -07:00
Andrea Claudi 6b8fa2ea2d devlink: always check strslashrsplit() return value
The return value of strslashrsplit() is not checked in
__dl_argv_handle(), even though it can return EINVAL.

This commit fixes it and makes __dl_argv_handle() return an error if
strslashrsplit() returns an error code.

Fixes: 2f85a9c535 ("devlink: allow to parse both devlink and port handle in the same time")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-13 19:16:55 -07:00
Stephen Hemminger cc718c191b uapi: update can.h
Upstream commit to force packing on ARM OABI

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-13 19:14:34 -07:00
Stephen Hemminger 06d0bbf1ee erspan: fix JSON output
The erspan/erspan6 output is not valid JSON, because for version 2 a key
without a value was emitted. The direction should be the value, and
erspan_dir should be the key.

Fixes: 2897636267 ("erspan: add erspan version II support")
Cc: u9012063@gmail.com
Reported-by: Christian Pössinger <christian@poessinger.com>
Signed-off-by: Christian Pössinger <christian@poessinger.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-10 09:52:48 -07:00
Chunmei Xu 44b563269e ip-nexthop: support flush by id
Since the id is unique for a nexthop, dumping all nexthops to flush one
is heavy. Use the existing delete_nexthop to support flush by id.

Signed-off-by: Chunmei Xu <xuchunmei@linux.alibaba.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-08 15:38:58 +00:00
Hoang Le 2021028306 tipc: use the libmnl functions in lib/mnl_utils.c
To avoid code duplication, tipc should be converted to use the helper
functions for working with libmnl in lib/mnl_utils.c

Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-03 01:10:54 +00:00
Stephen Hemminger e77a0d3dc9 uapi: bpf.h update from upstream
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-30 16:38:05 -07:00
Baowen Zheng cf9ae1bd31 police: add support for packet-per-second rate limiting
Allow a policer action to enforce a rate limit based on packets per
second, configurable using packets-per-second rate and burst parameters.

e.g.
 # $TC actions add action police pkts_rate 1000 pkts_burst 200 index 1
 # $TC actions ls action police
 total acts 1

	action order 0:  police 0x1 rate 0bit burst 0b mtu 4096Mb pkts_rate 1000 pkts_burst 200
	ref 1 bind 0
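To make the pkts_rate/pkts_burst pair concrete, a conceptual token bucket counted in packets. This is a simplified model, not the kernel's act_police implementation; the pps_* names are hypothetical. With pkts_rate 1000 and pkts_burst 200, a burst of 250 packets at t=0 lets 200 through, and 100 ms later another 100 tokens are available:

```c
/* Conceptual packets-per-second token bucket (illustration only). */
struct pps_bucket {
	double rate;	/* packets per second, e.g. pkts_rate 1000 */
	double burst;	/* bucket depth in packets, e.g. pkts_burst 200 */
	double tokens;	/* packets currently allowed */
	double last;	/* time of last update, in seconds */
};

static void pps_init(struct pps_bucket *b, double rate, double burst)
{
	b->rate = rate;
	b->burst = burst;
	b->tokens = burst;	/* start with a full bucket */
	b->last = 0.0;
}

/* Returns 1 if a packet arriving at time `now` conforms, 0 if it exceeds. */
static int pps_conform(struct pps_bucket *b, double now)
{
	b->tokens += (now - b->last) * b->rate;	/* refill since last packet */
	if (b->tokens > b->burst)
		b->tokens = b->burst;
	b->last = now;

	if (b->tokens >= 1.0) {
		b->tokens -= 1.0;	/* each packet costs one token */
		return 1;
	}
	return 0;
}
```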

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-30 03:04:50 +00:00
Cooper Lees 16430e9afd Add Open/R to rt_protos
- Open Routing uses ID 99 for its installed routes
- https://github.com/facebook/openr
- Kernel has accepted 99 in `rtnetlink.h`

Signed-of-by: Cooper Lees <me@cooperlees.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-30 03:04:09 +00:00
Petr Machata 7384c15e0e ip: Fix batch processing
After the commit cited below, batch mode neglects to set the global
variable batch_mode to a non-zero value. Netns and VRF commands use this
variable and break in batch mode. Fix this by setting the value again.

Fixes: 1d9a81b8c9 ("Unify batch processing across tools")
Reported-by: Tim Rice <trice@posteo.net>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-22 16:30:21 -07:00
David Ahern 76bfc185f2 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-21 17:16:01 +00:00
Sabrina Dubroca 3c75135835 ip: xfrm: add support for tfcpad
This patch adds support for setting and displaying the Traffic Flow
Confidentiality attribute for an XFRM state, which allows padding ESP
packets to a specified length.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-21 17:15:07 +00:00
Stephen Hemminger 872689d431 uapi: minor header update for l2tp
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-20 09:36:07 -07:00
Stephen Hemminger 87d6d395d1 README: remove doc instructions
The out of date documentation was removed in 2017, but the instructions
in the README were not removed.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-20 09:29:02 -07:00
David Ahern fa505da84b Merge branch 'nexthop-resilient-hash' into next
Petr Machata says:

====================

Support for resilient next-hop groups was recently accepted to Linux
kernel[1]. Resilient next-hop groups add a layer of indirection between the
SKB hash and the next hop. Thus the hash is used to reference a hash table
bucket, which is then used to reference a particular next hop. This allows
the system more flexibility when assigning SKB hash space to next hops.
Previously, each next hop had to be assigned a continuous range of SKB hash
space. With a hash table as an intermediate layer, it is possible to
reassign next hops with a hash table bucket granularity. In turn, this
mends issues with traffic flow redirection resulting from next hop removal
or adjustments in next-hop weights.
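A simplified model of the indirection described above, assuming a fixed table of 8 buckets and ignoring the idle and unbalanced timers. res_select() and res_remove_nh() are hypothetical names. Removing a next hop only rewrites the buckets that pointed to it:

```c
#include <stddef.h>

#define NBUCKETS 8	/* as in "buckets 8" in the examples */

/* The packet hash selects a bucket; the bucket names the next hop. */
struct res_group {
	unsigned int bucket_nh[NBUCKETS];	/* next-hop ID per bucket */
};

static unsigned int res_select(const struct res_group *g, unsigned int hash)
{
	return g->bucket_nh[hash % NBUCKETS];
}

/* Remove next hop `gone`: only its buckets are reassigned (here, all to
 * `survivor`); every other bucket, and thus every other flow, keeps its
 * next hop. Returns the number of buckets moved. */
static size_t res_remove_nh(struct res_group *g, unsigned int gone,
			    unsigned int survivor)
{
	size_t i, moved = 0;

	for (i = 0; i < NBUCKETS; i++) {
		if (g->bucket_nh[i] == gone) {
			g->bucket_nh[i] = survivor;
			moved++;
		}
	}
	return moved;
}
```

With the bucket layout of the "ip nexthop bucket show id 10" output in this cover letter (buckets 2 and 3 on next hop 2), removing next hop 2 moves exactly those two buckets, and flows hashing to the other six buckets are unaffected.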

In this patch set, introduce support for resilient next-hop groups to
iproute2.

- Patch #1 brings include/uapi/linux/nexthop.h and /rtnetlink.h up to date.

- Patches #2 and #3 add new helpers that will be useful later.

- Patch #4 extends the ip/nexthop sub-tool to accept group type as a
  command line argument, and to dispatch based on the specified type.

- Patch #5 adds the support for resilient next-hop groups.

- Patch #6 adds the support for resilient next-hop group bucket interface.

To illustrate the usage, consider the following commands:

 # ip nexthop add id 1 via 192.0.2.2 dev dummy1
 # ip nexthop add id 2 via 192.0.2.3 dev dummy1
 # ip nexthop add id 10 group 1/2 type resilient \
	buckets 8 idle_timer 60 unbalanced_timer 300

The last command creates a resilient next-hop group. It will have 8
buckets, each bucket will be considered idle when no traffic hits it for at
least 60 seconds, and if the table remains out of balance for 300 seconds,
it will be forcefully brought into balance.

And this is how the next-hop group bucket interface looks:

 # ip nexthop bucket show id 10
 id 10 index 0 idle_time 5.59 nhid 1
 id 10 index 1 idle_time 5.59 nhid 1
 id 10 index 2 idle_time 8.74 nhid 2
 id 10 index 3 idle_time 8.74 nhid 2
 id 10 index 4 idle_time 8.74 nhid 1
 id 10 index 5 idle_time 8.74 nhid 1
 id 10 index 6 idle_time 8.74 nhid 1
 id 10 index 7 idle_time 8.74 nhid 1

[1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=2a0186a37700b0d5b8cc40be202a62af44f02fa2

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-19 15:05:29 +00:00
Ido Schimmel 2be6d18b30 nexthop: Add support for nexthop buckets
Add ability to dump multiple nexthop buckets and get a specific one.
Example:

 # ip nexthop add id 10 group 1/2 type resilient buckets 8
 # ip nexthop
 id 1 via 192.0.2.2 dev dummy10 scope link
 id 2 via 192.0.2.19 dev dummy20 scope link
 id 10 group 1/2 type resilient buckets 8 idle_timer 120 unbalanced_timer 0 unbalanced_time 0
 # ip nexthop bucket
 id 10 index 0 idle_time 28.1 nhid 2
 id 10 index 1 idle_time 28.1 nhid 2
 id 10 index 2 idle_time 28.1 nhid 2
 id 10 index 3 idle_time 28.1 nhid 2
 id 10 index 4 idle_time 28.1 nhid 1
 id 10 index 5 idle_time 28.1 nhid 1
 id 10 index 6 idle_time 28.1 nhid 1
 id 10 index 7 idle_time 28.1 nhid 1
 # ip nexthop bucket show nhid 1
 id 10 index 4 idle_time 53.59 nhid 1
 id 10 index 5 idle_time 53.59 nhid 1
 id 10 index 6 idle_time 53.59 nhid 1
 id 10 index 7 idle_time 53.59 nhid 1
 # ip nexthop bucket get id 10 index 5
 id 10 index 5 idle_time 81 nhid 1
 # ip -j -p nexthop bucket get id 10 index 5
 [ {
         "id": 10,
         "bucket": {
             "index": 5,
             "idle_time": 104.89,
             "nhid": 1
         },
         "flags": [ ]
     } ]

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-19 15:01:25 +00:00
Ido Schimmel 9167671822 nexthop: Add support for resilient nexthop groups
Add ability to configure resilient nexthop groups and show their current
configuration. Example:

 # ip nexthop add id 10 group 1/2 type resilient buckets 8
 # ip nexthop show id 10
 id 10 group 1/2 type resilient buckets 8 idle_timer 120 unbalanced_timer 0
 # ip -j -p nexthop show id 10
 [ {
         "id": 10,
         "group": [ {
                 "id": 1
             },{
                 "id": 2
             } ],
         "type": "resilient",
         "resilient_args": {
             "buckets": 8,
             "idle_timer": 120,
             "unbalanced_timer": 0
         },
         "flags": [ ]
     } ]

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-19 15:01:18 +00:00
Ido Schimmel b82d6b81fa nexthop: Add ability to specify group type
Next patches are going to add a 'resilient' nexthop group type, so allow
users to specify the type using the 'type' argument. Currently, only
'mpath' type is supported.

These two commands are equivalent:

 # ip nexthop add id 10 group 1/2/3
 # ip nexthop add id 10 group 1/2/3 type mpath

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-19 15:00:49 +00:00
Petr Machata 28fb925d8b nexthop: Extract a helper to parse a NH ID
NH ID extraction is a common operation, and will become more common still
with the resilient NH groups support. Add a helper that does what is
usually done and returns the parsed NH ID.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-19 15:00:43 +00:00
Petr Machata e757f741e9 json_print: Add print_tv()
Add a helper to dump a timeval. Print by first converting to double and
then dispatching to print_color_float().
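A sketch of the conversion step, assuming microsecond resolution. timeval_to_double() is a hypothetical name; the real print_tv() then hands the value to print_color_float():

```c
#include <sys/time.h>

/* Collapse a struct timeval into seconds as a double, so that a value
 * such as 5 s + 590000 us can be printed as 5.59. */
static double timeval_to_double(const struct timeval *tv)
{
	return (double)tv->tv_sec + (double)tv->tv_usec / 1000000.0;
}
```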

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-19 15:00:08 +00:00
David Ahern a5b355c08c Update kernel headers
Update kernel headers to commit:
    38cb57602369 ("selftests: net: forwarding: Fix a typo")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-19 14:59:17 +00:00
Stephen Hemminger 6639fce430 ip: cleanup help message text
Wrap help message text at 80 characters, and put lists of items in
alphabetical order.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-18 11:24:06 -07:00
Tony Ambardar 06bee37c1c lib/bpf: add missing limits.h includes
Several functions in bpf_glue.c and bpf_libbpf.c rely on PATH_MAX, which is
normally included from <limits.h> in other iproute2 source files.

It fixes errors seen using gcc 10.2.0, binutils 2.35.1 and musl 1.1.24:

bpf_glue.c: In function 'get_libbpf_version':
bpf_glue.c:46:11: error: 'PATH_MAX' undeclared (first use in this function);
did you mean 'AF_MAX'?
   46 |  char buf[PATH_MAX], *s;
      |           ^~~~~~~~
      |           AF_MAX

Reported-by: Rui Salvaterra <rsalvaterra@gmail.com>
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-16 22:53:53 -07:00
Sabrina Dubroca 6050055387 ip: xfrm: limit the length of the security context name when printing
Security context names are not guaranteed to be NUL-terminated by the
kernel, so we can't just print them using %s directly. The length of
the string is determined by sctx->ctx_len, so we can use that to limit
what fprintf outputs.
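A minimal sketch of bounding the output with the length, assuming the name and its length are available separately. print_ctx_str() is a hypothetical helper. The "%.*s" precision makes fprintf() stop after len bytes even if no NUL follows:

```c
#include <stdio.h>

/* Print a length-delimited string that may lack a trailing NUL, such as
 * a security context name whose length comes from ctx_len. */
static void print_ctx_str(FILE *fp, const char *str, unsigned int len)
{
	fprintf(fp, "%.*s", (int)len, str);
}
```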

While at it, factor that out to a separate function, since the exact
same code is used to print the security context for both policies and
states.

Fixes: b2bb289a57 ("xfrm security context support")
Reported-by: Paul Wouters <pwouters@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-16 22:53:28 -07:00
David Ahern 27ca8989c1 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-15 15:08:01 +00:00
Toke Høiland-Jørgensen 60204c81e4 q_cake: Fix incorrect printing of signed values in class statistics
The deficit returned from the kernel is signed, but was printed with a %u
specifier in the format string, leading to negative values being printed
as large unsigned values instead. In addition, we passed a negative value
to sprint_time() even though it expects an unsigned value. Fix this by
changing the format specifier and reversing the sign of negative time
values.
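A small illustration of the format-specifier part of this bug, assuming a 32-bit int. format_deficit() is hypothetical and is not the q_cake code:

```c
#include <stdint.h>
#include <stdio.h>

/* A signed deficit printed with %u wraps to a large unsigned value,
 * while %d keeps the sign. Returns snprintf()'s character count. */
static int format_deficit(char *buf, size_t len, int32_t deficit,
			  int as_unsigned)
{
	if (as_unsigned)
		return snprintf(buf, len, "%u", (unsigned int)deficit);	/* buggy */
	return snprintf(buf, len, "%d", (int)deficit);			/* fixed */
}
```

For a deficit of -5, the buggy branch prints "4294967291" and the fixed branch prints "-5".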

Fixes: 714444c0cb ("Add support for CAKE qdisc")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-08 19:05:19 -08:00
Roi Dayan 9f366536ed dcb: Fix compilation warning about reallocarray
Older distros need bsd/stdlib.h, but newer distros don't. Older distros
also need libbsd-devel installed, while newer ones don't. To remove a
possible dependency on libbsd-devel, replace usage of reallocarray with
realloc.

dcb_app.c: In function ‘dcb_app_table_push’:
dcb_app.c:68:25: warning: implicit declaration of function ‘reallocarray’; did you mean ‘realloc’?
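One caveat when swapping reallocarray() for realloc() is that realloc() does not check nmemb * size for overflow. A hedged sketch of a portable stand-in; realloc_array_checked() is a hypothetical name, and this commit may simply call realloc() directly:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* realloc() with the multiplication overflow check reallocarray() does.
 * On overflow, ptr is left untouched and NULL is returned. */
static void *realloc_array_checked(void *ptr, size_t nmemb, size_t size)
{
	if (size != 0 && nmemb > SIZE_MAX / size) {
		errno = ENOMEM;
		return NULL;
	}
	return realloc(ptr, nmemb * size);
}
```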

Fixes: 8e9bed1493 ("dcb: Add a subtool for the DCB APP object")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-03 18:56:39 -08:00
Luca Boccassi 6739068fb0 iproute: fix printing resolved localhost
format_host_rta_r() might return a cached hostname via its return value
without using the input buffer, so callers must print the returned string.

Before:

$ ip -resolve -6 route
 dev lo proto kernel metric 256 pref medium

After:

$ ip/ip -resolve -6 route
localhost dev lo proto kernel metric 256 pref medium
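A sketch of the bug class, with a hypothetical resolve_host() that, like format_host_rta_r() as described above, may return a cached string without touching the caller's buffer. Printing buf instead of the returned pointer produces the empty name seen in the "Before" output:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical resolver: may return a cached name instead of filling buf. */
static const char *resolve_host(const char *addr, char *buf, size_t len)
{
	static const char cached_localhost[] = "localhost";

	if (strcmp(addr, "::1") == 0)
		return cached_localhost;	/* buf is left untouched */
	snprintf(buf, len, "%s", addr);
	return buf;
}

/* Correct caller: always print the returned pointer, never assume the
 * result was written into buf. */
static void print_dst(FILE *fp, const char *addr)
{
	char buf[64] = "";
	const char *name = resolve_host(addr, buf, sizeof(buf));

	fprintf(fp, "%s dev lo\n", name);
}
```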

Bug-Debian: https://bugs.debian.org/983591

Reported-by: Axel Scheepers <axel.scheepers76@gmail.com>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-03 18:54:16 -08:00
Parav Pandit c54e7bd605 devlink: Add error print when unknown values specified
When the user specifies either an unknown flavour or an unknown state in
devlink port commands, return an appropriate error message.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-03 04:00:16 +00:00
Parav Pandit 62ff25e51b devlink: Use generic socket helpers from library
Use generic socket helpers from the library for generic netlink socket
access.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-03 04:00:10 +00:00
Parav Pandit e3a4067e52 utils: Introduce helper routines for generic socket recv
Introduce a helper for generic socket receive, and a helper to build a
command with a custom family and version.

The API is used in a subsequent devlink patch.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-03 04:00:04 +00:00
Parav Pandit 03662000e4 devlink: Use library provided string processing APIs
Use the helper routines provided by the library for counting slashes and
splitting a string on a delimiter.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-03 03:59:58 +00:00
Paolo Abeni 42fbca91cd mptcp: add support for port based endpoint
The feature has been supported by the kernel since 5.11-net-next; let's
allow user space to use it.

Just parse and dump an additional per-endpoint u16 attribute.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-01 00:15:10 +00:00
David Ahern 455c9f5361 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-01 00:07:57 +00:00
Stephen Hemminger 9d00602f82 vdpa: add .gitignore
Ignore the resulting binary vdpa.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-23 23:12:14 -08:00
Stephen Hemminger 5e0e73c347 Update kernel headers from 5.12-pre rc
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-23 23:11:12 -08:00
Stephen Hemminger 52c5f3f043 Merge git://git.kernel.org/pub/scm/network/iproute2/iproute2-next 2021-02-23 23:03:42 -08:00
Stephen Hemminger bbddfcec6c v5.11.0 2021-02-23 09:34:11 -08:00
Andrea Claudi b2d44b9a95 lib/fs: Fix single return points for get_cgroup2_*
Functions get_cgroup2_id() and get_cgroup2_path() may call close() with
a negative argument.
Avoid that by making the calls conditional on the file descriptors.

get_cgroup2_path() may also return NULL, leaking a file descriptor.
Ensure this does not happen by using a single return point.

Fixes: d5e6ee0dac ("ss: introduce cgroup2 cache and helper functions")
Fixes: 8f1cd119b3 ("lib: fix checking of returned file handle size for cgroup")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 18:20:44 -08:00
Andrea Claudi 1de363b180 lib/fs: avoid double call to mkdir on make_path()
The make_path() function calls mkdir() twice in a row: the first time it
stores the mkdir() return code, and then it calls it again to check errno.

This seems unnecessary, as we can use the return code from the first
call and check errno if it is not 0.

Fixes: ac3415f5c1 ("lib/fs: Fix and simplify make_path()")
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 18:20:44 -08:00
Andrea Claudi d4fcdbbec9 lib/bpf: Fix and simplify bpf_mnt_check_target()
As stated in commit ac3415f5c1 ("lib/fs: Fix and simplify make_path()"),
calling stat() before mkdir() is racy, because the entry might change in
between.

As the call to stat() seems to only check for target existence, we can
simply call mkdir() unconditionally and catch all errors but EEXIST.
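A minimal sketch of the mkdir-first approach; ensure_dir() is a hypothetical helper. Note that EEXIST is also returned when the path exists but is not a directory, so a stricter version would check that separately:

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Call mkdir() once, unconditionally, and treat EEXIST as success.
 * Unlike stat() followed by mkdir(), there is no window in which the
 * path can change between the check and the creation. Returns 0 or
 * -errno. */
static int ensure_dir(const char *path, mode_t mode)
{
	if (mkdir(path, mode) < 0 && errno != EEXIST)
		return -errno;
	return 0;
}
```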

Fixes: 95ae9a4870 ("bpf: fix mnt path when from env")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
2021-02-22 18:19:01 -08:00
Andrea Claudi 1e25de9a92 lib/namespace: fix ip -all netns return code
When ip -all netns {del,exec} is called and no netns is present, ip
exits with status 0. However, this does not happen if no netns has been
created since boot: in that case, NETNS_RUN_DIR is not present and
netns_foreach() exits with code 1.

$ ls /var/run/netns
ls: cannot access '/var/run/netns': No such file or directory
$ ip -all netns exec ip link show
$ echo $?
1
$ ip -all netns del
$ echo $?
1
$ ip netns add test
$ ip netns del test
$ ip -all netns del
$ echo $?
0
$ ls -a /var/run/netns
.  ..

This leaves us in the unpleasant situation where the same command, when
no netns is present, does the same thing (in this case, nothing) but
exits with two different statuses.

Fix this by treating ENOENT differently from other errors, similarly to
what we already do in ipnetns.c netns_identify_pid().
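A sketch of the idea, with hypothetical names standing in for NETNS_RUN_DIR and the per-netns action: a missing directory is reported as "no entries" (success), while other opendir() failures remain errors:

```c
#include <dirent.h>
#include <errno.h>
#include <string.h>

/* Call cb for each entry of dir, skipping "." and "..". A missing
 * directory (ENOENT) means "nothing to do" and returns 0; any other
 * failure returns -1. */
static int foreach_entry(const char *dir,
			 int (*cb)(const char *name, void *arg), void *arg)
{
	struct dirent *entry;
	DIR *d = opendir(dir);
	int ret = 0;

	if (!d)
		return errno == ENOENT ? 0 : -1;

	while ((entry = readdir(d)) != NULL) {
		if (strcmp(entry->d_name, ".") == 0 ||
		    strcmp(entry->d_name, "..") == 0)
			continue;
		if (cb(entry->d_name, arg) < 0) {
			ret = -1;
			break;
		}
	}
	closedir(d);
	return ret;
}

/* Example callback: count entries into an int pointed to by arg. */
static int count_entry(const char *name, void *arg)
{
	(void)name;
	(*(int *)arg)++;
	return 0;
}
```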

Fixes: e998e118dd ("lib: Exec func on each netns")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 18:17:56 -08:00
Andrea Claudi e833dbe140 ip: lwtunnel: seg6: bail out if table ids are invalid
When table and vrftable are used in SRv6, ip should bail out if the table
ids are not valid, and return a proper error message to the user.

Achieve this by simply checking the rtnl_rttable_a2n() return value, as we
already do in the rest of iproute.

Fixes: 0486388a87 ("add support for table name in SRv6 End.DT* behaviors")
Fixes: 69629b4e43 ("seg6: add support for vrftable attribute in SRv6 End.DT4/DT6 behaviors")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 18:11:48 -08:00
Andrea Claudi 546f738220 tc: m_gate: use SPRINT_BUF when needed
sprint_time64() uses SPRINT_BSIZE-1 as a constant buffer length in its
implementation; however, m_gate uses shorter buffers when calling it.

Fix this by using the SPRINT_BUF macro to get the buffer, thus getting a
SPRINT_BSIZE-long buffer.

Fixes: 07d5ee70b5 ("iproute2-next:tc:action: add a gate control action")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 18:11:03 -08:00
Vladimir Oltean e1d79d49ed man8/bridge.8: be explicit that "flood" is an egress setting
Talking to various people, it became apparent that there is a certain
ambiguity in the description of these flags. They refer to egress
flooding, which should perhaps be stated more clearly.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 11:19:38 -08:00
Vladimir Oltean 14f528a556 man8/bridge.8: explain self vs master for "bridge fdb add"
The "usually hardware" and "usually software" distinctions make no
sense, try to clarify what these do based on the actual kernel behavior.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 11:19:38 -08:00
Vladimir Oltean b64ceb687d man8/bridge.8: fix which one of self/master is default for "bridge fdb"
The bridge program does:

fdb_modify:
	/* Assume self */
	if (!(req.ndm.ndm_flags&(NTF_SELF|NTF_MASTER)))
		req.ndm.ndm_flags |= NTF_SELF;

which is clearly against the documented behavior. The only thing we can
do, sadly, is update the documentation.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 11:19:38 -08:00
Vladimir Oltean 10130bfafe man8/bridge.8: explain what a local FDB entry is
Explaining the "local" flag by saying that it is "a local permanent fdb
entry" is not very helpful, be more specific.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 11:19:38 -08:00
Vladimir Oltean ae3cb3d34d man8/bridge.8: document that "local" is default for "bridge fdb add"
The bridge does this:

fdb_modify:
	/* Assume permanent */
	if (!(req.ndm.ndm_state&(NUD_PERMANENT|NUD_REACHABLE)))
		req.ndm.ndm_state |= NUD_PERMANENT;

So let's make the user aware of the fact that if they don't want local
entries, they need to specify some other flag like "static".

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 11:19:38 -08:00
Vladimir Oltean 1261459c64 man8/bridge.8: document the "permanent" flag for "bridge fdb add"
The bridge program parses "local" and "permanent" in just the same way,
so it makes sense to tell that to users:

fdb_modify:
		} else if (matches(*argv, "local") == 0 ||
			   matches(*argv, "permanent") == 0) {
			req.ndm.ndm_state |= NUD_PERMANENT;
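Taken together with the "Assume permanent" default above, "local", "permanent" and no state keyword at all produce the same ndm_state. A sketch of that combined rule follows. The constants are assumed to mirror <linux/neighbour.h>. fdb_state_sketch() is hypothetical. It uses exact strcmp() instead of iproute2's matches() and ignores other state keywords.

```c
#include <stdint.h>
#include <string.h>

/* Values assumed to mirror <linux/neighbour.h>. */
#define NUD_REACHABLE_SKETCH 0x02
#define NUD_PERMANENT_SKETCH 0x80

/* Sketch: compute ndm_state for "bridge fdb add" from its keywords.
 * Only "local"/"permanent" are modelled here. */
static uint16_t fdb_state_sketch(int argc, const char **argv)
{
	uint16_t state = 0;
	int i;

	for (i = 0; i < argc; i++) {
		if (strcmp(argv[i], "local") == 0 ||
		    strcmp(argv[i], "permanent") == 0)
			state |= NUD_PERMANENT_SKETCH;
	}

	/* Assume permanent (a local entry) if nothing else was requested. */
	if (!(state & (NUD_PERMANENT_SKETCH | NUD_REACHABLE_SKETCH)))
		state |= NUD_PERMANENT_SKETCH;
	return state;
}
```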

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 11:19:38 -08:00
Ido Kalir 675e2df632 rdma: Fix statistics bind/unbind argument handling
The dump isn't supported for the statistics bind/unbind commands
because they operate on specific QP counters. This is different
from query commands that can operate on many objects at the same
time.

Let's check the user input and ensure that arguments are valid.

Fixes: a6d0773ebe ("rdma: Add stat manual mode support")
Signed-off-by: Ido Kalir <idok@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 10:52:39 -08:00
Thayne McCombs c7897ec2a6 ss: Make leading ":" always optional for sport and dport
The sport and dport conditions in expressions were inconsistent on
whether there should be a ":" at the beginning of the port when only a
port was provided depending on the family. The link and netlink
families required a ":" to work. The vsock family required the ":"
to be absent. The inet and inet6 families work with or without a leading
":".

This makes the leading ":" optional in all cases, so if sport or dport
are used, then it works with a leading ":" or without one, as inet and
inet6 did.
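A minimal sketch of the "optional leading colon" rule. parse_port_sketch() is hypothetical. It assumes a 16-bit numeric port, so service names and wider port spaces are out of scope.

```c
#include <stdlib.h>

/* Hypothetical sketch: parse the port of an sport/dport condition,
 * accepting it with or without a leading ':' (":22" or "22").
 * Returns the port number, or -1 if it is not a valid decimal port. */
static long parse_port_sketch(const char *s)
{
	char *end;
	long port;

	if (*s == ':')
		s++;		/* the leading ':' is optional */
	if (*s < '0' || *s > '9')
		return -1;
	port = strtol(s, &end, 10);
	if (*end != '\0' || port > 65535)
		return -1;
	return port;
}
```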

Signed-off-by: Thayne McCombs <astrothayne@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-14 22:09:37 -07:00
Amit Cohen 33e2471e8f ip route: Print "rt_offload_failed" indication
The kernel signals when offload fails using the 'RTM_F_OFFLOAD_FAILED'
flag. Print it to help users understand the offload state of the route.
The "rt_" prefix is used in order to distinguish it from the offload state
of nexthops, similar to "rt_offload" and "rt_trap".
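A sketch of how the three route indications could be rendered from rtm_flags. The flag values are assumed to mirror <linux/rtnetlink.h>. The #ifndef guards let the sketch build without that header. format_rt_offload_sketch() is hypothetical and is not the ip route printing code.

```c
#include <stdio.h>

/* Values assumed to mirror <linux/rtnetlink.h>. */
#ifndef RTM_F_OFFLOAD
#define RTM_F_OFFLOAD 0x4000
#endif
#ifndef RTM_F_TRAP
#define RTM_F_TRAP 0x8000
#endif
#ifndef RTM_F_OFFLOAD_FAILED
#define RTM_F_OFFLOAD_FAILED 0x20000000
#endif

/* Sketch: write the "rt_" offload indications for flags into buf.
 * Returns the length of the resulting string, as snprintf() does. */
static int format_rt_offload_sketch(unsigned int flags, char *buf, size_t len)
{
	return snprintf(buf, len, "%s%s%s",
			(flags & RTM_F_OFFLOAD) ? "rt_offload " : "",
			(flags & RTM_F_TRAP) ? "rt_trap " : "",
			(flags & RTM_F_OFFLOAD_FAILED) ? "rt_offload_failed " : "");
}
```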

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-13 17:50:15 -07:00
David Ahern 34de4b26bf Update kernel headers
Update kernel headers to commit:
    c4762993129f ("Merge branch 'skbuff-introduce-skbuff_heads-bulking-and-reusing'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-13 17:48:05 -07:00
Oleksandr Mazur c946f5d3e4 devlink: add support for port params get/set
Add support for getting and setting port parameters, along with bash
completion and a man page description for port param.

Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-11 09:21:24 -07:00
David Ahern 143610383d Merge branch 'vdpa' into next
Parav Pandit  says:

====================

The Linux vdpa interface provides vdpa device management functionality,
including adding, removing and querying vdpa devices.

The vdpa interface also shows the management devices that support such
operations.

This patchset includes kernel uapi headers and a vdpa tool.

Examples:

$ vdpa mgmtdev show
vdpasim:
  supported_classes net

$ vdpa mgmtdev show -jp
{
    "show": {
        "vdpasim": {
            "supported_classes": [ "net" ]
        }
    }
}

Create a vdpa device of type networking named "foo2" from
the management device vdpasim_net:

$ vdpa dev add mgmtdev vdpasim_net name foo2

Show the newly created vdpa device by its name:
$ vdpa dev show foo2
foo2: type network mgmtdev vdpasim_net vendor_id 0 max_vqs 2 max_vq_size 256

$ vdpa dev show foo2 -jp
{
    "dev": {
        "foo2": {
            "type": "network",
            "mgmtdev": "vdpasim_net",
            "vendor_id": 0,
            "max_vqs": 2,
            "max_vq_size": 256
        }
    }
}

Delete the vdpa device after its use:
$ vdpa dev del foo2

An example of PCI PF, VF and SF management devices:
pci/0000:03.00:0
  supported_classes
    net
pci/0000:03.00:4
  supported_classes
    net
auxiliary/mlx5_core.sf.8
  supported_classes
    net

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-11 09:16:49 -07:00
Parav Pandit c2ecc82b9d vdpa: Add vdpa tool
The vdpa tool is used to create, delete and query vdpa devices.
Examples:
Show the vdpa management devices that support creating and deleting vdpa devices:

$ vdpa mgmtdev show
vdpasim:
  supported_classes net

$ vdpa mgmtdev show -jp
{
    "show": {
        "vdpasim": {
            "supported_classes": [ "net" ]
        }
    }
}

Create a vdpa device of type networking named "foo2" from
the management device vdpasim_net:

$ vdpa dev add mgmtdev vdpasim_net name foo2

Show the newly created vdpa device by its name:
$ vdpa dev show foo2
foo2: type network mgmtdev vdpasim_net vendor_id 0 max_vqs 2 max_vq_size 256

$ vdpa dev show foo2 -jp
{
    "dev": {
        "foo2": {
            "type": "network",
            "mgmtdev": "vdpasim_net",
            "vendor_id": 0,
            "max_vqs": 2,
            "max_vq_size": 256
        }
    }
}

Delete the vdpa device after its use:
$ vdpa dev del foo2

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-11 09:09:15 -07:00
Parav Pandit 6c76994982 utils: Add helper to map string to unsigned int
A subsequent patch needs to map a string to an unsigned int.
Hence, add an API to do so.
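A minimal sketch of such a mapping, using a NULL-terminated table. The struct, the function and the example entries are all hypothetical and illustrative. They are not the names or values added by this patch.

```c
#include <string.h>

/* Hypothetical table entry: a string and the number it maps to. */
struct str_uint_map_sketch {
	const char *str;
	unsigned int num;
};

/* Look up str in a NULL-terminated map. Returns 0 and stores the
 * value in *num on success, or -1 if the string is unknown. */
static int str_map_lookup_uint_sketch(const struct str_uint_map_sketch *map,
				      const char *str, unsigned int *num)
{
	for (; map->str; map++) {
		if (strcmp(map->str, str) == 0) {
			*num = map->num;
			return 0;
		}
	}
	return -1;
}

/* Illustrative map; the values are arbitrary. */
static const struct str_uint_map_sketch class_map_sketch[] = {
	{ "net", 1 },
	{ "blk", 2 },
	{ NULL, 0 },
};
```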

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-11 09:09:10 -07:00
Parav Pandit b822275ad8 utils: Add generic socket helpers
A subsequent patch needs to
(a) query and use a socket family
(b) send/receive messages using this family

Hence, add helper routines to open, close and query the family, and to
send and receive messages.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-11 09:09:07 -07:00
Parav Pandit bd3709c3a7 utils: Add helper routines for indent handling
A subsequent patch needs to use 2-character indentation for nested
objects. Hence, introduce generic helpers to allocate, deallocate,
increment and decrement the indentation, and to print the indent block.
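A sketch of what such helpers could look like. The struct and every function name are hypothetical. The indent is kept as a preallocated string of spaces that grows and shrinks two characters at a time. Depth is silently capped at the allocated size.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define INDENT_STEP_SKETCH 2	/* two spaces per nesting level */

/* Hypothetical indent state: a buffer of spaces and the current depth. */
struct indent_sketch {
	char *buf;
	int cur;	/* current indentation in characters */
	int max;	/* capacity, excluding the terminating '\0' */
};

static struct indent_sketch *indent_alloc_sketch(int max)
{
	struct indent_sketch *ind = calloc(1, sizeof(*ind));

	if (!ind)
		return NULL;
	ind->buf = calloc(max + 1, 1);	/* zeroed: starts as "" */
	if (!ind->buf) {
		free(ind);
		return NULL;
	}
	ind->max = max;
	return ind;
}

static void indent_free_sketch(struct indent_sketch *ind)
{
	if (!ind)
		return;
	free(ind->buf);
	free(ind);
}

static void indent_inc_sketch(struct indent_sketch *ind)
{
	if (ind->cur + INDENT_STEP_SKETCH > ind->max)
		return;		/* cap the depth */
	memset(ind->buf + ind->cur, ' ', INDENT_STEP_SKETCH);
	ind->cur += INDENT_STEP_SKETCH;
	ind->buf[ind->cur] = '\0';
}

static void indent_dec_sketch(struct indent_sketch *ind)
{
	if (ind->cur < INDENT_STEP_SKETCH)
		return;
	ind->cur -= INDENT_STEP_SKETCH;
	ind->buf[ind->cur] = '\0';
}

static void indent_print_sketch(const struct indent_sketch *ind, FILE *fp)
{
	fputs(ind->buf, fp);
}
```

A caller would increment before printing a nested object such as "supported_classes" and decrement afterwards.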

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-11 09:08:13 -07:00
Parav Pandit 5a6bf92a95 Add kernel headers
Add kernel headers to commit from kernel tree [1].
  6acba4951632 ("vdpa_sim_net: Add support for user supported devices")

[1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-11 09:07:47 -07:00
Paul Blakey 049708a002 tc: flower: Add support for ct_state reply flag
Matches on conntrack rpl ct_state.

Example:
$ tc filter add dev ens1f0_0 ingress prio 1 chain 1 proto ip flower \
  ct_state +trk+est+rpl \
  action mirred egress redirect dev ens1f0_1
$ tc filter add dev ens1f0_1 ingress prio 1 chain 1 proto ip flower \
  ct_state +trk+est-rpl \
  action mirred egress redirect dev ens1f0_0
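A sketch of how a ct_state string such as "+trk+est-rpl" maps to a value/mask pair. The bit values are assumed to mirror the TCA_FLOWER_KEY_CT_FLAGS_* enum in <linux/pkt_cls.h>. parse_ct_state_sketch() is hypothetical and is not tc's parser.

```c
#include <stdint.h>
#include <string.h>

/* Assumed to mirror TCA_FLOWER_KEY_CT_FLAGS_* in <linux/pkt_cls.h>. */
#define CT_NEW_SKETCH (1 << 0)
#define CT_EST_SKETCH (1 << 1)
#define CT_REL_SKETCH (1 << 2)
#define CT_TRK_SKETCH (1 << 3)
#define CT_INV_SKETCH (1 << 4)
#define CT_RPL_SKETCH (1 << 5)

/* Sketch: '+' sets a flag in both value and mask, '-' only in the mask
 * (i.e. "must be clear"). Returns 0 on success, -1 if malformed. */
static int parse_ct_state_sketch(const char *s, uint16_t *val, uint16_t *mask)
{
	static const struct {
		const char *name;
		uint16_t bit;
	} flags[] = {
		{ "new", CT_NEW_SKETCH }, { "est", CT_EST_SKETCH },
		{ "rel", CT_REL_SKETCH }, { "trk", CT_TRK_SKETCH },
		{ "inv", CT_INV_SKETCH }, { "rpl", CT_RPL_SKETCH },
	};
	const size_t nflags = sizeof(flags) / sizeof(flags[0]);

	*val = *mask = 0;
	while (*s) {
		char sign = *s++;
		size_t i;

		if (sign != '+' && sign != '-')
			return -1;
		for (i = 0; i < nflags; i++)
			if (strncmp(s, flags[i].name, 3) == 0)
				break;
		if (i == nflags)
			return -1;
		*mask |= flags[i].bit;
		if (sign == '+')
			*val |= flags[i].bit;
		s += 3;		/* every flag name is three letters */
	}
	return 0;
}
```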

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-04 21:54:28 -07:00
Maxim Mikityanskiy b8b8b6d4c9 tc/htb: Hierarchical QoS hardware offload
This commit adds support for configuring HTB in offload mode. HTB
offload eliminates the single qdisc lock in the datapath and offloads
the algorithm to the NIC. The new 'offload' parameter is added to
enable this mode:

    # tc qdisc replace dev eth0 root handle 1: htb offload

Classes are created as usual, but filters should be moved to clsact for
lock-free classification (filters attached to HTB itself are not
supported in the offload mode):

    # tc filter add dev eth0 egress protocol ip flower ip_proto tcp dst_port 80 \
        action skbedit priority 1:10

tc qdisc show and tc class show will indicate whether the offload is
enabled. Example output:

$ tc qdisc show dev eth1
qdisc htb 1: root offloaded r2q 10 default 0 direct_packets_stat 0 direct_qlen 1000 offload
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
$ tc class show dev eth1
class htb 1:101 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:1 root rate 100Gbit ceil 100Gbit burst 0b cburst 0b  offload
class htb 1:103 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:102 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:105 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:104 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:107 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:106 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:108 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
$ tc -j qdisc show dev eth1
[{"kind":"htb","handle":"1:","root":true,"offloaded":true,"options":{"r2q":10,"default":"0","direct_packets_stat":0,"direct_qlen":1000,"offload":null}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}}]

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-04 21:54:13 -07:00
Thayne McCombs b7e5002456 ss: always prefer family as part of host condition to default family
ss accepts an address family both with the -f option and as part of a
host condition. However, if the family in the host condition is
different from the last -f option, the family that is actually used
depends on the order in which the families are checked.

This changes parse_hostcond to check all family prefixes before parsing
the rest of the address, so that the host condition's family always has
a higher priority than the "preferred" family.
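A sketch of the "explicit prefix first" rule. The enum and hostcond_family_sketch() are hypothetical simplifications: real ss works with AF_* values and the -f family list. An explicit "family:" prefix always wins, and the preferred family is only a fallback.

```c
#include <string.h>

/* Family identifiers for this sketch only (not the AF_* constants). */
enum fam_sketch {
	FAM_INET, FAM_INET6, FAM_UNIX, FAM_LINK, FAM_NETLINK, FAM_VSOCK,
};

/* Check every explicit "family:" prefix before falling back to the
 * preferred (-f) family. On a match, *addr is advanced past it. */
static enum fam_sketch hostcond_family_sketch(const char **addr,
					      enum fam_sketch preferred)
{
	static const struct {
		const char *prefix;
		enum fam_sketch fam;
	} prefixes[] = {
		{ "inet:", FAM_INET },       { "inet6:", FAM_INET6 },
		{ "unix:", FAM_UNIX },       { "link:", FAM_LINK },
		{ "netlink:", FAM_NETLINK }, { "vsock:", FAM_VSOCK },
	};
	size_t i;

	for (i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]); i++) {
		size_t n = strlen(prefixes[i].prefix);

		if (strncmp(*addr, prefixes[i].prefix, n) == 0) {
			*addr += n;	/* skip "family:" */
			return prefixes[i].fam;
		}
	}
	return preferred;
}
```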

Signed-off-by: Thayne McCombs <astrothayne@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-04 21:48:16 -07:00
Stephen Hemminger 2741208502 uapi: pick up rpl.h fix
Upstream change to fix byte order issues.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-03 08:16:16 -08:00
Luca Boccassi 5a37254b71 iproute: force rtm_dst_len to 32/128
Since NETLINK_GET_STRICT_CHK was enabled, the kernel rejects commands
that pass a prefix length, e.g.:

 ip route get 1.0.0.0/1
  Error: ipv4: Invalid values in header for route get request.
 ip route get 0.0.0.0/0
  Error: ipv4: rtm_src_len and rtm_dst_len must be 32 for IPv4

Since there's no point in setting a rtm_dst_len that we know is going
to be rejected, just force it to the right value if it's passed on
the command line. Print a warning to stderr to notify users.
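A sketch of the forcing rule, assuming only IPv4 and IPv6. route_get_dst_len_sketch() is hypothetical. The warning text is illustrative and is not the message ip actually prints.

```c
#include <stdio.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

/* Sketch: "ip route get" must send a full-length destination, so any
 * other prefix length from the command line is overridden, with a
 * warning on stderr. Families other than AF_INET6 are treated as IPv4. */
static unsigned char route_get_dst_len_sketch(int family, unsigned char requested)
{
	unsigned char full = (family == AF_INET6) ? 128 : 32;

	if (requested != full)
		fprintf(stderr, "Warning: prefix length /%u ignored, using /%u\n",
			(unsigned int)requested, (unsigned int)full);
	return full;
}
```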

Bug-Debian: https://bugs.debian.org/944730
Reported-By: Clément 'wxcafé' Hertling <wxcafe@wxcafe.net>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-02 14:32:47 -08:00
Thayne McCombs 38957a2f6c ss: Add clarification about host conditions with multiple families to man
While documenting expressions, I ran into an interesting case: if you
use two different family types in an expression, such as
`ss 'sport inet:ssh or src unix:/run/*'`, you only get the results for
one address family (in this case unix sockets).

The reason is that in parse_hostcond if the family is specified we
remove any previously added families from filter->families, and
preserve the "states" if any states are set. I tried changing this to
not reset the families, but ran into some issues with Invalid Argument
errors in inet_show_netlink, I think related to the state.

I can dig into that more if supporting this is useful, but I'm not sure
if these types of expressions would actually be useful in practice. Or
perhaps an error should be given if an expression contains conditions
with multiple families (besides inet and inet6)?

Anyway, for now, this patch just notes the limitation in the man page.

Signed-off-by: Thayne McCombs <astrothayne@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-02 14:30:40 -08:00
Thayne McCombs df361a27c2 Add documentation of ss filter to man page
This adds documentation of the syntax of the FILTER argument of the
ss command to the ss(8) man page.

Signed-off-by: Thayne McCombs <astrothayne@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-02 14:24:03 -08:00
Edwin Peer 9764761888 iplink: print warning for missing VF data
The kernel might truncate VF info in IFLA_VFINFO_LIST. Compare the
expected number of VFs in IFLA_NUM_VF to how many were found in the
list and warn accordingly.

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-02 14:18:42 -08:00
Paolo Abeni 3d6d9e6e67 ss: do not emit warn while dumping MPTCP on old kernels
Prior to this commit, running 'ss' on a kernel older than v5.9
prints an error message:

RTNETLINK answers: Invalid argument

When asked to dump a protocol number > 255 (that is, MPTCP), 'ss'
adds an INET_DIAG_REQ_PROTOCOL attribute, which older kernels do not
support.

Avoid the warning by ignoring filter issues when INET_DIAG_REQ_PROTOCOL
is used.

Additionally, older kernels end up invoking tcpdiag_send(), which
in turn tries to dump DCCP sockets. Bail out early in that function,
as the kernel does not implement an MPTCPDIAG_GET request.

Reported-by: "Rantala, Tommi T. (Nokia - FI/Espoo)" <tommi.t.rantala@nokia.com>
Fixes: 9c3be2c0ee ("ss: mptcp: add msk diag interface support")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-02 14:17:14 -08:00
Vladimir Oltean 4712a46174 man: tc-taprio.8: document the full offload feature
Since this feature's introduction in commit 9c66d1564676 ("taprio: Add
support for hardware offloading") from kernel v5.4, it never got
documented in the man pages. Due to this reason, we see customer reports
of seemingly contradictory information: the community manpages claim
there is no support for full offload, nonetheless many silicon vendors
have already implemented it.

This patch documents the full offload feature (enabled by specifying
"flags 2" to the taprio qdisc) and gives one more example that tries to
illustrate some of the finer points related to the usage.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-02 14:12:27 -08:00
Guillaume Nault 86d9660dc1 iplink_bareudp: cleanup help message and man page
 * Fix PROTO description in help message (mpls isn't a valid argument).

 * Remove SRCPORTMIN description from help message since it doesn't
   appear in the syntax string.

 * Use same keywords in help message and in man page.

 * Use the "ethertype" option name (.B ethertype) rather than the
   option value (.I ETHERTYPE) in the man page description of
   [no]multiproto.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-02 14:11:32 -08:00
David Ahern d10f2a4bd8 Merge branch 'devlink-port-mgmt' into next
Parav Pandit  says:

====================

This patchset implements devlink port add, delete and function state
management commands.

An example sequence for a PCI SF:

Set the device in switchdev mode:
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

View ports in switchdev mode:
$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

Add a subfunction port for PCI PF 0 with sfnumber 88:
$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:08:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

Show a newly added port:
$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

Set the function state to active:
$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88 state active

Show the port in JSON format:
$ devlink port show pci/0000:06:00.0/32768 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "active",
                "opstate": "attached"
            }
        }
    }
}

Set the function state to inactive:
$ devlink port function set pci/0000:06:00.0/32768 state inactive

Delete the port after use:
$ devlink port del pci/0000:06:00.0/32768

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-02 02:45:49 +00:00
Parav Pandit bdfb9f1bd6 devlink: Support set of port function state
Support set operation of the devlink port function state.

Example of a PCI SF port function which supports the state:

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:08:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88 state active

$ devlink port show pci/0000:06:00.0/32768 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "active",
                "opstate": "attached"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-02 02:06:48 +00:00
Parav Pandit 249465d3bf devlink: Support get port function state
Print port function state and operational state whenever reported by
the kernel.

Example of a PCI SF port function which supports the state:

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:08:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88

$ devlink port show pci/0000:06:00.0/32768 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "inactive",
                "opstate": "detached"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-02 02:06:41 +00:00
Parav Pandit 331bf89ad0 devlink: Supporting add and delete of devlink port
Enable the user to add and delete a devlink port.

Examples of add, show and delete commands for one SF port:
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

Add a devlink port of flavour 'pcisf' for PF number 0, SF number 88:

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

Delete the newly added devlink port:
$ devlink port del pci/0000:06:00.0/32768

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-02 02:06:36 +00:00
Parav Pandit 836a1365b7 devlink: Introduce PCI SF port flavour and attribute
Introduce PCI SF port flavour and port attributes such as PF
number and SF number.

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:08:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88 state active

$ devlink port show pci/0000:06:00.0/32768 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "active",
                "opstate": "attached"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-02 02:06:30 +00:00
Parav Pandit a9642c5fa6 devlink: Introduce and use string to number mapper
Instead of using static mapping in code, introduce a helper routine to
map a value to string.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-02 02:01:53 +00:00
David Ahern 1e61902180 Update kernel headers
Update kernel headers to commit:
    14e8e0f60088 ("tcp: shrink inet_connection_sock icsk_mtup enabled and probe_size")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-02 01:58:51 +00:00
Oliver Hartkopp 2ce313d1bb iplink_can: add Classical CAN frame LEN8_DLC support
The len8_dlc element is filled by the CAN interface driver and used for CAN
frame creation by the CAN driver when the CAN_CTRLMODE_CC_LEN8_DLC flag is
supported by the driver and enabled via netlink configuration interface.

Add command line support for cc-len8-dlc (Linux 5.11+).

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-29 15:49:23 +00:00
Jarod Wilson 7887500008 bond: support xmit_hash_policy=vlan+srcmac
There's a new transmit hash policy being added to the bonding driver that
is a simple XOR of vlan ID and source MAC, xmit_hash_policy vlan+srcmac.
This trivial patch makes it configurable and queryable via iproute2.

$ sudo modprobe bonding mode=2 max_bonds=1 xmit_hash_policy=0

$ sudo ip link set bond0 type bond xmit_hash_policy vlan+srcmac

$ ip -d link show bond0
11: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ce:85:5e:24:ce:90 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
    bond mode balance-xor miimon 0 updelay 0 downdelay 0 peer_notify_delay 0 use_carrier 1 arp_interval 0 arp_validate none arp_all_targets any primary_reselect always fail_over_mac none xmit_hash_policy vlan+srcmac resend_igmp 1 num_grat_arp 1 all_slaves_active 0 min_links 0 lp_interval 1 packets_per_slave 1 lacp_rate slow ad_select stable tlb_dynamic_lb 1 addrgenmode eui64 numtxqueues 16 numrxqueues 16 gso_max_size 65536 gso_max_segs 65535

$ grep Hash /proc/net/bonding/bond0
Transmit Hash Policy: vlan+srcmac (5)

$ sudo ip link add test type bond help
Usage: ... bond [ mode BONDMODE ] [ active_slave SLAVE_DEV ]
                [ clear_active_slave ] [ miimon MIIMON ]
                [ updelay UPDELAY ] [ downdelay DOWNDELAY ]
                [ peer_notify_delay DELAY ]
                [ use_carrier USE_CARRIER ]
                [ arp_interval ARP_INTERVAL ]
                [ arp_validate ARP_VALIDATE ]
                [ arp_all_targets ARP_ALL_TARGETS ]
                [ arp_ip_target [ ARP_IP_TARGET, ... ] ]
                [ primary SLAVE_DEV ]
                [ primary_reselect PRIMARY_RESELECT ]
                [ fail_over_mac FAIL_OVER_MAC ]
                [ xmit_hash_policy XMIT_HASH_POLICY ]
                [ resend_igmp RESEND_IGMP ]
                [ num_grat_arp|num_unsol_na NUM_GRAT_ARP|NUM_UNSOL_NA ]
                [ all_slaves_active ALL_SLAVES_ACTIVE ]
                [ min_links MIN_LINKS ]
                [ lp_interval LP_INTERVAL ]
                [ packets_per_slave PACKETS_PER_SLAVE ]
                [ tlb_dynamic_lb TLB_DYNAMIC_LB ]
                [ lacp_rate LACP_RATE ]
                [ ad_select AD_SELECT ]
                [ ad_user_port_key PORTKEY ]
                [ ad_actor_sys_prio SYSPRIO ]
                [ ad_actor_system LLADDR ]

BONDMODE := balance-rr|active-backup|balance-xor|broadcast|802.3ad|balance-tlb|balance-alb
ARP_VALIDATE := none|active|backup|all
ARP_ALL_TARGETS := any|all
PRIMARY_RESELECT := always|better|failure
FAIL_OVER_MAC := none|active|follow
XMIT_HASH_POLICY := layer2|layer2+3|layer3+4|encap2+3|encap3+4|vlan+srcmac
LACP_RATE := slow|fast
AD_SELECT := stable|bandwidth|count
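A sketch of what a vlan+srcmac style hash could compute. The commit text only says "a simple XOR of vlan ID and source MAC". Two details here are assumptions: splitting the MAC into its first and last three bytes before XORing, and picking a slave by hash modulo the slave count. Both function names are hypothetical.

```c
#include <stdint.h>

/* Sketch: XOR the two 3-byte halves of the source MAC and, for
 * VLAN-tagged frames, the VLAN ID. */
static uint32_t vlan_srcmac_hash_sketch(const uint8_t src[6], int tagged,
					uint16_t vlan_id)
{
	uint32_t vendor = 0, dev = 0;
	int i;

	for (i = 0; i < 3; i++)
		vendor = (vendor << 8) | src[i];
	for (i = 3; i < 6; i++)
		dev = (dev << 8) | src[i];

	return tagged ? (vendor ^ dev ^ vlan_id) : (vendor ^ dev);
}

/* Illustrative only: map the hash onto one of n_slaves slaves. */
static unsigned int pick_slave_sketch(uint32_t hash, unsigned int n_slaves)
{
	return n_slaves ? hash % n_slaves : 0;
}
```

With this scheme, all traffic from one source MAC on one VLAN stays on the same slave.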

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-23 18:33:15 +00:00
wenxu c94fd71b34 tc: flower: add tc conntrack inv ct_state support
Matches on conntrack inv ct_state.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-23 18:16:35 +00:00
David Ahern c81a173f6b Update kernel headers
Update kernel headers to commit:
    59a49d9617e2 ("Merge branch 'mlxsw-expose-number-of-physical-ports'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-23 18:15:57 +00:00
Luca Boccassi 8498ca92d7 vrf: fix ip vrf exec with libbpf
The size of the bpf_insn array in bytes is passed to bpf_load_program()
instead of the number of instructions it expects, so ip vrf exec fails with:

$ sudo ip link add vrf-blue type vrf table 10
$ sudo ip link set dev vrf-blue up
$ sudo ip/ip vrf exec vrf-blue ls
Failed to load BPF prog: 'Invalid argument'
last insn is not an exit or jmp
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
Kernel compiled with CGROUP_BPF enabled?
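A sketch of the byte-size versus element-count mix-up. struct insn_sketch is a hypothetical 8-byte stand-in for struct bpf_insn, which packs the registers into one byte with bitfields. The two opcodes encode "r0 = 1" and "exit". A byte size of 16 read as an instruction count makes the loader consume 16 instructions, far past the real exit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-byte instruction, laid out like struct bpf_insn. */
struct insn_sketch {
	uint8_t code;
	uint8_t regs;
	int16_t off;
	int32_t imm;
};

#define ARRAY_SIZE_SKETCH(a) (sizeof(a) / sizeof((a)[0]))

static const struct insn_sketch prog_sketch[] = {
	{ 0xb7, 0x00, 0, 1 },	/* r0 = 1 (BPF_ALU64 | BPF_MOV | BPF_K) */
	{ 0x95, 0x00, 0, 0 },	/* exit   (BPF_JMP | BPF_EXIT) */
};

/* Wrong for an instruction-count parameter: this is a byte size. */
static size_t prog_len_bytes_sketch(void)
{
	return sizeof(prog_sketch);
}

/* Right: the number of instructions. */
static size_t prog_len_insns_sketch(void)
{
	return ARRAY_SIZE_SKETCH(prog_sketch);
}
```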

https://bugs.debian.org/980046

Reported-by: Emmanuel DECAEN <Emmanuel.Decaen@xsalto.com>

Signed-off-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-18 12:32:17 -08:00
Luca Boccassi 8dca565b17 vrf: print BPF log buffer if bpf_program_load fails
This is necessary to understand what is going on when bpf_program_load fails.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-18 12:32:11 -08:00
Roi Dayan 1a22ad2721 build: Fix link errors on some systems
Since get_rate() and get_size() were moved from tc to lib, linking
fails on some systems because the math library is missing.
Move the functions that require libm to their own C file
and add -lm to dcb, which now uses those functions.

../lib/libutil.a(utils.o): In function `get_rate':
utils.c:(.text+0x10dc): undefined reference to `floor'
../lib/libutil.a(utils.o): In function `get_size':
utils.c:(.text+0x1394): undefined reference to `floor'
../lib/libutil.a(json_print.o): In function `sprint_size':
json_print.c:(.text+0x14c0): undefined reference to `rint'
json_print.c:(.text+0x14f4): undefined reference to `rint'
json_print.c:(.text+0x157c): undefined reference to `rint'

Fixes: f3be0e6366 ("lib: Move get_rate(), get_rate64() from tc here")
Fixes: 44396bdfcc ("lib: Move get_size() from tc here")
Fixes: adbe5de966 ("lib: Move sprint_size() from tc here, add print_size()")

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Tested-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-18 12:28:47 -08:00
David Ahern b553cffa9f Merge branch 'dcb-app-dcbx' into next
Petr Machata  says:

====================

Add support to the dcb tool for the following two DCB objects:

- APP, which allows configuration of traffic prioritization rules based on
  several possible packet headers.

- DCBX, which is a 1-byte bitfield of flags that configure whether the DCBX
  protocol is implemented in the device or in the host, and which version
  of the protocol should be used.

Patch #1 adds a new helper for finding a name of a given dsfield value.
This is useful for APP DSCP-to-priority rules, which can use human-readable
DSCP names.

Patches #2, #3 and #4 extend existing interfaces for, respectively, parsing
of the X:Y mappings, for setting a DCB object, and for getting a DCB
object.

In patch #5, support for the command line argument -N / --Numeric is
added. The APP tool later uses it to decide whether to format DSCP values
as human-readable strings or as plain numbers.

Patches #6 and #7 add the subtools themselves and their man pages.

v2:
- Two patches dropped and sent to iproute2 branch as "dcb: Fixes".
  This patch set now depends on that one.
- Patch #5:
    - Make it -N / --Numeric instead of -n / --no-nice-names
    - Rename the flag from no_nice_names to numeric as well
- Patch #6:
    - Adjust to s/no_nice_names/numeric/ from another patch.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:10:27 +00:00
Petr Machata 89d11ea596 dcb: Add a subtool for the DCBX object
The Linux DCBX object is a 1-byte bitfield of flags that configure whether
the DCBX protocol is implemented in the device or in the host, and which
version of the protocol should be used. Add a tool to access the per-port
Linux DCBX object.

For example:

	# dcb dcbx set dev eni1np1 host ieee
	# dcb dcbx show dev eni1np1
	host ieee
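The DCBX object is just a byte of flags, so showing it amounts to testing bits. The minimal sketch below decodes such a byte. It uses the DCB_CAP_DCBX_* bit values from linux/dcbnl.h, redefined locally so the example is self-contained. The helper dcbx_format() is hypothetical. The keywords "lld-managed" and "static" are assumed spellings and do not come from the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Bit values as defined in linux/dcbnl.h. */
#define DCB_CAP_DCBX_HOST        0x01
#define DCB_CAP_DCBX_LLD_MANAGED 0x02
#define DCB_CAP_DCBX_VER_CEE     0x04
#define DCB_CAP_DCBX_VER_IEEE    0x08
#define DCB_CAP_DCBX_STATIC      0x10

static const struct {
	unsigned char bit;
	const char *name;
} dcbx_flags[] = {
	{ DCB_CAP_DCBX_HOST,        "host" },
	{ DCB_CAP_DCBX_LLD_MANAGED, "lld-managed" },
	{ DCB_CAP_DCBX_VER_CEE,     "cee" },
	{ DCB_CAP_DCBX_VER_IEEE,    "ieee" },
	{ DCB_CAP_DCBX_STATIC,      "static" },
};

/* Write the names of all set flags, space-separated, into buf. */
static void dcbx_format(unsigned char dcbx, char *buf, size_t len)
{
	size_t used = 0;

	assert(len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(dcbx_flags) / sizeof(dcbx_flags[0]); i++) {
		if (!(dcbx & dcbx_flags[i].bit))
			continue;
		int n = snprintf(buf + used, len - used, "%s%s",
				 used ? " " : "", dcbx_flags[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* out of room; buf stays NUL-terminated */
		used += (size_t)n;
	}
}
```

With this table, "host ieee" from the example corresponds to the byte 0x09 (0x01 | 0x08).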

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:09:29 +00:00
Petr Machata 8e9bed1493 dcb: Add a subtool for the DCB APP object
DCB APP interfaces are standardized in 802.1q-2018, and allow configuration
of traffic prioritization rules based on several possible headers.

Add a dcb subtool for maintenance and display of the APP table. For
example:

    # dcb app add dev eni1np1 dscp-prio 0:0 CS3:3 CS6:6
    # dcb app show dev eni1np1
    dscp-prio 0:0 CS3:3 CS6:6
    # dcb app add dev eni1np1 dscp-prio CS3:4
    # dcb app show dev eni1np1
    dscp-prio 0:0 CS3:3 CS3:4 CS6:6
    # dcb app replace dev eni1np1 dscp-prio CS3:5
    # dcb app show dev eni1np1
    dscp-prio 0:0 CS3:5 CS6:6
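The example shows how the two verbs differ. "add" keeps the existing mappings for a key and adds another one. "replace" first drops every mapping for that key. The minimal sketch below models those semantics on an in-memory table of DSCP-to-priority pairs. It assumes CS3 = DSCP 24 and CS6 = DSCP 48, since class selector n is DSCP 8n. The table type and the functions app_add() and app_replace() are hypothetical; the real tool applies the changes through netlink.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define APP_MAX 64

struct app_entry { unsigned char dscp, prio; };
struct app_table { struct app_entry e[APP_MAX]; size_t n; };

static bool app_has(const struct app_table *t, unsigned char dscp, unsigned char prio)
{
	for (size_t i = 0; i < t->n; i++)
		if (t->e[i].dscp == dscp && t->e[i].prio == prio)
			return true;
	return false;
}

/* "add": insert the pair unless present; keep entries sorted by (dscp, prio). */
static bool app_add(struct app_table *t, unsigned char dscp, unsigned char prio)
{
	size_t i;

	if (app_has(t, dscp, prio))
		return true;
	if (t->n == APP_MAX)
		return false;
	for (i = t->n; i > 0; i--) {
		const struct app_entry *p = &t->e[i - 1];

		if (p->dscp < dscp || (p->dscp == dscp && p->prio < prio))
			break;
		t->e[i] = t->e[i - 1];	/* shift larger entries up */
	}
	t->e[i] = (struct app_entry){ dscp, prio };
	t->n++;
	return true;
}

/* "replace": drop every pair with this DSCP key, then add the new one. */
static bool app_replace(struct app_table *t, unsigned char dscp, unsigned char prio)
{
	size_t w = 0;

	for (size_t r = 0; r < t->n; r++)
		if (t->e[r].dscp != dscp)
			t->e[w++] = t->e[r];
	t->n = w;
	return app_add(t, dscp, prio);
}
```

Replaying the command sequence above goes from {0:0, 24:3, 48:6} to {0:0, 24:3, 24:4, 48:6} after the add. After the replace, the table is {0:0, 24:5, 48:6}.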

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:09:29 +00:00
Petr Machata 0aebd32b82 dcb: Support -N to suppress translation to human-readable names
Some DSCP values can be translated to symbolic names. That may not
always be desirable. Introduce a command-line option similar to other tools,
-N or --Numeric, to suppress this translation.
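The effect of such a switch on formatting can be sketched in a few lines. This sketch assumes that only the class selectors CS1 through CS7 (DSCP 8n) have names, which matches "0:0" being printed numerically in the APP example. The real tool takes its names from the dsfield name table. dscp_name() and dscp_format() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical name table: only CS1..CS7 (DSCP 8, 16, ..., 56) are named. */
static const char *dscp_name(unsigned int dscp)
{
	static const char *const cs[] = {
		NULL, "CS1", "CS2", "CS3", "CS4", "CS5", "CS6", "CS7",
	};

	if (dscp > 63 || dscp % 8 != 0)
		return NULL;
	return cs[dscp / 8];
}

/* With numeric set (as with -N), never translate to a name. */
static void dscp_format(unsigned int dscp, bool numeric, char *buf, size_t len)
{
	const char *name = numeric ? NULL : dscp_name(dscp);

	if (name)
		snprintf(buf, len, "%s", name);
	else
		snprintf(buf, len, "%u", dscp);
}
```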

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:09:29 +00:00
Petr Machata e59876ff55 dcb: Generalize dcb_get_attribute()
The function dcb_get_attribute() assumes that the caller knows the exact
size of the looked-for payload. It also assumes that the response comes
wrapped in a DCB_ATTR_IEEE nest. The former assumption does not hold for
the IEEE APP table, which has variable size. The latter one does not hold
for DCBX, which is not IEEE-nested, and also for any CEE attributes, which
would come CEE-nested.

Factor out the payload extractor from the current dcb_get_attribute() code,
and put it into a helper. Then rewrite dcb_get_attribute() compatibly in terms
of the new function. Introduce dcb_get_attribute_va() as a thin wrapper for
IEEE-nested access, and dcb_get_attribute_bare() for access to attributes
that are not nested.
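A plain netlink-style attribute walker shows the difference between nested and bare access. This sketch does not use libmnl. It defines its own struct attr with the same 4-byte header and 4-byte alignment as struct nlattr. attr_find() and attr_find_nested() are hypothetical stand-ins that show the idea, not the dcb helpers themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as struct nlattr: total length (header + payload, no
 * trailing padding), then the attribute type. */
struct attr {
	uint16_t len;
	uint16_t type;
};

#define ATTR_ALIGN(n) (((size_t)(n) + 3) & ~(size_t)3)
#define ATTR_HDRLEN   ATTR_ALIGN(sizeof(struct attr))

/* Bare lookup: scan a flat run of attributes for one type. */
static const void *attr_find(const void *buf, size_t buflen, uint16_t type,
			     uint16_t *payload_len)
{
	const unsigned char *p = buf;

	while (buflen >= ATTR_HDRLEN) {
		struct attr a;
		size_t step;

		memcpy(&a, p, sizeof(a));	/* avoid unaligned access */
		if ((size_t)a.len < ATTR_HDRLEN || (size_t)a.len > buflen)
			return NULL;		/* malformed attribute */
		if (a.type == type) {
			*payload_len = (uint16_t)(a.len - ATTR_HDRLEN);
			return p + ATTR_HDRLEN;
		}
		step = ATTR_ALIGN(a.len);
		if (step >= buflen)
			break;			/* that was the last attribute */
		p += step;
		buflen -= step;
	}
	return NULL;
}

/* Nested lookup: find the container first, then search its payload. */
static const void *attr_find_nested(const void *buf, size_t buflen, uint16_t nest,
				    uint16_t type, uint16_t *payload_len)
{
	uint16_t nest_len;
	const void *inner = attr_find(buf, buflen, nest, &nest_len);

	return inner ? attr_find(inner, nest_len, type, payload_len) : NULL;
}
```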

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:09:29 +00:00
Petr Machata 69290c32dc dcb: Generalize dcb_set_attribute()
The function dcb_set_attribute() takes a fully-formed payload as an
argument. For callers that need to build a nested attribute, such as is the
case for DCB APP table, this is not great, because with libmnl, they would
need to construct a separate netlink message just to pluck out the payload
and hand it over to this function.

Currently, dcb_set_attribute() also always wraps the payload in a
DCB_ATTR_IEEE container, because that is what all the dcb subtools so far
needed. But that is not appropriate for DCBX in particular, and in fact a
handful of other attributes, as well as any CEE payloads.

Instead, generalize this code by adding parameters for constructing a
custom payload and for fetching the response from a custom response
attribute. Then add dcb_set_attribute_va(), which takes a callback to
invoke in the right place for the nest to be built, and
dcb_set_attribute_bare(), which is similar to dcb_set_attribute(), but does
not encapsulate the payload in an IEEE container. Rewrite
dcb_set_attribute() compatibly in terms of the new functions.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:09:29 +00:00
Petr Machata c13216f7a6 lib: Generalize parse_mapping()
The function parse_mapping() assumes the key is a number, with a single
configurable exception, which is using "all" to mean "all possible keys".
If a caller wishes to use symbolic names instead of numbers, they cannot
reuse this function.

To facilitate reuse in these situations, convert parse_mapping() into a
helper, parse_mapping_gen(), which instead of an allow-all boolean takes a
generic key-parsing callback. Rewrite parse_mapping() in terms of this
newly-added helper and add a pair of key parsers, one for just numbers,
another for numbers and the keyword "all". Publish the latter as well.
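The minimal sketch below shows the callback-based design. It splits each KEY:VALUE token at the colon and passes the key to the caller's parser. That parser may select several keys at once, as "all" does, and the value is stored for each of them. The names parse_mapping_sketch(), key_parse_fn, parse_key_num() and parse_key_num_all() are hypothetical, and so are their signatures; they are not the ones in lib/utils.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_KEYS 8

/* Key parser: set a mask bit for every key the text denotes. */
typedef bool (*key_parse_fn)(const char *text, unsigned int *mask);

static bool parse_key_num(const char *text, unsigned int *mask)
{
	char *end;
	unsigned long k = strtoul(text, &end, 10);

	if (end == text || *end != '\0' || k >= MAX_KEYS)
		return false;
	*mask = 1u << k;
	return true;
}

static bool parse_key_num_all(const char *text, unsigned int *mask)
{
	if (strcmp(text, "all") == 0) {
		*mask = (1u << MAX_KEYS) - 1;
		return true;
	}
	return parse_key_num(text, mask);
}

/* Parse "KEY:VALUE" into values[], delegating the key to parse_key. */
static bool parse_mapping_sketch(const char *arg, key_parse_fn parse_key,
				 unsigned long values[MAX_KEYS])
{
	const char *colon = strchr(arg, ':');
	char key[32];
	unsigned int mask;
	char *end;

	if (!colon || (size_t)(colon - arg) >= sizeof(key))
		return false;
	memcpy(key, arg, (size_t)(colon - arg));
	key[colon - arg] = '\0';
	if (!parse_key(key, &mask))
		return false;

	unsigned long v = strtoul(colon + 1, &end, 10);
	if (end == colon + 1 || *end != '\0')
		return false;
	for (unsigned int k = 0; k < MAX_KEYS; k++)
		if (mask & (1u << k))
			values[k] = v;
	return true;
}
```

A caller that wants symbolic keys, such as DSCP names, would pass its own key_parse_fn without touching the splitting logic.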

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:09:29 +00:00
Petr Machata bf244ee677 lib: rt_names: Add rtnl_dsfield_get_name()
For formatting DSCP (not full dsfield), it would be handy to be able to
just get the name from the name table, and not get any of the remaining
cruft related to formatting. Add a new entry point to just fetch the
name table string uninterpreted. Use it from rtnl_dsfield_n2a().

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:09:29 +00:00
David Ahern fa2881b664 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 03:57:29 +00:00
Guillaume Nault 676a1a708f tc: flower: fix json output with mpls lse
The json output of the TCA_FLOWER_KEY_MPLS_OPTS attribute was invalid.

Example:

  $ tc filter add dev eth0 ingress protocol mpls_uc flower mpls \
      lse depth 1 label 100                                     \
      lse depth 2 label 200

  $ tc -json filter show dev eth0 ingress
    ...{"eth_type":"8847",
        "  mpls":["    lse":["depth":1,"label":100],
                  "    lse":["depth":2,"label":200]]}...

This is invalid as the arrays, introduced by "[", can't contain raw
string:value pairs. Those must be enclosed into "{}" to form valid json
objects. Also, there is spurious whitespace before the mpls and lse
strings because of the indentation used for normal output.

Fix this by putting all LSE parameters (depth, label, tc, bos and ttl)
into the same json object. The "mpls" key now directly contains a list
of such objects.

Also, handle strings differently for normal and json output, so that
json strings don't get spurious indentation whitespace.

Normal output isn't modified.
The json output now looks like:

  $ tc -json filter show dev eth0 ingress
    ...{"eth_type":"8847",
        "mpls":[{"depth":1,"label":100},
                {"depth":2,"label":200}]}...
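The fix is easiest to see in isolation. Each label stack entry becomes one JSON object, and the array holds those objects. The sketch below formats that shape by hand with snprintf() and covers only depth and label. The struct lse type and lse_list_to_json() are hypothetical; the real code goes through iproute2's json_print helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct lse { unsigned int depth, label; };

/* Emit "mpls":[{"depth":D,"label":L},...]; false if buf is too small. */
static bool lse_list_to_json(const struct lse *lse, size_t n, char *buf, size_t len)
{
	size_t used;
	int w = snprintf(buf, len, "\"mpls\":[");

	if (w < 0 || (size_t)w >= len)
		return false;
	used = (size_t)w;
	for (size_t i = 0; i < n; i++) {
		/* one object per LSE, comma-separated inside the array */
		w = snprintf(buf + used, len - used, "%s{\"depth\":%u,\"label\":%u}",
			     i ? "," : "", lse[i].depth, lse[i].label);
		if (w < 0 || (size_t)w >= len - used)
			return false;
		used += (size_t)w;
	}
	w = snprintf(buf + used, len - used, "]");
	return w >= 0 && (size_t)w < len - used;
}
```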

Fixes: eb09a15c12 ("tc: flower: support multiple MPLS LSE match")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-16 09:13:36 -08:00
Petr Machata 934919b991 dcb: Change --Netns/-N to --netns/-n
This keeps dcb compatible with the major tools, ip and tc. Also
document the option in the man page, where it was missing.

Fixes: 67033d1c1c ("Add skeleton of a new tool, dcb")
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-16 09:12:15 -08:00
Petr Machata b4c0cad06e dcb: Plug a leaking DCB socket buffer
The DCB socket buffer is allocated in dcb_init(), but never freed. Free it
in dcb_fini().

Fixes: 67033d1c1c ("Add skeleton of a new tool, dcb")
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-16 09:12:15 -08:00
Petr Machata 2e99c28161 dcb: Set values with RTM_SETDCB type
dcb currently sends all netlink messages with type RTM_GETDCB, even the
ones that set values. Change to the appropriate type.

Fixes: 67033d1c1c ("Add skeleton of a new tool, dcb")
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-16 09:12:15 -08:00
Stephen Hemminger 8b4b132261 uapi: update if_link.h from upstream
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-16 09:09:35 -08:00
Petr Machata ffe58c9185 include: uapi: Carry dcbnl.h
To allow building a new suite of DCB tools on an older kernel, carry a copy
of dcbnl.h.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-16 09:09:28 -08:00
Patrisious Haddad 537995c6d5 rdma: Add support for the netlink extack
Add support in rdma for receiving netlink extack errors sent by the
kernel, so that extack error messages from the kernel are now printed
for the user.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-10 17:18:42 +00:00
Ido Schimmel 9bd498bfcd ipmonitor: Mention "nexthop" object in help and man page
Before:

 # ip monitor help
 Usage: ip monitor [ all | LISTofOBJECTS ] [ FILE ] [ label ] [all-nsid] [dev DEVICE]
 LISTofOBJECTS := link | address | route | mroute | prefix |
                  neigh | netconf | rule | nsid
 FILE := file FILENAME

After:

 # ip monitor help
 Usage: ip monitor [ all | LISTofOBJECTS ] [ FILE ] [ label ] [all-nsid] [dev DEVICE]
 LISTofOBJECTS := link | address | route | mroute | prefix |
                  neigh | netconf | rule | nsid | nexthop
 FILE := file FILENAME

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-10 17:17:32 +00:00
Ido Schimmel 043e03a369 nexthop: Fix usage output
Before:

 # ip nexthop help
 Usage: ip nexthop { list | flush } [ protocol ID ] SELECTOR
        ip nexthop { add | replace } id ID NH [ protocol ID ]
        ip nexthop { get| del } id ID
 SELECTOR := [ id ID ] [ dev DEV ] [ vrf NAME ] [ master DEV ]
             [ groups ] [ fdb ]
 NH := { blackhole | [ via ADDRESS ] [ dev DEV ] [ onlink ]
       [ encap ENCAPTYPE ENCAPHDR ] | group GROUP ] }
 GROUP := [ id[,weight]>/<id[,weight]>/... ]
 ENCAPTYPE := [ mpls ]
 ENCAPHDR := [ MPLSLABEL ]

After:

 # ip nexthop help
 Usage: ip nexthop { list | flush } [ protocol ID ] SELECTOR
        ip nexthop { add | replace } id ID NH [ protocol ID ]
        ip nexthop { get | del } id ID
 SELECTOR := [ id ID ] [ dev DEV ] [ vrf NAME ] [ master DEV ]
             [ groups ] [ fdb ]
 NH := { blackhole | [ via ADDRESS ] [ dev DEV ] [ onlink ]
         [ encap ENCAPTYPE ENCAPHDR ] | group GROUP [ fdb ] }
 GROUP := [ <id[,weight]>/<id[,weight]>/... ]
 ENCAPTYPE := [ mpls ]
 ENCAPHDR := [ MPLSLABEL ]

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-10 17:14:08 +00:00
Stephen Hemminger 2953235e61 uapi: update kernel headers to 5.11 pre rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-24 19:38:35 -08:00
Stephen Hemminger 2639bdc176 Merge git://git.kernel.org/pub/scm/network/iproute2/iproute2-next into main 2020-12-24 19:29:15 -08:00
Stephen Hemminger c9c64b8d1e 5.10.0 2020-12-21 10:28:53 -08:00
Guillaume Nault cb0debfe2d testsuite: Add mpls packet matching tests for tc flower
Match all MPLS fields using smallest and highest possible values.
Test the two ways of specifying MPLS header matching:

  * with the basic mpls_{label,tc,bos,ttl} keywords (match only on the
    first LSE),

  * with the more generic "lse" keyword (allows matching at different
    depths of the MPLS label stack).

This test file makes it possible to find problems like the one fixed by
Linux commit 7fdd375e3830 ("net: sched: Fix dump of MPLS_OPT_LSE_LABEL
attribute in cls_flower").

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-16 04:14:26 +00:00
David Ahern c01dec8475 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-16 04:06:06 +00:00
Thomas Karlsson 42f5642a40 iplink:macvlan: Added bcqueuelen parameter
This patch allows the user to set and retrieve the
IFLA_MACVLAN_BC_QUEUE_LEN parameter via the bcqueuelen
command line argument.

This parameter controls the requested size of the queue for
broadcast and multicast packets in the macvlan driver.

If not specified, the driver default (1000) will be used.

Note: The request is per macvlan, but the queue length actually used
per port is the maximum of any request to any macvlan
connected to the same port.

For this reason, the used queue length IFLA_MACVLAN_BC_QUEUE_LEN_USED
is also retrieved and displayed to help in understanding the setting.
However, it cannot, of course, be set directly.

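The relationship between the two attributes fits in a few lines. The length actually used on a port is the maximum of the lengths requested by the macvlans attached to it. In this sketch, bc_queue_len_used() is a hypothetical helper. Treating a zero entry as "not specified" and replacing it with the driver default of 1000 quoted above is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BC_QUEUE_LEN_DEFAULT 1000u	/* driver default quoted above */

/* requested[i] == 0 means "not specified" for macvlan i (sketch assumption). */
static uint32_t bc_queue_len_used(const uint32_t *requested, size_t n)
{
	uint32_t used = 0;

	for (size_t i = 0; i < n; i++) {
		uint32_t req = requested[i] ? requested[i] : BC_QUEUE_LEN_DEFAULT;

		if (req > used)
			used = req;	/* the largest request on the port wins */
	}
	return used;
}
```

One consequence is that lowering bcqueuelen on a single macvlan has no visible effect while a sibling on the same port requests more.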
Signed-off-by: Thomas Karlsson <thomas.karlsson@paneda.se>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-16 04:02:07 +00:00
Andrea Claudi c8faeca5ad ss: mptcp: fix add_addr_accepted stat print
The add_addr_accepted value is not printed if the add_addr_signal value is 0.
Fix this by checking the add_addr_accepted value instead.

Fixes: 9c3be2c0ee ("ss: mptcp: add msk diag interface support")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-15 13:59:13 -08:00
Andrea Claudi 0d78e8eabf tc: pedit: fix memory leak in print_pedit
keys_ex is dynamically allocated with calloc on line 770, but
is not freed in case of error at line 823.

Fixes: 081d6c310d ("tc: pedit: Support JSON dumping")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-14 09:24:08 -08:00
Andrea Claudi ec1346acbe devlink: fix memory leak in cmd_dev_flash()
nlg_ntf is dynamically allocated in mnlg_socket_open(), and is freed on
the out: return path. However, some error paths do not free it,
resulting in memory leak.

This commit fixes this by using mnlg_socket_close(), and reports the
correct error number when required.

Fixes: 9b13cddfe2 ("devlink: implement flash status monitoring")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-14 09:23:24 -08:00
Andrea Claudi 309e6027e5 man: tc-flower: fix manpage
Commit 924c43778a ("man: tc-ct.8: Add manual page for ct tc action")
added the man page for tc-ct, but it also brought in a bogus block of text
at the beginning of the tc-flower man page.

This commit simply removes it.

Fixes: 924c43778a ("man: tc-ct.8: Add manual page for ct tc action")
Reported-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-14 09:22:53 -08:00
David Ahern ee50fd58dc Merge branch 'dcb-pfc-buffer-maxrate' into next
Petr Machata  says:
====================

Add support to the dcb tool for the following three DCB objects:

- PFC, for "Priority-based Flow Control", allows configuration of priority
  lossiness, and related toggles.

- DCBNL buffer interfaces are an extension to the 802.1q DCB interfaces and
  allow configuration of port headroom buffers.

- DCBNL maxrate interfaces are an extension to the 802.1q DCB interfaces
  and allow configuration of rate with which traffic in a given traffic
  class is sent.

Patches #1-#4 fix small issues in the current DCB code and man pages.

Patch #5 adds new helpers to the DCB dispatcher.

Patches #6 and #7 add support for command line arguments -s and -i. These
enable, respectively, display of statistical counters, and ISO/IEC mode of
rate units.

Patches #8-#10 add the subtools themselves and their man pages.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:48:38 +00:00
Petr Machata 117939d9bd dcb: Add a subtool for the DCB maxrate object
DCBNL maxrate interfaces are an extension to the 802.1q DCB interfaces and
allow configuration of rate with which traffic in a given traffic class is
sent.

Add a dcb subtool to allow showing and tweaking of this per-TC maximum
rate. For example:

    # dcb maxrate show dev eni1np1
    tc-maxrate 0:25Gbit 1:25Gbit 2:25Gbit 3:25Gbit 4:25Gbit 5:25Gbit 6:100Gbit 7:25Gbit

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:42:07 +00:00
Petr Machata 2e36f91000 dcb: Add a subtool for the DCB buffer object
DCBNL buffer interfaces are an extension to the 802.1q DCB interfaces and
allow configuration of port headroom buffers.

Add a dcb subtool to allow showing and tweaking of buffer priority mapping
and buffer sizes. For example:

    # dcb buf show dev eni1np1
    prio-buffer 0:0 1:0 2:0 3:3 4:0 5:0 6:6 7:0
    buffer-size 0:10000 1:0 2:0 3:70000 4:0 5:0 6:10000 7:0
    total-size 221072

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:42:03 +00:00
Petr Machata 6567cb588b dcb: Add a subtool for the DCB PFC object
PFC, for "Priority-based Flow Control", allows configuration of priority
lossiness, and related toggles.

Add a dcb subtool to allow showing and tweaking of individual PFC
configuration options, and querying statistics. For example:

    # dcb pfc show dev eni1np1
    pfc-cap 8 macsec-bypass on delay 0
    pg-pfc 0:off 1:on 2:off 3:off 4:off 5:off 6:off 7:on
    requests 0:0 1:217 2:0 3:0 4:0 5:0 6:0 7:28
    indications 0:0 1:179 2:0 3:0 4:0 5:0 6:0 7:18
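The pg-pfc line is a per-priority view of a bitmask with one bit per priority. The minimal sketch below renders such a mask, assuming bit n stands for priority n. Under that assumption, the output above corresponds to the mask 0x82 (priorities 1 and 7). pfc_en_format() is a hypothetical helper, not the tool's code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Render an 8-bit per-priority enable mask as "0:off 1:on ...". */
static void pfc_en_format(unsigned char pfc_en, char *buf, size_t len)
{
	size_t used = 0;

	assert(len > 0);
	buf[0] = '\0';
	for (int prio = 0; prio < 8; prio++) {
		int n = snprintf(buf + used, len - used, "%s%d:%s",
				 prio ? " " : "", prio,
				 (pfc_en >> prio) & 1 ? "on" : "off");

		if (n < 0 || (size_t)n >= len - used)
			return;		/* out of room; buf stays NUL-terminated */
		used += (size_t)n;
	}
}
```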

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:41:58 +00:00
Petr Machata 808dd741fc dcb: Add -i to enable IEC mode
Allow switching "dcb" into the ISO/IEC mode of units by passing -i.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:41:54 +00:00
Petr Machata 6e9687db04 dcb: Add -s to enable statistics
Allow selective display of statistical counters by passing -s.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:41:50 +00:00
Petr Machata 11a72186a0 dcb: Add dcb_set_u32(), dcb_set_u64()
The DCB buffer object has a settable array of 32-bit quantities, and the
maxrate object of 64-bit ones. Adjust dcb_parse_mapping() and related
helpers to support 64-bit values in mappings, and add appropriate helpers.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:41:45 +00:00
Petr Machata 7e94711c71 man: dcb-ets: Remove an unnecessary empty line
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:41:40 +00:00
Petr Machata a7c2eaac39 dcb: ets: Change the way show parameters are given in synopsis
None, one, or many parameters can be given on the command line, but
the current synopsis allows only none or one. Fix it.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:41:22 +00:00
Petr Machata 12d41d0184 dcb: ets: Fix help display for "show" subcommand
"dcb ets show dev X help" currently shows full "ets" help instead of just
help for the show command. Fix it.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:41:19 +00:00
Petr Machata 7fe954ee34 dcb: Remove unsupported command line arguments from getopt_long()
getopt_long() currently includes "c" and "n" in the short option string.
These probably slipped in as a cut'n'paste, and are not actually accepted.
Remove them.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:40:32 +00:00
Stephen Hemminger 376367d917 uapi: merge in change to bpf.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-14 08:07:06 -08:00
David Ahern 6e9bfdcdde Merge branch 'devlink-reload' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:43:41 +00:00
Moshe Shemesh c2d7c45c32 devlink: Add reload stats to dev show
Show reload statistics through devlink dev show using devlink stats
flag. The reload statistics show the history per reload action type and
limit. Add remote reload statistics to show the history of actions
performed due to devlink reload commands initiated by a remote host.

Output examples:
$ devlink dev show -s
pci/0000:82:00.0:
  stats:
      reload:
          driver_reinit:
            unspecified 2
          fw_activate:
            unspecified 1 no_reset 0
      remote_reload:
          driver_reinit:
            unspecified 0
          fw_activate:
            unspecified 0 no_reset 0
pci/0000:82:00.1:
  stats:
      reload:
          driver_reinit:
            unspecified 0
          fw_activate:
            unspecified 0 no_reset 0
      remote_reload:
          driver_reinit:
            unspecified 1
          fw_activate:
            unspecified 1 no_reset 0

$ devlink dev show -s -jp
{
    "dev": {
        "pci/0000:82:00.0": {
            "stats": {
                "reload": {
                    "driver_reinit": {
                        "unspecified": 2
                    },
                    "fw_activate": {
                        "unspecified": 1,
                        "no_reset": 0
                    }
                },
                "remote_reload": {
                    "driver_reinit": {
                        "unspecified": 0
                    },
                    "fw_activate": {
                        "unspecified": 0,
                        "no_reset": 0
                    }
                }
            }
        },
        "pci/0000:82:00.1": {
            "stats": {
                "reload": {
                    "driver_reinit": {
                        "unspecified": 0
                    },
                    "fw_activate": {
                        "unspecified": 0,
                        "no_reset": 0
                    }
                },
                "remote_reload": {
                    "driver_reinit": {
                        "unspecified": 1
                    },
                    "fw_activate": {
                        "unspecified": 1,
                        "no_reset": 0
                    }
                }
            }
        }
    }
}

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:42:15 +00:00
Moshe Shemesh 0c0023ad71 devlink: Add pr_out_dev() helper function
Add pr_out_dev() helper function and use it both by cmd_dev_show_cb()
and by cmd_mon_show_cb().

Dev stats will be added to the dev context in the next patch, so
cmd_mon_show_cb() should print the whole dev context and not just dev
handle.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:42:09 +00:00
Moshe Shemesh f28c910274 devlink: Add devlink reload action and limit options
Add reload action and reload limit to the devlink reload command to enable
the user to select the required reload action and any limits the user
wants enforced on that action.

The following reload actions are supported:
  driver_reinit: driver entities re-initialization, applying
                 devlink-param and devlink-resource values.
  fw_activate: firmware activate.

The uAPI is backward compatible: if the reload action option is omitted
from the reload command, the driver reinit action will be used.
Note that when required to do firmware activation some drivers may need
to reload the driver. On the other hand some drivers may need to reset
the firmware to reinitialize the driver entities. Therefore, the devlink
reload command returns the actions which were actually performed.

By default reload actions are not limited and driver implementation may
include reset or downtime as needed to perform the actions. However, if a
reload limit is selected, the driver should perform the action only if it
can do so while keeping the limit constraints.

Reload limit added:
  no_reset: No reset allowed, no down time allowed, no link flap and no
            configuration is lost.

Command examples:
$ devlink dev reload pci/0000:82:00.0 action driver_reinit
reload_actions_performed:
  driver_reinit

$ devlink dev reload pci/0000:82:00.0 action fw_activate
reload_actions_performed:
  driver_reinit fw_activate

$ devlink dev reload pci/0000:82:00.1 action driver_reinit -jp
{
    "reload": {
        "reload_actions_performed": [ "driver_reinit" ]
    }
}

$ devlink dev reload pci/0000:82:00.0 action fw_activate -jp
{
    "reload": {
        "reload_actions_performed": [ "driver_reinit","fw_activate" ]
    }
}
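The reload_actions_performed field is a set of actions, which maps naturally onto a bitmask with one bit per action. The sketch below decodes such a mask. It assumes bit positions follow the numbering of the kernel's enum devlink_reload_action (driver_reinit = 1, fw_activate = 2). The enum values are redefined locally, and reload_actions_format() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed to mirror enum devlink_reload_action in linux/devlink.h. */
enum { ACT_DRIVER_REINIT = 1, ACT_FW_ACTIVATE = 2, ACT_MAX = ACT_FW_ACTIVATE };

static const char *const act_names[] = {
	[ACT_DRIVER_REINIT] = "driver_reinit",
	[ACT_FW_ACTIVATE]   = "fw_activate",
};

/* List the names of the actions whose bit is set, space-separated. */
static void reload_actions_format(uint32_t performed, char *buf, size_t len)
{
	size_t used = 0;

	assert(len > 0);
	buf[0] = '\0';
	for (unsigned int a = 1; a <= ACT_MAX; a++) {
		if (!(performed & (UINT32_C(1) << a)))
			continue;
		int n = snprintf(buf + used, len - used, "%s%s",
				 used ? " " : "", act_names[a]);
		if (n < 0 || (size_t)n >= len - used)
			return;		/* out of room */
		used += (size_t)n;
	}
}
```

Under that assumption, the fw_activate example above would carry the mask 0x6, which prints as "driver_reinit fw_activate".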

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:40:00 +00:00
David Ahern 120cdeb1b7 Merge branch 'rate-size-parsing-output' into next
Petr Machata says:
====================

The DCB tool will have commands that deal with buffer sizes and traffic
rates. TC is another tool that has a number of such commands, and functions
to support them: get_size(), get_rate/64(), s/print_size() and
s/print_rate(). In this patchset, these functions are moved from TC to lib/
for possible reuse and modernized.

s/print_rate() has a hidden parameter of a global variable use_iec, which
made the conversion non-trivial. The parameter was made explicit,
print_rate() converted to a mostly json_print-like function, and
sprint_rate() retired in favor of the new print_rate. Patches #1 and #2
deal with this.

The intention was to treat s/print_size() similarly, but unfortunately two
use cases of sprint_size() cannot be converted to a json_print-like
print_size(), and the function sprint_size() had to remain as a discouraged
backdoor to print_size(). This is done in patch #3.

Patch #4 then improves the code of sprint_size() a little bit.

Patch #5 fixes a buglet in formatting small rates in IEC mode.

Patches #6 and #7 handle a routine movement of, respectively,
get_rate/64() and get_size() from tc to lib.

This patchset does not actually add any new uses of these functions. A
follow-up patchset will add subtools for management of DCB buffer and DCB
maxrate objects that will make use of them.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:32:17 +00:00
Petr Machata 44396bdfcc lib: Move get_size() from tc here
The function get_size() parses sizes using a handy notation
that supports units and their prefixes, such as 10Kbit. This will be useful
for the DCB buffer size parsing. Move the function from TC to the general
library, so that it can be reused.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:30:50 +00:00
Petr Machata f3be0e6366 lib: Move get_rate(), get_rate64() from tc here
The functions get_rate() and get_rate64() are useful for parsing rate-like
values. The DCB tool will find these useful in the maxrate subtool.
Move them over to lib so that they can be easily reused.
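Rate parsing of this kind boils down to a number followed by a case-insensitive unit suffix that scales it. The minimal sketch below returns bits per second and covers only a small suffix subset. Unlike the suffix-optional behavior one might expect, it rejects a bare number. It makes no claim about the units or the suffix table of the real get_rate(). parse_rate_bits() and suffix_eq() are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct rate_suffix { const char *name; double scale; };

/* A small subset of suffixes, matched case-insensitively. */
static const struct rate_suffix suffixes[] = {
	{ "bit", 1. },
	{ "kibit", 1024. },                 { "kbit", 1000. },
	{ "mibit", 1024. * 1024. },         { "mbit", 1000000. },
	{ "gibit", 1024. * 1024. * 1024. }, { "gbit", 1000000000. },
};

static bool suffix_eq(const char *a, const char *b)
{
	for (; *a && *b; a++, b++)
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return false;
	return *a == *b;
}

/* Parse e.g. "25Gbit" or "1.5Mbit" into bits per second. */
static bool parse_rate_bits(const char *str, uint64_t *bits)
{
	char *p;
	double v = strtod(str, &p);

	if (p == str || !(v >= 0))	/* no number, negative, or NaN */
		return false;
	for (size_t i = 0; i < sizeof(suffixes) / sizeof(suffixes[0]); i++) {
		if (!suffix_eq(p, suffixes[i].name))
			continue;
		v *= suffixes[i].scale;
		if (v >= 0x1p64)
			return false;	/* does not fit in 64 bits */
		*bits = (uint64_t)(v + 0.5);
		return true;
	}
	return false;	/* missing or unknown suffix */
}
```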

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:30:44 +00:00
Petr Machata aaeda2a768 lib: print_color_rate(): Fix formatting small rates in IEC mode
ISO/IEC units are distinguished from the decadic ones by using prefixes
like "Ki", "Mi" instead of "K" and "M". The current code inserts the letter
"i" after the decadic unit when in IEC mode. However it does so even when
the prefix is an empty string, formatting 1Kbit in IEC mode as "1000ibit".
Fix by omitting the letter if there is no prefix.
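The rule behind the fix is to append the IEC "i" only when a prefix was actually emitted. The simplified sketch below illustrates it. It scales only by exact multiples of the base, which is a simplification of this sketch and not necessarily how lib/ formats rates. rate_format() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static void rate_format(uint64_t bits, bool iec, char *buf, size_t len)
{
	static const char *const prefix[] = { "", "K", "M", "G", "T" };
	const uint64_t base = iec ? 1024 : 1000;
	size_t i = 0;

	/* Scale up while the value is an exact multiple of the base. */
	while (i + 1 < sizeof(prefix) / sizeof(prefix[0]) &&
	       bits >= base && bits % base == 0) {
		bits /= base;
		i++;
	}
	/* The fix: only add "i" when there is a prefix for it to qualify. */
	snprintf(buf, len, "%llu%s%sbit", (unsigned long long)bits, prefix[i],
		 iec && i > 0 ? "i" : "");
}
```

With this rule, 1000 bit/s in IEC mode stays "1000bit" rather than "1000ibit", while 2048 bit/s becomes "2Kibit".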

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:30:41 +00:00
Petr Machata a0a4b6618c lib: sprint_size(): Uncrustify the code a bit
Ideally this and the rate printing would both be converted to a common
helper, but unfortunately the two format differently and this would break
tests and scripts out there. So just make the code look less like a wad of
hay.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:30:36 +00:00
Petr Machata adbe5de966 lib: Move sprint_size() from tc here, add print_size()
When displaying sizes of various sorts, tc commonly uses the function
sprint_size() to format the size into a buffer as a human-readable string.
This string is then displayed either using print_string(), or in some code
even fprintf(). As a result, a typical sequence of code when formatting a
size is something like the following:

	SPRINT_BUF(b);
	print_uint(PRINT_JSON, "foo", NULL, foo);
	print_string(PRINT_FP, NULL, "foo %s ", sprint_size(foo, b));

For a concept as broadly useful as size, it would be better to have a
dedicated function in json_print.

To that end, move sprint_size() from tc_util to json_print. Add helpers
print_size() and print_color_size() that wrap around sprint_size() and
provide the JSON dispatch as appropriate.
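What a print_size()-style helper buys the caller is the dispatch. A single call emits the raw number in JSON mode and the human-readable form otherwise, replacing the two-call pattern shown earlier. The self-contained sketch below shows the idea. enum out_mode, size_print() and the 1024-based formatting in size_human() are hypothetical simplifications, not json_print's actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum out_mode { OUT_TEXT, OUT_JSON };

/* Hypothetical human-readable form: 100 -> "100b", 1536 -> "1.5Kb". */
static void size_human(uint64_t sz, char *buf, size_t len)
{
	if (sz >= 1024 * 1024)
		snprintf(buf, len, "%.1fMb", sz / (1024.0 * 1024.0));
	else if (sz >= 1024)
		snprintf(buf, len, "%.1fKb", sz / 1024.0);
	else
		snprintf(buf, len, "%llub", (unsigned long long)sz);
}

/* One call site, two renderings: raw number for JSON, text for humans. */
static void size_print(enum out_mode mode, const char *key, uint64_t sz,
		       char *out, size_t len)
{
	if (mode == OUT_JSON) {
		snprintf(out, len, "\"%s\":%llu", key, (unsigned long long)sz);
	} else {
		char hb[32];

		size_human(sz, hb, sizeof(hb));
		snprintf(out, len, "%s %s", key, hb);
	}
}
```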

Since print_size() should be the preferred interface, convert the vast
majority of uses of sprint_size() to print_size(). Two notable exceptions are:

- q_tbf, which does not show the size as such, but uses the string
  "$human_readable_size/$cell_size" even in JSON. There is simply no way to
  have print_size() emit the same text, because print_size() in JSON mode
  should of course just use the raw number, without human-readable frills.

- q_cake, which relies on the existence of sprint_size() in its macro-based
  formatting helpers. There might be ways to convert this particular case,
  but given q_tbf simply cannot be converted, leave it as is.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:30:25 +00:00
Petr Machata 60265cc226 lib: Move print_rate() from tc here; modernize
The functions print_rate() and sprint_rate() are useful for formatting
rate-like values. The DCB tool would find these useful in the maxrate
subtool. However, the current interface to these functions uses a global
variable use_iec as a flag indicating whether 1024- or 1000-based powers
should be used when formatting the rate value. For general use, a global
variable is not a great way of passing arguments to a function. Besides, it
is unlike most other printing functions in that it deals in buffers and
ignores JSON.

Therefore make the interface to print_rate() explicit by converting use_iec
to an ordinary parameter. Since the interface changes anyway, convert it to
follow the pattern of other json_print functions (except for the
now-explicit use_iec parameter). Move to json_print.c.

Add a wrapper to tc, so that all the call sites do not need to repeat the
use_iec global variable argument, and convert all call sites.

In q_cake.c, the conversion is not straightforward due to usage of a macro
that is shared across numerous data types. Simply hand-roll the
corresponding code, which seems better than making an extra helper for one
call site.

Drop sprint_rate() now that everybody just uses print_rate().
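
A sketch of the interface change, assuming the rate is already in bits per second and that only exact multiples are scaled. `fmt_rate()` is a hypothetical helper; the point is that use_iec arrives as an ordinary argument, so the function no longer depends on a tool-specific global:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter: use_iec is passed explicitly instead of
 * being read from a global. rate is in bits per second. Only exact
 * multiples are scaled, a simplification for illustration. */
static const char *fmt_rate(uint64_t rate, bool use_iec, char *buf, size_t len)
{
	static const char *const units[] = { "", "K", "M", "G", "T" };
	const uint64_t kilo = use_iec ? 1024 : 1000;
	size_t i = 0;

	while (i < sizeof(units) / sizeof(units[0]) - 1 &&
	       rate >= kilo && rate % kilo == 0) {
		rate /= kilo;
		i++;
	}

	/* IEC prefixes get an "i" suffix, e.g. "Mibit". */
	snprintf(buf, len, "%llu%s%sbit", (unsigned long long)rate,
		 units[i], (use_iec && i > 0) ? "i" : "");
	return buf;
}
```

A tc-side wrapper, as the text describes, would then only need to forward its own use_iec global as the extra argument.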

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:30:15 +00:00
Petr Machata cdd9425315 Move the use_iec declaration to the tools
The tools "ip" and "tc" use a flag "use_iec", which indicates whether, when
formatting rate values, the prefixes "K", "M", etc. should refer to powers
of 1024, or powers of 1000. The flag is currently kept as a global variable
in "ip" and "tc", but is nonetheless declared in util.h.

Instead, move the declaration to tool-specific headers ip/ip_common.h and
tc/tc_common.h.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:28:43 +00:00
Paolo Lungaroni 69629b4e43 seg6: add support for vrftable attribute in SRv6 End.DT4/DT6 behaviors
We introduce the "vrftable" attribute for supporting the SRv6 End.DT4 and
End.DT6 behaviors in iproute2.
The "vrftable" attribute indicates the routing table associated with
the VRF device used by SRv6 End.DT4/DT6 for routing IPv4/IPv6 packets.

The SRv6 End.DT4/DT6 is used to implement IPv4/IPv6 L3 VPNs based on Segment
Routing over IPv6 networks in multi-tenant environments.
It decapsulates the received packets and performs the IPv4/IPv6 routing
lookup in the routing table of the tenant.

The SRv6 End.DT4/DT6 leverages a VRF device in order to force the routing
lookup into the associated routing table using the "vrftable" attribute.

Some examples:
 $ ip -6 route add 2001:db8::1 encap seg6local action End.DT4 vrftable 100 dev eth0
 $ ip -6 route add 2001:db8::2 encap seg6local action End.DT6 vrftable 200 dev eth0

Standard Output:
 $ ip -6 route show 2001:db8::1
 2001:db8::1  encap seg6local action End.DT4 vrftable 100 dev eth0 metric 1024 pref medium

JSON Output:
$ ip -6 -j -p route show 2001:db8::2
[ {
        "dst": "2001:db8::2",
        "encap": "seg6local",
        "action": "End.DT6",
        "vrftable": 200,
        "dev": "eth0",
        "metric": 1024,
        "flags": [ ],
        "pref": "medium"
} ]

v2:
 - no changes made: resubmit after pulling out this patch from the kernel
   patchset.

v1:
 - mixing this patch with the kernel patchset confused patchwork.

Signed-off-by: Paolo Lungaroni <paolo.lungaroni@cnit.it>
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:27:42 +00:00
David Ahern cfad32569f Update kernel headers
Update kernel headers to commit:
    afae3cc2da10 ("net: atheros: simplify the return expression of atl2_phy_setup_autoneg_adv()")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:25:34 +00:00
David Ahern 8065d28218 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-04 16:25:12 +00:00
David Ahern b3c4a55064 Only compile mnl_utils when HAVE_MNL is defined
New lib/mnl_utils.c fails to compile if libmnl is not installed:

  mnl_utils.c:9:10: fatal error: libmnl/libmnl.h: No such file or directory
      9 | #include <libmnl/libmnl.h>

Make it dependent on HAVE_MNL.

Fixes: 72858c7b77 ("lib: Extract from devlink/mnlg a helper, mnlu_socket_open()")
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-04 16:19:05 +00:00
Stephen Hemminger 2e80ae89ca Merge branch 'gcc-10' into main 2020-12-03 08:33:06 -08:00
Luca Boccassi 755b1c584e tc/mqprio: json-ify output
As reported by a Debian user, mqprio output in json mode is
invalid:

{
     "kind": "mqprio",
     "handle": "8021:",
     "dev": "enp1s0f0",
     "root": true,
     "options": { tc 2 map 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0
          queues:(0:3) (4:7)
          mode:channel
          shaper:dcb}
}

json-ify it, while trying to maintain the same formatting
for standard output.

New output:

{
    "kind": "mqprio",
    "handle": "8001:",
    "root": true,
    "options": {
        "tc": 2,
        "map": [ 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ],
        "queues": [ [ 0, 3 ], [ 4, 7 ] ],
        "mode": "channel",
        "shaper": "dcb"
    }
}
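
The nested "queues" array pairs each traffic class's first and last queue, the same pairs shown as "(0:3) (4:7)" in the text output. A hypothetical helper, assuming per-class count[] and offset[] arrays where class i owns queues offset[i] .. offset[i] + count[i] - 1, could render it like this:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: render per-traffic-class queue ranges as the
 * nested JSON array shown above. Assumes count[i] >= 1 for every class. */
static void fmt_queues_json(const unsigned short *count,
			    const unsigned short *offset, int num_tc,
			    char *buf, size_t len)
{
	size_t pos = 0;
	int i;

	pos += (size_t)snprintf(buf + pos, len - pos, "[ ");
	/* Stop appending once the buffer is full (snprintf truncated). */
	for (i = 0; i < num_tc && pos < len; i++)
		pos += (size_t)snprintf(buf + pos, len - pos, "%s[ %u, %u ]",
					i ? ", " : "",
					(unsigned int)offset[i],
					(unsigned int)(offset[i] + count[i] - 1));
	if (pos < len)
		snprintf(buf + pos, len - pos, " ]");
}
```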

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=972784

Reported-by: Roméo GINON <romeo.ginon@ilexia.com>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-03 08:32:42 -08:00
Luca Boccassi 975c4944e8 ip/netns: use flock when setting up /run/netns
If multiple ip processes are run at the same time to set up
separate network namespaces, and /run/netns has not been set up yet,
they may all try to set it up concurrently. The processes might then
enter a recursive loop creating thousands of mount points, which might
crash the system depending on the resources available.

Try to take a flock on /run/netns before doing the mount() dance, to
ensure this cannot happen. But do not try too hard, and if it fails
continue after printing a warning, to avoid introducing regressions.
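
A minimal sketch of such a best-effort lock, assuming it is taken on the directory itself with flock(2). `lock_dir_best_effort()` is a hypothetical helper, not the code in ip/ipnetns.c. A concurrent caller blocks in flock() until the holder closes its descriptor, which serializes the mount setup:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

/* Take an exclusive flock on a directory before doing setup work.
 * On failure, print a warning and return -1 so the caller can carry
 * on without the lock (best effort). On success, return the fd;
 * closing it releases the lock. */
static int lock_dir_best_effort(const char *dir)
{
	int fd = open(dir, O_RDONLY | O_DIRECTORY | O_CLOEXEC);

	if (fd < 0) {
		fprintf(stderr, "Warning: cannot open %s for locking: %s\n",
			dir, strerror(errno));
		return -1;
	}
	if (flock(fd, LOCK_EX) < 0) {
		fprintf(stderr, "Warning: cannot lock %s: %s\n",
			dir, strerror(errno));
		close(fd);
		return -1;
	}
	return fd;
}
```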

First reported on Debian: https://bugs.debian.org/949235

To reproduce (WARNING: run in a VM to avoid system lockups):

for i in {0..9}
do
        strace -e trace=mount -e inject=mount:delay_exit=1000000 ip \
 netns add "testnetns$i" 2>&1 | tee "$i.log" &
done
wait

The strace ensures the problem always reproduces by adding an
artificial synchronization point after the first mount().

Reported-by: Etienne Dechamps <etienne@edechamps.fr>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-03 08:31:23 -08:00
Vlad Buslov ea130da81e tc: implement support for action terse dump
Implement support for action terse dump using the new TCA_ACT_FLAG_TERSE_DUMP
value of the TCA_ROOT_FLAGS TLV. Set the flag when the user requests it, as in
the following example CLI (-br for 'brief'):

$ tc -s -br actions ls action tunnel_key
total acts 2

        action order 0: tunnel_key       index 1
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

        action order 1: tunnel_key       index 2
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

In terse mode, the dump only outputs the essential data needed to identify
the action (kind, index), plus stats if requested by the user.
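
TCA_ROOT_FLAGS carries a 32-bit bitfield with a value and a selector, where the selector names the bits the sender is actually setting. A sketch of how the brief flag might be folded in, assuming a local mirror of `struct nla_bitfield32`, assuming the large-dump flag is always requested for listings, and assuming flag values of bit 0 (large dump) and bit 1 (terse dump); check linux/rtnetlink.h before relying on these. `root_flags()` is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of struct nla_bitfield32: value carries the flag bits,
 * selector says which bits the sender is setting. */
struct bitfield32 {
	uint32_t value;
	uint32_t selector;
};

/* Assumed flag values, mirroring the TCA_ACT_FLAG_* names. */
#define ACT_FLAG_LARGE_DUMP_ON (1u << 0)
#define ACT_FLAG_TERSE_DUMP    (1u << 1)

/* Build the root flags for an action listing; brief maps to -br. */
static struct bitfield32 root_flags(bool brief)
{
	struct bitfield32 bf = {
		.value = ACT_FLAG_LARGE_DUMP_ON,
		.selector = ACT_FLAG_LARGE_DUMP_ON,
	};

	if (brief) {
		bf.value |= ACT_FLAG_TERSE_DUMP;
		bf.selector |= ACT_FLAG_TERSE_DUMP;
	}
	return bf;
}
```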

Signed-off-by: Vlad Buslov <vlad@buslov.dev>
Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-03 03:51:06 +00:00
Vlad Buslov 00fffb2d79 tc: use TCA_ACT_ prefix for action flags
Use the TCA_ACT_FLAG_LARGE_DUMP_ON alias, following the new preferred naming
for action flags.

Signed-off-by: Vlad Buslov <vlad@buslov.dev>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-03 03:49:14 +00:00
David Ahern 23683dec32 Update kernel headers
Update kernel headers to commit:
    cec85994c6b4 ("bareudp: constify device_type declaration")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-03 03:47:07 +00:00
Sergey Ryazanov d7190d4ced ip: add IP_LIB_DIR environment variable
Do not hardcode /usr/lib/ip as the library path; allow it to be
configured at run time.
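
A minimal sketch of the lookup, assuming an unset or empty IP_LIB_DIR falls back to the previously hardcoded path. `ip_lib_dir()` is a hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Prefer the IP_LIB_DIR environment variable; otherwise fall back to
 * the previously hardcoded location. An empty value is treated as
 * unset (an assumption of this sketch). */
static const char *ip_lib_dir(void)
{
	const char *dir = getenv("IP_LIB_DIR");

	return (dir && *dir) ? dir : "/usr/lib/ip";
}
```

Treating an empty value like an unset one avoids resolving library paths relative to the current directory by accident; whether the real code does the same is an assumption here.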

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-02 16:37:07 +00:00
Stephen Hemminger fb054cb336 uapi: update devlink.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-29 21:17:22 -08:00
Stephen Hemminger c95d63e4fb uapi: update devlink.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-29 21:16:50 -08:00
Stephen Hemminger cae2e9291a f_u32: fix compiler gcc-10 compiler warning
With gcc-10, the compiler warns about an array subscript outside array bounds.

f_u32.c: In function ‘u32_parse_opt’:
f_u32.c:1113:24: warning: array subscript 0 is outside the bounds of an interior zero-length array ‘struct tc_u32_key[0]’ [-Wzero-length-bounds]
 1113 |    hash = sel2.sel.keys[0].val & sel2.sel.keys[0].mask;
      |           ~~~~~~~~~~~~~^~~
In file included from tc_util.h:11,
                 from f_u32.c:26:
../include/uapi/linux/pkt_cls.h:253:20: note: while referencing ‘keys’
  253 |  struct tc_u32_key keys[0];
      |

This is because the keys are actually allocated in the second element
of the parent structure.

The simplest way to address the warning is to assign directly to the keys
in the containing structure.
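
To illustrate the pattern with simplified, hypothetical types (not the real uapi structures): the header ends in a GNU C zero-length array, and a wrapping struct provides the storage. Reading the keys through the wrapper's own array stays within declared bounds, which is what silences -Wzero-length-bounds:

```c
#include <stdint.h>

struct key {
	uint32_t val;
	uint32_t mask;
};

/* Simplified stand-in for a selector header: a GNU C zero-length
 * array whose storage lives in whatever follows the header. */
struct sel {
	uint8_t nkeys;
	struct key keys[0];
};

/* Container that actually provides storage for the keys. */
struct sel_with_keys {
	struct sel sel;
	struct key keys[4];
};

/* Index the container's own array rather than s->sel.keys[0]; the
 * latter is what gcc-10 flags as out of bounds. */
static uint32_t first_key_hash(const struct sel_with_keys *s)
{
	return s->keys[0].val & s->keys[0].mask;
}
```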

This has always been in iproute2 (pre-git) so no Fixes.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-29 16:20:33 -08:00
Stephen Hemminger c014983921 misc: fix compiler warning in ifstat and nstat
The code here was doing strncpy() in a way that causes gcc 10 to
warn about a possible string overflow. Just use strlcpy(), which
will NUL-terminate and bound the string as expected.
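
Because strlcpy() is not available in every C library, here is a minimal sketch of its usual semantics under a hypothetical name, `copy_bounded()`. It always NUL-terminates when the buffer is non-empty and returns the full source length, so a return value of at least `size` signals truncation. strncpy(), by contrast, can leave the buffer unterminated:

```c
#include <string.h>

/* strlcpy-style copy: copies at most size - 1 bytes, always
 * NUL-terminates when size > 0, and returns strlen(src). */
static size_t copy_bounded(char *dst, const char *src, size_t size)
{
	size_t srclen = strlen(src);

	if (size) {
		size_t n = srclen >= size ? size - 1 : srclen;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return srclen;
}
```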

This has existed since start of git era so no Fixes tag.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-29 16:20:31 -08:00
Stephen Hemminger 2319db9052 tc: fix compiler warnings in ip6 pedit
Gcc-10 complains about referencing a zero size array.
This occurs because the array of keys is actually in the following
structure which is part of the overall selector.

The original code was safe, but it is better to just use the key
array directly.

Fixes: 2d9a8dc439 ("tc: p_ip6: Support pedit of IPv6 dsfield")
Cc: petrm@mellanox.com
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-29 16:20:23 -08:00
Stephen Hemminger 5bdc4e9151 bridge: fix string length warning
gcc-10 complains about a possible string length overflow.
This can't happen, because the Ethernet address format is always
limited to 18 characters or less. Just resize the temp buffer.
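
The size argument works out as 6 octets of 2 hex digits each plus 5 colons, 17 characters, plus the terminating NUL, 18 bytes. A sketch with a hypothetical `ETH_STR_LEN` constant and `fmt_ether()` helper, illustrating the sizing of the colon-separated MAC format rather than the bridge code itself:

```c
#include <stdio.h>
#include <string.h>

/* "xx:xx:xx:xx:xx:xx" is 17 characters; 18 bytes with the NUL. */
#define ETH_STR_LEN 18

/* Returns the number of characters written (17 for a MAC). */
static int fmt_ether(const unsigned char a[6], char buf[ETH_STR_LEN])
{
	return snprintf(buf, ETH_STR_LEN, "%02x:%02x:%02x:%02x:%02x:%02x",
			a[0], a[1], a[2], a[3], a[4], a[5]);
}
```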

Fixes: 70dfb0b883 ("iplink: bridge: export bridge_id and designated_root")
Cc: nikolay@cumulusnetworks.com
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-29 16:20:16 -08:00
Stephen Hemminger f817699939 devlink: fix uninitialized warning
GCC-10 complains about an uninitialized variable.

devlink.c: In function ‘cmd_dev’:
devlink.c:2803:12: warning: ‘val_u32’ may be used uninitialized in this function [-Wmaybe-uninitialized]
 2803 |    val_u16 = val_u32;
      |    ~~~~~~~~^~~~~~~~~
devlink.c:2747:11: note: ‘val_u32’ was declared here
 2747 |  uint32_t val_u32;
      |           ^~~~~~~

This is a false positive, because the compiler can't follow the
control flow when the parse returns an error.
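
One common way to quiet such a false positive is to initialize the variable even though every read follows a successful parse. A hypothetical sketch of that shape; `parse_u32()` and `parse_u16()` are illustrative names, not the devlink helpers:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 0 and stores the value on success; -EINVAL on bad input,
 * leaving *out untouched. Base 0 accepts decimal, 0x hex, 0 octal. */
static int parse_u32(const char *s, uint32_t *out)
{
	char *end;
	unsigned long v;

	errno = 0;
	v = strtoul(s, &end, 0);
	if (errno || end == s || *end != '\0' || v > UINT32_MAX)
		return -EINVAL;
	*out = (uint32_t)v;
	return 0;
}

/* val_u32 is only read after a successful parse, but initializing it
 * keeps -Wmaybe-uninitialized quiet when the compiler cannot follow
 * the error path. */
static int parse_u16(const char *s, uint16_t *out)
{
	uint32_t val_u32 = 0;
	int err = parse_u32(s, &val_u32);

	if (err)
		return err;
	if (val_u32 > UINT16_MAX)
		return -ERANGE;
	*out = (uint16_t)val_u32;
	return 0;
}
```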

Fixes: 2557dca2b0 ("devlink: Add string to uint{8,16,32} conversion for generic parameters")
Cc: shalomt@mellanox.com
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-29 16:19:36 -08:00
Vladimir Oltean c29f65db34 bridge: add support for L2 multicast groups
Extend the 'bridge mdb' command for the following syntax:
bridge mdb add dev br0 port swp0 grp 01:02:03:04:05:06 permanent

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-29 20:54:02 +00:00
Luca Boccassi f5c1246e6a Add dcb/.gitignore
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-29 20:39:47 +00:00
David Ahern f98ce50046 Merge branch 'libbpf' into next
Hangbin Liu  says:

====================

This series converts iproute2 to use libbpf for loading and attaching
BPF programs when it is available. This means that iproute2 will
correctly process BTF information and support the new-style BTF-defined
maps, while keeping compatibility with the old internal map definition
syntax.

This is achieved by checking for libbpf at './configure' time, and using
it if available. By default the system libbpf will be used, but static
linking against a custom libbpf version can be achieved by passing
LIBBPF_DIR to configure. LIBBPF_FORCE can be set to 'on' to make configure
abort if no suitable libbpf is found (useful for automated packaging that
wants to enforce the dependency), or to 'off' to skip the libbpf check and
build iproute2 with legacy bpf.
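
The configure check itself is shell, but the three cases described above can be modeled as a small decision function. This C sketch is purely illustrative; `pick_backend()` and the enum are hypothetical and only encode the behavior stated in the text:

```c
#include <stdbool.h>
#include <string.h>

enum bpf_backend { BPF_BACKEND_LEGACY, BPF_BACKEND_LIBBPF, BPF_BACKEND_ABORT };

/* force is the LIBBPF_FORCE value ("on", "off", or NULL when unset);
 * have_libbpf says whether a suitable libbpf was found. */
static enum bpf_backend pick_backend(const char *force, bool have_libbpf)
{
	if (force && strcmp(force, "off") == 0)
		return BPF_BACKEND_LEGACY;	/* check skipped */
	if (have_libbpf)
		return BPF_BACKEND_LIBBPF;	/* use it when available */
	if (force && strcmp(force, "on") == 0)
		return BPF_BACKEND_ABORT;	/* dependency enforced */
	return BPF_BACKEND_LEGACY;		/* silent fallback */
}
```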

The old iproute2 bpf code is kept and will be used if no suitable libbpf
is available. When using libbpf, wrapper code ensures that iproute2 will
still understand the old map definition format, including populating
map-in-map and tail call maps before load.

The examples in bpf/examples are kept, and a separate set of examples
are added with BTF-based map definitions for those examples where this
is possible (libbpf doesn't currently support declaratively populating
tail call maps).

Finally, many thanks to Toke for his help on this patch set.

v6:
a) print runtime libbpf version in ip -V and tc -V

v5:
a) Fix LIBBPF_DIR typo and description, use libbpf DESTDIR as LIBBPF_DIR
   dest.
b) Fix bpf_prog_load_dev typo.
c) rebase to latest iproute2-next.

v4:
a) Make the LIBBPF_FORCE variable control whether iproute2 is built
   with libbpf or not.
b) Add a new file, bpf_glue.c, for mixed libbpf/legacy bpf calls.
c) Fix some build issues and a shell compatibility error.

v3:
a) Update configure to check for bpf_program__section_name() separately.
b) Add a new function get_bpf_program__section_name() to choose whether to
   use bpf_program__title() or not.
c) Test build the patch on Fedora 33 with libbpf-0.1.0-1.fc33 and
   libbpf-devel-0.1.0-1.fc33

v2:
a) Remove self defined IS_ERR_OR_NULL and use libbpf_get_error() instead.
b) Add ipvrf with libbpf support.

Here are the test results with the patched iproute2:
== Show libbpf version
$ ip -V
ip utility, iproute2-5.9.0, libbpf 0.1.0
$ tc -V
tc utility, iproute2-5.9.0, libbpf 0.1.0

== setup env
$ clang -O2 -Wall -g -target bpf -c bpf_graft.c -o btf_graft.o
$ clang -O2 -Wall -g -target bpf -c bpf_map_in_map.c -o btf_map_in_map.o
$ clang -O2 -Wall -g -target bpf -c bpf_shared.c -o btf_shared.o
$ clang -O2 -Wall -g -target bpf -c legacy/bpf_cyclic.c -o bpf_cyclic.o
$ clang -O2 -Wall -g -target bpf -c legacy/bpf_graft.c -o bpf_graft.o
$ clang -O2 -Wall -g -target bpf -c legacy/bpf_map_in_map.c -o bpf_map_in_map.o
$ clang -O2 -Wall -g -target bpf -c legacy/bpf_shared.c -o bpf_shared.o
$ clang -O2 -Wall -g -target bpf -c legacy/bpf_tailcall.c -o bpf_tailcall.o
$ rm -rf /sys/fs/bpf/xdp/globals
$ /root/iproute2/ip/ip link add type veth
$ /root/iproute2/ip/ip link set veth0 up
$ /root/iproute2/ip/ip link set veth1 up

== Load objs
$ /root/iproute2/ip/ip link set veth0 xdp obj bpf_graft.o sec aaa
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 4 tag 3056d2382e53f27c jited
$ ls /sys/fs/bpf/xdp/globals
jmp_tc
$ bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
4: xdp  name cls_aaa  tag 3056d2382e53f27c  gpl
        loaded_at 2020-10-22T08:04:21-0400  uid 0
        xlated 80B  jited 71B  memlock 4096B
        btf_id 5
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj bpf_map_in_map.o sec ingress
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 8 tag 4420e72b2a601ed7 jited
$ ls /sys/fs/bpf/xdp/globals
jmp_tc  map_inner  map_outer
$ bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
8: xdp  name imain  tag 4420e72b2a601ed7  gpl
        loaded_at 2020-10-22T08:04:23-0400  uid 0
        xlated 336B  jited 193B  memlock 4096B  map_ids 3
        btf_id 10
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj bpf_shared.o sec ingress
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 12 tag 9cbab549c3af3eab jited
$ ls /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef:
map_sh

/sys/fs/bpf/xdp/globals:
jmp_tc  map_inner  map_outer
$ bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
4: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
12: xdp  name imain  tag 9cbab549c3af3eab  gpl
        loaded_at 2020-10-22T08:04:25-0400  uid 0
        xlated 224B  jited 139B  memlock 4096B  map_ids 4
        btf_id 15
$ /root/iproute2/ip/ip link set veth0 xdp off

== Load objs again to make sure maps could be reused
$ /root/iproute2/ip/ip link set veth0 xdp obj bpf_graft.o sec aaa
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 16 tag 3056d2382e53f27c jited
$ ls /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef:
map_sh

/sys/fs/bpf/xdp/globals:
jmp_tc  map_inner  map_outer
$ bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
4: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
16: xdp  name cls_aaa  tag 3056d2382e53f27c  gpl
        loaded_at 2020-10-22T08:04:27-0400  uid 0
        xlated 80B  jited 71B  memlock 4096B
        btf_id 20
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj bpf_map_in_map.o sec ingress
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 20 tag 4420e72b2a601ed7 jited
$ ls /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef:
map_sh

/sys/fs/bpf/xdp/globals:
jmp_tc  map_inner  map_outer
$ bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
4: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
20: xdp  name imain  tag 4420e72b2a601ed7  gpl
        loaded_at 2020-10-22T08:04:29-0400  uid 0
        xlated 336B  jited 193B  memlock 4096B  map_ids 3
        btf_id 25
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj bpf_shared.o sec ingress
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 24 tag 9cbab549c3af3eab jited
$ ls /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef:
map_sh

/sys/fs/bpf/xdp/globals:
jmp_tc  map_inner  map_outer
$ bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
4: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
24: xdp  name imain  tag 9cbab549c3af3eab  gpl
        loaded_at 2020-10-22T08:04:31-0400  uid 0
        xlated 224B  jited 139B  memlock 4096B  map_ids 4
        btf_id 30
$ /root/iproute2/ip/ip link set veth0 xdp off
$ rm -rf /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals

== Testing if we can load new-style objects (using xdp-filter as an example)
$ /root/iproute2/ip/ip link set veth0 xdp obj /usr/lib64/bpf/xdpfilt_alw_all.o sec xdp_filter
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 28 tag e29eeda1489a6520 jited
$ ls /sys/fs/bpf/xdp/globals
filter_ethernet  filter_ipv4  filter_ipv6  filter_ports  xdp_stats_map
$ bpftool map show
5: percpu_array  name xdp_stats_map  flags 0x0
        key 4B  value 16B  max_entries 5  memlock 4096B
        btf_id 35
6: percpu_array  name filter_ports  flags 0x0
        key 4B  value 8B  max_entries 65536  memlock 1576960B
        btf_id 35
7: percpu_hash  name filter_ipv4  flags 0x0
        key 4B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
8: percpu_hash  name filter_ipv6  flags 0x0
        key 16B  value 8B  max_entries 10000  memlock 1142784B
        btf_id 35
9: percpu_hash  name filter_ethernet  flags 0x0
        key 6B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
$ bpftool prog show
28: xdp  name xdpfilt_alw_all  tag e29eeda1489a6520  gpl
        loaded_at 2020-10-22T08:04:33-0400  uid 0
        xlated 2408B  jited 1405B  memlock 4096B  map_ids 9,5,7,8,6
        btf_id 35
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj /usr/lib64/bpf/xdpfilt_alw_ip.o sec xdp_filter
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 32 tag 2f2b9dbfb786a5a2 jited
$ ls /sys/fs/bpf/xdp/globals
filter_ethernet  filter_ipv4  filter_ipv6  filter_ports  xdp_stats_map
$ bpftool map show
5: percpu_array  name xdp_stats_map  flags 0x0
        key 4B  value 16B  max_entries 5  memlock 4096B
        btf_id 35
6: percpu_array  name filter_ports  flags 0x0
        key 4B  value 8B  max_entries 65536  memlock 1576960B
        btf_id 35
7: percpu_hash  name filter_ipv4  flags 0x0
        key 4B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
8: percpu_hash  name filter_ipv6  flags 0x0
        key 16B  value 8B  max_entries 10000  memlock 1142784B
        btf_id 35
9: percpu_hash  name filter_ethernet  flags 0x0
        key 6B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
$ bpftool prog show
32: xdp  name xdpfilt_alw_ip  tag 2f2b9dbfb786a5a2  gpl
        loaded_at 2020-10-22T08:04:35-0400  uid 0
        xlated 1336B  jited 778B  memlock 4096B  map_ids 7,8,5
        btf_id 40
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj /usr/lib64/bpf/xdpfilt_alw_tcp.o sec xdp_filter
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 36 tag 18c1bb25084030bc jited
$ ls /sys/fs/bpf/xdp/globals
filter_ethernet  filter_ipv4  filter_ipv6  filter_ports  xdp_stats_map
$ bpftool map show
5: percpu_array  name xdp_stats_map  flags 0x0
        key 4B  value 16B  max_entries 5  memlock 4096B
        btf_id 35
6: percpu_array  name filter_ports  flags 0x0
        key 4B  value 8B  max_entries 65536  memlock 1576960B
        btf_id 35
7: percpu_hash  name filter_ipv4  flags 0x0
        key 4B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
8: percpu_hash  name filter_ipv6  flags 0x0
        key 16B  value 8B  max_entries 10000  memlock 1142784B
        btf_id 35
9: percpu_hash  name filter_ethernet  flags 0x0
        key 6B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
$ bpftool prog show
36: xdp  name xdpfilt_alw_tcp  tag 18c1bb25084030bc  gpl
        loaded_at 2020-10-22T08:04:37-0400  uid 0
        xlated 1128B  jited 690B  memlock 4096B  map_ids 6,5
        btf_id 45
$ /root/iproute2/ip/ip link set veth0 xdp off
$ rm -rf /sys/fs/bpf/xdp/globals

== Load new btf defined maps
$ /root/iproute2/ip/ip link set veth0 xdp obj btf_graft.o sec aaa
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 40 tag 3056d2382e53f27c jited
$ ls /sys/fs/bpf/xdp/globals
jmp_tc
$ bpftool map show
10: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
40: xdp  name cls_aaa  tag 3056d2382e53f27c  gpl
        loaded_at 2020-10-22T08:04:39-0400  uid 0
        xlated 80B  jited 71B  memlock 4096B
        btf_id 50
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj btf_map_in_map.o sec ingress
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 44 tag 4420e72b2a601ed7 jited
$ ls /sys/fs/bpf/xdp/globals
jmp_tc  map_outer
$ bpftool map show
10: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
11: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
13: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
44: xdp  name imain  tag 4420e72b2a601ed7  gpl
        loaded_at 2020-10-22T08:04:41-0400  uid 0
        xlated 336B  jited 193B  memlock 4096B  map_ids 13
        btf_id 55
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj btf_shared.o sec ingress
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 48 tag 9cbab549c3af3eab jited
$ ls /sys/fs/bpf/xdp/globals
jmp_tc  map_outer  map_sh
$ bpftool map show
10: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
11: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
13: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
14: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
48: xdp  name imain  tag 9cbab549c3af3eab  gpl
        loaded_at 2020-10-22T08:04:43-0400  uid 0
        xlated 224B  jited 139B  memlock 4096B  map_ids 14
        btf_id 60
$ /root/iproute2/ip/ip link set veth0 xdp off
$ rm -rf /sys/fs/bpf/xdp/globals

== Test load objs by tc
$ /root/iproute2/tc/tc qdisc add dev veth0 ingress
$ /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_cyclic.o sec 0xabccba/0
$ /root/iproute2/tc/tc filter add dev veth0 parent ffff: bpf obj bpf_graft.o
$ /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_tailcall.o sec 42/0
$ /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_tailcall.o sec 42/1
$ /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_tailcall.o sec 43/0
$ /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_tailcall.o sec classifier
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
$ ls /sys/fs/bpf/xdp/37e88cb3b9646b2ea5f99ab31069ad88db06e73d /sys/fs/bpf/xdp/fc68fe3e96378a0cba284ea6acbe17e898d8b11f /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/37e88cb3b9646b2ea5f99ab31069ad88db06e73d:
jmp_tc

/sys/fs/bpf/xdp/fc68fe3e96378a0cba284ea6acbe17e898d8b11f:
jmp_ex  jmp_tc  map_sh

/sys/fs/bpf/xdp/globals:
jmp_tc
$ bpftool map show
15: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
        owner_prog_type sched_cls  owner jited
16: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
        owner_prog_type sched_cls  owner jited
17: prog_array  name jmp_ex  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
        owner_prog_type sched_cls  owner jited
18: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 2  memlock 4096B
        owner_prog_type sched_cls  owner jited
19: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
52: sched_cls  name cls_loop  tag 3e98a40b04099d36  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 168B  jited 133B  memlock 4096B  map_ids 15
        btf_id 65
56: sched_cls  name cls_entry  tag 0fbb4d9310a6ee26  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 144B  jited 121B  memlock 4096B  map_ids 16
        btf_id 70
60: sched_cls  name cls_case1  tag e06a3bd62293d65d  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 328B  jited 216B  memlock 4096B  map_ids 19,17
        btf_id 75
66: sched_cls  name cls_case1  tag e06a3bd62293d65d  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 328B  jited 216B  memlock 4096B  map_ids 19,17
        btf_id 80
72: sched_cls  name cls_case1  tag e06a3bd62293d65d  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 328B  jited 216B  memlock 4096B  map_ids 19,17
        btf_id 85
78: sched_cls  name cls_case1  tag e06a3bd62293d65d  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 328B  jited 216B  memlock 4096B  map_ids 19,17
        btf_id 90
79: sched_cls  name cls_case2  tag ee218ff893dca823  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 336B  jited 218B  memlock 4096B  map_ids 19,18
        btf_id 90
80: sched_cls  name cls_exit  tag e78a58140deed387  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 288B  jited 177B  memlock 4096B  map_ids 19
        btf_id 90
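
The tail call sections used above (`42/0`, `42/1`, `43/0`, `0xabccba/0`) follow the legacy "id/key" naming. Assuming the part before the slash identifies the prog array map (its id in the legacy map definition) and the part after is the slot the program is installed into, a hypothetical parser could look like this; `%i` accepts both decimal and 0x-prefixed hex:

```c
#include <stdio.h>

/* Split a legacy tail-call section name into a map id and a slot key.
 * Returns 0 on success, -1 if the name is not exactly "<int>/<int>".
 * The trailing %c catches extra characters after the key. */
static int parse_tail_call_sec(const char *sec, int *id, int *key)
{
	char tail;

	if (sscanf(sec, "%i/%i%c", id, key, &tail) != 2)
		return -1;
	return 0;
}
```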

I also ran the following upstream kselftests with the patched iproute2,
and all passed.

test_lwt_ip_encap.sh
test_xdp_redirect.sh
test_tc_redirect.sh
test_xdp_meta.sh
test_xdp_veth.sh
test_xdp_vlan.sh

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 22:24:15 -07:00
Hangbin Liu 71c7c1fb4f examples/bpf: add bpf examples with BTF defined maps
Users should try to use the new BTF-defined maps instead of struct
bpf_elf_map defined maps. The tail call examples are not added yet,
as libbpf doesn't currently support declaratively populating tail call
maps.

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 22:14:08 -07:00
Hangbin Liu 1ac8285a69 examples/bpf: move struct bpf_elf_map defined maps to legacy folder
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 22:14:06 -07:00
Hangbin Liu 6d61a2b557 lib: add libbpf support
This patch converts iproute2 to use libbpf for loading and attaching
BPF programs when it is available, building on Toke's initial
implementation [1]. With libbpf, iproute2 can correctly process BTF
information and support the new-style BTF-defined maps, while keeping
compatibility with the old internal map definition syntax.

The old iproute2 bpf code is kept and will be used if no suitable libbpf
is available. When using libbpf, wrapper code in bpf_legacy.c ensures that
iproute2 will still understand the old map definition format, including
populating map-in-map and tail call maps before load.

In bpf_libbpf.c, we first init the iproute2 ctx and ELF info to check the
legacy bytes. When handling the legacy maps: map-in-maps are created
manually and their fds re-used, as they are associated by id/inner_id;
for pinned maps, we only set the pin path and let libbpf handle loading;
for tail call maps, we find them first and update their elements after
the program is loaded.

Other maps/progs will be loaded by libbpf directly.

[1] https://lore.kernel.org/bpf/20190820114706.18546-1-toke@redhat.com/

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 22:14:05 -07:00
Hangbin Liu dc800a4ed4 lib: make ipvrf able to use libbpf and fix function name conflicts
libbpf provides direct calls for BPF program load/attach, so we can
just use two wrapper functions for ipvrf and convert them to use libbpf.

Function bpf_prog_load() is removed, as its name conflicts with a libbpf
function.

bpf.c is moved to bpf_legacy.c in preparation for the main libbpf
support in iproute2.

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 22:14:04 -07:00
Hangbin Liu 503e9229b0 iproute2: add check_libbpf() and get_libbpf_version()
This patch aims to add basic checking functions for the upcoming
iproute2 libbpf support.

First we add check_libbpf() in configure to see if we have bpf library
support. By default the system libbpf will be used, but static linking
against a custom libbpf version can be achieved by passing libbpf DESTDIR
to variable LIBBPF_DIR for configure.

Another variable, LIBBPF_FORCE, controls whether to build iproute2
with libbpf. If set to on, libbpf is required and configure exits if it
is not available. If set to off, iproute2 is built without libbpf.

When dynamically linking against libbpf, we can't be sure that the
version we discovered at compile time is actually the one we are
using at runtime. This can lead to hard-to-debug errors. So we add
a new file lib/bpf_glue.c and a helper function get_libbpf_version()
to get the correct libbpf version at runtime.

Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 22:14:02 -07:00
David Ahern ee5d4b24e3 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 22:04:48 -07:00
Roi Dayan ed40b7e2ae tc flower: fix parsing vlan_id and vlan_prio
When the protocol is vlan, eth_type is set to the vlan eth type.
So when parsing vlan_id and vlan_prio, we need to check that tc_proto,
not eth_type, is vlan.

Fixes: 4c551369e0 ("tc flower: use right ethertype in icmp/arp parsing")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:45:20 -07:00
Petr Machata ca5ec9a17a ip: iptuntap: Convert to use print_on_off()
Instead of rolling a custom on-off printer, use the one added to utils.c.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:43:41 -07:00
Petr Machata 66e574c4c5 ip: ipnetconf: Convert to use print_on_off()
Instead of rolling a custom on-off printer, use the one added to utils.c.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:43:34 -07:00
Petr Machata 07d82b4a79 ip: iplink_bridge_slave: Convert to use print_on_off()
Instead of rolling a custom on-off printer, use the one added to utils.c.
Note that _print_onoff() has an extra parameter for a JSON-specific flag
name. However that argument is not used, and never was. Therefore when
moving over to print_on_off(), drop this argument.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:43:30 -07:00
Petr Machata 3e0d2a73ba ip: iplink_bridge_slave: Port over to parse_on_off()
Invoke parse_on_off() from bridge_slave_parse_on_off() instead of
hand-rolling one. Exit on failure, because the invarg() that was invoked
here before would have done so.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:43:27 -07:00
Petr Machata 5f685d064b ip: iplink: Convert to use parse_on_off()
Invoke parse_on_off() instead of rolling a custom function.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:43:23 -07:00
Petr Machata 94d12fd796 bridge: link: Convert to use print_on_off()
Instead of rolling a custom on-off printer, use the one added to utils.c.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:43:19 -07:00
Petr Machata 9262ccc3ed bridge: link: Port over to parse_on_off()
Convert bridge/link.c from a custom on_off parser to the new global one.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:43:14 -07:00
David Ahern e1ae6efbb8 Merge branch 'nexthop-flags' into next
Ido Schimmel  says:

====================

From: Ido Schimmel <idosch@nvidia.com>

Patch #1 prints the recently added 'RTNH_F_TRAP' flag.

Patch #2 makes sure that nexthop flags are always printed for nexthop
objects, even when the nexthop does not have a device, such as a
blackhole nexthop or a group.

Example output with netdevsim:

$ ip nexthop
id 1 via 192.0.2.2 dev eth0 scope link trap
id 2 blackhole trap
id 3 group 2 trap

Example output with mlxsw:

$ ip nexthop
id 1 via 192.0.2.2 dev swp3 scope link offload
id 2 blackhole offload
id 3 group 2 offload

Tested with fib_nexthops.sh that uses "ip nexthop" output:

Tests passed: 164
Tests failed:   0

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-22 12:46:30 -07:00
Ido Schimmel 0788678991 nexthop: Always print nexthop flags
Currently, the nexthop flags are only printed when the nexthop has a
nexthop device. The offload / trap indication is therefore not printed
for nexthop groups.

Instead, always print the nexthop flags, regardless of whether the
nexthop has a nexthop device.
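A minimal sketch of the change, assuming a simplified nexthop representation: device output stays conditional, flag output does not. The struct and function names are hypothetical; the flag values mirror RTNH_F_OFFLOAD (8) and RTNH_F_TRAP (64) from <linux/rtnetlink.h>.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical local copies; values mirror RTNH_F_OFFLOAD and RTNH_F_TRAP. */
enum { NH_FLAG_OFFLOAD = 8, NH_FLAG_TRAP = 64 };

/* Hypothetical, simplified nexthop used only for this sketch. */
struct nh_sketch {
	const char *dev;	/* NULL for blackhole nexthops and groups */
	unsigned int flags;
};

/* Format the device (if any) and then the flags into buf (len > 0). */
static void format_nh(const struct nh_sketch *nh, char *buf, size_t len)
{
	size_t off = 0;

	buf[0] = '\0';
	/* Device output is optional ... */
	if (nh->dev)
		off += (size_t)snprintf(buf + off, len - off, " dev %s", nh->dev);
	/* ... but flags are printed unconditionally. */
	if (off < len && (nh->flags & NH_FLAG_OFFLOAD))
		off += (size_t)snprintf(buf + off, len - off, " offload");
	if (off < len && (nh->flags & NH_FLAG_TRAP))
		off += (size_t)snprintf(buf + off, len - off, " trap");
}
```

With this shape, a blackhole nexthop or a group (no device) still gets its " trap" or " offload" suffix.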

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-22 12:43:56 -07:00
Ido Schimmel 3de35f41be ip route: Print "trap" nexthop indication
The kernel can now signal that a nexthop is trapping packets instead of
forwarding them. Print the flag to help users understand the offload
state of each nexthop.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-22 12:42:20 -07:00
David Ahern db8b149b16 Update kernel headers
Update kernel headers to commit:
    f9e425e99b07 ("octeontx2-af: Add support for RSS hashing based on Transport protocol field")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-22 12:41:23 -07:00
Stephen Hemminger 7a49ff9d79 bridge: report correct version
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-15 08:58:52 -08:00
Zahari Doychev 4c551369e0 tc flower: use right ethertype in icmp/arp parsing
Currently the icmp and arp parsing functions are called with an incorrect
ethtype when vlan or cvlan filter options are used. In that case, either
cvlan_ethtype or vlan_ethtype has to be used. The ethtype is now updated
each time a vlan ethtype is matched during parsing.
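A hedged sketch of the rule described above: the ethertype that drives icmp/arp option parsing is the innermost one given, while the vlan_id/vlan_prio check (per the later fix ed40b7e2ae) looks at the tc protocol itself. Function and parameter names are hypothetical, and 0 stands for "option not given"; this is not the f_flower.c code.

```c
#include <stdint.h>

/* Ethertype values as defined in <linux/if_ether.h>. */
#define SKETCH_ETH_P_8021Q  0x8100
#define SKETCH_ETH_P_8021AD 0x88A8

/* Hypothetical: ethertype to use for icmp/arp options. */
static uint16_t effective_ethtype(uint16_t tc_proto, uint16_t vlan_ethtype,
				  uint16_t cvlan_ethtype)
{
	uint16_t eth_type = tc_proto;

	/* Each matched VLAN ethertype replaces the one seen so far. */
	if (vlan_ethtype)
		eth_type = vlan_ethtype;
	if (cvlan_ethtype)
		eth_type = cvlan_ethtype;
	return eth_type;
}

/* Hypothetical: vlan_id/vlan_prio validity depends on tc_proto, not eth_type. */
static int is_vlan_proto(uint16_t tc_proto)
{
	return tc_proto == SKETCH_ETH_P_8021Q || tc_proto == SKETCH_ETH_P_8021AD;
}
```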

Signed-off-by: Zahari Doychev <zahari.doychev@linux.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 20:07:38 -07:00
David Ahern 1ed00380b0 Merge branch 'dcb-tool' into next
Petr Machata  says:
====================

The Linux DCB interface allows configuration of a broad range of
hardware-specific attributes, such as TC scheduling, flow control, per-port
buffer configuration, TC rate, etc.

Currently a common libre tool for configuration of DCB is OpenLLDP. This
suite contains a daemon that uses Linux DCB interface to configure HW
according to the DCB TLVs exchanged over an interface. The daemon can also
be controlled by a client, through which the user can adjust and view the
configuration. The downside of using OpenLLDP is that it is somewhat
heavyweight and difficult to use in scripts, and does not support
extensions such as buffer and rate commands.

For access to many HW features, one would be perfectly fine with a
fire-and-forget tool along the lines of "ip" or "tc". For scripting in
particular, this would be ideal. This author is aware of one such tool,
mlnx_qos from Mellanox OFED scripts collection[1].

The downside here is that the tool is very verbose, the command line
language is awkward to use, it is not packaged in Linux distros, and
generally has the appearance of a very vendor-specific tool, despite not
being one.

This patchset addresses the above issues by providing a seed of a clean,
well-documented, easily usable, extensible fire-and-forget tool for DCB
configuration:

    # dcb ets set dev eni1np1 \
                  tc-tsa all:strict 0:ets 1:ets 2:ets \
                  tc-bw all:0 0:33 1:33 2:34

    # dcb ets show dev eni1np1 tc-tsa tc-bw
    tc-tsa 0:ets 1:ets 2:ets 3:strict 4:strict 5:strict 6:strict 7:strict
    tc-bw 0:33 1:33 2:34 3:0 4:0 5:0 6:0 7:0

    # dcb ets set dev eni1np1 tc-bw 1:30 2:37

    # dcb -j ets show dev eni1np1 | jq '.tc_bw[2]'
    37

The patchset proceeds as follows:

- Many tools in iproute2 have an option to work in batch mode, where the
  commands to run are given in a file. The code to handle batching is
  largely the same independent of the tool in question. In patch #1, add a
  helper to handle the batching, and migrate individual tools to use it.

- A number of configuration options come in a form of an on-off switch.
  This in turn can be considered a special case of parsing one of a given
  set of strings. In patch #2, extract helpers to parse one of a number of
  strings, on top of which build an on-off parser.

  Currently each tool open-codes the logic to parse the on-off toggle. A
  future patch set will migrate instances of this code over to the new
  helpers.

- The on/off toggles from the previous item sometimes need to be dumped.
  While in the FP output, one typically wishes to maintain consistency with
  the command line and show actual strings, "on" and "off", in JSON output
  one would rather use booleans. This logic is somewhat annoying to have to
  open-code time and again. Therefore in patch #3, add a helper to do just
  that.

- The DCB tool is built on top of libmnl. Several routines will be
  basically the same in DCB as they are currently in devlink. In patches
  #4-#6, extract them to a new module, mnl_utils, for easy reuse.

- Much of DCB is built around arrays. A syntax similar to the iplink_vlan's
  ingress-qos-map / egress-qos-map is very handy for describing changes
  done to such arrays. Therefore in patch #7, extract a helper,
  parse_mapping(), which manages parsing of key-value arrays. In patch #8,
  fix a buglet in the helper, and in patch #9, extend it to allow setting
  of all array elements in one go.

- In patch #10, add a skeleton of "dcb", which contains common helpers and
  dispatches to subtools for handling of individual objects. The skeleton
  is empty as of this patch.

  In patch #11, add "dcb_ets", a module for handling of specifically DCB
  ETS objects.

  The intention is to gradually add handlers for at least PFC, APP, peer
  configuration, buffers and rates.

[1] https://github.com/Mellanox/mlnx-tools/tree/master/ofed_scripts

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:48:52 -07:00
Petr Machata ef15b07601 dcb: Add a subtool for the DCB ETS object
ETS, for "Enhanced Transmission Selection", is a set of configurations that
permit configuration of mapping of priorities to traffic classes, traffic
selection algorithm to use per traffic class, bandwidth allocation, etc.

Add a dcb subtool to allow showing and tweaking of individual ETS
configuration options. For example:

    # dcb ets show dev eni1np1
    willing on ets_cap 8 cbs off
    tc-bw 0:0 1:0 2:0 3:0 4:100 5:0 6:0 7:0
    pg-bw 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0
    tc-tsa 0:strict 1:strict 2:strict 3:strict 4:ets 5:strict 6:strict 7:strict
    prio-tc 0:1 1:3 2:5 3:0 4:0 5:0 6:0 7:0
    reco-tc-bw 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0
    reco-tc-tsa 0:strict 1:strict 2:strict 3:strict 4:strict 5:strict 6:strict 7:strict
    reco-prio-tc 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:19 -07:00
Petr Machata 67033d1c1c Add skeleton of a new tool, dcb
The Linux DCB interface allows configuration of a broad range of
hardware-specific attributes, such as TC scheduling, flow control, per-port
buffer configuration, TC rate, etc. Add a new tool to show that
configuration and tweak it.

DCB allows configuration of several objects, and possibly could expand to
pre-standard CEE interfaces. Therefore the tool itself is a lean shell that
dispatches to subtools each dedicated to one of the objects.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:19 -07:00
Petr Machata 66a2d71487 lib: parse_mapping: Recognize a keyword "all"
The DCB tool will have to provide an interface to a number of fixed-size
arrays. Unlike the egress- and ingress-qos-map, it makes good sense to have
an interface to set all members to the same value: for example, to set
strict priority on all TCs except a select few, or to reset allocated
bandwidth to all zeroes, again except for several explicitly given ones.

To support this usage, extend the parse_mapping() with a boolean that
determines whether this special use is supported. If "all" is given and
recognized, mapping_cb is called with the key of -1.

Have iplink_vlan pass false for allow_all.
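A minimal sketch of the key:value parsing these commits describe, including "all" reported to the callback as key -1 and argc/argv being updated even on error. The function name, callback signature, and stop-at-first-non-mapping rule are assumptions and may differ from the utils.c helper.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback: key is -1 for "all". Returns 0 on success. */
typedef int (*mapping_cb_t)(long key, long value, void *data);

/* Consume "KEY:VALUE" tokens; stop at the first token without ':'.
 * On error, *argcp/*argvp point at the offending token. */
static int parse_mapping_sketch(int *argcp, char ***argvp, bool allow_all,
				mapping_cb_t cb, void *data)
{
	int argc = *argcp;
	char **argv = *argvp;
	int ret = 0;

	while (argc > 0) {
		const char *tok = *argv;
		const char *colon = strchr(tok, ':');
		char *end;
		long key, value;

		if (!colon)
			break;	/* not a mapping: leave it to the caller */

		if (allow_all && colon - tok == 3 && strncmp(tok, "all", 3) == 0) {
			key = -1;
		} else {
			key = strtol(tok, &end, 10);
			if (end != colon || end == tok || key < 0) {
				ret = -1;
				break;
			}
		}
		value = strtol(colon + 1, &end, 10);
		if (*end != '\0' || end == colon + 1 || cb(key, value, data)) {
			ret = -1;
			break;
		}
		argc--;
		argv++;
	}
	/* Update even on error, so the caller reports the right element. */
	*argcp = argc;
	*argvp = argv;
	return ret;
}
```

Usage: a DCB-style caller might pass `all:0 2:34` with allow_all set and fill an 8-entry array in the callback, treating key -1 as "every element".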

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata bc3523ae70 lib: parse_mapping: Update argc, argv on error
Currently argc and argv are not updated unless parsing of the whole
mapping was successful. As a result, when parsing fails, "ip link" points
at the wrong argument when complaining:

    # ip link add name eth0.100 link eth0 type vlan id 100 egress 1:1 2:foo
    Error: argument "1" is wrong: invalid egress-qos-map

Update argc and argv even in the case of parsing error, so that the right
element is indicated.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata 28e663ee65 lib: Extract from iplink_vlan a helper to parse key:value arrays
VLAN netdevices have two similar attributes: ingress-qos-map and
egress-qos-map. These attributes can be configured with a series of
802.1-priority-to-skb-priority (and vice versa) mappings. A reusable helper
along those lines will be handy for configuration of various
priority-to-tc, tc-to-algorithm, and other arrays in DCB.

Therefore extract the logic to a function parse_mapping(), move to utils.c,
and dispatch to utils.c from iplink_vlan.c. That necessitates extraction of
a VLAN-specific parse_qos_mapping(). Do that, and propagate addattr_l()
return value up, unlike the original.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata 6dd778e837 lib: Extract from devlink/mnlg a helper, mnlu_socket_recv_run()
Receiving a message in libmnl is a somewhat involved operation. Devlink's
mnlg library has an implementation that is going to be handy for other
tools as well. Extract it into a new helper.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata dd78dfc7be lib: Extract from devlink/mnlg a helper, mnlu_msg_prepare()
Allocation of a new netlink message with the two usual headers is reusable
for other netlink message types. Extract it into a helper,
mnlu_msg_prepare(). Take the second header as an argument, instead of
passing in parameters to initialize it, and copy it in.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata 72858c7b77 lib: Extract from devlink/mnlg a helper, mnlu_socket_open()
This little dance of mnl_socket_open(), option setting, and bind, is the
same regardless of tool. Extract into a new module that should hold helpers
for working with libmnl, mnl_util.c.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata 9091ff0251 lib: json_print: Add print_on_off()
The value of a number of booleans is shown as "on" and "off" in the plain
output, and as an actual boolean in JSON mode. Add a function that does
that.

RDMA tool already uses a function named print_on_off(). This function
always shows "on" and "off", even in JSON mode. Since there are probably
very few if any consumers of this interface at this point, migrate it to
the new central print_on_off() as well.
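A hedged sketch of the behaviour described: the same boolean rendered as "on"/"off" in plain output and as a JSON boolean in JSON mode. The function name and buffer-based interface are hypothetical, chosen so the result is easy to inspect; this is not the json_print.c code.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical: format one on/off attribute into buf. */
static int format_on_off_sketch(char *buf, size_t len, bool json,
				const char *key, bool value)
{
	if (json)
		return snprintf(buf, len, "\"%s\":%s", key,
				value ? "true" : "false");
	return snprintf(buf, len, "%s %s", key, value ? "on" : "off");
}
```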

Signed-off-by: Petr Machata <me@pmachata.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata 82604d2852 lib: Add parse_one_of(), parse_on_off()
Take from the macsec code parse_one_of() and adapt so that it passes the
primary result as the main return value, and error result through a
pointer. That is the simplest way to make the code reusable across data
types without introducing extra magic.

Also from macsec take the specialization of parse_one_of() for parsing
specifically the strings "off" and "on".

Convert the macsec code to the new helpers.
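A minimal sketch of the calling convention described above: the parsed value is the return value, and failure is reported through an int pointer. The _sketch names are illustrative, not the actual utils.c signatures, and the error message printing a real helper would do is omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return the index of realval in list; on no match, set *p_err = -1. */
static int parse_one_of_sketch(const char *realval, const char *const *list,
			       size_t len, int *p_err)
{
	for (size_t i = 0; i < len; i++) {
		if (list[i] && strcmp(realval, list[i]) == 0) {
			*p_err = 0;
			return (int)i;
		}
	}
	*p_err = -1;
	return 0;
}

/* Specialization: index 0 is "off", index 1 is "on". */
static bool parse_on_off_sketch(const char *realval, int *p_err)
{
	static const char *const values[] = { "off", "on" };

	return parse_one_of_sketch(realval, values, 2, p_err) == 1;
}
```

Returning the value directly keeps the helper usable for any enum-like index, while the error channel stays uniform across data types.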

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata 1d9a81b8c9 Unify batch processing across tools
The code for handling batches is largely the same across iproute2 tools.
Extract a helper to handle the batch, and adjust the tools to dispatch to
this helper. Sandwich the invocation between prologue / epilogue code
specific to each tool.
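The shared batch loop can be sketched as: read a line, split it into words, hand them to a tool-specific callback. The names, the fixed word limit, and whitespace-only splitting are assumptions; a real implementation would likely also handle quoting and comments, which this sketch omits.

```c
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_ARGS 32

/* Hypothetical per-tool command handler. */
typedef int (*batch_cmd_t)(int argc, char **argv, void *user);

/* Run each non-blank line of `in` through cmd; return the failure count. */
static int run_batch_sketch(FILE *in, batch_cmd_t cmd, void *user)
{
	char line[1024];
	int errors = 0;

	while (fgets(line, sizeof(line), in)) {
		char *argv[SKETCH_MAX_ARGS];
		int argc = 0;
		char *tok = strtok(line, " \t\r\n");

		while (tok && argc < SKETCH_MAX_ARGS) {
			argv[argc++] = tok;
			tok = strtok(NULL, " \t\r\n");
		}
		if (argc == 0)
			continue;	/* skip blank lines */
		if (cmd(argc, argv, user))
			errors++;
	}
	return errors;
}
```

Each tool would wrap a call like this with its own prologue (e.g. opening a netlink socket) and epilogue.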

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Guillaume Nault 8682f588bf tc-mpls: fix manpage example and help message string
Manpage:
 * Remove the extra "and to ip packets" part from command description
   to make it more understandable.

 * Redirect packets to eth1, instead of eth0, as told in the
   description.

Help string:
 * "mpls pop" can be followed by a CONTROL keyword.

 * "mpls modify" can also set the MPLS_BOS field.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-08 10:49:28 -08:00
Guillaume Nault 7c7a0fe0c8 tc-vlan: fix help and error message strings
 * "vlan pop" can be followed by a CONTROL keyword.

 * Add missing space in error message.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-08 10:49:18 -08:00
Stephen Hemminger 72f88bd42a uapi: update kernel headers from 5.10-rc2
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-08 10:47:27 -08:00
Stephen Hemminger b90c39be33 rdma: fix spelling error in comment
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-08 10:44:19 -08:00
Stephen Hemminger c8424b73e1 man: fix spelling errors
Fix lots of little typos in man pages.
Found by running codespell.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-08 10:40:30 -08:00
Stephen Hemminger cbf6481797 tc/m_gate: fix spelling errors
Fix spelling errors in error messages.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-08 10:34:23 -08:00
Stephen Hemminger 14b189f066 uapi: updates from 5.10-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-03 08:29:53 -08:00
David Ahern 51f28eb928 Merge branch 'tc-terse-dump' into next
Vlad Buslov  says:

====================

Implement support for terse dump mode which provides only essential
classifier/action info (handle, stats, cookie, etc.). Use new
TCA_DUMP_FLAGS_TERSE flag to prevent copying of unnecessary data from
kernel.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-31 09:18:43 -06:00
Vlad Buslov 477ca0dfb4 tc: implement support for terse dump
Implement support for classifier/action terse dump using the new
TCA_DUMP_FLAGS TLV, whose only available flag value is TCA_DUMP_FLAGS_TERSE.
Set the flag when the user requests it with the following example CLI
(-br for 'brief'):

$ tc -s -br filter show dev ens1f0 ingress
filter protocol ip pref 49151 flower chain 0
filter protocol ip pref 49151 flower chain 0 handle 0x1
  not_in_hw
        action order 1: gact    Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

filter protocol ip pref 49152 flower chain 0
filter protocol ip pref 49152 flower chain 0 handle 0x1
  not_in_hw
        action order 1: gact    Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

In terse mode, the dump only outputs essential data needed to identify the
filter and action (handle, cookie, etc.) and stats, if requested by the
user. The intention is to significantly improve rule dump rate by omitting
all static data that does not change after the rule is created.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-31 09:15:15 -06:00
Vlad Buslov a99ebeeef2 tc: skip actions that don't have options attribute when printing
Modify implementations that return an error from the
action_util->print_aopt() callback to silently skip actions that don't
have their corresponding TCA_ACT_OPTIONS attribute set (some actions
already behave like this). Print the action kind before returning from
action_util->print_aopt() callbacks. This is necessary to support terse
dump mode in the following patch in the series.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-31 09:14:01 -06:00
Johannes Berg 9fc5bf734f libnetlink: define __aligned conditionally
On some systems (e.g. current Debian/stable) the inclusion
of utils.h pulled in some other things that may end up
defining __aligned, in a possibly different way than what
we had here.

Use our own definition only if there isn't one already.
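The conditional definition can be sketched as follows: the fallback is used only when no previously included header has already defined __aligned. The attribute spelling is GCC/Clang-specific, and the struct is purely illustrative.

```c
/* Only define __aligned if an earlier header has not already done so. */
#ifndef __aligned
#define __aligned(x) __attribute__((aligned(x)))
#endif

/* Illustrative use: force 8-byte alignment of a small struct. */
struct sketch_hdr {
	unsigned short len;
	unsigned short type;
} __aligned(8);
```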

Fixes: d5acae244f ("libnetlink: add nl_print_policy() helper")
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-28 10:24:02 -07:00
David Ahern eb12cc9ae1 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-25 15:08:12 -06:00
Guillaume Nault f1298d7660 m_mpls: test the 'mac_push' action after 'modify'
Commit 02a261b5ba ("m_mpls: add mac_push action") added a matches()
test for the "mac_push" string before the test for "modify".
This changes the previous behaviour as 'action m' used to match
"modify" while it now matches "mac_push".

Revert to the original behaviour by moving the "mac_push" test after
"modify".
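Why the order matters can be shown with a prefix test in the spirit of iproute2's matches(): an abbreviation is accepted if it is a prefix of a keyword, so the first keyword tested wins. The helpers and keyword lists below are a hypothetical sketch, not the m_mpls.c code.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if abbrev is a non-empty prefix of keyword. */
static bool is_prefix(const char *abbrev, const char *keyword)
{
	if (!*abbrev)
		return false;
	while (*abbrev && *abbrev == *keyword) {
		abbrev++;
		keyword++;
	}
	return *abbrev == '\0';
}

/* Return the first keyword, in test order, that abbrev abbreviates. */
static const char *first_match(const char *abbrev,
			       const char *const *keywords, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (is_prefix(abbrev, keywords[i]))
			return keywords[i];
	return NULL;
}
```

With "mac_push" tested before "modify", the abbreviation "m" resolves to "mac_push"; testing "modify" first restores the old meaning while "ma" still selects "mac_push".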

Fixes: 02a261b5ba ("m_mpls: add mac_push action")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-25 15:07:13 -06:00
David Ahern 2b7a768408 Merge branch 'tipc-encryption' into next
Tuong Lien  says:

====================

This series adds two new options to the 'iproute2/tipc' command, enabling users
to use the new TIPC encryption features, i.e. the master key and rekeying, which
have recently been merged into the kernel.

The help menu of the "tipc node set key" command is also updated accordingly:

 # tipc node set key --help
Usage: tipc node set key KEY [algname ALGNAME] [PROPERTIES]
       tipc node set key rekeying REKEYING

KEY
  Symmetric KEY & SALT as a composite ASCII or hex string (0x...) in form:
  [KEY: 16, 24 or 32 octets][SALT: 4 octets]

ALGNAME
  Cipher algorithm [default: "gcm(aes)"]

PROPERTIES
  master                - Set KEY as a cluster master key
  <empty>               - Set KEY as a cluster key
  nodeid NODEID         - Set KEY as a per-node key for own or peer

REKEYING
  INTERVAL              - Set rekeying interval (in minutes) [0: disable]
  now                   - Trigger one (first) rekeying immediately

EXAMPLES
  tipc node set key this_is_a_master_key master
  tipc node set key 0x746869735F69735F615F6B657931365F73616C74
  tipc node set key this_is_a_key16_salt algname "gcm(aes)" nodeid 1001002
  tipc node set key rekeying 600
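The KEY/SALT layout above implies a simple length rule: the last 4 octets are the salt, and the remainder must be 16, 24 or 32 octets. For instance, "this_is_a_key16_salt" (and its 40-hex-digit form in the examples) is a 16-octet key plus a 4-octet salt. A hedged sketch of that check for the ASCII form only (hex decoding is omitted); the name is hypothetical and this is not the tipc tool's code.

```c
#include <stdbool.h>
#include <string.h>

#define SKETCH_SALT_LEN 4

/* Check an ASCII composite KEY+SALT; on success *key_len is the key size. */
static bool split_key_salt(const char *composite, size_t *key_len)
{
	size_t total = strlen(composite);

	if (total <= SKETCH_SALT_LEN)
		return false;
	*key_len = total - SKETCH_SALT_LEN;	/* salt is the final 4 octets */
	return *key_len == 16 || *key_len == 24 || *key_len == 32;
}
```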

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-20 09:05:40 -06:00
Tuong Lien 2bf1ba5a5c tipc: add option to set rekeying for encryption
As supported in the kernel, TIPC encryption rekeying can be tuned using
the netlink attribute 'TIPC_NLA_NODE_REKEYING'. Now we add the
'rekeying' option correspondingly to the 'tipc node set key' command so
that users will be able to perform that tuning:

tipc node set key rekeying REKEYING

where the 'REKEYING' value can be:

INTERVAL              - Set rekeying interval (in minutes) [0: disable]
now                   - Trigger one (first) rekeying immediately

For example:
$ tipc node set key rekeying 60
$ tipc node set key rekeying now

The command's help menu is also updated with these descriptions for the
new command option.
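A hedged sketch of how the REKEYING argument could be interpreted: an integer interval in minutes (0 disables) or the keyword "now". The sentinel chosen for "now" is an assumption for illustration; the value actually sent in TIPC_NLA_NODE_REKEYING may differ.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sentinel meaning "rekey immediately". */
#define SKETCH_REKEY_NOW UINT32_MAX

static bool parse_rekeying_sketch(const char *arg, uint32_t *out)
{
	char *end;
	unsigned long val;

	if (strcmp(arg, "now") == 0) {
		*out = SKETCH_REKEY_NOW;
		return true;
	}
	if (!isdigit((unsigned char)arg[0]))
		return false;	/* rejects "", "-5", etc. */
	errno = 0;
	val = strtoul(arg, &end, 10);
	/* Reject trailing junk, overflow, and values colliding with the sentinel. */
	if (*end != '\0' || errno == ERANGE || val >= SKETCH_REKEY_NOW)
		return false;
	*out = (uint32_t)val;	/* minutes; 0 disables rekeying */
	return true;
}
```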

Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-20 09:04:45 -06:00
Tuong Lien 5fb3681885 tipc: add option to set master key for encryption
In addition to the master key support in the kernel, we add the 'master'
option to the 'tipc node set key' command so that users can specify a key
as the master key when setting it. This is carried out by turning on the
new netlink flag 'TIPC_NLA_NODE_KEY_MASTER'.
For example:

$ tipc node set key "this_is_a_master_key" master

The command's help menu is also updated to give a better description of
all the available options.

Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-20 09:04:37 -06:00
David Ahern b4edd6a8a6 Merge branch 'tc-mpls-l2-vpn' into next
Guillaume Nault  says:

====================

This patch series adds the possibility for TC to tunnel Ethernet frames
over MPLS.

Patch 1 allows adding or removing the Ethernet header.
Patch 2 allows pushing an MPLS LSE before the MAC header.

By combining these actions, it becomes possible to encapsulate an
entire Ethernet frame into MPLS, then add an outer Ethernet header
and send the resulting frame to the next hop.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-20 08:57:47 -06:00
Guillaume Nault 02a261b5ba m_mpls: add mac_push action
Add support for the new TCA_MPLS_ACT_MAC_PUSH action (kernel commit
a45294af9e96 ("net/sched: act_mpls: Add action to push MPLS LSE before
Ethernet header")). This action lets TC push an MPLS header before the
MAC header of a frame.

Example (encapsulate all outgoing frames with label 20, then add an
outer Ethernet header):
 # tc filter add dev ethX matchall \
       action mpls mac_push label 20 ttl 64 \
       action vlan push_eth dst_mac 0a:00:00:00:00:02 \
                            src_mac 0a:00:00:00:00:01

This patch also adds an alias for ETH_P_TEB, since it is useful when
decapsulating MPLS packets that contain an Ethernet frame.

With MAC_PUSH, there's no previous Ethertype to modify. However, the
"protocol" option is still needed, because the kernel uses it to set
skb->protocol. So rename can_modify_ethtype() to can_set_ethtype().

Also add a test suite for m_mpls, which covers the new action and the
pre-existing ones.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-20 08:57:08 -06:00
Guillaume Nault d61167dd88 m_vlan: add pop_eth and push_eth actions
Add support for the new TCA_VLAN_ACT_POP_ETH and TCA_VLAN_ACT_PUSH_ETH
actions (kernel commit 19fbcb36a39e ("net/sched: act_vlan:
Add {POP,PUSH}_ETH actions")). These actions let TC remove or add the
Ethernet header at the head of a frame.

Drop an Ethernet header:
 # tc filter add dev ethX matchall action vlan pop_eth

Push an Ethernet header (the original frame must have no MAC header):
 # tc filter add dev ethX matchall action vlan \
       push_eth dst_mac 0a:00:00:00:00:02 src_mac 0a:00:00:00:00:01

Also add a test suite for m_vlan, which covers these new actions and
the pre-existing ones.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-20 08:36:38 -06:00
Jacob Keller 3342688a66 devlink: display elapsed time during flash update
For some devices, updating the flash can take significant time during
operations where no status can meaningfully be reported. This can be
somewhat confusing to a user who sees devlink appear to hang on the
terminal waiting for the device to update.

Recent changes to the kernel interface allow such long running commands
to provide a timeout value indicating some upper bound on how long the
relevant action could take.

Provide a ticking counter of the time elapsed since the previous status
message in order to make it clear that the program is not simply stuck.

Display this message whenever the status message from the kernel
indicates a timeout value. Additionally, also display the message if
we've received no status for more than a couple of seconds. If the elapsed
time exceeds the timeout provided by the status message, replace the
timeout display with "timeout reached".
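The display rule described above can be sketched as a pure function deciding what, if anything, to show. The 2-second threshold standing in for "a couple of seconds", the output format, and all names are assumptions, not devlink's actual code.

```c
#include <stdbool.h>
#include <stdio.h>

#define SKETCH_QUIET_THRESHOLD_S 2	/* assumed "couple of seconds" */

/* elapsed_s: time since the previous status message.
 * timeout_s == 0 means the kernel supplied no timeout.
 * Returns false if nothing should be shown yet. */
static bool format_elapsed(unsigned long elapsed_s, unsigned long timeout_s,
			   char *buf, size_t len)
{
	if (!timeout_s && elapsed_s <= SKETCH_QUIET_THRESHOLD_S)
		return false;
	if (timeout_s && elapsed_s > timeout_s)
		snprintf(buf, len, "( %lum %lus : timeout reached )",
			 elapsed_s / 60, elapsed_s % 60);
	else if (timeout_s)
		snprintf(buf, len, "( %lum %lus : %lum %lus )",
			 elapsed_s / 60, elapsed_s % 60,
			 timeout_s / 60, timeout_s % 60);
	else
		snprintf(buf, len, "( %lum %lus )",
			 elapsed_s / 60, elapsed_s % 60);
	return true;
}
```

Keeping the decision separate from terminal I/O makes the rule easy to test; a caller would redraw the line once per second.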

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-17 09:30:06 -06:00
Stephen Hemminger cb7ce51cc1 v5.9.0 2020-10-15 15:18:35 -07:00
zhangkaiheb@126.com 78ace1c211 tc: fq: clarify the length of orphan_mask.
Signed-off-by: kai zhang <zhangkaiheb@126.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-15 15:16:52 -07:00
Jan Engelhardt 0ca1312c20 ip: add error reporting when RTM_GETNSID failed
`ip addr` fails when run under qemu-user-riscv64. This is likely due
to qemu-5.1 not translating RTM_GETNSID calls. Aborting ip
completely is not helpful for the user, however. This patch reworks
the error handling.

Before:

rtest:/ # ip a
2: host0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
request send failed: Operation not supported
    link/ether 46:3f:2d:88:3d:db brd ff:ff:ff:ff:ff:ffrtest:/ #

Afterwards:

rtest:/ # ip a
2: host0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
rtnl_send(RTM_GETNSID): Operation not supported. Continuing anyway.
    link/ether 46:3f:2d:88:3d:db brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.72.147/28 brd 192.168.72.159 scope global host0
       valid_lft forever preferred_lft forever
    inet6 fe80::443f:2dff:fe88:3ddb/64 scope link
       valid_lft forever preferred_lft forever

Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-12 08:10:25 -07:00
Dmitry Yakunin 58c3c55f38 lib: ignore invalid mounts in cg_init_map
If /proc/mounts contains bad entries, just skip cgroup cache initialization.
Cgroups will then be shown in the output as "unreachable:cgroup_id".

Fixes: d5e6ee0dac ("ss: introduce cgroup2 cache and helper functions")
Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Reported-by: Donald Sharp <sharpd@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-11 23:02:35 -07:00
Stephen Hemminger 003b9af516 uapi: add new SNMP entry
Update to snmp.h from 5.9

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-11 22:50:22 -07:00
David Ahern b5a583fb32 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:11:09 -06:00
Johannes Berg 7812012849 genl: ctrl: print op -> policy idx mapping
Newer kernels can dump per-op policies, so print out the new
mapping attribute to indicate which op has which policy.

v2:
 * print out both do/dump policy idx
v3:
 * fix the userspace API, which was renumbered after patch rebasing

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:10:09 -06:00
David Ahern 91c54917cd Merge branch 'bridge-igmpv3-mldv2' into next
Nikolay Aleksandrov says:

====================
This set adds support for IGMPv3/MLDv2 attributes, they're mostly
read-only at the moment. The only new "set" option is the source address
for S,G entries. It is added in patch 01 (see the patch commit message for
an example). Patch 02 shows a missing flag (fast_leave) for
completeness, then patch 03 shows the new IGMPv3/MLDv2 flags:
added_by_star_ex and blocked. Patches 04-06 show the new extra
information about the entry's state when IGMPv3/MLDv2 are enabled. That
includes its filter mode (include/exclude), source list with timers and
origin protocol (currently only static/kernel). To show the new
information, the user must use "-d"/show_details.
Here's the output of a few IGMPv3 entries:
 dev bridge port ens12 grp 239.0.0.1 src 20.21.22.23 temp filter_mode include proto kernel  blocked    0.00
 dev bridge port ens12 grp 239.0.0.1 src 8.9.10.11 temp filter_mode include proto kernel  blocked    0.00
 dev bridge port ens12 grp 239.0.0.1 src 1.2.3.1 temp filter_mode include proto kernel  blocked    0.00
 dev bridge port ens12 grp 239.0.0.1 temp filter_mode exclude source_list 20.21.22.23/0.00,8.9.10.11/0.00,1.2.3.1/0.00 proto kernel    26.65

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:09:14 -06:00
Nikolay Aleksandrov 86588450c5 bridge: mdb: print protocol when available
Print the mdb entry's protocol (i.e. who added it) when it's available if
the user requested to show details (-d). Currently the only possible
values are RTPROT_STATIC (user-space added) or RTPROT_KERNEL
(automatically added by kernel). The value is kernel controlled.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:07:50 -06:00
Nikolay Aleksandrov 2de81d1eff bridge: mdb: print source list when available
Print the mdb entry's source list when it's available if the user
requested to show details (-d). Each source has an associated timer
which controls whether traffic should be forwarded to that S,G entry (if the
timer is non-zero traffic is forwarded, otherwise it's not).
Currently the source list is kernel controlled and can't be changed by
user-space.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:07:45 -06:00
Nikolay Aleksandrov 1d28c48046 bridge: mdb: print filter mode when available
Print the mdb entry's filter mode when it's available if the user
requested to show details (-d). It can be either include or exclude.
Currently it's kernel controlled and can't be changed by user-space.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:07:39 -06:00
Nikolay Aleksandrov e331677ea2 bridge: mdb: show igmpv3/mldv2 flags
With IGMPv3/MLDv2 support we have 2 new flags:
 - added_by_star_ex: set when the S,G entry was automatically created
                     because of a *,G entry in EXCLUDE mode
 - blocked: set when traffic for the S,G entry for that port has to be
            blocked
Both flags are used only on the new S,G entries and are currently
kernel-managed, i.e. like other such flags, they can't be set from user-space.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:07:34 -06:00
Nikolay Aleksandrov f94e8b0749 bridge: mdb: print fast_leave flag
We're not showing the fast_leave flag when it's set. Currently that can
only happen when an mdb entry is being deleted due to fast leave, so it
only affects mdb monitor.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:07:30 -06:00
Nikolay Aleksandrov 547b319762 bridge: mdb: add support for source address
This patch adds user-space control and dumping of the mdb entry source
address. When setting, the new MDBA_SET_ENTRY_ATTRS nested attribute is
used, and MDBE_ATTR_SOURCE is added inside it based on the address family.
When dumping, we look for MDBA_MDB_EATTR_SOURCE and, if present, add the
"src x.x.x.x" output. The source address will always be shown, as it's
needed to match the entry in order to modify it from user-space.

Example:
 $ bridge mdb add dev bridge port ens13 grp 239.0.0.1 src 1.2.3.4 permanent vid 100
 $ bridge mdb show
 dev bridge port ens13 grp 239.0.0.1 src 1.2.3.4 permanent vid 100

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:07:25 -06:00
David Ahern f905191a48 Update kernel headers
Update kernel headers to commit:
    bc081a693a56 ("Merge branch 'Offload-tc-vlan-mangle-to-mscc_ocelot-switch'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:04:57 -06:00
Antony Antony 4322b13c8d ip xfrm: support setting XFRMA_SET_MARK_MASK attribute in states
The XFRMA_SET_MARK_MASK attribute can be set in states (4.19+).
It is optional, and the kernel default is 0xffffffff.
It is the mask of XFRMA_SET_MARK (a.k.a. XFRMA_OUTPUT_MARK in 4.18).

e.g.
./ip/ip xfrm state add output-mark 0x6 mask 0xab proto esp \
 auth digest_null 0 enc cipher_null ''
ip xfrm state
src 0.0.0.0 dst 0.0.0.0
	proto esp spi 0x00000000 reqid 0 mode transport
	replay-window 0
	output-mark 0x6/0xab
	auth-trunc digest_null 0x30 0
	enc ecb(cipher_null)
	anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
	sel src 0.0.0.0/0 dst 0.0.0.0/0
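
A minimal C sketch of how the mask is understood to combine with the packet's existing mark: bits set in the mask are taken from output-mark, and all other bits of the packet mark are preserved. This reflects our reading of the kernel behaviour; demo_apply_set_mark() is a hypothetical helper, not kernel or iproute2 code.

```c
#include <stdint.h>

/* Kernel default when XFRMA_SET_MARK_MASK is not supplied. */
#define DEMO_SET_MARK_DEFAULT_MASK 0xffffffffu

/* Assumed semantics: bits selected by `mask` come from the state's
 * output mark; all other bits are kept from the packet's existing mark. */
static uint32_t demo_apply_set_mark(uint32_t skb_mark, uint32_t mark,
				    uint32_t mask)
{
	return (mark & mask) | (skb_mark & ~mask);
}
```

With output-mark 0x6/0xab and an existing packet mark of 0xff00, the sketch yields 0xff02; with the default mask 0xffffffff the packet mark is simply replaced by 0x6.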

Signed-off-by: Antony Antony <antony@phenome.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-07 00:10:47 -06:00
Jiri Pirko 8dc1db80e4 devlink: Add health reporter test command support
Add health reporter test command and allow user to trigger a test event.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-07 00:08:53 -06:00
Jacob Keller 012164718b devlink: support setting the overwrite mask attribute
The recently added DEVLINK_ATTR_FLASH_UPDATE_OVERWRITE_MASK allows
userspace to indicate how a device should handle subsections of a flash
component when updating. For example, a flash component might contain
vital data such as PCIe serial number or configuration fields such as
settings that control device bootup.

The overwrite mask allows specifying whether the device should overwrite
these subsections when updating from the provided image. If nothing is
specified, then the update is expected to preserve all vital fields and
configuration.

Add support for specifying the overwrite mask using the new "overwrite"
option to the flash command line.

By specifying "overwrite identifiers", the user requests that the flash
update should overwrite any identifiers in the updated flash component
with identifiers from the provided flash image.

  $ devlink dev flash pci/0000:af:00.0 file flash_image.bin overwrite identifiers

By specifying "overwrite settings" the user requests that the flash update
should overwrite any settings in the updated flash component with setting
values from the provided flash image.

  $ devlink dev flash pci/0000:af:00.0 file flash_image.bin overwrite settings

These options may be combined, in which case both subsections will be sent
in the overwrite mask, resulting in a request to overwrite all settings and
identifiers stored in the updated flash components.

  $ devlink dev flash pci/0000:af:00.0 file flash_image.bin overwrite settings overwrite identifiers
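
A minimal C sketch of folding repeated "overwrite" options into a single mask. The DEMO_* bit values and demo_parse_overwrite() are hypothetical; the real constants come from the kernel's devlink UAPI header, and the real devlink parser handles the other arguments too.

```c
#include <string.h>

/* Hypothetical bit values for illustration only. */
#define DEMO_OVERWRITE_SETTINGS    (1u << 0)
#define DEMO_OVERWRITE_IDENTIFIERS (1u << 1)

/* Scan argv for "overwrite <section>" pairs and OR the matching bits into
 * *mask. Other arguments are ignored. Returns 0 on success, or -1 if a
 * section name is missing or unknown. */
static int demo_parse_overwrite(int argc, const char *const argv[],
				unsigned int *mask)
{
	*mask = 0;
	for (int i = 0; i < argc; i++) {
		if (strcmp(argv[i], "overwrite") != 0)
			continue;
		if (++i >= argc)
			return -1; /* "overwrite" with no section name */
		if (strcmp(argv[i], "settings") == 0)
			*mask |= DEMO_OVERWRITE_SETTINGS;
		else if (strcmp(argv[i], "identifiers") == 0)
			*mask |= DEMO_OVERWRITE_IDENTIFIERS;
		else
			return -1;
	}
	return 0;
}
```

For the last command line above, the parsed mask has both bits set (0x3 in this sketch).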

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-07 00:02:16 -06:00
David Ahern 34be2d2619 Update kernel headers
Update kernel headers to commit:
    9faebeb2d800 ("Merge branch 'ethtool-allow-dumping-policies-to-user-space'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-07 00:01:26 -06:00
Stephen Hemminger be1bea8432 addr: Fix noprefixroute and autojoin for IPv4
These were reported as IPv6-only and ignored:

     # ip address add 192.0.2.2/24 dev dummy5 noprefixroute
     Warning: noprefixroute option can be set only for IPv6 addresses
     # ip address add 224.1.1.10/24 dev dummy5 autojoin
     Warning: autojoin option can be set only for IPv6 addresses

This re-enables them for IPv4.

Fixes: 9d59c86e57 ("iproute2: ip addr: Organize flag properties structurally")
Signed-off-by: Adel Belhouane <bugs.a.b@free.fr>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-06 15:15:56 -07:00
Eyal Birger e410c963e3 ipntable: add missing ndts_table_fulls ntable stat
Used for tracking neighbour table overflows.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-06 15:07:10 -07:00
Kamal Heib 10414de9e6 ip: iplink_ipoib.c: Remove extra spaces
Remove the extra space between the reported ipoib attrs - use only one
space instead of two.

Fixes: de0389935f ("iplink: Added support for the kernel IPoIB RTNL ops")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-09-30 22:29:05 -07:00
Ciara Loftus d2be31d9b6 ss: add support for xdp statistics
The patch exposes statistics for XDP sockets which can be useful for
debugging purposes.

The stats exposed are:
    rx dropped
    rx invalid
    rx queue full
    rx fill ring empty
    tx invalid
    tx ring empty

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-29 09:21:24 -06:00
David Ahern f481515c89 Update kernel headers
Update kernel headers to commit:
    280095713ce2 ("Merge branch 'ibmvnic-refactor-some-send-handle-functions'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-29 09:13:21 -06:00
Stephen Hemminger 03fb6fa1d8 uapi: update headers from 5.9-rc7
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-09-28 13:50:36 -07:00
Jan Engelhardt fece144abc build: avoid make jobserver warnings
I observe:

	» make -j8 CCOPTS=-ggdb3
	lib
	make[1]: warning: -j8 forced in submake: resetting jobserver mode.
	make[1]: Nothing to be done for 'all'.
	ip
	make[1]: warning: -j8 forced in submake: resetting jobserver mode.
	    CC       ipntable.o

MFLAGS is a historic variable of some kind; removing it fixes the
jobserver issue.

Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-09-28 13:49:24 -07:00
Jakub Kicinski b8663da049 ip: promote missed packets to the -s row
rx_missed_errors is much more commonly reported by drivers than rx_over_errors:

linux$ git grep -c '[.>]rx_missed_errors ' -- drivers/ | wc -l
64
linux$ git grep -c '[.>]rx_over_errors ' -- drivers/ | wc -l
37

Plus those drivers are generally more modern than those
using rx_over_errors.

Since recently merged kernel documentation makes this
preference official, let's make ip -s output more informative
and let rx_missed_errors take the place of rx_over_errors.

Before:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:0a:f7:c1:4d:38 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    6.04T      4.67G    0       0       0       67.7M
    RX errors: length   crc     frame   fifo    missed
               0        0       0       0       7
    TX: bytes  packets  errors  dropped carrier collsns
    3.13T      2.76G    0       0       0       0
    TX errors: aborted  fifo   window heartbeat transns
               0        0       0       0       6

After:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:0a:f7:c1:4d:38 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped missed  mcast
    6.04T      4.67G    0       0       7       67.7M
    RX errors: length   crc     frame   fifo    overrun
               0        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    3.13T      2.76G    0       0       0       0
    TX errors: aborted  fifo   window heartbeat transns
               0        0       0       0       6

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-22 20:23:29 -06:00
David Ahern cec67df974 Merge branch 'devlink-controller-external-info' into next
Parav Pandit says:

====================

For certain devlink port flavours, the controller number and optionally
the external attribute are reported by the kernel.

(a) The controller number indicates which local or external controller a
given port belongs to.
(b) The external port attribute indicates whether a given port is for an
external or a local controller.

This short series shows these attributes to the user.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-22 20:17:48 -06:00
Parav Pandit 748cbad33b devlink: Show controller number of a devlink port
Show the controller number of the devlink port whenever kernel reports
it.

Example of a PCI VF port for external controller number 1:

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev ens2f0c1pf0vf1 flavour pcivf controller 1 pfnum 0 vfnum 1 external true splittable false
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port show pci/0000:06:00.0/2 -jp
{
    "port": {
        "pci/0000:06:00.0/2": {
            "type": "eth",
            "netdev": "ens2f0c1pf0vf1",
            "flavour": "pcivf",
            "controller": 1,
            "pfnum": 0,
            "vfnum": 1,
            "external": true,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-22 20:13:09 -06:00
Parav Pandit 8fadd01101 devlink: Show external port attribute
If a port is for an external controller, the port's external attribute is
set. Show this attribute.

An example of an external controller port for PCI VF:

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev ens2f0c1pf0vf1 flavour pcivf pfnum 0 vfnum 1 external true splittable false
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port show pci/0000:06:00.0/2 -jp
{
    "port": {
        "pci/0000:06:00.0/2": {
            "type": "eth",
            "netdev": "ens2f0c1pf0vf1",
            "flavour": "pcivf",
            "pfnum": 0,
            "vfnum": 1,
            "external": true,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-22 20:13:04 -06:00
David Ahern 454429e8b4 Update kernel headers
Update kernel headers to commit:
    748d1c8a425e ("Merge branch 'devlink-Use-nla_policy-to-validate-range'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-22 20:10:43 -06:00
Roman Mashak aba44dc2ea ip: updated ip-link man page
Added description of link flags allmulticast, promisc and trailers.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-09-14 20:42:04 -07:00
Wei Wang ad34d5fadb iproute2: ss: add support to expose various inet sockopts
This commit adds support to expose the following inet socket options:
-- recverr
-- is_icsk
-- freebind
-- hdrincl
-- mc_loop
-- transparent
-- mc_all
-- nodefrag
-- bind_address_no_port
-- recverr_rfc4884
-- defer_connect
with the option --inet-sockopt. Each option is only shown when it is set.
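
A minimal C sketch of the "only shown when set" rule. The kernel reports these options as individual bits; packing them into one unsigned int in the order listed above, and the demo_* names, are assumptions of this sketch rather than the ss implementation.

```c
#include <stdio.h>
#include <string.h>

/* Option names in the order listed above; bit i corresponds to entry i. */
static const char *const demo_inet_opt_names[] = {
	"recverr", "is_icsk", "freebind", "hdrincl", "mc_loop",
	"transparent", "mc_all", "nodefrag", "bind_address_no_port",
	"recverr_rfc4884", "defer_connect",
};

/* Write the names of the options that are set, separated by spaces.
 * Options that are not set are omitted entirely. */
static void demo_format_inet_opts(unsigned int flags, char *buf, size_t len)
{
	size_t n = sizeof(demo_inet_opt_names) / sizeof(demo_inet_opt_names[0]);

	if (len == 0)
		return;
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		size_t used;

		if (!(flags & (1u << i)))
			continue;
		used = strlen(buf);
		/* snprintf always NUL-terminates, so used < len and this write
		 * is bounded; a too-small buffer just truncates the list. */
		snprintf(buf + used, len - used, "%s%s", used ? " " : "",
			 demo_inet_opt_names[i]);
	}
}
```

A socket with only freebind and transparent set would therefore print "freebind transparent" and nothing for the other nine options.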

Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-08 20:36:06 -06:00
David Ahern c8eb4b52c1 Update kernel headers
Update kernel headers to commit:
    4349abdb409b ("net: dsa: don't print non-fatal MTU error if not supported")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-08 20:35:28 -06:00
Hoang Le abee772ff1 tipc: support 128bit node identity for peer removing
Problem:
Upstream kernels now support setting a 128-bit node identity. However,
the tipc peer remove command still uses the legacy address format, so
trying to remove an offline node fails, e.g.:

$ tipc node list
Node Identity                    Hash     State
d6babc1c1c6d                     1cbcd7ca down

$ tipc peer remove address d6babc1c1c6d
invalid network address, syntax: Z.C.N
error: No such device or address

Solution:
Add support for removing a specific down node by its 128-bit node
identity, as an alternative to the legacy 32-bit node address.
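
A minimal C sketch of how an argument might be classified as either a legacy Z.C.N address or a 128-bit identity. demo_classify_peer() is hypothetical; the packing (zone << 24) | (cluster << 12) | node follows, to our understanding, the legacy layout of tipc_addr() in <linux/tipc.h>, and treating any string of 1 to 32 hex digits as an identity is an assumption.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum demo_peer_kind { DEMO_PEER_INVALID, DEMO_PEER_LEGACY, DEMO_PEER_IDENTITY };

/* Classify a peer argument. Z.C.N is packed into the legacy 32-bit form
 * (8-bit zone, 12-bit cluster, 12-bit node); any other string of 1 to 32
 * hex digits is accepted as a 128-bit node identity. */
static enum demo_peer_kind demo_classify_peer(const char *arg, uint32_t *legacy)
{
	unsigned int z, c, n;
	char extra;
	size_t len = strlen(arg);

	/* Exactly three dotted numbers and nothing after them. */
	if (sscanf(arg, "%u.%u.%u%c", &z, &c, &n, &extra) == 3) {
		if (z > 255 || c > 4095 || n > 4095)
			return DEMO_PEER_INVALID;
		*legacy = ((uint32_t)z << 24) | ((uint32_t)c << 12) | n;
		return DEMO_PEER_LEGACY;
	}
	if (len == 0 || len > 32)
		return DEMO_PEER_INVALID;
	for (size_t i = 0; i < len; i++)
		if (!isxdigit((unsigned char)arg[i]))
			return DEMO_PEER_INVALID;
	return DEMO_PEER_IDENTITY;
}
```

Under this split, an argument such as d6babc1c1c6d takes the identity path instead of being rejected as an invalid Z.C.N address.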

Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-01 20:01:39 -06:00
Roopa Prabhu 6fd53b2a1c iplink: add support for protodown reason
This patch adds support for the recently added IFLA_PROTO_DOWN_REASON
link attribute. IFLA_PROTO_DOWN_REASON enumerates reasons for the
already existing IFLA_PROTO_DOWN link attribute.

$ cat /etc/iproute2/protodown_reasons.d/r.conf
0 mlag
1 evpn
2 vrrp
3 psecurity

$ ip link set dev vx10 protodown on protodown_reason vrrp on
$ ip link show dev vx10
14: vx10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether f2:32:28:b8:35:ff brd ff:ff:ff:ff:ff:ff protodown on protodown_reason <vrrp>
$ ip -p -j link show dev vx10
[ {
	<snip>
        "proto_down": true,
        "proto_down_reason": [ "vrrp" ]
} ]
$ ip link set dev vx10 protodown_reason mlag on
$ ip link show dev vx10
14: vx10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether f2:32:28:b8:35:ff brd ff:ff:ff:ff:ff:ff protodown on protodown_reason <mlag,vrrp>
$ ip -p -j link show dev vx10
[ {
	<snip>
        "proto_down": true,
        "proto_down_reason": [ "mlag","vrrp" ]
} ]

$ ip -p -j link show dev vx10
$ ip link set dev vx10 protodown off protodown_reason vrrp off
Error: Cannot clear protodown, active reasons.
$ ip link set dev vx10 protodown off protodown_reason mlag off
$

Note: for some reason the non-JSON and JSON keys for protodown
are different (protodown and proto_down, respectively). For
consistency I have kept the same split for the protodown reason
(protodown_reason and proto_down_reason).
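
A minimal C sketch of mapping the reasons file above to bit numbers and printing the active set in ascending bit order. The demo_* helpers and the 32-reason limit are assumptions for illustration, not iproute2's implementation.

```c
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_REASONS 32

/* Parse one "<bit> <name>" line of a protodown_reasons.d file into names[].
 * Returns 0 on success, -1 on a malformed line or out-of-range bit. */
static int demo_parse_reason_line(const char *line,
				  char names[DEMO_MAX_REASONS][32])
{
	unsigned int bit;
	char name[32];

	if (sscanf(line, "%u %31s", &bit, name) != 2 || bit >= DEMO_MAX_REASONS)
		return -1;
	memcpy(names[bit], name, strlen(name) + 1); /* at most 32 bytes */
	return 0;
}

/* Append s at *used; once the buffer is full, later appends are dropped. */
static void demo_append(char *buf, size_t len, size_t *used, const char *s)
{
	int w;

	if (*used >= len)
		return;
	w = snprintf(buf + *used, len - *used, "%s", s);
	if (w > 0)
		*used += (size_t)w;
}

/* Format the active reasons as "<a,b,...>"; unnamed bits are skipped. */
static void demo_format_reasons(unsigned int mask,
				char names[DEMO_MAX_REASONS][32],
				char *buf, size_t len)
{
	size_t used = 0;
	int first = 1;

	if (len == 0)
		return;
	buf[0] = '\0';
	demo_append(buf, len, &used, "<");
	for (unsigned int bit = 0; bit < DEMO_MAX_REASONS; bit++) {
		if (!(mask & (1u << bit)) || names[bit][0] == '\0')
			continue;
		if (!first)
			demo_append(buf, len, &used, ",");
		demo_append(buf, len, &used, names[bit]);
		first = 0;
	}
	demo_append(buf, len, &used, ">");
}
```

With the r.conf above, a mask with bits 0 and 2 set formats as "<mlag,vrrp>", matching the ip link output shown.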

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-01 19:52:13 -06:00
Antony Antony af27494d2e ip xfrm: support printing XFRMA_SET_MARK_MASK attribute in states
The XFRMA_SET_MARK_MASK attribute is set in states (4.19+).
It is the mask of XFRMA_SET_MARK (a.k.a. XFRMA_OUTPUT_MARK in 4.18).

sample output: note the output-mark mask
ip xfrm state
	src 192.1.2.23 dst 192.1.3.33
	proto esp spi 0xSPISPI reqid REQID mode tunnel
	replay-window 32 flag af-unspec
	output-mark 0x3/0xffffff
	aead rfc4106(gcm(aes)) 0xENCAUTHKEY 128
	if_id 0x1

Signed-off-by: Antony Antony <antony@phenome.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-01 19:49:29 -06:00
David Ahern 275eed9be5 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-01 19:46:20 -06:00
Phil Sutter 23203b750e ip link: Fix indenting in help text
Indenting of 'ip link set' options below 'link-netns' was wrong, they
should be on the same level as the above.

While at it, fix the closing brackets in the vf-specific options. Also
write the node/port_guid parameters in upper case without curly braces:
they are supposed to be replaced by values, not put literally.

Fixes: 8589eb4efd ("treewide: refactor help messages")
Fixes: 5a3ec4ba64 ("iplink: Update usage in help message")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-31 12:32:26 -07:00
Johannes Berg cc889b8241 genl: ctrl: support dumping netlink policy
Support dumping the netlink policy of a given generic netlink
family, the policy (with any sub-policies if appropriate) is
exported by the kernel in a general fashion.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-08-24 21:35:14 -06:00
Johannes Berg d5acae244f libnetlink: add nl_print_policy() helper
This prints out the data from the given nested attribute
to the given FILE pointer, interpreting the format that
the kernel uses for exposing netlink policies.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-08-24 21:35:07 -06:00
Johannes Berg 784fa9f62f libnetlink: add rtattr_for_each_nested() iteration macro
This is useful for iterating elements in a nested attribute,
if they're not parsed with a strict length limit or such.
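
A standalone C sketch of such an iteration macro, built only on the RTA_* macros from <linux/rtnetlink.h>. demo_for_each_nested() and the helpers are hypothetical equivalents for illustration; the actual rtattr_for_each_nested() definition in libnetlink.h may differ in detail.

```c
#include <linux/rtnetlink.h>
#include <stdint.h>
#include <string.h>

/* Step to the (aligned) attribute that follows rta. */
#define demo_rta_tail(rta) \
	((struct rtattr *)((char *)(rta) + RTA_ALIGN((rta)->rta_len)))

/* Visit each attribute nested in `nest`; stop as soon as the remaining
 * payload cannot hold a well-formed attribute. */
#define demo_for_each_nested(attr, nest)                                  \
	for ((attr) = (struct rtattr *)RTA_DATA(nest);                    \
	     RTA_OK((attr), (int)RTA_PAYLOAD(nest) -                      \
			    (int)((char *)(attr) - (char *)RTA_DATA(nest))); \
	     (attr) = demo_rta_tail(attr))

/* Build a nest of `type` holding one u32 child per value (child types
 * 1..n). buf must be 4-byte aligned. Returns NULL if cap is too small. */
static struct rtattr *demo_build_nest(void *buf, size_t cap,
				      unsigned short type,
				      const uint32_t *vals, size_t n)
{
	struct rtattr *nest = buf;
	size_t off = RTA_LENGTH(0);

	if (cap < off)
		return NULL;
	nest->rta_type = type;
	for (size_t i = 0; i < n; i++) {
		struct rtattr *rta;

		if (off + RTA_SPACE(sizeof(uint32_t)) > cap)
			return NULL;
		rta = (struct rtattr *)((char *)buf + off);
		rta->rta_type = (unsigned short)(i + 1);
		rta->rta_len = RTA_LENGTH(sizeof(uint32_t));
		memcpy(RTA_DATA(rta), &vals[i], sizeof(uint32_t));
		off += RTA_SPACE(sizeof(uint32_t));
	}
	nest->rta_len = (unsigned short)off;
	return nest;
}

/* Sum the u32 payloads of all attributes nested in `nest`. */
static uint32_t demo_sum_nested(struct rtattr *nest)
{
	struct rtattr *attr;
	uint32_t sum = 0;

	demo_for_each_nested(attr, nest) {
		uint32_t v;

		if ((size_t)RTA_PAYLOAD(attr) < sizeof(v))
			continue; /* too short to hold a u32 */
		memcpy(&v, RTA_DATA(attr), sizeof(v));
		sum += v;
	}
	return sum;
}
```

A nest holding the values 5, 7 and 9 sums to 21; for an empty nest the loop body never runs, because RTA_OK() fails on the zero-length remainder before any child header is read.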

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-08-24 21:34:29 -06:00
Murali Karicheri ea6aeeb90c ip: iplink: prp: update man page for new parameter
PRP support requires a proto parameter which is 0 for hsr and 1 for
prp. The default is hsr, which is backward compatible.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-22 21:14:12 -07:00
Murali Karicheri 68f027724b iplink: hsr: add support for creating PRP device similar to HSR
This patch enhances the iplink command with a proto parameter to
create a PRP device/interface similar to HSR. Both protocols are
quite similar and require a pair of Ethernet interfaces, so reuse
the existing HSR iplink command to create PRP devices/interfaces as
well. Use the proto parameter to differentiate the two protocols.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-22 21:14:12 -07:00
Amit Cohen 8e6bce735a devlink: Add fflush() in cmd_mon_show_cb()
As in other print functions, we need to flush buffered data
in order to work with pipes and output redirects.

Without it, stdout output stays buffered and is not written out
promptly when it is redirected to a file or a pipe.

This is useful when writing scripts that rely on devlink-monitor output.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-22 21:13:11 -07:00
Sascha Hauer 7e7a1d107b iproute2: ip maddress: Check multiaddr length
ip maddress add|del takes a MAC address as argument, so insist on
getting a length of ETH_ALEN bytes. This makes sure the passed argument
is actually a MAC address, and in particular not an IPv4 address, which
was previously accepted and silently taken as a MAC address.

While at it, do not print *argv in the error path as this has been
modified by ll_addr_a2n() and doesn't contain the full string anymore,
which can lead to misleading error messages.

Also while at it, replace the hardcoded buffer size with the actual
buffer size using sizeof().
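
A minimal C sketch of the stricter check. demo_parse_lladdr() is a simplified, hypothetical stand-in for ll_addr_a2n() that only understands colon-separated hex octets; the point illustrated is that anything not exactly ETH_ALEN (6) bytes long is rejected.

```c
#include <stddef.h>

#define DEMO_ETH_ALEN 6 /* same value as ETH_ALEN in <linux/if_ether.h> */

static int demo_hexval(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Parse colon-separated hex octets ("01:00:5e:00:00:01") into out[].
 * Returns the number of octets, or -1 on a syntax error or overflow. */
static int demo_parse_lladdr(const char *arg, unsigned char *out, size_t cap)
{
	size_t n = 0;

	for (;;) {
		int hi, lo;

		if (n == cap)
			return -1;
		hi = demo_hexval((unsigned char)*arg);
		if (hi < 0)
			return -1;
		arg++;
		lo = demo_hexval((unsigned char)*arg);
		if (lo >= 0) { /* two-digit octet */
			out[n] = (unsigned char)((hi << 4) | lo);
			arg++;
		} else {       /* one-digit octet */
			out[n] = (unsigned char)hi;
		}
		n++;
		if (*arg == '\0')
			return (int)n;
		if (*arg != ':')
			return -1;
		arg++;
	}
}

/* Accept the argument only if it is exactly DEMO_ETH_ALEN bytes long. */
static int demo_is_mac(const char *arg)
{
	unsigned char buf[32];

	return demo_parse_lladdr(arg, buf, sizeof(buf)) == DEMO_ETH_ALEN;
}
```

With this rule, "01:00:5e:00:00:01" is accepted, while "224.1.1.1" and a five-octet address are rejected.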

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-22 21:12:30 -07:00
Stephen Hemminger bf538de59d uapi: update bpf.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-16 16:09:52 -07:00
Leon Romanovsky db6d6becb0 rdma: Properly print device and link names in CLI output
The cited commit broke the CLI output and printed ifindex/ifname
instead of dev/link.

Before:
[leonro@vm ~]$ rdma res show qp
link mlx5_0/lqpn 1 type GSI state RTS sq-psn 0 comm ib_core
[leonro@vm ~]$ rdma res show cq
ifindex 0 ifname rocep0s9 cqn 0 cqe 1023 users 2 poll-ctx WORKQUEUE adaptive-moderation on comm ib_core

After:
[leonro@vm ~]$ rdma res show qp
link mlx5_0/- lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core]
[leonro@vm ~]$ rdma res show cq
dev rocep0s9 cqn 0 cqe 1023 users 2 poll-ctx WORKQUEUE adaptive-moderation on comm [ib_core]

It was missed because rdmatool is mostly used in JSON mode.

Fixes: b0a688a542 ("rdma: Rewrite custom JSON and prints logic to use common API")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-16 15:50:02 -07:00
Leon Romanovsky 7ded3c97b9 rdma: Fix owner name for the kernel resources
The owner of kernel resources is printed in a different format than
that of user resources, so readers can tell them apart simply by looking
at the name: the kernel owner has "[ ]" around the name.

Before this change:
[leonro@vm ~]$ rdma res show qp
link rocep0s9/1 lqpn 1 type GSI state RTS sq-psn 58 comm ib_core

After this change:
[leonro@vm ~]$ rdma res show qp
link rocep0s9/1 lqpn 1 type GSI state RTS sq-psn 58 comm [ib_core]

Fixes: b0a688a542 ("rdma: Rewrite custom JSON and prints logic to use common API")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-16 15:50:02 -07:00
Stephen Hemminger 52d767aff8 uapi: update kernel headers
pre-rc1 version of Linux kernel headers.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-11 13:18:41 -07:00
Mark Zhang e8e8f16ed1 rdma: Document the new "pid" criteria for auto mode
Document the newly supported criterion for auto mode. Examples:
$ rdma statistic qp set link mlx5_2/1 auto pid on
$ rdma statistic qp set link mlx5_2/1 auto pid,type on

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Ido Kalir <idok@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-06 16:26:12 +00:00
Mark Zhang e28133316d rdma: Add "PID" criteria support for statistic counter auto mode
With this new criterion, QPs that have different PIDs will be bound to
different counters in auto mode. This can be used in combination with
other criteria like "type". Examples:

$ rdma statistic qp set link mlx5_2/1 auto pid on
$ rdma statistic qp set link mlx5_2/1 auto type,pid on
$ rdma statistic qp set link mlx5_2/1 auto off
$ rdma statistic qp show link mlx5_0 qp-type UD

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Ido Kalir <idok@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-06 16:26:04 +00:00
Mark Zhang cb69794736 rdma: update uapi headers
Update rdma_netlink.h file up to kernel commit 76251e15ea73
("RDMA/counter: Add PID category support in auto mode")

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Ido Kalir <idok@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-06 16:25:04 +00:00
David Ahern e572e3af0d Merge branch 'main' into next
Conflicts:
	bridge/fdb.c
	man/man8/bridge.8

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-06 16:21:35 +00:00
Stephen Hemminger 53159d8115 v5.8.0 2020-08-03 10:03:42 -07:00
Stephen Hemminger d530608d33 lnstat: use same version as iproute2
Lnstat was trying to be different and have its own version.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-03 10:02:47 -07:00
Stephen Hemminger fbef655568 replace SNAPSHOT with auto-generated version string
Replace the iproute2 snapshot with a version string which is
autogenerated as part of the build process using git describe.

This also makes it possible to see whether the command was built
from the same sources as upstream.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-03 10:02:47 -07:00
Vasundhara Volam 7332b188a6 devlink: Add board.serial_number to info subcommand.
Add support for reading board serial_number to devlink info
subcommand. Example:

$ devlink dev info pci/0000:af:00.0 -jp
{
    "info": {
        "pci/0000:af:00.0": {
            "driver": "bnxt_en",
            "serial_number": "00-10-18-FF-FE-AD-1A-00",
            "board.serial_number": "433551F+172300000",
            "versions": {
                "fixed": {
                    "board.id": "7339763 Rev 0.",
                    "asic.id": "16D7",
                    "asic.rev": "1"
                },
                "running": {
                    "fw": "216.1.216.0",
                    "fw.psid": "0.0.0",
                    "fw.mgmt": "216.1.192.0",
                    "fw.mgmt.api": "1.10.1",
                    "fw.ncsi": "0.0.0.0",
                    "fw.roce": "216.1.16.0"
                }
            }
        }
    }
}

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-03 15:35:56 +00:00
Petr Vaněk a7f1974f6e ip-xfrm: add support for oseq-may-wrap extra flag
When set, this flag allows creating an SA whose sequence number can
wrap around (cycle) in outbound packets.
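
A minimal C model of what the flag changes, assuming a plain 32-bit outbound sequence number (no ESN): without the flag, stepping from 0xffffffff back to 0 is treated as an overflow and the counter is left unchanged; with it, the counter simply wraps. demo_next_oseq() is hypothetical and is not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Advance the outbound sequence number of a hypothetical SA model.
 * Returns 0 on success, or -1 if the counter would wrap and wrapping
 * is not allowed (in which case *oseq is left unchanged). */
static int demo_next_oseq(uint32_t *oseq, bool may_wrap)
{
	uint32_t next = *oseq + 1; /* unsigned arithmetic wraps to 0 */

	if (next == 0 && !may_wrap)
		return -1;
	*oseq = next;
	return 0;
}
```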

Signed-off-by: Petr Vaněk <pv@excello.cz>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-03 14:57:25 +00:00
David Ahern 91922a4121 Update kernel headers
Update kernel headers to commit:
    bd0b33b24897 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-03 14:56:28 +00:00
Danielle Ratson e5f4165a9e devlink: Expose port split ability
Add a new attribute that indicates the port split ability to devlink port.

Expose the attribute to user space as RO value, for example:

$ devlink port show swp1
pci/0000:03:00.0/61: type eth netdev swp1 flavour physical port 1 splittable false lanes 1

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-03 14:37:14 +00:00
Danielle Ratson fcbc6c1c71 devlink: Expose number of port lanes
Add a new attribute that indicates the port's number of lanes to devlink port.

Expose the attribute to user space as RO value, for example:

$ devlink port show swp1
pci/0000:03:00.0/61: type eth netdev swp1 flavour physical port 1 lanes 1

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-03 14:36:52 +00:00
Julien Fortin cb17e0cc57 bridge: fdb show: fix fdb entry state output for json context
With -j (JSON output), bridge fdb show prints an incorrect,
non-machine-readable state value, although JSON output is expected to be
machine-readable data that shouldn't require special handling or parsing.

$ bridge -j fdb show | \
python -c \
'import sys,json;print(json.dumps(json.loads(sys.stdin.read()),indent=4))'
[
    {
	"master": "br0",
	"mac": "56:23:28:4f:4f:e5",
	"flags": [],
	"ifname": "vx0",
	"state": "state=0x80"  <<<<<<<<< with the patch: "state": "0x80"
    }
]

Fixes: c7c1a1ef51 ("bridge: colorize output and use JSON print library")

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-29 18:08:46 -07:00
Briana Oursler 41a9e38469 tc: Add space after format specifier
Add a space after the format specifier in a print_string() call. This
fixes broken qdisc tests in the tdc test suite. Per a suggestion from
Petr Machata, also remove a space and change the spacing in
tc/q_event.c to complete the fix.

Tested fix in tdc using:
./tdc.py -c qdisc

All qdisc RED tests return ok.

Fixes: d0e450438571 ("tc: q_red: Add support for qevents "mark" and "early_drop"")
Signed-off-by: Briana Oursler <briana.oursler@gmail.com>
Tested-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-29 17:03:46 +00:00
Anton Danilov 65c0c4d21b bridge: fdb: the 'dynamic' option in the show/get commands
In most cases a user wants to see only the dynamic MAC addresses in
the fdb output, but 'fdb show' currently displays many 'self' entries
that only clutter the output.

The new 'dynamic' option for the 'show' and 'get' commands restricts
the output to the relevant (dynamic) records.

Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-27 16:41:39 -07:00
Matthieu Baerts 3a53ff7e58 mptcp: show all endpoints when no ID is specified
According to 'ip mptcp help', 'endpoint show' can accept no argument:

  ip mptcp endpoint show [ id ID ]

It makes sense to print all endpoints when no filter is used.

So here if the following command is used, all endpoints are printed:

  ip mptcp endpoint show

Same as:

  ip mptcp endpoint

Fixes: 7e0767cd ("add support for mptcp netlink interface")
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-27 16:39:58 -07:00
David Ahern 1ca65af1c5 Merge branch 'devlink-port-health' into next
Moshe Shemesh says:

====================

Implement commands for interacting with per-port devlink health
reporters. To do this, adapt all existing devlink-health subcommands to
accept port handles, and add a devlink-port health subcommand as an
alias for devlink-health.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-23 00:34:07 +00:00
Vladyslav Tarasiuk 1fe8c44bd9 devlink: Update devlink-health and devlink-port manpages
Describe support for per-port reporters in devlink-health and
devlink-port commands.

Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-23 00:32:37 +00:00
Vladyslav Tarasiuk 211c8d6ca9 devlink: Add devlink port health command
Add a 'devlink port health show' subcommand, which displays information
about the specified port reporter or about all present port reporters,
as in the example below. Device and port reporters are distinguished by
the handle used.

Make the other devlink-health subcommands available through the
'devlink port health' alias. Refactor the devlink-health commands to
accept port handles so that they can interact with port reporters.

Change the 'devlink health show' output to dump information about both
device and port reporters, each with its correct handle.

Example:
$ devlink health show
pci/0000:00:0b.0:
  reporter fw
    state healthy error 0 recover 0 auto_dump true
  reporter fw_fatal
    state healthy error 0 recover 0 grace_period 1200000 auto_recover true auto_dump true
pci/0000:00:0b.0/1:
  reporter tx
    state healthy error 0 recover 0 grace_period 10000 auto_recover true auto_dump true
  reporter rx
    state healthy error 0 recover 0 grace_period 10000 auto_recover true auto_dump true

$ devlink health show pci/0000:00:0b.0/1 reporter rx
Which is equivalent to:
$ devlink port health show pci/0000:00:0b.0/1 reporter rx
pci/0000:00:0b.0/1:
  reporter rx
    state healthy error 0 recover 0 grace_period 10000 auto_recover true auto_dump true

$ devlink port health show pci/0000:00:0b.0/1 reporter rx -j --pretty
{
    "health": {
         "pci/0000:00:0b.0/1": [ {
                 "reporter": "rx",
                 "state": "healthy",
                 "error": 0,
                 "recover": 0,
                 "grace_period": 500,
                 "auto_recover": true,
                 "auto_dump": true
              } ]
    }
}

$ devlink health set pci/0000:00:0b.0/1 reporter rx grace_period 5000
Which is equivalent to:
$ devlink port health set pci/0000:00:0b.0/1 reporter rx grace_period 5000

$ devlink port health show pci/0000:00:0b.0/1 reporter rx
pci/0000:00:0b.0/1:
  reporter rx
    state healthy error 0 recover 0 grace_period 5000 auto_recover true auto_dump true

Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-23 00:32:32 +00:00
Vladyslav Tarasiuk e533faa72e devlink: Add a possibility to print arrays of devlink port handles
Add the ability to print arrays of port handles in non-JSON format,
in the devlink-health style.

Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-23 00:32:26 +00:00
Stephen Hemminger 848b1b8e04 uapi: update bpf.h
Upstream 5.8-rc6 changes.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-21 09:18:15 -07:00
Guillaume Nault 4735df15a2 testsuite: Add tests for bareudp tunnels
Test the plain MPLS (unicast and multicast) and IP (v4 and v6) modes.
Also test the multiproto option for MPLS and for IP.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-20 13:30:55 -07:00
Anton Danilov 8f5a602f7a misc: make the pattern matching case-insensitive
To improve usability, use case-insensitive pattern matching in the
ifstat, nstat and ss tools.
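
A minimal C sketch of case-insensitive name matching, assuming the
tools match names with fnmatch(3) and that the GNU FNM_CASEFOLD flag
is available (glibc provides it). The helper names are hypothetical
and are not taken from ifstat, nstat or ss.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes FNM_CASEFOLD in <fnmatch.h> on glibc */
#endif
#include <fnmatch.h>
#include <stdbool.h>

#ifndef FNM_CASEFOLD
#define FNM_CASEFOLD (1 << 4) /* glibc's value, in case the header hid it */
#endif

/* True if name matches the shell-style pattern, ignoring case. */
static bool match_pattern_nocase(const char *pattern, const char *name)
{
	return fnmatch(pattern, name, FNM_CASEFOLD) == 0;
}

/* True if name matches any of the n patterns. */
static bool match_any(const char *const *patterns, int n, const char *name)
{
	int i;

	for (i = 0; i < n; i++)
		if (match_pattern_nocase(patterns[i], name))
			return true;
	return false;
}
```

With this, a pattern such as "TcpExt*" also matches a counter name
typed in lower case.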

Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-20 13:29:55 -07:00
Jamie Gloudon 66702fb9ba tc/m_estimator: Print proper value for estimator interval in raw.
While looking at the estimator code, I noticed that an incorrect
interval value is printed in raw mode (-r) for the handles: the signed
interval is printed as unsigned. This patch fixes the formatting.

Before patch:

root@bytecenter.fr:~# tc -r filter add dev eth0 ingress estimator
250ms 999ms matchall action police avrate 12mbit conform-exceed drop
[estimator i=4294967294 e=2]

After patch:

root@bytecenter.fr:~# tc -r filter add dev eth0 ingress estimator
250ms 999ms matchall action police avrate 12mbit conform-exceed drop
[estimator i=-2 e=2]
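
The raw output above can be reproduced with a minimal C sketch. It
assumes, as in struct tc_estimator from <linux/pkt_sched.h>, that the
interval is stored as a signed char; struct est_sketch and the two
format helpers are hypothetical stand-ins, not tc's actual code.
Converting the signed interval to unsigned yields 4294967294 on
platforms with a 32-bit unsigned int, while a signed conversion
prints -2.

```c
#include <stdio.h>

/* Local stand-in mirroring struct tc_estimator: the interval is a
 * signed value, the EWMA log is unsigned. */
struct est_sketch {
	signed char interval;
	unsigned char ewma_log;
};

/* Buggy: converting the signed interval to unsigned makes -2 wrap
 * around to UINT_MAX - 1 (4294967294 with a 32-bit unsigned int). */
static int format_est_buggy(char *buf, size_t len,
			    const struct est_sketch *est)
{
	return snprintf(buf, len, "[estimator i=%u e=%u]",
			(unsigned int)est->interval,
			(unsigned int)est->ewma_log);
}

/* Fixed: print the interval with a signed conversion. */
static int format_est_fixed(char *buf, size_t len,
			    const struct est_sketch *est)
{
	return snprintf(buf, len, "[estimator i=%d e=%u]",
			(int)est->interval,
			(unsigned int)est->ewma_log);
}
```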

Signed-off-by: Jamie Gloudon <jamie.gloudon@gmx.fr>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-20 13:25:56 -07:00
David Ahern 7b6361bf61 Merge branch 'tc-qevent-block' into next
Petr Machata says:

====================

When a list of filters at a given block is requested, tc first validates
that the block exists before doing the filter query. Currently the
validation routine checks ingress and egress blocks. But now that blocks
can be bound to qevents as well, qevent blocks should be looked for as
well:

    # ip link add up type dummy
    # tc qdisc add dev dummy1 root handle 1: \
         red min 30000 max 60000 avpkt 1000 qevent early_drop block 100
    # tc filter add block 100 pref 1234 handle 102 matchall action drop
    # tc filter show block 100
    Cannot find block "100"

This patchset fixes this issue:

    # tc filter show block 100
    filter protocol all pref 1234 matchall chain 0
    filter protocol all pref 1234 matchall chain 0 handle 0x66
      not_in_hw
            action order 1: gact action drop
             random type none pass val 0
             index 2 ref 1 bind 1

Patch #1 introduces the helpers and necessary infrastructure,
including a new qdisc_util callback that detects the blocks bound
in a given qdisc.

In patch #2, RED implements the new callback.

v3:
- Patch #1:
    - Do not pass &ctx->found directly to has_block. Do it through a
      helper variable, so that the callee does not overwrite the result
      already stored in ctx->found.

v2:
- Patch #1:
    - In tc_qdisc_block_exists_cb(), do not initialize 'q'.
    - Propagate upwards errors from q->has_block.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-20 16:36:41 +00:00
Petr Machata 02dce2fdce tc: q_red: Implement has_block for RED
In order for "tc filter show block X" to find a given block, implement the
has_block callback.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-20 16:34:49 +00:00
Petr Machata af0e036c09 tc: Look for blocks in qevents
When a list of filters at a given block is requested, tc first validates
that the block exists before doing the filter query. Currently the
validation routine checks ingress and egress blocks. But now that blocks
can be bound to qevents as well, qevent blocks should be looked for as
well.

In order to support that, extend struct qdisc_util with a new callback,
has_block, which should report whether, given the attributes in
TCA_OPTIONS, a block with a given number is bound to a qevent. In
tc_qdisc_block_exists_cb(), invoke that callback when it is set.

Add a helper to the tc_qevent module that walks the list of qevents and
looks for a given block. This is meant to be used by the individual qdiscs.
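
The lookup described above can be sketched in C as follows. Everything
here is hypothetical and simplified: struct qevent_sketch stands in for
whatever a qdisc parses out of TCA_OPTIONS, a block index of 0 means
"no block bound" in this sketch, and block_exists_step() only mimics
the role of tc_qdisc_block_exists_cb(). It also shows the
helper-variable pattern from v3 of the series, so that one qdisc's
negative answer cannot clear a positive result recorded earlier.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a qdisc's qevents after parsing
 * TCA_OPTIONS. In this sketch block_idx == 0 means "no block bound". */
struct qevent_sketch {
	const char *name;       /* e.g. "early_drop" or "mark" */
	unsigned int block_idx;
};

/* Helper: walk the qevent list and look for the given block. */
static bool qevents_have_block(const struct qevent_sketch *qevents,
			       size_t n, unsigned int block_idx)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (qevents[i].block_idx != 0 &&
		    qevents[i].block_idx == block_idx)
			return true;
	return false;
}

/* has_block-style callback: store the answer in *found and return 0,
 * or return a negative value on malformed input. */
static int has_block_sketch(const struct qevent_sketch *qevents, size_t n,
			    unsigned int block_idx, bool *found)
{
	if (qevents == NULL && n != 0)
		return -1;
	*found = qevents_have_block(qevents, n, block_idx);
	return 0;
}

/* Caller in the role of tc_qdisc_block_exists_cb(): query through a
 * helper variable so a negative answer cannot clear *ctx_found, and
 * propagate errors upwards. */
static int block_exists_step(const struct qevent_sketch *qevents, size_t n,
			     unsigned int block_idx, bool *ctx_found)
{
	bool found = false;
	int err = has_block_sketch(qevents, n, block_idx, &found);

	if (err)
		return err;
	if (found)
		*ctx_found = true;
	return 0;
}
```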

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-20 16:34:02 +00:00
Paolo Abeni 9c3be2c0ee ss: mptcp: add msk diag interface support
This implements support for the MPTCP socket type, including
extended socket info. Note that an extended attribute carrying
the actual protocol number must be added to the diag
request.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-14 23:57:36 +00:00
David Ahern beaf281cff Update kernel headers
Update kernel headers to commit:
    81adcd65b685 ("ksz884x: switch from 'pci_' to 'dma_' API")

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-14 23:56:53 +00:00
David Ahern b78c480532 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-14 23:52:43 +00:00
Eyal Birger f33a871b80 ip xfrm: policy: support policies with IF_ID in get/delete/deleteall
The XFRMA_IF_ID attribute is set in policies for them to be
associated with an XFRM interface (4.19+).

Add support for getting/deleting policies with this attribute.

To support 'deleteall', the XFRMA_IF_ID attribute needs to be
explicitly copied.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-13 08:51:37 -07:00
Eyal Birger ee93c1107f ip xfrm: update man page on setting/printing XFRMA_IF_ID in states/policies
In commit aed63ae1ac ("ip xfrm: support setting/printing XFRMA_IF_ID attribute in states/policies")
I added the ability to set/print the xfrm interface ID without updating
the man page.

Fixes: aed63ae1ac ("ip xfrm: support setting/printing XFRMA_IF_ID attribute in states/policies")
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-13 08:51:37 -07:00
Hoang Huu Le ca75a86337 tipc: fixed a compile warning in tipc/link.c
Fixes: 5027f233e3 ("tipc: add link broadcast get")
Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-13 08:43:32 -07:00
Julien Fortin 8fc09aff8d bridge: fdb get: add missing json init (new_json_obj)
'bridge fdb get' has JSON support, but the JSON object is never initialized.

before patch:

$ bridge -j fdb get 56:23:28:4f:4f:e5 dev vx0
56:23:28:4f:4f:e5 dev vx0 master br0 permanent
$

after patch:

$ bridge -j fdb get 56:23:28:4f:4f:e5 dev vx0 | \
python -c \
'import sys,json;print(json.dumps(json.loads(sys.stdin.read()),indent=4))'
[
    {
        "master": "br0",
        "mac": "56:23:28:4f:4f:e5",
        "flags": [],
        "ifname": "vx0",
        "state": "permanent"
    }
]
$

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-13 08:41:42 -07:00
Tony Ambardar 650591a7a7 configure: support ipset version 7 with kernel version 5
The configure script checks for ipset v6 availability but doesn't test
for v7, which is backward compatible and used on kernel v5.x systems.
Update the script to test for both ipset versions. Without this change,
the tc ematch function em_ipset will be disabled.

Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-08 08:48:02 -07:00
Andrea Claudi a8d6f51c84 ip address: remove useless include
utils.h is included twice in ipaddress.c; there is no need for that.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-08 08:47:28 -07:00
Stephen Hemminger 0689785782 genl: use <> for system includes
Be consistent about local versus system headers.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-08 08:41:24 -07:00
Stephen Hemminger a12b203c78 rtacct: drop unused header 2020-07-08 08:40:20 -07:00
Stephen Hemminger d44bcd2fbf iplink_bareudp: use common include syntax
Follow the precedent set by other parts of iproute2 and order the includes as:
  Standard libc headers
  Linux headers

  Iproute2 support headers

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-08 08:38:58 -07:00
Louis Peens 7c8d7848c7 devlink: add 'disk' to 'fw_load_policy' string validation
The 'fw_load_policy' devlink parameter has supported the 'disk' value
since kernel v5.4, but this value was apparently never added to the
string validation in iproute2. This patch fixes that oversight.

Signed-off-by: Louis Peens <louis.peens@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:14:57 -07:00
Ido Schimmel 2d4c3f65e2 devlink: Document zero policer identifier
When setting a policer to a trap group, a value of "0" will unbind the
currently bound policer from the group.

The behavior is intentional and tested in kernel selftests, so document
it.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Suggested-by: Alex Kushnarov <alexanderk@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:14:24 -07:00
Guillaume Nault eb09a15c12 tc: flower: support multiple MPLS LSE match
Add the new "mpls" keyword that can be used to match MPLS fields in
arbitrary Label Stack Entries.
LSEs are introduced by the "lse" keyword and followed by LSE options:
"depth", "label", "tc", "bos" and "ttl". The depth is mandatory; the
other options are optional.

For example, the following filter drops MPLS packets having two labels,
where the first label is 21 and has TTL 64 and the second label is 22:

$ tc filter add dev ethX ingress proto mpls_uc flower mpls \
    lse depth 1 label 21 ttl 64 \
    lse depth 2 label 22 bos 1 \
    action drop
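
To make the LSE options concrete, here is a minimal C sketch of how a
32-bit Label Stack Entry splits into the fields the filter matches on,
following the RFC 3032 layout (20-bit label, 3-bit traffic class,
1-bit bottom-of-stack, 8-bit TTL). The helper names are hypothetical,
entries are assumed to be in host byte order, and
example_filter_matches() simply hard-codes the example filter above;
this is not how flower is implemented in the kernel.

```c
#include <stdint.h>

/* Fields of one MPLS Label Stack Entry (RFC 3032 layout):
 * bits 31..12 label, 11..9 traffic class, 8 bottom-of-stack, 7..0 TTL */
struct mpls_lse_fields {
	uint32_t label; /* 20 bits */
	uint8_t tc;     /* 3 bits */
	uint8_t bos;    /* 1 bit */
	uint8_t ttl;    /* 8 bits */
};

static uint32_t mpls_lse_pack(const struct mpls_lse_fields *f)
{
	return ((f->label & 0xFFFFFu) << 12) |
	       ((uint32_t)(f->tc & 0x7u) << 9) |
	       ((uint32_t)(f->bos & 0x1u) << 8) |
	       (uint32_t)f->ttl;
}

static struct mpls_lse_fields mpls_lse_unpack(uint32_t lse)
{
	struct mpls_lse_fields f;

	f.label = (lse >> 12) & 0xFFFFFu;
	f.tc = (uint8_t)((lse >> 9) & 0x7u);
	f.bos = (uint8_t)((lse >> 8) & 0x1u);
	f.ttl = (uint8_t)(lse & 0xFFu);
	return f;
}

/* The example filter: depth 1 has label 21 and TTL 64, depth 2 has
 * label 22 and is the bottom of the stack. */
static int example_filter_matches(const uint32_t *stack, int n_entries)
{
	struct mpls_lse_fields first, second;

	if (n_entries < 2)
		return 0;
	first = mpls_lse_unpack(stack[0]);
	second = mpls_lse_unpack(stack[1]);
	return first.label == 21 && first.ttl == 64 &&
	       second.label == 22 && second.bos == 1;
}
```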

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:12:43 -07:00
Guillaume Nault a6c5c952ab ip link: initial support for bareudp devices
Bareudp devices provide a generic L3 encapsulation for tunnelling
different protocols like MPLS, IP, NSH, etc. inside a UDP tunnel.

This patch is based on original work from Martin Varghese:
https://lore.kernel.org/netdev/1570532361-15163-1-git-send-email-martinvarghesenokia@gmail.com/

Examples:

  - ip link add dev bareudp0 type bareudp dstport 6635 ethertype mpls_uc

This creates a bareudp tunnel device which tunnels L3 traffic with
ethertype 0x8847 (unicast MPLS traffic). The destination port of the
UDP header will be set to 6635. The device will listen on UDP port 6635
to receive traffic.

  - ip link add dev bareudp0 type bareudp dstport 6635 ethertype ipv4 multiproto

Same as the MPLS example, but for IPv4. The "multiproto" keyword allows
the device to also tunnel IPv6 traffic.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:11:05 -07:00
Dmitry Yakunin 8f1cd119b3 lib: fix checking of returned file handle size for cgroup
Before this patch, the check was performed only when looking up a
cgroup at the cgroup2 mount point.

v2:
  - add Fixes line before Signed-off-by (David Ahern)

Fixes: d5e6ee0dac ("ss: introduce cgroup2 cache and helper functions")
Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:05:54 -07:00
Sorah Fukumori 9e5d246877 ip fou: respect preferred_family for IPv6
ip(8) accepts the -family ipv6 (-6) option at the top level, so it is
straightforward to honor that existing option when modifying listeners
on IPv6 addresses.

Maintain backward compatibility by keeping the 'ip fou' -6 flag
implemented, while removing it from the usage message.

Signed-off-by: Sorah Fukumori <her@sorah.jp>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:03:09 -07:00
Anton Danilov d80a05b795 tc: improve the qdisc show command
Previously it was only possible to show all queueing disciplines on an
interface. There was no way to get qdisc info by handle or parent, only
a full dump of the disciplines followed by grep/sed.

Now both the new and the old options work as expected to filter a qdisc
by handle or parent.

Full syntax of the qdisc show command:

tc qdisc { show | list } [ dev STRING ] [ QDISC_ID ] [ invisible ]
  QDISC_ID := { root | ingress | handle QHANDLE | parent CLASSID }

This change doesn't require any changes in the kernel.
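
Because no kernel change is needed, one plausible approach is to dump
all qdiscs as before and filter the replies in userspace. The C sketch
below assumes that approach: struct qdisc_entry and qdisc_matches()
are hypothetical, a zero filter value means "no filter", and handle
matching compares only the major number (as in "handle 1:"). The
TC_H_* values match <linux/pkt_sched.h>, with "root" and "ingress"
mapped to the TC_H_ROOT and TC_H_INGRESS parents.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same definitions as in <linux/pkt_sched.h>. */
#define TC_H_MAJ_MASK 0xFFFF0000U
#define TC_H_MAJ(h)   ((h) & TC_H_MAJ_MASK)
#define TC_H_UNSPEC   0U
#define TC_H_ROOT     0xFFFFFFFFU
#define TC_H_INGRESS  0xFFFFFFF1U

/* One qdisc as reported in a dump (subset of struct tcmsg fields). */
struct qdisc_entry {
	uint32_t handle;
	uint32_t parent;
};

/* Keep a dumped qdisc only if it passes the handle/parent filters. */
static bool qdisc_matches(const struct qdisc_entry *q,
			  uint32_t filter_handle, uint32_t filter_parent)
{
	if (filter_parent != TC_H_UNSPEC && q->parent != filter_parent)
		return false;
	if (filter_handle != TC_H_UNSPEC &&
	    TC_H_MAJ(q->handle) != TC_H_MAJ(filter_handle))
		return false;
	return true;
}
```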

Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:00:51 -07:00
Stephen Hemminger 085622b1f5 uapi: update bpf.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:00:51 -07:00
Bjarni Ingi Gislason 860a5d12d5 devlint-health.8: use a single-font macro for a single argument
Use a single font macro for a single argument.

  Remove unnecessary quotes for a single-font macro.

  Join two lines into one.

  The output of "nroff" and "groff" is unchanged.

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:00:47 -07:00
Bjarni Ingi Gislason f9bc806c9d devlink-dev.8: use a single-font macro for one argument
Use a single-font macro for one argument.

  Remove unnecessary quotes for a single font macro.

  Join some lines into one.

  The output of "nroff" and "groff" is unchanged, except for a font
change in two lines.

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:00:38 -07:00
Bjarni Ingi Gislason 472fb39d55 devlink.8: Use a single-font macro for a single argument
Use a single-font macro for a single argument

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:00:34 -07:00
Bjarni Ingi Gislason 57cfcc62af man8/bridge.8: fix misuse of two-fonts macros
Use a single-font macro for a single argument.

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:00:28 -07:00
Bjarni Ingi Gislason 2df0dc2437 libnetlink.3: display section numbers in roman font, not boldface
Typeset section numbers in roman font, see man-pages(7).

###

  Details:

Output is from: test-groff -b -mandoc -T utf8 -rF0 -t -w w -z

  [ "test-groff" is a developmental version of "groff" ]

<./man/man3/libnetlink.3>:53 (macro BR): only 1 argument, but more are expected
<./man/man3/libnetlink.3>:132 (macro BR): only 1 argument, but more are expected
<./man/man3/libnetlink.3>:134 (macro BR): only 1 argument, but more are expected
<./man/man3/libnetlink.3>:197 (macro BR): only 1 argument, but more are expected
<./man/man3/libnetlink.3>:198 (macro BR): only 1 argument, but more are expected

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
2020-07-06 10:46:23 -07:00
David Ahern a5c9d01c5c Merge branch 'rdma-raw-format-dumps' into next
Leon Romanovsky says:

====================

The following series adds support for getting RDMA resource data in
raw format. The main motivation is to enable vendors to return the
entire QP/CQ/MR data without having to set each field separately.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 18:11:49 +00:00
Maor Gottlieb e2bbf737e6 rdma: Add support to get MR in raw format
Add the required support to print MR data in raw format.
Example:

$rdma res show mr dev mlx5_1 mrn 2 -r -j
[{"ifindex":7,"ifname":"mlx5_1",
"data":[0,4,255,254,0,0,0,0,0,0,0,0,16,28,0,216,...]}]

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 18:11:37 +00:00
Maor Gottlieb 94323e9611 rdma: Add support to get CQ in raw format
Add the required support to print CQ data in raw format.
Example:

$rdma res show cq dev mlx5_2 cqn 1 -r -j
[{"ifindex":8,"ifname":"mlx5_2",
"data":[0,4,255,254,0,0,0,0,0,0,0,0,16,28,...]}]

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 18:11:33 +00:00
Maor Gottlieb 7c01e0fc9c rdma: Add support to get QP in raw format
Add a 'raw' argument to get the resource in raw format.
When RDMA_NLDEV_ATTR_RES_RAW is set in the netlink message,
the resource fields are in raw format; print them as a byte array.

Example:
$rdma res show qp link rocep0s12f0/1 lqpn 1137 -j -r
[{"ifindex":7,"ifname":"mlx5_1","port":1,
"data":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...]}]

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 18:11:00 +00:00
Maor Gottlieb 8f23492823 rdma: update uapi headers
Update rdma_netlink.h file up to kernel commit 65959522f806
("RDMA: Add support to dump resource tracker in RAW format")

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 18:10:50 +00:00
David Ahern 79ea01927c Merge branch 'tc-qevents' into next
Petr Machata says:

====================

To allow configuring user-defined actions as a result of inner workings of
a qdisc, a concept of qevents was recently introduced to the kernel.
Qevents are attach points for TC blocks, where filters can be put that are
executed as the packet hits well-defined points in the qdisc algorithms.
The attached blocks can be shared, in a manner similar to clsact ingress
and egress blocks, and arbitrary classifiers with arbitrary actions can
be put on them.

For example:

 # tc qdisc add dev eth0 root handle 1: \
	red limit 500K avpkt 1K qevent early_drop block 10
 # tc filter add block 10 \
	matchall action mirred egress mirror dev eth1

This patch set introduces the corresponding iproute2 support. Patch #1 adds
the new netlink attribute enumerators. Patch #2 adds a set of helpers to
implement qevents, and #3 adds generic documentation to tc.8. Patch #4
then adds two new qevents to the RED qdisc: mark and early_drop.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:45:48 +00:00
Petr Machata d0e4504385 tc: q_red: Add support for qevents "mark" and "early_drop"
The "early_drop" qevent matches packets that have been early-dropped. The
"mark" qevent matches packets that have been ECN-marked.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:37:49 +00:00
Petr Machata 3cf51fb3c8 man: tc: Describe qevents
Add some general remarks about qevents.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:37:45 +00:00
Petr Machata 01bb0bcd00 tc: Add helpers to support qevent handling
Introduce a set of helpers to make it easy to add support for qevents to
a qdisc.

The idea behind this is that qevent types will be generally reused between
qdiscs, rather than each having a completely idiosyncratic set of qevents.
The qevent module holds functions for parsing, dumping and formatting of
these common qevent types, and for dispatch to the appropriate set of
handlers based on the qevent name.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:37:27 +00:00
Po Liu bc4d9f982f action police: make 'mtu' could be set independently in police action
Currently the police action requires 'rate' and 'burst' to be set. The
'mtu' parameter sets the maximum frame size and, in some situations,
could be set alone without 'rate' and 'burst'. For example, when
offloading to hardware, 'mtu' could limit the flow's maximum frame size.
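
A minimal C sketch of the relaxed argument check, under stated
assumptions: struct police_opts and police_opts_valid() are
hypothetical, unset values are represented by 0, and only rate, burst
and mtu are considered (the real parser handles more options, such as
avrate and index). Under this rule 'mtu' alone is accepted, while
'rate' and 'burst' must still be given together.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of parsed police options; 0 means "not given". */
struct police_opts {
	uint64_t rate;  /* bytes per second */
	uint32_t burst; /* bytes */
	uint32_t mtu;   /* bytes */
};

/* Relaxed check: either 'rate' or 'mtu' must be given. 'mtu' may now
 * stand alone, while 'rate' and 'burst' still have to come together. */
static bool police_opts_valid(const struct police_opts *o)
{
	if (o->rate == 0 && o->mtu == 0)
		return false;	/* nothing to police */
	if ((o->rate != 0) != (o->burst != 0))
		return false;	/* 'rate' and 'burst' must be paired */
	return true;
}
```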

Signed-off-by: Po Liu <po.liu@nxp.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:34:04 +00:00
Po Liu 3c5570706b action police: change the print message quotes style
Change the double quotes to single quotes in fprintf message to make it
more readable.

Signed-off-by: Po Liu <po.liu@nxp.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:33:59 +00:00
Alexandre Cassen 30f3beea0d add support to keepalived rtm_protocol
Following inclusion in net-next, extend rtnl_rtprot_tab and rt_protos
to support Keepalived.

Signed-off-by: Alexandre Cassen <acassen@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:03:45 +00:00
David Ahern 482e463d6c Merge branch 'devlink-port-mac-addr' into next
Parav Pandit says:

====================

Currently 'ip link set dev <pfndev> vf <vf_num> <param> <value>' has
the following limitations.

1. The command is limited to setting VF parameters only.
It cannot set the default MAC address for the PCI PF.

2. It can be used only on systems where PCI SR-IOV is supported.
In a SmartNIC-based system, the NIC's eswitch resides on a separate
embedded CPU, which holds the VF and PF representors for SR-IOV
support on the host system into which the SmartNIC is plugged.

3. It cannot set up the function attributes of a sub-function, as
described in detail in the comprehensive RFCs [1] and [2].

This series covers the first small part: letting the user query and
set the MAC address (hardware address) of a PCI PF/VF that is
represented by a devlink port.

[1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/
[2] https://marc.info/?l=linux-netdev&m=158555928517777&w=2

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 14:49:53 +00:00
Parav Pandit 4dca81e9a8 devlink: Support setting port function hardware address
Support setting devlink port function hardware address.

Example of a PCI VF port which supports a port function:
Set hardware address of the VF's port function.

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port function set pci/0000:06:00.0/2 hw_addr 00:11:22:33:44:55

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
  function:
    hw_addr 00:11:22:33:44:55

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 14:49:32 +00:00
Parav Pandit b3adafd154 devlink: Support querying hardware address of port function
Add support for querying the hardware address of the function
represented by a devlink port function.

Example of a PCI VF port which supports a port function:
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
  function:
    hw_addr 00:11:22:33:44:66

$ devlink port show pci/0000:06:00.0/2 -jp
{
    "port": {
        "pci/0000:06:00.0/2": {
            "type": "eth",
            "netdev": "enp6s0pf0vf1",
            "flavour": "pcivf",
            "pfnum": 0,
            "vfnum": 1,
            "function": {
                "hw_addr": "00:11:22:33:44:66"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 14:49:22 +00:00
Parav Pandit 2de449df19 devlink: Move devlink port code at start to reuse
To reuse the print routines for the port function in a subsequent patch,
move the print routine specific to the devlink device to the start of the file.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 14:48:34 +00:00
David Ahern e17466e484 Update kernel headers
Update kernel headers to commit:
   e1f046704404 ("Merge branch 'qlogic-use-generic-power-management'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 14:33:15 +00:00
Stephen Hemminger 2f31d12a25 man/tc: remove obsolete reference to ipchains
It isn't Linux 2.2 anymore.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-24 12:13:46 -07:00
Roi Dayan 473d18e219 ip address: Fix loop initial declarations are only allowed in C99
On some distros, e.g. RHEL 7.6, compilation fails with the following:

ipaddress.c: In function ‘lookup_flag_data_by_name’:
ipaddress.c:1260:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
  for (int i = 0; i < ARRAY_SIZE(ifa_flag_data); ++i) {
  ^
ipaddress.c:1260:2: note: use option -std=c99 or -std=gnu99 to compile your code

This commit fixes the single place needed for compilation to pass.
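
The fix amounts to declaring the loop counter before the for
statement, which GCC releases before 5 accept under their default
gnu89 mode. A minimal C sketch of such a C89-compatible lookup
follows; flag_table and lookup_flag_by_name() are hypothetical
stand-ins for ifa_flag_data and lookup_flag_data_by_name(), with flag
values taken from <linux/if_addr.h>. Declaring the counter as size_t
also avoids a signed/unsigned comparison against ARRAY_SIZE().

```c
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for the ifa_flag_data table. */
struct flag_data_sketch {
	const char *name;
	unsigned int mask;
};

static const struct flag_data_sketch flag_table[] = {
	{ "secondary",  0x01 },	/* IFA_F_SECONDARY */
	{ "nodad",      0x02 },	/* IFA_F_NODAD */
	{ "optimistic", 0x04 },	/* IFA_F_OPTIMISTIC */
	{ "permanent",  0x80 },	/* IFA_F_PERMANENT */
};

/* C89-compatible: the counter is declared at the top of the block
 * instead of inside the for statement. */
static const struct flag_data_sketch *lookup_flag_by_name(const char *name)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(flag_table); ++i) {
		if (strcmp(flag_table[i].name, name) == 0)
			return &flag_table[i];
	}
	return NULL;
}
```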

Fixes: 9d59c86e57 ("iproute2: ip addr: Organize flag properties structurally")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-11 15:05:20 -07:00
Stephen Hemminger 3d66d83d25 uapi: update to magic.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-11 09:52:38 -07:00
Ido Schimmel abda1e9d2b devlink: Add 'mirror' trap action
Allow setting 'mirror' trap action for traps that support it. Extend the
devlink-trap man page and bash completion accordingly.

Example:

# devlink -jp trap show netdevsim/netdevsim10 trap igmp_query
{
    "trap": {
        "netdevsim/netdevsim10": [ {
                "name": "igmp_query",
                "type": "control",
                "generic": true,
                "action": "mirror",
                "group": "mc_snooping"
            } ]
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-11 09:51:10 -07:00
Ido Schimmel fd71244a20 devlink: Add 'control' trap type
This type is used for traps that trap control packets such as ARP
request and IGMP query to the CPU.

Example:

# devlink -jp trap show netdevsim/netdevsim10 trap igmp_v1_report
{
    "trap": {
        "netdevsim/netdevsim10": [ {
                "name": "igmp_v1_report",
                "type": "control",
                "generic": true,
                "action": "trap",
                "group": "mc_snooping"
            } ]
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-11 09:51:10 -07:00
Stephen Hemminger 12fafa27c7 devlink: update include files
Use the iwyu (include-what-you-use) tool to get a more complete list of
includes for all the bits used by devlink.

This should also fix build with musl libc.

Fixes: c4dfddccef ("fix JSON output of mon command")
Reported-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-11 09:49:46 -07:00
Roopa Prabhu 468f787f64 bridge: support for nexthop id in fdb entries
This patch adds support to assign a nexthop group
id to an fdb entry.

$ bridge fdb add 02:02:00:00:00:13 dev vx10 nhid 102 self

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-06-11 15:52:58 +00:00
Roopa Prabhu a56d17463c ipnexthop: support for fdb nexthops
This patch adds support to add and delete
ecmp nexthops of type fdb. Such nexthops can
be linked to vxlan fdb entries.

$ ip nexthop add id 12 via 172.16.1.2 fdb
$ ip nexthop add id 13 via 172.16.1.3 fdb
$ ip nexthop add id 102 group 12/13 fdb

$ bridge fdb add 02:02:00:00:00:13 dev vx10 nhid 102 self

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-06-11 15:52:29 +00:00
David Ahern 5f6f17db3b Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-06-08 14:40:54 +00:00
Stephen Hemminger e4932ae6b3 uapi: update headers
Update kernel headers from 5.8.0 merge

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-05 08:36:54 -07:00
Stephen Hemminger 0a5dbbeddb Merge git://git.kernel.org/pub/scm/network/iproute2/iproute2-next 2020-06-05 08:33:29 -07:00
Stephen Hemminger 1bfa3b3f66 v5.7.0 2020-06-02 20:35:00 -07:00
Donald Sharp 2c78aba2fb nexthop: Fix Deletion display
Actually display that deletions are happening
when monitoring nexthops.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-01 08:08:46 -07:00
Stephen Hemminger 6facadcfb6 uapi: fix comment in xfrm.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-01 08:07:02 -07:00
Ian K. Coolidge 5413a735a6 iproute2: ip addr: Add support for setting 'optimistic'
Optimistic DAD is controllable via sysctl for an interface
or for all interfaces on the system. This affects only
addresses added by the kernel.

Recent kernels, however, support adding optimistic
addresses from userspace. This patch plumbs that support through.

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-31 23:01:33 +00:00
Ian K. Coolidge 9d59c86e57 iproute2: ip addr: Organize flag properties structurally
This creates a nice systematic way to check that the various flags are
mutable from userspace and that the address family is valid.

Mutability properties are preserved to avoid introducing any behavioral
change. Previously, however, immutable flags were ignored and fell
through to this confusing error:

Error: either "local" is duplicate, or "dadfailed" is a garbage.

But now, they just warn more explicitly:

Warning: dadfailed option is not mutable from userspace

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-31 23:01:22 +00:00
Roman Mashak bd4b8c632e tc: report time an action was first used
Have print_tm() dump firstuse value along with install, lastuse
and expires.

v2: Resubmit after 'master' merged into next

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-31 22:51:19 +00:00
Andrea Claudi 354efaec38 bpf: Fixes a snprintf truncation warning
gcc v9.3.1 reports:

bpf.c: In function ‘bpf_get_work_dir’:
bpf.c:784:49: warning: ‘snprintf’ output may be truncated before the last format character [-Wformat-truncation=]
  784 |  snprintf(bpf_wrk_dir, sizeof(bpf_wrk_dir), "%s/", mnt);
      |                                                 ^
bpf.c:784:2: note: ‘snprintf’ output between 2 and 4097 bytes into a destination of size 4096
  784 |  snprintf(bpf_wrk_dir, sizeof(bpf_wrk_dir), "%s/", mnt);
      |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix this by checking the snprintf() return code and handling the error properly.
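A minimal sketch of the pattern, using a hypothetical helper build_work_dir() rather than the actual bpf_get_work_dir() code: snprintf() returns the length it would have written (excluding the terminating NUL), so a return value greater than or equal to the buffer size means the output was truncated.

```c
#include <stdio.h>

/* Hypothetical helper (not the bpf.c code): write "<mnt>/" into buf.
 * Returns 0 on success, -1 on an encoding error or truncation. */
static int build_work_dir(char *buf, size_t size, const char *mnt)
{
	int ret = snprintf(buf, size, "%s/", mnt);

	/* snprintf() reports the length it wanted to write, excluding
	 * the NUL, so ret >= size means the result did not fit. */
	if (ret < 0 || (size_t)ret >= size)
		return -1;
	return 0;
}
```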

Fixes: e42256699c ("bpf: make tc's bpf loader generic and move into lib")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-27 15:05:25 -07:00
Andrea Claudi 358abfe004 Revert "bpf: replace snprintf with asprintf when dealing with long buffers"
This reverts commit c0325b0638.
It introduces a segfault in bpf_make_custom_path() when custom pinning is used.

This happens because asprintf() allocates exactly the space needed to hold
the string and returns it through its first argument; if that buffer is
later extended with strcat() or similar, the write overruns it.

As the aim of commit c0325b0638 is simply to fix a compiler warning, it
seems safe and reasonable to revert it.

Fixes: c0325b0638 ("bpf: replace snprintf with asprintf when dealing with long buffers")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-27 15:05:25 -07:00
David Ahern e50290e687 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-27 02:08:27 +00:00
Tuong Lien 9a25abde3a tipc: enable printing of broadcast rcv link stats
This commit allows printing the statistics of a broadcast-receiver link
using the same tipc command, but with additional 'link' options:

$ tipc link stat show --help
Usage: tipc link stat show [ link { LINK | SUBSTRING | all } ]

With:
+ 'LINK'      : print the stats of the specific link 'LINK';
+ 'SUBSTRING' : print the stats of all the links having the 'SUBSTRING'
                in name;
+ 'all'       : print all the links' stats incl. the broadcast-receiver
                ones;

Also, a link's stats can be reset in the usual way by specifying the
link name in the command.

For example:

$ tipc l st sh l br
Link <broadcast-link>
  Window:50 packets
  RX packets:0 fragments:0/0 bundles:0/0
  TX packets:5011125 fragments:4968774/149643 bundles:38402/307061
  RX naks:781484 defs:0 dups:0
  TX naks:0 acks:0 retrans:330259
  Congestion link:50657  Send queue max:0 avg:0

Link <broadcast-link:1001001>
  Window:50 packets
  RX packets:95146 fragments:95040/1980 bundles:1/2
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:380938 defs:83962 dups:403
  TX naks:8362 acks:0 retrans:170662
  Congestion link:0  Send queue max:0 avg:0

Link <broadcast-link:1001002>
  Window:50 packets
  RX packets:0 fragments:0/0 bundles:0/0
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:400546 defs:0 dups:0
  TX naks:0 acks:0 retrans:159597
  Congestion link:0  Send queue max:0 avg:0

$ tipc l st sh l 1001002
Link <1001003:data0-1001002:data0>
  ACTIVE  MTU:1500  Priority:10  Tolerance:1500 ms  Window:50 packets
  RX packets:99546 fragments:0/0 bundles:33/877
  TX packets:629 fragments:0/0 bundles:35/828
  TX profile sample:8 packets average:390 octets
  0-64:75% -256:0% -1024:0% -4096:25% -16384:0% -32768:0% -66000:0%
  RX states:488714 probes:7397 naks:0 defs:4 dups:5
  TX states:27734 probes:18016 naks:5 acks:2305 retrans:0
  Congestion link:0  Send queue max:0 avg:0

Link <broadcast-link:1001002>
  Window:50 packets
  RX packets:0 fragments:0/0 bundles:0/0
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:400546 defs:0 dups:0
  TX naks:0 acks:0 retrans:159597
  Congestion link:0  Send queue max:0 avg:0

$ tipc l st re l broadcast-link:1001002

$ tipc l st sh l broadcast-link:1001002
Link <broadcast-link:1001002>
  Window:50 packets
  RX packets:0 fragments:0/0 bundles:0/0
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:0 defs:0 dups:0
  TX naks:0 acks:0 retrans:0
  Congestion link:0  Send queue max:0 avg:0

Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-27 02:07:22 +00:00
Alexander Aring 9f91f1b7b8 lwtunnel: add support for rpl segment routing
This patch adds support for rpl segment routing settings.
Example:

ip -n ns0 -6 route add 2001::3 encap rpl segs \
fe80::c8fe:beef:cafe:cafe,fe80::c8fe:beef:cafe:beef dev lowpan0

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-27 00:03:17 +00:00
Roman Mashak db35e411ec tc: action: fix time values output in JSON format
Report tcf_t values in seconds rather than jiffies in JSON output, as is
already done for plain-text output.

v2: use PRINT_ANY, drop the useless casts and fix the style (Stephen Hemminger)

Fixes: 2704bd6255 ("tc: jsonify actions core")
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-19 21:19:04 -07:00
Stephen Hemminger 1c7aa12104 uapi: update to bpf.h
Part of the zero-length array changes

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-19 14:31:54 -07:00
Eric Dumazet d7c67a6ed4 utils: remove trailing zeros in print_time() and print_time64()
Before :

tc qd sh dev eth1

... refill_delay 40.0ms timer_slack 10.000us horizon 10.000s

After :
... refill_delay 40ms timer_slack 10us horizon 10s
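The effect of the change can be reproduced with the %g conversion, which drops trailing zeros. This is a minimal sketch using a hypothetical fmt_time_us() that takes microseconds; the real print_time()/print_time64() helpers in utils.c use their own input units and thresholds.

```c
#include <stdio.h>

/* Hypothetical formatter: usecs is a duration in microseconds.
 * "%g" prints 40.0 as "40", so no trailing zeros are emitted. */
static const char *fmt_time_us(char *buf, size_t len, double usecs)
{
	if (usecs >= 1000000.0)
		snprintf(buf, len, "%gs", usecs / 1000000.0);
	else if (usecs >= 1000.0)
		snprintf(buf, len, "%gms", usecs / 1000.0);
	else
		snprintf(buf, len, "%gus", usecs);
	return buf;
}
```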

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-19 14:30:30 -07:00
Paul Blakey 924c43778a man: tc-ct.8: Add manual page for ct tc action
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-19 14:30:24 -07:00
Maciej Fijalkowski 42796dcd36 tc: mqprio: reject queues count/offset pair count higher than num_tc
Add a sanity check ensuring that the number of queue count/offset
pairs does not exceed the actual number of TCs being created.

Example of an invalid command: there are 4 count/offset pairs
whereas num_tc is only 2.

 # tc qdisc add dev enp96s0f0 root mqprio num_tc 2 map 0 0 0 0 1 1 1 1 \
   queues 4@0 4@4 4@8 4@12 hw 1 mode channel

Store the number of parsed count/offset pairs in a dedicated variable
that is compared against opt.num_tc once all of the command line
arguments have been parsed. Bail out if this count is higher than
opt.num_tc and let the user know about it.

Drivers were swallowing such commands because they iterate over the
count/offset pairs using num_tc as the bound, so this is not a big
deal, but it is better to catch such misconfiguration at the command
line argument parsing level.
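The check described above can be sketched as a stand-alone parser. This is a hypothetical illustration, not the q_mqprio.c code; the function name and the max bound (standing in for the real queue-pair limit) are assumptions.

```c
#include <stdio.h>

/* Hypothetical sketch: parse "count@offset" tokens, then reject the
 * configuration if there are more pairs than traffic classes.
 * Returns the number of pairs parsed, or -1 on error. */
static int parse_queue_pairs(const char *const *tokens, int ntok, int num_tc,
			     unsigned int *count, unsigned int *offset, int max)
{
	int pairs = 0;

	for (int i = 0; i < ntok; i++) {
		if (pairs >= max)
			return -1;
		if (sscanf(tokens[i], "%u@%u", &count[pairs], &offset[pairs]) != 2)
			return -1;
		pairs++;
	}

	/* Compared only after every token has been parsed. */
	if (pairs > num_tc) {
		fprintf(stderr, "More queue count/offset pairs (%d) than num_tc (%d)\n",
			pairs, num_tc);
		return -1;
	}
	return pairs;
}
```

With the invalid command above, four pairs against num_tc 2 would be rejected.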

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-18 14:57:15 +00:00
Dmitry Yakunin 7bd9188581 ss: add checks for bc filter support
As noted by David Ahern, when a bytecode filter is not supported by the
running kernel, the printed error message is unclear. This patch attempts to
detect such cases and print a correct message. This is done by providing a
check function for new filter types. As an example, a check function for the
cgroup filter is implemented. It sends a lightweight request (idiag_states = 0)
with a zero cgroup condition to the kernel and checks the returned errno: if
the filter is not supported, EINVAL is returned. The result of the check is
cached to avoid extra checks when the same filter is specified several times.

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-13 14:28:38 +00:00
Dmitry Yakunin 14f4bda590 ss: add support for cgroup v2 information and filtering
This patch introduces two new features: obtaining cgroup information and
filtering sockets by cgroups. These features work based on the cgroup v2 ID
field in the socket (the kernel should be compiled with CONFIG_SOCK_CGROUP_DATA).

Cgroup information can be obtained by specifying the --cgroup flag and for now
contains only the pathname. For faster pathname lookups a cgroup cache is
implemented. This cache is filled on ss startup, and missed entries are
resolved and saved on the fly.

The cgroup filter extends EXPRESSION and allows specifying a cgroup pathname
(relative or absolute) to obtain only the sockets attached to this cgroup.
Filter syntax: ss [ cgroup PATHNAME ]
Examples:
    ss -a cgroup /sys/fs/cgroup/unified (or ss -a cgroup .)
    ss -a cgroup /sys/fs/cgroup/unified/cgroup1 (or ss -a cgroup cgroup1)

v2:
  - style fixes (David Ahern)

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-13 14:28:35 +00:00
Dmitry Yakunin d5e6ee0dac ss: introduce cgroup2 cache and helper functions
This patch prepares infrastructure for matching sockets by cgroups.
Two helper functions are added for converting between a cgroup v2 ID
and its pathname. The cgroup v2 cache is implemented as a hash table
indexed by ID. This cache is needed for faster lookups of a socket's cgroup.
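An ID-indexed hash table of this kind can be sketched with chained buckets. This is a hypothetical illustration, not the ss implementation: the bucket count, struct layout and function names are all assumptions.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical ID -> pathname cache; bucket count is an assumption. */
#define CG_HASH_BUCKETS 64

struct cg_entry {
	uint64_t id;
	char *path;
	struct cg_entry *next;	/* chaining for IDs in the same bucket */
};

static struct cg_entry *cg_table[CG_HASH_BUCKETS];

static unsigned int cg_hash(uint64_t id)
{
	return (unsigned int)(id % CG_HASH_BUCKETS);
}

static const char *cg_lookup(uint64_t id)
{
	for (struct cg_entry *e = cg_table[cg_hash(id)]; e; e = e->next)
		if (e->id == id)
			return e->path;
	return NULL;	/* miss: caller would resolve and insert it */
}

static int cg_insert(uint64_t id, const char *path)
{
	size_t len = strlen(path) + 1;
	struct cg_entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->path = malloc(len);
	if (!e->path) {
		free(e);
		return -1;
	}
	memcpy(e->path, path, len);
	e->id = id;
	e->next = cg_table[cg_hash(id)];	/* push onto bucket head */
	cg_table[cg_hash(id)] = e;
	return 0;
}
```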

v2:
  - style fixes (David Ahern)

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-13 14:28:04 +00:00
Po Liu 965a5f6a1b iproute2-next: add gate action man page
This patch adds the man page for the tc gate action.

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-13 02:20:12 +00:00
Po Liu 07d5ee70b5 iproute2-next:tc:action: add a gate control action
Introduce an ingress frame gate control flow action.
The tc gate action works like this:
a gate allows selected ingress frames to pass during specific time
slots and drops them during others. A tc filter chooses the ingress
frames, and the tc gate action specifies in which time slots these
frames may pass to the device and in which time slots they are
dropped.
The tc gate action provides an entry list that tells how long the gate
stays open and how long it stays closed. The gate action also assigns
a start time that tells when the entry list starts. The driver then
repeats the gate entry list cyclically.
For software simulation, the gate action requires the user to assign
a clock type.

Below is an example configuration in user space. The tc filter matches
a stream with source IP address 192.168.0.20, and the gate action has
two time slots: the gate is open for 200ms, letting frames pass, and
then closed for 100ms, dropping frames.

 # tc qdisc add dev eth0 ingress
 # tc filter add dev eth0 parent ffff: protocol ip \
            flower src_ip 192.168.0.20 \
            action gate index 2 clockid CLOCK_TAI \
            sched-entry open 200000000ns -1 8000000b \
            sched-entry close 100000000ns

 # tc chain del dev eth0 ingress chain 0

"sched-entry" follows the taprio naming style. The gate state is
"open"/"close", followed by the period in nanoseconds. The next value,
-1, is the internal priority value, i.e. the ingress queue the frames
should be put in; "-1" means wildcard. The last, optional value
specifies the maximum number of MSDU octets that are permitted to pass
the gate during the specified time interval; frames over the limit are
dropped.

The example below filters a stream with destination MAC address
10:00:80:00:00:00 and IP protocol ICMP, followed by a gate action. The
gate action runs with a single close time slot, which means the gate
always stays closed. The total cycle time is 200000000ns. The base time
is calculated as:

     1357000000000 + (N + 1) * cycletime

Once this value lies in the future, it is used as the start time.
The cycle time in this case is 200000000ns.
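The base-time rule above can be sketched as integer arithmetic. This is a minimal illustration, assuming all values are nanoseconds on the same clock and reading N as the number of whole cycles already elapsed since base-time; gate_start_time() is a hypothetical name, not the kernel or driver implementation.

```c
#include <stdint.h>

/* Sketch of the described arithmetic (all values in ns).
 * cycle_time must be non-zero. */
static uint64_t gate_start_time(uint64_t base_time, uint64_t cycle_time,
				uint64_t now)
{
	uint64_t n;

	if (base_time > now)
		return base_time;	/* already in the future */

	n = (now - base_time) / cycle_time;	/* whole cycles elapsed */
	return base_time + (n + 1) * cycle_time;
}
```

For base-time 1357000000000ns, a 200000000ns cycle and a current time of 1357500000000ns, two full cycles have elapsed, so the start time would be 1357600000000ns.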

 # tc filter add dev eth0 parent ffff: protocol ip \
           flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \
           action gate index 12 base-time 1357000000000ns \
           sched-entry CLOSE 200000000ns \
           clockid CLOCK_TAI

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-13 02:19:46 +00:00
David Ahern 0e9b227e2d Update kernel headers and import tc_gate.h
Update kernel headers to commit:
    fb9f2e92864f ("net: dsa: tag_sja1105: appease sparse checks for ethertype accessors")
and import tc_act/tc_gate.h

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-13 02:18:15 +00:00
Eric Dumazet 0ecb90b33c tc: fq: fix two issues
My latest patch missed the fact that this file had gained JSON support.

Also fix a spelling error introduced by the JSON change.

Fixes: be9ca9d541 ("tc: fq: add timer_slack parameter")
Fixes: d15e2bfc04 ("tc: fq: add support for JSON output")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-05 10:27:26 -07:00
Stephen Hemminger 8142c76232 ss: update to bw print
Display kilobit with the standard suffix.
Add a comment describing where the data rate suffixes come from.
Add support for terabit.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-05 10:18:58 -07:00
Jakub Kicinski ec04b6fc24 devlink: support kernel-side snapshot id allocation
Make ID argument optional and read the snapshot info
that kernel sends us.

$ devlink region new netdevsim/netdevsim1/dummy
netdevsim/netdevsim1/dummy: snapshot 0
$ devlink -jp region new netdevsim/netdevsim1/dummy
{
    "regions": {
        "netdevsim/netdevsim1/dummy": {
            "snapshot": [ 1 ]
        }
    }
}
$ devlink region show netdevsim/netdevsim1/dummy
netdevsim/netdevsim1/dummy: size 32768 snapshot [0 1]

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-05 17:10:27 +00:00
Eric Dumazet e133fa9c73 ss: add support for Gbit speeds in sprint_bw()
Also use 'g' specifier instead of 'f' to remove trailing zeros,
and increase precision.

Examples of output :
 Before        After
 8.0Kbps       8Kbps
 9.9Mbps       9.92Mbps
 55001Mbps     55Gbps
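The before/after table can be reproduced with a %g-based formatter. This is a hypothetical re-creation, not the ss.c sprint_bw() code; the three-significant-digit precision (%.3g) and the "Kbps" suffix are assumptions chosen to match the examples above, with bw in bits per second.

```c
#include <stdio.h>

/* Hypothetical sketch; bw is in bits per second. "%.3g" keeps three
 * significant digits and drops trailing zeros. */
static const char *sprint_bw_sketch(char *buf, size_t len, double bw)
{
	if (bw > 1e9)
		snprintf(buf, len, "%.3gGbps", bw / 1e9);
	else if (bw > 1e6)
		snprintf(buf, len, "%.3gMbps", bw / 1e6);
	else if (bw > 1e3)
		snprintf(buf, len, "%.3gKbps", bw / 1e3);
	else
		snprintf(buf, len, "%gbps", bw);
	return buf;
}
```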

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-05 09:50:22 -07:00
David Ahern 8c109059b5 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-05 16:49:38 +00:00
David Ahern c1b21f5286 Import rpl.h and rpl_iptunnel.h uapi headers
Import rpl.h and rpl_iptunnel.h as of kernel commit:
    354d86141796 ("Merge branch 'net-reduce-dynamic-lockdep-keys'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-05 16:23:14 +00:00
Davide Caratti 3175bca718 tc: full JSON support for 'bpf' filter
example using eBPF:

 # tc filter add dev dummy0 ingress bpf \
 > direct-action obj ./bpf/filter.o sec tc-ingress
 # tc  -j filter show dev dummy0 ingress | jq
 [
   {
     "protocol": "all",
     "pref": 49152,
     "kind": "bpf",
     "chain": 0
   },
   {
     "protocol": "all",
     "pref": 49152,
     "kind": "bpf",
     "chain": 0,
     "options": {
       "handle": "0x1",
       "bpf_name": "filter.o:[tc-ingress]",
       "direct-action": true,
       "not_in_hw": true,
       "prog": {
         "id": 101,
         "tag": "a04f5eef06a7f555",
         "jited": 1
       }
     }
   }
 ]

v2:
 - use print_nl(), thanks to Andrea Claudi
 - use print_0xhex() for filter handle, thanks to Stephen Hemminger

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-05 16:19:06 +00:00
David Ahern ae57e82da0 Update kernel headers
Update kernel headers to commit:
    354d86141796 ("Merge branch 'net-reduce-dynamic-lockdep-keys'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-05 16:11:22 +00:00
Benjamin Poirier 0501fe734f Replace open-coded instances of print_nl()
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-04 17:13:53 -07:00
Benjamin Poirier e0c457b1a5 bridge: Align output columns
Use fixed column widths to improve readability.

Before:
root@vsid:/src/iproute2# ./bridge/bridge vlan tunnelshow
port    vlan-id tunnel-id
vx0      2       2
         1010-1020       1010-1020
         1030    65556
vx-longname      2       2

After:
root@vsid:/src/iproute2# ./bridge/bridge vlan tunnelshow
port              vlan-id    tunnel-id
vx0               2          2
                  1010-1020  1010-1020
                  1030       65556
vx-longname       2          2
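Fixed-width columns like these can be produced with left-justified "%-*s" fields. This is a minimal sketch: the column widths (18 and 11 characters) are inferred from the sample output above and format_tunnel_row() is a hypothetical helper, not the bridge code.

```c
#include <stdio.h>

/* Column widths inferred from the sample output; assumptions only. */
#define PORT_COL_WIDTH 18
#define VLAN_COL_WIDTH 11

static int format_tunnel_row(char *buf, size_t len, const char *port,
			     const char *vlan, const char *tunnel)
{
	/* "%-*s" left-justifies each field and pads it to a fixed width. */
	return snprintf(buf, len, "%-*s%-*s%s", PORT_COL_WIDTH, port,
			VLAN_COL_WIDTH, vlan, tunnel);
}
```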

Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-04 17:13:53 -07:00
Benjamin Poirier 5a07a5df5a json_print: Return number of characters printed
When outputting in normal mode, forward the return value from
color_fprintf().

Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-04 17:13:53 -07:00
Benjamin Poirier b262a9becb bridge: Fix output with empty vlan lists
Consider this configuration:

ip link add br0 type bridge
ip link add vx0 type vxlan dstport 4789 external
ip link set dev vx0 master br0
bridge vlan del vid 1 dev vx0
ip link add vx1 type vxlan dstport 4790 external
ip link set dev vx1 master br0

	root@vsid:/src/iproute2# ./bridge/bridge vlan
	port    vlan-id
	br0      1 PVID Egress Untagged

	vx0     None
	vx1      1 PVID Egress Untagged

	root@vsid:/src/iproute2#

Note the useless and inconsistent empty lines.

	root@vsid:/src/iproute2# ./bridge/bridge vlan tunnelshow
	port    vlan-id tunnel-id
	br0
	vx0     None
	vx1

What's the difference between "None" and ""?

	root@vsid:/src/iproute2# ./bridge/bridge -j -p vlan tunnelshow
	[ {
		"ifname": "br0",
		"tunnels": [ ]
	    },{
		"ifname": "vx1",
		"tunnels": [ ]
	    } ]

Why does vx0 appear in normal output and not json output?
Why output an empty list for br0 and vx1?

Fix these inconsistencies and avoid outputting entries with no values. This
makes the behavior consistent with other iproute2 commands, for example
`ip -6 addr`: if an interface doesn't have any ipv6 addresses, it is not
part of the listing.

Fixes: 8652eeb3ab ("bridge: vlan: support for per vlan tunnel info")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-04 17:13:53 -07:00
Benjamin Poirier 91b1b49ed3 bridge: Fix typo
Fixes: 7abf5de677 ("bridge: vlan: add support to display per-vlan statistics")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-04 17:13:53 -07:00
Benjamin Poirier 594b2d7799 bridge: Use consistent column names in vlan output
Fix singular vs plural. Add a hyphen to clarify that each of those is a
single field.

Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-05-04 17:13:53 -07:00
Xin Long 4e578c78fe tc: f_flower: add options support for erspan
This patch adds parsing and printing of TCA_FLOWER_KEY_ENC_OPTS_ERSPAN
to implement erspan options support in f_flower, as commit
56155d4df8 ("tc: f_flower: add geneve option match support to
flower") did for geneve options.

The option is expressed as version:index:dir:hwid; dir and hwid are
parsed when version is 2, while index is parsed when version is 1.
erspan doesn't support multiple options.

With this patch, users can add and dump erspan options like:

  # ip link add name erspan1 type erspan external
  # tc qdisc add dev erspan1 ingress
  # tc filter add dev erspan1 protocol ip parent ffff: \
      flower \
        enc_src_ip 10.0.99.192 \
        enc_dst_ip 10.0.99.193 \
        enc_key_id 11 \
        erspan_opts 1:2:0:0/1:255:0:0 \
        ip_proto udp \
        action mirred egress redirect dev eth1
  # tc -s filter show dev erspan1 parent ffff:

     filter protocol ip pref 49152 flower chain 0 handle 0x1
       eth_type ipv4
       ip_proto udp
       enc_dst_ip 10.0.99.193
       enc_src_ip 10.0.99.192
       enc_key_id 11
       erspan_opts 1:2:0:0/1:255:0:0
       not_in_hw
         action order 1: mirred (Egress Redirect to device eth1) stolen
         index 1 ref 1 bind 1
         Action statistics:
         Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
         backlog 0b 0p requeues 0

v1->v2:
  - no change.
v2->v3:
  - no change.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print version, index, dir and hwid as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-01 16:33:27 +00:00
Xin Long 93c8d5f72f tc: f_flower: add options support for vxlan
This patch adds parsing and printing of TCA_FLOWER_KEY_ENC_OPTS_VXLAN
to implement vxlan options support in f_flower, as commit
56155d4df8 ("tc: f_flower: add geneve option match support to
flower") did for geneve options.

The option is expressed as a 32-bit number, for gbp only, and vxlan
doesn't support multiple options.

With this patch, users can add and dump vxlan options like:

  # ip link add name vxlan1 type vxlan dstport 0 external
  # tc qdisc add dev vxlan1 ingress
  # tc filter add dev vxlan1 protocol ip parent ffff: \
      flower \
        enc_src_ip 10.0.99.192 \
        enc_dst_ip 10.0.99.193 \
        enc_key_id 11 \
        vxlan_opts 65793/4008635966 \
        ip_proto udp \
        action mirred egress redirect dev eth1
  # tc -s filter show dev vxlan1 parent ffff:

     filter protocol ip pref 49152 flower chain 0 handle 0x1
       eth_type ipv4
       ip_proto udp
       enc_dst_ip 10.0.99.193
       enc_src_ip 10.0.99.192
       enc_key_id 11
       vxlan_opts 65793/4008635966
       not_in_hw
         action order 1: mirred (Egress Redirect to device eth1) stolen
         index 3 ref 1 bind 1
         Action statistics:
         Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
         backlog 0b 0p requeues 0
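The gbp value in vxlan_opts 65793/4008635966 is a value/mask pair. A minimal parsing sketch follows, using strtoul() with base 0 so that decimal and 0x-prefixed input are both accepted (in the spirit of the "get_u32 with base = 0" note below). parse_vxlan_gbp() is hypothetical, and defaulting the mask to all ones when it is omitted is an assumption.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse one u32; base 0 accepts "65793" as well as "0x10101". */
static int parse_u32_base0(const char *s, char **end, uint32_t *out)
{
	unsigned long v;

	errno = 0;
	v = strtoul(s, end, 0);
	if (errno || *end == s || v > UINT32_MAX)
		return -1;
	*out = (uint32_t)v;
	return 0;
}

/* Hypothetical parser for "gbp[/mask]"; missing mask = all ones. */
static int parse_vxlan_gbp(const char *s, uint32_t *val, uint32_t *mask)
{
	char *end;

	if (parse_u32_base0(s, &end, val))
		return -1;
	if (*end == '\0') {
		*mask = UINT32_MAX;
		return 0;
	}
	if (*end != '/')
		return -1;
	if (parse_u32_base0(end + 1, &end, mask) || *end != '\0')
		return -1;
	return 0;
}
```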

v1->v2:
  - get_u32 with base = 0 for gbp.
v2->v3:
  - implement proper JSON array for opts.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print gbp as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-01 16:33:22 +00:00
Xin Long 668fd9b25d tc: m_tunnel_key: add options support for erspan
This patch adds parsing and printing of TCA_TUNNEL_KEY_ENC_OPTS_ERSPAN
to implement erspan options support in m_tunnel_key, as commit
6217917a38 ("tc: m_tunnel_key: Add tunnel option support to
act_tunnel_key") did for geneve options.

The option is expressed as version:index:dir:hwid; dir and hwid are
parsed when version is 2, while index is parsed when version is 1.
erspan doesn't support multiple options.

With this patch, users can add and dump erspan options like:

  # ip link add name erspan1 type erspan external
  # tc qdisc add dev eth0 ingress
  # tc filter add dev eth0 protocol ip parent ffff: \
      flower indev eth0 \
        ip_proto udp \
        action tunnel_key \
          set src_ip 10.0.99.192 \
          dst_ip 10.0.99.193 \
          dst_port 6081 \
          id 11 \
          erspan_opts 1:2:0:0 \
      action mirred egress redirect dev erspan1
  # tc -s filter show dev eth0 parent ffff:

     filter protocol ip pref 49151 flower chain 0 handle 0x1
       indev eth0
       eth_type ipv4
       ip_proto udp
       not_in_hw
         action order 1: tunnel_key  set
         src_ip 10.0.99.192
         dst_ip 10.0.99.193
         key_id 11
         dst_port 6081
         erspan_opts 1:2:0:0
         csum pipe
           index 2 ref 1 bind 1
         ...

v1->v2:
  - no change.
v2->v3:
  - no change.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print version, index, dir and hwid as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-01 16:33:18 +00:00
Xin Long f72c3ad00f tc: m_tunnel_key: add options support for vxlan
This patch adds parsing and printing of TCA_TUNNEL_KEY_ENC_OPTS_VXLAN
to implement vxlan options support in m_tunnel_key, as commit
6217917a38 ("tc: m_tunnel_key: Add tunnel option support to
act_tunnel_key") did for geneve options.

The option is expressed as a 32-bit number, for gbp only, and vxlan
doesn't support multiple options.

With this patch, users can add and dump vxlan options like:

  # ip link add name vxlan1 type vxlan dstport 0 external
  # tc qdisc add dev eth0 ingress
  # tc filter add dev eth0 protocol ip parent ffff: \
      flower indev eth0 \
        ip_proto udp \
        action tunnel_key \
          set src_ip 10.0.99.192 \
          dst_ip 10.0.99.193 \
          dst_port 6081 \
          id 11 \
          vxlan_opts 65793 \
      action mirred egress redirect dev vxlan1
  # tc -s filter show dev eth0 parent ffff:

     filter protocol ip pref 49152 flower chain 0 handle 0x1
       indev eth0
       eth_type ipv4
       ip_proto udp
       not_in_hw
         action order 1: tunnel_key  set
         src_ip 10.0.99.192
         dst_ip 10.0.99.193
         key_id 11
         dst_port 6081
         vxlan_opts 65793
         ...

v1->v2:
  - get_u32 with base = 0 for gbp.
  - use print_uint("0x%x") to print gbp.
v2->v3:
  - implement proper JSON array for opts.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print gbp as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-01 16:33:14 +00:00
Xin Long 39fa047938 iproute_lwtunnel: add options support for erspan metadata
This patch adds parsing and printing of LWTUNNEL_IP_OPTS_ERSPAN to implement
erspan options support in iproute_lwtunnel.

The option is expressed as version:index:dir:hwid; dir and hwid are parsed
when version is 2, while index is parsed when version is 1. All of
these are numbers. erspan doesn't support multiple options.
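The version-dependent parsing rule can be sketched as follows. parse_erspan_opt() and its struct are hypothetical, and rejecting versions other than 1 and 2 is an assumption of this sketch; fields not used by the given version are left at zero.

```c
#include <stdio.h>
#include <string.h>

struct erspan_opt_sketch {
	unsigned int ver, index, dir, hwid;
};

/* Hypothetical parser for "version:index:dir:hwid". Only the fields
 * relevant to the given version are kept; the rest stay zero. */
static int parse_erspan_opt(const char *s, struct erspan_opt_sketch *o)
{
	unsigned int ver, index, dir, hwid;

	if (sscanf(s, "%u:%u:%u:%u", &ver, &index, &dir, &hwid) != 4)
		return -1;

	memset(o, 0, sizeof(*o));
	o->ver = ver;
	if (ver == 1) {
		o->index = index;	/* version 1 carries only an index */
	} else if (ver == 2) {
		o->dir = dir;		/* version 2 carries dir and hwid */
		o->hwid = hwid;
	} else {
		return -1;
	}
	return 0;
}
```

This is consistent with the route dump in the example below, where a route added with erspan_opts 2:123:1:2 is shown back as 2:0:1:2.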

With this patch, users can add and dump erspan options like:

  # ip netns add a
  # ip netns add b
  # ip -n a link add eth0 type veth peer name eth0 netns b
  # ip -n a link set eth0 up
  # ip -n b link set eth0 up
  # ip -n a addr add 10.1.0.1/24 dev eth0
  # ip -n b addr add 10.1.0.2/24 dev eth0
  # ip -n b link add erspan1 type erspan key 1 seq erspan 123 \
    local 10.1.0.2 remote 10.1.0.1
  # ip -n b addr add 1.1.1.1/24 dev erspan1
  # ip -n b link set erspan1 up
  # ip -n b route add 2.1.1.0/24 dev erspan1
  # ip -n a link add erspan1 type erspan key 1 seq local 10.1.0.1 external
  # ip -n a addr add 2.1.1.1/24 dev erspan1
  # ip -n a link set erspan1 up
  # ip -n a route add 1.1.1.0/24 encap ip id 1 \
    erspan_opts 2:123:1:2 dst 10.1.0.2 dev erspan1
  # ip -n a route show
  # ip netns exec a ping 1.1.1.1 -c 1

   1.1.1.0/24  encap ip id 1 src 0.0.0.0 dst 10.1.0.2 ttl 0 tos 0
     erspan_opts 2:0:1:2 dev erspan1 scope link

   PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
   64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=0.124 ms

v1->v2:
  - improve the changelog.
  - use PRINT_ANY to support dumping with json format.
v2->v3:
  - implement proper JSON object for opts instead of just bunch of strings.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print version, index, dir and hwid as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-01 16:33:09 +00:00
Xin Long b1bc0f3892 iproute_lwtunnel: add options support for vxlan metadata
This patch adds parsing and printing of LWTUNNEL_IP_OPTS_VXLAN to implement
vxlan options support in iproute_lwtunnel.

The option is expressed as a number, for gbp only, and vxlan doesn't
support multiple options.

With this patch, users can add and dump vxlan options like:

  # ip netns add a
  # ip netns add b
  # ip -n a link add eth0 type veth peer name eth0 netns b
  # ip -n a link set eth0 up
  # ip -n b link set eth0 up
  # ip -n a addr add 10.1.0.1/24 dev eth0
  # ip -n b addr add 10.1.0.2/24 dev eth0
  # ip -n b link add vxlan1 type vxlan id 1 local 10.1.0.2 \
    remote 10.1.0.1 dev eth0 ttl 64 gbp
  # ip -n b addr add 1.1.1.1/24 dev vxlan1
  # ip -n b link set vxlan1 up
  # ip -n b route add 2.1.1.0/24 dev vxlan1
  # ip -n a link add vxlan1 type vxlan local 10.1.0.1 dev eth0 ttl 64 \
    gbp external
  # ip -n a addr add 2.1.1.1/24 dev vxlan1
  # ip -n a link set vxlan1 up
  # ip -n a route add 1.1.1.0/24 encap ip id 1 \
    vxlan_opts 1110 dst 10.1.0.2 dev vxlan1
  # ip -n a route show
  # ip netns exec a ping 1.1.1.1 -c 1

   1.1.1.0/24  encap ip id 1 src 0.0.0.0 dst 10.1.0.2 ttl 0 tos 0
     vxlan_opts 1110 dev vxlan1 scope link

   PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
   64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=0.111 ms

v1->v2:
  - improve the changelog.
  - get_u32 with base = 0 for gbp.
  - use PRINT_ANY to support dumping with json format.
v2->v3:
  - implement proper JSON array for opts.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print gbp as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-01 16:33:03 +00:00
Xin Long ca7614d4c6 iproute_lwtunnel: add options support for geneve metadata
This patch adds parsing and printing of LWTUNNEL_IP(6)_OPTS and
LWTUNNEL_IP_OPTS_GENEVE to implement geneve options support in iproute_lwtunnel.

Options are expressed as class:type:data, and multiple options may be
listed using a comma delimiter. Class and type are numbers, and data
is a hex string.
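The option syntax described above can be sketched in C as a comma splitter plus a per-option parser. This is an illustration under stated assumptions, not iproute2's implementation: struct geneve_opt_sketch, parse_geneve_opt and parse_geneve_opts are hypothetical names, and the 16-bit class, 8-bit type, 124-byte data cap and 4-byte-multiple data length are taken from the Geneve header format (RFC 8926), not from this commit.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for one option; not iproute2's own type. */
struct geneve_opt_sketch {
	uint16_t opt_class;	/* 16-bit option class */
	uint8_t  type;		/* 8-bit option type */
	uint8_t  data[124];	/* decoded data (Geneve caps it at 124 bytes) */
	size_t   data_len;	/* number of decoded bytes */
};

static int hexval(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	c = tolower(c);
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/* Parse one "class:type:data" item occupying [s, end). */
static int parse_geneve_opt(const char *s, const char *end,
			    struct geneve_opt_sketch *opt)
{
	char *p;
	unsigned long v;
	size_t hex_len, i;

	v = strtoul(s, &p, 0);			/* class */
	if (p == s || *p != ':' || v > UINT16_MAX)
		return -1;
	opt->opt_class = (uint16_t)v;

	s = p + 1;
	v = strtoul(s, &p, 0);			/* type */
	if (p == s || *p != ':' || v > UINT8_MAX)
		return -1;
	opt->type = (uint8_t)v;

	s = p + 1;				/* data: hex digits up to end */
	hex_len = (size_t)(end - s);
	/* two hex digits per byte; data must be a multiple of 4 bytes */
	if (hex_len % 8 != 0 || hex_len / 2 > sizeof(opt->data))
		return -1;
	for (i = 0; i < hex_len / 2; i++) {
		int hi = hexval((unsigned char)s[2 * i]);
		int lo = hexval((unsigned char)s[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		opt->data[i] = (uint8_t)((hi << 4) | lo);
	}
	opt->data_len = hex_len / 2;
	return 0;
}

/* Split a comma-separated list; returns the option count or -1. */
static int parse_geneve_opts(const char *list,
			     struct geneve_opt_sketch *opts, size_t max)
{
	const char *s = list;
	size_t n = 0;

	for (;;) {
		const char *end = strchr(s, ',');

		if (!end)
			end = s + strlen(s);
		if (n == max || parse_geneve_opt(s, end, &opts[n]) < 0)
			return -1;
		n++;
		if (*end == '\0')
			return (int)n;
		s = end + 1;
	}
}
```

For the example below, "1:1:1212121234567890" decodes to class 1, type 1 and the 8 data bytes 12 12 12 12 34 56 78 90.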

With this patch, users can add and dump geneve options like:

  # ip netns add a
  # ip netns add b
  # ip -n a link add eth0 type veth peer name eth0 netns b
  # ip -n a link set eth0 up; ip -n b link set eth0 up
  # ip -n a addr add 10.1.0.1/24 dev eth0
  # ip -n b addr add 10.1.0.2/24 dev eth0
  # ip -n b link add geneve1 type geneve id 1 remote 10.1.0.1 ttl 64
  # ip -n b addr add 1.1.1.1/24 dev geneve1
  # ip -n b link set geneve1 up
  # ip -n b route add 2.1.1.0/24 dev geneve1
  # ip -n a link add geneve1 type geneve external
  # ip -n a addr add 2.1.1.1/24 dev geneve1
  # ip -n a link set geneve1 up
  # ip -n a route add 1.1.1.0/24 encap ip id 1 geneve_opts \
    1:1:1212121234567890,1:1:1212121234567890,1:1:1212121234567890 \
    dst 10.1.0.2 dev geneve1
  # ip -n a route show
  # ip netns exec a ping 1.1.1.1 -c 1

   1.1.1.0/24  encap ip id 1 src 0.0.0.0 dst 10.1.0.2 ttl 0 tos 0
     geneve_opts 1:1:1212121234567890,1:1:1212121234567890 ...

   PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
   64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=0.079 ms

v1->v2:
  - improve the changelog.
  - use PRINT_ANY to support dumping with json format.
v2->v3:
  - implement proper JSON array for opts instead of just bunch of strings.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print class and type as uint and print data as hex string.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-01 16:31:58 +00:00
Jacob Keller 7ae84fedcb devlink: add support for DEVLINK_CMD_REGION_NEW
Add support to request that a new snapshot be taken immediately for
a devlink region. To avoid confusion, the desired snapshot id must be
provided.

Note that if a region does not support snapshots on demand, the kernel
will reject the request with -EOPNOTSUPP.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-29 22:31:55 -07:00
Stephen Hemminger 2b93f66863 uapi: update bpf.h
Minor spelling in comment
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-29 22:30:48 -07:00
Petr Machata 081d6c310d tc: pedit: Support JSON dumping
The action pedit does not currently support dumping to JSON. Convert
print_pedit() to the print_* family of functions so that dumping is correct
both in plain and JSON mode. In plain mode, the output is character for
character the same as it was before. In JSON mode, this is an example dump:

$ tc filter add dev dummy0 ingress prio 125 flower \
         action pedit ex munge udp dport set 12345 \
	                 munge ip ttl add 1        \
			 munge offset 10 u8 clear
$ tc -j filter show dev dummy0 ingress | jq
[
  {
    "protocol": "all",
    "pref": 125,
    "kind": "flower",
    "chain": 0
  },
  {
    "protocol": "all",
    "pref": 125,
    "kind": "flower",
    "chain": 0,
    "options": {
      "handle": 1,
      "keys": {},
      "not_in_hw": true,
      "actions": [
        {
          "order": 1,
          "kind": "pedit",
          "control_action": {
            "type": "pass"
          },
          "nkeys": 3,
          "index": 1,
          "ref": 1,
          "bind": 1,
          "keys": [
            {
              "htype": "udp",
              "offset": 0,
              "cmd": "set",
              "val": "3039",
              "mask": "ffff0000"
            },
            {
              "htype": "ipv4",
              "offset": 8,
              "cmd": "add",
              "val": "1000000",
              "mask": "ffffff"
            },
            {
              "htype": "network",
              "offset": 8,
              "cmd": "set",
              "val": "0",
              "mask": "ffff00ff"
            }
          ]
        }
      ]
    }
  }
]

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-30 02:43:23 +00:00
William Tu 846b6b2da8 erspan: Add type I version 0 support.
The Type I ERSPAN frame format is based on the barebones
IP + GRE(4-byte) encapsulation on top of the raw mirrored frame.
Both type I and II use 0x88BE as protocol type. Unlike type II
and III, no sequence number or key is required.

To create a type I erspan tunnel device:
$ ip link add dev erspan11 type erspan \
	local 172.16.1.100 remote 172.16.1.200 \
	erspan_ver 0

CC: Dmitriy Andreyevskiy <dandreye@cisco.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-30 02:40:10 +00:00
Paolo Abeni 0c42c6b130 man: ip.8: add reference to mptcp man-page
While at it, additionally fix a mandoc warning in mptcp.8

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-29 17:36:14 +00:00
David Ahern d38f2a10dd Merge branch 'mptcp' into next
Paolo Abeni  says:

====================

This introduces support for the MPTCP PM netlink interface, allowing admins
to configure several aspects of the MPTCP path manager. The subcommand is
documented with a newly added man-page.

This series also includes support for MPTCP subflow diag.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-29 16:50:25 +00:00
Paolo Abeni 2d8b5fe93e man: mptcp man page
Describe the mptcp subcommands implemented so far.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-29 16:47:45 +00:00
Davide Caratti 712fdd98c0 ss: allow dumping MPTCP subflow information
[root@f31 packetdrill]# ss -tni

 ESTAB    0        0           192.168.82.247:8080           192.0.2.1:35273
          cubic wscale:7,8 [...] tcp-ulp-mptcp flags:Mec token:0000(id:0)/5f856c60(id:0) seq:b810457db34209a5 sfseq:1 ssnoff:0 maplen:190

Additionally, extend the ss man page to describe the new entry layout.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-29 16:44:55 +00:00
Paolo Abeni 7e0767cd86 add support for mptcp netlink interface
Implement basic commands to:
- manipulate MPTCP endpoints list
- manipulate MPTCP connection limits

Examples:
1. Allow multiple subflows per MPTCP connection:
   $ ip mptcp limits set subflows 2

2. Accept ADD_ADDR announcement from the peer (server):
   $ ip mptcp limits set add_addr_accepted 2

3. Add an IPv4 address to be announced for backup subflows:
   $ ip mptcp endpoint add 10.99.1.2 signal backup

4. Add an IPv6 address used as source for additional subflows:
   $ ip mptcp endpoint add 2001::2 subflow

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-29 16:43:18 +00:00
David Ahern 02ade5a8ea Update kernel headers and import mptcp.h
Update kernel headers to commit
    790ab249b55d ("net: ethernet: fec: Prevent MII event after MII_SPEED write")

and import mptcp.h

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-29 16:41:39 +00:00
Eric Dumazet be9ca9d541 tc: fq: add timer_slack parameter
Commit 583396f4ca4d ("net_sched: sch_fq: enable use of hrtimer slack")
added TCA_FQ_TIMER_SLACK parameter, with a default value of 10 usec.

Add the corresponding tc support to get/set this tunable.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-27 14:56:42 -07:00
Eric Dumazet 7868f802e2 tc: fq_codel: add drop_batch parameter
Commit 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
added the new TCA_FQ_CODEL_DROP_BATCH_SIZE parameter, set by default to 64.

Add the ability to get/set drop_batch to the tc command.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-27 14:56:42 -07:00
Xin Long d27fc6390c xfrm: also check for ipv6 state in xfrm_state_keep
As commit f9d696cf41 ("xfrm: not try to delete ipcomp states when using
deleteall") did, this patch fixes the same issue for IPv6 states, where
xsinfo->id.proto == IPPROTO_IPV6.

  # ip xfrm state add src 2000::1 dst 2000::2 spi 0x1000 \
    proto comp comp deflate mode tunnel sel src 2000::1 dst \
    2000::2 proto gre
  # ip xfrm sta deleteall
  Failed to send delete-all request
  : Operation not permitted

Note that the xsinfo->proto in common states can never be IPPROTO_IPV6.

Fixes: f9d696cf41 ("xfrm: not try to delete ipcomp states when using deleteall")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-27 14:50:37 -07:00
Jiri Pirko 0149dabf2a tc: m_action: check cookie hex string len
Check that the length of the cookie hex string is divisible by 2, as the
length of a valid hex string always should be.
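A minimal C sketch of the validation described above: every byte needs two hex digits, so an odd length can be rejected before any decoding happens. The function name cookie_hex_ok is hypothetical, and the max_bytes parameter is an assumed stand-in for the kernel's cookie size limit (TC_COOKIE_MAX_SIZE in the uapi headers).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check: returns 0 if hex is a usable cookie of at most
 * max_bytes bytes, -1 otherwise. */
static int cookie_hex_ok(const char *hex, size_t max_bytes)
{
	size_t len = strlen(hex), i;

	/* two hex digits per byte, so an odd length is always invalid */
	if (len == 0 || len % 2 != 0 || len / 2 > max_bytes)
		return -1;
	for (i = 0; i < len; i++)
		if (!isxdigit((unsigned char)hex[i]))
			return -1;
	return 0;
}
```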

Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-27 14:50:27 -07:00
David Ahern 60f1075c21 Merge branch 'macsec-offload' into next
Igor Russkikh  says:

====================

From: Mark Starovoytov <mstarovoitov@marvell.com>

This series adds support for selecting the offloading mode of a MACsec
interface at link creation time.
Available modes are for now 'off', 'phy' and 'mac', 'off' being the default
when an interface is created.

First patch adds support for MAC offloading.

Last patch allows a user to change the offloading mode at runtime
through a new attribute, `ip link add link ... offload`:

  # ip link add link enp1s0 type macsec encrypt on offload off
  # ip link add link enp1s0 type macsec encrypt on offload phy
  # ip link add link enp1s0 type macsec encrypt on offload mac

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-26 18:32:20 +00:00
Mark Starovoytov bcbeb35ca4 macsec: add support for specifying offload at link add time
This patch adds support for configuring offload mode upon MACsec
device creation.

If the offload mode is not specified, the netlink attribute is not
added. The default behavior on the kernel side in this case is
backward-compatible (offloading is disabled).

Example:
$ ip link add link eth0 macsec0 type macsec port 11 encrypt on offload mac

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-26 18:32:03 +00:00
Mark Starovoytov 998534c99e macsec: add support for MAC offload
This patch enables MAC HW offload usage in iproute, since the MACsec
implementation now supports it.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-26 18:31:37 +00:00
Stephen Hemminger b831c5ffcc bridge: man page spelling fixes
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:48:57 -07:00
Bastien Roucariès 8d5d91fd58 State of bridge STP port are now case insensitive
Improve user experience.

Signed-off-by: Bastien Roucariès <rouca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:45:37 -07:00
Bastien Roucariès 498883a00f Document root_block option
root_block is also called root port guard; document it.

Signed-off-by: Bastien Roucariès <rouca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:45:37 -07:00
Bastien Roucariès 19bbebc459 Better documentation of BDPU guard
Document that guard disables the port, and how to re-enable it.

Signed-off-by: Bastien Roucariès <rouca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:45:37 -07:00
Bastien Roucariès 420febf961 Document BPDU filter option
The disabled state is also known as BPDU filter.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:45:37 -07:00
Bastien Roucariès 1cad8f8d78 Improve hairpin mode description
Mention VEPA and reflective relay.

Signed-off-by: Bastien Roucariès <rouca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:45:37 -07:00
Bastien Roucariès 706f7d35e2 Better documentation of mcast_to_unicast option
This option is useful for Wi-Fi bridges but needs some tweaking.

Document it based on the kernel patch documentation.

Signed-off-by: Bastien Roucariès <rouca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:45:37 -07:00
Brian Norris 8b9d5728c1 man: replace $(NETNS_ETC_DIR) and $(NETNS_RUN_DIR) in ip-netns(8)
These can be configured to different paths. Reflect that in the
generated documentation.

Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:39:27 -07:00
Brian Norris 48e05899d0 man: add ip-netns(8) as generation target
Prepare for adding new variable substitutions. Unify the sed rules while
we're at it, since there's no need to write this out 4 times.

Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:39:27 -07:00
Benjamin Lee f03ad792f3 tc: fq_codel: fix class stat deficit is signed int
The fq_codel class stat deficit is a signed int.  This is a regression
from when JSON output was added.

Fixes: 997f2dc193 ("tc: Add JSON output of fq_codel stats")
Signed-off-by: Benjamin Lee <ben@b1c1l1.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:34:56 -07:00
Odin Ugedal 14d2df8874 q_cake: properly print memlimit
Load memlimit so that it will be printed if it isn't set to zero.

Also add a space to properly print it.

Signed-off-by: Odin Ugedal <odin@ugedal.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:33:15 -07:00
Odin Ugedal 6f883f168c q_cake: Make fwmark uint instead of int
This helps avoid an overflow: setting it to 0xffffffff would result in
-1 when converted to a signed integer, which ended up setting the
fwmark to 0x00.

Signed-off-by: Odin Ugedal <odin@ugedal.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:33:15 -07:00
Odin Ugedal e07c57e94e tc_util: detect overflow in get_size
This detects overflow during parsing of value using get_size:

e.g. running:

$ tc qdisc add dev lo root cake memlimit 11gb

currently gives a memlimit of "3072Mb", while with this patch it errors
with 'illegal value for "memlimit": "11gb"', since memlimit is an
unsigned integer.
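The "3072Mb" result is plain 32-bit wraparound: 11 × 2^30 = 11811160064 bytes, and reduced modulo 2^32 that is 3221225472 bytes, exactly 3072 MiB. A minimal C sketch of an overflow-checked size parser follows; parse_size_sketch is a hypothetical name, it handles only integer values with lowercase 1024-based suffixes (b, k/kb, m/mb, g/gb), and it is not tc_util's get_size.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser: "<n>[b|k|kb|m|mb|g|gb]" into an unsigned int,
 * rejecting values that would not fit instead of silently wrapping. */
static int parse_size_sketch(const char *str, unsigned int *size)
{
	unsigned long long val, mult;
	char *end;

	if (*str < '0' || *str > '9')
		return -1;			/* rejects "", "-1", " 5" */
	errno = 0;
	val = strtoull(str, &end, 10);
	if (errno == ERANGE)
		return -1;

	if (*end == '\0' || !strcmp(end, "b"))
		mult = 1;
	else if (!strcmp(end, "k") || !strcmp(end, "kb"))
		mult = 1ULL << 10;
	else if (!strcmp(end, "m") || !strcmp(end, "mb"))
		mult = 1ULL << 20;
	else if (!strcmp(end, "g") || !strcmp(end, "gb"))
		mult = 1ULL << 30;
	else
		return -1;			/* unknown suffix */

	/* val * mult fits iff val <= UINT_MAX / mult (integer division) */
	if (val > UINT_MAX / mult)
		return -1;
	*size = (unsigned int)(val * mult);
	return 0;
}
```

Comparing against UINT_MAX / mult before multiplying means the check itself can never overflow, so "3gb" is accepted while "4gb" and "11gb" are rejected.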

Signed-off-by: Odin Ugedal <odin@ugedal.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-20 09:31:01 -07:00
Eran Ben Elisha 4aa0c9c9f8 devlink: Add devlink health auto_dump command support
Add support for configuring auto_dump attribute per reporter.
With this attribute, one can indicate whether the devlink kernel core
should execute an automatic dump on error.

The change will be reflected in the show and set commands and in the man page.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-19 22:27:13 +00:00
David Ahern 59ba1dd011 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-19 22:26:27 +00:00
Benjamin Lee fe821d64e6 man: tc-htb.8: fix class prio is not mandatory
Fix description for htb class prio parameter to indicate it's not
mandatory.

Signed-off-by: Benjamin Lee <ben@b1c1l1.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-13 14:04:00 -07:00
Benjamin Lee 6ecd0198c0 man: tc-htb.8: add missing class parameter quantum
Add description for htb class parameter quantum.

Signed-off-by: Benjamin Lee <ben@b1c1l1.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-13 14:04:00 -07:00
Benjamin Lee d8d59421b6 man: tc-htb.8: add missing qdisc parameter r2q
Add description for htb qdisc parameter r2q.

Signed-off-by: Benjamin Lee <ben@b1c1l1.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-13 14:04:00 -07:00
Petr Machata 20927e0525 ip: link_gre: Do not send ERSPAN attributes to GRE tunnels
In the commit referenced below, ip link started sending ERSPAN-specific
attributes even for GRE and gretap tunnels. Fix by more carefully
distinguishing between the GRE/tap and ERSPAN modes. Do not show
ERSPAN-related help in GRE/tap mode, likewise do not accept ERSPAN
arguments, or send ERSPAN attributes.

Fixes: 83c543af87 ("erspan: set erspan_ver to 1 by default")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-13 14:02:54 -07:00
Jiri Pirko c4dfddccef devlink: fix JSON output of mon command
The current JSON output of mon command is broken. Fix it and make sure
that the output is valid JSON. Also, handle SIGINT gracefully so that
the JSON output can be terminated properly.
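A common way to end a JSON stream cleanly on Ctrl-C is to let the SIGINT handler only set a flag and have the main loop write the closing brackets after it exits. A minimal C sketch of that pattern, not devlink's actual code: the event source is a hypothetical callback, and all names are illustrative. Because SA_RESTART is not set, a blocking read in a real monitor would return EINTR, giving the loop a chance to see the flag.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

static volatile sig_atomic_t stop_requested;

static void on_sigint(int sig)
{
	(void)sig;
	stop_requested = 1;	/* async-signal-safe: only set a flag */
}

static int install_sigint_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigint;	/* no SA_RESTART: blocking reads get EINTR */
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGINT, &sa, NULL);
}

/* Hypothetical event source: returns one JSON object as text, or NULL. */
typedef const char *(*next_event_fn)(void *ctx);

/* Emit {"mon":[ev,ev,...]} and always write the closing brackets,
 * whether the loop ends on SIGINT or because the source ran dry. */
static void run_json_monitor(FILE *out, next_event_fn next_event, void *ctx)
{
	const char *ev;
	int first = 1;

	fputs("{\"mon\":[", out);
	while (!stop_requested && (ev = next_event(ctx)) != NULL) {
		fprintf(out, "%s%s", first ? "" : ",", ev);
		first = 0;
	}
	fputs("]}\n", out);
}
```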

Example:
$ devlink mon -j -p
{
    "mon": [ {
            "command": "new",
            "dev": {
                "netdevsim/netdevsim10": {}
            }
        },{
            "command": "new",
            "port": {
                "netdevsim/netdevsim10/0": {
                    "type": "notset",
                    "flavour": "physical",
                    "port": 1
                }
            }
        },{
            "command": "new",
            "port": {
                "netdevsim/netdevsim10/0": {
                    "type": "eth",
                    "netdev": "eth0",
                    "flavour": "physical",
                    "port": 1
                }
            }
        },{
            "command": "new",
            "port": {
                "netdevsim/netdevsim10/0": {
                    "type": "notset",
                    "flavour": "physical",
                    "port": 1
                }
            }
        },{
            "command": "del",
            "port": {
                "netdevsim/netdevsim10/0": {
                    "type": "notset",
                    "flavour": "physical",
                    "port": 1
                }
            }
        },{
            "command": "del",
            "dev": {
                "netdevsim/netdevsim10": {}
            }
        } ]
}

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-13 13:59:12 -07:00
David Ahern 5c762c3bc2 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-09 14:42:33 +00:00
Petr Machata 74c8610f3b man: tc-pedit: Drop the claim that pedit ex is only for IPv4
This sentence predates addition of extended pedit for IPv6 packets.

Reported-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-09 14:39:59 +00:00
Petr Machata f91f788c70 man: tc-pedit: Add examples for dsfield and retain
Describe a way to update just the DSCP and just the ECN part of the
dsfield. That is useful on its own, but also it shows how retain works.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-09 14:39:58 +00:00
Petr Machata 2d9a8dc439 tc: p_ip6: Support pedit of IPv6 dsfield
Support keywords dsfield, traffic_class and tos in the IPv6 context.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-09 14:39:58 +00:00
Jiri Pirko 1c3ed78001 devlink: remove unused "jw" field
This field is not used. Remove it.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-09 14:39:28 +00:00
Stephen Hemminger 27136cab54 man/tc-actions: fix formatting
Fix error from make check.
n-old.tmac: <standard input>: line 86: 'R' is a string (producing the registered sign), not a macro.
Error in tc-actions.8

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-06 10:07:54 -07:00
Jiri Pirko e00248d296 man: add man page for devlink dpipe
Add simple man page for devlink dpipe.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-06 10:06:00 -07:00
Jiri Pirko 885f4b0d7a devlink: remove "dev" object sub help messages
Remove duplicate sub help messages for "dev" object and have them all
show help message for "dev".

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-06 10:00:32 -07:00
Jiri Pirko b2522187d8 devlink: Fix help message for dpipe
Have one help message for all dpipe commands, as is done for the rest
of the devlink objects. Add the possible and required options to the help.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-06 10:00:32 -07:00
Jiri Pirko 342f462efa devlink: rename dpipe_counters_enable struct field to dpipe_counters_enabled
To be consistent with the rest of the code and name of netlink
attribute, rename the dpipe_counters_enable struct field
to dpipe_counters_enabled.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-06 10:00:32 -07:00
Jiri Pirko 192e7b3ffa devlink: Add alias "counters_enabled" for "counters" option
To be consistent with netlink attribute name and also with the
"dpipe table show" output, add "counters_enabled" for "counters" in
"dpipe table set" command.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-06 10:00:32 -07:00
Jiri Pirko 0b1875cdc6 devlink: fix encap mode manupulation
DEVLINK_ATTR_ESWITCH_ENCAP_MODE netlink attribute carries an enum, but the
code assumes a bool value. Fix this by treating the encap mode in the same
way as other eswitch mode attributes, switching from "enable"/"disable"
to "basic"/"none", according to the enum. Maintain backward
compatibility by allowing the user to pass "enable"/"disable" too. Also, to
be in sync with the rest of the "mode" commands, rename it to "encap-mode".
Adjust the help and man page accordingly.
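The mapping described above can be sketched in C. The kernel uapi defines DEVLINK_ESWITCH_ENCAP_MODE_NONE (0) and DEVLINK_ESWITCH_ENCAP_MODE_BASIC (1); the enum below is a local stand-in with hypothetical names so the sketch compiles on its own, and the two helper functions are illustrative, not devlink's code.

```c
#include <string.h>

/* Local stand-in mirroring the kernel's encap-mode enum values. */
enum encap_mode_sketch {
	ENCAP_MODE_NONE = 0,
	ENCAP_MODE_BASIC = 1,
};

/* Map a CLI word to the enum; legacy "enable"/"disable" stay accepted. */
static int encap_mode_from_str(const char *s, enum encap_mode_sketch *mode)
{
	if (!strcmp(s, "none") || !strcmp(s, "disable"))
		*mode = ENCAP_MODE_NONE;
	else if (!strcmp(s, "basic") || !strcmp(s, "enable"))
		*mode = ENCAP_MODE_BASIC;
	else
		return -1;
	return 0;
}

/* Print the enum name rather than a bool, so output matches the uAPI. */
static const char *encap_mode_to_str(enum encap_mode_sketch mode)
{
	switch (mode) {
	case ENCAP_MODE_NONE:
		return "none";
	case ENCAP_MODE_BASIC:
		return "basic";
	}
	return "unknown";
}
```

Keeping the legacy words as aliases on input, while always printing the enum names, is one way to preserve old scripts without carrying the bool interpretation into the output.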

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-06 10:00:32 -07:00
Jiri Pirko 90ce848b05 devlink: Fix help and man of "devlink health set" command
Fix the help and man page of "devlink health set" command to be aligned
with the rest of helps and man pages.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-06 10:00:32 -07:00
Jiri Pirko b37a863cb2 devlink: remove custom bool command line options parsing
Change the code that is doing custom processing of boolean command line
options to use dl_argv_bool(). Extend strtobool() to accept
"enable"/"disable" to maintain current behavior.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-04-06 10:00:32 -07:00
Stephen Hemminger 5d10f24fdd Merge ../iproute2-next 2020-04-06 10:00:12 -07:00
Jiri Pirko 0827cc53f3 tc: show used HW stats types
If kernel provides the attribute, show the used HW stats types.

Example:

$ tc filter add dev enp3s0np1 ingress proto ip handle 1 pref 1 flower dst_ip 192.168.1.1 action drop
$ tc -s filter show dev enp3s0np1 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  dst_ip 192.168.1.1
  in_hw in_hw_count 2
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 10 sec used 10 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
        used_hw_stats immediate     <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-31 23:30:04 +00:00
Ido Schimmel 0141ca64b8 bash-completion: devlink: Extend bash-completion for new commands
Extend bash-completion for two new commands:

devlink trap policer set DEV policer POLICER [ rate RATE ] [ burst BURST ]
devlink trap policer show DEV policer POLICER

And for "policer" / "nopolicer" parameters in existing command:

devlink trap group set DEV group GROUP [ action { trap | drop } ]
                       [ policer POLICER ] [ nopolicer ]

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-31 23:25:13 +00:00
Ido Schimmel 02a2a6683f devlink: Add ability to bind policer to trap group
Add ability to associate a policer with a trap group. The policer can be
unbound by using the 'nopolicer' keyword, in which case the value
encoded in the 'DEVLINK_ATTR_TRAP_POLICER_ID' attribute will be '0'.
This is consistent with ip-link 'nomaster' keyword and the 'IFLA_MASTER'
attribute.

Example:

# devlink trap group set netdevsim/netdevsim10 group l3_drops policer 2
# devlink -jp trap group show netdevsim/netdevsim10 group l3_drops
{
    "trap_group": {
        "netdevsim/netdevsim10": [ {
                "name": "l3_drops",
                "generic": true,
                "policer": 2
            } ]
    }
}

# devlink trap group set netdevsim/netdevsim10 group l3_drops nopolicer
# devlink -jp trap group show netdevsim/netdevsim10 group l3_drops
{
    "trap_group": {
        "netdevsim/netdevsim10": [ {
                "name": "l3_drops",
                "generic": true
            } ]
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-31 23:25:07 +00:00
Ido Schimmel a66af55693 devlink: Add devlink trap policer set and show commands
The trap policer set command allows the user to set the parameters of
the packet trap policer, such as rate and burst size. Example:

# devlink trap policer set netdevsim/netdevsim10 policer 1 rate 1000 burst 32

The trap policer show command allows the user to get the current
parameters of an individual policer or a dump of all policers in case
one is not specified. When '-s' is specified the policer's statistics
are shown. Example:

# devlink -jps trap policer show netdevsim/netdevsim10 policer 1
{
    "trap_policer": {
        "netdevsim/netdevsim10": [ {
                "policer": 1,
                "rate": 1000,
                "burst": 32,
                "stats": {
                    "rx": {
                        "dropped": 53
                    }
                }
            } ]
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-31 23:24:35 +00:00
David Ahern ce9191ffee Update kernel headers
Update kernel headers to commit:
    7f80ccfe9968 ("net: ipv6: rpl_iptunnel: Fix potential memory leak in rpl_do_srh_inline")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-31 23:23:28 +00:00
Stephen Hemminger 29981db0e0 v5.6.0 2020-03-30 08:06:08 -07:00
Andrea Claudi 0641bed8a3 man: bridge.8: fix bridge link show description
When multiple bridges are present, 'bridge link show' displays ports
for all bridges. Make this clear in the command description, and
point the user to the ip command to display ports for a specific
bridge.

Reported-by: Marc Muehlfeld <mmuehlfe@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-03-30 08:01:02 -07:00
Danielle Ratson 5a3faf2949 bash-completion: devlink: add bash-completion function
Add function for command completion for devlink in bash, and update Makefile
to install it under /usr/share/bash-completion/completions/.

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-25 16:46:09 +00:00
Petr Machata 6c10fdca70 tc: q_red: Support 'nodrop' flag
Recognize the new configuration option of the RED Qdisc, "nodrop". Add
support for passing flags through TCA_RED_FLAGS, and use it when passing
TC_RED_NODROP flag.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-25 16:45:37 +00:00
Jakub Kicinski 1c74c20cbe tc: m_action: rename hw stats type uAPI
Follow the kernel rename to shorten the identifiers.
Rename hw_stats_type to hw_stats.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-25 16:42:33 +00:00
David Ahern 1ff1edb6d5 Update kernel headers
Update kernel headers to commit:
    cd556e40fdf3 ("devlink: expand the devlink-info documentation")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-25 16:41:49 +00:00
Stephen Hemminger adcab267b8 uapi: update linux/in.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-03-20 11:05:26 -07:00
Jiri Pirko 341903dd3b tc: m_action: introduce support for hw stats type
Introduce support for per-action hw stats type config.

This patch allows the user to specify one of the following types of HW
stats for an added action:
immediate - queried during dump time
delayed - polled from HW periodically or sent by HW in async manner
disabled - no stats needed

Note that if the "hw_stats" option is not passed, the user does not care
about the type and just expects any type of stats.

Examples:
$ tc filter add dev enp0s16np28 ingress proto ip handle 1 pref 1 flower skip_sw dst_ip 192.168.1.1 action drop hw_stats disabled
$ tc -s filter show dev enp0s16np28 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  dst_ip 192.168.1.1
  skip_sw
  in_hw in_hw_count 2
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 7 sec used 2 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
        hw_stats disabled

$ tc filter add dev enp0s16np28 ingress proto ip handle 1 pref 1 flower skip_sw dst_ip 192.168.1.1 action drop hw_stats immediate
$ tc -s filter show dev enp0s16np28 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  dst_ip 192.168.1.1
  skip_sw
  in_hw in_hw_count 2
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 11 sec used 4 sec
        Action statistics:
        Sent 102 bytes 1 pkt (dropped 1, overlimits 0 requeues 0)
        Sent software 0 bytes 0 pkt
        Sent hardware 102 bytes 1 pkt
        backlog 0b 0p requeues 0
        hw_stats immediate

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-20 16:18:44 +00:00
David Ahern 25091a761f Update kernel headers
Update kernel headers to commit:
    3fd177cb2b47 ("net: stmmac: dwmac_lib: remove unnecessary checks in dwmac_dma_reset()")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-20 16:17:55 +00:00
Guillaume Nault 72cc0bafb9 iproute2: fix MPLS label parsing
The initial value of "label" in parse_mpls() is 0xffffffff. Therefore
we should test for this value, and not 0, to detect if a label has been
provided. The "!label" test not only fails to detect a missing label
parameter, it also prevents the use of the IPv4 explicit NULL label,
which actually equals 0.

Reproducer:
  $ ip link add name dm0 type dummy
  $ tc qdisc add dev dm0 ingress

  $ tc filter add dev dm0 parent ffff: matchall action mpls push
  Error: act_mpls: Label is required for MPLS push.
  We have an error talking to the kernel
  --> Filter was pushed to the kernel, where it got rejected.

  $ tc filter add dev dm0 parent ffff: matchall action mpls push label 0
  Error: argument "label" is required
  --> Label 0 was rejected by iproute2.

Expected result:
  $ tc filter add dev dm0 parent ffff: matchall action mpls push
  Error: argument "label" is required
  --> Filter was directly rejected by iproute2.

  $ tc filter add dev dm0 parent ffff: matchall action mpls push label 0
  --> Filter is accepted.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-03-15 09:56:53 -07:00
Andrea Claudi d9b868436a nexthop: fix error reporting in filter dump
nh_dump_filter is missing a return value check in two cases.
Fix this by simply adding an assignment to the proper variable.

Fixes: 63df8e8543 ("Add support for nexthop objects")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-03-15 09:54:42 -07:00
Leslie Monis 94c4ce822c Revert "tc: pie: change maximum integer value of tc_pie_xstats->prob"
This reverts commit 92cfe3260e.

Kernel commit 3f95f55eb55d ("net: sched: pie: change tc_pie_xstats->prob")
removes the need to change the maximum integer value of
tc_pie_xstats->prob here.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-10 18:29:26 +00:00
Leslie Monis 92cfe3260e tc: pie: change maximum integer value of tc_pie_xstats->prob
Kernel commit 105e808c1da2 ("pie: remove pie_vars->accu_prob_overflows"),
changes the maximum value of tc_pie_xstats->prob from (2^64 - 1) to
(2^56 - 1).

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-09 02:46:45 +00:00
David Ahern ad19a18e42 Merge branch 'macsec-offload' into next
Antoine Tenart says:

====================

This series adds support for selecting and reporting the offloading mode
of a MACsec interface. Available modes are for now 'off' and 'phy',
'off' being the default when an interface is created. Modes are not just
'off' and 'on', as MACsec operations can be offloaded to multiple
kinds of specialized hardware devices, at least to PHYs and Ethernet
MACs. The latter isn't currently supported in the kernel, though.

The first patch adds support for reporting the offloading mode currently
selected for a given MACsec interface through the `ip macsec show`
command:

   # ip macsec show
   18: macsec0: protect on validate strict sc off sa off encrypt on send_sci on end_station off scb off replay off
       cipher suite: GCM-AES-128, using ICV length 16
       TXSC: 3e5035b67c860001 on SA 0
           0: PN 1, state on, key 00000000000000000000000000000000
       RXSC: b4969112700f0001, state on
           0: PN 1, state on, key 01000000000000000000000000000000
->     offload: phy
   19: macsec1: protect on validate strict sc off sa off encrypt on send_sci on end_station off scb off replay off
       cipher suite: GCM-AES-128, using ICV length 16
       TXSC: 3e5035b67c880001 on SA 0
           1: PN 1, state on, key 00000000000000000000000000000000
       RXSC: b4969112700f0001, state on
           1: PN 1, state on, key 01000000000000000000000000000000
->     offload: off

The second patch allows a user to change the offloading mode at runtime
through a new subcommand, `ip macsec offload`:

  # ip macsec offload macsec0 phy
  # ip macsec offload macsec0 off

If a mode isn't supported, `ip macsec offload` will report an issue
(-EOPNOTSUPP).

Giving the offloading mode when a macsec interface is created was
discussed; it is not implemented in this series. It could come later
on, when needed, as we'll still want to support updating the offloading
mode at runtime (what's implemented in this series).
====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-04 19:59:35 +00:00
Antoine Tenart c15674d80d macsec: add an accessor for validate_str
This patch adds an accessor for the validate_str array, to handle future
changes adding a member.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-04 19:57:41 +00:00
Antoine Tenart 69166f909b man: document the ip macsec offload command
Add a description of the `ip macsec offload` command used to select the
offloading mode on a macsec interface when the underlying device
supports it.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-04 19:57:36 +00:00
Antoine Tenart 791bc7ee48 macsec: add support for changing the offloading mode
MACsec can now be offloaded to specialized hardware devices. Offloading
is off by default when creating a new MACsec interface, but the mode can
be updated at runtime. This patch adds a new subcommand,
`ip macsec offload`, to allow users to select the offloading mode of a
MACsec interface. It takes the mode to switch to as an argument, which
can for now either be 'off' or 'phy':

  # ip macsec offload macsec0 phy
  # ip macsec offload macsec0 off

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-04 19:57:30 +00:00
Antoine Tenart da6abdba09 macsec: report the offloading mode currently selected
This patch adds support for reporting the MACsec offloading mode currently
selected, which as of now can be either 'off' or 'phy'. This
information is reported through the `ip macsec show` command:

  # ip macsec show
  18: macsec0: protect on validate strict sc off sa off encrypt on send_sci on end_station off scb off replay off
      cipher suite: GCM-AES-128, using ICV length 16
      TXSC: 3e5035b67c860001 on SA 0
          0: PN 1, state on, key 00000000000000000000000000000000
      RXSC: b4969112700f0001, state on
          0: PN 1, state on, key 01000000000000000000000000000000
      offload: phy
  19: macsec1: protect on validate strict sc off sa off encrypt on send_sci on end_station off scb off replay off
      cipher suite: GCM-AES-128, using ICV length 16
      TXSC: 3e5035b67c880001 on SA 0
          1: PN 1, state on, key 00000000000000000000000000000000
      RXSC: b4969112700f0001, state on
          1: PN 1, state on, key 01000000000000000000000000000000
      offload: off

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-04 19:56:41 +00:00
Parav Pandit a5c44b821c devlink: Introduce devlink port flavour virtual
Currently, PCI PF and VF devlink devices register their ports as
physical ports in non-representor mode.

Introduce a new 'virtual' port flavour so that virtual devices can
register their ports with it, making the port type clearer to users.

An example of one PCI PF and 2 PCI virtual functions, each having
one devlink port.

$ devlink port show
pci/0000:06:00.0/1: type eth netdev ens2f0 flavour physical port 0
pci/0000:06:00.2/1: type eth netdev ens2f2 flavour virtual port 0
pci/0000:06:00.3/1: type eth netdev ens2f3 flavour virtual port 0

Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-04 19:48:11 +00:00
Jiri Pirko 4fe07b8146 devlink: add trap metadata type for flow action cookie
A flow action cookie was recently added to the kernel; print it out.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-04 19:46:29 +00:00
David Ahern b6b8e40bf7 Update kernel headers
Update kernel headers to commit
ef71037047b0 ("Merge branch 'act_ct-software-offload-of-established-flows-fixes'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-03-04 19:44:21 +00:00
David Ahern b6de0bf7db Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-02-28 22:42:49 +00:00
Stephen Hemminger b5a77cf701 uapi: update bpf.h
Updated upstream

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-28 13:55:38 -08:00
Andrea Claudi 31824e2299 man: rdma-statistic: Add filter description
Add description for filters on rdma statistics show command.
Also add a filter description to the help message of the command.
Additionally, fix some whitespace issues in the man page.

Reported-by: Zhaojuan Guo <zguo@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-28 13:53:00 -08:00
Andrea Claudi 8f1c9d4a3c man: rdma.8: Add missing resource subcommand description
Add resource subcommand in the OBJECT section and a short
description for it.

Reported-by: Zhaojuan Guo <zguo@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-28 13:53:00 -08:00
Xin Long f9d696cf41 xfrm: not try to delete ipcomp states when using deleteall
In kernel space, ipcomp (sub) states used by main states cannot
be deleted by users; they are freed only when all main states
are destroyed and nothing uses them any more.

In user space, ip xfrm sta deleteall doesn't filter these ipcomp
states out, which causes errors:

  # ip xfrm state add src 192.168.0.1 dst 192.168.0.2 spi 0x1000 \
      proto comp comp deflate mode tunnel sel src 192.168.0.1 dst \
      192.168.0.2 proto gre
  # ip xfrm sta deleteall
  Failed to send delete-all request
  : Operation not permitted

This patch fixes it by filtering out ipcomp states with the check
xsinfo->id.proto == IPPROTO_IPIP.

Fixes: c7699875be ("Import patch ipxfrm-20040707_2.diff")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-28 13:50:58 -08:00
Andrea Claudi 229bb886a3 man: ip.8: Add missing vrf subcommand description
Add description to the vrf subcommand and a reference to the
dedicated man page.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-28 13:48:23 -08:00
Donald Sharp 320c5c6e09 ip route: Do not imply pref and ttl-propagate are per nexthop
Currently `ip -6 route show` gives us this output:

sharpd@eva ~/i/ip (master)> ip -6 route show
::1 dev lo proto kernel metric 256 pref medium
4:5::6:7 nhid 18 proto static metric 20
        nexthop via fe80::99 dev enp39s0 weight 1
        nexthop via fe80::44 dev enp39s0 weight 1 pref medium

Displaying `pref medium` as the last bit of output implies
that RTA_PREF is a per-nexthop value, when it is in fact
per-route data.

Change the output to display RTA_PREF and RTA_TTL_PROPAGATE
before the RTA_MULTIPATH data is shown:

sharpd@eva ~/i/ip (master)> ./ip -6 route show
::1 dev lo proto kernel metric 256 pref medium
4:5::6:7 nhid 18 proto static metric 20 pref medium
        nexthop via fe80::99 dev enp39s0 weight 1
        nexthop via fe80::44 dev enp39s0 weight 1

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-28 13:42:59 -08:00
Andrea Claudi 2c7056ac26 nstat: print useful error messages in abort() cases
When the nstat temporary file is corrupted, or in some other corner cases,
nstat uses abort() to stop its execution. This can puzzle users, who
wonder what caused the crash.

This commit replaces abort() with meaningful error messages and exit().

Reported-by: Renaud Métrich <rmetrich@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-23 13:52:08 -08:00
Xin Long 83c543af87 erspan: set erspan_ver to 1 by default
Commit 2897636267 ("erspan: add erspan version II support")
breaks the command:

 # ip link add erspan1 type erspan key 1 seq erspan 123 \
    local 10.1.0.2 remote 10.1.0.1

because erspan_ver is set to 0 by default, which means IFLA_GRE_ERSPAN_INDEX
won't be set in gre_parse_opt().

  # ip -d link show erspan1
    ...
    erspan remote 10.1.0.1 local 10.1.0.2 ... erspan_index 0 erspan_ver 1
                                              ^^^^^^^^^^^^^^

This patch sets erspan_ver to 1 by default.

Fixes: 2897636267 ("erspan: add erspan version II support")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-23 13:50:17 -08:00
Stephen Hemminger 0a6ea03be4 uapi: update magic.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-11 08:16:42 -08:00
Moshe Shemesh 5023df6a21 devlink: Add health error recovery status monitoring
Add support for devlink health error recovery status monitoring.
Update devlink-monitor man page accordingly.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-02-10 05:29:24 +00:00
Mohit P. Tahiliani 9dced637f8 tc: add support for FQ-PIE packet scheduler
This patch adds support for the FQ-PIE packet scheduler.

Principles:
  - Packets are classified into flows.
  - This is a stochastic model (as we use a hash, several flows might
    be hashed to the same slot).
  - Each flow has a PIE managed queue.
  - Flows are linked onto two (Round Robin) lists,
    so that new flows have priority on old ones.
  - For a given flow, packets are not reordered.
  - Drops during enqueue only.
  - ECN capability is off by default.
  - ECN threshold (if ECN is enabled) is at 10% by default.
  - Uses timestamps to calculate queue delay by default.

Usage:
tc qdisc ... fq_pie [ limit PACKETS ] [ flows NUMBER ]
                    [ target TIME ] [ tupdate TIME ]
                    [ alpha NUMBER ] [ beta NUMBER ]
                    [ quantum BYTES ] [ memory_limit BYTES ]
                    [ ecn_prob PERCENTAGE ] [ [no]ecn ]
                    [ [no]bytemode ] [ [no_]dq_rate_estimator ]

defaults:
  limit: 10240 packets, flows: 1024
  target: 15 ms, tupdate: 15 ms (in jiffies)
  alpha: 1/8, beta: 5/4
  quantum: device MTU, memory_limit: 32 MB
  ecn_prob: 10%, ecn: off
  bytemode: off, dq_rate_estimator: off

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: V. Saicharan <vsaicharan1998@gmail.com>
Signed-off-by: Mohit Bhasi <mohitbhasi1998@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-04 03:24:39 -08:00
Peter Junos 39995691b5 ss: fix tests to reflect compact output
This fixes tests broken by commit c4f5862994 ("ss: use compact output
for undetected screen width").

It also escapes asterisks: the expected output is matched with grep, where
an unescaped '*' acts as a regex quantifier, so more bugs could sneak under
the radar with the previous patterns.

Signed-off-by: Peter Junos <petoju@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-04 03:21:06 -08:00
Stephen Hemminger 8f9f2b9cdf devlink: fix warning from unchecked write
Warning seen on Ubuntu

devlink.c: In function ‘cmd_dev_flash’:
devlink.c:3071:3: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Wunused-result]
 3071 |   write(pipe_w, &err, sizeof(err));
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fixes: 9b13cddfe2 ("devlink: implement flash status monitoring")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-02-02 04:20:58 -08:00
Andrea Claudi 5cdeb77cd6 ip link: xstats: fix TX IGMP reports string
This restores the string format we had before JSON support was added,
adding a missing space between v2 and v3 in the TX IGMP reports string.

Fixes: a9bc23a792 ("ip: bridge: add xstats json support")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-29 10:11:35 -08:00
Andrea Claudi 38dd041bfe ip-xfrm: Fix help messages
After commit 8589eb4efd ("treewide: refactor help messages"), the help
messages for xfrm state and policy are broken, printing the same
protocol many times in the UPSPEC section:

$ ip xfrm state help
[...]
UPSPEC := proto { { tcp | tcp | tcp | tcp } [ sport PORT ] [ dport PORT ] |
                  { icmp | icmp | icmp } [ type NUMBER ] [ code NUMBER ] |
                  gre [ key { DOTTED-QUAD | NUMBER } ] | PROTO }

This happens because the strxf_proto() function is non-reentrant and gets
called multiple times within the same fprintf() call.

This commit fixes the issue by avoiding calls to strxf_proto() with a
constant parameter and hardcoding the protocol name strings instead.

Fixes: 8589eb4efd ("treewide: refactor help messages")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-29 10:11:14 -08:00
David Ahern 8e66c8c112 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-29 15:16:54 +00:00
Stephen Hemminger f9ed2db593 uapi: updates to tcp.h, snmp.h and if_bridge.h
Upstream changes

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-29 05:48:19 -08:00
Stephen Hemminger 74da271551 uapi/pkt_sched: upstream changes from fq_pie
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-29 05:46:43 -08:00
Stephen Hemminger aa6d6b223b uapi: update bpf.h and btf.h
Upstream headers from 5.6 pre rc1

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-29 05:45:53 -08:00
Stephen Hemminger d80d22d5fd Merge branch 'master' of git://git.kernel.org/pub/scm/network/iproute2/iproute2-next
Resolved conflict in tc/f_flower.c
2020-01-29 05:44:53 -08:00
Stephen Hemminger d4df55404a v5.5.0 2020-01-27 05:53:09 -08:00
Ron Diskin ff360fe984 devlink: Replace pr_out_bool/uint() wrappers with common print functions
Replace calls for pr_out_bool() and pr_out_uint() with direct calls
to common json_print library function print_bool() and print_uint().

Signed-off-by: Ron Diskin <rondi@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-27 05:43:54 -08:00
Ron Diskin 5a71671a94 devlink: Replace pr_#type_value wrapper functions with common functions
Replace calls for pr_bool/uint/uint64_value with direct calls for the
matching common json_print library function: print_bool(), print_uint()
and print_u64()

Signed-off-by: Ron Diskin <rondi@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-27 05:43:54 -08:00
Ron Diskin 3666eb0eb6 devlink: Replace pr_out_str wrapper function with common function
Replace calls for pr_out_str() and pr_out_str_value() with direct calls to
common json_print library functions.

Signed-off-by: Ron Diskin <rondi@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-27 05:43:54 -08:00
Ron Diskin b20555e402 devlink: Replace json prints by common library functions
Substitute json prints to use json_print.c common library functions,
instead of directly calling jsonw_* functions.

Signed-off-by: Ron Diskin <rondi@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-27 05:43:54 -08:00
Ron Diskin 98e48e7dd0 json_print: Add new json object function not as array item
Currently, new_json_obj() opens (and delete_json_obj() closes) the object as
an array, which prints the matching brackets '[' and ']' at the
start/end of the object. This patch adds new_json_obj_plain() and the
matching delete_json_obj_plain() to open and close a JSON object without
an array, leaving it to the caller to decide which type of object to
open/close as the main object.

Signed-off-by: Ron Diskin <rondi@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-27 05:43:54 -08:00
Ron Diskin 31ca29b2be json_print: Introduce print_#type_name_value
Until now, the print_#type functions only supported printing constant
names with unknown (variable) values.
Add functions that allow printing when the name is also passed to the
function as a variable.

Signed-off-by: Ron Diskin <rondi@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-27 05:43:54 -08:00
Leslie Monis eae5f4b5c8 tc: parse attributes with NLA_F_NESTED flag
The kernel now requires all new nested attributes to set the
NLA_F_NESTED flag. Enable tc {qdisc,class,filter} to parse
attributes that have the NLA_F_NESTED flag set.

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-22 03:45:48 +00:00
Sabrina Dubroca 22aec42679 ip: xfrm: add espintcp encapsulation
While at it, convert xfrm_xfrma_print and xfrm_encap_type_parse to use
the UAPI macros for encap_type as suggested by David Ahern, and add the
UAPI udp.h header (sync'd from ipsec-next to get the TCP_ENCAP_ESPINTCP
definition).

Co-developed-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-22 03:42:01 +00:00
David Ahern 4df5ad933c Update kernel headers and import udp.h
Update kernel headers to commit:
    4f2c17e0f332 ("Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next")

and import udp.h for the next patch.

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-22 03:40:26 +00:00
Roi Dayan 919046d326 tc: flower: fix print with oneline option
This commit fixes all locations in flower to use _SL_ instead of \n for
newlines, so that the oneline option is supported.

Example before this commit:

filter protocol ip pref 2 flower chain 0 handle 0x1
  indev ens1f0
  dst_mac 11:22:33:44:55:66
  eth_type ipv4
  ip_proto tcp
  src_ip 2.2.2.2
  src_port 99
  dst_port 1-10\  tcp_flags 0x5/5
  ip_flags frag
  ct_state -trk\  ct_zone 4\  ct_mark 255
  ct_label 00000000000000000000000000000000
  skip_hw
  not_in_hw\    action order 1: ct zone 5 pipe
         index 1 ref 1 bind 1 installed 287 sec used 287 sec
        Action statistics:\     Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0\

Example output after this commit:

filter protocol ip pref 2 flower chain 0 handle 0x1 \  indev ens1f0\  dst_mac 11:22:33:44:55:66\  eth_type ipv4\  ip_proto tcp\  src_ip 2.2.2.2\  src_port 99\  dst_port 1-10\  tcp_flags 0x5/5\  ip_flags frag\  ct_state -trk\  ct_zone 4\  ct_mark 255\  ct_label 00000000000000000000000000000000\  skip_hw\  not_in_hw\action order 1: ct zone 5 pipe
         index 1 ref 1 bind 1 installed 346 sec used 346 sec
        Action statistics:\     Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0\

Signed-off-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-21 15:40:21 -08:00
Ethan Sommer 5f78bc3e1d make yacc usage POSIX compatible
config: put YACC in config.mk and use the environment variable if present

ss:
use YACC variable instead of hardcoding bison
place options before source file argument
use -b to specify a file prefix instead of an output file, as -o isn't POSIX
compatible; this generates ssfilter.tab.c instead of ssfilter.c
replace any references to ssfilter.c with references to ssfilter.tab.c

tc:
use -p flag to set name prefix instead of bison-specific api.prefix
directive
remove unneeded bison-specific directives
use -b instead of -o, replace references to previously generated
emp_ematch.yacc.[ch] with references to newly generated
emp_ematch.tab.[ch]

Signed-off-by: Ethan Sommer <e5ten.arch@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-20 09:43:22 -08:00
Jan Engelhardt 31f45088c9 build: fix build failure with -fno-common
$ make CCOPTS=-fno-common
gcc ... -o ip
ld: rt_names.o (symbol from plugin): in function "rtnl_rtprot_n2a":
(.text+0x0): multiple definition of "numeric"; ip.o (symbol from plugin):(.text+0x0): first defined here

gcc ... -o tipc
ld: ../lib/libutil.a(utils.o):(.bss+0xc): multiple definition of `pretty';
tipc.o:tipc.c:28: first defined here

References: https://bugzilla.opensuse.org/1160244
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-20 09:40:59 -08:00
Stephen Hemminger f4d7ce9bfa ip: use print_nl() to handle one line mode
The helper function print_nl() does the right thing and prints
either a newline or, in oneline mode, a backslash.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-20 09:32:51 -08:00
Vladis Dronov 970db267a0 ip: fix link type and vlan oneline output
Move link type printing in print_linkinfo() so that multiline output does
not break the link options line. Add oneline support for the VLAN ingress
and egress QoS maps.

Before the fix:

5: veth90.4000@veth90: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 26:9a:05:af:db:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 0 maxmtu 65535
    vlan protocol 802.1Q id 4000 <REORDER_HDR>               the option line is broken ^^^
      ingress-qos-map { 1:2 }
      egress-qos-map { 2:1 } addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

5: veth90.4000@veth90: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\    link/ether 26:9a:05:af:db:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 0 maxmtu 65535 \    vlan protocol 802.1Q id 4000 <REORDER_HDR>
      ingress-qos-map { 1:2 }   <<< a multiline output despite -oneline
      egress-qos-map { 2:1 } addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

After the fix:

5: veth90.4000@veth90: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 26:9a:05:af:db:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 0 maxmtu 65535 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    vlan protocol 802.1Q id 4000 <REORDER_HDR>
      ingress-qos-map { 1:2 }
      egress-qos-map { 2:1 }

5: veth90.4000@veth90: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\    link/ether 26:9a:05:af:db:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 0 maxmtu 65535 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 \    vlan protocol 802.1Q id 4000 <REORDER_HDR> \      ingress-qos-map { 1:2 } \      egress-qos-map { 2:1 }

Fixes: 5c302d518f ("vlan support")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206241
Reported-by: George Shuklin <george.shuklin@gmail.com>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-20 09:28:39 -08:00
David Ahern 46eb08dc2e Merge branch 'tc-ets-qdisc' into next
Petr Machata says:

====================

A new Qdisc, "ETS", has been accepted into Linux at kernel commit
6bff00170277 ("Merge branch 'ETS-qdisc'"). Add iproute2 support for this
Qdisc.

Patch #1 changes libnetlink to admit NLA_F_NESTED in nested attributes.
Patch #2 then adds ETS support as such.

Examples (taken from the kernel patchset):

- Add a Qdisc with 6 bands, 3 strict and 3 ETS with 45%-30%-25% weights:

    # tc qdisc add dev swp1 root handle 1: \
	ets strict 3 quanta 4500 3000 2500 priomap 0 1 1 1 2 3 4 5
    # tc qdisc sh dev swp1
    qdisc ets 1: root refcnt 2 bands 6 strict 3 quanta 4500 3000 2500 priomap 0 1 1 1 2 3 4 5 5 5 5 5 5 5 5 5

- Tweak quantum of one of the classes of the previous Qdisc:

    # tc class ch dev swp1 classid 1:4 ets quantum 1000
    # tc qdisc sh dev swp1
    qdisc ets 1: root refcnt 2 bands 6 strict 3 quanta 1000 3000 2500 priomap 0 1 1 1 2 3 4 5 5 5 5 5 5 5 5 5
    # tc class ch dev swp1 classid 1:3 ets quantum 1000
    Error: Strict bands do not have a configurable quantum.

- Purely strict Qdisc with 1:1 mapping between priorities and TCs:

    # tc qdisc add dev swp1 root handle 1: \
	ets strict 8 priomap 7 6 5 4 3 2 1 0
    # tc qdisc sh dev swp1
    qdisc ets 1: root refcnt 2 bands 8 strict 8 priomap 7 6 5 4 3 2 1 0 7 7 7 7 7 7 7 7

- Use "bands" to specify number of bands explicitly. Underspecified bands
  are implicitly ETS and their quantum is taken from MTU. The following
  thus gives each band the same weight:

    # tc qdisc add dev swp1 root handle 1: \
	ets bands 8 priomap 7 6 5 4 3 2 1 0
    # tc qdisc sh dev swp1
    qdisc ets 1: root refcnt 2 bands 8 quanta 1514 1514 1514 1514 1514 1514 1514 1514 priomap 7 6 5 4 3 2 1 0 7 7 7 7 7 7 7 7

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-18 21:55:02 +00:00
Petr Machata d2773f1261 tc: Add support for ETS Qdisc
Add a new module to generate and parse options specific to the ETS Qdisc.

Example output:

    bands 8 strict 3 priomap 0 1 2 3 4 5 6 7
qdisc ets 1: root refcnt 2 offloaded bands 8 strict 3 quanta 1514 1514 1514 1514 1514 priomap 0 1 2 3 4 5 6 7 7 7 7 7 7 7 7 7
[
  {
    "kind": "ets",
    "handle": "1:",
    "root": true,
    "refcnt": 2,
    "offloaded": true,
    "options": {
      "bands": 8,
      "strict": 3,
      "quanta": [1514, 1514, 1514, 1514, 1514],
      "priomap": [0, 1, 2, 3, 4, 5, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7]
    }
  }
]

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-18 21:54:12 +00:00
Petr Machata 0da4cfaa5d libnetlink: parse_rtattr_nested should allow NLA_F_NESTED flag
In kernel commit 8cb081746c03 ("netlink: make validation more configurable
for future strictness"), Linux started implicitly flagging nests with
NLA_F_NESTED, unless the nest is created with nla_nest_start_noflag().

The ETS code uses nla_nest_start() where possible, so it does not work with
the current iproute2 code. Have libnetlink catch up by admitting the flag
in the attribute.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-18 21:46:12 +00:00
Ido Schimmel ed81a2a040 ip route: Print "rt_offload" and "rt_trap" indication
The kernel now signals the offload state of a route using the
'RTM_F_OFFLOAD' and 'RTM_F_TRAP' flags. Print these to help users
understand the offload state of each route. The "rt_" prefix is used in
order to distinguish it from the offload state of nexthops.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-18 21:40:20 +00:00
David Ahern 8b802d20e4 Update kernel headers
Update kernel headers to commit
    9aaa29494030 ("Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-18 21:39:15 +00:00
Tuong Lien d5391e186f tipc: fix clang warning in tipc/node.c
When building tipc with clang, the following warning is found:

tipc
    CC       bearer.o
    CC       cmdl.o
    CC       link.o
    CC       media.o
    CC       misc.o
    CC       msg.o
    CC       nametable.o
    CC       node.o
node.c:182:24: warning: field 'key' with variable sized type 'struct tipc_aead_key' not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
                struct tipc_aead_key key;

This commit fixes it by placing the variable-sized 'key' structure and
the memory area reserved for the user input key in a 'union' instead.
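A minimal sketch of the union pattern, using hypothetical names (demo_aead_key, DEMO_KEY_MAX) in place of the real TIPC definitions. The union is kept as a standalone object, since ISO C does not allow a union containing a flexible-array struct to be a member of another struct.

```c
#include <string.h>

/* Hypothetical stand-in for struct tipc_aead_key: a fixed header followed
 * by a flexible array member holding the key bytes. */
struct demo_aead_key {
	char alg_name[32];
	unsigned int keylen;
	char key[];
};

#define DEMO_KEY_MAX 64 /* illustrative upper bound on the key length */

/* Overlay the variable-sized struct on storage large enough for the header
 * plus DEMO_KEY_MAX key bytes. */
union demo_key_buf {
	struct demo_aead_key key;
	char mem[sizeof(struct demo_aead_key) + DEMO_KEY_MAX];
};

static int demo_fill_key(union demo_key_buf *buf, const char *alg,
			 const char *bytes, unsigned int len)
{
	if (len > DEMO_KEY_MAX || strlen(alg) >= sizeof(buf->key.alg_name))
		return -1;
	memset(buf, 0, sizeof(*buf));
	strcpy(buf->key.alg_name, alg);
	buf->key.keylen = len;
	memcpy(buf->key.key, bytes, len); /* fits inside 'mem' */
	return 0;
}
```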

Fixes: 24bee3bf97 ("tipc: add new commands to set TIPC AEAD key")
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-06 13:12:48 -08:00
Stephen Hemminger f8bebea915 tc: skbprio: add support for JSON output
Print limit in JSON

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-06 13:12:02 -08:00
Stephen Hemminger 1d6b73be70 tc: prio: fix space in JSON tag
The priomap should not have extra space in the tag.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-01-06 13:11:41 -08:00
Peter Junos c4f5862994 ss: use compact output for undetected screen width
This change fixes the width calculation when the user pipes the output.

ss output works correctly when stdout is a terminal. When the output is
piped, ss falls back to 80 or 160 columns. That adds a line break if the
user's terminal is 100 characters wide and the output is of similar
width. With this change, no width is assumed.

To reproduce the issue, call
ss | less
and see every other line empty if your screen is between 80 and 160
columns wide.

This second version of the patch fixes screen_width being set to an
arbitrary value.
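A sketch of width detection that assumes nothing when stdout is not a terminal; demo_screen_width() is hypothetical and not the helper ss actually uses.

```c
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: return the terminal width in columns, or 0 when it
 * cannot be detected (for example when the fd is a pipe). A result of 0
 * means "assume no width" and print compact output instead of guessing. */
static int demo_screen_width(int fd)
{
	struct winsize w;

	if (!isatty(fd))
		return 0;
	if (ioctl(fd, TIOCGWINSZ, &w) == -1 || w.ws_col == 0)
		return 0;
	return w.ws_col;
}
```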

Signed-off-by: Peter Junos <petoju@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-02 18:38:08 +00:00
David Ahern 404f2de114 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-02 17:49:45 +00:00
Andy Roulin 39ac2d2b80 iplink: bond: print lacp actor/partner oper states as strings
The 802.3ad/LACP actor/partner operating states are only printed as
numbers, e.g.:

ad_actor_oper_port_state 15

Add an additional output in ip link show that prints a string describing
the individual 3ad bit meanings in the following way:

ad_actor_oper_port_state_str <active,short_timeout,aggregating,in_sync>

JSON output is also supported; the field becomes a JSON array:

"ad_actor_oper_port_state_str":
	["active","short_timeout","aggregating","in_sync"]
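A sketch of decoding the state byte, in which each of the 8 LACP state bits maps to one name. Bits 0-3 use the names shown above; the names for bits 4-7 (collecting, distributing, defaulted, expired) are assumptions based on the 802.3ad bit meanings. demo_lacp_state_str() is hypothetical.

```c
#include <stdio.h>
#include <string.h>

static const char *const demo_lacp_bits[8] = {
	"active", "short_timeout", "aggregating", "in_sync",
	"collecting", "distributing", "defaulted", "expired",
};

/* Hypothetical helper: render the state as "<name,name,...>" (len >= 1). */
static void demo_lacp_state_str(unsigned char state, char *buf, size_t len)
{
	size_t used;
	int i, first = 1;

	snprintf(buf, len, "<");
	for (i = 0; i < 8; i++) {
		if (!(state & (1u << i)))
			continue;
		used = strlen(buf);
		snprintf(buf + used, len - used, "%s%s",
			 first ? "" : ",", demo_lacp_bits[i]);
		first = 0;
	}
	used = strlen(buf);
	snprintf(buf + used, len - used, ">");
}
```

With state 15 (bits 0-3 set) this yields <active,short_timeout,aggregating,in_sync>, matching the example above.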

Signed-off-by: Andy Roulin <aroulin@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-02 17:45:32 +00:00
David Ahern a6cf98c23f Update kernel headers
Update kernel headers to commit:
    fe23d63422c8 Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-01-02 17:26:50 +00:00
Leslie Monis e819d3a03d tc: fq_codel: fix missing statistic in JSON output
Print the drop_next statistic in JSON output even when
tc_fq_codel_xstats->class_stats.drop_next is negative.

Cc: Toke Høiland-Jørgensen <toke@toke.dk>
Fixes: 997f2dc193 ("tc: Add JSON output of fq_codel stats")
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:57:27 -08:00
Leslie Monis 669314e817 tc: tbf: add support for JSON output
Enable proper JSON output for the TBF Qdisc.
Also, fix the style of the statement that's calculating "latency" in
tbf_print_opt().

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:57:27 -08:00
Leslie Monis 85fdef052b tc: sfq: add support for JSON output
Enable proper JSON output for the SFQ Qdisc.
Use the long double format specifier to print the value of
"probability".
Also, fix the indentation in the oneline output of the contents of the
tc_sfqred_stats structure.
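A sketch of turning the raw value into a printable probability, assuming "probability" comes from a RED-style max_P stored as 0.32 fixed point (probability = max_P / 2^32). The helper name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: convert a 0.32 fixed-point max_P to a probability. */
static double demo_red_probability(uint32_t max_P)
{
	return max_P / 4294967296.0; /* 2^32 */
}
```

For example, printf("probability %lg ", demo_red_probability(0x80000000u)) prints probability 0.5.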

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:57:27 -08:00
Leslie Monis 46d032d002 tc: sfb: add support for JSON output
Enable proper JSON output for the SFB Qdisc.
Make the output for options "rehash" and "db" explicit.
Use the long double format specifier to print probability values.
Use sprint_time() to print time values.
Also, fix the indentation in sfb_print_opt().

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:57:27 -08:00
Leslie Monis 0154d096c5 tc: pie: add support for JSON output
Enable proper JSON output for the PIE Qdisc.
Use sprint_time() to print the value of tc_pie_xstats->delay.
Use the long double format specifier to print tc_pie_xstats->prob.
Also, fix the indentation in the oneline output of statistics and update
the man page to reflect this change.

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:57:27 -08:00
Leslie Monis f6564ed60d tc: hhf: add support for JSON output
Enable proper JSON output for the HHF Qdisc.
Also, use sprint_size() to print size values.

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:57:27 -08:00
Leslie Monis d15e2bfc04 tc: fq: add support for JSON output
Enable proper JSON output for the FQ Qdisc.
Use the "KEY VALUE" format for oneline output of statistics instead of
"VALUE KEY", and remove unnecessary commas from the output.
Use sprint_size() to print size values in fq_print_opt().
Use sprint_time64() to print time values in fq_print_xstats().
Also, update the man page to reflect the changes in the output format.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:57:27 -08:00
Leslie Monis 90a50a6fa2 tc: codel: add support for JSON output
Enable proper JSON output for the CoDel Qdisc.

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:57:27 -08:00
Leslie Monis d3136b1e80 tc: choke: add support for JSON output
Enable proper JSON output for the choke Qdisc.
Also, use the long double format specifier to print the value of
"probability".

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:57:27 -08:00
Leslie Monis d8f673074b tc: cbs: add support for JSON output
Enable proper JSON output for the CBS Qdisc.

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:57:27 -08:00
Stephen Hemminger 2dda733f6d utils: fix indentation
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-29 09:53:09 -08:00
Antony Antony 2cf4d7af72 ip: xfrm if_id -ve value is error
if_id is a u32; return an error on negative values instead of silently
setting it to 0.

After:
 ip link add ipsec1 type xfrm dev lo if_id -10
 Error: argument "-10" is wrong: if_id value is invalid

Before (note "xfrm if_id 0"):
 ip link add ipsec1 type xfrm dev lo if_id -10
 ip -d  link show dev ipsec1
 9: ipsec1@lo: <NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/none 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500
    xfrm if_id 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
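A sketch of strict u32 parsing under these rules. strtoul() accepts "-10" and wraps it to a huge value, so input is rejected unless it starts with a digit. demo_get_u32() is hypothetical and not iproute2's get_u32().

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse an unsigned 32-bit value such as if_id.
 * Decimal and 0x-prefixed hex are accepted; signs are not. */
static int demo_get_u32(uint32_t *val, const char *arg)
{
	unsigned long res;
	char *end;

	if (!arg || !isdigit((unsigned char)*arg))
		return -1; /* rejects "-10", "+10" and empty input */
	errno = 0;
	res = strtoul(arg, &end, 0);
	if (errno || *end != '\0' || res > UINT32_MAX)
		return -1;
	*val = (uint32_t)res;
	return 0;
}
```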

Fixes: 286446c1e8 ("ip: support for xfrm interfaces")
Signed-off-by: Antony Antony <antony@phenome.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-25 12:38:13 -08:00
Vivien Didelot 0dcf36db1a iplink: add support for STP xstats
Add support for the BRIDGE_XSTATS_STP xstats, as follows:

    # ip link xstats type bridge_slave dev lan4 stp
    lan4
                        STP BPDU:  RX: 0 TX: 61
                        STP TCN:   RX: 0 TX: 0
                        STP Transitions: Blocked: 2 Forwarding: 1

Or below as JSON:

    # ip -j -p link xstats type bridge_slave dev lan0 stp
    [ {
            "ifname": "lan0",
            "stp": {
                "rx_bpdu": 0,
                "tx_bpdu": 500,
                "rx_tcn": 0,
                "tx_tcn": 0,
                "transition_blk": 0,
                "transition_fwd": 0
            }
        } ]

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-12-17 16:31:39 +00:00
Michal Kubecek 2b8e6995fe ip link: show permanent hardware address
Display the permanent hardware address of an interface in the output of
"ip link show" and "ip addr show". To reduce noise, the permanent address
is only shown if it differs from the current one.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-12-17 16:28:02 +00:00
David Ahern 974f889c2d Update kernel headers
Update kernel headers to commit:
    6f6dded1385c ("Merge branch 'WireGuard-CI-and-housekeeping'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-12-17 16:25:26 +00:00
Aya Levin af646bf953 devlink: Fix fmsg nesting in non JSON output
When a name (label) is followed by the opening of an object or an array,
add a new line and indentation before printing the label. When a name
(label) is followed by a value, print both on the same line.

Prior to this patch, nesting was not visible in non-JSON output:
JSON:
{
    "Common config": {
        "SQ": {
            "stride size": 64,
            "size": 1024
        },
        "CQ": {
            "stride size": 64,
            "size": 1024
        } },
    "SQs": [ {
            "channel ix": 0,
            "sqn": 10,
            "HW state": 1,
            "stopped": false,
            "cc": 0,
            "pc": 0,
            "CQ": {
                "cqn": 6,
                "HW status": 0
            }
         },{
            "channel ix": 0,
            "sqn": 14,
            "HW state": 1,
            "stopped": false,
            "cc": 0,
            "pc": 0,
            "CQ": {
                "cqn": 10,
                "HW status": 0
            }
         } ]
}

Before this patch:
Common Config: SQ: stride size: 64 size: 1024
CQ: stride size: 64 size: 1024
SQs:
  channel ix: 0 tc: 0 txq ix: 0 sqn: 10 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 6 HW status: 0
  channel ix: 1 tc: 0 txq ix: 1 sqn: 14 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 10 HW status: 0

With this patch:
Common config:
  SQ:
    stride size: 64 size: 1024
    CQ:
      stride size: 64 size: 1024
SQs:
  channel ix: 0 sqn: 10 HW state: 1 stopped: false cc: 0 pc: 0
  CQ:
    cqn: 6 HW status: 0
  channel ix: 1 sqn: 14 HW state: 1 stopped: false cc: 0 pc: 0
  CQ:
    cqn: 10 HW status: 0

Fixes: 7b8baf834d ("devlink: Add devlink health diagnose command")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:52:37 -08:00
Aya Levin 746b66b005 devlink: Add a new time-stamp format for health reporter's dump
Introduce a new attribute representing a new time-stamp format: the
current time in ns (to comply with y2038) instead of jiffies. If the new
attribute is received, translate the time-stamp from ns accordingly.
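A sketch of formatting a 64-bit nanosecond wall-clock timestamp. This assumes the value counts ns since the Unix epoch. UTC is used here only to keep the output deterministic, and demo_format_ts_ns() is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: "YYYY-MM-DD HH:MM:SS.nnnnnnnnn" from ns since epoch. */
static int demo_format_ts_ns(uint64_t ts_ns, char *buf, size_t len)
{
	time_t secs = (time_t)(ts_ns / 1000000000ULL);
	unsigned int nsec = (unsigned int)(ts_ns % 1000000000ULL);
	struct tm *tm = gmtime(&secs);
	size_t used;
	int w;

	if (!tm)
		return -1;
	used = strftime(buf, len, "%Y-%m-%d %H:%M:%S", tm);
	if (used == 0)
		return -1;
	w = snprintf(buf + used, len - used, ".%09u", nsec);
	if (w < 0 || (size_t)w >= len - used)
		return -1;
	return 0;
}
```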

Fixes: 2f1242efe9 ("devlink: Add devlink health show command")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:52:37 -08:00
Aya Levin f678a2d08e devlink: Print health reporter's dump time-stamp in a helper function
Add pr_out_dump_reporter prefix to the helper function's name and
encapsulate the print in it.

Fixes: 2f1242efe9 ("devlink: Add devlink health show command")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:52:37 -08:00
Benjamin Poirier 1a500c78ae bridge: Fix tunnelshow json output
Repeat for "vlan tunnelshow" what commit 0f36267485 ("bridge: fix vlan
show formatting") did for "vlan show". This fixes problems in the JSON output.

Note that the resulting json output format of "vlan tunnelshow" is not the
same as the original, introduced in commit 8652eeb3ab ("bridge: vlan:
support for per vlan tunnel info"). Changes similar to the ones done for
"vlan show" in commit 0f36267485 ("bridge: fix vlan show formatting") are
carried over to "vlan tunnelshow".

Fixes: c7c1a1ef51 ("bridge: colorize output and use JSON print library")
Fixes: 0f36267485 ("bridge: fix vlan show formatting")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:49:28 -08:00
Benjamin Poirier 955a20be02 bridge: Deduplicate vlan show functions
print_vlan() and print_vlan_tunnel() are almost identical copies, save for
a missing newline in the latter which leads to broken output of "vlan
tunnelshow" in normal mode.

Fixes: c7c1a1ef51 ("bridge: colorize output and use JSON print library")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:49:28 -08:00
Benjamin Poirier dfa13e2273 bridge: Fix vni printing
Since commit c7c1a1ef51 ("bridge: colorize output and use JSON print
library"), print_range() is used for both the vid (16 bits) and the vni.
However, vnis are 32 bits, so they get truncated. They were already
truncated before that commit, though.
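A sketch of a range printer that takes 32-bit ids, so the same helper can serve 16-bit vids and wider vnis. demo_print_range() is hypothetical. With uint16_t parameters, vni 70000 would silently become 4464 (70000 - 65536).

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print "start" or "start-end" without truncation. */
static int demo_print_range(char *buf, size_t len, uint32_t start, uint32_t end)
{
	if (start == end)
		return snprintf(buf, len, "%" PRIu32, start);
	return snprintf(buf, len, "%" PRIu32 "-%" PRIu32, start, end);
}
```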

Fixes: 8652eeb3ab ("bridge: vlan: support for per vlan tunnel info")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:49:28 -08:00
Benjamin Poirier 1f53ba7297 bridge: Fix BRIDGE_VLAN_TUNNEL attribute sizes
As per the kernel's vlan_tunnel_policy, IFLA_BRIDGE_VLAN_TUNNEL_VID and
IFLA_BRIDGE_VLAN_TUNNEL_FLAGS have type NLA_U16.

Fixes: 8652eeb3ab ("bridge: vlan: support for per vlan tunnel info")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:49:28 -08:00
Benjamin Poirier df1262155c bridge: Fix src_vni argument in man page
"SRC VNI" is only one argument and should appear as such. Moreover, this
argument to the src_vni option is documented in three forms: "SRC_VNI",
"SRC VNI" and "VNI" in different places. Consistently use the simplest
form, "VNI".

Fixes: c5b176e5ba ("bridge: fdb: add support for src_vni option")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:49:28 -08:00
Benjamin Poirier 43b0b6ec84 bridge: Fix typo in error messages
Fixes: 9eff0e5cc4 ("bridge: Add vlan configuration support")
Fixes: 7abf5de677 ("bridge: vlan: add support to display per-vlan statistics")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:49:28 -08:00
Benjamin Poirier d88a6a98e8 testsuite: Fix line count test
A substring match is not enough: e.g. "1" is found in "10", yet 10 != 1.
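The testsuite itself is shell. This C sketch, with hypothetical helper names, only illustrates why a substring test accepts a wrong count while a numeric comparison does not.

```c
#include <stdlib.h>
#include <string.h>

/* Substring check: "1" is found inside "10", so this wrongly matches. */
static int demo_substring_match(const char *got, const char *want)
{
	return strstr(got, want) != NULL;
}

/* Exact check: parse the whole string as a number and compare values. */
static int demo_count_match(const char *got, long want)
{
	char *end;
	long n = strtol(got, &end, 10);

	return end != got && *end == '\0' && n == want;
}
```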

Fixes: 30383b074d ("tests: Add output testing")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:49:28 -08:00
Benjamin Poirier 15322f46c3 json_print: Remove declaration without implementation
Fixes: 6377572f0a ("ip: ip_print: add new API to print JSON or regular format output")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-16 20:49:28 -08:00
Paolo Lungaroni 0486388a87 add support for table name in SRv6 End.DT* behaviors
Allow the table to be specified by name, in addition to by number, in
SRv6 End.DT* behaviors.

To add an End.DT6 behavior route specifying the table by name:

    $ ip -6 route add 2001:db8::1 encap seg6local action End.DT6 table main dev eth0

Use ip route show to print this route:

    $ ip -6 route show 2001:db8::1
    2001:db8::1  encap seg6local action End.DT6 table main dev eth0 metric 1024 pref medium

The JSON output:
    $ ip -6 -j -p route show 2001:db8::1
    [ {
            "dst": "2001:db8::1",
            "encap": "seg6local",
            "action": "End.DT6",
            "table": "main",
            "dev": "eth0",
            "metric": 1024,
            "flags": [ ],
            "pref": "medium"
        } ]
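A sketch of resolving a table argument given by name or number. Only the reserved names are handled here, with the values taken from the RT_TABLE_* constants in linux/rtnetlink.h. A full implementation would also consult /etc/iproute2/rt_tables. demo_table_a2n() is hypothetical.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: map "main", "local", "default" or a number to an id. */
static int demo_table_a2n(const char *arg, uint32_t *id)
{
	unsigned long val;
	char *end;

	if (strcmp(arg, "main") == 0) {
		*id = RT_TABLE_MAIN;    /* 254 */
		return 0;
	}
	if (strcmp(arg, "local") == 0) {
		*id = RT_TABLE_LOCAL;   /* 255 */
		return 0;
	}
	if (strcmp(arg, "default") == 0) {
		*id = RT_TABLE_DEFAULT; /* 253 */
		return 0;
	}
	if (!isdigit((unsigned char)*arg))
		return -1;
	errno = 0;
	val = strtoul(arg, &end, 0);
	if (errno || *end != '\0' || val > UINT32_MAX)
		return -1;
	*id = (uint32_t)val;
	return 0;
}
```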

Signed-off-by: Paolo Lungaroni <paolo.lungaroni@cnit.it>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-12-11 17:22:07 +00:00
Stephen Hemminger 7b0d424abe tc: do not output newline in oneline mode
In oneline mode the line separator should be \,
but several parts of tc do not get this right.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-12-11 17:21:10 +00:00
David Ahern f39da545b6 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-12-11 17:13:49 +00:00
Brian Vazquez 9eee92a41a ss: fix end-of-line printing in misc/ss.c
The previous change to ss to show header broke the printing of
end-of-line for the last entry.

Tested:

diff <(./ss.old -nltp) <(misc/ss -nltp)
38c38
< LISTEN   0  128   [::1]:35417  [::]:*  users:(("foo",pid=65254,fd=116))
\ No newline at end of file

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-05 12:19:00 -08:00
Brian Vazquez 908985c670 tc: fix warning in tc/q_pie.c
Warning was:
q_pie.c:202:22: error: implicit conversion from 'unsigned long' to
'double'
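A sketch of a warning-free conversion. This assumes the warning comes from dividing by the UINT64_MAX constant, which is not exactly representable as a double, and that tc_pie_xstats->prob is scaled so that UINT64_MAX means probability 1.0. The explicit casts make the lossy conversion intentional. demo_pie_prob() is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: convert a u64-scaled probability to [0, 1]. */
static double demo_pie_prob(uint64_t prob)
{
	return (double)prob / (double)UINT64_MAX;
}
```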

Fixes: 492ec9558b ("tc: pie: change maximum integer value of tc_pie_xstats->prob")
Cc: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-05 12:18:54 -08:00
Brian Vazquez cad1b0bc5f tc: fix warning in tc/m_ct.c
Warning was:
m_ct.c:370:13: warning: variable 'nat' is used uninitialized whenever
'if' condition is false

Cc: Paul Blakey <paulb@mellanox.com>
Fixes: c8a494314c ("tc: Introduce tc ct action")
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-04 11:32:53 -08:00
Bjarni Ingi Gislason 9ab56784a2 man: Fix unequal number of .RS and .RE macros
Add missing ".RE" macros and remove excessive ones.

Remove an excessive ".EE" macro.

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-04 11:13:12 -08:00
Gautam Ramakrishnan 920700a425 tc: pie: add dq_rate_estimator option
PIE now uses per packet timestamps to calculate queuing
delay. The average dequeue rate based queue delay
calculation is now made optional. This patch adds the option
to enable or disable the use of Little's law to calculate
queuing delay.
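The two delay estimates can be sketched as follows. Little's law gives the average delay as backlog divided by the average dequeue rate, while per-packet timestamps give the delay directly as dequeue time minus enqueue time. The helper names and units are illustrative assumptions, not PIE's kernel code.

```c
#include <stdint.h>

/* Hypothetical helper: Little's law, delay = backlog / dequeue rate. */
static double demo_littles_law_delay(double backlog_bytes,
				     double dq_rate_bytes_per_sec)
{
	if (dq_rate_bytes_per_sec <= 0.0)
		return 0.0; /* no rate estimate yet */
	return backlog_bytes / dq_rate_bytes_per_sec;
}

/* Hypothetical helper: timestamp-based delay of a single packet. */
static uint64_t demo_timestamp_delay_ns(uint64_t enqueue_ns, uint64_t dequeue_ns)
{
	return dequeue_ns - enqueue_ns;
}
```

For example, a 15000-byte backlog drained at 1.5 MB/s gives an estimated delay of 10 ms.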

Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-04 10:49:42 -08:00
Stephen Hemminger 42060e8d35 tc_util: break long lines
Try to keep lines less than 100 characters or so.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-04 10:45:47 -08:00
Eric Dumazet 81b365eb50 tc_util: support TCA_STATS_PKT64 attribute
The kernel exports 64-bit packet counters for qdisc/class stats starting with Linux 5.5.

Tested:

$ tc -s -d qd sh dev eth1 | grep pkt
 Sent 4041158922097 bytes 46393862190 pkt (dropped 0, overlimits 0 requeues 2072)
 Sent 501362903764 bytes 5762621697 pkt (dropped 0, overlimits 0 requeues 247)
 Sent 533282357858 bytes 6128246542 pkt (dropped 0, overlimits 0 requeues 329)
 Sent 515878280709 bytes 5875638916 pkt (dropped 0, overlimits 0 requeues 267)
 Sent 516221011694 bytes 5933395197 pkt (dropped 0, overlimits 0 requeues 258)
 Sent 513175109761 bytes 5898402114 pkt (dropped 0, overlimits 0 requeues 231)
 Sent 480207942964 bytes 5519535407 pkt (dropped 0, overlimits 0 requeues 229)
 Sent 483111196765 bytes 5552917950 pkt (dropped 0, overlimits 0 requeues 240)
 Sent 497920120322 bytes 5723104387 pkt (dropped 0, overlimits 0 requeues 271)
$ tc -s -d cl sh dev eth1 | grep pkt
 Sent 513196316238 bytes 5898645862 pkt (dropped 0, overlimits 0 requeues 231)
 Sent 533304444981 bytes 6128500406 pkt (dropped 0, overlimits 0 requeues 329)
 Sent 480227709687 bytes 5519762597 pkt (dropped 0, overlimits 0 requeues 229)
 Sent 501383660279 bytes 5762860276 pkt (dropped 0, overlimits 0 requeues 247)
 Sent 483131168192 bytes 5553147506 pkt (dropped 0, overlimits 0 requeues 240)
 Sent 515899485505 bytes 5875882649 pkt (dropped 0, overlimits 0 requeues 267)
 Sent 497940747031 bytes 5723341475 pkt (dropped 0, overlimits 0 requeues 271)
 Sent 516242376893 bytes 5933640774 pkt (dropped 0, overlimits 0 requeues 258)
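A sketch of the fallback logic: the 64-bit counter is preferred when the kernel supplied one, and the legacy 32-bit packets field is used otherwise. demo_packets() is hypothetical. The first qdisc above (46393862190 packets) would read as 3444189230 in a 32-bit counter.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pkt64 is NULL when TCA_STATS_PKT64 was absent. */
static uint64_t demo_packets(const uint64_t *pkt64, uint32_t pkt32)
{
	return pkt64 ? *pkt64 : pkt32;
}
```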

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-04 10:43:46 -08:00
Stephen Hemminger 214299be7b uapi: update to magic.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-12-04 10:42:32 -08:00
Tuong Lien 24bee3bf97 tipc: add new commands to set TIPC AEAD key
Two new commands are added as part of 'tipc node' command:

 $tipc node set key KEY [algname ALGNAME] [nodeid NODEID]
 $tipc node flush key

which enable the user to set and remove AEAD keys in the kernel TIPC
(requires the kernel option 'TIPC_CRYPTO').

For the 'set key' command, the given 'nodeid' parameter decides the
mode to be applied to the key, particularly:

- If NODEID is empty, the key is a 'cluster' key which will be used for
all message encryption/decryption from/to the node (i.e. both TX & RX).
The same key will be set in the other nodes.

- If NODEID is the node's own ID, the key is used for message encryption
(TX) from the node, whereas if NODEID is a peer node, the key is used for
message decryption (RX) from that peer node. This is the 'per-node-key'
mode, in which each node in the cluster has its own specific (TX) key.
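The mode selection described above can be sketched as a small decision function. The enum, helper name and string comparison of node ids are hypothetical simplifications, not the tipc tool's code.

```c
#include <string.h>

enum demo_key_mode { DEMO_KEY_CLUSTER, DEMO_KEY_TX, DEMO_KEY_RX };

/* Hypothetical helper: 'nodeid' is NULL or "" when no NODEID was given. */
static enum demo_key_mode demo_pick_key_mode(const char *nodeid,
					     const char *own_id)
{
	if (!nodeid || !*nodeid)
		return DEMO_KEY_CLUSTER; /* TX and RX, same key on all nodes */
	if (strcmp(nodeid, own_id) == 0)
		return DEMO_KEY_TX;      /* encrypt messages sent by this node */
	return DEMO_KEY_RX;              /* decrypt messages from that peer */
}
```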

Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-25 23:14:11 +00:00
David Ahern 7438afd2cc Update kernel headers
Update kernel headers to commit:
    c431047c4efe ("enetc: add support Credit Based Shaper(CBS) for hardware offload")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-25 23:13:09 +00:00
Eli Britstein 482fd40adf tc: flower: support masked port destination and source match
Extend destination and source port match to support masks, accepting
both decimal and hexadecimal formats.
Also add missing documentation to synopsis in manpage.

$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 protocol ip parent ffff: prio 1 flower skip_hw \
      ip_proto tcp dst_port 1234/0xff00 action drop

$ tc -s filter show dev eth0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  ip_proto tcp
  dst_port 1234/0xff00
  skip_hw
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 26 sec used 26 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

$ tc -p -j filter show dev eth0 parent ffff:
        "options": {
            "keys": {
                "dst_port": 1234,
                "dst_port_mask": 65280
                ...
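A sketch of parsing "PORT" or "PORT/MASK", with each part in decimal or 0x-prefixed hex. This assumes both values are sent in network byte order (the u16 big endian option). A missing mask is treated as an exact match (0xffff). demo_parse_port_mask() is hypothetical.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns 0 on success, -1 on malformed input. */
static int demo_parse_port_mask(const char *arg, uint16_t *port_be,
				uint16_t *mask_be)
{
	unsigned long port, mask = 0xffff;
	char *end;

	errno = 0;
	port = strtoul(arg, &end, 0);
	if (end == arg || errno || port > 0xffff)
		return -1;
	if (*end == '/') {
		const char *m = end + 1;

		mask = strtoul(m, &end, 0);
		if (end == m || errno || mask > 0xffff)
			return -1;
	}
	if (*end != '\0')
		return -1;
	*port_be = htons((uint16_t)port);
	*mask_be = htons((uint16_t)mask);
	return 0;
}
```

Parsing "1234/0xff00" yields port 1234 and mask 65280, the values in the JSON output above.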

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-25 21:37:08 +00:00
Eli Britstein 75fb816d9f tc_util: add functions for big endian masked numbers
Add functions for big endian masked numbers as a pre-step towards masked
port numbers.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-25 21:37:01 +00:00
Eli Britstein b20dcd0b31 tc: flower: add u16 big endian parse option
Add u16 big endian parse option as a pre-step towards TCP/UDP/SCTP
ports usage.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-25 21:36:25 +00:00
Moshe Shemesh 5ccc365740 ip: fix oneline output
The ip tool's oneline option should output each record on a single line.
While the oneline option is active, the variable _SL_ replaces line feeds
with the '\' character. However, _SL_ should not be used at the end of
print_linkinfo(); otherwise the whole output ends up on a single line.

Before this fix:
$ip -o link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode
DEFAULT group default qlen 1000\    link/loopback 00:00:00:00:00:00 brd
00:00:00:00:00:00\2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
qdisc fq_codel state UP mode DEFAULT group default qlen 1000\
link/ether 52:54:00:60:0a:db brd ff:ff:ff:ff:ff:ff\3: eth1:
<BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode
DEFAULT group default qlen 1000\    link/ether 00:50:56:1b:05:cd brd
ff:ff:ff:ff:ff:ff\

After this fix:
$ip -o link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode
DEFAULT group default qlen 1000\    link/loopback 00:00:00:00:00:00 brd
00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state
UP mode DEFAULT group default qlen 1000\    link/ether 52:54:00:60:0a:db
brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state
UP mode DEFAULT group default qlen 1000\    link/ether 00:50:56:1b:05:cd
brd ff:ff:ff:ff:ff:ff
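The rule can be sketched as: the separator inside a record depends on the mode, but the end of a record is always a real newline. demo_format_record() is hypothetical. Its sl variable plays the role of _SL_.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: join the parts of one record into buf. */
static int demo_format_record(char *buf, size_t len, int oneline,
			      const char *const *parts, size_t n)
{
	const char *sl = oneline ? "\\" : "\n"; /* intra-record separator */
	size_t used = 0, i;

	if (len == 0)
		return -1;
	buf[0] = '\0';
	for (i = 0; i < n; i++) {
		/* The last part ends the record with "\n" in both modes. */
		const char *sep = (i + 1 < n) ? sl : "\n";
		int w = snprintf(buf + used, len - used, "%s%s", parts[i], sep);

		if (w < 0 || (size_t)w >= len - used)
			return -1;
		used += (size_t)w;
	}
	return 0;
}
```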

Fixes: 3aa0e51be6 ("ip: add support for alternative name addition/deletion/list")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-25 21:34:13 +00:00
David Ahern 3d9608b923 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-25 21:33:28 +00:00
Stephen Hemminger 74ea2526bf v5.4.0 2019-11-25 08:07:24 -08:00
Stephen Hemminger 1285475fe0 netem: remove redundant README
The README for the distribution format was already in netem/.
2019-11-25 08:02:43 -08:00
Stephen Hemminger 6fec41d5ef remove no longer useful README for lnstat
Superseded by man page.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-25 08:01:23 -08:00
Jakub Kicinski 668bd8d356 devlink: fix requiring either handle
devlink sb occupancy show requires either a device or a port handle.
It passes both the device and port handle bits as required to
dl_argv_parse(), so since commit 1896b100af ("devlink: catch
missing strings in dl_args_required") devlink complains when
only one is present:

$ devlink sb occupancy show pci/0000:06:00.0/0
BUG: unknown argument required but not found

Drop the bit for the handle that was not found from the required set.
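A sketch of the bitmask adjustment, using hypothetical option bits. When both handles are marked as required but only one was parsed, the other one is dropped from the required set before the missing-option check.

```c
/* Hypothetical option bits for a device handle and a port handle. */
#define DEMO_OPT_HANDLE  (1u << 0)
#define DEMO_OPT_HANDLEP (1u << 1)

/* Return 0 if every required option is present, -1 otherwise. */
static int demo_check_required(unsigned int required, unsigned int present)
{
	/* Either handle satisfies the requirement: drop the one not found. */
	if ((required & DEMO_OPT_HANDLE) && (required & DEMO_OPT_HANDLEP)) {
		if (present & DEMO_OPT_HANDLE)
			required &= ~DEMO_OPT_HANDLEP;
		else if (present & DEMO_OPT_HANDLEP)
			required &= ~DEMO_OPT_HANDLE;
	}
	return (required & ~present) ? -1 : 0;
}
```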

Reported-by: Shalom Toledo <shalomt@mellanox.com>
Fixes: 1896b100af ("devlink: catch missing strings in dl_args_required")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Tested-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-21 21:38:52 +00:00
David Ahern 536dcd2016 Merge branch 'master' into next
Conflicts:
	include/uapi/linux/devlink.h

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-20 02:31:01 +00:00
Ido Kalir b0a688a542 rdma: Rewrite custom JSON and prints logic to use common API
Instead of an open-coded solution for generating JSON and printing, reuse
the existing infrastructure and APIs, as ip/* does.

Before this change:
 if (rd->json_output)
     jsonw_uint_field(rd->jw, "sm_lid", sm_lid);
 else
     pr_out("sm_lid %u ", sm_lid);

After this change:
 print_uint(PRINT_ANY, "sm_lid", "sm_lid %u ", sm_lid);

All the print functions are converted to support color but for now the
type of color is COLOR_NONE. This is done as a preparation to addition
of color enable option. Such change will require rewrite of command line
arguments parser which is out-of-scope for this patch.

Signed-off-by: Ido Kalir <idok@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-20 02:27:36 +00:00
Danit Goldberg 738728cc6c ip link: Add support to get SR-IOV VF node GUID and port GUID
Extend iplink to show VF GUIDs (IFLA_VF_IB_NODE_GUID, IFLA_VF_IB_PORT_GUID),
giving user-space applications the ability to print GUID values.
This complements the existing ability to set new node GUID and port GUID values.

Suitable ip link command:
- ip link show <device>

For example:
- ip link set ib4 vf 0 node_guid 22:44:33:00:33:11:00:33
- ip link set ib4 vf 0 port_guid 10:21:33:12:00:11:22:10
- ip link show ib4
ib4: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN mode DEFAULT group default qlen 256
link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
vf 0     link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff,
spoof checking off, NODE_GUID 22:44:33:00:33:11:00:33, PORT_GUID 10:21:33:12:00:11:22:10, link-state disable, trust off, query_rss off
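A sketch of the GUID text format shown above: eight colon-separated hex bytes. This assumes the GUID is held as a host-order 64-bit integer whose most significant byte is printed first. demo_format_guid() is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: buf needs 24 bytes (8 * "xx" + 7 ':' + NUL). */
static void demo_format_guid(uint64_t guid, char buf[24])
{
	int i;

	for (i = 0; i < 8; i++) {
		unsigned int byte = (unsigned int)((guid >> (56 - 8 * i)) & 0xff);

		snprintf(buf + 3 * i, 24 - 3 * i, "%02x%s", byte, i < 7 ? ":" : "");
	}
}
```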

Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-20 02:25:50 +00:00
Stephen Hemminger a7fa739d12 uapi: devlink.h health timestamp
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-19 11:38:17 -08:00
Eli Britstein 9479ec1ed0 tc: flower: fix output for ip tos and ttl
Fix the output for ip tos and ttl to be numbers in JSON format.

Example:
$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 protocol ip parent ffff: prio 1 flower skip_hw \
      ip_tos 5/0xf action drop

Non JSON format remains the same:
$ tc filter show dev eth0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  ip_tos 5/0xf
  skip_hw
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1

JSON format is changed (partial output):
$ tc -p -j filter show dev eth0 parent ffff:
Before:
        "options": {
            "keys": {
                "ip_tos": "0x5/f",
                ...
After:
        "options": {
            "keys": {
                "ip_tos": 5,
                "ip_tos_mask": 15,
                ...

Fixes: 6ea2c2b1cf ("tc: flower: add support for matching on ip tos and ttl")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-19 11:36:05 -08:00
Eli Britstein bb3ee8b313 tc_util: fix JSON prints for ct-mark and ct-zone
Fix the output of ct-mark and ct-zone (both for matches and actions) to
be different in JSON/non-JSON mode.

Example:
$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 protocol ip parent ffff: prio 1 flower skip_hw \
      ct_zone 5 ct_mark 6/0xf action ct commit zone 7 mark 8/0xf drop

Non JSON format remains the same:
$ tc -s filter show dev eth0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  ct_zone 5
  ct_mark 6/0xf
  skip_hw
  not_in_hw
        action order 1: ct commit mark 8/0xf zone 7 drop
         index 1 ref 1 bind 1 installed 108 sec used 108 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

JSON format is changed (partial output):
$ tc -p -j filter show dev eth0 parent ffff:
Before:
        "options": {
            "keys": {
                "ct_zone": "5",
                "ct_mark": "6/0xf"
                ...
        "actions": [ {
                "order": 1,
                "kind": "ct",
                "action": "commit",
                "mark": "8/0xf",
                "zone": "7",
                ...
After:
        "options": {
            "keys": {
                "ct_zone": 5,
                "ct_mark": 6,
                "ct_mark_mask": 15
                ...
        "actions": [ {
                "order": 1,
                "kind": "ct",
                "action": "commit",
                "mark": 8,
                "mark_mask": 15,
                "zone": 7,
                ...

Fixes: c8a494314c ("tc: Introduce tc ct action")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-19 11:36:05 -08:00
Eli Britstein 99d5ee8368 tc: flower: fix newline prints for ct-mark and ct-zone
Matches of ct-mark and ct-zone were all printed on the same line. Fix
that so each ct match is printed on a separate line.

Example:
$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 protocol ip parent ffff: prio 1 flower skip_hw \
      ct_zone 5 ct_mark 6/0xf action ct commit zone 7 mark 8/0xf drop

Before:
$ tc -s filter show dev eth0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4 ct_zone 5 ct_mark 6/0xf
  skip_hw
  not_in_hw
        action order 1: ct commit mark 8/0xf zone 7 drop
         index 1 ref 1 bind 1 installed 31 sec used 31 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

After:
$ tc -s filter show dev eth0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  ct_zone 5
  ct_mark 6/0xf
  skip_hw
  not_in_hw
        action order 1: ct commit mark 8/0xf zone 7 drop
         index 1 ref 1 bind 1 installed 108 sec used 108 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

Fixes: c8a494314c ("tc: Introduce tc ct action")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-19 11:36:05 -08:00
Eli Britstein 746e6c0fd3 tc_util: add an option to print masked numbers with/without a newline
Add an option to print masked numbers with or without a newline, as a
pre-step towards using a common function.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-19 11:36:05 -08:00
Eli Britstein 04b215015b tc_util: introduce a function to print JSON/non-JSON masked numbers
Introduce a function to print masked numbers with different output for
JSON and non-JSON modes, as a pre-step towards printing numbers using
this common function.
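A minimal sketch of such a function, assuming a buffer-based formatter. The name format_masked() and its parameters are hypothetical, not the tc_util API. Text mode keeps the value/0xmask form, while JSON mode emits two numeric keys, matching the ip_tos and ct_mark output shown in the related commits above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not tc_util's real API): format a masked value.
 * Text mode: "<name> <val>/0x<mask>", or "<name> <val>" for a full mask.
 * JSON mode: "\"<name>\": <val>, \"<name>_mask\": <mask>", omitting the
 *            mask key for a full mask. Values stay numbers in JSON. */
static int format_masked(char *buf, size_t len, const char *name,
                         uint32_t val, uint32_t mask, uint32_t full_mask,
                         bool json)
{
    if (json) {
        if (mask == full_mask)
            return snprintf(buf, len, "\"%s\": %u", name, (unsigned)val);
        return snprintf(buf, len, "\"%s\": %u, \"%s_mask\": %u",
                        name, (unsigned)val, name, (unsigned)mask);
    }
    if (mask == full_mask)
        return snprintf(buf, len, "%s %u", name, (unsigned)val);
    return snprintf(buf, len, "%s %u/0x%x", name, (unsigned)val,
                    (unsigned)mask);
}
```

Returning snprintf()'s result lets a caller detect truncation. Passing the JSON/text mode as an explicit flag is a simplification made for the sketch.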

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-19 11:36:05 -08:00
Roman Mashak cc08619c3c man: tc-ematch.8: documented canid() ematch rule
tc-ematch.8 was missing a description of the canid() ematch rule, so
document it.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-17 12:31:04 -08:00
Roman Mashak 5d5c394726 man: tc-ematch.8: update list of filters using extended matches
Extended match rules are currently supported by basic, flow and cgroup
filters, so update the man page.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-17 12:28:01 -08:00
Stephen Hemminger d24f5ae3f2 uapi: SPDX license updates
Upstream changes to SPDX licenses in headers.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-14 09:24:10 -08:00
Hritik Vijay 5883c6eba5 ss: show header for --processes/-p
By default, ss shows a header for every column but omits the one for
--processes for no apparent reason. This patch adds the "Process" header.

Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-14 09:20:14 -08:00
Guillaume Nault 130f549604 man: remove ppp from list of devices not allowed to change netns
PPP devices can be moved to different network namespaces. The feature
was added by commit 79c441ae505c ("ppp: implement x-netns support")
in Linux 4.3.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-14 09:19:39 -08:00
David Ahern 611d3123be Merge branch 'nsid-cleanup' into next
Guillaume Nault  says:

====================

It's currently hard to review ipnetns. The netns ids are inconsistently
treated as signed or unsigned and most helper functions aren't prepared
to use negative ids.

Netns id attributes can be negative: NETNSA_NSID_NOT_ASSIGNED == -1.
So let's consistently treat nsids as signed and also reject negative
values in functions that are supposed to only handle assigned netns
ids.

While there, let's drop the extra blank line generated by some command
line parsing errors (patch 5/5).

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-09 01:34:24 +00:00
Guillaume Nault d0b645a51e ipnetns: remove blank lines printed by invarg() messages
Since invarg() automatically adds a '\n' character, having one in the
error message generates an extra blank line.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-09 01:33:06 +00:00
Guillaume Nault 1c9b69276c ipnetns: don't print unassigned nsid in json export
Don't output the nsid and current-nsid json keys if they're not set.
Otherwise a parser would have to special case the "not-assigned"
string.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-09 01:33:05 +00:00
Guillaume Nault 08ba67db7b ipnetns: harden helper functions wrt. negative netns ids
Negative values are invalid netns ids. Ensure that helper functions
don't accidentally try to process them.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-09 01:33:03 +00:00
Guillaume Nault df6da60bcb ipnetns: fix misleading comment about 'ip monitor nsid'
'ip monitor nsid' doesn't call print_nsid().

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-09 01:33:02 +00:00
Guillaume Nault f19966efee ipnetns: treat NETNSA_NSID and NETNSA_CURRENT_NSID as signed
These attributes are signed (with -1 meaning NETNSA_NSID_NOT_ASSIGNED).
So let's use rta_getattr_s32() and print_int() instead of their
unsigned counterparts to avoid confusion.
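A minimal sketch of the signed-nsid convention, assuming -1 means "not assigned". SKETCH_NSID_NOT_ASSIGNED, nsid_is_assigned() and format_nsid() are hypothetical stand-ins (the kernel's constant is NETNSA_NSID_NOT_ASSIGNED). Negative ids are rejected up front, and the formatter reports "nothing to print" so that a JSON writer can omit the key.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Local stand-in for the kernel's NETNSA_NSID_NOT_ASSIGNED (-1), so the
 * sketch compiles without the uapi headers. */
#define SKETCH_NSID_NOT_ASSIGNED (-1)

/* Any negative nsid is invalid; only values >= 0 are assigned ids. */
static bool nsid_is_assigned(int32_t nsid)
{
    return nsid >= 0;
}

/* Formats an assigned nsid into buf. Returns false (and writes nothing)
 * for unassigned or negative ids, so the caller can skip the key. */
static bool format_nsid(char *buf, size_t len, int32_t nsid)
{
    if (!nsid_is_assigned(nsid))
        return false;
    snprintf(buf, len, "%d", (int)nsid);
    return true;
}
```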

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-09 01:33:00 +00:00
Jakub Kicinski c3f69bf923 devlink: allow full range of resource sizes
Resource size is a 64 bit attribute at netlink level.
Make the command line argument 64 bit as well.
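A minimal sketch of a 64-bit size parser, assuming a standalone helper named parse_u64() (hypothetical, not devlink's own parser). Reading the argument with strtoull() keeps values above UINT32_MAX intact, where a 32-bit parse would reject or truncate them.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parser for a 64-bit size argument. Rejects empty input,
 * signs, trailing garbage and values that do not fit in 64 bits. */
static bool parse_u64(const char *arg, uint64_t *out)
{
    unsigned long long v;
    char *end;

    /* strtoull() accepts a leading '-' and negates the result, so
     * insist on a digit as the first character. */
    if (arg == NULL || !isdigit((unsigned char)*arg))
        return false;
    errno = 0;
    v = strtoull(arg, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return false;
    *out = (uint64_t)v;
    return true;
}
```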

Fixes: 8cd6440958 ("devlink: Add support for devlink resource abstraction")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-09 00:39:50 +00:00
Jakub Kicinski 1896b100af devlink: catch missing strings in dl_args_required
Currently if dl_args_required doesn't contain a string
for a given option the fact that the option is missing
is silently ignored.

Add a catch-all case and print a generic error.
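A minimal sketch of a required-options check with a catch-all, assuming options are tracked as bit flags. The OPT_* bits, the message table and check_required() are hypothetical, loosely modelled on the description rather than copied from devlink. An option with no table entry still produces an error instead of being silently accepted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical option bits and messages; not devlink's definitions. */
#define OPT_HANDLE        (1u << 0)
#define OPT_RESOURCE_PATH (1u << 1)
#define OPT_RESOURCE_SIZE (1u << 2)

struct opt_msg {
    uint32_t opt;
    const char *msg;
};

static const struct opt_msg required_msgs[] = {
    { OPT_HANDLE,        "Device handle expected." },
    { OPT_RESOURCE_PATH, "Resource path expected." },
    /* OPT_RESOURCE_SIZE intentionally has no entry. */
};

static const char generic_msg[] = "Required option missing.";

/* Returns NULL when every required option was found, otherwise an error
 * message for the first missing one. */
static const char *check_required(uint32_t required, uint32_t found)
{
    uint32_t missing = required & ~found;
    size_t i;

    if (!missing)
        return NULL;
    for (i = 0; i < sizeof(required_msgs) / sizeof(required_msgs[0]); i++)
        if (missing & required_msgs[i].opt)
            return required_msgs[i].msg;
    /* Catch-all: without it, an option missing from the table would be
     * silently accepted. */
    return generic_msg;
}
```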

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-09 00:39:44 +00:00
Jakub Kicinski 9a0a2fcbf4 devlink: fix referencing namespace by PID
The netns parameter of devlink reload is supposed to accept a PID
as well as a namespace name. However, the PID parsing has two
bugs:
 - the opts->netns member is unsigned, so the < 0
   condition is always false;
 - the parameter list is not rewound after the attempt to parse it as
   a name, so parsing as a PID uses the wrong argument.
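A minimal sketch of both fixes, assuming a standalone helper named parse_netns_pid() (hypothetical). The parsed value is held in a signed long, so the range check is meaningful, and the argument is only inspected, never consumed, so a failed PID parse leaves it available to be treated as a name.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: decide whether a netns argument is a PID.
 * With an unsigned variable a "< 0" test would always be false; a signed
 * long makes the "<= 0" rejection below effective. The string is not
 * consumed, so on failure the caller can still use it as a name. */
static bool parse_netns_pid(const char *arg, int *pid)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0')
        return false;           /* not a number: treat as a name */
    if (v <= 0 || v > INT_MAX)
        return false;           /* not a valid PID */
    *pid = (int)v;
    return true;
}
```

This sketch tries the PID interpretation without advancing the argument list; the real devlink code may attempt the two interpretations in a different order.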

Fixes: 08e8e1ca3e ("devlink: extend reload command to add support for network namespace change")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-09 00:39:03 +00:00
David Ahern 081140bbc4 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-09 00:38:37 +00:00
Jakub Kicinski 0932814458 devlink: require resource parameters
If the devlink resource set parameters are not provided, devlink crashes:
$ devlink resource set netdevsim/netdevsim0
Segmentation fault (core dumped)

This is because even though DL_OPT_RESOURCE_PATH and
DL_OPT_RESOURCE_SIZE are passed as o_required, the validation
table doesn't contain a relevant string.

Fixes: 8cd6440958 ("devlink: Add support for devlink resource abstraction")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-07 20:36:08 -08:00
Vlad Buslov fb2e033add tc: implement support for action flags
Implement setting and printing of action flags, with a single available
flag value, "no_percpu", which translates to the kernel UAPI TCA_ACT_FLAGS
value TCA_ACT_FLAGS_NO_PERCPU_STATS. Update the man page with information
regarding the usage of action flags.

Example usage:

 # tc actions add action gact drop no_percpu
 # sudo tc actions list action gact
 total acts 1

        action order 0: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 0
        no_percpu

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-02 07:44:23 -07:00
David Ahern 17a948c80a Update kernel headers
Update kernel headers to commit:
    c23fcbbc6aa4 ("tc-testing: added tests with cookie for conntrack TC action")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-02 07:43:01 -07:00
Michał Łyszczek eca5123948 libnetlink.c, ss.c: properly handle fread() errors
fread(3) returns size_t, which is unsigned, so the check
`if (fread(...) < 0)' is always false. To check whether fread(3) has
failed, the caller should check the error indicator with ferror(3).

This commit also makes the read logic a little less forgiving of
errors. The previous logic checked whether fread(3) read *at least*
the required amount of data; the code now checks whether fread(3) read
*exactly* the expected amount of data. This makes sense because the
code parses a very specific binary file, and reading even 1 byte less
or more than expected would corrupt the data later anyway.
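A minimal sketch of the exact-read pattern, assuming a fixed-size record. struct rec_hdr and read_hdr() are hypothetical stand-ins for the binary data the real code parses. Asking fread() for exactly one item and comparing against 1 catches short reads, and ferror() separates an I/O error from truncated input.

```c
#include <stdio.h>

/* Hypothetical fixed-size record standing in for the real binary data. */
struct rec_hdr {
    unsigned int magic;
    unsigned int len;
};

/* Read exactly one header. fread() returns a size_t item count, so a
 * "< 0" check can never fire; compare against the expected count and use
 * ferror() to tell an I/O error apart from a short read at end of file.
 * Returns 0 on success, -1 on I/O error, -2 on truncated input. */
static int read_hdr(FILE *fp, struct rec_hdr *h)
{
    if (fread(h, sizeof(*h), 1, fp) == 1)
        return 0;
    return ferror(fp) ? -1 : -2;
}
```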

Signed-off-by: Michał Łyszczek <michal.lyszczek@bofc.pl>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-01 09:05:41 -07:00
Vlad Buslov cb83101626 tc: remove duplicated NEXT_ARG_FWD() in parse_ct()
Function parse_ct() manually calls NEXT_ARG_FWD() after
parse_action_control_dflt(). This is redundant because
parse_action_control_dflt() modifies argc and argv itself. Moreover, this
implementation consumes the option that follows the action. For example,
adding a ct action with a cookie fails:

$ sudo tc actions add action ct cookie 111111111111
Bad action type 111111111111
Usage: ... gact <ACTION> [RAND] [INDEX]
Where:  ACTION := reclassify | drop | continue | pass | pipe |
                  goto chain <CHAIN_INDEX> | jump <JUMP_COUNT>
        RAND := random <RANDTYPE> <ACTION> <VAL>
        RANDTYPE := netrand | determ
        VAL : = value not exceeding 10000
        JUMP_COUNT := Absolute jump from start of action list
        INDEX := index value used

With fix:

$ sudo tc actions add action ct cookie 111111111111
$ sudo tc actions list action ct
total acts 1

        action order 0: ct zone 0 pipe
         index 1 ref 1 bind 0
        cookie 111111111111

Fixes: c8a494314c ("tc: Introduce tc ct action")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-01 09:04:28 -07:00
Julien Fortin 4f73cd7f0d ip: fix ip route show json output for multipath nexthops
print_rta_multipath doesn't support JSON output:

{
    "dst":"27.0.0.13",
    "protocol":"bgp",
    "metric":20,
    "flags":[],
    "gateway":"169.254.0.1"dev uplink-1 weight 1 ,
    "flags":["onlink"],
    "gateway":"169.254.0.1"dev uplink-2 weight 1 ,
    "flags":["onlink"]
},

Since RTA_MULTIPATH contains nested objects, they should be printed
in a JSON array.

With the patch, we get the following output:

{
    "flags": [],
    "dst": "36.0.0.13",
    "protocol": "bgp",
    "metric": 20,
    "nexthops": [
        {
            "weight": 1,
            "flags": [
                "onlink"
            ],
            "gateway": "169.254.0.1",
            "dev": "uplink-1"
        },
        {
            "weight": 1,
            "flags": [
                "onlink"
            ],
            "gateway": "169.254.0.1",
            "dev": "uplink-2"
        }
    ]
}

Fixes: 663c3cb231 ("iproute: implement JSON and color output")

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-11-01 09:03:54 -07:00
Michał Łyszczek 6749801b06 rdma/sys.c: fix possible out-of-bound array access
The netns_modes_str[] array has 2 elements. When netns_mode is 2, the
condition (2 <= 2) is true and `mode_str = netns_modes_str[2]' is
executed, which results in an out-of-bounds read.
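A minimal sketch of the corrected bounds check. The table contents, their order and netns_mode_name() are assumptions made for illustration, not copied from rdma/sys.c. Valid indexes run from 0 to ARRAY_SIZE - 1, so the comparison must be strict.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Two-entry table like the one described; strings and order assumed. */
static const char *const netns_modes_str[] = { "exclusive", "shared" };

static const char *netns_mode_name(unsigned int mode)
{
    /* "<", not "<=": with "<=", mode == 2 would read one element past
     * the end of the array. */
    if (mode < ARRAY_SIZE(netns_modes_str))
        return netns_modes_str[mode];
    return "unknown";
}
```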

Signed-off-by: Michał Łyszczek <michal.lyszczek@bofc.pl>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-28 10:33:27 -07:00
David Ahern 7534b3cd8c Merge branch 'alt-names' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-28 07:38:24 -07:00
Jiri Pirko afd67550c2 ip: allow to use alternative names as handle
Extend ll_name_to_index() to get the index of a netdevice using an
alternative interface name. Allow alternative long names to pass checks
in a couple of ip link/addr commands.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-28 07:35:29 -07:00
Jiri Pirko 3aa0e51be6 ip: add support for alternative name addition/deletion/list
Implement addition/deletion of lists of properties, currently
alternative ifnames. Also extend the ip link show command to list them.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-28 07:35:29 -07:00
Jiri Pirko 20fbe90771 lib/ll_map: cache alternative names
Alternative names are related to the "parent name". That means that
whenever ll_remember_index() is called to add/delete/update an entry and
it finds the "parent name" im object by ifindex, it processes the related
alternative name im objects too. Put them in a list which holds the
relationship with the parent.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-28 07:35:29 -07:00
David Ahern 9a1e4561f1 Merge branch 'rdma-mr-stats' into next
Leon Romanovsky  says:

====================

This is the supplementary part of the "ODP information and statistics"
kernel series.
https://lore.kernel.org/linux-rdma/20191016062308.11886-1-leon@kernel.org

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-27 10:28:49 -07:00
Erez Alfasi 5c78ffa0e5 rdma: Document MR statistics
Document how to access the MR counters in the
rdma-statistic man pages.

Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-10-27 10:28:38 -07:00
Erez Alfasi 33552ade17 rdma: Add "stat show mr" support
Show MR counter statistics. Filters are also supported.

Examples:
~$: rdma stat show mr
dev mlx5_0 mrn 8 page_faults 1221 page_invalidations 0
dev mlx5_0 mrn 9 page_faults 1221 page_invalidations 0

~$: rdma stat show mr mrn 8
dev mlx5_0 mrn 8 page_faults 1221 page_invalidations 0

Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-10-27 10:28:30 -07:00
David Ahern c9dc3af42e Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-27 09:53:46 -07:00
Stephen Hemminger 085ab19bc3 don't install examples
No longer relevant
2019-10-23 10:21:06 -07:00
Stephen Hemminger e15011b5e5 remove out of date README
The original old README refers to stuff from the pre 2.6
era including cbz. Just kill it.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-23 10:19:45 -07:00
Michał Łyszczek b7f28e0bd9 ipnetns: do not check netns NAME when -all is specified
When the `-all' argument is specified, netns runs cmd in all namespaces
and NAME is not used, but netns nevertheless checks whether argv[1] is a
valid namespace name, ignoring the fact that argv[1] contains cmd
and not NAME. As a result, the user cannot specify an absolute
path to the command.

    # ip -all netns exec /usr/bin/whoami
    Invalid netns name "/usr/bin/whoami"

This forces the user to have the command in PATH.

The solution is simply not to validate argv[1] when the `-all' argument
is specified.

Signed-off-by: Michał Łyszczek <michal.lyszczek@bofc.pl>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-23 09:18:08 -07:00
Stephen Hemminger d49e5c2437 examples: remove diffserv
The diffserv examples here are out of date and incomplete.
Remove them rather than try and fix them.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-23 09:13:55 -07:00
Stephen Hemminger 86c0bf5982 examples: remove gaiconf
The gaiconf script is a workaround for something now handled
in distros as part of libc.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-23 09:12:19 -07:00
Stephen Hemminger 14fd32d3c6 examples: remove out of date cbq stuff
The examples around cbq are out of date and have never been updated.
There are better ways to achieve the same kind of thing with more
modern qdiscs.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-22 09:53:26 -07:00
Nicolas Dichtel 6ed2915f9c ip-netns.8: document target-nsid and nsid options of list-id
This is a follow-up to commit eaefb07804 ("ipnetns: enable to dump
nsid conversion table").

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-16 12:18:37 -07:00
Nicolas Dichtel 63ab204e7b ip-netns.8: document the 'auto' keyword of 'ip netns set'
This is a follow-up to commit ebe3ce2fcc ("ipnetns: parse nsid as a
signed integer").

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-16 12:18:37 -07:00
Florent Fourcot 10d39984b7 man: remove "default group" sentence on ip link
By default, all devices are listed, not only the default group.

Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: Romain Bellan <romain.bellan@wifirst.fr>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-16 12:18:37 -07:00
Davide Caratti 14cadc707b ss: allow dumping kTLS info
Now that INET_DIAG_INFO requests can dump TCP ULP information, extend 'ss'
to allow diagnosing kTLS when it is attached to a TCP socket. While at it,
import kTLS uAPI definitions from the latest net-next tree.

CC: Andrea Claudi <aclaudi@redhat.com>
Co-developed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-14 20:07:21 -07:00
David Ahern 4c23b12865 Update kernel headers and import tls.h
Update kernel headers to commit:
    85a83a8fca7f ("Merge branch 'PTP-driver-refactoring-for-SJA1105-DSA'")

and add tls.h.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-14 20:07:20 -07:00
Nicolas Dichtel eaefb07804 ipnetns: enable to dump nsid conversion table
This patch makes it possible to dump/get nsids from one netns into another netns.

Example:
$ ./test.sh
+ ip netns add foo
+ ip netns add bar
+ touch /var/run/netns/init_net
+ mount --bind /proc/1/ns/net /var/run/netns/init_net
+ ip netns set init_net 11
+ ip netns set foo 12
+ ip netns set bar 13
+ ip netns
init_net (id: 11)
bar (id: 13)
foo (id: 12)
+ ip -n foo netns set init_net 21
+ ip -n foo netns set foo 22
+ ip -n foo netns set bar 23
+ ip -n foo netns
init_net (id: 21)
bar (id: 23)
foo (id: 22)
+ ip -n bar netns set init_net 31
+ ip -n bar netns set foo 32
+ ip -n bar netns set bar 33
+ ip -n bar netns
init_net (id: 31)
bar (id: 33)
foo (id: 32)
+ ip netns list-id target-nsid 12
nsid 21 current-nsid 11 (iproute2 netns name: init_net)
nsid 22 current-nsid 12 (iproute2 netns name: foo)
nsid 23 current-nsid 13 (iproute2 netns name: bar)
+ ip -n foo netns list-id target-nsid 21
nsid 11 current-nsid 21 (iproute2 netns name: init_net)
nsid 12 current-nsid 22 (iproute2 netns name: foo)
nsid 13 current-nsid 23 (iproute2 netns name: bar)
+ ip -n bar netns list-id target-nsid 33 nsid 32
nsid 32 current-nsid 32 (iproute2 netns name: foo)
+ ip -n bar netns list-id target-nsid 31 nsid 32
nsid 12 current-nsid 32 (iproute2 netns name: foo)
+ ip netns list-id nsid 13
nsid 13 (iproute2 netns name: bar)

CC: Petr Oros <poros@redhat.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Tested-by: Petr Oros <poros@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-14 13:04:19 -07:00
Aya Levin 94ff4d6882 devlink: Fix inconsistency between command input and output
In the devlink health show command, the reporter's name parameter is called
reporter, but in the output the reporter's name is referred to as name.

Before this patch:
$ devlink health show pci/0000:04:00.0 reporter tx
pci/0000:04:00.0:
   name tx
     state healthy error 0 recover 0 grace_period 500 auto_recover true

After this patch:
$ devlink health show pci/0000:04:00.0 reporter tx
pci/0000:04:00.0:
   reporter tx
     state healthy error 0 recover 0 grace_period 500 auto_recover true

Reported-by: Jiri Pirko <jiri@mellanox.com>
Fixes: 2f1242efe9 ("devlink: Add devlink health show command")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-08 20:22:13 -07:00
Aya Levin 7dd3d51b91 devlink: Left justification on FMSG output
Since FMSG output is dynamic, the space separator must be on the left-hand
side of the value. Otherwise the output has redundant left indentation
regardless of the hierarchy.

Before the patch:
 Common config: SQ: stride size: 64 size: 1024
 CQ: stride size: 64 size: 1024
 SQs:
   channel ix: 0 tc: 0 txq ix: 0 sqn: 10 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 6 HW status: 0
   channel ix: 1 tc: 0 txq ix: 1 sqn: 14 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 10 HW status: 0
   channel ix: 2 tc: 0 txq ix: 2 sqn: 18 HW state: 1 stopped: false cc: 5 pc: 5 CQ: cqn: 14 HW status: 0
   channel ix: 3 tc: 0 txq ix: 3 sqn: 22 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 18 HW status: 0

With the patch:
Common config: SQ: stride size: 64 size: 1024
CQ: stride size: 64 size: 1024
SQs:
  channel ix: 0 tc: 0 txq ix: 0 sqn: 10 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 6 HW status: 0
  channel ix: 1 tc: 0 txq ix: 1 sqn: 14 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 10 HW status: 0
  channel ix: 2 tc: 0 txq ix: 2 sqn: 18 HW state: 1 stopped: false cc: 5 pc: 5 CQ: cqn: 14 HW status: 0
  channel ix: 3 tc: 0 txq ix: 3 sqn: 22 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 18 HW status: 0

Fixes: 844a61764c ("devlink: Add helper functions for name and value separately")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-08 20:22:13 -07:00
Aya Levin 56b725a442 devlink: Add helper for left justification print
Introduce a helper function which wraps code that adds a left hand side
space separator unless it follows a newline.
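A minimal sketch of the left-justification rule, assuming output is collected in a fixed buffer. out_state, out_append() and out_field() are hypothetical names, not devlink's helpers. The separator is written before a value, and skipped when the value begins a new line, so nested lines carry no stray indentation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical output buffer that remembers whether the last character
 * written was a newline. */
struct out_state {
    char buf[256];
    size_t len;
    bool at_line_start;
};

/* Append str, truncating rather than overflowing buf. */
static void out_append(struct out_state *s, const char *str)
{
    size_t room = sizeof(s->buf) - s->len;   /* includes the '\0' slot */
    size_t n = strlen(str);

    if (n >= room)
        n = room - 1;
    memcpy(s->buf + s->len, str, n);
    s->len += n;
    s->buf[s->len] = '\0';
    if (n > 0)
        s->at_line_start = (str[n - 1] == '\n');
}

/* Left-justified field: the separator goes before the value and is
 * skipped when the value starts a new line. */
static void out_field(struct out_state *s, const char *str)
{
    if (!s->at_line_start)
        out_append(s, " ");
    out_append(s, str);
}
```

Initialise the state with `at_line_start = true` so the very first field is not indented either.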

Fixes: e3d0f0c0e3 ("devlink: add option to generate JSON output")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-08 20:22:13 -07:00
Andrea Claudi e047ca988f tc: fix segmentation fault on gact action
tc segfaults if gact action is used without action or index:

$ ip link add type dummy
$ tc actions add action pipe index 1
$ tc filter add dev dummy0 parent ffff: protocol ip \
  pref 10 u32 match ip src 127.0.0.2 flowid 1:10 action gact
Segmentation fault

We expect tc to fail gracefully with an error message.

This happens if gact is the last argument of the incomplete
command. In this case the "gact" action is parsed, the macro
NEXT_ARG_FWD() is executed and the next matches() crashes
because of a NULL argv pointer.

To avoid this, simply use NEXT_ARG() instead.
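The difference can be sketched as a checked argument advance. This is not iproute2's NEXT_ARG() macro; next_arg_checked() and parse_gact() are hypothetical and return an error code instead of printing "Command line is not complete", so the behavior is easy to test.

```c
#include <stddef.h>

/* Hypothetical checked advance: only moves forward when another argument
 * exists, and returns NULL when the command line is incomplete instead
 * of leaving argv pointing at the terminating NULL entry. */
static char *next_arg_checked(int *argc, char ***argv)
{
    if (*argc <= 1)
        return NULL;
    (*argc)--;
    (*argv)++;
    return **argv;
}

/* Hypothetical parser for "gact <ACTION>", where argv[0] is "gact". */
static int parse_gact(int argc, char **argv, const char **action)
{
    const char *a = next_arg_checked(&argc, &argv);

    if (a == NULL)
        return -1;   /* caller reports an incomplete command line */
    *action = a;
    return 0;
}
```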

With this change in place:

$ ip link add type dummy
$ tc actions add action pipe index 1
$ tc filter add dev dummy0 parent ffff: protocol ip \
  pref 10 u32 match ip src 127.0.0.2 flowid 1:10 action gact
Command line is not complete. Try option "help"

Fixes: fa49588973 ("tc: Fix binding of gact action by index.")
Reported-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-08 20:18:51 -07:00
Damien Robert 7c503d88d2 man: add reference to `ip route add encap ... src`
The ability to specify the source address for 'encap ip' / 'encap ip6'
was added in commit 94a8722f2f, but the man
page was not updated.

Also fixes a missing page in ip-route.8.in.

Signed-off-by: Damien Robert <damien.olivier.robert+git@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-08 20:18:15 -07:00
David Ahern 47a4c1533c Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2019-10-07 22:02:36 +00:00
Jiri Pirko 08e8e1ca3e devlink: extend reload command to add support for network namespace change
Extend existing devlink reload command by adding option "netns" by which
user can instruct kernel to reload the devlink instance into specified
network namespace.

Example:

$ ip netns add testns1
$ devlink dev reload netdevsim/netdevsim10 netns testns1

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2019-10-07 22:00:49 +00:00
Jiri Pirko 29993df876 devlink: introduce cmdline option to switch to a different namespace
Similar to ip tool, add an option to devlink to operate under certain
network namespace. Unfortunately, "-n" is already taken, so use "-N"
instead.

Example:

$ devlink -N testns1 dev show

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2019-10-07 21:59:50 +00:00
Leon Romanovsky f93134841e rdma: Relax requirement to have PID for HW objects
RDMA has a weak connection between PIDs and HW objects, because
the latter are tied to file descriptors for their lifetime management.

As a result of this weak connection, in the following scenario
the returned PID will be 0 (not valid):
 1. Create FD and context
 2. Share it with an ephemeral child
 3. Create any object and exit that child

This flow was revealed in a testing environment; real users
do not run such a scenario, because it makes no sense at all in the
RDMA world.

Let's make two changes in the code to support such a workflow anyway:
 1. Remove the need to provide a PID/kernel name. The code already
    supports it; just remove the extra validation.
 2. Bail out in case PID is 0.

Link: https://lore.kernel.org/linux-rdma/20191002123245.18153-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2019-10-07 21:54:30 +00:00
David Ahern 9dcd8788fe Update kernel headers
Update kernel headers to commit:
    940f13821528 ("Merge branch 'dpaa2-eth-misc-cleanup'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2019-10-07 20:43:13 +00:00
Stephen Hemminger 2d0445c67b uapi: update btf from 5.4-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-10-01 08:55:01 -07:00
Roopa Prabhu 6284236237 ipneigh: neigh get support
This patch adds support for looking up a neigh entry
using the recently added RTM_GETNEIGH support in the kernel.

example:
$ ip neigh get 10.0.2.4 dev test-dummy0
10.0.2.4 dev test-dummy0 lladdr de:ad:be:ef:13:37 PERMANENT

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Tested-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-01 08:23:43 -07:00
Roopa Prabhu 4ed5ad7bd3 bridge: fdb get support
This patch adds support for looking up a bridge fdb entry
using the recently added RTM_GETNEIGH support in the kernel
(with the AF_BRIDGE family).

example:
$ bridge fdb get 02:02:00:00:00:03 dev test-dummy0 vlan 1002
02:02:00:00:00:03 dev test-dummy0 vlan 1002 master bridge

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Tested-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-01 08:22:32 -07:00
Julien Fortin 4ecefff3cf ip: fix ip route show json output for multipath nexthops
print_rta_multipath doesn't support JSON output:

{
    "dst":"27.0.0.13",
    "protocol":"bgp",
    "metric":20,
    "flags":[],
    "gateway":"169.254.0.1"dev uplink-1 weight 1 ,
    "flags":["onlink"],
    "gateway":"169.254.0.1"dev uplink-2 weight 1 ,
    "flags":["onlink"]
},

Since RTA_MULTIPATH contains nested objects, they should be printed
in a JSON array.

With the patch, we get the following output:

{
    "flags": [],
    "dst": "36.0.0.13",
    "protocol": "bgp",
    "metric": 20,
    "nexthops": [
        {
            "weight": 1,
            "flags": [
                "onlink"
            ],
            "gateway": "169.254.0.1",
            "dev": "uplink-1"
        },
        {
            "weight": 1,
            "flags": [
                "onlink"
            ],
            "gateway": "169.254.0.1",
            "dev": "uplink-2"
        }
    ]
}

Fixes: 663c3cb231 ("iproute: implement JSON and color output")

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-10-01 07:59:47 -07:00
Thomas Haller 0d82ee9939 man: add note to ip-macsec manual about necessary key management
The ip-macsec man page and the existence of the tool make it seem as if
the user could just configure static keys once and be done with it. That is
not the case. Some form of key management must be done in user space.

Add a note about that.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-09-26 14:11:27 -07:00
David Ahern 8c2093e5d2 ip vrf: Add json support for show command
Add json support to 'ip vrf sh':
$ ip -j -p vrf ls
[ {
        "name": "mgmt",
        "table": 1001
    } ]

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-24 19:35:41 -07:00
David Ahern 92754430a6 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-24 19:34:34 -07:00
Stephen Hemminger 8d88c37724 uapi: update headers from 5.4-rc
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-09-24 12:38:57 -07:00
Stephen Hemminger 38e9ba9dc9 Merge ../iproute2-next 2019-09-24 12:37:33 -07:00
Stephen Hemminger 18e631bd4b v5.3.0 2019-09-24 12:32:05 -07:00
Joe Stringer e4c4685fd6 bpf: Fix race condition with map pinning
If two processes attempt to invoke bpf_map_attach() at the same time,
they will both create maps. The first will successfully pin its map to
the filesystem; the second will fail to pin, but will continue operating
with a reference to its own copy of the map. As a result, two programs
loaded concurrently by loaders using this library will not share the
same map.

Fix this by adding a retry in the case where pinning fails because
the map already exists on the filesystem. In that case, re-attempt
opening an fd to the pinned map, since its presence shows that another
program already created and pinned a map at that location.
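The retry can be illustrated with a small, self-contained analogy. This is not the iproute2 code: regular files stand in for BPF maps, link() stands in for pinning (both fail with EEXIST when the path is already taken), and create_or_attach() and same_object() are hypothetical helpers. The private temporary file and the target path must be on the same filesystem (here /tmp) for link() to work.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Analogy only: a private temp file plays the role of a freshly created
 * map, and link() plays the role of pinning it at 'path'. */
static int create_or_attach(const char *path)
{
	char tmpl[] = "/tmp/pin-XXXXXX";
	int fd = mkstemp(tmpl);		/* our private object ("map") */
	int err;

	if (fd < 0)
		return -1;

	if (link(tmpl, path) == 0) {	/* "pin" succeeded: others share ours */
		unlink(tmpl);
		return fd;
	}

	err = errno;			/* save before close()/unlink() */
	close(fd);			/* drop our private copy */
	unlink(tmpl);
	if (err != EEXIST)
		return -1;

	/* Retry: another process pinned first, so open its object instead. */
	return open(path, O_RDWR);
}

/* Do two descriptors refer to the same underlying object? */
static int same_object(int a, int b)
{
	struct stat sa, sb;

	if (fstat(a, &sa) || fstat(b, &sb))
		return 0;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Two callers racing on the same path both end up with a descriptor for the single object that was linked first, which is the sharing property the fix restores for pinned maps.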

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-09-24 12:29:38 -07:00
David Ahern a32692ac9c Merge branch 'master' into next
Conflicts:
	devlink/devlink.c

Fixed the conflict by updating the numbering for all new attributes
after the ones in master branch.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-19 07:55:53 -07:00
Jiri Pirko 88fdbf8030 devlink: add reload failed indication
Add indication about previous failed devlink reload.

Example outputs:

$ devlink dev
netdevsim/netdevsim10: reload_failed true
$ devlink dev -j -p
{
    "dev": {
        "netdevsim/netdevsim10": {
            "reload_failed": true
        }
    }
}

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-19 07:51:47 -07:00
Andrea Claudi c0325b0638 bpf: replace snprintf with asprintf when dealing with long buffers
This reduces stack usage, as asprintf allocates memory on the heap.

This indirectly fixes a snprintf truncation warning (from gcc v9.2.1):

bpf.c: In function ‘bpf_get_work_dir’:
bpf.c:784:49: warning: ‘snprintf’ output may be truncated before the last format character [-Wformat-truncation=]
  784 |  snprintf(bpf_wrk_dir, sizeof(bpf_wrk_dir), "%s/", mnt);
      |                                                 ^
bpf.c:784:2: note: ‘snprintf’ output between 2 and 4097 bytes into a destination of size 4096
  784 |  snprintf(bpf_wrk_dir, sizeof(bpf_wrk_dir), "%s/", mnt);
      |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
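The heap-allocation idea can be sketched in portable C. asprintf() is a GNU/BSD extension, so this hypothetical make_work_dir() helper measures the output with snprintf(NULL, 0, ...) and then allocates exactly that many bytes. The effect is the same as asprintf(): the result is never truncated, whatever the length of the mount path.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: build "<mnt>/" in a heap buffer sized for the
 * result, instead of a fixed-size stack array, so a long mount path can
 * never be silently truncated.
 * Returns a malloc'ed string the caller must free(), or NULL on error. */
static char *make_work_dir(const char *mnt)
{
	int len = snprintf(NULL, 0, "%s/", mnt);	/* measure only */
	char *dir;

	if (len < 0)
		return NULL;
	dir = malloc((size_t)len + 1);			/* +1 for the NUL */
	if (!dir)
		return NULL;
	snprintf(dir, (size_t)len + 1, "%s/", mnt);
	return dir;
}
```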

Fixes: e42256699c ("bpf: make tc's bpf loader generic and move into lib")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-19 07:49:46 -07:00
Nicolas Dichtel 80d0e62673 link_xfrm: don't force to set phydev
Since linux commit 22d6552f827e ("xfrm interface: fix management of
phydev"), phydev is not mandatory anymore.

Even before the above commit, it could be useful not to force the
user to specify a phydev (the kernel checked it anyway).
For example, it is useful to omit it in the cross-netns case, because the
phydev is not available in the current netns:

Before the patch:
$ ip netns add foo
$ ip link add xfrm1 type xfrm dev eth1 if_id 1
$ ip link set xfrm1 netns foo
$ ip -n foo link set xfrm1 type xfrm dev eth1 if_id 2
Cannot find device "eth1"
$ ip -n foo link set xfrm1 type xfrm if_id 2
must specify physical device

Fixes: 286446c1e8 ("ip: support for xfrm interfaces")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Matt Ellison <matt@arroyo.io>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-09-17 17:26:21 +02:00
Andrea Claudi 6296d51825 man: ss.8: add documentation for drop counter
After commit 6df9c7a06a ("ss: add SK_MEMINFO_DROPS display"), ss -m
also displays a drop counter for each socket.

This commit documents it in the man page.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-09-17 17:25:25 +02:00
Mark Zhang 4e2d9fc4d8 rdma: Check comm string before print in print_comm()
Broken (non-upstream) kernels can provide an empty "comm" field,
which causes a segfault while printing in JSON format.

Fixes: 8ecac46a60 ("rdma: Add QP resource tracking information")
Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-09-17 17:17:38 +02:00
Jiri Pirko 9b13cddfe2 devlink: implement flash status monitoring
Listen to status notifications coming from kernel during flashing and
put them on stdout to inform user about the status.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-16 07:49:25 -07:00
Jiri Pirko 853be43f9e devlink: implement flash update status monitoring
Kernel sends notifications about flash update status, so implement these
messages for monitoring.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-16 07:49:24 -07:00
Dirk van der Merwe 850de16f12 devlink: unknown 'fw_load_policy' string validation
The 'fw_load_policy' devlink parameter now supports an unknown value.

Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-15 10:55:16 -07:00
Dirk van der Merwe c240e6748e devlink: add 'reset_dev_on_drv_probe' devlink param
Add support for the new devlink parameter along with string to uint
conversion.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-15 10:52:53 -07:00
David Dai 1157a6fc36 iproute2-next: police: support 64bit rate and peakrate in tc utility
For high-speed adapters like the Mellanox CX-5 card, bandwidth can reach up to
100 Gbits per second. htb already supports a 64bit rate
in the tc utility. However, the police action rate and peakrate are still
limited to 32bit values (up to 32 Gbits per second). Taking advantage of the
2 new attributes TCA_POLICE_RATE64 and TCA_POLICE_PEAKRATE64 from the kernel,
tc can use them to break the 32bit limit while keeping backward
binary compatibility.
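A minimal sketch of a saturate-and-extend encoding, assuming the legacy 32-bit rate field is pinned at its maximum whenever the real rate needs the new 64-bit attribute. encode_rate() and struct rate_encoding are hypothetical names, not tc's code. tc rates are in bytes per second, so 100 Gbit/s is 12,500,000,000 B/s, well above UINT32_MAX (4,294,967,295).

```c
#include <stdint.h>

/* Hypothetical result of encoding one police rate (or peakrate). */
struct rate_encoding {
	uint32_t legacy;	/* value placed in the 32-bit field */
	int need_rate64;	/* nonzero: also emit the 64-bit attribute */
};

static struct rate_encoding encode_rate(uint64_t rate)
{
	struct rate_encoding enc;

	if (rate >= (1ULL << 32)) {
		/* Does not fit: saturate the legacy field; a kernel that
		 * ignores the 64-bit attribute only sees this value. */
		enc.legacy = UINT32_MAX;
		enc.need_rate64 = 1;
	} else {
		enc.legacy = (uint32_t)rate;
		enc.need_rate64 = 0;
	}
	return enc;
}
```

Rates that fit in 32 bits produce exactly the same message as before, which is what keeps older binaries and kernels working.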

Tested-by: David Dai <zdai@linux.vnet.ibm.com>
Signed-off-by: David Dai <zdai@linux.vnet.ibm.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-15 10:39:19 -07:00
David Ahern 3d72f125c3 Update kernel headers
Update kernel headers to commit:
    aa2eaa8c272a ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-15 10:32:58 -07:00
David Ahern 2caa8012e8 nexthop: Add space after blackhole
A space is missing after 'blackhole'; add it to properly separate the
protocol when one is given.

Fixes: 63df8e8543 ("Add support for nexthop objects")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-09-04 12:05:43 -07:00
Andrea Claudi 4fb98f0895 devlink: fix segfault on health command
devlink segfaults when grace_period is used without a reporter name:

$ devlink health set pci/0000:00:09.0 grace_period 3500
Segmentation fault

devlink is instead supposed to fail gracefully, printing a warning
message:

$ devlink health set pci/0000:00:09.0 grace_period 3500
Reporter's name is expected.

This happens because DL_OPT_HEALTH_REPORTER_NAME and
DL_OPT_HEALTH_REPORTER_GRACEFUL_PERIOD are both defined as BIT(27).
When dl_opts_put() parses options and grace_period is set, it erroneously
tries to set the reporter name to null.

Fix this by shifting the bit enumeration up by one, starting with
DL_OPT_HEALTH_REPORTER_GRACEFUL_PERIOD.
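The failure mode is easy to reproduce with any bitmask. In this hypothetical excerpt (the OPT_* names and BIT() macro are illustrative, not devlink's definitions), two options sharing bit 27 make one look present whenever the other is set; a _Static_assert is one way to catch such overlaps at compile time.

```c
#include <stdint.h>

#define BIT(nr) (1ULL << (nr))

/* Before the fix: two options share bit 27. */
#define OPT_REPORTER_NAME_BUGGY	BIT(27)
#define OPT_GRACE_PERIOD_BUGGY	BIT(27)	/* collision */

/* After the fix: later options are shifted up by one bit. */
#define OPT_REPORTER_NAME	BIT(27)
#define OPT_GRACE_PERIOD	BIT(28)

/* Compile-time guard that would have caught the collision. */
_Static_assert((OPT_REPORTER_NAME & OPT_GRACE_PERIOD) == 0,
	       "option bits must not overlap");

/* Is option 'opt' marked as present in the parsed mask? */
static int has_opt(uint64_t present, uint64_t opt)
{
	return (present & opt) != 0;
}
```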

Fixes: b18d89195b ("devlink: Add devlink health set command")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-09-04 12:01:19 -07:00
Donald Sharp 84b9168328 ip nexthop: Allow flush|list operations to specify a specific protocol
When there is a large number of nexthops from a specific
protocol, allow the flush and list operations to take a protocol
to limit the scope of the commands.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-04 07:48:20 -07:00
David Ahern 1a5141715e Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-09-04 07:48:15 -07:00
Stephen Hemminger 98631f134d uapi: update bpf.h header
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-29 16:20:21 -07:00
David Ahern efaef0be95 Merge branch 'devlink-trap' into next
Ido Schimmel  says:

====================

From: Ido Schimmel <idosch@mellanox.com>

This patchset adds devlink-trap support in iproute2.

Patch #1 increases the number of options devlink can handle.

Patches #2-#3 gradually add support for all devlink-trap commands.

Patch #4 adds a man page for devlink-trap.

See individual commit messages for example usage and output.

Changes in v2:
* Remove report option and monitor command since monitoring is done
  using drop monitor

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-08-18 11:51:38 -07:00
Ido Schimmel a7a56f6f9d devlink: Add man page for devlink-trap
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-08-18 11:50:32 -07:00
Ido Schimmel 4ede9e9d56 devlink: Add devlink trap group set and show commands
These commands are similar to the trap set and show commands, but
operate on a trap group and not individual traps. Example:

# devlink trap group set netdevsim/netdevsim10 group l3_drops action trap
# devlink -jps trap group show netdevsim/netdevsim10 group l3_drops
{
    "trap_group": {
        "netdevsim/netdevsim10": [ {
                "name": "l3_drops",
                "generic": true,
                "stats": {
                    "rx": {
                        "bytes": 0,
                        "packets": 0
                    }
                }
            } ]
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-08-18 11:49:48 -07:00
Ido Schimmel ef12d6dafa devlink: Add devlink trap set and show commands
The trap set command allows the user to set the action of an individual
trap. Example:

# devlink trap set netdevsim/netdevsim10 trap blackhole_route action trap

The trap show command allows the user to get the current status of an
individual trap or a dump of all traps in case one is not specified.
When '-s' is specified the trap's statistics are shown. When '-v' is
specified the metadata types the trap can provide are shown. Example:

# devlink -jvps trap show netdevsim/netdevsim10 trap blackhole_route
{
    "trap": {
        "netdevsim/netdevsim10": [ {
                "name": "blackhole_route",
                "type": "drop",
                "generic": true,
                "action": "trap",
                "group": "l3_drops",
                "metadata": [ "input_port" ],
                "stats": {
                    "rx": {
                        "bytes": 0,
                        "packets": 0
                    }
                }
            } ]
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-08-18 11:49:27 -07:00
Ido Schimmel b83220db37 devlink: Increase number of supported options
Currently, the number of supported options is capped at 32 which is a
problem given we are about to add a few more and go over the limit.

Increase the limit to 64 options.
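Raising the cap means both the mask type and the shifted constant must be 64 bits wide; with a 32-bit int, 1 << nr is undefined for nr >= 31. A minimal sketch, with hypothetical option indexes and helper names:

```c
#include <stdint.h>

/* Shift a 64-bit constant so indexes up to 63 are representable. */
#define OPT_BIT(nr) (UINT64_C(1) << (nr))

/* Hypothetical option indexes, two of them past the old limit of 32. */
enum { OPT_FOO = 3, OPT_TRAP_NAME = 40, OPT_TRAP_ACTION = 41 };

/* Mark option 'nr' as present in the mask. */
static uint64_t opts_add(uint64_t mask, unsigned int nr)
{
	return mask | OPT_BIT(nr);
}

/* Test whether option 'nr' is present in the mask. */
static int opts_has(uint64_t mask, unsigned int nr)
{
	return (mask & OPT_BIT(nr)) != 0;
}
```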

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-08-18 11:48:52 -07:00
David Ahern e3af717a8d Update kernel headers
Update kernel headers to commit:
    d83d508b74c4 ("Merge branch 'stmmac-next'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-08-18 11:48:02 -07:00
David Ahern 7ad06c82e7 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-08-18 11:40:30 -07:00
Donald Sharp dba639a598 ip nexthop: Add space to display properly when showing a group
When displaying a nexthop group made up of other nexthops, the display
line shows this when you have additional data at the end:

id 42 group 43/44/45/46/47/48/49/50/51/52/53/54/55/56/57/58/59/60/61/62/63/64/65/66/67/68/69/70/71/72/73/74proto zebra

Modify code so that it shows:

id 42 group 43/44/45/46/47/48/49/50/51/52/53/54/55/56/57/58/59/60/61/62/63/64/65/66/67/68/69/70/71/72/73/74 proto zebra

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-15 13:21:15 -07:00
Stephen Hemminger 260dc56ae3 lib: fix spelling errors
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-12 18:21:10 -07:00
Stephen Hemminger 69df9bf981 tc: fix spelling errors
Minor spelling errors found by codespell

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-12 18:18:51 -07:00
Stephen Hemminger 42a66ee5f3 uapi: update socket.h
Upstream change to resolve gcc-9 issues.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-12 10:58:49 -07:00
Ido Schimmel 395370035e tc: Fix block-handle support for filter operations
The revert of batchsize accidentally reverted more than it should have
and broke shared block functionality. Fix this by restoring the
original functionality.

To reproduce:

	dst_ip 192.0.2.0/24 action drop
Unknown filter "block", hence option "10" is unparsable

Fixes: e991c04d64 ("Revert "tc: Add batchsize feature for filter and actions"")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-12 10:31:24 -07:00
Andrea Claudi a8360dd3f2 ip tunnel: add json output
Add json support on iptunnel and ip6tunnel.
The plain text output format should remain the same.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-08-07 12:00:58 -07:00
Gal Pressman 39307384ce rdma: Add driver QP type string
RDMA resource tracker now tracks driver QPs as well, add driver QP type
string to qp_types_to_str function.

Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-08-07 12:00:01 -07:00
David Ahern 74ddde9b5f Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-08-07 11:59:19 -07:00
Patrick Talbert 2d7cb22240 ss: sctp: Formatting tweak in sctp_show_info for locals
'locals' output does not include a leading space so it runs up against
skmem:() output. Add a leading space to fix it.

Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-06 08:50:11 -07:00
Patrick Talbert 18db049f6f ss: sctp: fix typo for nodelay
nodealy should be nodelay.

Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-06 08:49:53 -07:00
Jiri Pirko efd12cd2d9 devlink: finish queue.h to list.h transition
Lose the "q" from the names and name the structure fields the same
way the rest of the code does. Also, fix the list_add() argument order,
which led to a segfault.
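For reference, a minimal kernel-style list in the shape of the list.h API, written as a hypothetical stand-in rather than iproute2's header. list_add(new, head) takes the entry being inserted first; with the arguments swapped, the uninitialized entry is treated as the list head and its garbage next pointer is dereferenced, which is one way such a swap can crash.

```c
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* An empty list points at itself in both directions. */
static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Link 'new' between two adjacent entries 'prev' and 'next'. */
static void list_add_between(struct list_head *new, struct list_head *prev,
			     struct list_head *next)
{
	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
}

/* Insert 'new' right after 'head' (front of the list). Order matters:
 * the entry being added comes first, the existing list head second. */
static void list_add(struct list_head *new, struct list_head *head)
{
	list_add_between(new, head, head->next);
}

/* Recover the enclosing structure from its embedded list_head. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct item {
	int value;
	struct list_head list;	/* embedded link */
};
```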

Fixes: 33267017fa ("iproute2: devlink: port from sys/queue.h to list.h")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-05 11:35:23 -07:00
Stephen Hemminger 4dd599fdb8 tc: fflush after each command in batch mode
Restore behaviour of tc batch mode.
Flush stdout after each command.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-02 09:34:55 -07:00
Stephen Hemminger e991c04d64 Revert "tc: Add batchsize feature for filter and actions"
This reverts commit 485d0c6001.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-02 09:34:51 -07:00
Stephen Hemminger bfdda70d59 Revert "tc: fix batch force option"
This reverts commit b133392468.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-02 09:34:46 -07:00
Stephen Hemminger 350bc27cf3 Revert "tc: flush after each command in batch mode"
This reverts commit d66fdfda71.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-02 09:34:42 -07:00
Stephen Hemminger 11120881d9 Revert "tc: Remove pointless assignments in batch()"
This reverts commit 6358bbc381.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-02 09:34:36 -07:00
Yamin Friedman 432b21bec7 rdma: Document adaptive-moderation
Document how to set adaptive-moderation for an ib device.

Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-08-02 09:30:56 -07:00
Yamin Friedman 8a56ef325c rdma: Control CQ adaptive moderation (DIM)
In order to set adaptive-moderation for an ib device the command is:
rdma dev set [DEV] adaptive-moderation [on|off]

rdma dev show -d
0: mlx5_0: node_type ca fw 16.25.0319 node_guid 248a:0703:00a5:29d0
sys_image_guid 248a:0703:00a5:29d0 adaptive-moderation on
caps: <BAD_PKEY_CNTR, BAD_QKEY_CNTR, AUTO_PATH_MIG, CHANGE_PHY_PORT,
PORT_ACTIVE_EVENT, SYS_IMAGE_GUID, RC_RNR_NAK_GEN, MEM_WINDOW, XRC,
MEM_MGT_EXTENSIONS, BLOCK_MULTICAST_LOOPBACK, MEM_WINDOW_TYPE_2B,
RAW_IP_CSUM, CROSS_CHANNEL, MANAGED_FLOW_STEERING, SIGNATURE_HANDOVER,
ON_DEMAND_PAGING, SG_GAPS_REG, RAW_SCATTER_FCS, PCI_WRITE_END_PADDING>

rdma resource show cq
dev mlx5_0 cqn 0 cqe 1023 users 4 poll-ctx UNBOUND_WORKQUEUE
adaptive-moderation off comm [ib_core]

Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-08-02 09:30:56 -07:00
Stephen Hemminger 067925e2e1 json_print: drop extra semi-colons
The _PRINT_FUNC() macro expands to a function definition.
A trailing semicolon is unnecessary and causes warnings with -pedantic.
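A small illustration, assuming a macro that expands to a complete function definition in the style of _PRINT_FUNC(). DEFINE_PRINTER and the generated print_int()/print_uint() are hypothetical names, not the json_print code. The closing brace already ends each definition, so a trailing ';' would be a stray empty declaration at file scope, which gcc reports under -pedantic.

```c
#include <stdio.h>

/* Expands to a whole static function definition. */
#define DEFINE_PRINTER(name, type, fmt)			\
	static int print_##name(FILE *fp, type value)	\
	{						\
		return fprintf(fp, fmt, value);		\
	}

DEFINE_PRINTER(int, int, "%d")		/* no trailing semicolon */
DEFINE_PRINTER(uint, unsigned int, "%u")
```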

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-29 08:45:32 -07:00
Kurt Kanzenbach c875433b14 utils: Fix get_s64() function
get_s64() internally uses strtoll() to parse the value out of a given
string. strtoll() returns a long long. However, the intermediate variable
is only a long, which might be 32 bits on some systems. Fix it.
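A sketch of the corrected pattern, assuming a hypothetical parse_s64() with the same shape as get_s64() (output pointer, string, base, 0 on success). It keeps strtoll()'s result in a long long and rejects empty input, trailing characters and out-of-range values.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parser: a long would truncate on ILP32 targets, so the
 * intermediate result is a long long, matching strtoll()'s return type. */
static int parse_s64(int64_t *val, const char *arg, int base)
{
	long long res;
	char *end;

	if (!arg || !*arg)
		return -1;		/* empty input */

	errno = 0;
	res = strtoll(arg, &end, base);
	if (end == arg || *end != '\0')
		return -1;		/* no digits, or trailing junk */
	if (errno == ERANGE)
		return -1;		/* out of long long range */
	if (res < INT64_MIN || res > INT64_MAX)
		return -1;		/* only possible if long long > 64 bits */

	*val = res;
	return 0;
}
```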

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-29 08:44:20 -07:00
Stephen Hemminger ab45d91d6a iplink: document 'change' option to ip link
Add the command alias "change" to man page.
Don't show it on usage, since it is not commonly used.

Reported-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Matteo Croce <mcroce@redhat.com>
2019-07-29 08:43:24 -07:00
Antonio Borneo 36e584ad8a iplink_can: fix format output of clock with flag -details
The command
	ip -details link show can0
prints in the last line the value of the clock frequency attached
to the name of the following value "numtxqueues", e.g.
	clock 49500000numtxqueues 1 numrxqueues 1 gso_max_size
	 65536 gso_max_segs 65535

Add the missing space after the clock value.

Signed-off-by: Antonio Borneo <borneo.antonio@gmail.com>
2019-07-26 15:05:20 -07:00
Sergei Trofimovich 33267017fa iproute2: devlink: port from sys/queue.h to list.h
sys/queue.h does not exist on linux-musl targets, so the build fails with:

    devlink.c:28:10: fatal error: sys/queue.h: No such file or directory
       28 | #include <sys/queue.h>
          |          ^~~~~~~~~~~~~

This change ports devlink to the list.h API and drops the dependency on
'sys/queue.h'. The API maps one-to-one.

Build-tested on linux-musl and linux-glibc.

Bug: https://bugs.gentoo.org/690486
CC: Stephen Hemminger <stephen@networkplumber.org>
CC: netdev@vger.kernel.org
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-26 15:05:20 -07:00
Stephen Hemminger b89d6202c9 uapi: update kernel headers from 5.3-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-22 09:45:09 -07:00
Mark Zhang ca084842da rdma: Document counter statistic
Document how to access QP counters, including binding/unbinding a QP
to a counter manually or automatically, and dumping counter statistics.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-19 10:51:13 -07:00
Mark Zhang a7137e517f rdma: Add default counter show support
Show default counter statistics, which are the same as those exposed through
the sysfs interface: /sys/class/infiniband/<dev>/ports/<port>/hw_counters/

Example:
$ rdma stat show link mlx5_2/1
link mlx5_2/1 rx_write_requests 8 rx_read_requests 4 rx_atomic_requests 0
out_of_buffer 0 out_of_sequence 0 duplicate_request 0 rnr_nak_retry_err 0
packet_seq_err 0 implied_nak_seq_err 0 local_ack_timeout_err 0
resp_local_length_error 0 resp_cqe_error 0 req_cqe_error 0
req_remote_invalid_request 0 req_remote_access_errors 0
resp_remote_access_errors 0 resp_cqe_flush_error 0 req_cqe_flush_error 0
rp_cnp_ignored 0 rp_cnp_handled 0 np_ecn_marked_roce_packets 0
np_cnp_sent 0 rx_icrc_encapsulated 0

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-19 10:51:13 -07:00
Mark Zhang a6d0773ebe rdma: Add stat manual mode support
In manual mode, a QP can be manually bound to a counter. If the counter
id (cntn) is not specified, the kernel will allocate one. After a
successful bind, the cntn can be seen through "rdma statistic qp show".
When unbinding, if lqpn is not specified, all QPs on this counter will
be unbound.
Manual and auto modes are mutually exclusive.

Examples:
$ rdma statistic qp bind link mlx5_2/1 lqpn 178
$ rdma statistic qp bind link mlx5_2/1 lqpn 178 cntn 4
$ rdma statistic qp unbind link mlx5_2/1 cntn 4
$ rdma statistic qp unbind link mlx5_2/1 cntn 4 lqpn 178

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-19 10:51:13 -07:00
Mark Zhang cbe10b4e44 rdma: Make get_port_from_argv() returns valid port in strict port mode
When strict_port is set, make get_port_from_argv() return failure if
no valid port is specified.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-19 10:51:13 -07:00
Mark Zhang 887fc739eb rdma: Add rdma statistic counter per-port auto mode support
With per-QP statistic counter support, a user can monitor
specific categories of QPs, which are bound to/unbound from dynamically
allocated/deallocated counters.

In per-port "auto" mode, QPs are bound to counters automatically
according to common criteria. For example, with a per-"type" (qp type)
scheme, all QPs of the same qp type in each process are bound
automatically to a single counter.
Currently only "type" (qp type) is supported. Examples:

$ rdma statistic qp set link mlx5_2/1 auto type on
$ rdma statistic qp set link mlx5_2/1 auto off

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-19 10:51:13 -07:00
Mark Zhang 1b2ca7ada7 rdma: Add get per-port counter mode support
Add an interface to show which mode is active. Two modes are supported:
- "auto": In this mode, all QPs belonging to one category are bound automatically
  to a single counter set. Currently only "qp type" is supported;
- "manual": In this mode, QPs are bound to a counter manually.

Examples:
$ rdma statistic qp mode
0/1: mlx5_0/1: qp auto off
1/1: mlx5_1/1: qp auto off
2/1: mlx5_2/1: qp auto type on
3/1: mlx5_3/1: qp auto off

$ rdma statistic qp mode link mlx5_0
0/1: mlx5_0/1: qp auto off

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-19 10:51:13 -07:00
Mark Zhang 5937552b42 rdma: Add "stat qp show" support
This patch presents the link, id, task name, lqpn, as well as all
sub-counters of a QP counter.
A QP counter is a dynamically allocated statistic counter that is
bound to one or more QPs. It has several sub-counters, each
used for a different purpose.

Examples:
$ rdma stat qp show
link mlx5_2/1 cntn 5 pid 31609 comm client.1 rx_write_requests 0
rx_read_requests 0 rx_atomic_requests 0 out_of_buffer 0 out_of_sequence 0
duplicate_request 0 rnr_nak_retry_err 0 packet_seq_err 0
implied_nak_seq_err 0 local_ack_timeout_err 0 resp_local_length_error 0
resp_cqe_error 0 req_cqe_error 0 req_remote_invalid_request 0
req_remote_access_errors 0 resp_remote_access_errors 0
resp_cqe_flush_error 0 req_cqe_flush_error 0
    LQPN: <178>
$ rdma stat show link rocep1s0f5/1
link rocep1s0f5/1 rx_write_requests 0 rx_read_requests 0 rx_atomic_requests 0 out_of_buffer 0 duplicate_request 0
rnr_nak_retry_err 0 packet_seq_err 0 implied_nak_seq_err 0 local_ack_timeout_err 0 resp_local_length_error 0 resp_cqe_error 0
req_cqe_error 0 req_remote_invalid_request 0 req_remote_access_errors 0 resp_remote_access_errors 0 resp_cqe_flush_error 0
req_cqe_flush_error 0 rp_cnp_ignored 0 rp_cnp_handled 0 np_ecn_marked_roce_packets 0 np_cnp_sent 0
$ rdma stat show link rocep1s0f5/1 -p
link rocep1s0f5/1
    rx_write_requests 0
    rx_read_requests 0
    rx_atomic_requests 0
    out_of_buffer 0
    duplicate_request 0
    rnr_nak_retry_err 0
    packet_seq_err 0
    implied_nak_seq_err 0
    local_ack_timeout_err 0
    resp_local_length_error 0
    resp_cqe_error 0
    req_cqe_error 0
    req_remote_invalid_request 0
    req_remote_access_errors 0
    resp_remote_access_errors 0
    resp_cqe_flush_error 0
    req_cqe_flush_error 0
    rp_cnp_ignored 0
    rp_cnp_handled 0
    np_ecn_marked_roce_packets 0
    np_cnp_sent 0

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-19 10:51:13 -07:00
Stephen Hemminger 51a8f9f8fb uapi: fix bpf comment typo
From upstream.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-19 10:49:36 -07:00
Ivan Delalande ed54f76484 json: fix backslash escape typo in jsonw_puts
Fixes: fcc16c22 ("provide common json output formatter")
Signed-off-by: Ivan Delalande <colona@arista.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-19 10:48:38 -07:00
Vedang Patel a794d05237 tc: taprio: Update documentation
Add documentation for the latest options, flags and txtime-delay, to the
taprio manpage.

This also adds an example to run tc in txtime offload mode.

Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 15:47:07 -07:00
Vedang Patel 1738a16de9 tc: etf: Add documentation for skip_sock_check.
Document the newly added option (skip_sock_check) on the etf man-page.

Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 15:47:02 -07:00
Vedang Patel a5e6ee3b34 taprio: add support for setting txtime_delay.
This adds support for setting the txtime_delay parameter which is useful
for the txtime offload mode of taprio.

Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 15:46:36 -07:00
Vinicius Costa Gomes ee000bf217 taprio: Add support for setting flags
This allows a new parameter, flags, to be passed to taprio. Currently, it
only supports enabling the txtime-assist mode. But, we plan to add
different modes for taprio (e.g. hardware offloading) and this parameter
will be useful in enabling those modes.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 15:46:31 -07:00
Vedang Patel d9114263d0 etf: Add skip_sock_check
The ETF Qdisc currently checks that a packet has a socket with the SO_TXTIME
socket option set. If either is not present, the packet is dropped. In future
commits, we want other Qdiscs to add packets with a launch time to the ETF
Qdisc. Also, some packets (e.g. ICMP packets) may not have a socket
associated with them. So, add an option to skip this check.
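The admission rule described above can be modelled in a few lines. This is a hypothetical simplification, not the kernel's ETF code: struct sock_info and etf_admits() are illustrative, and the qdisc's other checks (such as validating the launch time itself) are omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-socket state relevant to the check. */
struct sock_info {
	bool so_txtime;		/* SO_TXTIME enabled on the socket */
};

/* Socket-related part of the admission check only. */
static bool etf_admits(const struct sock_info *sk, bool skip_sock_check)
{
	if (skip_sock_check)
		return true;	/* e.g. packets with no socket, such as ICMP */
	return sk != NULL && sk->so_txtime;
}
```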

Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 15:44:21 -07:00
David Ahern 86545eaffb Merge branch 'tc-conntrack' into next
Paul Blakey  says:

====================

This patch series adds connection tracking capabilities to tc.
It does so via a new tc action, called act_ct, and new tc flower classifier matches.
Act ct and the relevant flower matches are still under review on the net-next mailing list.

Usage is as follows:
$ tc qdisc add dev ens1f0_0 ingress
$ tc qdisc add dev ens1f0_1 ingress

$ tc filter add dev ens1f0_0 ingress \
  prio 1 chain 0 proto ip \
  flower ip_proto tcp ct_state -trk \
  action ct zone 2 pipe \
  action goto chain 2
$ tc filter add dev ens1f0_0 ingress \
  prio 1 chain 2 proto ip \
  flower ct_state +trk+new \
  action ct zone 2 commit mark 0xbb nat src addr 5.5.5.7 pipe \
  action mirred egress redirect dev ens1f0_1
$ tc filter add dev ens1f0_0 ingress \
  prio 1 chain 2 proto ip \
  flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \
  action ct nat pipe \
  action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_1 ingress \
  prio 1 chain 0 proto ip \
  flower ip_proto tcp ct_state -trk \
  action ct zone 2 pipe \
  action goto chain 1
$ tc filter add dev ens1f0_1 ingress \
  prio 1 chain 1 proto ip \
  flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \
  action ct nat pipe \
  action mirred egress redirect dev ens1f0_0

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 15:42:13 -07:00
Paul Blakey 2fffb1c030 tc: flower: Add matching on conntrack info
Matches on conntrack state, zone, mark, and label.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 15:41:30 -07:00
Paul Blakey c8a494314c tc: Introduce tc ct action
New tc action to send packets to the conntrack module, commit
them, and set a zone, labels, mark, and nat on the connection.

It can also clear the packet's conntrack state by using clear.

Usage:
   ct clear
   ct commit [force] [zone] [mark] [label] [nat]
   ct [nat] [zone]

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 15:41:02 -07:00
David Ahern f47081beff Import tc_act/tc_ct.h uapi file
Import include/uapi/linux/tc_act/tc_ct.h header from commit of last
kernel headers sync.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 15:40:07 -07:00
Paul Blakey 18aa9f5583 tc: add NLA_F_NESTED flag to all actions options nested block
Strict netlink validation now requires this flag on all nested
attributes; add it for action options.
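
A minimal sketch of what the flag means on the wire, assuming the bit layout of <linux/netlink.h>: NLA_F_NESTED is bit 15 of nla_type, and a receiver masks it off to recover the attribute type. In iproute2 the change amounts to OR-ing NLA_F_NESTED into the type used when the TCA_ACT_OPTIONS nest is opened. The DEMO_* macros and helper names below are illustrative only.

```c
#include <stdint.h>

/* Bit layout as in <linux/netlink.h>, redefined to stay self-contained. */
#define DEMO_NLA_F_NESTED        (1 << 15)
#define DEMO_NLA_F_NET_BYTEORDER (1 << 14)
#define DEMO_NLA_TYPE_MASK       (~(DEMO_NLA_F_NESTED | DEMO_NLA_F_NET_BYTEORDER))

/* Type to store in nla_type when opening a nested block. */
static uint16_t nested_type(uint16_t type)
{
	return (uint16_t)(type | DEMO_NLA_F_NESTED);
}

/* Receiver side: strict validation checks the flag is present... */
static int is_nested(uint16_t nla_type)
{
	return (nla_type & DEMO_NLA_F_NESTED) != 0;
}

/* ...and strips the flag bits to get the plain attribute type. */
static uint16_t attr_type(uint16_t nla_type)
{
	return (uint16_t)(nla_type & DEMO_NLA_TYPE_MASK);
}
```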

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 15:38:09 -07:00
Andrea Claudi ee713339d3 tunnel: factorize printout of GRE key and flags
The print_tunnel() functions in ip6tunnel.c and iptunnel.c contain
the same code to print out the GRE key and flags.

This commit factors that code out into a helper function in tunnel.c.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-18 10:19:47 -07:00
Andrea Claudi d035cc1b4e ip tunnel: warn when changing IPv6 tunnel without tunnel name
Tunnel change fails if a tunnel name is not specified while using
'ip -6 tunnel change'. However, no warning message is printed and
no error code is returned.

$ ip -6 tunnel add ip6tnl1 mode ip6gre local fd::1 remote fd::2 tos inherit ttl 127 encaplimit none dev dummy0
$ ip -6 tunnel change dev dummy0 local 2001:1234::1 remote 2001:1234::2
$ ip -6 tunnel show ip6tnl1
ip6tnl1: gre/ipv6 remote fd::2 local fd::1 dev dummy0 encaplimit none hoplimit 127 tclass inherit flowlabel 0x00000 (flowinfo 0x00000000)

This commit checks whether the tunnel interface name is an empty
string and, if so, prints a warning message to the user.
It intentionally avoids returning an error so as not to break
existing scripts.

This is the output after this commit:
$ ip -6 tunnel add ip6tnl1 mode ip6gre local fd::1 remote fd::2 tos inherit ttl 127 encaplimit none dev dummy0
$ ip -6 tunnel change dev dummy0 local 2001:1234::1 remote 2001:1234::2
Tunnel interface name not specified
$ ip -6 tunnel show ip6tnl1
ip6tnl1: gre/ipv6 remote fd::2 local fd::1 dev dummy0 encaplimit none hoplimit 127 tclass inherit flowlabel 0x00000 (flowinfo 0x00000000)

Reviewed-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-16 12:26:21 -07:00
Andrea Claudi ad04dbc5b4 Revert "ip6tunnel: fix 'ip -6 {show|change} dev <name>' cmds"
This reverts commit ba126dcad2.
It breaks tunnel creation when using 'dev' parameter:

$ ip link add type dummy
$ ip -6 tunnel add ip6tnl1 mode ip6ip6 remote 2001:db8:ffff:100::2 local 2001:db8:ffff:100::1 hoplimit 1 tclass 0x0 dev dummy0
add tunnel "ip6tnl0" failed: File exists

The dev parameter must be used to specify the device to which
the tunnel is bound, not the tunnel itself.

Reported-by: Jianwen Ji <jiji@redhat.com>
Reviewed-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-16 12:25:05 -07:00
Stephen Hemminger 78d3832335 uapi: rdma netlink.h update
From upstream 5.3-rc

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-16 11:58:44 -07:00
Stephen Hemminger 03dafe13f4 uapi: update uapi/magic.h
From upstream

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-16 11:56:58 -07:00
Aya Levin f359942a25 devlink: Remove enclosing array brackets binary print with json format
Keep pr_out_binary_value function only for printing. Inner relations
like array grouping should be done outside the function.

Fixes: 844a61764c ("devlink: Add helper functions for name and value separately")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-15 13:50:55 -07:00
Aya Levin 1d05cca2fd devlink: Fix binary values print
Fix function pr_out_binary_value() to start printing the binary buffer
from offset 0 instead of offset 1. Remove the redundant newline at the
beginning of the output.
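
The corrected behaviour can be sketched as a small formatter that prints hex bytes starting at offset 0, 16 per line, with no leading newline. format_hex_lines() is a hypothetical helper that writes into a caller-supplied buffer rather than the devlink output path. It is not the actual pr_out_binary_value() code.

```c
#include <stddef.h>
#include <stdio.h>

/* Format len bytes as hex, 16 per line, starting at offset 0 and with no
 * leading newline. Returns the number of characters written (excluding
 * the terminator), or -1 if out is too small. */
static int format_hex_lines(char *out, size_t outlen,
			    const unsigned char *buf, size_t len)
{
	size_t pos = 0;

	if (outlen == 0)
		return -1;
	out[0] = '\0';
	for (size_t i = 0; i < len; i++) {
		/* newline after every 16th byte and after the last one */
		const char *sep = ((i + 1) % 16 == 0 || i + 1 == len) ? "\n" : " ";
		int n = snprintf(out + pos, outlen - pos, "%02x%s", buf[i], sep);

		if (n < 0 || (size_t)n >= outlen - pos)
			return -1;
		pos += (size_t)n;
	}
	return (int)pos;
}
```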

Example:
With patch:
 mlx5e_txqsq:
   05 00 00 00 05 00 00 00 01 00 00 00 00 00 00 00
   00 00 00 00 00 00 00 00 8e 6e 3a 13 07 00 00 00
   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   c0
Without patch:
  mlx5e_txqsq:

  00 00 00 05 00 00 00 01 00 00 00 00 00 00 00 00
  00 00 00 00 00 00 00 8e 6e 3a 13 07 00 00 00 00
  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c0

Fixes: 844a61764c ("devlink: Add helper functions for name and value separately")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-15 13:50:55 -07:00
Aya Levin b4d97ef57f devlink: Change devlink health dump show command to dumpit
Although devlink health dump show command is given per reporter, it
returns large amounts of data. Trying to use the doit cb results in
OUT-OF-BUFFER error. This complementary patch raises the DUMP flag in
order to invoke the dumpit cb. We're safe as no existing drivers
implement the dump health reporter option yet.
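
Which callback the kernel invokes is selected by the request flags. Here is a minimal sketch, assuming the flag values from <linux/netlink.h>, redefined here with a DEMO_ prefix. A doit request gets a single reply that must fit in one message, while NLM_F_DUMP makes the kernel return a multipart stream. request_flags() is a hypothetical helper, and pairing doit with NLM_F_ACK is only a typical example.

```c
#include <stdint.h>

/* Request flag values as in <linux/netlink.h>. */
#define DEMO_NLM_F_REQUEST 0x01
#define DEMO_NLM_F_ACK     0x04
#define DEMO_NLM_F_ROOT    0x100
#define DEMO_NLM_F_MATCH   0x200
#define DEMO_NLM_F_DUMP    (DEMO_NLM_F_ROOT | DEMO_NLM_F_MATCH)

/* doit: one reply message, so a large answer can run out of buffer.
 * dumpit: the kernel streams the answer as a multipart sequence. */
static uint16_t request_flags(int want_dump)
{
	return want_dump ? DEMO_NLM_F_REQUEST | DEMO_NLM_F_DUMP
			 : DEMO_NLM_F_REQUEST | DEMO_NLM_F_ACK;
}
```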

Fixes: 041e6e651a ("devlink: Add devlink health dump show command")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-15 13:50:55 -07:00
Matteo Croce 1f420318bd utils: don't match empty strings as prefixes
iproute has a utility function which checks whether a string is a prefix of
another one, to allow the use of abbreviated commands, e.g. 'addr' or 'a'
instead of 'address'.

This routine unfortunately considers an empty string as prefix
of any pattern, leading to undefined behaviour when an empty
argument is passed to ip:

    # ip ''
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever

    # tc ''
    qdisc noqueue 0: dev lo root refcnt 2

    # ip address add 192.0.2.0/24 '' 198.51.100.1 dev dummy0
    # ip addr show dev dummy0
    6: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether 02:9d:5e:e9:3f:c0 brd ff:ff:ff:ff:ff:ff
        inet 192.0.2.0/24 brd 198.51.100.1 scope global dummy0
           valid_lft forever preferred_lft forever

Rewrite matches() so that it handles an empty input and doesn't
scan the input strings three times: the current implementation
does two strlen() calls and a memcmp() to accomplish the same task.
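
A single-pass prefix check that rejects the empty string can be sketched as follows. It keeps the iproute2 convention, seen in calls like matches(*argv, "address") == 0, of returning 0 on a match. The name prefix_mismatch() is hypothetical, and this is not claimed to be the exact code of the commit.

```c
/* Return 0 when prefix is a non-empty prefix of string, non-zero
 * otherwise, scanning both strings only once. */
static int prefix_mismatch(const char *prefix, const char *string)
{
	if (*prefix == '\0')
		return 1;              /* an empty argument matches nothing */
	while (*string != '\0' && *prefix == *string) {
		prefix++;
		string++;
	}
	return *prefix != '\0';        /* 0 only if the whole prefix was consumed */
}
```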

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-15 13:48:48 -07:00
Andrea Claudi 6bc13e4a20 tc: util: constrain percentage in 0-100 interval
parse_percent() currently allows specifying negative percentages
or values above 100%. However, this does not make sense,
as the function is used for probabilities or bandwidth rates.

Moreover, using negative values leads to erroneous results
(using Bernoulli loss model as example):

$ ip link add test type dummy
$ ip link set test up
$ tc qdisc add dev test root netem loss gemodel -10% limit 10
$ tc qdisc show dev test
qdisc netem 800c: root refcnt 2 limit 10 loss gemodel p 90% r 10% 1-h 100% 1-k 0%

Using values above 100% we have instead:

$ ip link add test type dummy
$ ip link set test up
$ tc qdisc add dev test root netem loss gemodel 140% limit 10
$ tc qdisc show dev test
qdisc netem 800f: root refcnt 2 limit 10 loss gemodel p 40% r 60% 1-h 100% 1-k 0%

This commit adds a check to parse_percent() to ensure
percentage values stay between 0.0 and 1.0.
The parse_percent_rate() function, which already employs a similar
check, is adjusted accordingly.
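
A minimal sketch of the range-checked parser follows. It assumes the contract described above: the input is "NN%" or "NN", and the result is a fraction between 0.0 and 1.0. The name parse_percent_sketch() and its return codes (-1 for trailing garbage, 1 for out of range) are illustrative, not the tc_util API.

```c
#include <stdlib.h>
#include <string.h>

/* Parse "NN%" (or a bare "NN") into a fraction in [0.0, 1.0].
 * Returns 0 on success, -1 on bad input, 1 if out of range. */
static int parse_percent_sketch(double *val, const char *str)
{
	char *end;
	double v = strtod(str, &end) / 100.0;

	if (end == str)
		return -1;                        /* no number at all */
	if (*end != '\0' && strcmp(end, "%") != 0)
		return -1;                        /* trailing garbage */
	if (v < 0.0 || v > 1.0)
		return 1;                         /* e.g. -10% or 140% */
	*val = v;
	return 0;
}
```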

With this check in place, we have:

$ ip link add test type dummy
$ ip link set test up
$ tc qdisc add dev test root netem loss gemodel -10% limit 10
Illegal "loss gemodel p"

Fixes: 927e3cfb52 ("tc: B.W limits can now be specified in %.")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-15 13:45:59 -07:00
Stephen Hemminger fda6f26e9b uapi: fix bpf.h link
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-11 15:36:29 -07:00
Stephen Hemminger d5ddb441a5 tc: print all error messages to stderr
Many tc modules were printing error messages to stdout.
This is problematic if using JSON or other output formats.
Change all these places to use fprintf(stderr, ...) instead.

Also, remove unnecessary initialization and places
where else is used after error return.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-11 15:35:07 -07:00
David Ahern 1f250b6c53 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-10 14:41:13 -07:00
David Ahern 25c567bdcd Merge branch 'tc-mpls-action' into next
John Hurley says:

====================

Recent kernel additions to TC allow the manipulation of MPLS headers as
filter actions.

The following patchset creates an iproute2 interface to the new actions
and includes documentation on how to use it.

v1->v2:
- change error from print_string() to fprintf(stderr, ...) (Stephen Hemminger)
- split long line in explain() message (David Ahern)
- use _SL_ instead of \n in print message (David Ahern)

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-10 14:07:42 -07:00
John Hurley 3b810b3b9a man: update man pages for TC MPLS actions
Add a man page describing the newly added TC mpls manipulation actions.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-10 14:06:36 -07:00
John Hurley fb57b0920f tc: add mpls actions
Create a new action type for TC that allows the pushing, popping, and
modifying of MPLS headers.
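
For reference, the header these actions push, pop and modify is the 32-bit MPLS label stack entry of RFC 3032: a 20-bit label, a 3-bit traffic class, a 1-bit bottom-of-stack flag and an 8-bit TTL. The helpers below are a hypothetical sketch that packs and unpacks an entry in host byte order. On the wire the entry is big-endian.

```c
#include <stdint.h>

/* RFC 3032 label stack entry: label(20) | TC(3) | S(1) | TTL(8). */
static uint32_t mpls_lse_pack(uint32_t label, uint8_t tc, int bos, uint8_t ttl)
{
	return ((label & 0xFFFFFu) << 12) |
	       ((uint32_t)(tc & 0x7u) << 9) |
	       ((uint32_t)(bos ? 1u : 0u) << 8) |
	       ttl;
}

static uint32_t mpls_lse_label(uint32_t lse) { return lse >> 12; }
static uint8_t  mpls_lse_tc(uint32_t lse)    { return (lse >> 9) & 0x7; }
static int      mpls_lse_bos(uint32_t lse)   { return (lse >> 8) & 0x1; }
static uint8_t  mpls_lse_ttl(uint32_t lse)   { return lse & 0xFF; }
```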

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-10 14:06:32 -07:00
John Hurley 11d7087a4e lib: add mpls_uc and mpls_mc as link layer protocol names
Update the llproto_names array to allow users to reference the mpls
protocol ids with the names 'mpls_uc' for unicast MPLS and 'mpls_mc' for
multicast.
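
A minimal sketch of the name-to-id lookup follows. It assumes the standard EtherType values 0x8847 for MPLS unicast (ETH_P_MPLS_UC) and 0x8848 for MPLS multicast (ETH_P_MPLS_MC). The table and mpls_proto_a2n() are illustrative. The real llproto_names array covers many more protocols.

```c
#include <stddef.h>
#include <string.h>

struct llproto_name {
	unsigned int id;
	const char *name;
};

/* EtherTypes, same values as ETH_P_MPLS_UC / ETH_P_MPLS_MC. */
static const struct llproto_name mpls_protos[] = {
	{ 0x8847, "mpls_uc" },	/* MPLS unicast */
	{ 0x8848, "mpls_mc" },	/* MPLS multicast */
};

/* Look up a protocol name; returns 0 and sets *id on success, -1 otherwise. */
static int mpls_proto_a2n(unsigned int *id, const char *name)
{
	for (size_t i = 0; i < sizeof(mpls_protos) / sizeof(mpls_protos[0]); i++) {
		if (strcmp(mpls_protos[i].name, name) == 0) {
			*id = mpls_protos[i].id;
			return 0;
		}
	}
	return -1;
}
```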

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-10 14:06:28 -07:00
David Ahern 1827694858 Import tc_mpls.h uapi header
Import tc_mpls.h uapi header from kernel headers at commit:
        1ff2f0fa450e ("net/mlx5e: Return in default case statement in tx_post_resync_params")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-10 14:05:19 -07:00
Parav Pandit 056399399a devlink: Introduce PCI PF and VF port flavour and attribute
Introduce PCI PF and VF port flavour and port attributes such as PF
number and VF number.

$ devlink port show
pci/0000:05:00.0/0: type eth netdev eth0 flavour pcipf pfnum 0
pci/0000:05:00.0/1: type eth netdev eth1 flavour pcivf pfnum 0 vfnum 0
pci/0000:05:00.0/2: type eth netdev eth2 flavour pcivf pfnum 0 vfnum 1

Signed-off-by: Parav Pandit <parav@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-10 13:57:19 -07:00
Vincent Bernat eb105e0d94 ip: bond: add peer notification delay support
Ability to tweak the delay between gratuitous ND/ARP packets has been
added in kernel commit 07a4ddec3ce9 ("bonding: add an option to
specify a delay between peer notifications"), through
IFLA_BOND_PEER_NOTIF_DELAY attribute. Add support to set and show this
value.

Example:

    $ ip -d link set bond0 type bond peer_notify_delay 1000
    $ ip -d link l dev bond0
    2: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue
    state UP mode DEFAULT group default qlen 1000
        link/ether 50:54:33:00:00:01 brd ff:ff:ff:ff:ff:ff
        bond mode active-backup active_slave eth0 miimon 100 updelay 0
    downdelay 0 peer_notify_delay 1000 use_carrier 1 arp_interval 0
    arp_validate none arp_all_targets any primary eth0
    primary_reselect always fail_over_mac active xmit_hash_policy
    layer2 resend_igmp 1 num_grat_arp 5 all_slaves_active 0 min_links
    0 lp_interval 1 packets_per_slave 1 lacp_rate slow ad_select
    stable tlb_dynamic_lb 1 addrgenmode eu

Signed-off-by: Vincent Bernat <vincent@bernat.ch>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-10 13:54:09 -07:00
David Ahern 01db6c4174 Update kernel headers
Update kernel headers to commit:
    1ff2f0fa450e ("net/mlx5e: Return in default case statement in tx_post_resync_params")

import include/uapi/linux/const.h per new dependency in
include/uapi/linux/pkt_cls.h.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-10 13:52:48 -07:00
Roman Mashak 26a49de4db tc: document 'mask' parameter in skbedit man page
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-09 17:31:16 -07:00
Roman Mashak 82f3df2028 tc: added mask parameter in skbedit action
Add the missing 32-bit mask attribute to iproute2/tc; it has long
been supported on the kernel side.
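
The effect of the mask on the packet mark can be sketched as below. This assumes the kernel keeps the bits outside the mask and takes the bits inside it from the configured mark. skbedit_apply_mark() is a hypothetical name.

```c
#include <stdint.h>

/* Bits selected by mask come from mark; all other bits of cur are kept. */
static uint32_t skbedit_apply_mark(uint32_t cur, uint32_t mark, uint32_t mask)
{
	return (cur & ~mask) | (mark & mask);
}
```

With a mask of 0xffffffff this reduces to plain mark assignment, which is what the action did before the mask was exposed.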

v2: print value in hex with print_hex() as suggested by Stephen Hemminger.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-09 17:31:16 -07:00
Andrea Claudi 89ce8012d7 ip-route: fix json formatting for metrics
Setting metrics for routes currently leads to unparsable
JSON output. For example:

$ ip link add type dummy
$ ip route add 192.168.2.0 dev dummy0 metric 100 mtu 1000 rto_min 3
$ ip -j route | jq
parse error: ':' not as part of an object at line 1, column 319

Fix this by opening a JSON object in the metrics array and using
print_string() instead of fprintf().
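
The shape of the fix, a JSON object opened inside the metrics array, can be sketched with plain snprintf(). iproute2 itself uses its json_print helpers for this. format_metrics_json() and struct metric are hypothetical and only show why the key/value pairs must sit inside {} to be valid JSON.

```c
#include <stddef.h>
#include <stdio.h>

struct metric {
	const char *name;
	unsigned int value;
};

/* Emit "metrics":[{"k":v,...}]: the pairs live in an object that is the
 * array element. Returns the length written, or -1 if out is too small. */
static int format_metrics_json(char *out, size_t outlen,
			       const struct metric *m, size_t n)
{
	size_t pos;
	int w = snprintf(out, outlen, "\"metrics\":[{");

	if (w < 0 || (size_t)w >= outlen)
		return -1;
	pos = (size_t)w;
	for (size_t i = 0; i < n; i++) {
		w = snprintf(out + pos, outlen - pos, "%s\"%s\":%u",
			     i ? "," : "", m[i].name, m[i].value);
		if (w < 0 || (size_t)w >= outlen - pos)
			return -1;
		pos += (size_t)w;
	}
	w = snprintf(out + pos, outlen - pos, "}]");   /* close object, array */
	if (w < 0 || (size_t)w >= outlen - pos)
		return -1;
	return (int)(pos + (size_t)w);
}
```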

This is the output for the above commands applying this patch:

$ ip -j route | jq
[
  {
    "dst": "192.168.2.0",
    "dev": "dummy0",
    "scope": "link",
    "metric": 100,
    "flags": [],
    "metrics": [
      {
        "mtu": 1000,
        "rto_min": 3
      }
    ]
  }
]

Fixes: 663c3cb231 ("iproute: implement JSON and color output")
Fixes: 968272e791 ("iproute: refactor metrics print")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reported-by: Frank Hofmann <fhofmann@cloudflare.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-09 17:30:06 -07:00
Parav Pandit 2eb23f3e7a devlink: Show devlink port number
Show devlink port number whenever kernel reports that attribute.

An example output for a physical port.
$ devlink port show
pci/0000:06:00.1/65535: type eth netdev eth1_p1 flavour physical port 1

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-09 14:56:14 -07:00
David Ahern a257456f96 ss: Change resolve_services to numeric
Commit ca697cee4c ("ip: add a new parameter -Numeric") changed
!resolve_services to numeric in ss.c.

A commit in master:
  d791e75d74 ("ss: in --numeric mode, print raw numbers for data rates")
added another reference to !resolve_services. Convert it to numeric.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-09 14:54:34 -07:00
David Ahern 830ac9abe6 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-07-09 14:26:44 -07:00
Stephen Hemminger af2583437e v5.2.0 2019-07-08 11:09:59 -07:00
Andrea Claudi 90f0b587d8 tc: netem: fix r parameter in Bernoulli loss model
As the man page for tc netem states:

    To use the Bernoulli model, the only needed parameter is p while the
    others will be set to the default values r=1-p, 1-h=1 and 1-k=0.

However, the r parameter is erroneously set to 1 instead of 1-p.
Fix this using the same approach as the 4-state loss model.
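
Why r = 1-p: with 1-h = 1 (always drop in the bad state) and 1-k = 0 (never drop in the good state), the chain enters the bad state with probability p from either state. The next state is therefore independent of the current one, so each packet is lost independently with probability p, which is exactly the Bernoulli model. Here is a minimal sketch with probabilities as doubles. netem itself uses scaled 32-bit integers. struct ge_params and bernoulli_defaults() are hypothetical.

```c
/* Gilbert-Elliott parameters as probabilities in [0, 1]. */
struct ge_params {
	double p;            /* transition good -> bad */
	double r;            /* transition bad -> good */
	double one_minus_h;  /* loss probability in the bad state */
	double one_minus_k;  /* loss probability in the good state */
};

/* Bernoulli model: only p is given; the rest take the documented
 * defaults r = 1-p, 1-h = 1, 1-k = 0. */
static struct ge_params bernoulli_defaults(double p)
{
	struct ge_params ge = {
		.p = p,
		.r = 1.0 - p,        /* the bug was setting this to 1 */
		.one_minus_h = 1.0,
		.one_minus_k = 0.0,
	};
	return ge;
}
```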

Fixes: 3c7950af59 ("netem: add support for 4 state and GE loss model")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-08 08:17:22 -07:00
Tomasz Torcz d791e75d74 ss: in --numeric mode, print raw numbers for data rates
ss by default shows data rates in human-readable form - as Mbps/Gbps etc.
Enhance --numeric mode to show raw values in bps, without conversion.
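
A sketch of the two output modes, assuming rates are held as doubles in bits per second. format_rate() and its unit thresholds are illustrative, not the ss source.

```c
#include <stddef.h>
#include <stdio.h>

/* Format a rate in bits per second: raw in numeric mode, scaled otherwise. */
static void format_rate(char *buf, size_t len, double bps, int numeric)
{
	if (numeric)
		snprintf(buf, len, "%.0fbps", bps);        /* no unit conversion */
	else if (bps >= 1e9)
		snprintf(buf, len, "%.1fGbps", bps / 1e9);
	else if (bps >= 1e6)
		snprintf(buf, len, "%.1fMbps", bps / 1e6);
	else if (bps >= 1e3)
		snprintf(buf, len, "%.1fKbps", bps / 1e3);
	else
		snprintf(buf, len, "%.0fbps", bps);
}
```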

Signed-off-by: Tomasz Torcz <tomasz.torcz@nordea.com>

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-08 08:16:23 -07:00
Andrea Claudi c95e17dcba man: tc-netem.8: fix URL for netem page
The URL for the netem page in the sources section points to a resource
that no longer exists. Fix this by using the correct URL.

Fixes: cd72dcf13c ("netem: add man-page")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-02 17:44:51 -07:00
Denis Kirjanov 0f48f9f46a ipaddress: correctly print a VF hw address in the IPoIB case
The current code assumes that it prints an Ethernet MAC address,
which doesn't work in the IPoIB case with SR-IOV-enabled hardware.
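
The underlying issue is printing the VF address with a fixed Ethernet length. Here is a minimal sketch of a length-aware formatter. It assumes the caller passes the link's address length: 6 bytes for Ethernet, 20 for IPoIB, as in the infiniband addresses shown in the output below. format_hwaddr() is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Print a hardware address of arbitrary length as colon-separated hex.
 * Returns buf, or NULL if the buffer is too small. */
static char *format_hwaddr(char *buf, size_t buflen,
			   const unsigned char *addr, size_t alen)
{
	size_t pos = 0;

	if (buflen == 0)
		return NULL;
	buf[0] = '\0';
	for (size_t i = 0; i < alen; i++) {
		int n = snprintf(buf + pos, buflen - pos,
				 i ? ":%02x" : "%02x", addr[i]);

		if (n < 0 || (size_t)n >= buflen - pos)
			return NULL;          /* would truncate */
		pos += (size_t)n;
	}
	return buf;
}
```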

Before:
11: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 256
        link/infiniband
80:00:00:66:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a4:3e:7c brd
00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
        vf 0 MAC 14:80:00:00:66:fe, spoof checking off, link-state
disable,
    trust off, query_rss off
    ...

After:
11: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 256
        link/infiniband
80:00:00:66:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a4:3e:7c brd
00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
        vf 0     link/infiniband
80:00:00:66:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a4:3e:7c brd
00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff, spoof
checking off, link-state disable, trust off, query_rss off

v1->v2: updated kernel headers to uapi commit
v2->v3: fixed alignment
v3->v4: aligned print statements as used through the source

Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
[ committer note: flipped argument order for print_vfinfo to keep fp first
  and fixed alignment issues ]
2019-06-28 16:20:12 -07:00
David Ahern ea985eb42d Update kernel headers
Update kernel headers to commit:
    5cdda5f1d6ad ("ipv4: enable route flushing in network namespaces")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-28 16:14:25 -07:00
Andrea Claudi 1e5746d5e1 utils: move parse_percent() to tc_util
Move parse_percent() to tc_util, as it is used only in tc.

This reduces ip, bridge and genl binaries size:

$ bloat-o-meter -t bridge/bridge bridge/bridge.new
add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-109 (-109)
Total: Before=50973, After=50864, chg -0.21%

$ bloat-o-meter -t genl/genl genl/genl.new
add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-109 (-109)
Total: Before=30298, After=30189, chg -0.36%

$ bloat-o-meter ip/ip ip/ip.new
add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-109 (-109)
Total: Before=674164, After=674055, chg -0.02%

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-28 16:06:26 -07:00
Hoang Le 1df17579d8 tipc: support interface name when activating UDP bearer
Support specifying the name of an interface that has an IP address, as an
alternative to specifying the IP address itself, when activating a UDP
bearer. This frees the user from keeping track of the current IP address
of each device.

Old command syntax:
$tipc bearer enable media udp name NAME localip IP

New command syntax:
$tipc bearer enable media udp name NAME [localip IP|dev DEVICE]

v2:
    - Removed initial value for fd
    - Fixed the return value of cmd_bearer_validate_and_get_addr
      to make it consistent with its usage: zero or non-zero
v3: - Switch to use helper 'get_ifname' to retrieve interface name
v4: - Replace legacy SIOCGIFADDR by netlink
v5: - Fix leaky rtnl_handle

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-28 16:03:16 -07:00
Baruch Siach d0272f5404 devlink: fix libc and kernel headers collision
Since commit 2f1242efe9 ("devlink: Add devlink health show command") we
use the sys/sysinfo.h header for the sysinfo(2) system call. But since
iproute2 carries a local version of the kernel struct sysinfo, this
causes a collision with C libraries that do not rely on the kernel-defined
sysinfo, such as musl libc:

In file included from devlink.c:25:0:
.../sysroot/usr/include/sys/sysinfo.h:10:8: error: redefinition of 'struct sysinfo'
 struct sysinfo {
        ^~~~~~~
In file included from ../include/uapi/linux/kernel.h:5:0,
                 from ../include/uapi/linux/netlink.h:5,
                 from ../include/uapi/linux/genetlink.h:6,
                 from devlink.c:21:
../include/uapi/linux/sysinfo.h:8:8: note: originally defined here
 struct sysinfo {
        ^~~~~~~

Move the sys/sysinfo.h userspace header before kernel headers, and
suppress the indirect include of linux/sysinfo.h.

Cc: Aya Levin <ayal@mellanox.com>
Cc: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-28 15:20:00 -07:00
Baruch Siach ee09370a72 devlink: fix format string warning for 32bit targets
32bit targets define uint64_t as long long unsigned. This leads to the
following build warning:

devlink.c: In function ‘pr_out_u64’:
devlink.c:1729:11: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘uint64_t {aka long long unsigned int}’ [-Wformat=]
    pr_out("%s %lu", name, val);
           ^
devlink.c:59:21: note: in definition of macro ‘pr_out’
   fprintf(stdout, ##args);   \
                     ^~~~

Use uint64_t specific conversion specifiers in the format string to fix
that.
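
A minimal sketch of the portable form follows. <inttypes.h> provides PRIu64, which expands to the correct conversion for uint64_t on both 32-bit and 64-bit targets. format_u64() is a hypothetical stand-in for pr_out_u64() that formats into a buffer.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* PRIu64 expands to "llu" or "lu" as appropriate for the target. */
static int format_u64(char *buf, size_t len, const char *name, uint64_t val)
{
	return snprintf(buf, len, "%s %" PRIu64, name, val);
}
```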

Cc: Aya Levin <ayal@mellanox.com>
Cc: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-28 15:20:00 -07:00
Andrea Claudi 68c46872ce ip address: do not set mngtmpaddr option for IPv4 addresses
The 'mngtmpaddr' option makes the kernel manage temporary addresses
created from the specified one as a template on behalf of Privacy
Extensions (RFC 3041). This option should be available only for
IPv6 addresses, as correctly stated in the manpage.

However it is possible to set mngtmpaddr on IPv4 addresses, too:

$ ip link add dummy0 type dummy
$ ip -4 addr add 192.168.1.1 dev dummy0 mngtmpaddr
$ ip a
1: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
   link/ether 1a:6d:c6:96:ca:f8 brd ff:ff:ff:ff:ff:ff
   inet 192.168.1.1/32 scope global mngtmpaddr dummy0
      valid_lft forever preferred_lft forever

Fix this by adding a check on the protocol family before setting
IFA_F_MANAGETEMPADDR flag.

Fixes: 5b7e21c417 ("add support for IFA_F_MANAGETEMPADDR")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-28 15:18:28 -07:00
Andrea Claudi e4448b6c7d ip address: do not set home option for IPv4 addresses
The 'home' option designates an IPv6 address as a "home address" as
defined in RFC 6275. This option should be available only for
IPv6 addresses, as correctly stated in the manpage.

However it is possible to set home on IPv4 addresses, too:

$ ip link add dummy0 type dummy
$ ip -4 addr add 192.168.1.1 dev dummy0 home
$ ip a
1: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
   link/ether 1a:6d:c6:96:ca:f8 brd ff:ff:ff:ff:ff:ff
   inet 192.168.1.1/32 scope global home dummy0
      valid_lft forever preferred_lft forever

Fix this by adding a check on the protocol family before setting
IFA_F_HOMEADDRESS flag.

Fixes: bac735c53a ("enabled to manipulate the flags of IFA_F_HOMEADDRESS or IFA_F_NODAD from ip.")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-28 15:18:28 -07:00
Andrea Claudi 8ae99cc46d ip address: do not set nodad option for IPv4 addresses
Duplicate Address Detection (RFC 4862) is available only for IPv6
addresses. As a consequence, the 'nodad' option, which turns it off, should
be available only for IPv6, and is documented that way in the man page.

However it is possible to set nodad on IPv4 addresses, too:

$ ip link add dummy0 type dummy
$ ip -4 addr add 192.168.1.1 dev dummy0 nodad
$ ip a
1: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
   link/ether 1a:6d:c6:96:ca:f8 brd ff:ff:ff:ff:ff:ff
   inet 192.168.1.1/32 scope global nodad dummy0
      valid_lft forever preferred_lft forever

Fix this by adding a check on the protocol family before setting
IFA_F_NODAD flag.
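
The three fixes above (mngtmpaddr, home, nodad) follow the same pattern. Here is a minimal sketch of it. It assumes the address family is already known when the option is parsed, and it uses the flag values from <linux/if_addr.h>, redefined with a DEMO_ prefix. set_ipv6_only_flag() and its warning text are hypothetical.

```c
#include <stdio.h>
#include <sys/socket.h>

/* Flag values as in <linux/if_addr.h>. */
#define DEMO_IFA_F_NODAD          0x02
#define DEMO_IFA_F_HOMEADDRESS    0x10
#define DEMO_IFA_F_MANAGETEMPADDR 0x100

/* Set an IPv6-only address flag only for AF_INET6; otherwise warn and
 * return the flags unchanged. */
static unsigned int set_ipv6_only_flag(int family, unsigned int flags,
				       unsigned int flag, const char *opt)
{
	if (family == AF_INET6)
		return flags | flag;
	fprintf(stderr, "Warning: %s option can be set only for IPv6 addresses\n",
		opt);
	return flags;
}
```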

Fixes: bac735c53a ("enabled to manipulate the flags of IFA_F_HOMEADDRESS or IFA_F_NODAD from ip.")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-28 15:18:28 -07:00
Stefano Brivio b5cf263670 iproute: Set flags and attributes on dump to get IPv6 cached routes to be flushed
With a current (5.1) kernel version, IPv6 exception routes can't be listed
(ip -6 route list cache) or flushed (ip -6 route flush cache). Kernel
support for this is being added back. Relevant net-next commits:

  564c91f7e563 fib_frontend, ip6_fib: Select routes or exceptions dump from RTM_F_CLONED
  ef11209d4219 Revert "net/ipv6: Bail early if user only wants cloned entries"
  3401bfb1638e ipv6/route: Don't match on fc_nh_id if not set in ip6_route_del()
  bf9a8a061ddc ipv6/route: Change return code of rt6_dump_route() for partial node dumps
  1e47b4837f3b ipv6: Dump route exceptions if requested
  40cb35d5dc04 ip6_fib: Don't discard nodes with valid routing information in fib6_locate_1()

However, to allow the kernel to filter routes based on the RTM_F_CLONED
flag, we need to make sure this flag is always passed when we want cached
routes to be dumped, and we can also pass table and output interface
attributes to have the kernel filtering on them, if requested by the user.

Use the existing iproute_dump_filter() as a filter for the dump request in
iproute_flush(). This way, 'ip -6 route flush cache' works again.

v2: Instead of creating a separate 'filter' function dealing with
    RTM_F_CLONED only, use the existing iproute_dump_filter() and get
    table and oif kernel filtering for free. Suggested by David Ahern.

Fixes: aba5acdfdb ("(Logical change 1.3)")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-26 14:27:00 -07:00
Hangbin Liu 5a403866f3 ip/iptoken: fix dump error when ipv6 disabled
When IPv6 is disabled at boot (ipv6.disable=1), there is
no IPv6 route info in the dump message. If we return -1 when
ifi->ifi_family != AF_INET6, we get an error like

$ ip token list
Dump terminated

which confuses the user. There is no need to return -1 if the
dump message does not match; returning 0 is enough.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-26 14:23:12 -07:00
Stephen Hemminger f799505372 devlink: replace print macros with functions
Using functions is safer, and printing is not performance
critical.
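
A minimal sketch of such a replacement follows. It assumes GCC or Clang for the format attribute, which keeps printf-style argument checking on the wrapper. Unlike a macro, the wrapper is a single well-defined function with typed parameters. pr_out_fn() is a hypothetical name, not the function added by the commit.

```c
#include <stdarg.h>
#include <stdio.h>

/* The attribute lets the compiler check fmt against the arguments. */
static int pr_out_fn(const char *fmt, ...) __attribute__((format(printf, 1, 2)));

/* Variadic function forwarding to vfprintf(); returns characters written. */
static int pr_out_fn(const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vfprintf(stdout, fmt, ap);
	va_end(ap);
	return n;
}
```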

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-26 09:18:18 -07:00
Eyal Birger bfa757e02f tc: adjust xtables_match and xtables_target to changes in recent iptables
iptables commit 933400b37d09 ("nft: xtables: add the infrastructure to translate from iptables to nft")
added an additional member to struct xtables_match and struct xtables_target.

This change is available for libxtables12 and up.
Add these members conditionally to support both newer and older versions.

Fixes: dd29621578 ("tc: add em_ipt ematch for calling xtables matches from tc matching context")
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-24 16:12:17 -07:00
David Ahern f7eef91897 Merge branch 'master' into next
Conflicts:
	include/uapi/linux/snmp.h

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-21 15:59:24 -07:00
Jakub Kicinski b3cf1167e7 tc: q_netem: JSON-ify the output
Add JSON output support to q_netem.

The normal output is untouched.

In JSON output always use seconds as the base of time units,
and non-percentage numbers (0.01 instead of 1%). Try to always
report the fields, even if they are zero.
All this should make the output more machine-friendly.

v2: fewer macros

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-21 15:51:35 -07:00
Nicolas Dichtel 6d77d9c6ae ip monitor: display interfaces from all groups
Only interfaces from group 0 were displayed.

ip monitor calls ipaddr_reset_filter() and there is no reason to not reset
the filter group in this function.

Fixes: c4fdf75d3d ("ip link: fix display of interface groups")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-21 12:59:50 -07:00
Matteo Croce b2e2922373 netns: make netns_{save,restore} static
The netns_{save,restore} functions are only used in ipnetns.c now, since
the restore is not needed anymore after the netns exec command.
Move them into ipnetns.c and make them static.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-20 14:30:41 -07:00
Matteo Croce d81d4ba15d ip vrf: use hook to change VRF in the child
On vrf exec, reset the VRF associations in the child process, via the
new hook added to cmd_exec(). In this way, the parent doesn't have to
reset the VRF associations before spawning other processes.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-20 14:30:41 -07:00
Matteo Croce 903818fbf9 netns: switch netns in the child when executing commands
'ip netns exec' changes the current netns just before executing a child
process, and restores it after forking. This is needed if we're running
in batch or do_all mode.
Some cleanups must be done both in the parent and in the child: the
parent must restore the previous netns, while the child must reset any
VRF association.
Unfortunately, if do_all is set, the VRF associations are not reset in the child, and
the spawned processes are started with the wrong VRF context. This can
be triggered with this script:

	# ip -b - <<-'EOF'
		link add type vrf table 100
		link set vrf0 up
		link add type dummy
		link set dummy0 vrf vrf0 up
		netns add ns1
	EOF
	# ip -all -b - <<-'EOF'
		vrf exec vrf0 true
		netns exec setsid -f sleep 1h
	EOF
	# ip vrf pids vrf0
	  314  sleep
	# ps 314
	  PID TTY      STAT   TIME COMMAND
	  314 ?        Ss     0:00 sleep 1h

Refactor cmd_exec() and pass to it a function pointer which is called in
the child before the final exec. In the netns exec case the function just
resets the VRF and switches netns.

Doing it in the child is less error prone and safer, because the parent
environment is always kept unaltered.
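
Here is a minimal POSIX sketch of the pattern: fork, run a setup hook in the child only, then exec. run_with_child_hook() and chdir_hook() are hypothetical. Here chdir() merely stands in for "reset VRF and switch netns", which would need privileges. If the hook fails, the child exits without running the command, and the parent's state is never touched.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, call setup(arg) in the child, then exec argv. Returns the child's
 * exit status, or -1 on fork/wait failure or abnormal termination. */
static int run_with_child_hook(char *const argv[],
			       int (*setup)(void *arg), void *arg)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		if (setup && setup(arg) < 0)
			_exit(1);        /* hook failed: do not exec */
		execvp(argv[0], argv);
		_exit(127);              /* exec failed */
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Example hook: changes directory in the child only. */
static int chdir_hook(void *dir)
{
	return chdir((const char *)dir);
}
```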

After this refactor some utility functions became unused, so remove them.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-20 14:30:41 -07:00
Pete Morici b16f525323 Add support for configuring MACsec gcm-aes-256 cipher type.
Signed-off-by: Pete Morici <pmorici@dev295.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-18 09:55:51 -07:00
Andrea Claudi 8063feebba Makefile: use make -C
make provides a handy -C option to change directory before reading
the makefiles or doing anything else.

Use that instead of the "cd dir && make && cd .." pattern, thus
simplifying syntax for some makefiles.

Changes from v1:
- Drop an obviously wrong leftover in testsuite/iproute2/Makefile

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-18 09:52:58 -07:00
Stephen Hemminger 77a380379f uapi: update headers and add if_link.h and if_infiniband.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-18 09:48:21 -07:00
Michael Forney 578cadcc68 ipmroute: Prevent overlapping storage of `filter` global
This variable has the same name as `struct xfrm_filter filter` in
ip/ipxfrm.c, but overrides that definition since `struct rtfilter`
is larger.

This is visible when built with -Wl,--warn-common in LDFLAGS:

	/usr/bin/ld: ipxfrm.o: warning: common of `filter' overridden by larger common from ipmroute.o

Signed-off-by: Michael Forney <mforney@mforney.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-18 09:43:29 -07:00
Hangbin Liu ca697cee4c ip: add a new parameter -Numeric
Add a new parameter '-Numeric' to show protocol, scope, dsfield, etc.
as raw numbers instead of converting them to human-readable names.
Do the same on tc and ss.
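The effect of -Numeric can be illustrated with a small lookup helper that falls back to the raw number when the flag is set or when no name is known. This is a sketch with a hypothetical rtprot_str() and a deliberately tiny table; the numeric values are the RTPROT_* constants from <linux/rtnetlink.h>, but iproute2's real lookup code is not reproduced here.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <linux/rtnetlink.h>

/* Hypothetical, partial name table indexed by route protocol number. */
static const char *const rtprot_names[] = {
	[RTPROT_REDIRECT] = "redirect",
	[RTPROT_KERNEL]   = "kernel",
	[RTPROT_BOOT]     = "boot",
	[RTPROT_STATIC]   = "static",
};

/* Return a printable protocol: its name, or the number itself when
 * 'numeric' is set (as with -Numeric) or no name is known. */
static const char *rtprot_str(unsigned int id, bool numeric,
			      char *buf, size_t len)
{
	size_t n = sizeof(rtprot_names) / sizeof(rtprot_names[0]);

	if (!numeric && id < n && rtprot_names[id])
		return rtprot_names[id];
	snprintf(buf, len, "%u", id);
	return buf;
}
```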

This patch is based on David Ahern's previous patch.

Suggested-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-18 08:37:47 -07:00
David Ahern e92d221022 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-14 07:29:40 -07:00
David Ahern 82cdb4d445 tools: Fix include path for generate_nlmsg
Compile of tools directory fails with:

make -C tools
    CC       generate_nlmsg
../../lib/libnetlink.c:28:27: fatal error: linux/nexthop.h: No such file or directory
 #include <linux/nexthop.h>
                           ^
compilation terminated.

Add local uapi to build path.

Fixes: 74829ca7dd ("libnetlink: Add helper to create nexthop dump request")
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-14 06:50:55 -07:00
Andrea Claudi 41bf0c69c0 Makefile: use make -C to change directory
make provides a handy -C option to change directory before reading
the makefiles or doing anything else.

Use that instead of the "cd dir && make && cd .." pattern, thus
simplifying syntax for some makefiles.

Changes from v1:
- Drop an obviously wrong leftover in testsuite/iproute2/Makefile

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-and-tested-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-14 06:44:39 -07:00
Stephen Hemminger b0a09ace39 testsuite: indent if/else in Makefile
Indent both arms of if/else equally.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-12 08:48:33 -07:00
Moshe Shemesh c934da8aaa devlink: mnlg: Catch returned error value of dumpit commands
Devlink commands which implement the dumpit callback may return an error.
The netlink function netlink_dump() sends the errno value as the payload
of the message, while answering user space with NLMSG_DONE.
To enable receiving errno value for dumpit commands we have to check for
it in the message. If it is a negative value then the dump returned an
error so we should set errno accordingly and check for ext_ack in case
it was set.
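The check described above can be sketched as follows: per the commit, netlink_dump() places the dump's return value in the NLMSG_DONE payload, so a negative int there is a negated errno. The helper name check_done_msg() is hypothetical; this is not the mnlg code itself, and it leaves out the ext_ack handling the commit also mentions.

```c
#include <errno.h>
#include <string.h>
#include <linux/netlink.h>

/* Returns 0 if the dump finished cleanly (or nlh is not NLMSG_DONE),
 * or -1 with errno set if the kernel reported an error in the payload. */
static int check_done_msg(const struct nlmsghdr *nlh)
{
	int err;

	if (nlh->nlmsg_type != NLMSG_DONE)
		return 0;
	if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(int)))
		return 0;	/* no payload: nothing to report */
	memcpy(&err, NLMSG_DATA(nlh), sizeof(err));
	if (err < 0) {
		errno = -err;	/* payload holds a negated errno */
		return -1;
	}
	return 0;
}
```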

Fixes: 049c58539f ("devlink: mnlg: Add support for extended ack")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-12 08:43:14 -07:00
David Ahern 2357abbbfa Merge branch 'nexthop-objects' into next
David Ahern  says:

====================

This set adds support for nexthop objects to the ip command. The syntax
for nexthop objects is identical to the current 'ip route .. nexthop ...'
syntax making it easy to convert existing use cases.

v2
- Fixed header use in rtnl_nexthopdump_req as noted by roopa
- made rth_del static per Stephen's request and fixed coding style
- removed print_nh_gateway and exported print_rta_gateway to reuse
  the iproute.c code (keeps consistency in output)
- added examples to commit message
- fixed monitor use when specific groups requested
- fixed usage in 'ip nexthop'
- added manpage

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-11 10:32:07 -07:00
David Ahern e7cd93e7af ipmonitor: Add nexthop option to monitor
Add capability to ip-monitor to listen and dump nexthop messages.
Since the nexthop group is 32, which exceeds the max groups bit
field, 2 separate flags are needed: one that defaults on to indicate
the nexthop group is joined by default and a second that indicates a
specific selection by the user (e.g., ip mon nexthop route).

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-11 10:31:30 -07:00
David Ahern 12387e2c14 ip route: Add option to use nexthop objects
Add nhid option for routes to use nexthop objects by id.

Example:
  $ ip nexthop add id 1 via 10.99.1.2 dev veth1
  $ ip route add 10.100.1.0/24 nhid 1
  $ ip route ls
  ...
  10.100.1.0/24 nhid 1 via 10.99.1.2 dev veth1

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-11 10:31:28 -07:00
David Ahern 42cce67e71 ip: Add man page for nexthop command
Document 'ip nexthop' options in a man page with a few examples.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-11 10:31:06 -07:00
David Ahern 63df8e8543 Add support for nexthop objects
Add nexthop subcommand to ip. Implement basic commands for creating,
deleting and dumping nexthop objects. Syntax follows 'nexthop' syntax
from existing 'ip route' command.

Examples:
1. Single path
    $ ip nexthop add id 1 via 10.99.1.2 dev veth1
    $ ip nexthop ls
    id 1 via 10.99.1.2 src 10.99.1.1 dev veth1 scope link

2. ECMP
    $ ip nexthop add id 2 via 10.99.3.2 dev veth3
    $ ip nexthop add id 1001 group 1/2
      --> creates a nexthop group with 2 component nexthops:
          id 1 and id 2, both with the same weight

    $ ip nexthop ls
    id 1 via 10.99.1.2 src 10.99.1.1 dev veth1 scope link
    id 2 via 10.99.3.2 src 10.99.3.1 dev veth3 scope link
    id 1001 group 1/2

3. Weighted multipath
    $ ip nexthop add id 1002 group 1,10/2,20
      --> creates a nexthop group with 2 component nexthops:
          id 1 with a weight of 10 and id 2 with a weight of 20

    $ ip nexthop ls
    id 1 via 10.99.1.2 src 10.99.1.1 dev veth1 scope link
    id 2 via 10.99.3.2 src 10.99.3.1 dev veth3 scope link
    id 1001 group 1/2
    id 1002 group 1,10/2,20
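The group syntax in the examples above (id[,weight] entries separated by '/') can be parsed along these lines. This is a hypothetical parse_nh_group(), not the ip nexthop parser; defaulting a missing weight to 1 and rejecting a weight of 0 are assumptions of this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

struct nh_entry {
	unsigned int id;
	unsigned int weight;
};

/* Parse "id[,weight]/id[,weight]/..." into out[0..max-1].
 * Returns the number of entries, or -1 on a syntax error or overflow. */
static int parse_nh_group(const char *spec, struct nh_entry *out, int max)
{
	char buf[256], *save = NULL, *tok;
	int n = 0;

	if (strlen(spec) >= sizeof(buf))
		return -1;
	strcpy(buf, spec);	/* strtok_r() modifies its input */

	for (tok = strtok_r(buf, "/", &save); tok;
	     tok = strtok_r(NULL, "/", &save)) {
		char *end;
		unsigned long id, w = 1;	/* assumed default weight */

		if (n == max)
			return -1;
		id = strtoul(tok, &end, 10);
		if (end == tok)
			return -1;
		if (*end == ',') {
			char *wstr = end + 1;

			w = strtoul(wstr, &end, 10);
			if (end == wstr || w == 0)
				return -1;
		}
		if (*end != '\0')
			return -1;
		out[n].id = (unsigned int)id;
		out[n].weight = (unsigned int)w;
		n++;
	}
	return n;
}
```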

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-11 10:30:58 -07:00
David Ahern 48a1e96d90 ip route: Export print_rt_flags, print_rta_if and print_rta_gateway
Export print_rt_flags and print_rta_if for use by the nexthop
command.

Change print_rta_gateway to take the family versus rtmsg struct and
export for use by the nexthop command.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-11 10:30:55 -07:00
David Ahern 74829ca7dd libnetlink: Add helper to create nexthop dump request
Add rtnl_nexthopdump_req to initiate a dump request of nexthop objects.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-11 10:30:53 -07:00
David Ahern 10631938f1 uapi: Import nexthop object API
Add nexthop.h from kernel with the uapi for nexthop objects.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-11 10:30:50 -07:00
David Ahern 9860becfe3 libnetlink: Add helper to add a group via setsockopt
Groups > 31 have to be joined using setsockopt(). Since the nexthop
group is 32, add a helper to allow 'ip monitor' to listen for nexthop
messages.
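The two ways of subscribing can be sketched as follows: low-numbered groups go into the bind()-time sockaddr_nl.nl_groups bitmask, while (following the rule stated in the commit) groups above 31, such as the nexthop group 32, are joined with setsockopt(SOL_NETLINK, NETLINK_ADD_MEMBERSHIP). The helper names nl_group_bit() and nl_join_group() are hypothetical and are not libnetlink's API.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270		/* value from the Linux UAPI */
#endif

/* Bit for a legacy bind()-time group in sockaddr_nl.nl_groups.
 * Only groups 1..31 are expressed this way here; 0 means
 * "join it with nl_join_group() instead". */
static uint32_t nl_group_bit(unsigned int group)
{
	if (group == 0 || group > 31)
		return 0;
	return 1u << (group - 1);
}

/* Join any multicast group, including ones above 31, on an already
 * open NETLINK_ROUTE socket. Returns 0, or -1 with errno set. */
static int nl_join_group(int fd, unsigned int group)
{
	return setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
			  &group, sizeof(group));
}
```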

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-11 10:30:48 -07:00
David Ahern 7392401027 lwtunnel: Pass encap and encap_type attributes to lwt_parse_encap
lwt_parse_encap currently assumes the encap attribute is RTA_ENCAP
and the type is RTA_ENCAP_TYPE. Change lwt_parse_encap to take these
as input arguments for reuse by nexthop code which has the attributes
as NHA_ENCAP and NHA_ENCAP_TYPE.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-11 10:30:46 -07:00
David Ahern 2360b8cb21 libnetlink: Set NLA_F_NESTED in rta_nest
Kernel now requires NLA_F_NESTED to be set on new nested
attributes. Set NLA_F_NESTED in rta_nest.
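A minimal sketch of what setting NLA_F_NESTED at nest time looks like, using plain struct rtattr from the kernel UAPI headers. The helpers nest_begin(), put_u32() and nest_end() are hypothetical stand-ins, not iproute2's rta_nest(); they do no bounds checking, so the caller must size the buffer.

```c
#include <stddef.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Open a nested attribute at offset *len in buf and flag it with
 * NLA_F_NESTED, which the commit says the kernel now requires. */
static struct rtattr *nest_begin(char *buf, size_t *len, unsigned short type)
{
	struct rtattr *nest = (struct rtattr *)(buf + *len);

	nest->rta_type = type | NLA_F_NESTED;
	nest->rta_len = RTA_LENGTH(0);
	*len += RTA_ALIGN(nest->rta_len);
	return nest;
}

/* Append a u32 attribute at offset *len. */
static void put_u32(char *buf, size_t *len, unsigned short type, __u32 val)
{
	struct rtattr *rta = (struct rtattr *)(buf + *len);

	rta->rta_type = type;
	rta->rta_len = RTA_LENGTH(sizeof(val));
	memcpy(RTA_DATA(rta), &val, sizeof(val));
	*len += RTA_ALIGN(rta->rta_len);
}

/* Close the nest: its length now covers everything appended after it. */
static void nest_end(char *buf, size_t len, struct rtattr *nest)
{
	nest->rta_len = (unsigned short)((buf + len) - (char *)nest);
}
```

The receiver strips the flag with NLA_TYPE_MASK to recover the plain attribute type.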

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-11 10:30:39 -07:00
Mahesh Bandewar ba126dcad2 ip6tunnel: fix 'ip -6 {show|change} dev <name>' cmds
Inclusion of 'dev' is allowed by the syntax but not handled
correctly by the command. The show command produces no output,
and the change command reports success but does not make any
changes.

This can be verified with the following steps:
  # ip -6 tunnel add ip6tnl1 mode ip6gre local fd::1 remote fd::2 tos inherit ttl 127 encaplimit none
  # ip -6 tunnel show ip6tnl1
  <correct output>
  # ip -6 tunnel show dev ip6tnl1
  <no output but correct output after this change>
  # ip -6 tunnel change dev ip6tnl1 local 2001:1234::1 remote 2001:1234::2 encaplimit none ttl 127 tos inherit allow-localremote
  # echo $?
  0
  # ip -6 tunnel show ip6tnl1
  <no changes applied, but changes are correctly applied after this change>

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-10 10:43:09 -07:00
Matteo Croce 80a931d41c ip: reset netns after each command in batch mode
When creating a new netns or executing a program into an existing one,
the unshare() or setns() calls will change the current netns.
In batch mode, this can run commands on the wrong interfaces, as the
ifindex value is meaningful only in the current netns. For example, this
command fails because veth-c doesn't exist in the init netns:

    # ip -b - <<-'EOF'
        netns add client
        link add name veth-c type veth peer veth-s netns client
        addr add 192.168.2.1/24 dev veth-c
    EOF
    Cannot find device "veth-c"
    Command failed -:7

But if there are two devices with the same name in the init and new netns,
ip will build a wrong ll_map with indexes belonging to the new netns,
and will execute actions in the init netns using this wrong mapping.
This script will flush all eth0 addresses and bring it down, as it has
the same ifindex as veth0 in the new netns:

    # ip addr
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
        link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
        inet 192.168.122.76/24 brd 192.168.122.255 scope global dynamic eth0
           valid_lft 3598sec preferred_lft 3598sec

    # ip -b - <<-'EOF'
        netns add client
        link add name veth0 type veth peer name veth1
        link add name veth-ns type veth peer name veth0 netns client
        link set veth0 down
        address flush veth0
    EOF

    # ip addr
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default qlen 1000
        link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
    3: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether c2:db:d0:34:13:4a brd ff:ff:ff:ff:ff:ff
    4: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether ca:9d:6b:5f:5f:8f brd ff:ff:ff:ff:ff:ff
    5: veth-ns@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether 32:ef:22:df:51:0a brd ff:ff:ff:ff:ff:ff link-netns client

The same issue can be triggered by the netns exec subcommand with a
slightly different script:

    # ip netns add client
    # ip -b - <<-'EOF'
        netns exec client true
        link add name veth0 type veth peer name veth1
        link add name veth-ns type veth peer name veth0 netns client
        link set veth0 down
        address flush veth0
    EOF

Fix this by adding two netns_{save,restore} functions, which are used
to get a file descriptor for the init netns, and restore it after
each batch command.
netns_save() is called before the unshare() or setns(),
while netns_restore() is called after each command.
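A minimal sketch of the save/restore idea, assuming Linux's /proc/self/ns/net and the setns(2) system call: hold a file descriptor to the original namespace and switch back to it after each batch command. The names netns_save_fd() and netns_restore_fd() are hypothetical; the commit's own functions are netns_save() and netns_restore().

```c
#define _GNU_SOURCE		/* for setns() */
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

/* Remember the current network namespace by keeping an fd open to it.
 * Returns the fd, or -1 with errno set. */
static int netns_save_fd(void)
{
	return open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);
}

/* Switch back to the namespace saved by netns_save_fd().
 * Needs CAP_SYS_ADMIN; returns 0 on success, -1 with errno set. */
static int netns_restore_fd(int fd)
{
	if (fd < 0) {
		errno = EBADF;
		return -1;
	}
	return setns(fd, CLONE_NEWNET);
}
```

In a batch loop, the save call would run once before the first command and the restore call after every command, so an unshare() or setns() performed by `netns add` or `netns exec` cannot skew the ifindex lookups of later lines.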

Fixes: 0dc34c7713 ("iproute2: Add processless network namespace support")
Reviewed-and-tested-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-10 10:42:14 -07:00
David Ahern 9a4f0ba478 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-10 10:32:07 -07:00
Kevin Darbyshire-Bryant d7f2bccd0f tc: add support for action act_ctinfo
ctinfo is a tc action restoring data stored in conntrack marks to
various fields.  At present it has two independent modes of operation,
restoration of DSCP into IPv4/v6 diffserv and restoration of conntrack
marks into packet skb marks.

It understands a number of parameters specific to this action in
addition to the usual action syntax.  Each operating mode is
independent of the other so all options are optional, however not
specifying at least one mode is a bit pointless.

Usage: ... ctinfo [dscp mask [statemask]] [cpmark [mask]] [zone ZONE]
		  [CONTROL] [index <INDEX>]

DSCP mode

dscp enables copying of a DSCP stored in the conntrack mark into the
ipv4/v6 diffserv field.  The mask is a 32bit field and specifies where
in the conntrack mark the DSCP value is located.  It must be 6
contiguous bits long. e.g. 0xfc000000 would restore the DSCP from the
upper 6 bits of the conntrack mark.

The DSCP copying may be optionally controlled by a statemask.  The
statemask is a 32bit field, usually with a single bit set and must not
overlap the dscp mask.  The DSCP restore operation will only take place
if the corresponding bit/s in conntrack mark ANDed with the statemask
yield a non zero result.

For example, dscp 0xfc000000 0x01000000 would retrieve the DSCP from the top 6
bits, whilst using bit 25 as a flag to do so.  Bit 26 is unused in this
example.

CPMARK mode

cpmark enables copying of the conntrack mark to the packet skb mark.  In
this mode it is completely equivalent to the existing act_connmark
action.  Additional functionality is provided by the optional mask
parameter, whereby the stored conntrack mark is logically ANDed with the
cpmark mask before being stored into skb mark.  This allows shared usage
of the conntrack mark between applications.

For example, cpmark 0x00ffffff would restore only the lower 24 bits of the
conntrack mark, thus may be useful in the event that the upper 8 bits
are used by the DSCP function.
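The bit arithmetic described for the two modes can be sketched as below. It models only what the text above states, assumes a statemask of 0 means "not given", and uses hypothetical function names; it is not the kernel's act_ctinfo code. With the example values, a conntrack mark of 0xb9000000 yields DSCP 46 (0xb8000000 >> 26) because the flag bit 0x01000000 is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* A DSCP mask must be exactly 6 contiguous set bits. */
static bool ctinfo_dscp_mask_ok(uint32_t mask)
{
	if (mask == 0)
		return false;
	while (!(mask & 1u))
		mask >>= 1;	/* drop trailing zero bits */
	return mask == 0x3f;
}

/* DSCP mode: the DSCP to write into the diffserv field, or -1 when a
 * statemask is given and none of its bits are set in the conntrack
 * mark. 'mask' must already have passed ctinfo_dscp_mask_ok(). */
static int ctinfo_dscp(uint32_t ctmark, uint32_t mask, uint32_t statemask)
{
	unsigned int shift = 0;

	if (statemask && !(ctmark & statemask))
		return -1;
	while (!((mask >> shift) & 1u))
		shift++;
	return (int)((ctmark & mask) >> shift);
}

/* CPMARK mode: the value stored into the skb mark. */
static uint32_t ctinfo_cpmark(uint32_t ctmark, uint32_t mask)
{
	return ctmark & mask;
}
```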

Usage: ... ctinfo [dscp mask [statemask]] [cpmark [mask]] [zone ZONE]
		  [CONTROL] [index <INDEX>]
where :
	dscp MASK is the bitmask to restore DSCP
	     STATEMASK is the bitmask to determine conditional restoring
	cpmark MASK mask applied to restored packet mark
	ZONE is the conntrack zone
	CONTROL := reclassify | pipe | drop | continue | ok |
		   goto chain <CHAIN_INDEX>

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-10 10:24:38 -07:00
David Ahern ed624243da uapi: Import tc_ctinfo uapi
Add tc_ctinfo.h uapi file from kernel.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-10 10:23:32 -07:00
David Ahern b2f8eb7f8a Update kernel headers
Update kernel headers to commit:
    ad3a9ee0b623 ("ocelot: remove unused variable 'rc' in vcap_cmd()")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-06-10 09:39:08 -07:00
Davide Caratti 0ee4d17954 tc: simple: don't hardcode the control action
The following TDC test case:

 b776 - Replace simple action with invalid goto chain control

checks if the kernel correctly validates the 'goto chain' control action,
when it is specified in 'act_simple' rules. The test systematically fails
because the control action is hardcoded in parse_simple(), i.e. it is not
parsed from the command line arguments, so its value is always TC_ACT_PIPE.
Because of that, the following command:

 # tc action add action simple sdata "test" drop index 7

installs an 'act_simple' rule that never drops packets, and whose 'index'
is the first IDR available, plus an 'act_gact' rule with 'index' equal to
7, that drops packets.

Use parse_action_control_dflt(), like we did on many other TC actions, to
make the control action configurable also with 'act_simple'. The expected
results of test b776 are summarized below:

 iproute2
   v       kernel->| 5.1-rc2 (and previous)  | 5.1-rc3 (and subsequent)
 ------------------+-------------------------+-------------------------
 5.1.0             | FAIL (bad IDR)          | FAIL (bad IDR)
 5.1.0(patched)    | FAIL (no rule/bad sdata)| PASS

Changes since v1:
 - reword commit message, thanks Stephen Hemminger

Fixes: 087f46ee4e ("tc: introduce simple action")
CC: Andrea Claudi <aclaudi@redhat.com>
CC: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-06 14:43:08 -07:00
Roman Mashak fa49588973 tc: Fix binding of gact action by index.
The following operation fails:
% sudo tc actions add action pipe index 1
% sudo tc filter add dev lo parent ffff: \
       protocol ip pref 10 u32 match ip src 127.0.0.2 \
       flowid 1:10 action gact index 1

Bad action type index
Usage: ... gact <ACTION> [RAND] [INDEX]
Where:  ACTION := reclassify | drop | continue | pass | pipe |
                  goto chain <CHAIN_INDEX> | jump <JUMP_COUNT>
        RAND := random <RANDTYPE> <ACTION> <VAL>
        RANDTYPE := netrand | determ
        VAL : = value not exceeding 10000
        JUMP_COUNT := Absolute jump from start of action list
        INDEX := index value used

However, passing a control action of gact rule during filter binding works:

% sudo tc filter add dev lo parent ffff: \
       protocol ip pref 10 u32 match ip src 127.0.0.2 \
       flowid 1:10 action gact pipe index 1

Binding by reference, i.e. by index, has to consistently work with
any tc action.

Since tc is sensitive to the order of keywords passed on the command line,
we can teach gact to skip parsing arguments as soon as it sees 'gact'
followed by 'index' keyword.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-06 14:41:31 -07:00
Parav Pandit 2cc10ce81d devlink: Increase bus, device buffer size to 64 bytes
Device names on the mdev bus are 36 characters long, following the
standard UUID format of RFC 4122.
This is probably the longest name that the kernel will return for a
device.

Hence increase the buffer size to 64 bytes.

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-06 14:41:17 -07:00
Davide Caratti 4ae441e3d1 man: tc-skbedit.8: document 'inheritdsfield'
while at it, fix missing square bracket near 'ptype' and a typo in the
action description (it's -> its).

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-04 09:39:53 -07:00
David Ahern 339b14ab5e Merge branch 'rdma-net-namespace' into next
Parav Pandit  says:

====================

The RDMA subsystem can run in either of two modes:
(a) Sharing RDMA devices among multiple net namespaces or
(b) Exclusive mode where RDMA device is bound to single net namespace

This patch series adds
(1) query command to query rdma subsystem sharing mode
(2) set command to change rdma subsystem sharing mode
(3) assign rdma device to a net namespace

rdma tool examples:
(a) Query current rdma subsys net namespace sharing mode
$ rdma sys show
netns shared

(b) Change rdma subsys mode to exclusive mode
$ rdma sys set netns exclusive

$ rdma sys show
netns exclusive

(c) Assign rdma device to a specific newly created net namespace
$ ip netns add foo
$ rdma dev set mlx5_1 netns foo

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-31 15:10:55 -07:00
Parav Pandit c2ffce5d39 rdma: Add man page for rdma dev set netns command
Add man page to describe additional set netns command
for rdma device.

Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-31 15:10:33 -07:00
Parav Pandit d17a0248a2 rdma: Add an option to set net namespace of rdma device
Enrich rdmatool with an option to set network namespace of RDMA
device. After successful execution of it, rdma device will
be accessible only in assigned network namespace.

rdma tool command examples and output.

First set netns mode to exclusive.

$ rdma system set netns exclusive

Now create network namespace and assign RDMA device to this
network namespace.

$ ip netns add foo
$ rdma dev set mlx5_1 netns foo

Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-31 15:10:32 -07:00
Parav Pandit e861272015 rdma: Add man pages for rdma system commands
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-31 15:10:31 -07:00
Parav Pandit c4572a465b rdma: Add an option to query,set net namespace sharing sys parameter
Enrich rdmatool with an option to query rdma subsystem parameter
whether rdma devices are shared among multiple network namespaces
or exclusive to single network namespace.

rdma tool command examples and output.

$ rdma system show
netns shared

$ rdma system set netns exclusive

$ rdma system show
netns exclusive

Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-31 15:10:29 -07:00
Nicolas Dichtel c442234858 iplink: don't try to get ll addr len when creating an iface
It will obviously fail. This is a follow up of the
commit 757837230a ("lib: suppress error msg when filling the cache").

Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-30 11:03:20 -07:00
Nikolay Aleksandrov a9661b8b0f bridge: mdb: restore text output format
While fixing the mdb JSON output, I overlooked the text output.
This patch returns the original text output format:
 dev <bridge> port <port> grp <mcast group> <temp|permanent> <flags> <timer>
Example (old format, restored by this patch):
 dev br0 port eth8 grp 239.1.1.11 temp

Example (changed format after the commit below):
 23: br0  eth8  239.1.1.11  temp

We had some reports of failing scripts which were parsing the output.
Also the old format matches the bridge mdb command syntax which makes
it easier to build commands out of the output.

Fixes: c7c1a1ef51 ("bridge: colorize output and use JSON print library")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-30 11:01:53 -07:00
David Ahern 2761769472 Update kernel headers
Update kernel headers to commit:
    1167187f2759 ("Merge branch 'qed-Fix-inifinite-spinning-of-PTP-poll-thread'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-29 12:36:58 -07:00
Lukasz Czapnik 767b6fd620 tc: flower: fix port value truncation
sscanf silently truncates out-of-range port values without reporting
any error. As the sscanf man page says:
(...) sscanf() conform to C89 and C99 and POSIX.1-2001. These standards
do not specify the ERANGE error.

Replace sscanf with safer get_be16 that returns error when value is out
of range.

Example:
tc filter add dev eth0 protocol ip parent ffff: prio 1 flower ip_proto
tcp dst_port 70000 hw_tc 1

Would result in filter for port 4464 without any warning.
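Range-checked parsing of a port can be sketched with strtoul(). This is a standalone illustration with a hypothetical parse_port_be16(); the actual fix uses iproute2's own get_be16() helper, which is not reproduced here. Note that (uint16_t)70000 is 4464, which is how the silently truncated value ended up in the filter.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <arpa/inet.h>

/* Parse a decimal port into network byte order.
 * Returns 0 on success, or -1 for empty input, trailing garbage, or a
 * value above 65535, instead of silently wrapping it. */
static int parse_port_be16(const char *s, uint16_t *out)
{
	char *end;
	unsigned long v;

	errno = 0;
	v = strtoul(s, &end, 10);
	if (end == s || *end != '\0' || errno == ERANGE || v > UINT16_MAX)
		return -1;
	*out = htons((uint16_t)v);
	return 0;
}
```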

Fixes: 8930840e67 ("tc: flower: Classify packets based port ranges")
Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-28 12:27:01 -07:00
Nicolas Dichtel 757837230a lib: suppress error msg when filling the cache
Before the patch:
$ ip netns add foo
$ ip link add name veth1 address 2a:a5:5c:b9:52:89 type veth peer name veth2 address 2a:a5:5c:b9:53:90 netns foo
RTNETLINK answers: No such device
RTNETLINK answers: No such device

But the command was successful. This may break scripts. Let's remove those
error messages.

Fixes: 55870dfe7f ("Improve batch and dump times by caching link lookups")
Reported-by: Philippe Guibert <philippe.guibert@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-28 12:23:52 -07:00
Stephen Hemminger 1bb38f6c5e uapi: minor upstream btf.h header change
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-24 15:51:06 -07:00
Paolo Abeni 6eccf7ecdb m_mirred: don't bail if the control action is missing
The mirred act admits an optional control action, defaulting
to TC_ACT_PIPE. The parsing code currently emits an error message
if the control action is not provided on the command line, even
if the command itself completes with no error.

This change silences the error message, using the appropriate
parsing helper.

Fixes: e67aba5595 ("tc: actions: add helpers to parse and print control actions")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-22 11:51:31 -07:00
Stephen Hemminger cd35c95423 man: fix macaddr section of ip-link
The formatting of setting mac address was confusing.
Break lines and fix highlighting.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-21 11:27:14 -07:00
Matteo Croce 8589eb4efd treewide: refactor help messages
Every tool in the iproute2 package has one or more functions to show
a help message to the user. Some of these functions print the help
line by line with a series of printf calls, e.g. ip/xfrm_state.c does
60 fprintf calls.
If we group all the calls into a single one and just concatenate strings,
we save a lot of libc calls and thus object size. The size difference
of the compiled binaries calculated with bloat-o-meter is:

        ip/ip:
        add/remove: 0/0 grow/shrink: 5/15 up/down: 103/-4796 (-4693)
        Total: Before=672591, After=667898, chg -0.70%
        ip/rtmon:
        add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-54 (-54)
        Total: Before=48879, After=48825, chg -0.11%
        tc/tc:
        add/remove: 0/2 grow/shrink: 31/10 up/down: 882/-6133 (-5251)
        Total: Before=351912, After=346661, chg -1.49%
        bridge/bridge:
        add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-459 (-459)
        Total: Before=70502, After=70043, chg -0.65%
        misc/lnstat:
        add/remove: 0/1 grow/shrink: 1/0 up/down: 48/-486 (-438)
        Total: Before=9960, After=9522, chg -4.40%
        tipc/tipc:
        add/remove: 0/0 grow/shrink: 1/1 up/down: 18/-62 (-44)
        Total: Before=79182, After=79138, chg -0.06%

While at it, indent some strings which were starting at column 0,
and use tabs where possible, to have a consistent style across helps.
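The single-call style relies on C's compile-time concatenation of adjacent string literals. A minimal sketch, reusing the ctinfo usage text quoted earlier in this log as sample content; the names ctinfo_help and ctinfo_usage() are hypothetical, not the code the commit touched.

```c
#include <stdio.h>

/* Adjacent literals are joined by the compiler, so the whole help text
 * is one object and printing it takes a single libc call. */
static const char ctinfo_help[] =
	"Usage: ... ctinfo [dscp mask [statemask]] [cpmark [mask]] [zone ZONE]\n"
	"\t\t  [CONTROL] [index <INDEX>]\n"
	"where :\n"
	"\tdscp MASK is the bitmask to restore DSCP\n"
	"\t     STATEMASK is the bitmask to determine conditional restoring\n"
	"\tcpmark MASK mask applied to restored packet mark\n"
	"\tZONE is the conntrack zone\n"
	"\tCONTROL := reclassify | pipe | drop | continue | ok |\n"
	"\t\t   goto chain <CHAIN_INDEX>\n";

static void ctinfo_usage(FILE *fp)
{
	fputs(ctinfo_help, fp);	/* one call instead of one per line */
}
```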

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-20 14:35:07 -07:00
David Ahern 2cc9b5f4fa Merge branch 'iproute2-master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-20 14:34:26 -07:00
Stephen Hemminger f99ea67684 rdma: update uapi headers
Based on 5.2-rc
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-18 06:38:39 -07:00
Gal Pressman 7087f7c0ce rdma: Update node type strings
Fix typo in usnic_udp node type and add a string for the unspecified
node type.

Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-18 06:38:35 -07:00
Stephen Hemminger b60ed9a372 uapi: merge bpf.h from 5.2
Upstream commit to fix spelling errors.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-15 09:53:07 -07:00
Stephen Hemminger 2f31cb4fd6 uapi: add sockios.h
Forgot to add this to earlier commit.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-15 09:51:15 -07:00
Stephen Hemminger 441f67de19 mailmap: map David's mail address
Cleans up multiple mail addresses in shortlog output.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-15 09:50:42 -07:00
Stephen Hemminger a99b08624d mailmap: add myself
Put entries in for past commit mail addresses

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-15 09:30:47 -07:00
Stephen Hemminger afa588490b uapi: update headers to import asm-generic/sockios.h
import asm-generic/sockios.h to fix the compile errors from the
movement of timestamp macros.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-13 14:56:15 -07:00
Stephen Hemminger 0812dc7025 uapi: add include/linux/net.h
All kernel headers must come from this repo,
and ss is including linux/net.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-13 14:54:26 -07:00
David Ahern d53d7ce382 Merge branch 'iproute2-master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-10 12:01:01 -07:00
David Ahern 5106aeaaf2 Update kernel headers and add asm-generic/sockios.h
Update kernel headers to commit
    b970afcfcabd ("Merge tag 'powerpc-5.2-1'")

and import asm-generic/sockios.h to fix the compile errors from the
movement of timestamp macros.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-10 10:06:41 -07:00
Stephen Hemminger f9339f8a8f uapi: update to elf-em header
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-10 08:56:52 -07:00
Stephen Hemminger f9e2cf35eb Merge ../iproute2-next 2019-05-10 08:55:11 -07:00
Stephen Hemminger 3eea00d777 v5.1.0 2019-05-10 08:45:14 -07:00
Phil Sutter cd21ae4013 ip-xfrm: Respect family in deleteall and list commands
Allow limiting 'ip xfrm {state|policy} list' output to a certain address
family and to delete all states/policies by family.

Although preferred_family was already set in filters, the filter
function ignored it. To enable filtering despite the lack of other
selectors, filter.use has to be set if family is not AF_UNSPEC.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-06 13:32:44 -07:00
Zhiqiang Liu 9bf2c538a0 ipnetns: use-after-free problem in get_netnsid_from_name func
Follow these steps:
 # ip netns add net1
 # export MALLOC_MMAP_THRESHOLD_=0
 # ip netns list
and a segmentation fault (core dumped) will occur.

In get_netnsid_from_name func, answer is freed before
rta_getattr_u32(tb[NETNSA_NSID]), where tb[] refers to answer's
content. If we set MALLOC_MMAP_THRESHOLD_=0, malloc will use mmap to
allocate memory, which is unmapped immediately when free() is
called. So reading tb[NETNSA_NSID] accesses released memory
after free(answer).

Fix this by calling rta_getattr_u32(tb[NETNSA_NSID]) before free(answer).
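The ordering fix amounts to copying the attribute value out of the reply buffer while the buffer is still alive. A minimal sketch with hypothetical helpers attr_u32() (standing in for rta_getattr_u32()) and nsid_from_answer(); it is not the ipnetns code.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <linux/rtnetlink.h>

/* Hypothetical stand-in for rta_getattr_u32(). */
static uint32_t attr_u32(const struct rtattr *rta)
{
	uint32_t v;

	memcpy(&v, RTA_DATA(rta), sizeof(v));
	return v;
}

/* 'answer' is a heap buffer and 'nsid_attr' points into it (or is NULL).
 * Copy the value out first, then free: after free() the attribute may
 * live in unmapped memory, e.g. with MALLOC_MMAP_THRESHOLD_=0. */
static int nsid_from_answer(void *answer, const struct rtattr *nsid_attr)
{
	int nsid = -1;

	if (nsid_attr)
		nsid = (int)attr_u32(nsid_attr);	/* buffer still valid */
	free(answer);					/* only now release it */
	return nsid;
}
```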

Fixes: 86bf43c7c2 ("lib/libnetlink: update rtnl_talk to support malloc buff at run time")
Reported-by: Huiying Kou <kouhuiying@huawei.com>
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-06 08:36:18 -07:00
Ido Schimmel e73306048f devlink: Fix monitor command
The command is supposed to allow users to filter events related to
certain objects, but returns an error when an object is specified:

# devlink mon dev
Command "dev" not found

Fix this by allowing the command to process the specified objects.

Example:

# devlink/devlink mon dev &
# echo "10 1" > /sys/bus/netdevsim/new_device
[dev,new] netdevsim/netdevsim10

# devlink/devlink mon port &
# echo "11 1" > /sys/bus/netdevsim/new_device
[port,new] netdevsim/netdevsim11/0: type notset flavour physical
[port,new] netdevsim/netdevsim11/0: type eth netdev eth1 flavour physical

# devlink/devlink mon &
# echo "12 1" > /sys/bus/netdevsim/new_device
[dev,new] netdevsim/netdevsim12
[port,new] netdevsim/netdevsim12/0: type notset flavour physical
[port,new] netdevsim/netdevsim12/0: type eth netdev eth2 flavour physical

Fixes: a3c4b484a1 ("add devlink tool")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-05-06 08:35:44 -07:00
Vinicius Costa Gomes 92f4b6032e taprio: Add support for cycle_time and cycle_time_extension
This allows a cycle-time and a cycle-time-extension to be specified.

Specifying a cycle-time truncates the cycle: when that instant is
reached, the cycle starts again from its beginning.

A cycle-time-extension may cause the last entry of a cycle, just
before the start of a new schedule (the base-time of the "admin"
schedule), to be extended by at most "cycle-time-extension"
nanoseconds. The idea of this feature, as described by IEEE
802.1Q, is to avoid gate states that are too narrow.

Example:

tc qdisc change dev IFACE parent root handle 100 taprio \
	      sched-entry S 0x1 1000000 \
	      sched-entry S 0x0 2000000 \
	      sched-entry S 0x1 3000000 \
	      sched-entry S 0x0 4000000 \
	      cycle-time-extension 100000 \
	      cycle-time 9000000 \
	      base-time 12345678900000000

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-04 09:22:15 -07:00
Vinicius Costa Gomes 602fae856d taprio: Add support for changing schedules
This allows for a new schedule to be specified during runtime, without
removing the current one.

For that, the semantics of the 'tc qdisc change' operation in the
context of taprio are: if "change" is called while a schedule is
running, a new schedule is created, and its base-time (call it X)
determines when it takes over, so that at instant X it becomes the
"current" schedule. In short, "change" doesn't modify the current
schedule; it creates a new one and sets it up so it becomes the
current one at some point.

In IEEE 802.1Q terms, it means that we have support for the
"Oper" (current and read-only) and "Admin" (future and mutable)
schedules.

Example of creating the first schedule, then adding a new one:

(1)
tc qdisc add dev IFACE parent root handle 100 taprio \
	      num_tc 1 \
	      map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
	      queues 1@0 \
	      sched-entry S 0x1 1000000 \
	      sched-entry S 0x0 2000000 \
	      sched-entry S 0x1 3000000 \
	      sched-entry S 0x0 4000000 \
	      base-time 100000000 \
	      clockid CLOCK_TAI

(2)
tc qdisc change dev IFACE parent root handle 100 taprio \
	      base-time 7500000000000 \
	      sched-entry S 0x0 5000000 \
	      sched-entry S 0x1 5000000

A bug had to be fixed so that the clockid doesn't need to be
specified when changing the schedule.

Most of the changes are related to making it easier to reuse the same
function for printing the "admin" and "oper" schedules.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-04 09:22:15 -07:00
Paolo Abeni c865c52365 tc: add support for plug qdisc
sch_plug can be used to perform functional qdisc unit tests,
explicitly controlling the queuing behaviour from user space.

Support for plug has been missing since its introduction in 2012. This
change introduces basic support for controlling the qdisc state.

v1 -> v2:
 - use the SPDX identifier

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-04 09:22:14 -07:00
David Ahern fd6580972b Update kernel headers
Update kernel headers to commit
   a734d1f4c2fc ("net: openvswitch: return an error instead of doing BUG_ON()")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-04 09:13:26 -07:00
David Ahern 420b36a874 uapi: wrap SIOCGSTAMP and SIOCGSTAMPNS in ifndef
These warnings:
    ../include/uapi/linux/sockios.h:42:0: warning: "SIOCGSTAMP" redefined
    ../include/uapi/linux/sockios.h:43:0: warning: "SIOCGSTAMPNS" redefined

are from kernel commit 0768e17073dc5 ("net: socket: implement 64-bit
timestamps"). This commit moved the definitions of SIOCGSTAMP and
SIOCGSTAMPNS from include/asm-generic/sockios.h to
include/uapi/linux/sockios.h. Older OSes already define them in
/usr/include/asm-generic/sockios.h, which now results in redefinition warnings:

In file included from ll_types.c:24:0:
../include/uapi/linux/sockios.h:42:0: warning: "SIOCGSTAMP" redefined
 #define SIOCGSTAMP SIOCGSTAMP_OLD

In file included from /usr/include/x86_64-linux-gnu/asm/sockios.h:1:0,
                 from /usr/include/asm-generic/socket.h:5,
                 from /usr/include/x86_64-linux-gnu/asm/socket.h:1,
                 from /usr/include/x86_64-linux-gnu/bits/socket.h:368,
                 from /usr/include/x86_64-linux-gnu/sys/socket.h:38,
                 from ll_types.c:17:
/usr/include/asm-generic/sockios.h:11:0: note: this is the location of the previous definition
 #define SIOCGSTAMP 0x8906  /* Get stamp (timeval) */

so wrap them in #ifndef.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-03 08:23:08 -07:00
Josh Hunt 296b5de724 ss: add option to print socket information on one line
Multi-line output in ss makes it difficult to search for things with
grep. This new option will make it easier to find sockets matching
certain criteria with simple grep commands.

Example without option:
$ ss -emoitn
State      Recv-Q Send-Q Local Address:Port               Peer Address:Port
ESTAB      0      0      127.0.0.1:13265              127.0.0.1:36743               uid:1974 ino:48271 sk:1 <->
	 skmem:(r0,rb2227595,t0,tb2626560,f0,w0,o0,bl0,d0) ts sack reno wscale:7,7 rto:211 rtt:10.245/16.616 ato:40 mss:65483 cwnd:10 bytes_acked:41865496 bytes_received:21580440 segs_out:242496 segs_in:351446 data_segs_out:242495 data_segs_in:242495 send 511.3Mbps lastsnd:2383 lastrcv:2383 lastack:2342 pacing_rate 1022.6Mbps rcv_rtt:92427.6 rcv_space:43725 minrtt:0.007

Example with new option:
$ ss -emoitnO
State    Recv-Q Send-Q          Local Address:Port            Peer Address:Port
ESTAB    0      0                   127.0.0.1:13265              127.0.0.1:36743 uid:1974 ino:48271 sk:1 <-> skmem:(r0,rb2227595,t0,tb2626560,f0,w0,o0,bl0,d0) ts sack reno wscale:7,7 rto:211 rtt:10.067/16.429 ato:40 mss:65483 pmtu:65535 rcvmss:536 advmss:65483 cwnd:10 bytes_sent:41868244 bytes_acked:41868244 bytes_received:21581866 segs_out:242512 segs_in:351469 data_segs_out:242511 data_segs_in:242511 send 520.4Mbps lastsnd:14355 lastrcv:14355 lastack:14314 pacing_rate 1040.7Mbps delivery_rate 74837.7Mbps delivered:242512 app_limited busy:1861946ms rcv_rtt:92427.6 rcv_space:43725 rcv_ssthresh:43690 minrtt:0.007

Signed-off-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-02 16:06:06 -07:00
Ido Schimmel 517ea57c6d devlink: Increase column size for larger shared buffers
With the current number of spaces, the output is mangled when the shared
buffer is congested.

Before:

# devlink sb occupancy show swp25
swp25:
  pool: 0:    33384960/39344256 1:          0/0       2:          0/0       3:          0/0
        4:          0/720     5:          0/0       6:          0/0       7:          0/0
        8:          0/288     9:          0/0      10:          0/0
  itc:  0(0): 33272064/39344256 1(0):       0/0       2(0):       0/0       3(0):       0/0
        4(0):       0/0       5(0):       0/0       6(0):       0/0       7(0):       0/0
  etc:  0(4):       0/720     1(4):       0/0       2(4):       0/0       3(4):       0/0
        4(4):       0/0       5(4):       0/0       6(4):       0/0       7(4):       0/0
        8(8):       0/288     9(8):       0/0      10(8):       0/0      11(8):       0/0
       12(8):       0/0      13(8):       0/0      14(8):       0/0      15(8):       0/0

After:

# devlink sb occupancy show swp25
swp25:
  pool: 0:      39070080/39344256   1:             0/0          2:             0/0          3:             0/0
        4:             0/720        5:             0/0          6:             0/0          7:             0/0
        8:             0/288        9:             0/0         10:             0/0
  itc:  0(0):   39062016/39344256   1(0):          0/0          2(0):          0/0          3(0):          0/0
        4(0):          0/0          5(0):          0/0          6(0):          0/0          7(0):          0/0
  etc:  0(4):          0/720        1(4):          0/0          2(4):          0/0          3(4):          0/0
        4(4):          0/0          5(4):          0/0          6(4):          0/0          7(4):          0/0
        8(8):          0/288        9(8):          0/0         10(8):          0/0         11(8):          0/0
       12(8):          0/0         13(8):          0/0         14(8):          0/0         15(8):          0/0

v2:
* Increase number of spaces to make the change more future-proof

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-30 11:23:08 -07:00
Nikolay Aleksandrov 09e0528cf9 ip: mroute: add fflush to print_mroute
As in the other print functions, we need to flush buffered data
for the output to work with pipes and output redirects.

After this patch, 'ip monitor mroute &>log' works properly.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-29 15:04:18 -07:00
Lucas Siba 2019-04-20 11:40 UTC 4d9e90f36b Update tc-bpf.8 man page examples
This patch updates the tc-bpf.8 example application for changes to the
struct bpf_elf_map definition. In its current form, the example compiles,
but the resulting object file is rejected by the verifier when attempting
to load it through tc.

Signed-off-by: Lucas Siba <lucas.siba@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
[ dropped the unnecessary flags initialization on commit ]
2019-04-26 14:05:47 -07:00
David Ahern 10fb5faec1 Merge branch 'iproute2-master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-26 11:13:54 -07:00
Mike Manning 3f2e457ae4 iplink_vlan: add support for VLAN bridge binding flag
This patch adds support for the VLAN bridge binding flag that is
provided in the net-next kernel by the series merged in 1ab839281cf7
("net-support-binding-vlan-dev-link-state-to-vlan-member-bridge-ports")

Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-26 11:12:58 -07:00
David Ahern 70de8a7fa7 Update kernel headers
Update kernel headers to commit
    148f025d41a8 ("Merge branch 'hns3-next'")

Note, these warnings:
../include/uapi/linux/sockios.h:42:0: warning: "SIOCGSTAMP" redefined
../include/uapi/linux/sockios.h:43:0: warning: "SIOCGSTAMPNS" redefined

are due to kernel commit
    0768e17073dc5 ("net: socket: implement 64-bit timestamps")

which moved the definitions from include/asm-generic/sockios.h
to include/uapi/linux/sockios.h

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-26 11:11:03 -07:00
Stephen Hemminger 38983334f6 tc/ematch: fix deprecated yacc warning
Newer versions of Bison deprecated some directives.

    YACC     emp_ematch.yacc.c
emp_ematch.y:11.1-14: warning: deprecated directive, use ‘%define parse.error verbose’ [-Wdeprecated]
 %error-verbose
 ^~~~~~~~~~~~~~
emp_ematch.y:12.1-22: warning: deprecated directive, use ‘%define api.prefix {ematch_}’ [-Wdeprecated]
 %name-prefix "ematch_"

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-24 15:10:22 -07:00
Thomas Haller 62de07faf7 iprule: always print realms keyword for rule
Previously, the "realms" keyword was omitted when the source realm was 0:

    # rule add priority 10 realms 1/0xF
    # rule add priority 10 realms 0/0xF
    # ip rule
    10:     from all lookup main 15
    10:     from all lookup main realms 1/15

This behavior dates back to the beginning.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-24 15:06:15 -07:00
Thomas Haller 927632d4da iprule: refactor print_rule() to use leading space before printing attribute
When printing the actions, we avoid adding a trailing space after the
attribute, presumably because the action is expected to be the last
output on the line and the line should not end with a space.

But for FR_ACT_TO_TBL nothing is printed. That means we print a double
space if a protocol is printed as well:

    # ip rule add priority 10 protocol 10 type 1

will be printed as

    10:     from all lookup 1  proto mrt

The only visible effect of the patch is to avoid the double-space and
avoid a trailing space if the action is FR_ACT_TO_TBL.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-24 15:06:15 -07:00
Thomas Haller 461f0405f3 iprule: avoid trailing space in print_rule() after printing protocol
It seems print_rule() tries to avoid a trailing space at the end
of the line. At least when printing details about the actions,
it no longer appends the space, presumably because the action is
expected to be the last attribute printed.

Don't let the protocol add the trailing space. The space at the end
of the line should be printed consistently (or not at all).

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-24 15:06:15 -07:00
Thomas Haller 6f87b544ca iprule: avoid printing extra space after gateway for nat action
For all other actions we avoid the trailing space, so do it here
as well.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-24 15:06:15 -07:00
Kristian Evensen 112112b8eb ip fou: Support binding FOU ports
This patch adds support for binding FOU ports using iproute2.
Kernel support was added in 1713cb37bf67 ("fou: Support binding FoU
socket").

The parse function now handles new arguments for setting the
binding-related attributes, while the print function writes the new
attributes if they are set. Also, the man page has been updated.

v2->v3:
* Remove redundant ll_init_map()-calls (thanks David Ahern).

v1->v2 (all changes suggested by David Ahern):
* Fix reverse Christmas tree ordering.
* Remove redundant peer_port_set-variable, it is enough to check
peer_port.
* Add proper error handling of invalid local/peer addresses.
* Use interface name and not index.
* Remove updating fou-header file, it is already done.

Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-22 11:42:54 -07:00
Nikolay Aleksandrov 90306a1440 iplink: bridge: add support for vlan_stats_per_port
Add support for manipulating and showing the vlan_stats_per_port bridge
option which can be toggled only when there are no port VLANs
configured. Also update the man page with the new option.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-21 06:47:39 -07:00
Ido Schimmel 185ba5e2d4 ipneigh: Print neighbour offload indication
Print the offload indication in case it is set on the neighbour.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-21 06:23:23 -07:00
Nikolay Aleksandrov 6e982e7b9b bridge: vlan: fix standard stats output
Each of the commits below broke the vlan stats output in a different
way:
- 45fca4ed94 ("bridge: fix vlan show stats formatting")
 Added a second print of an interface name (e.g. eth4eth4)
- c7c1a1ef51 ("bridge: colorize output and use JSON print library")
 Broke normal vlan stats output by not printing a new line after them
 Also printed interfaces without any vlans when printing stats

This fix is not pretty but it brings back the previous behaviour.

Before this fix:
$ bridge -s vlan show
port             vlan id
br0br0              1 PVID Egress Untagged
                   RX: 0 bytes 0 packets
                   TX: 0 bytes 0 packets 4
                   RX: 0 bytes 0 packets
                   TX: 0 bytes 0 packetseth4eth4             4
                   RX: 0 bytes 0 packets
                   TX: 0 bytes 0 packetsroot@debian:~/

After this fix:
$ bridge -s vlan show
port             vlan id
br0              1 PVID Egress Untagged
                   RX: 0 bytes 0 packets
                   TX: 0 bytes 0 packets
                 4
                   RX: 0 bytes 0 packets
                   TX: 0 bytes 0 packets
eth4             4
                   RX: 0 bytes 0 packets
                   TX: 0 bytes 0 packets

Fixes: 45fca4ed94 ("bridge: fix vlan show stats formatting")
Fixes: c7c1a1ef51 ("bridge: colorize output and use JSON print library")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-17 16:30:05 -07:00
Nikolay Aleksandrov b11b495e7c bridge: mdb: restore valid json output
Since the commit below, mdb's JSON output has been invalid and its format
has changed. Restore valid JSON output in the previous format. This also
fixes a double "Deleted" print when monitoring for changes.

Example bridge -p -d -j mdb show:
 [ {
        "mdb": [ {
                "index": 4,
                "dev": "virbr0",
                "port": "vnet2",
                "grp": "ff02::202",
                "state": "temp",
                "flags": [ ]
            },{
                "index": 4,
                "dev": "virbr0",
                "port": "vnet2",
                "grp": "ff02::1:fffb:1939",
                "state": "temp",
                "flags": [ ]
            },{
                "index": 6,
                "dev": "virbr1",
                "port": "vnet7",
                "grp": "ff02::202",
                "state": "temp",
                "flags": [ ]
            },{
                "index": 6,
                "dev": "virbr1",
                "port": "vnet7",
                "grp": "ff02::1:ffd0:f61f",
                "state": "temp",
                "flags": [ ]
            } ],
        "router": {
            "virbr0": [ {
                    "port": "vnet1"
                },{
                    "port": "vnet0"
                } ],
            "virbr1": [ {
                    "port": "vnet5"
                } ]
        }
    } ]

Fixes: c7c1a1ef51 ("bridge: colorize output and use JSON print library")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-17 16:27:06 -07:00
Beniamino Galvani d6abae5a7a ip: add missing space after 'external' in detailed mode
Add a missing space after the 'external' keyword in the detailed mode
of tunnel links output:

 # ip -d link
 79: geneve1: <BROADCAST,MULTICAST> mtu 65465 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/ether da:e9:e4:2b:f9:d4 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65465
     geneve externaladdrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
 80: vxlan1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/ether 7a:a8:19:07:da:01 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
     vxlan externaladdrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
 84: gre1@NONE: <NOARP> mtu 1476 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/none 00:00:00:00 brd 00:00:00:00 promiscuity 0 minmtu 0 maxmtu 0
     gre externaladdrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
 87: ip6gre1@NONE: <NOARP> mtu 1448 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/gre6 :: brd :: promiscuity 0 minmtu 0 maxmtu 0
     ip6gre externaladdrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
 88: ip6tnl1@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/tunnel6 :: brd :: promiscuity 0 minmtu 68 maxmtu 65407
     ip6tnl externaladdrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
 90: ipip1@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/ipip 0.0.0.0 brd 0.0.0.0 promiscuity 0 minmtu 0 maxmtu 0
     ipip externaladdrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

Fixes: 00ff4b8e31 ("ip/tunnel: Be consistent when printing tunnel collect metadata")
Reviewed-and-tested-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Beniamino Galvani <bgalvani@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-17 16:26:31 -07:00
David Ahern 188c7fe6ea Update kernel headers
Update kernel headers to commit
    6b0a7f84ea1f ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-17 14:07:48 -07:00
David Ahern 43de4ef694 Merge branch 'iproute2-master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-17 13:59:44 -07:00
Eyal Birger aed63ae1ac ip xfrm: support setting/printing XFRMA_IF_ID attribute in states/policies
The XFRMA_IF_ID attribute is set in policies/states for them to be
associated with an XFRM interface (4.19+).

Add support for setting / displaying this attribute.

Note that 0 is a valid value, so XFRMA_IF_ID is set whenever any value
is provided on the command line.

Tested-by: Antony Antony <antony@phenome.org>
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-11 15:26:43 -07:00
Ralf Baechle 8391023680 ip: display netrom link type
For a NETROM device, "ip link show dev nr0" shows

4: nr0: <NOARP,UP,LOWER_UP> mtu 236 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/generic 88:98:6a:a4:84:40:0a brd 00:00:00:00:00:00:00

but link/netrom is expected to be displayed instead.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-11 15:25:50 -07:00
Matt Ellison 286446c1e8 ip: support for xfrm interfaces
Interfaces take an 'if_id', an interface ID that can be set on an
xfrm policy as its interface lookup key (XFRMA_IF_ID).

Signed-off-by: Matt Ellison <matt@arroyo.io>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-05 15:05:00 -07:00
Toke Høiland-Jørgensen d5d27f27d8 q_cake: Add support for setting the fwmark option
This adds support for the newly added fwmark option to CAKE, which allows
the tin selection to be overridden by per-packet firewall marks. The
fwmark option is a bitmask that is applied to the packet's fwmark to
select the tin.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-05 15:01:31 -07:00
Stephen Hemminger 41fc3fa04c uapi: update bpf.h
Updated bpf.h from 5.1-rc

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-05 15:00:48 -07:00
David Ahern 806a553c81 Merge branch 'rdma-dynamic-link-create' into next
Steve Wise says:

====================

This series adds rdmatool support for creating/deleting rdma links.
This will be used, mainly, by soft rdma drivers to allow adding/deleting
rdma links over netdev interfaces.  It provides the user side for
the kernel changes merged in linux-5.1.

Changes since v2:

- move checks for required parameters in the parameter handlers
- move final 'link add' processing to link_add_netdev()
- added reviewed-by tags

Changes since v1:

- move error receive checking from rd_sendrecv_msg() to rd_recv_msg().
- Add rd->suppress_errors to allow control over whether errors when
  reading a response should be ignored.  Namely: resource queries can
  get errors like "none found" when querying for resources, and this
  error should not be displayed.  So on a rd object basis, error
  suppression can be controlled.
- Rebased on rdma/for-next UABI (no need to sync rdma_netlink.h now)
- use chains of struct rd_cmd and rd_exec_cmd vs open coding the parsing
  for the 'link add' command.
- minor nit resolution
- added .mailmap file.  If this is not desired for iproute2, then please
  drop the patch.

Changes since RFC:

- add rd_sendrecv_msg() and make use of it in dev_set as well
  as the new link commands.
- fixed problems with the man pages
- changed the command line to use "netdev" as the keyword
  for the network device, to avoid confusion with the ib_device
  name.
- got rid of the "type" parameter for link delete.  Also pass
  down the device index instead of the name, using the common
  rd services for validating the device name and fetching the
  index.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-03 12:05:17 -07:00
Steve Wise 1d45bf724e rdma: man page update for link add/delete
Update the 'rdma link' man page with 'link add/delete' info.

Signed-off-by: Steve Wise <larrystevenwise@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-03 12:04:33 -07:00
Steve Wise 4336c5821a rdma: add 'link add/delete' commands
Add new 'link' subcommand 'add' and 'delete' to allow binding a soft-rdma
device to a netdev interface.

For example:

rdma link add rxe_eth0 type rxe netdev eth0
rdma link delete rxe_eth0

Signed-off-by: Steve Wise <larrystevenwise@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-03 12:04:30 -07:00
Steve Wise 8f5cfd23cd rdma: add helper rd_sendrecv_msg()
This function sends the constructed netlink message and then
receives the response.

Change rd_recv_msg() to display any error messages.

Change 'rdma dev set' to use rd_sendrecv_msg().

Signed-off-by: Steve Wise <larrystevenwise@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-03 12:04:25 -07:00
Steve Wise 65147bbe8f Add .mailmap file
.mailmap allows tracking multiple email addresses to the proper user name.

Signed-off-by: Steve Wise <larrystevenwise@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-04-03 12:04:00 -07:00
Leslie Monis 519ace17f9 tc: pie: update man page
Update man page to reflect the changes made in Linux.

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-29 14:26:00 -07:00
Leslie Monis 492ec9558b tc: pie: change maximum integer value of tc_pie_xstats->prob
tc_pie_xstats->prob has a maximum value of (2^64 - 1).

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-29 14:26:00 -07:00
Stephen Hemminger 6754e1d978 ip: fix typo in iplink_vlan usage message
Need to use bar "|" rather than slash to indicate alternatives.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-27 07:56:07 -07:00
Hoang Le 35114a4cfe tipc: add link broadcast man page
Add a man page describing the tipc link broadcast get and set commands.

Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-26 16:09:21 -07:00
Hoang Le 5027f233e3 tipc: add link broadcast get
The command prints the method that multicast is actually using
in the system, along with the 'ratio' value for the AUTOSELECT
method.

A sample usage is shown below:
$tipc link get broadcast
BROADCAST

$tipc link get broadcast
AUTOSELECT ratio:30%

$tipc link get broadcast -j -p
[ {
        "method": "AUTOSELECT"
    },{
        "ratio": 30
    } ]

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-26 16:09:16 -07:00
Hoang Le 0ea46945bc tipc: add link broadcast set method and ratio
The command added here makes it possible to forcibly configure the
broadcast link to use either broadcast or replicast, in addition to
the already existing auto selection algorithm.

A sample usage is shown below:
$tipc link set broadcast BROADCAST
$tipc link set broadcast AUTOSELECT ratio 25

$tipc link set broadcast -h
Usage: tipc link set broadcast PROPERTY

PROPERTIES
 BROADCAST         - Forces all multicast traffic to be
                     transmitted via broadcast only,
                     irrespective of cluster size and number
                     of destinations

 REPLICAST         - Forces all multicast traffic to be
                     transmitted via replicast only,
                     irrespective of cluster size and number
                     of destinations

 AUTOSELECT        - Auto switching to broadcast or replicast
                     depending on cluster size and destination
                     node number

 ratio SIZE        - Set the AUTOSELECT criteria, percentage of
                     destination nodes vs cluster size

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-26 16:08:49 -07:00
David Ahern cdeb2674aa Update kernel headers
Update kernel headers to
    fa7e428c6b7e ("openvswitch: add seqadj extension when NAT is used.")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-26 16:08:05 -07:00
Stephen Hemminger f76ad635f2 man: break long lines in man page sources
No impact on output, just easier to edit.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-22 10:05:31 -07:00
Tobias Jungel b5a754b1db ip: bridge: add mcast to unicast config flag
This adds configuration for the IFLA_BRPORT_MCAST_TO_UCAST flag that
allows multicast packets to be replicated as unicast packets.

Signed-off-by: Tobias Jungel <tobias.jungel@bisdn.de>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-22 09:44:49 -07:00
David Ahern 3efdd43667 Merge branch 'iproute2-master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-20 00:38:08 -07:00
Stephen Hemminger dd4a2b6833 uapi: bpf add set_ce
New api from upstream.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-19 10:37:55 -07:00
Stephen Hemminger 828132fdd1 uapi: in6.h add router alert isolate
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-19 10:37:28 -07:00
Stephen Hemminger a7cd7baded uapi: add CAKE FWMARK
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-19 10:36:56 -07:00
Stephen Hemminger 8e1554625d rdma: update uapi headers from 5.1-rc1
Update from upstream.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-19 10:34:32 -07:00
Stephen Hemminger 50cf634899 Merge branch 'master' of ../iproute2-next 2019-03-19 10:32:45 -07:00
Stephen Hemminger 93d0d92f61 v5.0.0 2019-03-19 10:06:19 -07:00
Matteo Croce a0a639d9c0 ip route: get: print JSON output when -j is given
The ip -j option to print output as JSON is ignored when using 'route get':

    $ ip -j route get 127.0.0.1
    local 127.0.0.1 dev lo src 127.0.0.1 uid 1000
        cache <local>

Enable JSON output in iproute_get(), and don't let print_cache_flags() close
the JSON output, as it's not always the last called JSON function.

Tested on different route types:

    $ ip -j -p route get 127.0.0.1
    [ {
            "type": "local",
            "dst": "127.0.0.1",
            "dev": "lo",
            "prefsrc": "127.0.0.1",
            "flags": [ ],
            "uid": 1000,
            "cache": [ "local" ]
        } ]

    $ ip -d -j -p route get 192.0.2.1
    [ {
            "type": "unicast",
            "dst": "192.0.2.1",
            "gateway": "192.168.85.1",
            "dev": "wlp3s0",
            "table": "main",
            "prefsrc": "192.168.85.2",
            "flags": [ ],
            "uid": 1000,
            "cache": [ ]
        } ]

Fixes: 663c3cb231 ("iproute: implement JSON and color output")
Acked-by: Phil Sutter <phil@nwl.cc>
Reviewed-and-tested-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-19 09:50:01 -07:00
Matteo Croce 0736617738 ip route: print route type in JSON output
ip route generates invalid JSON if the route type has to be printed,
e.g. when detailed mode is active or the type is not unicast:

    $ ip -d -j -p route show
    [ {"unicast",
            "dst": "192.168.122.0/24",
            "dev": "virbr0",
            "protocol": "kernel",
            "scope": "link",
            "prefsrc": "192.168.122.1",
            "flags": [ "linkdown" ]
        } ]

    $ ip -j -p route show
    [ {"unreachable",
            "dst": "192.168.23.0/24",
            "flags": [ ]
        },{"prohibit",
            "dst": "192.168.24.0/24",
            "flags": [ ]
        },{"blackhole",
            "dst": "192.168.25.0/24",
            "flags": [ ]
        } ]

Fix it by printing the route type as the "type" attribute:

    $ ip -d -j -p route show
    [ {
            "type": "unicast",
            "dst": "default",
            "gateway": "192.168.85.1",
            "dev": "wlp3s0",
            "protocol": "dhcp",
            "scope": "global",
            "metric": 600,
            "flags": [ ]
        },{
            "type": "unreachable",
            "dst": "192.168.23.0/24",
            "protocol": "boot",
            "scope": "global",
            "flags": [ ]
        },{
            "type": "prohibit",
            "dst": "192.168.24.0/24",
            "protocol": "boot",
            "scope": "global",
            "flags": [ ]
        },{
            "type": "blackhole",
            "dst": "192.168.25.0/24",
            "protocol": "boot",
            "scope": "global",
            "flags": [ ]
        } ]

Fixes: 663c3cb231 ("iproute: implement JSON and color output")
Acked-by: Phil Sutter <phil@nwl.cc>
Reviewed-and-tested-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-19 09:49:36 -07:00
Kevin 'ldir' Darbyshire-Bryant ef1e02e6ac tc: m_connmark: fix action error messages
The m_connmark action returns error messages identifying itself as the
'simple' action instead of the 'connmark' action, e.g.:

tc filter add dev eth0 protocol all u32 match u32 0 0 flowid 1:1 \
	action connmark index wrong
simple: Illegal "index"
bad action parsing
parse_action: bad value (3:connmark)!
Illegal "action"

This is most likely a copy/paste error from the simple action example
code. Fix the connmark error messages so that they identify themselves
as coming from connmark:

tc filter add dev eth0 protocol all u32 match u32 0 0 flowid 1:1 \
	action connmark index wrong
connmark: Illegal "index"
bad action parsing
parse_action: bad value (3:connmark)!
Illegal "action"

While we're here, also fix the zone parsing error message so that it
says 'Illegal "zone"' instead of 'Illegal "index"'.

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-19 09:49:07 -07:00
David Ahern 7081aa6556 Merge branch 'bond-bridge-xstats-json' into next
Nikolay Aleksandrov says:

====================

This set adds json output support to the xstats API (patch 01) and then
adds json support to the bridge xstats output (patch 02) and adds xstats
output support (both plain text and json) for the bonding (patch 03).
It doesn't change the bridge's plain text output, but it fixes an
inconsistency that could appear if new bridge xstats attributes were
added (the interface name is now printed once per group of xstats
attributes).

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-15 14:01:35 -07:00
Nikolay Aleksandrov 440c5075d6 ip: bond: add xstats support
Add bond and bond_slave xstats support with optional json output.
Example:
- Plain text:
$ ip link xstats type bond 802.3ad
 bond0
                    LACPDU Rx 2017
                    LACPDU Tx 2038
                    LACPDU Unknown type Rx 0
                    LACPDU Illegal Rx 0
                    Marker Rx 0
                    Marker Tx 0
                    Marker response Rx 0
                    Marker response Tx 0
                    Marker unknown type Rx 0

- JSON:
$ ip -j -p link xstats type bond 802.3ad
  [ {
        "ifname": "bond0",
        "802.3ad": {
            "lacpdu_rx": 219,
            "lacpdu_tx": 241,
            "lacpdu_unknown_rx": 0,
            "lacpdu_illegal_rx": 0,
            "marker_rx": 0,
            "marker_tx": 0,
            "marker_response_rx": 0,
            "marker_response_tx": 0,
            "marker_unknown_rx": 0
        }
    } ]

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-15 13:58:16 -07:00
Nikolay Aleksandrov a9bc23a792 ip: bridge: add xstats json support
Add json support for bridge's xstats output.
The plain text output format should remain the same.
Note that this patch moves printing of the interface name out of the
attribute loop; leaving it inside was an oversight when the set was
upstreamed. This does not change the current output format, but keeps
it correct when new xstats attributes show up.

Example:
$ ip -p -j link xstats type bridge
  [ {
        "ifname": "br0",
        "multicast": {
            "igmp_queries": {
                "rx_v1": 0,
                "rx_v2": 32,
                "rx_v3": 0,
                "tx_v1": 0,
                "tx_v2": 0,
                "tx_v3": 0
            },
            "igmp_reports": {
                "rx_v1": 0,
                "rx_v2": 32,
                "rx_v3": 0,
                "tx_v1": 0,
                "tx_v2": 0,
                "tx_v3": 0
            },
            "igmp_leaves": {
                "rx": 0,
                "tx": 0
            },
            "igmp_parse_errors": 0,
            "mld_queries": {
                "rx_v1": 33,
                "rx_v2": 0,
                "tx_v1": 0,
                "tx_v2": 0
            },
            "mld_reports": {
                "rx_v1": 66,
                "rx_v2": 2,
                "tx_v1": 0,
                "tx_v2": 0
            },
            "mld_leaves": {
                "rx": 0,
                "tx": 0
            },
            "mld_parse_errors": 0
        }
    } ]

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-15 13:58:09 -07:00
Nikolay Aleksandrov 8ff3d1d3a3 ip: xstats: add json output support
This adds only the initial JSON object support when the json argument
is specified.
Later patches convert the current xstats users to json.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-15 13:55:57 -07:00
Stephen Hemminger f36f8fe535 ipaddress: print error message on stderr
By convention, error messages are printed only on stderr. This helps
when scripting.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-15 08:30:26 -07:00
Thomas Haller 546109a7cf iprule: fix printing hint about unresolved iifname and oifname
The hint was displayed as

    10:     from all iif eth1 [detached] goto 10000unresolved proto mrt

now:

    10:     from all iif eth1 [detached] goto 10000 [unresolved] proto mrt

Fixes: 0dd4ccc56c ("iprule: add json support")

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-07 16:14:09 -08:00
David Ahern be029b3a58 Merge branch 'iproute2-master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-05 07:55:05 -08:00
Roopa Prabhu c5b176e5ba bridge: fdb: add support for src_vni option
We already print src_vni for an fdb entry when present.
This patch adds the ability to set src_vni on an fdb
entry. When it is not specified, the kernel uses the VNI
configured on the vxlan device. This can be used on a vxlan
fdb entry when the vxlan device is in external or collect
metadata mode.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-03-05 07:52:34 -08:00
Petr Vorel 8b5f9338e5 man: Document COLORFGBG environment variable
The default colors do not have enough contrast on a dark background,
and the functionality that selects more suitable colors is hidden in
the code.

Signed-off-by: Petr Vorel <pvorel@suse.cz>
2019-03-01 11:09:18 -08:00
Dmytro Linkin 2f103545a5 tc/pedit: Fix wrong pedit ipv6 structure id
A tc pedit action with more than two ip6 munges in a row causes an
infinite loop.

Example:

$ tc filter add dev eth0 protocol ipv6 parent ffff: \
flower ip_proto sctp \
    action pedit ex \
        munge ip6 hoplimit set 0x1 \
        munge ip6 src set 2001:0db8:0:f101::1 \
        munge that cause infinite loop

The example command never returns, instead of failing with a parse
error as expected. The pedit ipv6 structure has the wrong id, which
leads to the creation of a linked list in
tc/m_pedit.c:get_pedit_kind() with one node referring to itself. This
node is created if the command has two ip6 munges in a row, and any
third ip6 munge then causes an infinite loop. Changing the id from
"ipv6" to "ip6" solves the problem.
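A simplified model of the failure, as a hypothetical C sketch rather
than the actual tc/m_pedit.c code: a lookup for "ip6" never matches a
statically defined entry whose id is "ipv6", so each miss pushes the
same static node onto the head of the list again. The second push
makes the node point to itself, and the next miss walks the list
forever. The names struct pedit_kind, kind_list and get_kind() are
assumptions for illustration; the walk is bounded so the sketch can
report the cycle instead of hanging.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a pedit "kind" descriptor. */
struct pedit_kind {
	const char *id;
	struct pedit_kind *next;
};

/* Buggy static definition: the id does not match the lookup name "ip6". */
static struct pedit_kind ip6_kind = { .id = "ipv6" };
static struct pedit_kind *kind_list;

/* Look up a kind by name; on a miss, register the static descriptor at
 * the head of the list. Returns NULL if the walk exceeds max_steps,
 * which is how this sketch detects the self-referencing list. */
static struct pedit_kind *get_kind(const char *name, int max_steps)
{
	int steps = 0;

	for (struct pedit_kind *p = kind_list; p; p = p->next) {
		if (++steps > max_steps)
			return NULL;	/* cycle: unbounded code would spin forever */
		if (strcmp(p->id, name) == 0)
			return p;
	}

	/* Miss: push the same static node again. */
	ip6_kind.next = kind_list;
	kind_list = &ip6_kind;
	return &ip6_kind;
}
```

With the id set to "ip6", the second lookup finds the node on the
first step and the list never gains a self-reference.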

Fixes: f3e1b2448a ("pedit: Introduce ipv6 support")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-03-01 11:05:00 -08:00
David Ahern 83a8b8d9f1 Merge branch 'devlink-health' into next
Aya Levin says:

====================

This series adds support for devlink health commands:
 devlink health show     [ DEV reporter REPORTER_NAME ]
 devlink health recover    DEV reporter REPORTER_NAME
 devlink health diagnose   DEV reporter REPORTER_NAME
 devlink health dump show  DEV reporter REPORTER_NAME
 devlink health dump clear DEV reporter REPORTER_NAME
 devlink health set        DEV reporter REPORTER_NAME { grace_period | auto_recover } { msec | boolean }

The first patch refactors the validation of input parameters, which
had grown far too long. The second and third patches fix bugs that were
discovered during the devlink health development. The fourth patch adds
helper functions which enable output of values and labels separately.
Patches 5-10 add the devlink health functionality, one command per
patch, and the last patch adds the man page.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 08:00:19 -08:00
Aya Levin 3147e0d372 devlink: Add devlink-health man page
Add a man page describing devlink health's command set. Also add a
reference link from devlink main man page.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:56:21 -08:00
Aya Levin b18d89195b devlink: Add devlink health set command
Add the devlink health set command, which enables the user to configure
parameters related to the devlink health mechanism per reporter:
1) grace_period [msec]: time interval between auto recoveries.
2) auto_recover [true/false]: whether devlink should execute automatic
recovery on error.
Add a helper function to retrieve a boolean value as an input parameter.
Example:
$ devlink health set pci/0000:00:09.0 reporter tx grace_period 3500
$ devlink health set pci/0000:00:09.0 reporter tx auto_recover false
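The boolean input helper mentioned above could look roughly like the
following sketch. The name parse_bool_arg() and the choice to accept
only the literal words "true" and "false" are assumptions for
illustration, not the devlink code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: convert a command-line word into a boolean.
 * Only "true" and "false" are accepted; anything else is rejected. */
static int parse_bool_arg(const char *str, bool *val)
{
	if (strcmp(str, "true") == 0)
		*val = true;
	else if (strcmp(str, "false") == 0)
		*val = false;
	else
		return -EINVAL;
	return 0;
}
```

For "auto_recover false", the caller would pass the word following the
keyword and store the result in the request only when 0 is returned.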

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:56:16 -08:00
Aya Levin 04b583d0ff devlink: Add devlink health dump clear command
Add the devlink health dump clear command, which deletes the last saved dump file.
Clearing the last saved dump enables a new dump file to be saved.
Example:
$ devlink health dump clear pci/0000:00:09.0 reporter tx

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:56:10 -08:00
Aya Levin 041e6e651a devlink: Add devlink health dump show command
Add the devlink health dump show command, which displays the last saved
dump. Devlink health saves a single dump. If no dump is already stored
for this reporter, devlink generates a new one. The dump can be
generated automatically when a reporter reports an error, or manually
at the user's request.
The dump's output is defined by the reporter. The command uses the
infrastructure for flexible output formatting introduced in the
previous patch.
Example:
$ devlink health dump show pci/0000:00:09.0 reporter tx

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:56:04 -08:00
Aya Levin 7b8baf834d devlink: Add devlink health diagnose command
Add the devlink health diagnose command, which lets the user retrieve
diagnostic data from a reporter on a device. The command's output is
free text defined by the reporter.

This patch also introduces an infrastructure for flexible output
formatting. This allows the command to display different data fields
depending on the reporter.
Example:
$ devlink health diagnose pci/0000:00:0a.0 reporter tx
SQs:
  sqn: 4403 HW state: 1 stopped: false
  sqn: 4408 HW state: 1 stopped: false
  sqn: 4413 HW state: 1 stopped: false
  sqn: 4418 HW state: 1 stopped: false
  sqn: 4423 HW state: 1 stopped: false

$ devlink health diagnose pci/0000:00:0a.0 reporter tx -jp
{
 "SQs":[
      {
       "sqn":4403,
       "HW state":1,
       "stopped":false
     },
      {
       "sqn":4408,
       "HW state":1,
       "stopped":false
     },
      {
       "sqn":4413,
       "HW state":1,
       "stopped":false
     },
      {
       "sqn":4418,
       "HW state":1,
       "stopped":false
     },
      {
       "sqn":4423,
       "HW state":1,
       "stopped":false
     }
   ]
}

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:55:57 -08:00
Aya Levin 9083a1344d devlink: Add devlink health recover command
Add the devlink health recover command, which enables the user to
initiate a recovery on a reporter (if the reporter supplied a recovery
callback). This operation increments the recoveries counter displayed
by the show command.
Example:
$ devlink health recover pci/0000:00:09.0 reporter tx

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:55:50 -08:00
Aya Levin 2f1242efe9 devlink: Add devlink health show command
Add the devlink health show command, which displays status and
configuration info for a specific reporter on a device, or dumps the
info for all reporters on all devices. Add helper functions to display
the status and the dump's time stamp.
Example:
$ devlink health show pci/0000:00:09.0 reporter tx
pci/0000:00:09.0:
 name tx
  state healthy error 0 recover 1 last_dump_date 2019-02-14 last_dump_time 10:10:10 grace_period 600 auto_recover true
$ devlink health show pci/0000:00:09.0 reporter tx -jp
{
 "health":{
  "pci/0000:00:0a.0":[
     {
     "name":"tx",
     "state":"healthy",
     "error":0,
     "recover":1,
     "last_dump_date":"2019-Feb-14",
     "last_dump_time":"10:10:10",
     "grace_period":600,
     "auto_recover":true
    }
  ]
}

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:55:41 -08:00
Aya Levin 844a61764c devlink: Add helper functions for name and value separately
Add new helper functions which output only values (without a name
label) for different types: boolean, uint, uint64, string and binary.
In addition, add a helper function which prints only the name label.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:55:35 -08:00
Aya Levin 8257e6c49c devlink: Fix boolean JSON print
This patch removes the inverted commas from boolean values in JSON
format: true/false instead of "true"/"false".
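The difference is only whether the value is quoted. A standalone sketch
(the helper name json_bool_member() is hypothetical, not the devlink
printing code) that emits a bare JSON boolean literal:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Format a JSON member whose value is a bare boolean literal.
 * No quotes around the value, so parsers read true/false as booleans
 * rather than as the strings "true"/"false". */
static int json_bool_member(char *buf, size_t len, const char *key, bool val)
{
	return snprintf(buf, len, "\"%s\":%s", key, val ? "true" : "false");
}
```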

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:55:30 -08:00
Aya Levin 86648a1960 devlink: Fix print of uint64_t
This patch prints uint64_t with its corresponding format and avoids an
implicit cast to uint32_t.
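A minimal illustration of why the format matters, using the standard
PRIu64 macro from <inttypes.h>. The two helper names are hypothetical;
the second one reproduces the kind of truncation the patch avoids.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Print a 64-bit counter with the matching PRIu64 conversion. */
static void format_u64(char *buf, size_t len, uint64_t val)
{
	snprintf(buf, len, "%" PRIu64, val);
}

/* Buggy variant for comparison: the implicit conversion to uint32_t
 * silently drops the upper 32 bits before printing. */
static void format_truncated(char *buf, size_t len, uint64_t val)
{
	uint32_t narrow = val;

	snprintf(buf, len, "%" PRIu32, narrow);
}
```

For 2^32 + 5 the first helper prints 4294967301, while the truncated
variant prints 5.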

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:54:32 -08:00
Aya Levin ae72e65518 devlink: Refactor validation of finding required arguments
Introduce an argument metadata structure that pairs a bitmap flag per
required argument with the error message to print if it is missing.
Use this static array to refactor the validation of required arguments
on the devlink command line and to ease further maintenance.
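The table-driven pattern described above can be sketched as follows.
The OPT_* flags, the messages and check_required() are hypothetical
placeholders; the real devlink flags and error strings differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical option bits for illustration only. */
#define OPT_HANDLE	(1u << 0)
#define OPT_REPORTER	(1u << 1)
#define OPT_PERIOD	(1u << 2)

/* One entry per required argument: its flag and the error to print. */
struct arg_meta {
	uint32_t flag;
	const char *err_msg;
};

static const struct arg_meta required_args[] = {
	{ OPT_HANDLE,   "Device handle expected." },
	{ OPT_REPORTER, "Reporter name expected." },
	{ OPT_PERIOD,   "Grace period expected." },
};

/* Return the message for the first required argument missing from
 * `found`, or NULL if every argument in `required` was supplied. */
static const char *check_required(uint32_t required, uint32_t found)
{
	for (size_t i = 0; i < sizeof(required_args) / sizeof(required_args[0]); i++) {
		const struct arg_meta *m = &required_args[i];

		if ((required & m->flag) && !(found & m->flag))
			return m->err_msg;
	}
	return NULL;
}
```

Adding a new required argument then means adding one table row instead
of another hand-written if/fprintf block.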

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-28 07:53:36 -08:00
Leon Romanovsky 359f69f76f rdma: Add the prefix for driver attributes
There is a need to distinguish between driver-specific and generally
exposed attributes. The most common use case is to expose some internal
garbage under an extremely common and sexy name, e.g. pi, ci, etc.

To achieve that, add a "drv_" prefix to all strings which were
received through RDMA_NLDEV_ATTR_DRIVER_* attributes.
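A small sketch of the prefixing step, assuming the name is built into a
caller-supplied buffer. driver_attr_name() is a hypothetical helper,
not the rdmatool function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the displayed name of a driver-specific attribute by adding the
 * "drv_" prefix. Returns -1 if the result would not fit in buf. */
static int driver_attr_name(char *buf, size_t len, const char *name)
{
	int n = snprintf(buf, len, "drv_%s", name);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```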

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Tested-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-27 08:25:47 -08:00
Jakub Kicinski d326d79c22 devlink: add support for updating device flash
Add a new command for updating device flash via the devlink API.
Example:

$ cp flash-boot.bin /lib/firmware/
$ devlink dev flash pci/0000:05:00.0 file flash-boot.bin

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-27 08:25:14 -08:00
David Ahern 41fda879a1 Update kernel headers
Update kernel headers to commit:
    ff8285f81822 ("net: sched: pie: fix 64-bit division")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-27 08:23:22 -08:00
David Ahern 65a94784fb Merge branch 'rdma-object-ids' into next
Leon Romanovsky says:

====================

This series adds the ability to present and query all objects known to
rdmatool by their respective unique IDs (e.g. pdn, mrn, cqn, etc.).
All objects which have a "parent" object expose this information too.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:13:21 -08:00
Leon Romanovsky 78728b7ee0 rdma: Provide and reuse filter functions
Globally replace all filter functions with safer is_filtered variants,
which take into account whether the netlink attribute is present or
absent.

This conversion fixes a number of places in the code where the
previous implementation didn't honor filter requests if the netlink
attribute wasn't present.
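One way to read the rule above, as a hedged sketch: once the user asks
to filter on a field, an object whose message lacks that attribute is
hidden instead of silently passing the filter. struct num_filter and
is_filtered_u32() are hypothetical, and the exact semantics in rdmatool
may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical numeric filter: `active` is set when the user asked to
 * filter on this field, and [min, max] is the accepted range. */
struct num_filter {
	bool active;
	uint32_t min, max;
};

/* Decide whether an object should be hidden. `present` says whether
 * the netlink attribute carrying the value was in the message at all. */
static bool is_filtered_u32(const struct num_filter *f, bool present, uint32_t val)
{
	if (!f->active)
		return false;	/* no filter requested: always show */
	if (!present)
		return true;	/* filter requested but nothing to compare */
	return val < f->min || val > f->max;
}
```

A range such as "cqn 0-100" would map to min 0 and max 100 in this
model.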

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:09:24 -08:00
Leon Romanovsky 5a823593d6 rdma: Perform single .doit call to query specific objects
If the user provides a specific index, we can speed up the query
by using the .doit callback and avoid a full dump followed by
filtering.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:09:15 -08:00
Leon Romanovsky 127ff95610 rdma: Unify netlink attribute checks prior to prints
Check whether the netlink attribute is available in one common place,
instead of repeating the same check in many places.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:09:09 -08:00
Leon Romanovsky 6da9d2517c rdma: Move QP code to separate function
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:09:05 -08:00
Leon Romanovsky f9a73796d1 rdma: Place PD parsing print routine into separate function
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:09:00 -08:00
Leon Romanovsky 46695227d6 rdma: Move MR code to be suitable for per-line parsing
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:07:46 -08:00
Leon Romanovsky 83ea72289e rdma: Refactor CQ prints
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:07:40 -08:00
Leon Romanovsky 7d06b31f0e rdma: Simplify CM_ID print code
Refactor out the CM_ID print code.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:07:33 -08:00
Leon Romanovsky 05846c9cd3 rdma: Simplify code to reuse existing functions
Remove duplicated functions in favour of the general res_print_uint() call.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:07:28 -08:00
Leon Romanovsky 835d83216b rdma: Properly mark RDMAtool license
The RDMA subsystem is dual-licensed under "GPL-2.0 OR Linux-OpenIB", and
Mellanox submissions are supposed to carry this license.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:07:19 -08:00
Leon Romanovsky 687daf98f9 rdma: Move resource QP logic to separate file
Logically separate resource QP logic to separate file,
in order to make QP specific logic self-contained.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:06:56 -08:00
Leon Romanovsky 438fac3a25 rdma: Move out resource CM-ID logic to separate file
Logically separate resource CM-ID logic to separate file,
in order to make CM-ID specific logic self-contained.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:06:48 -08:00
Leon Romanovsky fcdd2e0c68 rdma: Move out resource CQ logic to separate file
Logically separate resource CQ logic to separate file,
in order to make CQ specific logic self-contained.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:06:40 -08:00
Leon Romanovsky 42ed283e4a rdma: Refactor out resource MR logic to separate file
Logically separate resource MR logic to separate file,
in order to make MR specific logic self-contained.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:06:25 -08:00
Leon Romanovsky cc6131276c rdma: Move resource PD logic to separate file
Logically separate resource PD logic to separate file,
in order to make PD specific logic self-contained.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:06:12 -08:00
Leon Romanovsky 96f59e7fdc rdma: Provide parent context index for all objects except CM_ID
Allow users to correlate allocated object with relevant parent

[leonro@server ~]$ rdma res show pd
dev mlx5_0 users 5 pid 0 comm [ib_core] pdn 1
dev mlx5_0 users 7 pid 0 comm [ib_ipoib] pdn 2
dev mlx5_0 users 0 pid 0 comm [mlx5_ib] pdn 3
dev mlx5_0 users 2 pid 548 comm ibv_rc_pingpong ctxn 0 pdn 4

[leonro@server ~]$ rdma res show cq cqn 0-100
dev mlx5_0 cqe 2047 users 6 poll-ctx UNBOUND_WORKQUEUE pid 0 comm [ib_core] cqn 2
dev mlx5_0 cqe 255 users 2 poll-ctx SOFTIRQ pid 0 comm [mlx5_ib] cqn 3
dev mlx5_0 cqe 511 users 1 poll-ctx DIRECT pid 0 comm [ib_ipoib] cqn 4
dev mlx5_0 cqe 255 users 1 poll-ctx DIRECT pid 0 comm [ib_ipoib] cqn 5
dev mlx5_0 cqe 255 users 0 poll-ctx SOFTIRQ pid 0 comm [mlx5_ib] cqn 6
dev mlx5_0 cqe 511 users 2 pid 548 comm ibv_rc_pingpong cqn 7 ctxn 0

[leonro@server ~]$ rdma res show mr
dev mlx5_0 mrlen 4096 pid 548 comm ibv_rc_pingpong mrn 4 pdn 0

[leonro@nps-server-14-015 ~]$ /images/leonro/src/iproute2/rdma/rdma res show qp
link mlx5_0/1 lqpn 0 type SMI state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_0/1 lqpn 1 type GSI state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_0/1 lqpn 7 type UD state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_0/1 lqpn 8 type UD state RTS sq-psn 0 pid 0 comm [ib_ipoib]
link mlx5_0/1 lqpn 9 pdn 4 rqpn 0 type RC state INIT rq-psn 0 sq-psn 0 path-mig-state MIGRATED pid 548 comm ibv_rc_pingpong

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:05:43 -08:00
Leon Romanovsky 1dc035865d rdma: Provide unique indexes for all visible objects
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:04:02 -08:00
Leon Romanovsky beac6a3990 rdma: Remove duplicated print code
There is no need to keep separate print functions for
uint32_t and uint64_t; unify them into one function.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:03:58 -08:00
Leon Romanovsky a985cc06bd rdma: update uapi headers
Update rdma_netlink.h file up to kernel commit
f2a0e45f36b0 RDMA/nldev: Don't expose number of not-visible entries

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-24 07:03:14 -08:00
David Ahern 55870dfe7f Improve batch and dump times by caching link lookups
ip route uses ll_name_to_index and ll_index_to_name to convert between
device names and indices. At the moment both use the ioctl-based glibc
functions if_nametoindex and if_indextoname and do not cache the result.
When using a batch file or dumping a large number of routes, this means
the same device lookups can be done repeatedly, adding unnecessary
overhead (socket + ioctl + close for each device lookup).

Add a new function, ll_link_get, to send a netlink-based RTM_GETLINK
request. If successful, cache the result in idx_head and name_head so
future lookups can reuse the entry. Update ll_name_to_index and
ll_index_to_name to use ll_link_get and only fall back to the glibc
functions if it fails.

With this change the time to install 720,022 routes with 2 ecmp nexthops
where the nexthop device is given is reduced from 31.4 seconds to 19.2
seconds. A dump of those routes drops from 13.3 to 2.8 seconds.
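The caching idea can be sketched with a small linear array instead of
the idx_head/name_head hash lists used by ll_map. The names below are
hypothetical, and backend_name_to_index() only stands in for the
expensive RTM_GETLINK or if_nametoindex() lookup so the effect of the
cache can be counted.

```c
#include <assert.h>
#include <string.h>

#define CACHE_SIZE 16
#define NAME_LEN   16	/* same as IFNAMSIZ on Linux */

struct link_entry {
	unsigned int index;
	char name[NAME_LEN];
};

static struct link_entry cache[CACHE_SIZE];
static int cache_used;
static int backend_calls;	/* counts the expensive lookups */

/* Stand-in for the expensive query; knows two fake devices. */
static unsigned int backend_name_to_index(const char *name)
{
	backend_calls++;
	if (strcmp(name, "lo") == 0)
		return 1;
	if (strcmp(name, "eth0") == 0)
		return 2;
	return 0;
}

/* Cached name -> index lookup: consult the cache first, otherwise query
 * the backend once and remember a successful answer. */
static unsigned int cached_name_to_index(const char *name)
{
	for (int i = 0; i < cache_used; i++)
		if (strcmp(cache[i].name, name) == 0)
			return cache[i].index;

	unsigned int idx = backend_name_to_index(name);

	if (idx && cache_used < CACHE_SIZE) {
		cache[cache_used].index = idx;
		strncpy(cache[cache_used].name, name, NAME_LEN - 1);
		cache[cache_used].name[NAME_LEN - 1] = '\0';
		cache_used++;
	}
	return idx;
}
```

Repeated lookups of the same device hit the cache, so in a batch of
routes that share a nexthop device only the first one pays for the
query. Failed lookups are not cached in this sketch.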

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-22 18:51:20 -08:00
David Ahern db1aafd883 ip link: Drop cache entry on any changes
Remove any entry from the link cache when the link is modified.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-22 18:51:18 -08:00
David Ahern 25c6339b22 ll_map: Add function to remove link cache entry by index
Add ll_drop_by_index to remove an entry from the link cache.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-22 18:51:15 -08:00
David Ahern 9f78e995a8 Merge branch 'iproute2-master' into next
Conflicts:
	misc/ss.c

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-22 18:50:39 -08:00
Stefano Brivio aa5bd6a252 ss: Render buffer to output every time a number of chunks are allocated
Eric reported that, with 10 million sockets, ss -emoi (about 1000 bytes
output per socket) can easily lead to OOM (buffer would grow to 10GB of
memory).

Limit the maximum size of the buffer to five chunks, 1M each. Render and
flush buffers whenever we reach that.

This might leave consecutive blocks slightly misaligned with respect to
each other, with an occasional loss of readability roughly every 5k to
50k sockets. Something like (from ss -tu):

[...]
CLOSE-WAIT   32       0           192.168.1.50:35232           10.0.0.1:https
ESTAB        0        0           192.168.1.50:53820           10.0.0.1:https
ESTAB       0        0           192.168.1.50:46924            10.0.0.1:https
CLOSE-WAIT  32       0           192.168.1.50:35228            10.0.0.1:https
[...]

However, I don't actually expect any human user to scroll through that
many sockets, so readability should be preserved when it matters.

The bulk of the diffstat comes from moving field_next() around, as we now
call render() from it. Functionally, this is implemented by six lines of
code, most of them in field_next().
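The bounded-buffer behaviour can be modelled as below. This is a
simplified sketch, not the ss code: the real buffer allocates 1M chunks
on demand, while here the capacity is fixed at MAX_CHUNKS * CHUNK_SIZE
with a tiny chunk size so the flush is easy to observe, and render()
only counts bytes instead of printing a table.

```c
#include <assert.h>
#include <string.h>

#define CHUNK_SIZE 8	/* tiny for illustration; ss uses 1M chunks */
#define MAX_CHUNKS 5	/* render once this much data is buffered */

static char buf[MAX_CHUNKS * CHUNK_SIZE];
static size_t used;		/* bytes currently buffered */
static int renders;		/* number of flushes so far */
static size_t rendered_bytes;	/* total bytes handed to render() */

/* Stand-in for render(): would lay out and print the buffered fields. */
static void render(void)
{
	rendered_bytes += used;
	used = 0;
	renders++;
}

/* Append one field; if it does not fit, render what is buffered first,
 * so memory use never exceeds MAX_CHUNKS chunks. */
static void field_append(const char *s)
{
	size_t n = strlen(s);

	if (n > sizeof(buf))
		n = sizeof(buf);	/* truncate oversized fields */
	if (used + n > sizeof(buf))
		render();
	memcpy(buf + used, s, n);
	used += n;
}
```

Because each flush lays out only the fields buffered since the previous
one, column widths can differ slightly between flushed blocks, which
matches the misalignment shown above.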

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Fixes: 691bd854bf ("ss: Buffer raw fields first, then render them as a table")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-21 14:45:45 -08:00
Thomas De Schampheleire 9700927a00 ss: fix compilation under glibc < 2.18
Commit c759116a0b introduced support for AF_VSOCK. glibc only
provides this define since version 2.18, so compilation fails when
using older toolchains.

Provide the necessary definitions if needed.
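The usual pattern for such a fallback is a guarded define. This sketch
shows only the address family constants; the exact set of definitions
added by the patch is not listed here. The value 40 is the one Linux
uses for AF_VSOCK.

```c
#include <assert.h>
#include <sys/socket.h>

/* glibc older than 2.18 does not define AF_VSOCK; fall back to the
 * Linux value so the code still compiles with older toolchains. */
#ifndef AF_VSOCK
#define AF_VSOCK 40
#endif

#ifndef PF_VSOCK
#define PF_VSOCK AF_VSOCK
#endif
```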

Signed-off-by: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-21 14:40:52 -08:00
Stephen Hemminger 6f618a6a82 uapi: update inet_diag_info.h
Upstream changes.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-21 14:24:07 -08:00
Vivien Didelot 02723cf230 bridge: make mcast_flood description consistent
This patch simply changes the description of the mcast_flood flag
with "flood" instead of "be flooded with" to avoid confusion, and be
consistent with the description of the flooding flag, which "Controls
whether a given port will *flood* unicast traffic for which there is
no FDB entry."

At the same time, fix the documentation for the "flood" flag which
is incorrectly described as "flooding on" or "flooding off".

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-21 14:22:05 -08:00
Jiri Pirko 0e7e181945 devlink: relax dpipe table show dependency on resources
The dpipe table show command depends on getting resources. If the
resource get command is not supported by the driver, dpipe table show
fails. However, resources are only additional information in the dpipe
table show output. So relax the dependency and let the dpipe tables be
shown even if the resource get command fails.

Fixes: ead180274c ("devlink: Add support for resource/dpipe relation")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-21 14:21:19 -08:00
Phil Sutter d7cf2416fc ip-address: Use correct max attribute value in print_vf_stats64()
IFLA_VF_MAX is larger than the highest valid index in vf array.

Fixes: a1b99717c7 ("Add displaying VF traffic statistics")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-21 14:16:08 -08:00
Thomas Haller f5f8e96953 ip-rule: fix json key "to_tbl" for unspecific rule action
The key should not be called "to_tbl" because this is precisely
not an FR_ACT_TO_TBL action. Change it to "action".

    # ip rule add blackhole
    # ip -j rule | python -m json.tool
    ...
    {
        "priority": 0,
        "src": "all",
        "to_tbl": "blackhole"
    },

This breaks the JSON output API, which was added in v4.17.0.
Change it anyway, as the API is relatively new and unstable.

Fixes: 0dd4ccc56c ("iprule: add json support")

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-19 15:21:06 -08:00
Luca Boccassi c2f9dc14c4 ip route: get: allow zero-length subnet mask
A /0 subnet mask is theoretically valid, but ip route get doesn't allow
it:

$ ip route get 1.0.0.0/0
need at least a destination address

Change the check and remember whether we found an address or not, since
according to the documentation it's a mandatory parameter.

$ ip/ip route get 1.0.0.0/0
1.0.0.0 via 192.168.1.1 dev eth0 src 192.168.1.91 uid 1000
    cache

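The underlying point is that a prefix length of 0 cannot double as a "no address given" sentinel. A minimal IPv4-only sketch with a hypothetical parse_dst() helper (not iproute2's own prefix parser) that records presence in a separate flag:

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

struct dst_arg {
    struct in_addr addr;
    unsigned int plen;  /* 0..32; 0 ("/0") is a legitimate prefix length */
    bool found;         /* presence is tracked separately from plen */
};

/* Parse "a.b.c.d[/len]"; returns 0 on success, -1 on malformed input. */
static int parse_dst(const char *arg, struct dst_arg *dst)
{
    char buf[INET_ADDRSTRLEN];
    const char *slash = strchr(arg, '/');
    size_t n = slash ? (size_t)(slash - arg) : strlen(arg);

    if (n == 0 || n >= sizeof(buf))
        return -1;
    memcpy(buf, arg, n);
    buf[n] = '\0';
    if (inet_pton(AF_INET, buf, &dst->addr) != 1)
        return -1;

    dst->plen = 32;                 /* host route when no "/len" is given */
    if (slash) {
        char *end;
        unsigned long v = strtoul(slash + 1, &end, 10);

        if (end == slash + 1 || *end != '\0' || v > 32)
            return -1;
        dst->plen = (unsigned int)v;
    }
    dst->found = true;              /* set only after a complete parse */
    return 0;
}
```

A caller would then reject the command only when !dst.found, never because the prefix length happens to be zero.
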
Reported-by: Clément Hertling <wxcafe@wxcafe.net>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-19 15:19:31 -08:00
Matteo Croce 619765fe14 iplink: document XDP subcommand to force the XDP mode.
When attaching an eBPF program to a device, ip link can force the XDP mode
by using the xdp{generic,drv,offload} keyword instead of just 'xdp'.
Document this behaviour also in the help output.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Fixes: 14683814 ("bpf: add xdpdrv for requesting XDP driver mode")
Fixes: 1b5e8094 ("bpf: allow requesting XDP HW offload")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-13 14:02:44 -08:00
Konstantin Khlebnikov 0f3f0ca3a2 ss: add option --tos for requesting ipv4 tos and ipv6 tclass
Also show the socket class_id/priority used by classful qdiscs.
The kernel reports this together with tclass since commit
("inet_diag: fix reporting cgroup classid and fallback to priority")

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-13 13:59:11 -08:00
Eric Dumazet bb5ae621d0 lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg()
In the past, we tried to increase the buffer size up to 32 KB in order
to reduce number of syscalls per dump.

Commit 2d34851cd3 ("lib/libnetlink: re malloc buff if size is not enough")
brought the size back to 4KB, so the kernel cannot know that the application
is ready to receive bigger requests.

See kernel commits 9063e21fb026 ("netlink: autosize skb lengthes") and
d35c99ff77ec ("netlink: do not enter direct reclaim from netlink_dump()")
for more details.

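A minimal sketch of the "peek, then allocate at least 32KB" idea, using plain recv(2) with MSG_PEEK | MSG_TRUNC on Linux, where MSG_TRUNC makes recv return the real datagram length. This is not the actual rtnl_recvmsg() code; MIN_RECV_BUF, recv_buf_size() and recv_sized() are hypothetical names:

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

#define MIN_RECV_BUF (32 * 1024)   /* the 32KB floor described above */

/* Never allocate less than the floor, even for a small first message. */
static size_t recv_buf_size(ssize_t peeked)
{
    if (peeked < MIN_RECV_BUF)
        return MIN_RECV_BUF;
    return (size_t)peeked;
}

/* Peek at the next datagram to learn its real length, allocate at least
 * MIN_RECV_BUF bytes, then read it for real. Returns the received length
 * or -1; on success *out must be freed by the caller. */
static ssize_t recv_sized(int fd, char **out)
{
    char probe;
    ssize_t len = recv(fd, &probe, sizeof(probe), MSG_PEEK | MSG_TRUNC);
    char *buf;
    size_t cap;

    if (len < 0)
        return -1;
    cap = recv_buf_size(len);
    buf = malloc(cap);
    if (!buf)
        return -1;
    len = recv(fd, buf, cap, 0);
    if (len < 0) {
        free(buf);
        return -1;
    }
    *out = buf;
    return len;
}
```

Offering a large buffer from the first read onward is what lets the reader keep up with bigger dump messages without extra syscalls.
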
Fixes: 2d34851cd3 ("lib/libnetlink: re malloc buff if size is not enough")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hangbin Liu <liuhangbin@gmail.com>
Cc: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-13 13:51:44 -08:00
Davide Caratti ca81444303 use print_{,h}hu instead of print_uint when format specifier is %{,h}hu
This avoids a useless cast to unsigned int in bpf_print_ops()
and print_tunnel().

Tested with:
 # ./tdc.py -c bpf

Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Cc: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-10 19:00:59 -08:00
Marcos Antonio Moraes 9e46c5c206 tc: use bits not mbits/sec in rate percent
Since /sys/class/net/<iface>/speed reports a value in Mbit/s, a
conversion to bit/s is necessary to compute the correct limits.

This guarantees the same result for the following commands on a
1000Mbit/sec device:

tc class add ... htb rate 500Mbit
tc class add ... htb rate 50%

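A sketch of the conversion, assuming the link speed has already been read from sysfs in Mbit/s. percent_rate_to_str() is a hypothetical helper, not tc's parse_percent_rate(); the "bit" suffix is used because tc's rate parser accepts it. Like commit 817204d0b0 suggests, the caller passes the buffer length:

```c
#include <stdint.h>
#include <stdio.h>

/* Turn "percent of a link speed given in Mbit/s" into a rate string in
 * bit/s, written into a caller-sized buffer. Returns 0, or -1 if the
 * result does not fit. */
static int percent_rate_to_str(char *buf, size_t len,
                               double percent, uint32_t speed_mbps)
{
    /* sysfs reports Mbit/s, so scale to bit/s before formatting */
    double bits = percent / 100.0 * (double)speed_mbps * 1000.0 * 1000.0;
    int n = snprintf(buf, len, "%.0fbit", bits);

    /* snprintf returns the length it wanted; treat truncation as an error */
    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;
}
```

For the 1000Mbit/sec example above, 50% yields "500000000bit", the same rate as 500Mbit.
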
Fixes: 927e3cfb52 ("tc: B.W limits can now be specified in %.")
Signed-off-by: Marcos Antonio Moraes <marcos.antonio@digirati.com.br>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-08 09:59:45 -08:00
Stephen Hemminger 817204d0b0 tc: avoid problems with hard coded rate string length
The parse_percent_rate function assumed the buffer was 20 characters long.
It is better to pass the length in case the size ever changes.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-06 10:49:47 -08:00
Stephen Hemminger 2d603d55a8 tc: fix memory leak in error path
If the value passed to parse_percent was not valid, the string
dynamically allocated by sscanf was leaked.

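The usual fix pattern is a single exit label that frees the sscanf allocation on every path. A sketch, assuming POSIX.1-2008 "%m" support (as in glibc); parse_percent_sketch() is hypothetical and, unlike tc's helper, requires the trailing '%':

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse "NN%" into a fraction (50% -> 0.5). Returns 0 or -1. */
static int parse_percent_sketch(const char *str, double *frac)
{
    char *tok = NULL;    /* "%m" makes sscanf malloc() the matched text */
    char *end;
    int ret = -1;

    if (sscanf(str, "%m[0-9.%]", &tok) != 1)
        goto out;        /* no match: tok is still NULL */

    *frac = strtod(tok, &end) / 100.0;
    if (end == tok || strcmp(end, "%") != 0)
        goto out;        /* malformed: tok must still be freed */

    ret = 0;
out:
    free(tok);           /* free(NULL) is a no-op */
    return ret;
}
```
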
Fixes: 927e3cfb52 ("tc: B.W limits can now be specified in %.")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-06 10:41:58 -08:00
Jakub Kicinski 05bc89e95e devlink: add info subcommand
Add support for reading the device serial number, driver name
and various versions.  Example:

$ devlink dev info pci/0000:82:00.0
pci/0000:82:00.0:
  driver nfp
  serial_number 16240145
  versions:
      fixed:
        board.id AMDA0081-0001
        board.rev 15
        board.vendor SMA
        board.model hydrogen
      running:
        fw.mgmt 010181.010181.0101d4
        fw.cpld 0x1030000
        fw.app abm-d372b6
        fw.undi 0.0.2
        chip.init AMDA-0081-0001  20160318164536
      stored:
        fw.mgmt 010181.010181.0101d4
        fw.app abm-d372b6
        fw.undi 0.0.2
        chip.init AMDA-0081-0001  20160318164536

$ devlink -jp dev info pci/0000:82:00.0
{
    "info": {
        "pci/0000:82:00.0": {
            "driver": "nfp",
            "serial_number": "16240145",
            "versions": {
                "fixed": {
                    "board.id": "AMDA0081-0001",
                    "board.rev": "15",
                    "board.vendor": "SMA",
                    "board.model": "hydrogen"
                },
                "running": {
                    "fw.mgmt": "010181.010181.0101d4",
                    "fw.cpld": "0x1030000",
                    "fw.app": "abm-d372b6",
                    "fw.undi": "0.0.2",
                    "chip.init": "AMDA-0081-0001  20160318164536"
                },
                "stored": {
                    "fw.mgmt": "010181.010181.0101d4",
                    "fw.app": "abm-d372b6",
                    "fw.undi": "0.0.2",
                    "chip.init": "AMDA-0081-0001  20160318164536"
                }
            }
        }
    }
}

v5:
 - remove spurious new line.
v4:
 - more commit message improvements.
v3:
 - show up-to-date output in the commit message.
v2 (Jiri):
 - remove filtering;
 - add example in the commit message.
RFCv2:
 - make info subcommand of dev.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-06 08:46:50 -08:00
Jakub Kicinski 9d886c6609 devlink: report cell size
Print the value of DEVLINK_ATTR_SB_POOL_CELL_SIZE, if reported.

Example:
pci/0000:82:00.0:
  sb 1 pool 0 type egress size 40945664 thtype static cell_size 2048
  sb 2 pool 0 type egress size 258867200 thtype static cell_size 10240
...

v3: - don't double space.
v2: - fix spelling.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-06 08:46:22 -08:00
David Ahern 6b2d60bdfc Update kernel headers
Update kernel headers to commit:
bfbae2eafe05 ("Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-06 08:45:41 -08:00
Yonghong Song 3da6d055d9 bpf: add btf func and func_proto kind support
The issue was discovered with the bpf selftest test_skb_cgroup_id.sh.
Currently we get:
  $ ./test_skb_cgroup_id.sh
  Wait for testing link-local IP to become available ... OK
  Object has unknown BTF type: 13!
  [PASS]

In the above the BTF type 13 refers to BTF kind
BTF_KIND_FUNC_PROTO.
This patch adds support for BTF_KIND_FUNC_PROTO and
BTF_KIND_FUNC during type parsing.
With this patch, I get:
  $ ./test_skb_cgroup_id.sh
  Wait for testing link-local IP to become available ... OK
  [PASS]

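For context, a parser walking the BTF type section must know how many bytes follow each 12-byte struct btf_type, so an unknown kind cannot simply be skipped. A sketch of that per-kind size table, based on the kernel's BTF format description; the KIND_* names are local stand-ins for BTF_KIND_*, and this is not the iproute2 loader code:

```c
#include <stdint.h>

/* Kind numbers as in the kernel's BTF format (BTF_KIND_INT = 1, ...). */
enum {
    KIND_INT = 1, KIND_PTR, KIND_ARRAY, KIND_STRUCT, KIND_UNION, KIND_ENUM,
    KIND_FWD, KIND_TYPEDEF, KIND_VOLATILE, KIND_CONST, KIND_RESTRICT,
    KIND_FUNC, KIND_FUNC_PROTO,
};

/* Bytes of trailing data after struct btf_type, or -1 for an unknown kind. */
static long btf_extra_bytes(uint32_t info)
{
    uint32_t kind = (info >> 24) & 0x1f;  /* kind lives in bits 24 and up */
    uint32_t vlen = info & 0xffff;        /* member/param count */

    switch (kind) {
    case KIND_INT:
        return 4;                         /* one u32 of encoding bits */
    case KIND_ARRAY:
        return 12;                        /* struct btf_array */
    case KIND_STRUCT:
    case KIND_UNION:
        return 12L * vlen;                /* vlen x struct btf_member */
    case KIND_ENUM:
    case KIND_FUNC_PROTO:
        return 8L * vlen;                 /* btf_enum / btf_param, 8 bytes each */
    case KIND_PTR:
    case KIND_FWD:
    case KIND_TYPEDEF:
    case KIND_VOLATILE:
    case KIND_CONST:
    case KIND_RESTRICT:
    case KIND_FUNC:
        return 0;                         /* header only */
    default:
        return -1;                        /* e.g. "unknown BTF type" */
    }
}
```
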
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-05 15:29:20 -08:00
Ido Schimmel 264be1d887 bridge: fdb: Fix FDB dump with strict checking disabled
While iproute2 correctly uses ifinfomsg struct as the ancillary header
when requesting an FDB dump on old kernels, it sets the message type to
RTM_GETLINK. This results in the wrong reply being returned.

Fix this by using RTM_GETNEIGH instead.

Before:
$ bridge fdb show brport dummy0
Not RTM_NEWNEIGH: 00000158 00000010 00000002

After:
$ bridge fdb show brport dummy0
2a:0b:41:1c:92:d3 vlan 1 master br0 permanent
2a:0b:41:1c:92:d3 master br0 permanent
33:33:00:00:00:01 self permanent
01:00:5e:00:00:01 self permanent

Fixes: 05880354c2 ("bridge: fdb: Fix filtering with strict checking disabled")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: LiLiang <liali@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-05 15:27:28 -08:00
Chris Mi 17ed56fdf3 libnetlink: linkdump_req: AF_PACKET family also expects ext_filter_mask
Without this fix, the VF info cannot be shown using the
"ip link" command.

146: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 24:8a:07:ad:78:52 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 02:25:d0:12:01:01, spoof checking off, link-state auto, trust off, query_rss off
    vf 1 MAC 02:25:d0:12:01:02, spoof checking off, link-state auto, trust off, query_rss off

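Taken together with d97b16b2c9 and 97b44d571d, this commit implies a simple per-family rule for when a link dump request carries ext_filter_mask. A sketch of that predicate; the function name is hypothetical:

```c
#include <stdbool.h>
#include <sys/socket.h>

/* Families whose link-dump requests carry an ext_filter_mask,
 * per d97b16b2c9 plus the AF_BRIDGE and AF_PACKET follow-ups. */
static bool linkdump_wants_ext_mask(int family)
{
    switch (family) {
    case AF_UNSPEC:   /* handled by rtnl_dump_ifinfo */
    case AF_BRIDGE:   /* bridge vlan show */
    case AF_PACKET:   /* ip link, needed for the VF info */
        return true;
    default:
        return false;
    }
}
```
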
Fixes: d97b16b2c9 ("libnetlink: linkdump_req: Only AF_UNSPEC family expects an ext_filter_mask")

Signed-off-by: Chris Mi <chrism@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-02-05 15:25:43 -08:00
Davide Caratti e8a3d76919 tc: add 'kind' property to 'csum' action
Unlike other TC actions that already support JSON printout, 'csum' does not
print the value of TCA_KIND in the 'kind' property: remove the 'csum' word
from the 'csum' property, and add a separate 'kind' property containing the
action name. The human-readable printout is preserved.

Tested with:
 # ./tdc.py -c csum

Cc: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-03 09:10:38 -08:00
Davide Caratti 52d57f6bbd tc: full JSON support for 'bpf' actions
Add full JSON output support in the dump of 'act_bpf'.

Example using eBPF:

 # tc actions flush action bpf
 # tc action add action bpf object bpf/action.o section 'action-ok'
 # tc -j action list action bpf | jq
 [
   {
     "total acts": 1
   },
   {
     "actions": [
       {
         "order": 0,
         "kind": "bpf",
         "bpf_name": "action.o:[action-ok]",
         "prog": {
           "id": 33,
           "tag": "a04f5eef06a7f555",
           "jited": 1
         },
         "control_action": {
           "type": "pipe"
         },
         "index": 1,
         "ref": 1,
         "bind": 0
       }
     ]
   }
 ]

Example using cBPF:

 # tc actions flush action bpf
 # a=$(mktemp)
 # tcpdump -ddd not ether proto 0x888e >$a
 # tc action add action bpf bytecode-file $a index 42
 # rm $a
 # tc -j action list action bpf | jq
 [
   {
     "total acts": 1
   },
   {
     "actions": [
       {
         "order": 0,
         "kind": "bpf",
         "bytecode": {
           "length": 4,
           "insns": [
             {
               "code": 40,
               "jt": 0,
               "jf": 0,
               "k": 12
             },
             {
               "code": 21,
               "jt": 0,
               "jf": 1,
               "k": 34958
             },
             {
               "code": 6,
               "jt": 0,
               "jf": 0,
               "k": 0
             },
             {
               "code": 6,
               "jt": 0,
               "jf": 0,
               "k": 262144
             }
           ]
         },
         "control_action": {
           "type": "pipe"
         },
         "index": 42,
         "ref": 1,
         "bind": 0
       }
     ]
   }
 ]

Tested with:
 # ./tdc.py -c bpf

Cc: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-03 09:10:10 -08:00
Björn Töpel 2abc3d76e3 ss: add AF_XDP support
AF_XDP is an address family that is optimized for high performance
packet processing.

This patch adds AF_XDP support to ss(8) so that sockets can be queried
and monitored.

Example:
$ sudo ss --xdp -e -p -m
Recv-Q      Send-Q           Local Address:Port             Peer Address:Port

0           0                   enp134s0f0:q20                          *
 users:(("xdpsock",pid=17787,fd=3)) ino:39424 sk:4
        rx(entries:2048)
        tx(entries:2048)
        umem(id:1,size:8388608,num_pages:2048,chunk_size:2048,headroom:0,ifindex:7,qid:20,zc:0,refs:1)
        fr(entries:2048)
        cr(entries:2048) skmem:(r0,rb212992,t0,tb212992,f0,w0,o0,bl0,d0)
0           0                    enp24s0f0:q0                           *
 users:(("xdpsock",pid=17780,fd=3)) ino:37384 sk:5
        rx(entries:2048)
        tx(entries:2048)
        umem(id:0,size:8388608,num_pages:2048,chunk_size:2048,headroom:0,ifindex:6,qid:0,zc:1,refs:1)
        fr(entries:2048)
        cr(entries:2048) skmem:(r0,rb212992,t0,tb212992,f0,w0,o0,bl0,d0)

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-30 20:57:45 -08:00
David Ahern f79b7733b4 Update kernel headers and add xdp_diag.h
Update kernel headers to commit:
c829f5f52db9 ("cxgb4: cxgb4_tc_u32: use struct_size() in kvzalloc()")

and import xdp_diag.h for the next patch.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-29 18:34:40 -08:00
Matteo Croce e3dbcb2a12 netns: add subcommand to attach an existing network namespace
ip tracks namespaces with dummy files in /var/run/netns/, but can't see
namespaces created with other tools.
Creating the dummy file and bind mounting the correct procfs entry will
make ip aware of that namespace.
Add an ip netns subcommand to automate this task.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Reviewed-by: Andrea Claudi <aclaudi@redhat.com>
Tested-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-29 18:18:03 -08:00
Stephen Hemminger 6f1940da8e tc: replace left side comparison
The kernel (and iproute2) do not use the if (NULL == x) style
and instead prefer if (!x).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-28 08:51:03 -08:00
Hans Dedecker 2874714662 f_flower: fix build with musl libc
XATTR_SIZE_MAX is defined in linux/limits.h, so include it.

Signed-off-by: Hans Dedecker <dedeckeh@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-01-25 09:20:03 +13:00
wenxu 3d65cefbef iproute: Set ip/ip6 lwtunnel flags
ip l add dev tun type gretap external
ip r a 10.0.0.1 encap ip dst 192.168.152.171 id 1000 dev gretap

In the gretap example above, the command sets the id but not the
TUNNEL_KEY flag, so the sent packet carries no key field.

With this change, the user can set the key, csum and seq flags:
ip r a 10.0.0.1 encap ip dst 192.168.152.171 id 1000 key csum dev gretap

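A sketch of mapping the key/csum/seq keywords to tunnel flag bits. The values below are the host-order TUNNEL_CSUM, TUNNEL_KEY and TUNNEL_SEQ bits (the uapi header wraps them in __cpu_to_be16()); the enum and helper names are hypothetical:

```c
#include <string.h>

enum {
    TUN_F_CSUM = 0x01,   /* TUNNEL_CSUM */
    TUN_F_KEY  = 0x04,   /* TUNNEL_KEY  */
    TUN_F_SEQ  = 0x08,   /* TUNNEL_SEQ  */
};

/* OR together the flags named in argv[0..argc-1]; -1 on an unknown word. */
static int tunnel_flags_from_words(int argc, const char *const argv[])
{
    int flags = 0;

    for (int i = 0; i < argc; i++) {
        if (strcmp(argv[i], "key") == 0)
            flags |= TUN_F_KEY;
        else if (strcmp(argv[i], "csum") == 0)
            flags |= TUN_F_CSUM;
        else if (strcmp(argv[i], "seq") == 0)
            flags |= TUN_F_SEQ;
        else
            return -1;
    }
    return flags;
}
```

For "... id 1000 key csum ..." this yields TUN_F_KEY | TUN_F_CSUM, so the key field is no longer dropped.
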
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-01-25 09:17:27 +13:00
David Ahern b45664e064 Merge 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-22 08:30:38 -08:00
Jakub Kicinski 8513f4a926 ip route: get: only set RTM_F_LOOKUP_TABLE flag for IPv4
Kernel ignores the RTM_F_LOOKUP_TABLE flag for all families
but IPv4.  Don't set it, otherwise it may fall foul of
strict checking policies.

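A sketch of the rule, assuming the Linux uapi headers provide RTM_F_LOOKUP_TABLE; route_get_flags() is a hypothetical helper:

```c
#include <sys/socket.h>
#include <linux/rtnetlink.h>

/* Only IPv4 honours RTM_F_LOOKUP_TABLE on "ip route get"; leave it clear
 * for other families so strict checking has nothing to reject. */
static unsigned int route_get_flags(int family, unsigned int flags)
{
    if (family == AF_INET)
        flags |= RTM_F_LOOKUP_TABLE;
    return flags;
}
```
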
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-01-22 16:04:13 +13:00
Adi Nissim dc0332b1e8 tc: m_tunnel_key: Allow key-less tunnels
Change the id parameter of the tunnel_key set action from mandatory to
optional.

Some tunneling protocols (e.g. GRE) specify the id as an optional field.

Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-01-22 16:04:07 +13:00
Stephen Hemminger 3bc2dc7668 uapi: in.h change
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-01-22 16:03:31 +13:00
Benedict Wong a6af9f2e61 xfrm: add option to hide keys in state output
ip xfrm state show currently dumps keys unconditionally. This limits its
use in logging, as security information can be leaked.

This patch adds a nokeys option to ip xfrm ( state show | monitor ), which
prevents the printing of keys. This allows ip xfrm state show to be used
in logging without exposing keys.

Signed-off-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-21 08:31:20 -08:00
Cong Wang b0ca46a1f8 tc: add hit counter for matchall
Cc: Martin Olsson <martin.olsson+netdev@sentorsecurity.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-21 08:30:07 -08:00
David Ahern dad02ef478 Update kernel headers
Update kernel headers to commit
28f9d1a3d4fe ("Merge branch 'mlxsw-spectrum_router-Add-GRE-tunnel-support-for-Spectrum-2'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-21 08:29:26 -08:00
Leon Romanovsky b058f969df rdma: Add unbound workqueue to list of poll context types
Kernel commit f794809a7259 ("IB/core: Add an unbound WQ type to the new CQ API")
added a new CQ poll context type; reflect this change in rdmatool.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-21 08:19:44 -08:00
Leon Romanovsky 54a0ade6d4 clang-format: add configuration file
The codebase of iproute2 follows the Linux kernel coding style,
so reusing the kernel's existing clang-format configuration file
will help format code reliably.

For more information see kernel commit d4ef8d3ff005
("clang-format: add configuration file").

Updated up to commit v5.0-rc1, with a small number of ForEachMacros.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-17 13:38:23 -08:00
Luca Boccassi 0cf061183e Makefile: check manpages for syntax errors
Pass the same parameters Lintian uses in Debian.

$ make check
<...>
Checking manpages for syntax errors...
<standard input>:48: warning: macro `Q' not defined
Error in tc-taprio.8
Makefile:27: recipe for target 'check' failed

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-01-14 08:01:51 -08:00
Luca Boccassi 8242808ced man: tc-taprio.8: fix syntax error
.Q does not exist so groff complains and the "queues" word is actually
not displayed.

Fixes: 579acb4bc5 ("taprio: Add manpage for tc-taprio(8)")

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-01-14 08:01:51 -08:00
Luca Boccassi cffeeb3946 man: ss.8: more line breaks
groff still complains about unbreakable lines:
  96: warning [p 2, 3.0i]: can't break line

Indent it some more.

Fixes: 7f5047524c ("man: ss.8: break and indent long line")

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-01-14 08:01:51 -08:00
David Ahern 3d14706e54 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-07 16:30:13 -08:00
Dmitry V. Levin db4ad742e1 configure: fix typo in check_xt_old_internal_h
Fixes: 377a09902a ("configure: Minor code cleanup")
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-01-07 14:42:01 -08:00
Stephen Hemminger 80e5ddec14 rdma: update uapi headers
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-01-07 11:41:39 -08:00
Stephen Hemminger e1ccc46bdd uapi: update headers from 4.21-rc1
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2019-01-07 11:39:26 -08:00
Stephen Hemminger 724ec5aeb0 Merge ../iproute2-next 2019-01-07 11:36:41 -08:00
Stephen Hemminger 97864a5af3 v4.20.0 2019-01-07 10:24:02 -08:00
Tobias Jungel c9159af51a ipneigh: print dst for AF_BRIDGE
When a neighbour message is of family AF_BRIDGE, the NDA_DST attribute
was not printed. With this patch, the family is evaluated to pass
the correct family to format_host_rta.

Signed-off-by: Tobias Jungel <tobias.jungel@bisdn.de>
2019-01-07 10:22:03 -08:00
David Ahern 97b44d571d libnetlink: linkdump_req is done for AF_BRIDGE as well
The bridge command 'vlan show' calls rtnl_linkdump_req_filter for
family AF_BRIDGE. Update rtnl_linkdump_req_filter to send the filter
for that family as well.

Fixes: d97b16b2c9 ("libnetlink: linkdump_req: Only AF_UNSPEC family expects an ext_filter_mask")
Reported-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
2019-01-07 08:36:58 -08:00
David Ahern dfa2c3787f Merge branch 'iproute2-master' into iproute2-next
Conflicts:
	ip/iprule.c

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-04 12:22:47 -08:00
David Ahern 6267d9533b Merge branch 'strict-updates' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-04 12:19:37 -08:00
David Ahern 05880354c2 bridge: fdb: Fix filtering with strict checking disabled
Older kernels expect an ifinfomsg struct as the ancillary header, and
after kernel commit bd961c9bc664 ("rtnetlink: fix rtnl_fdb_dump() for ndmsg
header") can handle either ifinfomsg or ndmsg. Strict data checking only
allows ndmsg.

Use the new RTNL_HANDLE_F_STRICT_CHK flag to know which header to send.

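A sketch of building the FDB dump request either way, assuming the Linux uapi netlink headers. fdb_dump_req and the bool strict parameter are hypothetical stand-ins for iproute2's request struct and its RTNL_HANDLE_F_STRICT_CHK test; the message type is RTM_GETNEIGH in both cases, as later corrected by 264be1d887:

```c
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/neighbour.h>

struct fdb_dump_req {
    struct nlmsghdr nlh;
    union {
        struct ifinfomsg ifm;   /* what kernels without strict checking take */
        struct ndmsg ndm;       /* the only header strict checking allows */
    } u;
};

static void fdb_dump_req_init(struct fdb_dump_req *req, bool strict)
{
    memset(req, 0, sizeof(*req));
    req->nlh.nlmsg_type = RTM_GETNEIGH;   /* a neighbour dump either way */
    req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    if (strict) {
        req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg));
        req->u.ndm.ndm_family = PF_BRIDGE;
    } else {
        req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
        req->u.ifm.ifi_family = PF_BRIDGE;
    }
}
```
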
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
2019-01-04 12:17:19 -08:00
David Ahern 285033bfeb libnetlink: Add RTNL_HANDLE_F_STRICT_CHK flag
Add the RTNL_HANDLE_F_STRICT_CHK flag and set it in rth flags to let
commands know whether the kernel supports strict checking.

Extracted from a patch by Ido to fix filtering with strict checking
enabled.

Cc: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-04 12:17:17 -08:00
David Ahern 66b4199f22 bridge: Update fdb show to use rtnl_neighdump_req
Add fdb_dump_filter to set filter attributes in dump request
and convert fdb_show to use rtnl_neighdump_req.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-04 12:17:15 -08:00
David Ahern 101ec10a76 ip neigh: Convert do_show_or_flush to use rtnl_neighdump_req
Add ipneigh_dump_filter to add filter attributes to the neighbor
dump request and update do_show_or_flush to use rtnl_neighdump_req.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-04 12:17:13 -08:00
David Ahern f255ab1225 libnetlink: Add filter function to rtnl_neighdump_req
Add filter function to rtnl_neighdump_req and a buffer to the
request for the filter functions to append attributes.

Signed-off-by: David Ahern <dsahern@gmail.com>
2019-01-04 12:17:11 -08:00
Leon Romanovsky f0cabaca38 rdma: Fix incorrectly handled NLA validation
mnl_attr_type_valid() receives the maximum attribute type, which means that
we were supposed to supply the last valid netlink attribute and not
the number of attributes. This coding mistake caused failures when
NLA attributes were extended.

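The off-by-one is easiest to see with a small hypothetical attribute enum that follows the usual netlink convention. attr_type_valid() applies the same inclusive bound as libmnl's mnl_attr_type_valid(), but is not that function:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical attribute set in the usual netlink style. */
enum {
    ATTR_UNSPEC,
    ATTR_DEV_INDEX,
    ATTR_DEV_NAME,
    ATTR_COUNT,                     /* number of types, not a valid type */
};
#define ATTR_MAX (ATTR_COUNT - 1)   /* largest valid type: the bound to pass */

/* "max" is inclusive: valid types are 0..max. */
static bool attr_type_valid(uint16_t type, uint16_t max)
{
    return type <= max;
}
```

An attribute table for this enum is declared as tb[ATTR_MAX + 1], so passing ATTR_COUNT as the bound lets type ATTR_COUNT through and index one past the end.
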
Fixes: 74bd75c2b6 ("rdma: Add basic infrastructure for RDMA tool")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-31 22:15:13 -08:00
wenxu cb65a9cb81 iprule: Add tun_id field in the selector
ip rule add from all iif gretap tun_id 2000 lookup 200

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-31 22:13:13 -08:00
Eric Dumazet 72cdb77d1a nstat: fix load_ugly_table() limits
A recent change reduced max line length from 4096 to 2048 bytes,
but we already have lines above the 2048 threshold, and we keep
adding more SNMP counters in linux.

Switch to getline() and do not worry about future kernel changes.

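A sketch of the getline(3) loop, assuming POSIX.1-2008; scan_lines() is hypothetical and only measures the lines, whereas load_ugly_table() also parses them:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Read every line regardless of length; no fixed-size buffer to outgrow.
 * Returns the number of lines (and the longest length, newline excluded),
 * or -1 on a read error. */
static long scan_lines(FILE *fp, size_t *longest)
{
    char *line = NULL;   /* getline() allocates and grows this */
    size_t cap = 0;
    ssize_t n;
    long count = 0;

    *longest = 0;
    while ((n = getline(&line, &cap, fp)) != -1) {
        if (n > 0 && line[n - 1] == '\n')
            n--;                       /* don't count the newline */
        if ((size_t)n > *longest)
            *longest = (size_t)n;
        count++;
    }
    free(line);
    return ferror(fp) ? -1 : count;
}
```
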
Fixes: da8034a019 ("misc: avoid snprintf warnings in ss and nstat")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-31 21:45:53 -08:00
Ido Schimmel 66e8e73edc bridge: fdb: Use 'struct ndmsg' for FDB dumping
Since commit aea41afcfd ("ip bridge: Set NETLINK_GET_STRICT_CHK on
socket") iproute2 uses strict checking on kernels that support it. This
causes FDB dumping to fail [1], as iproute2 uses 'struct ifinfomsg'
whereas the kernel expects 'struct ndmsg'.

Note that with this change iproute2 continues to work on old kernels
that do not support strict checking, but contain the fix introduced in
kernel commit bd961c9bc664 ("rtnetlink: fix rtnl_fdb_dump() for ndmsg
header").

[1]
# bridge fdb show
[ 5365.137224] netlink: 4 bytes leftover after parsing attributes in process `bridge'.
Error: bytes leftover after parsing attributes.
Dump terminated

Fixes: aea41afcfd ("ip bridge: Set NETLINK_GET_STRICT_CHK on socket")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-30 16:56:34 -08:00
Michael Guralnik 40fc8c2cec rdma: Add print of link CapabilityMask2 flags
CapabilityMask2 is defined in the IBTA spec as a member of PortInfo.
Add string translation for the new CapabilityMask2 extension of link caps.

The flags are appended to the current caps output, as seen in this
example printing the EXT_INFO flag:

root@server-22 $ rdma -d link
1/1: mlx5_0/1: subnet_prefix fe80:0000:0000:0000 lid 2 sm_lid 2 lmc 0
	state ACTIVE physical_state LINK_UP
caps: <SM, TRAP, SL_MAP, SYS_IMAGE_GUID, CABLE_INFO, EXTENDED_SPEEDS,
	CAP_MASK2, CM, DEVICE_MGMT, VENDOR_CLASS, CAP_MASK_NOTICE,
	CLIENT_REG, OTHER_LOCAL_CHANGES, MULT_PKER_TRAP, EXT_INFO>

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:41:21 -08:00
David Ahern 0c187b7f24 Merge branch 'strict-dumps' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:37:44 -08:00
David Ahern 6b83edc061 neighbor: Add support for protocol attribute
Add support to set protocol on neigh entries and to print the protocol
on dumps.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:37:12 -08:00
David Ahern 8d4f35de17 ip route: Rename do_ipv6 to dump_family
do_ipv6 is really the preferred dump family. Rename it to make
that apparent.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:36:51 -08:00
David Ahern aea41afcfd ip bridge: Set NETLINK_GET_STRICT_CHK on socket
iproute2 has been updated for the new strict policy in the kernel. Add a
helper to call setsockopt to enable the feature. Add a call to it in ip.c
and bridge.c.

The setsockopt fails on older kernels and the error can be safely ignored:
any new fields or attributes are ignored by the older kernel.

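A minimal sketch of such a helper, assuming Linux; netlink_try_strict() is a hypothetical name, and the fallback constants are the uapi values in case older headers lack them:

```c
#include <stdbool.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif
#ifndef NETLINK_GET_STRICT_CHK
#define NETLINK_GET_STRICT_CHK 12
#endif

/* Try to enable strict checking. Older kernels reject the option, which
 * is harmless; report whether it took so callers can choose a layout. */
static bool netlink_try_strict(int fd)
{
    int one = 1;

    return setsockopt(fd, SOL_NETLINK, NETLINK_GET_STRICT_CHK,
                      &one, sizeof(one)) == 0;
}
```

The return value is what a caller could record in a flag such as RTNL_HANDLE_F_STRICT_CHK.
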
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:36:29 -08:00
David Ahern 8847097850 ip address: Set device index in dump request
Add a filter function to rtnl_addrdump_req to set device index in the
address dump request if the user is filtering addresses by device. In
addition, add a new ipaddr_link_get to do a single RTM_GETLINK request
instead of a device dump yet still store the data in the linfo list.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:35:49 -08:00
David Ahern 7ca9cee8d8 ip address: Split ip_linkaddr_list into link and addr functions
Split ip_linkaddr_list into one function that generates a list of devices
and a second that generates the list of addresses.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:35:14 -08:00
David Ahern e41ede8939 mroute: Add table id attribute for kernel side filtering
Similar to 'ip route' add the table id to the dump request for
kernel side filtering if it is supported.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:34:50 -08:00
David Ahern 98ce99273f mroute: fix up family handling
Only ipv4 and ipv6 have multicast routing. Set family
accordingly and just return for other cases.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:34:28 -08:00
David Ahern c7e6371bc4 ip route: Add protocol, table id and device to dump request
Add protocol, table id and device to dump request if set in filter. If
kernel side filtering is supported it is used to reduce the amount of
data sent to userspace.

Older kernels do not parse attributes on a route dump request, so these
are silently ignored and ip will do the filtering in userspace.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:33:59 -08:00
David Ahern 43fd93ae46 ip route: Remove rtnl_rtcache_request
Add a filter option to rtnl_routedump_req and use it to set rtm_flags
removing the need for rtnl_rtcache_request for dump requests.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:33:34 -08:00
David Ahern d97b16b2c9 libnetlink: linkdump_req: Only AF_UNSPEC family expects an ext_filter_mask
Only AF_UNSPEC handled by rtnl_dump_ifinfo expects an ext_filter_mask
on a dump request. Update the linkdump request functions to only set
and send ext_filter_mask for AF_UNSPEC.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:33:05 -08:00
David Ahern 92e03242c4 libnetlink: Use NLMSG_LENGTH to set nlmsg_len
Change nlmsg_len from sizeof(req) to use NLMSG_LENGTH on the header.
Two of the inner headers are not 4-byte aligned, so add a 0-length buf
after the header with the __aligned(NLMSG_ALIGNTO) to ensure the size
of the request is large enough. Use NLMSG_ALIGN in NLMSG_LENGTH to set
nlmsg_len.

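A sketch of that layout using struct rtgenmsg, whose single byte is not a multiple of NLMSG_ALIGNTO. gen_dump_req and gen_dump_req_init() are hypothetical, and the aligned zero-length tail uses the GCC/Clang attribute syntax in place of iproute2's __aligned() macro:

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

struct gen_dump_req {
    struct nlmsghdr nlh;
    struct rtgenmsg g;   /* 1 byte: not a multiple of NLMSG_ALIGNTO */
    char buf[] __attribute__((aligned(NLMSG_ALIGNTO)));
};

/* Any appended attributes must start exactly where nlmsg_len ends. */
_Static_assert(offsetof(struct gen_dump_req, buf) ==
               NLMSG_LENGTH(NLMSG_ALIGN(sizeof(struct rtgenmsg))),
               "attributes must start at the aligned payload end");

static void gen_dump_req_init(struct gen_dump_req *req,
                              unsigned short type, unsigned char family)
{
    memset(req, 0, sizeof(*req));
    /* header plus aligned ancillary header, rather than sizeof(*req) */
    req->nlh.nlmsg_len = NLMSG_LENGTH(NLMSG_ALIGN(sizeof(struct rtgenmsg)));
    req->nlh.nlmsg_type = type;
    req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req->g.rtgen_family = family;
}
```

Here nlmsg_len comes out as 20 (16-byte header plus the 1-byte rtgenmsg padded to 4).
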
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:32:57 -08:00
David Ahern 2750252d7e libnetlink: dump extack string in done message
Print any extack message that has been appended to a NLMSG_DONE message.
To avoid duplication, move the existing print code to a new helper.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-27 15:32:31 -08:00
David Ahern fdce94d0d1 Update kernel headers
Update kernel headers to commit
ce28bb445388 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-22 07:36:52 -08:00
Petr Vorel 261a5290dd testsuite: Fix colorize
bash and dash require 'echo -e' or printf for escape sequences
(zsh works either way). printf is chosen because its implementation is,
IMHO, more portable than the various echo implementations.
dash also requires \033[0; as the escape sequence instead of \e[0;.

NOTE: \e[0; is kept in lib/color.c as it is not a problem for C code
(it works when running ip from various shells).

Fixes: 7e2f71b4 ("testsuite: colorize test result output")

Signed-off-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2018-12-20 20:16:28 -08:00
Stephen Hemminger c579ec14a7 uapi/iptunnel: make TUNNEL_FLAGS available
ip l add dev tun type gretap external
ip r a 10.0.0.1 encap ip dst 192.168.152.171 id 1000 dev gretap

For example, with a gretap key: when the command sets the id but does
not set the TUNNEL_KEY flag, there is no key field in the sent packet.

In the lwtunnel case, userspace should be able to set some
TUNNEL_FLAGS.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-20 09:19:33 -08:00
Stephen Hemminger 2db63d290b uapi/netlink.h: rename NETLINK_DUMP_STRICT_CHK -> NETLINK_GET_STRICT_CHK
NETLINK_DUMP_STRICT_CHK can be used for all GET requests,
dumps as well as doit handlers.  Replace DUMP in the
name with GET to make that clearer.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-20 09:18:29 -08:00
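Because the option covers every GET request, a tool would typically enable it once per netlink socket. A minimal sketch, assuming a Linux 4.20+ kernel; enable_strict_get_chk() is a hypothetical helper, and the fallback value 12 is the one used by linux/netlink.h for older userspace headers.

```c
#include <assert.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif
#ifndef NETLINK_GET_STRICT_CHK
#define NETLINK_GET_STRICT_CHK 12	/* value in linux/netlink.h (4.20+) */
#endif

/* Ask the kernel to strictly validate GET requests (dumps and doit
 * handlers alike) on this netlink socket. Returns setsockopt()'s result:
 * 0 on success, -1 with errno set otherwise (for example ENOPROTOOPT on
 * kernels that predate the option). */
static int enable_strict_get_chk(int fd)
{
	int one = 1;

	return setsockopt(fd, SOL_NETLINK, NETLINK_GET_STRICT_CHK,
			  &one, sizeof(one));
}
```

Callers can treat ENOPROTOOPT as "old kernel" and fall back to unfiltered dumps.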
Stephen Hemminger ef7f9fae2e uapi/in.h: Allow class-e address assignment
While most distributions long ago switched to the iproute2 suite
of utilities, which allow class-e (240.0.0.0/4) address assignment,
distributions relying on busybox, toybox and other forms of
ifconfig cannot assign class-e addresses without this kernel patch.

While classful addressing has been obsolete for two decades, and a
survey of all the open source code in the world shows the IN_whatever
macros are also obsolete, rather than removing classful handling from
this ioctl entirely, this patch merely enables class-e assignment, sanely.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-20 09:17:05 -08:00
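For reference, class E is the block 240.0.0.0/4. A minimal C check for that range, assuming the address starts out as a dotted-quad string; is_class_e() is a hypothetical helper and is not part of the uapi/in.h change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h>

/* True if a dotted-quad IPv4 address lies in 240.0.0.0/4, i.e. its top
 * four bits are all set. Note that 255.255.255.255 (limited broadcast)
 * also falls inside this block. */
static bool is_class_e(const char *dotted)
{
	struct in_addr a;

	if (inet_pton(AF_INET, dotted, &a) != 1)
		return false;
	return (ntohl(a.s_addr) & 0xf0000000u) == 0xf0000000u;
}
```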
David Ahern 17689d3075 Update kernel headers
Update kernel headers to commit
   055722716c39 ("tipc: fix uninitialized value for broadcast retransmission")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-19 12:47:29 -08:00
Stephen Hemminger db9af1f1e3 testsuite: drop unrunnable test
The classifier testbed test never worked and was always being
skipped. It depended on some files in tests/cls which never made
it into the iproute2 git repository.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-19 12:10:36 -08:00
Stephen Hemminger e5cd5a51f9 doc: remove trailing whitespace
Run whitespace scrubbing script to remove unnecessary trailing
blanks at end of line and end of file.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-19 12:02:38 -08:00
David Ahern 6065ddfaa7 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-19 12:02:17 -08:00
Petr Vorel 8b2ea19276 examples: Remove cbq.init-v0.7.3
This script is obsolete.

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2018-12-18 10:52:35 -08:00
Petr Vorel bb955fd127 examples: Remove dhcp-client-script
This script is obsolete.

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2018-12-18 10:52:35 -08:00
Petr Vorel 377a09902a configure: Minor code cleanup
Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2018-12-18 10:52:35 -08:00
Petr Vorel fce84d6450 configure: Remove non-posix shell expansion
+ change shebang to /bin/sh

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2018-12-18 10:52:35 -08:00
Petr Vorel 3de834e6e2 configure: Remove unused function check_prog()
Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2018-12-18 10:52:35 -08:00
Petr Vorel ec7cac05ff tests: Use /bin/sh shebang
Bashisms for tests were removed in ecd44e68 ("tests: Remove
bashisms (s/source/.)"), so no need to use bash shebang.

+ remove trailing whitespace.

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2018-12-18 10:52:35 -08:00
Petr Vorel ee32695387 man: rtpr: Rename s/bash/shell/
ip/rtpr, described in the man page as a bash script, is actually a POSIX
shell script (it does not require bash).

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2018-12-18 10:52:35 -08:00
Luca Boccassi 85bcb524a2 testsuite: remove gre kmods if the test loads them
The tunnel test leaves behind link devices created by the GRE kernel
modules:

$ ip -br link
...
gre0@NONE    DOWN 0.0.0.0 <NOARP>
gretap0@NONE DOWN 00:00:00:00:00:00 <BROADCAST,MULTICAST>
erspan0@NONE DOWN 00:00:00:00:00:00 <BROADCAST,MULTICAST>
ip6tnl0@NONE DOWN :: <NOARP>
ip6gre0@NONE DOWN 00:00:00:00:

$ lsmod | grep gre
ip6_gre      40960  0
ip6_tunnel   40960  1 ip6_gre
ip_gre       32768  0
ip_tunnel    24576  1 ip_gre
gre          16384  2 ip6_gre,ip_gre

Check beforehand whether the gre kernel module is loaded and, if it is
not, unload all of them at the end of the test. This should avoid causing
problems if a user is already using GRE for other purposes.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Tested-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2018-12-18 09:58:10 -08:00
Luca Boccassi eaed928b64 testsuite: delete dummy interface after default route test
Signed-off-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2018-12-18 09:58:10 -08:00
Luca Boccassi 61f9ade9fb testsuite: declare dependency between $(TESTS) and generate_nlmsg
Parallel make from the top level directory fails since the tests are run
at the same time as generate_nlmsg is built:

$ make check -j4

...

cd testsuite && make && make alltests
echo "Entering iproute2" && cd iproute2 && make configure && cd ..;
Entering iproute2
make -C tools
Removing results dir ...
make[1]: ./tools/generate_nlmsg: Command not found
make[1]: ./tools/generate_nlmsg: Command not found
Makefile:64: recipe for target 'ip/netns/set_nsid_batch.t' failed
make[1]: *** [ip/netns/set_nsid_batch.t] Error 127
make[1]: ./tools/generate_nlmsg: Command not found
make[1]: *** Waiting for unfinished jobs....
Makefile:64: recipe for target 'ip/netns/set_nsid.t' failed
make[1]: *** [ip/netns/set_nsid.t] Error 127
Makefile:64: recipe for target 'ip/link/show_dev_wo_vf_rate.t' failed
make[1]: *** [ip/link/show_dev_wo_vf_rate.t] Error 127
    CC       generate_nlmsg
Makefile:123: recipe for target 'check' failed
make: *** [check] Error 2

Add an explicit dependency in testsuite/Makefile's $(TESTS) rule so
that the tool correctly gets compiled before any test runs.

Fixes: 3537633dcf ("testsuite: Generate generate_nlmsg when needed")

Signed-off-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: Petr Vorel <petr.vorel@gmail.com>
Tested-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2018-12-18 09:58:10 -08:00
Luca Boccassi 0115d55e9f Makefile: have check target depend on all
Otherwise it will simply fail immediately from a just-cleaned
workspace:

$ make check -j1
cd testsuite && make && make alltests
echo "Entering iproute2" && cd iproute2 && make configure && cd ..;
Entering iproute2
make -C tools
Makefile:3: ../../config.mk: No such file or directory
make[2]: *** No rule to make target '../../config.mk'.  Stop.

Fixes: 8804a8c0d3 ("Makefile: Add check target")

Signed-off-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: Petr Vorel <petr.vorel@gmail.com>
Tested-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2018-12-18 09:58:10 -08:00
Masatake YAMATO cec6b03124 man: ss: fix typos about wscale
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2018-12-18 07:50:41 -08:00
Stephen Hemminger 738aebe52b drop support for DECnet
DECnet belongs in the history museum of dead protocols along
with Appletalk and IPX.

Linux support has outlived its natural life and the time has
come to remove it from iproute2. Dead code is a source
of bugs and exploits.

If anyone actually has DECnet running on some old distribution,
they can just stay with the old version of iproute2.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-13 12:50:01 -08:00
Syrone Wong 6ddb36c3a9 tc: fix xtables incorrect usage of LDFLAGS
The incorrect setting of LDFLAGS causes the error below:

> em_ipt.o: In function `em_ipt_print_epot':
> em_ipt.c:(.text.em_ipt_print_epot+0x2e): undefined reference to
> `xtables_init_all'

em_ipt.c is built when TC_CONFIG_XT=y, which requires xtables,
but tc/Makefile does not pass the flags correctly: it adds '-lxtables'
to LDFLAGS instead of LDLIBS.

Fixes: dd296215 ("tc: add em_ipt ematch for calling xtables matches from tc matching context")

Signed-off-by: Syrone Wong <wong.syrone@gmail.com>
Acked-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-13 11:38:43 -08:00
Stephen Hemminger 3a1f602ade remove redundant long int
Using unsigned long is sufficient; there is no need to be more
verbose and use unsigned long int.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-13 11:36:59 -08:00
Leon Romanovsky 378dd31b4b rdma: Fix broken 32-bit compilation
Allow compilation of rdmatool on 32-bit platforms.

rdma
    CC       rdma.o
    CC       utils.o
    CC       dev.o
    CC       link.o
In file included from rdma.h:26:0,
                 from dev.c:12:
dev.c: In function 'dev_caps_tostr':
../include/utils.h:269:38: warning: left shift count >= width of type [-Wshift-count-overflow]
 #define BIT(nr)                 (1UL << (nr))
                                      ^
rdma.h:32:61: note: in expansion of macro 'BIT'
 #define RDMA_BITMAP_ENUM(name, bit_no) RDMA_BITMAP_##name = BIT(bit_no),
                                                             ^~~

Fixes: 40df8263a0 ("rdma: Add dev object")
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-13 11:34:44 -08:00
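The warning arises because 1UL is only 32 bits wide on ILP32 targets, so BIT(nr) for nr >= 32 shifts past the width of the type. A minimal sketch of a width-independent alternative; bit_u64() and cap_is_set() are hypothetical and not necessarily what this commit changed.

```c
#include <assert.h>
#include <stdint.h>

/* The constant being shifted is always 64 bits wide, whereas 1UL is
 * 32 bits on ILP32 targets, which triggers -Wshift-count-overflow
 * for bit numbers >= 32. */
static inline uint64_t bit_u64(unsigned int nr)
{
	assert(nr < 64);	/* shifting by >= the type width is undefined */
	return UINT64_C(1) << nr;
}

/* Test one capability bit in a 64-bit capability mask. */
static inline int cap_is_set(uint64_t caps, unsigned int nr)
{
	return (caps & bit_u64(nr)) != 0;
}
```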
Stephen Hemminger 90c5c969f0 fix print_0xhex on 32 bit
The argument to print_0xhex is converted to unsigned long long,
so the format string given for normal printout has to be some
variant of %llx. Otherwise, bogus values will be printed on
32-bit platforms.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-10 14:20:32 -08:00
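The rule can be illustrated with a small formatter. format_0xhex() is a hypothetical helper, not the print_0xhex implementation; it only shows that a value widened to unsigned long long must be paired with a %llx conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Once the value is unsigned long long, the conversion must be %llx.
 * Passing it to %lx instead is undefined behaviour and, on 32-bit
 * platforms where long is 32 bits, prints bogus values. */
static int format_0xhex(char *buf, size_t len, unsigned long long v)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "0x%llx", v);
}
```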
Stephen Hemminger 33fde2b600 lib/bpf: fix build warning if no elf
The function was not used unless HAVE_ELF was defined, causing:

bpf.c:105:13: warning: ‘bpf_map_offload_neutral’ defined but not used [-Wunused-function]

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-10 13:50:17 -08:00
Stephen Hemminger 79940533c0 ipmacsec: fix warning on 32bit platform
On some 32-bit platforms, the printf was causing a warning:
ipmacsec.c: In function ‘getattr_u64’:
ipmacsec.c:655:47: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 3 has type ‘unsigned int’ [-Wformat=]
   fprintf(stderr, "invalid attribute length %lu\n",

Resolve by computing length as size_t first.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-10 13:47:58 -08:00
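A sketch of the pattern, assuming the length is then printed with %zu, which matches size_t on both 32-bit and 64-bit platforms (unlike %lu). check_u64_attr_len() is hypothetical and only mirrors the length check in getattr_u64().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Compute the payload length as size_t first, then print it with %zu. */
static int check_u64_attr_len(size_t payload_len)
{
	if (payload_len != sizeof(uint64_t)) {
		fprintf(stderr, "invalid attribute length %zu\n", payload_len);
		return -1;
	}
	return 0;
}
```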
Stephen Hemminger 028766aed2 uapi: update bpf header
Changes from 4.20-rc6

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-10 09:22:23 -08:00
David Ahern fbe7da2306 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-07 13:02:08 -08:00
Shalom Toledo a463fd4fa4 devlink: Add support for 'fw_load_policy' generic parameter
Add string to uint conversion for 'fw_load_policy' generic parameter.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-07 13:00:40 -08:00
Shalom Toledo 2557dca2b0 devlink: Add string to uint{8,16,32} conversion for generic parameters
Allow setting u{8,16,32} generic parameters as well-defined strings in
the devlink user space tool.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-07 13:00:38 -08:00
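The string-to-uint conversion amounts to a lookup table per generic parameter. A minimal sketch for fw_load_policy; the table, struct and function names are hypothetical, and the strings and values assume the kernel's enum devlink_param_fw_load_policy_value ordering (DRIVER = 0, FLASH = 1), so check linux/devlink.h before relying on them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct param_str_val {
	const char *str;
	uint32_t val;
};

/* Assumed values: DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_DRIVER = 0,
 * DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_FLASH = 1. */
static const struct param_str_val fw_load_policy_map[] = {
	{ "driver", 0 },
	{ "flash",  1 },
};

/* Returns 0 and stores the value on a match, -1 for unknown strings. */
static int param_str_to_u32(const char *str, uint32_t *val)
{
	size_t i;
	size_t n = sizeof(fw_load_policy_map) / sizeof(fw_load_policy_map[0]);

	for (i = 0; i < n; i++) {
		if (strcmp(str, fw_load_policy_map[i].str) == 0) {
			*val = fw_load_policy_map[i].val;
			return 0;
		}
	}
	return -1;
}
```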
Stephen Hemminger a9c49b8f8f rdma: align uapi headers with 4.20-rc5
Upstream headers were updated.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-07 09:25:59 -08:00
Hoang Le 853adffe13 tipc: fix misalignment printout in non-JSON output
Commit 1304f50a5b ("tipc: JSON support for showing nametable")
introduced a misalignment between the columns of the non-JSON printout
and the list header. Add one space per column to realign them with the
list header.

before:
$tipc name show
Type       Lower      Upper      Scope    Port       Node
1         1         1         node    4071367628

after:
$tipc name show
Type       Lower      Upper      Scope    Port       Node
1          1          1          node     4071367628

Reported-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-07 09:24:01 -08:00
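The column arithmetic can be sketched as follows, assuming the header widths visible in the "after" output (11 characters for Type, Lower and Upper, 9 for Scope): each field is printed one character narrower than its column and followed by a separating space. format_name_row() is hypothetical, not the tipc tool's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* "Type       " is 11 characters and "Scope    " is 9, so use
 * %-10u / %-8s plus one explicit space to line up with the header. */
static int format_name_row(char *buf, size_t len, unsigned int type,
			   unsigned int lower, unsigned int upper,
			   const char *scope, unsigned int port)
{
	return snprintf(buf, len, "%-10u %-10u %-10u %-8s %u",
			type, lower, upper, scope, port);
}
```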
Martin Jeřábek 2e320d8b7e ip: iplink_can.c: fix json formatting
Previously the CAN state was always printed in human-readable txt format,
resulting in invalid JSON.

Signed-off-by: Martin Jeřábek <martin.jerabek01@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-07 09:22:29 -08:00
Petr Machata 4ac4217464 testsuite: Add a test for batch processing
Test that when a second or following command in a batch fails, tc
reports it correctly. This is a test for the previous patch.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-04 14:28:31 -08:00
Petr Machata 0951cbcddf libnetlink: Process further iovs on no error
When no error is reported in the first iov, do not prematurely return,
but process further iovs. This fixes batch processing.

Fixes: c60389e4f9 ("libnetlink: fix leak and using unused memory on error")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-04 14:28:31 -08:00
Emeric Dupont a7a7e45017 iproute2: Installation errors without libmnl
When performing make install in iproute2 (current git master), if
$(HAVE_MNL) is not selected, some Makefiles try to call install with an
empty target, which causes a non-critical make error.

Signed-off-by: Emeric Dupont <emeric.dupont@zii.aero>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-04 14:27:08 -08:00
Stephen Hemminger f3188cfa39 devlink: don't need to call pkg-config twice
pkg-config for libmnl is already done in config.mk

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-04 14:19:33 -08:00
Amritha Nambiar 8930840e67 tc: flower: Classify packets based port ranges
Added support for filtering based on port ranges.
UAPI changes have been accepted into net-next.

Example:
1. Match on a port range:
-------------------------
$ tc filter add dev enp4s0 protocol ip parent ffff:\
  prio 1 flower ip_proto tcp dst_port 20-30 skip_hw\
  action drop

$ tc -s filter show dev enp4s0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  ip_proto tcp
  dst_port 20-30
  skip_hw
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 85 sec used 3 sec
        Action statistics:
        Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

2. Match on IP address and port range:
--------------------------------------
$ tc filter add dev enp4s0 protocol ip parent ffff:\
  prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port 100-200\
  skip_hw action drop

$ tc -s filter show dev enp4s0 parent ffff:
filter protocol ip pref 1 flower chain 0 handle 0x2
  eth_type ipv4
  ip_proto tcp
  dst_ip 192.168.1.1
  dst_port 100-200
  skip_hw
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 2 ref 1 bind 1 installed 58 sec used 2 sec
        Action statistics:
        Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

v6:
Changed the JSON output format for sport/dport to an object:

 "dst_port":{
           "start":2000,
           "end":6000
 },
 "src_port":{
           "start":50,
           "end":60
 }

v5:
Simplified some code and used 'sscanf' for parsing. Removed
space in output format.

v4:
Added man updates explaining filtering based on port ranges.
Removed 'range' keyword.

v3:
Modified flower_port_range_attr_type calls.

v2:
Addressed Jiri's comment to sync output format with input

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-03 16:02:58 -08:00
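The v5 note mentions parsing with sscanf. A minimal sketch of parsing a "MIN-MAX" argument such as "20-30" that way; parse_port_range() is hypothetical, and requiring MIN to be strictly smaller than MAX is an assumption here, not necessarily the exact rule tc applies.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Parse "MIN-MAX" into two 16-bit ports. Rejects trailing garbage,
 * values above 65535 and ranges where MIN >= MAX. */
static int parse_port_range(const char *arg, uint16_t *min, uint16_t *max)
{
	unsigned int lo, hi;
	int end = 0;

	/* %n records how many characters were consumed, so leftovers
	 * such as "20-30x" can be rejected. */
	if (sscanf(arg, "%u-%u%n", &lo, &hi, &end) != 2 || arg[end] != '\0')
		return -1;
	if (lo > UINT16_MAX || hi > UINT16_MAX || lo >= hi)
		return -1;
	*min = (uint16_t)lo;
	*max = (uint16_t)hi;
	return 0;
}
```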
David Ahern dd7d522a67 Revert "tc: flower: Classify packets based port ranges"
This reverts commit e20e50b0c1.

Inadvertently pushed v3 of this patch.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-03 16:01:07 -08:00
David Ahern fb417073a3 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-12-03 15:39:29 -08:00
Andrea Claudi b876b7e2b4 l2tp: Fix printing of cookie and peer_cookie values
print_cookie() invocations are missing the %s format specifier.
While at it, align the printout with the previous lines.

Fixes: 98453b6580 ("ip/l2tp: add JSON support")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-03 14:35:58 -08:00
Eric Dumazet 3adcbf3757 tc: add a missing space between rate estimator and backlog
When a rate estimator is active, "tc -s qd" displays
something like:

rate 12616bit 11ppsbacklog 0b 0p requeues 2

instead of:

rate 12616bit 11pps backlog 0b 0p requeues 2

Fixes: 4fcec7f366 ("tc: jsonify stats2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-03 14:34:05 -08:00
Phil Sutter 6495bca92e ssfilter: Fix for inverted last expression
When fixing for shift/reduce conflicts, possibility to invert the last
expression by prefixing with '!' or 'not' was accidentally removed.

Fix this by allowing for expr to be an inverted expr so that any
reference to it in exprlist accepts the inverted prefix.

Reported-by: Eric Dumazet <edumazet@google.com>
Fixes: b2038cc0b2 ("ssfilter: Eliminate shift/reduce conflicts")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-12-03 14:33:19 -08:00
Eric Dumazet 5eead6270a ss: add support for bytes_sent, bytes_retrans, dsack_dups and reord_seen
Wei Wang added these fields in linux-4.19

Tested:

ss -ti ...

	ts sack cubic wscale:8,8 rto:7 rtt:2.678/0.267 mss:1428 pmtu:1500
    rcvmss:536 advmss:1428 cwnd:91 ssthresh:65
(*) bytes_sent:17470606104 bytes_retrans:2856
    bytes_acked:17470483297
    segs_out:12234320 segs_in:622983
    data_segs_out:12234318 send 388.2Mbps lastrcv:986784 lastack:1
    pacing_rate 465.8Mbps delivery_rate 162.7Mbps
    delivered:12234235 delivered_ce:3669056
    busy:986784ms unacked:84 retrans:0/2
(*) dsack_dups:2
    rcv_space:14280 rcv_ssthresh:65535 notsent:2016336 minrtt:0.183

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-29 10:49:46 -08:00
Phil Sutter 7ab8f249aa man: ip-route.8: Fix ENCAP references in synopsis
The different encapsulation types are described in ENCAP_*
non-terminals, but ENCAP definition lists them without the ENCAP_
prefix. Fix this for consistency.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-28 16:00:18 -08:00
Roopa Prabhu f38e278b84 bridge: make -c match -compressvlans first instead of -color
commit c7c1a1ef51 ("bridge: colorize output and use JSON print library")
broke the previous use of -c to represent compressvlans. This restores
the previous use of -c for compressvlans. I understand the original
motivation to use -c for color consistently everywhere, but there are
apps and network interface managers out there that already use -c to
represent compressed vlans.

Fixes: c7c1a1ef51 ("bridge: colorize output and use JSON print library")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-28 15:59:21 -08:00
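Why the order of checks matters: iproute2 tools accept abbreviated options by prefix matching, so an ambiguous abbreviation like "-c" resolves to whichever pattern is tested first. A minimal sketch; prefix_matches() is a hypothetical reimplementation in the spirit of iproute2's matches() convention (0 on match), and resolve_opt() is illustrative only.

```c
#include <assert.h>

/* Returns 0 when 'prefix' is a non-empty prefix of 'string',
 * non-zero otherwise. */
static int prefix_matches(const char *prefix, const char *string)
{
	if (!*prefix)
		return 1;
	while (*string && *prefix == *string) {
		prefix++;
		string++;
	}
	return *prefix;	/* 0 only if the whole prefix was consumed */
}

enum bridge_opt { OPT_UNKNOWN, OPT_COMPRESSVLANS, OPT_COLOR };

/* The first pattern tested wins for an ambiguous abbreviation, so testing
 * "-compressvlans" before "-color" makes "-c" select compressvlans. */
static enum bridge_opt resolve_opt(const char *opt)
{
	if (prefix_matches(opt, "-compressvlans") == 0)
		return OPT_COMPRESSVLANS;
	if (prefix_matches(opt, "-color") == 0)
		return OPT_COLOR;
	return OPT_UNKNOWN;
}
```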
Eric Dumazet 2f4d834b99 ss: add support for delivered and delivered_ce fields
Kernel support was added in linux-4.18 in commit feb5f2ec6464
("tcp: export packets delivery info")

Tested:

ss -ti
...
ESTAB   0 2270520      [2607:f8b0:8099:e16::]:47646   [2607:f8b0:8099:e18::]:38953
	 ts sack cubic wscale:8,8 rto:7 rtt:2.824/0.278 mss:1428
     pmtu:1500 rcvmss:536 advmss:1428 cwnd:89 ssthresh:62 bytes_acked:2097871945
    segs_out:1469144 segs_in:65221 data_segs_out:1469142 send 360.0Mbps lastsnd:2
    lastrcv:99231 lastack:2 pacing_rate 431.9Mbps delivery_rate 246.4Mbps
(*) delivered:1469099 delivered_ce:424799
    busy:99231ms unacked:44 rcv_space:14280 rcv_ssthresh:65535
    notsent:2207688 minrtt:0.228

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-28 15:57:30 -08:00
Phil Sutter b2ec8f4314 man: rdma: Add reference to rdma-resource.8
All rdma-related man pages list each other in their SEE ALSO sections;
only rdma-resource.8 is missing. Add it for the sake of consistency.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-28 15:56:31 -08:00
Eric Dumazet 6d03d6f7d9 man: tc: update man page for fq packet scheduler
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-25 09:40:36 -08:00
Eric Dumazet 55e106c480 tc: fq: support ce_threshold attribute
Kernel commit 48872c11b772 ("net_sched: sch_fq: add dctcp-like marking")
added support for TCA_FQ_CE_THRESHOLD attribute.

This patch adds iproute2 support for it.

It also makes sure fq_print_xstats() can deal with smaller tc_fq_qd_stats
structures given by older kernels.

Usage:

FQATTRS="ce_threshold 4ms"
TXQS=8

for ETH in eth0
do
 tc qd del dev $ETH root 2>/dev/null
 tc qd add dev $ETH root handle 1: mq
 for i in `seq 1 $TXQS`
 do
  tc qd add dev $ETH parent 1:$i fq $FQATTRS
 done
done

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:30:24 -08:00
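Coping with the smaller stats structures sent by older kernels usually comes down to a bounded copy into a zeroed struct. A minimal sketch; struct fq_stats_sketch and load_fq_stats() are hypothetical, and although the field names are borrowed from tc_fq_qd_stats, the real structure has many more fields between them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: an older kernel sends only the first two fields,
 * a newer one appends ce_mark. */
struct fq_stats_sketch {
	uint64_t gc_flows;
	uint64_t highprio_packets;
	uint64_t ce_mark;	/* newer field, absent from older kernels */
};

/* Copy at most sizeof(*st) bytes and zero the rest, so fields the kernel
 * did not send read as 0 and a larger payload is safely truncated. */
static void load_fq_stats(struct fq_stats_sketch *st, const void *data, size_t len)
{
	memset(st, 0, sizeof(*st));
	memcpy(st, data, len < sizeof(*st) ? len : sizeof(*st));
}
```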
Stephen Hemminger ce5071eda6 drop support for IPX
IPX has been deprecated and then removed from upstream kernels.
Drop support from ip route as well.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:27:56 -08:00
David Ahern 27f4e51003 Merge branch 'tc-gred-json-perf-vq-config' into iproute2-next
Jakub Kicinski  says:

====================

This set brings GRED support up to date with recent kernel changes,
in particular the new netlink attributes for more fine-grained stats
and per-virtual-queue flags.

To make GRED usable in modern deployments the patch set starts with
adding JSON output.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:17:30 -08:00
Jakub Kicinski f7a8749aff tc: gred: allow controlling and dumping per-DP RED flags
The kernel now supports setting ECN and HARDDROP flags per virtual
queue.  Allow users to tweak the settings, and print them on
dump.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:11:40 -08:00
Jakub Kicinski 2d7c564a1e tc: gred: support controlling RED flags
The kernel GRED qdisc supports ECN marking and the harddrop flag,
but setting and dumping these flags is not possible with iproute2.
Add the support.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:11:36 -08:00
Jakub Kicinski fdaff63c6a tc: gred: use extended stats if available
Use the extended attributes with extra and better stats, when
possible.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:11:19 -08:00
Jakub Kicinski c3e1cd28c1 tc: gred: separate out stats printing
Printing GRED statistics is long and deserves a function of its own.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:11:09 -08:00
Jakub Kicinski 6475e6a580 tc: gred: jsonify GRED output
Make GRED dump JSON-compatible.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:11:04 -08:00
Jakub Kicinski 33021752cd tc: move RED flag printing to helper
A number of qdiscs use the same set of flags to control the shared RED
implementation.  Add a helper for printing those flags.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:10:58 -08:00
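A shared flag printer can be sketched as below. format_red_flags() is hypothetical and writes to a buffer instead of using the iproute2 print helpers; TC_RED_ECN, TC_RED_HARDDROP and TC_RED_ADAPTATIVE come from <linux/pkt_sched.h>, while the keyword spellings are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <linux/pkt_sched.h>	/* TC_RED_ECN, TC_RED_HARDDROP, TC_RED_ADAPTATIVE */

/* Render the RED flag bits shared by several qdiscs as space-separated
 * keywords. strncat() is bounded by the remaining space in buf. */
static void format_red_flags(unsigned int flags, char *buf, size_t len)
{
	assert(len > 0);
	buf[0] = '\0';
	if (flags & TC_RED_ECN)
		strncat(buf, "ecn ", len - strlen(buf) - 1);
	if (flags & TC_RED_HARDDROP)
		strncat(buf, "harddrop ", len - strlen(buf) - 1);
	if (flags & TC_RED_ADAPTATIVE)
		strncat(buf, "adaptive ", len - strlen(buf) - 1);
}
```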
Jakub Kicinski b640e85d2d json: add %hhu helpers
Add helpers for printing char-size values.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:09:53 -08:00
Jakub Kicinski c8f201e3d2 tc: gred: remove unclear comment
The comment about providing a proper message seems similar to
the comment in the kernel which says:

    /* hack -- fix at some point with proper message
       This is how we indicate to tc that there is no VQ
       at this DP */

It's unclear what that message would be, and whether it's needed.
Remove the confusing comment.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:08:16 -08:00
David Ahern 6ae54b1326 Revert "rdma: make local functions static"
This reverts commit e99c4443ae.

Patch added to iproute2-master breaks builds of -next because of a
more recent patch in -next that relies on the exports. Revert the
offending patch. Unfortunately this leaves a window where builds
break.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:06:17 -08:00
David Ahern 0868c8ab07 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-24 07:06:11 -08:00
Quentin Monnet 1a7d3ad8a5 bpf: initialise map symbol before retrieving and comparing its type
In order to compare BPF map symbol type correctly in regard to the
latest LLVM, commit 7a04dd84a7 ("bpf: check map symbol type properly
with newer llvm compiler") compares map symbol type to both NOTYPE and
OBJECT. To do so, it first retrieves the type from "sym.st_info" and
stores it into a temporary variable.

However, the type is collected from the symbol "sym" before this latter
symbol is actually updated. gelf_getsym() is called after that and
updates "sym", so when the comparison with OBJECT or NOTYPE happens it is
done on the type of the symbol collected in the previous iteration of the
loop (or on an uninitialised symbol on the first iteration). This may
eventually break map collection from the ELF file.

Fix this by assigning the type to the temporary variable only after the
call to gelf_getsym().

Fixes: 7a04dd84a7 ("bpf: check map symbol type properly with newer llvm compiler")
Reported-by: Ron Philip <ron.philip@netronome.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-21 09:36:30 -08:00
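The corrected ordering can be sketched with glibc's <elf.h> types. get_sym() is a hypothetical stand-in for libelf's gelf_getsym() (which likewise returns the destination pointer on success), and count_map_candidates() only illustrates the fetch-then-read order, not the actual bpf loader loop.

```c
#include <assert.h>
#include <stddef.h>
#include <elf.h>

/* Stand-in for gelf_getsym(): copy symbol i out of a table. */
static Elf64_Sym *get_sym(const Elf64_Sym *tab, size_t i, Elf64_Sym *dst)
{
	*dst = tab[i];
	return dst;
}

/* Count symbols that could be maps (NOTYPE or OBJECT). The type is read
 * only after the symbol has been fetched; reading it before get_sym()
 * would test the previous iteration's symbol (or an uninitialised one). */
static size_t count_map_candidates(const Elf64_Sym *tab, size_t n)
{
	Elf64_Sym sym;
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		unsigned char type;

		if (!get_sym(tab, i, &sym))
			continue;
		type = ELF64_ST_TYPE(sym.st_info);
		if (type == STT_NOTYPE || type == STT_OBJECT)
			count++;
	}
	return count;
}
```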
Nicolas Dichtel ebe3ce2fcc ipnetns: parse nsid as a signed integer
Don't confuse the user: nsid is a signed integer, so a command like
'ip netns set foo 0xffffffff' should return an error.

Also, a valid value is non-negative. To let the kernel choose a value,
the keyword 'auto' must be used.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-21 09:35:37 -08:00
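A minimal sketch of such parsing, assuming that 0 is accepted (any non-negative value that fits in a signed int) and that "auto" is encoded as -1 so the kernel picks the id. parse_nsid() is hypothetical, not the ipnetns.c code; it accepts decimal, hex and octal as strtol() with base 0 does.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Returns 0 on success, -1 for values that do not fit a non-negative
 * signed int (e.g. 0xffffffff), negatives, or trailing garbage. */
static int parse_nsid(const char *arg, int *nsid)
{
	char *end;
	long v;

	if (strcmp(arg, "auto") == 0) {
		*nsid = -1;	/* let the kernel choose */
		return 0;
	}
	errno = 0;
	v = strtol(arg, &end, 0);
	if (errno || end == arg || *end != '\0' || v < 0 || v > INT_MAX)
		return -1;
	*nsid = (int)v;
	return 0;
}
```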
Amritha Nambiar e20e50b0c1 tc: flower: Classify packets based port ranges
Added support for filtering based on port ranges.
UAPI changes have been accepted into net-next.

Example:
1. Match on a port range:
-------------------------
$ tc filter add dev enp4s0 protocol ip parent ffff:\
  prio 1 flower ip_proto tcp dst_port range 20-30 skip_hw\
  action drop

$ tc -s filter show dev enp4s0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  ip_proto tcp
  dst_port range 20-30
  skip_hw
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 85 sec used 3 sec
        Action statistics:
        Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

2. Match on IP address and port range:
--------------------------------------
$ tc filter add dev enp4s0 protocol ip parent ffff:\
  prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port range 100-200\
  skip_hw action drop

$ tc -s filter show dev enp4s0 parent ffff:
filter protocol ip pref 1 flower chain 0 handle 0x2
  eth_type ipv4
  ip_proto tcp
  dst_ip 192.168.1.1
  dst_port range 100-200
  skip_hw
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 2 ref 1 bind 1 installed 58 sec used 2 sec
        Action statistics:
        Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

v3:
Modified flower_port_range_attr_type calls.

v2:
Addressed Jiri's comment to sync output format with input

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-20 14:34:56 -08:00
David Ahern 8d42678dfb Update kernel headers
Update kernel headers to
  b1a200484143 ("net-next/hinic: fix a bug in rx data flow")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-20 14:33:09 -08:00
Stephen Hemminger e99c4443ae rdma: make local functions static
Several functions only used inside utils.c

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger 946a135c58 tc/pedit: use structure initialization
The pedit callback structure table should be initialized using
structure initialization to avoid problems when the structure changes.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger 9e96e71594 tc/action: make variables static
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger 42d9eed451 tc/meta: make meta_table static and const
The mapping table is only used by em_meta.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger 9455bec52a tc/util: make local functions static
The tc util library parse/print has functions only used locally
(and some dead code removed).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger 33043dfc9c tc/ematch: make local functions static
The print handling is only used in tc/m_ematch.c

Remove the unused function print_ematch_tree.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger 7527b221d6 tc/pedit: make functions static
The parse and pack functions are only used by the pedit routines.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger 3f7bd0fd90 ss: make local variables static
Several variables only used in this code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger a38fadf401 tc/police: make print_police static
print_police function only used by m_police.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger 7e569d92a9 tc/class: make filter variables static
Only used in this file.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger f63c3f9a81 tipc: make cmd_find static
Function only used in one file.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger babc56b68c tc: drop unused name_to_id function
Not used in current code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger fa92d8cb09 ipxfrm: make local functions static
Make functions only used in ipxfrm.c static.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger 3e4b255ca9 ipmonitor: make local variable static
prefix_banner only used in one file.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger 086277b591 ip: make flag names const/static
The table of filter flags is only used in ipaddress

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger d63786c642 bridge: make local variables static
enable_color and set_color_palette only used here.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger 13530c46b5 genl: remove dead code
The function genl_ctrl_resolve_family is defined but never used
in current code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger 1d2fac4145 libnetlnk: unused and local functions cleanup
rtnl_talk_extack and parse_rtattr_index are not used in current code.
rtnl_dump_filter_l is only used in this file.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger cc5b7e37ac lib/ll_map: make local function static
ll_idx_a2n is only used in ll_map.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger f7bf88dfd5 lib/color: make local functions static
color_enable etc, only used here.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger b8795a3208 lib/utils: make local functions static
Some of the print/parsing is only used internally.
Drop unused get_s8/get_s16.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger 07b20a6197 lib/ll_addr: whitespace and indent cleanup
Run old ll_addr through kernel Lindent.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:42:44 -08:00
Stephen Hemminger 57ddc275f5 Merge branch 'master' of ra.kernel.org:/pub/scm/linux/kernel/git/shemminger/iproute2 2018-11-19 11:40:37 -08:00
Phil Sutter 133db49b49 ip-address: Fix filtering by negated address flags
When disabling a flag, one needs to AND with the inverse not the flag
itself. Otherwise specifying for instance 'home -nodad' will effectively
clear the flags variable.

While being at it, simplify the code a bit by merging common parts of
negated and non-negated case branches. Also allow for the "special
cases" to be inverted, too.
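The difference between the buggy and the fixed negation is sketched below. The flag values and helper names are hypothetical simplifications and are not the ones used in ipaddress.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits standing in for "nodad" and "home". */
#define FLAG_NODAD 0x02u
#define FLAG_HOME  0x10u

/* Buggy: "-flag" ANDs with the flag itself, which keeps only that bit
 * and clears everything else (usually leaving 0). */
static unsigned int apply_flag_buggy(unsigned int flags, unsigned int bit,
				     bool negate)
{
	return negate ? (flags & bit) : (flags | bit);
}

/* Fixed: "-flag" ANDs with the inverse, clearing only that bit. */
static unsigned int apply_flag(unsigned int flags, unsigned int bit,
			       bool negate)
{
	return negate ? (flags & ~bit) : (flags | bit);
}
```

For 'home -nodad', the fixed version keeps the home bit. The buggy version ends with 0x10 & 0x02, which is 0.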

Fixes: f73ac674d0 ("ip: change flag names to an array")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:38:24 -08:00
Stephen Hemminger 87e3ec0e2f testsuite: drop old kernel configs
The testsuite directory had several ancient kernel configs.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-19 11:37:08 -08:00
Phil Sutter 05d978e085 ip-route: Fix nexthop encap parsing
When parsing nexthop parameters, a 4k-byte buffer is provided. However,
lwt_parse_encap() and some of the functions it calls assumed a buffer
size of 1k even though the actual size was passed in. This led to
spurious buffer size errors when previous nexthop parameters had already
filled the buffer past that 1k boundary.
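The following minimal sketch shows the pattern the fix restores: bounds checks should use the capacity the caller passes in, never an assumed size. `buf_append()` is a hypothetical helper, not an iproute2 function. The real code appends netlink attributes with `rta_addattr_l()`.

```c
#include <assert.h>
#include <string.h>

/* Append len bytes at offset *used in a buffer whose total capacity is
 * maxlen. The check relies only on the caller-supplied maxlen; a
 * hard-coded 1024 here would reject data that actually fits. */
static int buf_append(char *buf, size_t maxlen, size_t *used,
		      const void *data, size_t len)
{
	if (*used + len > maxlen)
		return -1;	/* genuinely out of room */
	memcpy(buf + *used, data, len);
	*used += len;
	return 0;
}
```

With a 4096-byte buffer that already holds 1000 bytes, appending 100 more succeeds. A check against an assumed 1024-byte size would have failed.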

Fixes: 1e5293056a ("lwtunnel: Add encapsulation support to ip route")
Fixes: 5866bddd9a ("ila: Add support for ILA lwtunnels")
Fixes: ed67f83806 ("ila: Support for checksum neutral translation")
Fixes: 86905c8f05 ("ila: support for configuring identifier and hook types")
Fixes: b15f440e78 ("lwt: BPF support for LWT")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-14 11:18:59 -08:00
Stephen Hemminger 7e2f71b431 testsuite: colorize test result output
When running the testsuite, it is easy for a human to miss a failure.
Add color to the SKIPPED/PASS/FAIL output.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-14 11:18:03 -08:00
Phil Sutter 6cd959bb12 man: ip-route.8: Document nexthop limit
Add a note to 'nexthop' description stating the maximum number of
nexthops per command and pointing at 'append' command as a workaround.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-14 11:13:24 -08:00
Stefano Brivio 6d1af9abc5 testsuite: ss: Fix spacing in expected output for ssfilter.t
Commit 00240899ec ("ss: Actually print left delimiter for
columns") changed the spacing in ss output, so the expected output of
the ss filter test must be adjusted accordingly.

Fixes: 00240899ec ("ss: Actually print left delimiter for columns")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-12 08:30:34 -08:00
Luca Boccassi ed27d27a5c testsuite: correctly use CC macros for generate_nlmsg
It's $(QUIET_CC)$(CC) not $(QUIET_CC), copy-paste error. CI does
verbose build so it slipped through.

Fixes: 6e7d347aab ("testsuite: build generate_nlmsg with QUIET_CC")

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-09 08:58:55 -08:00
Stefano Brivio 64dbd03ea1 iplink_geneve: Add DF configuration
Allow setting the DF bit behaviour for outgoing IPv4 packets: it can be
always on, inherited from the inner header, or, by default, always off,
which is the current behaviour.

v2:
- Indicate in the man page what DF refers to, using RFC 791 wording
  (David Ahern)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-09 08:51:47 -08:00
Stefano Brivio 3d98eba4fe iplink_vxlan: Add DF configuration
Allow setting the DF bit behaviour for outgoing IPv4 packets: it can be
always on, inherited from the inner header, or, by default, always off,
which is the current behaviour.

v2:
- Indicate in the man page what DF refers to, using RFC 791 wording
  (David Ahern)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-09 08:51:12 -08:00
David Ahern 3a7246dce4 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-09 08:50:50 -08:00
Stephen Hemminger 9abb69f8a0 uapi: sctp header change
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-09 08:18:26 -08:00
Jakub Kicinski 9c5f4251d6 tc: f_u32: allow skip_hw and skip_sw flags to be last
u32 uses NEXT_ARG() incorrectly when parsing skip_hw and skip_sw
flags.  NEXT_ARG() ensures there is another argument on the command
line, and is used in handling <keyword> <value> syntax to move past
<keyword> and ensure there is a <value> to read.

Commit 5e5b3008d1 ("tc: f_u32: Add support for skip_hw and skip_sw
flags") seems to have copy pasted the handling from the previous
command - "police", which needs an extra parameter and is kind of
special due to the use of parse_police() helper.

The combination of NEXT_ARG() and continue worked fine as long as
skip_sw/skip_hw wasn't last, e.g.:

$ tc filter add dev dummy0 ingress prio 101 protocol ipv6 \
    u32 match ip6 priority 0xa0 0xe0 skip_hw action pass

But would fail if it was last:

$ tc filter add dev dummy0 ingress prio 101 protocol ipv6 \
    u32 match ip6 priority 0xa0 0xe0 flowid :1 skip_hw
Command line is not complete. Try option "help"

Remove the NEXT_ARG()s and the continues, and let the argc--; argv++;
at the end of the loop do its job.
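The parsing rule is sketched below. Keywords that take a value advance with a `NEXT_ARG()`-style step. Bare flags rely on the common advance at the bottom of the loop. `NEXT_ARG_OR_FAIL()` and `parse_opts()` are hypothetical. iproute2's own `NEXT_ARG()` exits with the "Command line is not complete" error instead of returning.

```c
#include <assert.h>
#include <string.h>

#define FLAG_SKIP_HW 0x1u
#define FLAG_SKIP_SW 0x2u

/* Hypothetical NEXT_ARG() variant: step to the value of a
 * "<keyword> <value>" pair, failing if the command line ends first. */
#define NEXT_ARG_OR_FAIL() do { argv++; if (--argc <= 0) return -1; } while (0)

/* Parse a tiny subset of u32-style options. Unrecognized tokens are
 * skipped in this sketch. Returns 0, or -1 if a value is missing. */
static int parse_opts(int argc, char **argv, unsigned int *flags,
		      const char **flowid)
{
	while (argc > 0) {
		if (strcmp(*argv, "flowid") == 0) {
			NEXT_ARG_OR_FAIL();	/* keyword takes a value */
			*flowid = *argv;
		} else if (strcmp(*argv, "skip_hw") == 0) {
			*flags |= FLAG_SKIP_HW;	/* bare flag: no NEXT_ARG() */
		} else if (strcmp(*argv, "skip_sw") == 0) {
			*flags |= FLAG_SKIP_SW;
		}
		/* One common advance handles every case, including the last token. */
		argc--;
		argv++;
	}
	return 0;
}
```

With this structure, `flowid :1 skip_hw` parses cleanly even though `skip_hw` is last. A trailing `flowid` with no value is still rejected.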

Fixes: 5e5b3008d1 ("tc: f_u32: Add support for skip_hw and skip_sw flags")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-09 08:12:29 -08:00
Luca Boccassi 1a03ac6b05 Pass CPPFLAGS to the compiler
When building Debian packages pre-processor flags are passed via
CPPFLAGS, as the convention indicates. Specifically, the hardening
-D_FORTIFY_SOURCE=2 flag is used.
Pass CPPFLAGS to all calls of QUIET_CC together with CFLAGS.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-09 08:07:18 -08:00
Luca Boccassi 6e7d347aab testsuite: build generate_nlmsg with QUIET_CC
Follow the standard pattern, and respect user's verbosity setting.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-09 08:07:18 -08:00
Alex Vesker 995015be31 devlink: Add missing region option to devlink man page
The region field was not added to the devlink man page.

Fixes: 8b4fbf0bed ("devlink: Add support for devlink-region access")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-09 08:04:33 -08:00
Luca Boccassi a6bb5b9e7c Fix warning in tc-skbprio.8 manpage
". If" gets interpreted as a macro, so move the period to the previous
line:

  33: warning: macro `If' not defined

Fixes: 141b55f854 ("Add SKB Priority qdisc support in tc(8)")

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-09 08:03:40 -08:00
Luca Boccassi 7f5047524c man: ss.8: break and indent long line
Fixes groff warning:
  ss.8 92: warning [p 2, 2.8i]: can't break line

And makes the line also more readable.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-09 08:02:43 -08:00
Roopa Prabhu a795211fe5 bridge: fdb: remove redundant dev string in show output
After commit 4abb8c723a ("bridge: fdb: Fix for missing
keywords in non-JSON output"), I am seeing a double print for dev
in bridge fdb show. eg:
"44:38:39:00:6a:82 dev dev bridge vlan 1 master bridge permanent"

this patch removes the redundant print.

Fixes: 4abb8c723a ("bridge: fdb: Fix for missing keywords in non-JSON output")
CC: Phil Sutter <phil@nwl.cc>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-09 07:50:01 -08:00
Leon Romanovsky e89feffae3 rdma: Document IB device renaming option
[leonro@server /]$ lspci |grep -i Ether
00:08.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:09.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
[leonro@server /]$ sudo rdma dev
1: mlx5_0: node_type ca fw 3.8.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455
[leonro@server /]$ sudo rdma dev set mlx5_0 name hfi1_0
[leonro@server /]$ sudo rdma dev
1: hfi1_0: node_type ca fw 3.8.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-05 19:12:03 -08:00
Luca Boccassi 6d2fd4a53f Include bsd/string.h only in include/utils.h
This is simpler and cleaner, and avoids having to include the header
from every file where the functions are used. The prototypes of the
internal implementation are in this header, so utils.h will have to be
included anyway for those.

Fixes: 508f3c231e ("Use libbsd for strlcpy if available")

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-05 08:38:32 -08:00
Stephen Hemminger 39776a8665 uapi: update headers to 4.20-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-05 08:37:41 -08:00
Leon Romanovsky 4ee770eec9 rdma: Refresh help section of resource information
After commit 4060e4c0d2 ("rdma: Add PD resource tracking
information"), the resource information shows PDs and MRs,
but help pages didn't fully reflect it.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-05 08:36:36 -08:00
David Ahern cde2f706aa Merge branch 'rdma-dev-rename' into iproute2-next
Leon Romanovsky  says:

====================

An example:

[leonro@server /]$ lspci |grep -i Ether
00:08.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:09.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
[leonro@server /]$ sudo rdma dev
1: mlx5_0: node_type ca fw 3.8.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455
[leonro@server /]$ sudo rdma dev set mlx5_0 name hfi1_0
[leonro@server /]$ sudo rdma dev
1: hfi1_0: node_type ca fw 3.8.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-02 09:41:12 -07:00
Leon Romanovsky 4443c9c6a0 rdma: Add an option to rename IB device interface
Enrich rdmatool with an option to rename IB devices,
the command interface follows the iproute2 convention:
"rdma dev set [OLD-DEVNAME] name NEW-DEVNAME"

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-02 09:38:56 -07:00
Leon Romanovsky a14ceed325 rdma: Introduce command execution helper with required device name
Unlike the various show commands, the set command requires a device
name as an argument. Provide a new command execution helper that
enforces this.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-02 09:38:18 -07:00
Leon Romanovsky 3fb00075d9 rdma: Update kernel include file to support IB device renaming
Bring kernel header file changes up to commit 05d940d3a3ec
("RDMA/nldev: Allow IB device rename through RDMA netlink")

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-02 09:38:15 -07:00
David Ahern b2e8bf1584 ip rule: Add ipproto and port range to filter list
Allow ip rule dumps and flushes to filter based on ipproto, sport
and dport. Example:

$ ip ru ls ipproto udp
99:	from all to 8.8.8.8 ipproto udp dport 53 lookup 1001
$ ip ru ls dport 53
99:	from all to 8.8.8.8 ipproto udp dport 53 lookup 1001

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-02 09:37:14 -07:00
David Ahern 70bc1f6550 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-11-02 09:34:18 -07:00
David Ahern 2380120926 ip rule: Require at least one argument for add
'ip rule add' with no additional arguments just adds another rule
for the main table - which exists by default. Require at least
1 argument similar to delete.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-01 12:49:48 -07:00
David Ahern b65b4c0870 ip rule: Honor filter arguments on flush
'ip ru flush' currently removes all rules with priority > 0 regardless
of any other command line arguments passed in. Update flush_rule to
call filter_nlmsg to determine if the rule should be flushed or not.
This enables rule flushing such as 'ip ru flush table 1001' and
'ip ru flush pref 99'.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-01 12:49:48 -07:00
Luca Boccassi 508f3c231e Use libbsd for strlcpy if available
If libc does not provide strlcpy check for libbsd with pkg-config to
avoid relying on inline version.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-11-01 12:47:03 -07:00
Yonghong Song 7a04dd84a7 bpf: check map symbol type properly with newer llvm compiler
With llvm 7.0 or earlier, the map symbol type is STT_NOTYPE.
  -bash-4.4$ cat t.c
  __attribute__((section("maps"))) int g;
  -bash-4.4$ clang -target bpf -O2 -c t.c
  -bash-4.4$ readelf -s t.o

  Symbol table '.symtab' contains 2 entries:
     Num:    Value          Size Type    Bind   Vis      Ndx Name
       0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
       1: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    3 g

The following llvm commit enables BPF target to generate
proper symbol type and size.
  commit bf6ec206615b9718869d48b4e5400d0c6e3638dd
  Author: Yonghong Song <yhs@fb.com>
  Date:   Wed Sep 19 16:04:13 2018 +0000

      [bpf] Symbol sizes and types in object file

      Clang-compiled object files currently don't include the symbol sizes and
      types.  Some tools however need that information.  For example, ctfconvert
      uses that information to generate FreeBSD's CTF representation from ELF
      files.
      With this patch, symbol sizes and types are included in object files.

      Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
      Reported-by: Yutaro Hayakawa <yhayakawa3720@gmail.com>

Hence, with llvm 8.0.0 (currently trunk), the symbol type will be OBJECT rather than NOTYPE.
  -bash-4.4$ clang -target bpf -O2 -c t.c
  -bash-4.4$ readelf -s t.o

  Symbol table '.symtab' contains 3 entries:
     Num:    Value          Size Type    Bind   Vis      Ndx Name
       0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
       1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS t.c
       2: 0000000000000000     4 OBJECT  GLOBAL DEFAULT    3 g

This patch makes sure bpf library accepts both NOTYPE and OBJECT types
of global map symbols.
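The relaxed check is sketched below using glibc's `<elf.h>` types rather than libelf's `GElf_Sym` API. The helper name is hypothetical. It accepts a global symbol typed either NOTYPE (older llvm) or OBJECT (llvm 8+).

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>

/* Hypothetical helper: is this symbol a plausible global map symbol? */
static bool is_global_map_symbol(const Elf64_Sym *sym)
{
	unsigned char bind = ELF64_ST_BIND(sym->st_info);
	unsigned char type = ELF64_ST_TYPE(sym->st_info);

	if (bind != STB_GLOBAL)
		return false;
	/* llvm <= 7 emits STT_NOTYPE; llvm 8+ emits STT_OBJECT. */
	return type == STT_NOTYPE || type == STT_OBJECT;
}
```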

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-31 08:27:07 -07:00
Stefano Brivio 00240899ec ss: Actually print left delimiter for columns
While rendering columns, we use a local variable to keep track of the
field currently being printed, without touching current_field, which is
used for buffering.

Use the right pointer to access the left delimiter for the current column,
instead of always printing the left delimiter for the last buffered field,
which is usually an empty string.

This fixes an issue especially visible on narrow terminals, where some
columns might be displayed without separation.

Reported-by: YoyPa <yoann.p.public@gmail.com>
Fixes: 691bd854bf ("ss: Buffer raw fields first, then render them as a table")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: YoyPa <yoann.p.public@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-31 08:11:11 -07:00
Tobias Jungel 45fca4ed94 bridge: fix vlan show stats formatting
The output of -statistics vlan show was broken by a previous change for
JSON output. This aligns the format with vlan show.

v2: fixed too greedy deletion that caused a -Wmaybe-uninitialized

Signed-off-by: Tobias Jungel <tobias.jungel@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-29 09:58:03 -07:00
Peter Korsgaard f900a21611 utils.h: provide fallback CLOCK_TAI definition
q_{etf,taprio}.c use CLOCK_TAI, which isn't exposed by glibc < 2.21 or
uClibc, breaking the build. Provide a fallback definition like it is done
for IPPROTO_MPLS and others.
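The following minimal sketch shows such a fallback. It assumes a Linux target, where the kernel's clock ID for `CLOCK_TAI` is 11. `read_tai()` is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <time.h>

/* Fallback for C libraries whose <time.h> lacks CLOCK_TAI.
 * 11 is the value Linux assigns to CLOCK_TAI. */
#ifndef CLOCK_TAI
#define CLOCK_TAI 11
#endif

/* Hypothetical helper: returns 0 on success, or -1 (with errno set)
 * if the running kernel does not support the clock. */
static int read_tai(struct timespec *ts)
{
	return clock_gettime(CLOCK_TAI, ts);
}
```

The `#ifndef` guard means a libc that already defines `CLOCK_TAI` keeps its own definition.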

Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-29 09:54:52 -07:00
David Ahern 6e221408e6 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-23 10:55:09 -07:00
Hangbin Liu 35b857f9c6 ip/geneve: fix ttl inherit behavior
Currently, when we add geneve with "ttl inherit", we only set ttl to 0,
which actually means "use the default value" rather than inheriting the
inner protocol's ttl value.

To distinguish ttl inherit from ttl == 0, kernel commit 52d0d404d39dd
("geneve: add ttl inherit support") added the attribute
IFLA_GENEVE_TTL_INHERIT. Now use "ttl inherit" to inherit the inner
protocol's ttl, and "ttl auto" to mean "use the default value", which
is the same behavior as ttl == 0.

v2:
1) remove IFLA_GENEVE_TTL_INHERIT definition in if_link.h as it's already
   updated.
2) Still use addattr8() so we can enable/disable ttl inherit, as Michal
   suggested.

v3: Update man page

Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-23 10:53:16 -07:00
Stephen Hemminger 2808241af5 v4.19.0 2018-10-23 10:14:57 -07:00
Phil Sutter 737b8258b3 tc: htb: Print default value in hex
Value of 'default' is assumed to be hexadecimal when parsing, so
consequently it should be printed in hex as well. This is a regression
introduced when adding JSON output.

As requested, also change JSON output to print the value as hex string.

Fixes: f354fa6aa5 ("tc: jsonify htb qdisc")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-23 10:07:10 -07:00
Phil Sutter 6358bbc381 tc: Remove pointless assignments in batch()
All these assignments are later overwritten without reading in between,
so just drop them.

Fixes: 485d0c6001 ("tc: Add batchsize feature for filter and actions")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-22 10:05:43 -07:00
Phil Sutter 8d05f33a38 tipc: Drop unused variable 'genl'
Although initialized by call to libmnl, the variable is used only in a
call to sizeof(). Drop it and call sizeof with its type instead.

Fixes: f043759dd4 ("tipc: add new TIPC configuration tool")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-22 10:05:43 -07:00
Phil Sutter 3b5c5ef0a7 ip-route: Fix parse_encap_seg6() srh parsing
In case caller did not specify 'segs' parameter, parse_srh() would read
garbage while iterating over 'segbuf'. Avoid this by initializing
'segbuf' to an empty string.
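The following minimal sketch shows why the empty-string initialization matters. `count_segments()`, `parse_segs()`, and the 1024-byte size are hypothetical and only approximate what `parse_encap_seg6()` and `parse_srh()` do.

```c
#include <assert.h>
#include <stdio.h>

/* Count comma-separated segments; an empty string has none. */
static int count_segments(const char *segbuf)
{
	int n = 1;

	if (segbuf[0] == '\0')
		return 0;
	for (const char *p = segbuf; *p; p++)
		if (*p == ',')
			n++;
	return n;
}

/* Hypothetical caller: segs_arg is NULL when the user gave no "segs". */
static int parse_segs(const char *segs_arg)
{
	/* Initialized to "", so it is a valid string even if never filled;
	 * without this, count_segments() would walk uninitialized stack. */
	char segbuf[1024] = "";

	if (segs_arg)
		snprintf(segbuf, sizeof(segbuf), "%s", segs_arg);
	return count_segments(segbuf);
}
```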

Fixes: e8493916a8 ("iproute: add support for SR-IPv6 lwtunnel encapsulation")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-22 10:05:43 -07:00
Phil Sutter cdefe1d8e4 rdma: Don't pass garbage to rd_check_is_filtered()
Variables 'src_port' and 'dst_port' are initialized only if attributes
RDMA_NLDEV_ATTR_RES_SRC_ADDR or RDMA_NLDEV_ATTR_RES_DST_ADDR are
present. Make sure to pass them over to rd_check_is_filtered() only if
that is the case.

Fixes: 9a362cc71a ("rdma: Add CM_ID resource tracking information")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-22 10:05:43 -07:00
Phil Sutter e5da392ff8 ip-route: Fix for memleak in error path
If call to rta_addattr_l() failed, parse_encap_seg6() would leak memory.
Fix this by making sure calls to free() are not skipped.
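A common C idiom for this kind of fix is a single cleanup label that every exit path goes through. The sketch below uses hypothetical helpers. `add_attr()` stands in for a failing `rta_addattr_l()` call.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for rta_addattr_l(): fails when the payload
 * does not fit in the remaining room. */
static int add_attr(size_t room, size_t len)
{
	return len > room ? -1 : 0;
}

/* Every exit after the allocation goes through "out", so tmp is freed
 * on both the success and the error path. */
static int encode_segments(size_t room, size_t len)
{
	int ret = -1;
	char *tmp = malloc(len);

	if (!tmp)
		return -1;
	memset(tmp, 0, len);

	if (add_attr(room, len) < 0)
		goto out;	/* an early return here would skip free() */

	ret = 0;
out:
	free(tmp);
	return ret;
}
```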

Fixes: bd59e5b151 ("ip-route: Fix segfault with many nexthops")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-22 10:05:43 -07:00
Phil Sutter 3b0070f6b1 rdma: Fix for ineffective check in add_filter()
With 'name' field defined as array in struct filters, it will always
contain a value irrespective of whether a name was assigned or not.

Fix this by turning the field into a const char pointer.
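The sketch below shows why the original check could never fail, using two hypothetical simplified structures. An array member decays to its own address, which is never NULL. A pointer member can genuinely be NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct filter_array { char name[32]; };		/* before the fix */
struct filter_ptr   { const char *name; };	/* after the fix  */

static bool has_name_array(const struct filter_array *f)
{
	const char *p = f->name;	/* array decays to its own address */

	return p != NULL;		/* therefore always true */
}

static bool has_name_ptr(const struct filter_ptr *f)
{
	return f->name != NULL;		/* false until a name is assigned */
}
```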

Fixes: 1174be72d1 ("rdma: Add filtering infrastructure")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-22 10:05:18 -07:00
Phil Sutter b1ffc1f465 devlink: Fix error reporting in cmd_resource_set()
resource_path_parse() returns either zero or a negative error code,
hence the negated value must be passed to strerror().
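The following minimal sketch shows the convention. A function that returns 0 or `-errno` must have its result negated before being passed to `strerror()`. `parse_path()` is hypothetical and unrelated to devlink's actual resource path syntax.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical parser returning 0 or a negative errno value. */
static int parse_path(const char *path)
{
	if (!path || path[0] != '/')
		return -EINVAL;
	return 0;
}

/* strerror() expects a positive errno, so negate the return code. */
static const char *parse_error_string(const char *path)
{
	int err = parse_path(path);

	return err ? strerror(-err) : NULL;
}
```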

Fixes: 8cd6440958 ("devlink: Add support for devlink resource abstraction")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-22 10:05:18 -07:00
Petr Vorel 92885e1973 testsuite: Fix make check when need build generate_nlmsg
make check from top level Makefile defines several flags which break
building generate_nlmsg:

$ make check
make -C tools
gcc  -Wall -Wstrict-prototypes  -Wmissing-prototypes -Wmissing-declarations -Wold-style-definition -Wformat=2 -O2 -I../include -I../include/uapi -DRESOLVE_HOSTNAMES -DLIBDIR=\"/usr/lib\" -DCONFDIR=\"/etc/iproute2\" -DNETNS_RUN_DIR=\"/var/run/netns\" -DNETNS_ETC_DIR=\"/etc/netns\" -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE  -DHAVE_SETNS -DHAVE_SELINUX -DHAVE_ELF -DHAVE_LIBMNL -I/usr/include/libmnl -DNEED_STRLCPY -DHAVE_LIBCAP ../lib/libutil.a ../lib/libnetlink.a -lselinux -lelf -lmnl -lcap  -I../../include -include../../include/uapi/linux/netlink.h -o generate_nlmsg generate_nlmsg.c ../../lib/libnetlink.c -lmnl
gcc: error: ../lib/libutil.a: No such file or directory
gcc: error: ../lib/libnetlink.a: No such file or directory
make[2]: *** [Makefile:5: generate_nlmsg] Error 1
make[1]: *** [Makefile:40: generate_nlmsg] Error 2

To fix it, reset CFLAGS in the sub-Makefile and remove LDLIBS entirely (the
required -lmnl flag was already specified in 5dc2204c ("testsuite: add libmnl")).

Fixes: 8804a8c0 ("Makefile: Add check target")

Signed-off-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-22 10:05:18 -07:00
David Ahern 260137e24d iplink: Remove flags argument from iplink_get
iplink_get has 1 caller and the flags arg is 0, so just remove it.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-22 09:45:25 -07:00
David Ahern cd554f2c2f Tree wide: Drop sockaddr_nl arg
No function, filter, or print function uses the sockaddr_nl arg,
so just drop it.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-22 09:43:48 -07:00
David Ahern 9d16a1de1f Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-22 09:43:33 -07:00
Stephen Hemminger 95debca728 util: spelling fix 2018-10-18 13:23:38 -07:00
Stephen Hemminger c60683e246 tipc: spelling fix 2018-10-18 13:23:30 -07:00
Stephen Hemminger 94b0c90152 ip: spelling fixes
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-18 13:23:11 -07:00
Stephen Hemminger f5a398bf17 tc: spelling fixes
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-18 13:22:51 -07:00
Stephen Hemminger 5cc4639471 config: spelling fixes
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-18 13:22:25 -07:00
Stephen Hemminger ab7318f9d4 examples: fix spelling errors
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-18 13:18:56 -07:00
Stephen Hemminger 9d715cf65a doc/man: spelling fixes
Use ispell and codespell to find/fix spelling errors in documentation
and man pages.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-18 13:15:45 -07:00
Phil Sutter 0b9b0d08c2 ip-addrlabel: Fix printing of label value
Passing the return value of RTA_DATA() to rta_getattr_u32() is wrong
since that function will call RTA_DATA() by itself already.

Fixes: a7ad1c8a68 ("ipaddrlabel: add json support")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-16 11:51:05 -07:00
Lorenzo Bianconi c7a3b22961 utils: fix get_rtnl_link_stats_rta stats parsing
iproute2 walks through the list of available tunnels using the netlink
protocol to get device info instead of reading it from the proc
filesystem. However, the kernel reports device statistics using the
IFLA_INET6_STATS/IFLA_INET6_ICMP6STATS attributes nested in
IFLA_PROTINFO, while iproute2 expects this info in the
IFLA_STATS64/IFLA_STATS attributes.
The issue can be triggered with the following reproducer:

$ip link add ip6d0 type ip6tnl mode ip6ip6 local 1111::1 remote 2222::1
$ip -6 -d -s tunnel show ip6d0
ip6d0: ipv6/ipv6 remote 2222::1 local 1111::1 encaplimit 4 hoplimit 64
tclass 0x00 flowlabel 0x00000 (flowinfo 0x00000000)
Dump terminated

Fix the issue by introducing IFLA_INET6_STATS attribute parsing.

Fixes: 3e95393871 ("iptunnel/ip6tunnel: Use netlink to walk through
tunnels list")

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
2018-10-15 09:40:15 -07:00
Lorenzo Bianconi 9e030e77f2 uapi: add snmp header file
Introduce the snmp header file. It will be used in a subsequent patch
to parse device statistics reported in the
IFLA_INET6_STATS/IFLA_INET6_ICMP6STATS netlink attributes.

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-15 09:37:42 -07:00
Sabrina Dubroca 9b45f8ec13 macsec: fix off-by-one when parsing attributes
I seem to have had a massive brainfart with uses of
parse_rtattr_nested(). The rtattr* array must have MAX+1 elements, and
the call to parse_rtattr_nested must have MAX as its bound. Let's fix
those.
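The sizing rule is sketched below. It mirrors the `parse_rtattr_nested(tb, MAX, ...)` convention. The attribute IDs, `struct attr`, and `parse_attrs()` are hypothetical stand-ins for the rtattr machinery.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical attribute IDs; ATTR_MAX is the last valid type. */
enum { ATTR_UNSPEC, ATTR_SCI, ATTR_PORT, __ATTR_END };
#define ATTR_MAX (__ATTR_END - 1)

struct attr { unsigned short type; int value; };

/* Index attributes by type. Valid types run from 0 to max inclusive,
 * so tb must have max + 1 slots; types above max are ignored. */
static void parse_attrs(const struct attr **tb, int max,
			const struct attr *attrs, size_t n)
{
	for (int i = 0; i <= max; i++)
		tb[i] = NULL;
	for (size_t i = 0; i < n; i++)
		if (attrs[i].type <= max)
			tb[attrs[i].type] = &attrs[i];
}
```

The caller declares `const struct attr *tb[ATTR_MAX + 1]` and passes `ATTR_MAX` (not `ATTR_MAX + 1`) as the bound.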

Fixes: b26fc590ce ("ip: add MACsec support")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-15 09:35:48 -07:00
Sabrina Dubroca 45ec4771d4 json: make 0xhex handle u64
Stephen converted macsec's sci to use 0xhex, but 0xhex handles
unsigned ints, not 64-bit ints. Thus, the output of the "ip macsec
show" command is mangled, with half of the SCI replaced with 0s:

# ip macsec show
11: macsec0: [...]
    cipher suite: GCM-AES-128, using ICV length 16
    TXSC: 0000000001560001 on SA 0

# ip -d link show macsec0
11: macsec0@ens3: [...]
    link/ether 52:54:00:12:01:56 brd ff:ff:ff:ff:ff:ff promiscuity 0
    macsec sci 5254001201560001 [...]

where TXSC and sci should match.
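The truncation can be reproduced with plain `snprintf()`. The helper names are hypothetical and do not correspond to iproute2's JSON print functions. The test value is the SCI from the output above.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Formatting through an unsigned int drops the upper 32 bits. */
static void format_hex_u32(char *buf, size_t len, uint64_t v)
{
	snprintf(buf, len, "%016x", (unsigned int)v);
}

/* A u64-aware formatter keeps all 16 hex digits. */
static void format_hex_u64(char *buf, size_t len, uint64_t v)
{
	snprintf(buf, len, "%016" PRIx64, v);
}
```

For 0x5254001201560001, the first helper yields "0000000001560001", matching the mangled TXSC line. The second yields "5254001201560001", matching the sci shown by ip link.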

Fixes: c0b904de62 ("macsec: support JSON")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-15 09:32:18 -07:00
Phil Sutter 4abb8c723a bridge: fdb: Fix for missing keywords in non-JSON output
While migrating to JSON print library, some keywords were dropped from
standard output by accident. Add them back to unbreak output parsers.

Fixes: c7c1a1ef51 ("bridge: colorize output and use JSON print library")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-15 09:23:55 -07:00
David Ahern 0d30c1f8d4 Merge branch 'master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-13 19:31:37 -07:00
Nikolay Aleksandrov d13d52d0d5 bridge: add support for backup port
This patch adds support for the new backup port option that can be set
on a bridge port. If the port's carrier goes down, all of the traffic
gets redirected to the configured backup port. We add the following new
arguments:
$ ip link set dev brport type bridge_slave backup_port brport2
$ ip link set dev brport type bridge_slave nobackup_port

$ bridge link set dev brport backup_port brport2
$ bridge link set dev brport nobackup_port

The man pages are updated accordingly.
Also 2 minor style adjustments:
- add missing space to bridge man page's state argument
- use lower starting case for vlan_tunnel in ip-link man page (to be
consistent with the rest)

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-13 19:26:46 -07:00
Roopa Prabhu 4c45b684f9 ipneigh: support for NTF_EXT_LEARNED flag on neigh entries
Adds new option extern_learn to set NTF_EXT_LEARNED flag
on neigh entries.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-13 19:24:45 -07:00
Stephen Hemminger bfb3bf189f libnetlink: use local variable
Now that err->error is in a local variable, use it consistently.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-09 09:46:11 -07:00
Vlad Buslov 8c50b728b2 libnetlink: fix use-after-free of message buf
In the __rtnl_talk_iov() main loop, err is a pointer to memory in the
dynamically allocated 'buf' that is used to store netlink messages. If the
netlink message is an error message, buf is freed before returning with an
error code. However, err->error is then checked once more to generate the
return value, after the memory err points to has already been freed. Save
the error code in a temporary variable and use that variable to generate
the return value.
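The pattern behind the fix, sketched with a hypothetical struct demo_err standing in for struct nlmsgerr: copy the error code out of the buffer before freeing it, then build the return value from the copy. demo_handle_error() is illustrative only and simply returns the saved code.

```c
#include <stdlib.h>

/* Hypothetical error payload standing in for struct nlmsgerr. */
struct demo_err {
	int error;
};

/* err points into buf.  The code is copied into a local before free(), so
 * nothing reads from the buffer after it has been released. */
static int demo_handle_error(const struct demo_err *err, void *buf)
{
	int error = err->error;	/* save before the buffer goes away */

	free(buf);
	return error;		/* uses the copy, never err->error */
}
```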

Fixes: c60389e4f9 ("libnetlink: fix leak and using unused memory on error")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-09 09:41:03 -07:00
Jakub Kicinski 650a10e032 tc: jsonify output of q_fifo
Print limits correctly in JSON context.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-08 09:22:22 -07:00
David Ahern f8805f567e Merge branch 'taprio-scheduler' into iproute2-next
Vinicius Costa Gomes says:

====================

This is the iproute2 side of the taprio v1 series.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-07 10:37:57 -07:00
Vinicius Costa Gomes 579acb4bc5 taprio: Add manpage for tc-taprio(8)
This documents the parameters and provides an example of usage.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-07 10:32:16 -07:00
Vinicius Costa Gomes 0dd1644935 tc: Add support for configuring the taprio scheduler
This traffic scheduler allows traffic class states (transmission
allowed/not allowed, in the simplest case) to be scheduled according
to a pre-generated time sequence. This is the basis of the IEEE
802.1Qbv specification.

Example configuration:

tc qdisc replace dev enp3s0 parent root handle 100 taprio \
          num_tc 3 \
          map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
          queues 1@0 1@1 2@2 \
          base-time 1528743495910289987 \
          sched-entry S 01 300000 \
          sched-entry S 02 300000 \
          sched-entry S 04 300000 \
          clockid CLOCK_TAI

The configuration format is similar to mqprio. The main difference is
the presence of a schedule, built from multiple "sched-entry"
definitions. Each entry has the following format:

     sched-entry <CMD> <GATE MASK> <INTERVAL>

The only supported <CMD> is "S", which means "SetGateStates",
following the IEEE 802.1Qbv-2015 definition (Table 8-6). <GATE MASK>
is a bitmask where each bit is associated with a traffic class, so
bit 0 (the least significant bit) being "on" means that traffic class
0 is "active" for that schedule entry. <INTERVAL> is a time duration
in nanoseconds that specifies how long the state defined by <CMD>
and <GATE MASK> should be held before moving to the next entry.
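A small sketch of how a gate mask and the interval list can be interpreted, using the three-entry schedule from the example above (masks 01, 02, 04 with 300000 ns intervals). tc_gate_open() and entry_at() are hypothetical helpers for illustration, not part of tc or the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if traffic class tc is open ("active") under gate_mask.
 * Bit 0, the least significant bit, corresponds to traffic class 0. */
static bool tc_gate_open(uint32_t gate_mask, unsigned int tc)
{
	return tc < 32 && ((gate_mask >> tc) & 1u);
}

/* Index of the schedule entry in effect at 'offset' ns into the cycle,
 * or n if the offset lies beyond one full cycle. */
static size_t entry_at(const uint64_t *intervals, size_t n, uint64_t offset)
{
	for (size_t i = 0; i < n; i++) {
		if (offset < intervals[i])
			return i;
		offset -= intervals[i];
	}
	return n;
}
```

For the example, 450000 ns into the cycle falls in the second entry (mask 02), so only traffic class 1 may transmit.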

This schedule is circular, that is, after the last entry is executed
it starts from the first one, indefinitely.

The other parameters can be defined as follows:

 - base-time: specifies the instant when the schedule starts; if
   'base-time' is a time in the past, the schedule will start at

       base-time + (N * cycle-time)

   where N is the smallest integer such that the resulting time is
   greater than "now", and "cycle-time" is the sum of all the intervals
   of the entries in the schedule;

 - clockid: specifies the reference clock to be used;
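The base-time rule above can be written as a few lines of integer arithmetic. taprio_start() is a hypothetical helper, not the kernel's implementation. It assumes cycle-time > 0 and treats a base-time equal to "now" as not yet in the past.

```c
#include <stdint.h>

/* Start of the schedule: base_time if it has not passed yet, otherwise
 * base_time + N * cycle_time for the smallest integer N that gives a time
 * strictly greater than now.  Assumes cycle_time > 0. */
static int64_t taprio_start(int64_t base_time, int64_t cycle_time, int64_t now)
{
	int64_t n;

	if (base_time >= now)
		return base_time;
	n = (now - base_time) / cycle_time + 1;
	return base_time + n * cycle_time;
}
```

For the example schedule, cycle-time = 3 * 300000 = 900000 ns. With base-time 0 and now = 1000000, N = 2 and the schedule starts at 1800000.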

The parameters should be similar to what the IEEE 802.1Q family of
specifications defines.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-07 10:32:08 -07:00
Jesus Sanchez-Palencia d791f3ad86 libnetlink: Add helper for getting a __s32 from netlink msgs
This function retrieves a signed 32-bit integer from a netlink message
and returns it.

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-07 10:31:35 -07:00
Vinicius Costa Gomes de63cd9044 include: Add helper to retrieve a __s64 from a netlink msg
This allows signed 64-bit integers to be retrieved from a netlink
message.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-07 10:30:30 -07:00
Vinicius Costa Gomes a066bac8a2 utils: Implement get_s64()
Add this helper to read signed 64-bit integers from a string.
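The body of get_s64() is not shown here. The sketch below is a hypothetical parse_s64() built on strtoll(), illustrating the checks such a helper needs: it rejects empty input, trailing garbage and out-of-range values. Its name and return convention (0 on success, -1 on error) are assumptions.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a signed 64-bit integer in the given base.
 * Returns 0 on success, -1 on empty input, junk, or overflow. */
static int parse_s64(int64_t *val, const char *arg, int base)
{
	char *end;
	long long res;

	if (!arg || !*arg)
		return -1;
	errno = 0;
	res = strtoll(arg, &end, base);
	if (errno == ERANGE || *end != '\0')
		return -1;	/* out of range, no digits, or trailing text */
	*val = res;
	return 0;
}
```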

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-07 10:30:28 -07:00
David Ahern 720a44a751 Update kernel headers
update kernel headers to commit:
72438f8cef4e ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-07 10:29:25 -07:00
Vlad Buslov f6b498f957 tc: flower: expose hardware offload count
Recently, the flower classifier was updated to expose the number of devices
a filter is offloaded to. Add support for printing this counter as 'in_hw_count'.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
2018-10-07 10:14:09 -07:00
Hangbin Liu 952a7a1931 vxlan: show correct ttl inherit info
We should only show ttl inherit when IFLA_VXLAN_TTL_INHERIT is supplied.
Otherwise, show the ttl number, or auto when it is 0.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-10-04 09:20:45 -07:00
David Ahern 169e667c93 Merge branch 'hdrs-for-dump-req' into iproute2-next
David Ahern says:

====================

iproute2 currently uses ifinfomsg as the header for all dumps using the
wilddump headers. This is wrong, as each message type actually has its own
header type. While the kernel has traditionally let this slide, since for the
most part it only uses the family entry, kernel-side filters are increasingly
used to alter what is returned on a request. The kernel-side filters
really need to use the proper header type.

To that end, fix iproute2 to use the proper header struct for the GET type.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-02 18:41:52 -07:00
David Ahern 56eeeda978 libnetlink: Rename rtnl_wilddump_stats_req_filter to rtnl_statsdump_req_filter
rtnl_wilddump_stats_req_filter only takes RTM_GETSTATS as the type argument
so rename to rtnl_statsdump_req_filter for consistency with other request
functions and hardcode the type argument.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-02 18:39:36 -07:00
David Ahern 31ae2912f7 libnetlink: Rename rtnl_wilddump_* to rtnl_linkdump_*
Rename rtnl_wilddump_req_filter to rtnl_linkdump_req_filter,
rtnl_wilddump_request to rtnl_linkdump_req and
rtnl_wilddump_req_filter_fn to rtnl_linkdump_req_filter_fn.

In all cases, drop the type argument, which at this point is only
RTM_GETLINK, and hardcode it in the functions.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-02 18:39:08 -07:00
David Ahern efb0b383d9 libnetlink: Convert GETNSID dumps to use rtnl_nsiddump_req
Add rtnl_nsiddump_req for namespace id dumps using the proper rtgenmsg
as the header. Convert existing RTM_GETNSID dumps to use it.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-02 18:39:04 -07:00
David Ahern ff41db8a75 libnetlink: Convert GETNEIGHTBL dumps to use rtnl_neightbldump_req
Add rtnl_neightbldump_req for neighbor table dumps using the proper ndtmsg
as the header. Convert existing RTM_GETNEIGHTBL dumps to use it.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-02 18:39:02 -07:00
David Ahern 9e0ab19c4d libnetlink: Convert GETNEIGH dumps to use rtnl_neighdump_req
Add rtnl_neighdump_req for neighbor dumps using the proper ndmsg
as the header. Convert existing rtnl_wilddump_request for RTM_GETNEIGH
to use it.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-02 18:38:59 -07:00
David Ahern b05d9a3d58 libnetlink: Convert GETRULE dumps to use rtnl_ruledump_req
Add rtnl_ruledump_req for fib rule dumps using the proper fib_rule_hdr
as the header. Convert existing RTM_GETRULE dumps to use it.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-02 18:38:56 -07:00
David Ahern ddee16bc96 libnetlink: Convert GETNETCONF dumps to use rtnl_netconfdump_req
Add rtnl_netconfdump_req for netconf dumps using the proper netconfmsg
as the header. Convert existing RTM_GETNETCONF dumps to use it.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-02 18:38:34 -07:00
David Ahern 9dbe6df411 libnetlink: Convert GETMDB dumps to use rtnl_mdbdump_req
Add rtnl_mdbdump_req for mdb dumps using the proper br_port_msg as
the header. Convert existing RTM_GETMDB dumps to use it.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-02 18:38:31 -07:00
David Ahern 393600231a libnetlink: Convert GETADDRLABEL dumps to use rtnl_addrlbldump_req
Add rtnl_addrlbldump_req for address label dumps using the proper
ifaddrlblmsg as the header. Convert existing RTM_GETADDRLABEL dumps
to use it.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-02 18:38:29 -07:00
David Ahern bfb27dfaac libnetlink: Convert GETROUTE dumps to use rtnl_routedump_req
Add rtnl_routedump_req for route dumps using the proper rtmsg
as the header. Convert existing RTM_GETROUTE dumps to use it.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-02 18:38:27 -07:00
David Ahern 46917d0895 libnetlink: Convert GETADDR dumps to use rtnl_addrdump_req
Add rtnl_addrdump_req for address dumps using the proper ifaddrmsg
as the header. Convert existing RTM_GETADDR dumps to use it.
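A minimal sketch of what "using the proper header" means for an address dump, written against the Linux UAPI headers. struct addrdump_req and addrdump_req_init() are hypothetical names, not iproute2's rtnl_addrdump_req(). The request carries a struct ifaddrmsg after the netlink header; sending it over a NETLINK_ROUTE socket is omitted.

```c
#include <linux/if_addr.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <string.h>
#include <sys/socket.h>

/* RTM_GETADDR dump request: netlink header followed by struct ifaddrmsg,
 * the header type this message uses, rather than struct ifinfomsg. */
struct addrdump_req {
	struct nlmsghdr nlh;
	struct ifaddrmsg ifm;
};

static void addrdump_req_init(struct addrdump_req *req, int family,
			      unsigned int seq)
{
	memset(req, 0, sizeof(*req));
	req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(req->ifm));
	req->nlh.nlmsg_type = RTM_GETADDR;
	req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
	req->nlh.nlmsg_seq = seq;
	req->ifm.ifa_family = family;	/* e.g. AF_INET6, or AF_UNSPEC for all */
}
```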

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-02 18:38:21 -07:00
Eelco Chaudron 5ac138324e tc_util: Add support for showing TCA_STATS_BASIC_HW statistics
Add support for showing hardware-specific counters to ease
troubleshooting of hardware offload.

$ tc -s filter show dev enp3s0np0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  dst_ip 2.0.0.0
  src_ip 1.0.0.0
  ip_flags nofrag
  in_hw
        action order 1: mirred (Egress Redirect to device eth1) stolen
        index 1 ref 1 bind 1 installed 0 sec used 0 sec
        Action statistics:
        Sent 534884742 bytes 8915697 pkt (dropped 0, overlimits 0 requeues 0)
        Sent software 187542 bytes 4077 pkt
        Sent hardware 534697200 bytes 8911620 pkt
        backlog 0b 0p requeues 0
        cookie 89173e6a44447001becfd486bda17e29
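The sample output is consistent with the software counters being total minus hardware: 534884742 - 534697200 = 187542 bytes and 8915697 - 8911620 = 4077 packets. Below is a hypothetical sketch of that derivation. struct pkt_counters and sw_counters() are illustrative names, and clamping at zero is a choice made here, not necessarily what tc does.

```c
#include <stdint.h>

/* Hypothetical counter pair; real values come from TCA_STATS_BASIC and
 * TCA_STATS_BASIC_HW. */
struct pkt_counters {
	uint64_t bytes;
	uint64_t packets;
};

/* Software share = total minus hardware, clamped at zero in case the two
 * snapshots were not taken atomically. */
static struct pkt_counters sw_counters(struct pkt_counters total,
				       struct pkt_counters hw)
{
	struct pkt_counters sw = {
		.bytes = total.bytes > hw.bytes ? total.bytes - hw.bytes : 0,
		.packets = total.packets > hw.packets ?
			   total.packets - hw.packets : 0,
	};
	return sw;
}
```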

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-02 14:45:33 -07:00
Pieter Jansen van Vuuren 56155d4df8 tc: f_flower: add geneve option match support to flower
Allow matching on options in Geneve tunnel headers.

The options can be described in the form
CLASS:TYPE:DATA/CLASS_MASK:TYPE_MASK:DATA_MASK, where CLASS is
represented as a 16-bit hexadecimal value, TYPE as an 8-bit
hexadecimal value and DATA as a variable-length hexadecimal value.

e.g.
 # ip link add name geneve0 type geneve dstport 0 external
 # tc qdisc add dev geneve0 ingress
 # tc filter add dev geneve0 protocol ip parent ffff: \
     flower \
       enc_src_ip 10.0.99.192 \
       enc_dst_ip 10.0.99.193 \
       enc_key_id 11 \
       geneve_opts 0102:80:1122334421314151/ffff:ff:ffffffffffffffff \
       ip_proto udp \
       action mirred egress redirect dev eth1
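A sketch of parsing a single option value in this CLASS:TYPE:DATA form. It assumes the widths stated above: CLASS is at most 4 hex digits, TYPE at most 2, and DATA an even number of hex digits. parse_geneve_opt() is hypothetical, ignores the /MASK part and is not the tc flower parser.

```c
#include <stddef.h>
#include <stdint.h>

static int hexval(char ch)
{
	if (ch >= '0' && ch <= '9')
		return ch - '0';
	if (ch >= 'a' && ch <= 'f')
		return ch - 'a' + 10;
	if (ch >= 'A' && ch <= 'F')
		return ch - 'A' + 10;
	return -1;
}

/* Read 1..max_digits hex digits up to ':' or end of string, advancing *s.
 * Returns the value, or -1 on error. */
static long read_hex(const char **s, int max_digits)
{
	long v = 0;
	int n = 0, d;

	while (**s && **s != ':') {
		d = hexval(**s);
		if (d < 0 || ++n > max_digits)
			return -1;
		v = v * 16 + d;
		(*s)++;
	}
	return n ? v : -1;
}

/* Hypothetical parser for one CLASS:TYPE:DATA option (no mask).
 * Returns the number of DATA bytes written, or -1 on error. */
static int parse_geneve_opt(const char *s, uint16_t *cls, uint8_t *type,
			    uint8_t *data, size_t data_max)
{
	long c, t;
	size_t len = 0;
	int hi, lo;

	c = read_hex(&s, 4);
	if (c < 0 || *s++ != ':')
		return -1;
	t = read_hex(&s, 2);
	if (t < 0 || *s++ != ':')
		return -1;
	while (*s) {
		hi = hexval(s[0]);
		lo = s[1] ? hexval(s[1]) : -1;	/* odd digit count is an error */
		if (hi < 0 || lo < 0 || len >= data_max)
			return -1;
		data[len++] = (uint8_t)(hi << 4 | lo);
		s += 2;
	}
	*cls = (uint16_t)c;
	*type = (uint8_t)t;
	return (int)len;
}
```

For the value in the example, "0102:80:1122334421314151" yields class 0x0102, type 0x80 and 8 bytes of data.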

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-02 14:39:55 -07:00
Roopa Prabhu 51eb02254b ipneigh: update man page and help for router
While at it also add missing text for proxy in the man page.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-10-01 17:36:35 -07:00
Nikolay Aleksandrov c3ded6e4a0 bridge: fdb: add support for sticky flag
Add support for the new sticky flag that can be set on fdbs and update the
man page.

CC: David Ahern <dsahern@gmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-09-28 10:52:22 -07:00
David Ahern d9c0be4e97 Update kernel headers
Update kernel headers to commit
a804e5e218754 ("selftests: forwarding: test for bridge sticky flag")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-09-28 10:51:15 -07:00
Roopa Prabhu c2cd14acc7 ipneigh: support setting of NTF_ROUTER on neigh entries
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-09-28 09:53:08 -07:00
David Ahern 7b2e200679 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-09-28 09:52:41 -07:00
Stephen Hemminger b45e300024 libnetlink: don't return error on success
Change to error handling broke normal code.

Fixes: c60389e4f9 ("libnetlink: fix leak and using unused memory on error")
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-25 10:08:48 +02:00
Stephen Hemminger 5dc2204c01 testsuite: add libmnl
Supporting extended ack now requires libmnl.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-25 09:59:37 +02:00
Petr Vorel 8804a8c0d3 Makefile: Add check target
Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-25 09:56:40 +02:00
Lorenzo Bianconi c1360e3b48 iplink_vxlan: take into account preferred_family creating vxlan device
Take into account the configured preferred_family if neither saddr nor
daddr is provided, since otherwise the vxlan kernel module will use IPv4 as
the default remote inet family, ignoring the one provided by userspace.
This behaviour was originally introduced in commit 97d564b90c ("vxlan: use
preferred address family when neither group or remote is specified").
The issue can be triggered with the following reproducer:

$ ip -6 link add vxlan1 type vxlan id 42 dev enp0s2 \
     proxy nolearning l2miss l3miss
$ bridge fdb add 46:47:1f:a7:1c:25 dev vxlan1 dst 2000::2
RTNETLINK answers: Address family not supported by protocol

Fixes: 1e9b8072de ("iplink_vxlan: Get rid of inet_get_addr()")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-25 09:52:56 +02:00
Hangbin Liu fa1e658e84 iplink: fix incorrect any address handling for ip tunnels
After commit d42c7891d2 ("utils: Do not reset family for default, any,
all addresses"), calling get_addr() for any/all addresses sets
addr->flags to ADDRTYPE_INET_UNSPEC if the family is AF_INET/AF_INET6.
This makes the is_addrtype_inet() check pass, so an incorrect address is
passed to the kernel. The ip link command then returns an error like:

]# ip link add ipip1 type ipip local any remote 1.1.1.1
RTNETLINK answers: Numerical result out of range

Fix it by using is_addrtype_inet_not_unspec() to avoid unspec addresses.

geneve and vxlan are not affected, as they use the AF_UNSPEC family when
calling get_addr().

Reported-by: Jianlin Shi <jishi@redhat.com>
Fixes: d42c7891d2 ("utils: Do not reset family for default, any, all addresses")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-21 11:28:33 -07:00
Stephen Hemminger 11152f0a0d Makefile: add help target
Add help target to Makefile

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-21 09:15:26 -07:00
Petr Vorel 133c1a6c87 testsuite: Warn about empty $(IPVERS)
The alltests target requires the symlink created by the configure target
(the default target). Without it, no tests are run.

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-21 08:59:52 -07:00
Petr Vorel 3537633dcf testsuite: Generate generate_nlmsg when needed
Commit 886f2c43 added generate_nlmsg.c. Running the alltests target,
which uses the resulting binary, required running 'make -C tools' beforehand.

Fixes: 886f2c43 testsuite: Generate nlmsg blob at runtime

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-21 08:59:52 -07:00
Petr Vorel f15836faec testsuite: Fix missing generate_nlmsg
Commit ad23e152 caused generate_nlmsg to always be missing:

$ make alltests
make: ./tools/generate_nlmsg: Command not found

Create a testclean target that removes only the results directory.

Fixes: ad23e152 testsuite: remove all temp files and implement make clean

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-21 08:59:52 -07:00
Hangbin Liu 88272775e2 iplink: add ipvtap support
IPVLAN and IPVTAP use the same functions and parameters, so we can
just add a new link_util with id ipvtap. Everything else is shared.

Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-09-20 17:53:56 -07:00
David Ahern 34212c73b7 Merge branch 'iproute2-master' into iproute2-next
Conflicts:
	ip/iproute_lwtunnel.c

In addition to resolving the merge conflict between bd59e5b151 and
94a8722f2f, updated the code added by the latter commit to match the
change in the former (i.e., added "ret =" to the new rta_addattr_l call).

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-09-20 17:53:27 -07:00
Leon Romanovsky d090fbf33b rdma: Fix representation of PortInfo CapabilityMask
The port capability mask represents the IBTA PortInfo specification,
but, as described in kernel commit 2f944c0fbf58
("RDMA: Fix storage of PortInfo CapabilityMask in the kernel"),
bit 26 was mistakenly overwritten.

The rdmatool followed suit and misled users by presenting the wrong
value. Since it never showed the proper value, we update the whole
port_cap_mask to comply with IBTA and show the real HW values.

Fixes: da990ab40a ("rdma: Add link object")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-17 08:59:13 -07:00
Stephen Hemminger c60389e4f9 libnetlink: fix leak and using unused memory on error
If an error happens in a multi-segment message (tc only),
report the error and stop processing further responses.
This also fixes referring to the buffer after it is freed.

The sequence check is not necessary here because the
response message has already been validated to be in
the window of the sequence number of the iov.

Reported-by: Mahesh Bandewar <mahesh@bandewar.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Mahesh Bandewar <maheshb@google.com>
2018-09-17 08:58:21 -07:00
Toke Høiland-Jørgensen 2153e01f36 q_cake: Also print nonat, nowash and no-ack-filter keywords
Similar to the previous patch for no-split-gso, the negative keywords for
'nat', 'wash' and 'ack-filter' were not printed either. Add those as well.

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-14 11:32:46 -07:00
Hangbin Liu 92bba4ed40 bridge/mdb: fix missing new line when show bridge mdb
The bridge mdb show output is broken in current iproute2, e.g.:
]# bridge mdb show
34: br0  veth0_br  224.1.1.2  temp 34: br0  veth0_br  224.1.1.1  temp

After fix:
]# bridge mdb show
34: br0  veth0_br  224.1.1.2  temp
34: br0  veth0_br  224.1.1.1  temp

v2: Use json print lib as Stephen suggested.
v3: No need to use is_json_context() as print_string() could handle both cases.
v4: use new function print_nl() to print new line in non-json mode.

Reported-by: Ying Xu <yinxu@redhat.com>
Fixes: c7c1a1ef51 ("bridge: colorize output and use JSON print library")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-13 16:02:33 -07:00
Toke Høiland-Jørgensen b914fe5f1c q_cake: Add printing of no-split-gso option
When the GSO splitting was turned into dual split-gso/no-split-gso options,
the printing of the latter was left out. Add that, so output is consistent
with the options passed.

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-12 12:59:38 -07:00
Stephen Hemminger b85076cd74 lib: introduce print_nl
A common pattern in iproute commands is to print a line separator
in non-json mode. Make that a simple function.
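The helper amounts to a single conditional. Below is a hypothetical sketch in which json_mode stands in for whatever state the JSON printer keeps. The real print_nl() signature is not assumed here, and demo_print_nl() returns the number of characters written only to make the behaviour easy to check.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the JSON printer's "active" state. */
static bool json_mode;

/* Emit a line separator only for plain-text output; in JSON mode the
 * document structure separates records instead.  Returns characters
 * written (0 or 1), or -1 on write error. */
static int demo_print_nl(FILE *fp)
{
	if (json_mode)
		return 0;
	return fputc('\n', fp) == EOF ? -1 : 1;
}
```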

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-11 08:29:33 -07:00
Phil Sutter bd59e5b151 ip-route: Fix segfault with many nexthops
It was possible to crash ip-route by adding an IPv6 route with 37
nexthop statements. A simple reproducer is:

| for i in `seq 37`; do
| 	nhs="nexthop via 1111::$i "$nhs
| done
| ip -6 route add 3333::/64 $nhs

The related code was broken in multiple ways:

* parse_one_nh() assumed that rta points to 4kB of storage, but the
  caller provided just 1kB. Fixed by passing a 'len' parameter with the
  correct value.

* Error checking of rta_addattr*() calls in parse_one_nh() and called
  functions was completely absent, so with the above fix in place, the
  parser would loop forever and flood the output.

While at it, increase message buffer sizes to 4k. This allows for
at most 144 nexthops.
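The fix boils down to appending attributes with an explicit bound and propagating failures instead of ignoring them. Below is a hypothetical sketch; bounded_append() and append_all() are illustrative and are not the rta_addattr*() API. The 28-byte record size is an assumption: an 8-byte struct rtnexthop plus a 20-byte RTA_GATEWAY attribute carrying an IPv6 address.

```c
#include <stddef.h>
#include <string.h>

/* Copy len bytes to buf + *used, failing instead of overflowing when the
 * total would exceed maxlen.  Assumes *used <= maxlen on entry. */
static int bounded_append(unsigned char *buf, size_t maxlen, size_t *used,
			  const void *data, size_t len)
{
	if (len > maxlen - *used)
		return -1;
	memcpy(buf + *used, data, len);
	*used += len;
	return 0;
}

/* Append n fixed-size records, stopping at the first failure so the caller
 * can report an error rather than loop or overrun the buffer. */
static int append_all(unsigned char *buf, size_t maxlen, size_t *used,
		      const void *rec, size_t reclen, int n)
{
	for (int i = 0; i < n; i++)
		if (bounded_append(buf, maxlen, used, rec, reclen) < 0)
			return -1;
	return 0;
}
```

With a 1024-byte buffer, 36 such records fit (1008 bytes). The 37th fails cleanly and leaves the buffer contents and length unchanged.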

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-10 12:14:50 -07:00
Caleb Raitto 40c2916fda tc/mqprio: Print extra info on invalid args.
Print the name of the argument that wasn't understood.

Signed-off-by: Caleb Raitto <caraitto@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-10 12:14:00 -07:00
Stephen Hemminger ae775666cf genl: remove unnecessary extern
extern is not necessary on function prototypes.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-10 11:53:07 -07:00
Stephen Hemminger ad618b7984 tc/fifo: remove unnecessary prototype
The prototype for prio_print_opt is already in tc_util.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-10 11:50:22 -07:00
Stephen Hemminger 0f36267485 bridge: fix vlan show formatting
The output of vlan show was broken by the previous change to use json_print.
Clean the code up and return to original format.

Note: the JSON syntax has changed to make the bridge vlan
show more like other outputs (e.g. ip -j li show).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-10 11:48:06 -07:00
Stephen Hemminger 2ed82667b8 bridge: use print_json for some outputs
Rather than using is_json_context(), use the print_string functions
which handle both cases.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-10 11:47:11 -07:00
Stephen Hemminger f5fc738736 bridge: minor change to mdb print
Get port ifname once rather than on both sides of if(is_json_context).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-10 11:47:11 -07:00
Caleb Raitto 781ee3270d man: Change numtc to num_tc
The argument parser only accepts num_tc:

https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/tree/tc/q_mqprio.c#n55

Signed-off-by: Caleb Raitto <caraitto@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-09-10 11:47:11 -07:00
Stephen Hemminger 27886a1241 uapi: update ib_verbs
Merge current uapi from 4.19-rc1

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-31 15:03:49 -07:00
David Ahern b555ff737a Merge branch 'netem-slot-param' into iproute2-next
Yousuk Seung says:

====================

This series adds support for the new "slot" netem parameter for
slotting. Slotting is an approximation of shared media that gather up
packets within a varying delay window before delivering them nearly at
once.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-30 11:08:43 -07:00
Yousuk Seung 588dd51e2c q_netem: slotting with non-uniform distribution
Extend slotting with support for non-uniform distributions. This is
similar to netem's non-uniform distribution delay feature.

Syntax:
   slot distribution DISTRIBUTION DELAY JITTER [packets MAX_PACKETS] \
      [bytes MAX_BYTES]

The syntax and use of the distribution table are the same as in the
non-uniform distribution delay feature. A file DISTRIBUTION must be
present in TC_LIB_DIR (e.g. /usr/lib/tc) containing numbers scaled by
NETEM_DIST_SCALE. A random value x is selected from the table, and
DELAY + (x * JITTER) is used as the delay. Correlation between values is
not supported.
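A worked sketch of that formula, assuming table entries are integers scaled by NETEM_DIST_SCALE (8192 in the kernel UAPI header linux/pkt_sched.h), so that x / NETEM_DIST_SCALE is the normalized deviate. slot_delay_ns() is hypothetical, and clamping negative delays to zero is an assumption of this sketch.

```c
#include <stdint.h>

/* Value of NETEM_DIST_SCALE in linux/pkt_sched.h. */
#define DEMO_NETEM_DIST_SCALE 8192

/* Delay for one table sample x: DELAY + (x / NETEM_DIST_SCALE) * JITTER,
 * computed in integer nanoseconds; negative results are clamped to 0. */
static int64_t slot_delay_ns(int64_t delay_ns, int64_t jitter_ns, int x)
{
	int64_t d = delay_ns + (int64_t)x * jitter_ns / DEMO_NETEM_DIST_SCALE;

	return d < 0 ? 0 : d;
}
```

For the first example below (800us delay, 100us jitter), a table value of 8192 gives 900us and -8192 gives 700us.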

Examples:
  Normal distribution delay with mean = 800us and stdev = 100us.
  > tc qdisc add dev eth0 root netem slot distribution normal \
    800us 100us

  Optionally set the max slot size in bytes and/or packets.
  > tc qdisc add dev eth0 root netem slot distribution normal \
    800us 100us bytes 64k packets 42

Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-30 11:08:19 -07:00
Dave Taht b6268fbd58 q_netem: support delivering packets in delayed time slots
Slotting is a crude approximation of the behaviors of shared media such
as cable, wifi, and LTE, which gather up a bunch of packets within a
varying delay window and deliver them, relative to that, nearly all at
once.

It works within the existing loss, duplication, jitter and delay
parameters of netem. Some amount of inherent latency must be specified,
regardless.

The new "slot" parameter specifies a minimum and maximum delay between
transmission attempts.

The "bytes" and "packets" parameters can be used to limit the amount of
information transferred per slot.

Examples of use:

tc qdisc add dev eth0 root netem delay 200us \
        slot 800us 10ms bytes 64k packets 42

A more correct example, using stacked netem instances and a packet limit
to emulate a tail drop wifi queue with slots and variable packet
delivery, with a 200Mbit isochronous underlying rate, and 20ms path
delay:

tc qdisc add dev eth0 root handle 1: netem delay 20ms rate 200mbit \
         limit 10000
tc qdisc add dev eth0 parent 1:1 handle 10:1 netem delay 200us \
         slot 800us 10ms bytes 64k packets 42 limit 512

Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-30 11:07:46 -07:00
Dave Taht abf70ef494 tc: support conversions to or from 64 bit nanosecond-based time
Using a 32 bit field to represent time in nanoseconds results in a
maximum value of about 4.3 seconds, which is well below many observed
delays in WiFi and LTE, and barely in the ballpark for a trip past the
Earth's moon, Luna.

Using 64 bit time fields in nanoseconds allows us to simulate
network diameters of several hundred light-years. However, only
conversions to and from ns, us, ms, and seconds are provided.

The iproute2 64 bit api uses signed values for time. Being able to
represent positive or negative time allows us to calculate +/- deltas
between, for example, the CLOCK_TAI and CLOCK_REALTIME clocks.
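For scale: 2^32 ns = 4294967296 ns, about 4.29 s, while INT64_MAX ns (about 9.22 * 10^18 ns) is roughly 292 years. The sketch below is a hypothetical parse_time_ns(). It converts an integer with an ns/us/ms/s suffix into signed 64-bit nanoseconds and rejects values that would overflow. Accepting only integer values and requiring a unit are simplifications of this sketch.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parse "<integer><unit>" (unit: ns, us, ms or s) into signed 64-bit
 * nanoseconds.  Returns 0 on success, -1 on bad input or overflow. */
static int parse_time_ns(const char *str, int64_t *out)
{
	static const struct {
		const char *sfx;
		int64_t mul;
	} units[] = {
		{ "ns", 1 }, { "us", 1000 }, { "ms", 1000000 }, { "s", 1000000000 },
	};
	char *end;
	long long v;

	errno = 0;
	v = strtoll(str, &end, 10);
	if (errno || end == str)
		return -1;
	for (size_t i = 0; i < sizeof(units) / sizeof(units[0]); i++) {
		if (strcmp(end, units[i].sfx) != 0)
			continue;
		/* reject values whose scaled result would not fit in int64_t */
		if (v > INT64_MAX / units[i].mul || v < INT64_MIN / units[i].mul)
			return -1;
		*out = (int64_t)v * units[i].mul;
		return 0;
	}
	return -1;	/* missing or unknown unit */
}
```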

Time related utility functions in tc_util.c are moved to lib/utils.c.

Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-30 11:04:38 -07:00
David Ahern c4e0ea8e9b Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-30 11:04:05 -07:00
Florent Fourcot 2bfe28710e tc/htb: remove unused variable
Since the introduction of the htb module, this variable has never been used.

Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-30 08:00:45 -07:00
Mahesh Bandewar 5d5586b058 iproute: make clang happy
These are primarily fixes for "format string is not a string literal"
warnings / errors (with -Werror -Wformat-nonliteral). This should be a no-op
change. I had to replace a couple of print helper functions with the
code they call, as it was becoming harder to eliminate these warnings;
however, these helpers were used in only a couple of places, so this is
not a major change.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-30 07:58:09 -07:00
Mahesh Bandewar a5aaca9be2 ipmaddr: use preferred_family when given
When creating the socket, AF_INET is used irrespective of the family
given on the command line (with -4, -6, or -0). This change
opens the socket with the preferred family.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-30 07:57:11 -07:00
Stephen Hemminger 0ebb420929 uapi: update bpf headers
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-30 07:55:49 -07:00
Cong Wang 0bab7630e3 ss: add UNIX_DIAG_VFS and UNIX_DIAG_ICONS for unix sockets
UNIX_DIAG_VFS and UNIX_DIAG_ICONS are never used by ss;
make them available in the ss -e output.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-30 07:53:39 -07:00
Stefan Bader 1a75322c5a iprule: Fix destination prefix output
When adding support for JSON output the new code for printing
the destination prefix adds a stray blank character before
the bitmask. This causes some user-space parsing to fail.

Current output:
  ...: from x.x.x.x/l to y.y.y.y /l
Previous output:
  ...: from x.x.x.x/l to y.y.y.y/l

Fixes: 0dd4ccc5 "iprule: add json support"
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-30 07:51:00 -07:00
Toke Høiland-Jørgensen 6526e604cf q_cake: Add description of the tc filter override mechanism to man page
Since CAKE now has three different settings that can be overridden by tc
filters (priority and host and flow hashes), documenting how they work is
probably a good idea.

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-24 23:15:03 -07:00
Luca Boccassi 88ecd4873b testsuite: run dmesg with sudo
Some distributions like Debian nowadays restrict the dmesg command to
root-only. Run it with sudo in the testsuite.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-24 23:14:09 -07:00
Luca Boccassi 012895ce4e testsuite: let make compile build the netlink helper
The generate_nlmsg binary is required but make -C testsuite compile
does not build it. Add the necessary includes and C*FLAGS to the tools
Makefile and have the compile target build it.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-24 23:14:09 -07:00
Luca Boccassi ad23e152b8 testsuite: remove all temp files and implement make clean
Some generated test files were not removed, including one executable in
the testsuite/tools directory.
Ensure make clean from the top level directory works for the testsuite
subdirs too, and that all the files are removed.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-24 23:14:09 -07:00
Stefan Bader 1019364964 testsuite: Handle large number of kernel options
Once there are more than a certain number of kernel config options
set (this happened for us with kernel 4.17), the method of passing
those as command line arguments exceeds the maximum number of
arguments the shell supports. This causes the whole testsuite to
fail.
Instead, create a temporary file and modify its contents so that
the config option variables are exported. Then this file can be
sourced in before running the tests.

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-24 23:13:26 -07:00
Stephen Hemminger a8e9f4ae14 tc: drop extern from function prototypes
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-20 16:01:31 -07:00
Stephen Hemminger 51070e8f18 genl: drop extern from function prototypes
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-20 16:01:01 -07:00
Stephen Hemminger cf7fe23859 bridge: drop extern from function prototypes
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-20 16:00:38 -07:00
Stephen Hemminger 84fb55ede1 ip: drop extern from function prototype
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-20 15:58:50 -07:00
Phil Sutter 515a766cd2 lib: Make check_enable_color() return boolean
As suggested, turn return code into true/false although it's not checked
anywhere yet.

Fixes: 4d82962ccc ("Merge common code for conditionally colored output")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-20 08:55:16 -07:00
Phil Sutter ff1ab8edf8 Make colored output configurable
Allow for -color={never,auto,always} to have colored output disabled,
enabled only if stdout is a terminal or enabled regardless of stdout
state.
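
A minimal sketch of the three-way decision, assuming the mode has already been split off the `-color=` argument. `enum color_mode`, `parse_color_mode()` and `color_enabled()` are hypothetical names, not iproute2's actual helpers; "auto" is modeled with isatty(3) on the output descriptor.

```c
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical names; iproute2's own option handling differs. */
enum color_mode { COLOR_NEVER, COLOR_AUTO, COLOR_ALWAYS };

/* Parse the text after "-color=", returning -1 for unknown input. */
int parse_color_mode(const char *arg)
{
	if (strcmp(arg, "never") == 0)
		return COLOR_NEVER;
	if (strcmp(arg, "auto") == 0)
		return COLOR_AUTO;
	if (strcmp(arg, "always") == 0)
		return COLOR_ALWAYS;
	return -1;
}

/* Decide whether output written to fd should be colored. */
bool color_enabled(enum color_mode mode, int fd)
{
	switch (mode) {
	case COLOR_ALWAYS:
		return true;
	case COLOR_AUTO:
		return isatty(fd) == 1;	/* only when writing to a terminal */
	case COLOR_NEVER:
	default:
		return false;
	}
}
```

For example, `color_enabled(COLOR_AUTO, STDOUT_FILENO)` is false when the output is piped into another program.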

Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-20 08:54:06 -07:00
Shmulik Ladkani 94a8722f2f iproute_lwtunnel: allow specifying 'src' for 'encap ip' / 'encap ip6'
This allows the user to specify the LWTUNNEL_IP_SRC/LWTUNNEL_IP6_SRC
when setting an lwtunnel encapsulation route.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-17 13:17:03 -07:00
Phil Sutter 644b9c238c ip: Add missing -M flag to help text
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-17 09:19:54 -07:00
Stephen Hemminger bb616be341 Merge branch 'ipmonitor' 2018-08-16 10:30:10 -07:00
Stephen Hemminger 3e5f8e0ab6 genl: code cleanup
Run through checkpatch and clean up line wraps, etc.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-16 10:28:13 -07:00
Phil Sutter d559db725c man: ss.8: Describe --events option
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-16 10:25:02 -07:00
Phil Sutter 6417c06b59 rtmon: List options in help text
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-16 10:25:02 -07:00
Phil Sutter 71170d854e man: rtacct.8: Fix nstat options
Add missing --pretty and --json options, correct --zero to --zeros and
correct the mess around --scan/--interval including broken man page
formatting.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-16 10:25:02 -07:00
Phil Sutter a486d25b9c man: ifstat.8: Document --json and --pretty options
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-16 10:25:02 -07:00
Phil Sutter d94974bc91 genl: Fix help text
The '| help' part was misleading: In fact, 'genl help' does not work but
'genl <OBJECT> help' does. Fix the help text to make that clear.

In addition to that, list -Version and -help flags as well.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-16 10:25:02 -07:00
Phil Sutter 29b1430ba9 man: devlink.8: Document -verbose option
This was the only bit missing in comparison to devlink help text.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-16 10:25:01 -07:00
Phil Sutter bb75b9bf2f devlink: trivial: Make help text consistent
Typically the part of the flag in brackets completes the leading part
instead of repeating it.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-16 10:25:01 -07:00
Phil Sutter f9ff0cd69c bridge: trivial: Make help text consistent
Change curly braces into brackets for -json option in help text to be
consistent with the rest.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-16 10:25:01 -07:00
Phil Sutter 05758f5c7b man: bridge.8: Document -oneline option
Copied the description from ip.8.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-16 10:25:01 -07:00
Stephen Hemminger e09d21f675 ipmonitor: decode DELNETCONF message
When a device is deleted, DELNETCONF is sent, but ipmonitor
was unable to decode it.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-16 09:50:34 -07:00
Stephen Hemminger 40443f49b3 ip: convert monitor to switch
The decoding of netlink message types is natural for a C
switch statement.
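
A sketch of that shape, assuming only the RTM_* constants from <linux/rtnetlink.h>; `enum nl_kind` and `classify_nlmsg()` are hypothetical and far smaller than ipmonitor's real dispatcher. Grouping each NEW/DEL pair in one case makes a missing type, such as the DELNETCONF case fixed above, easy to spot.

```c
#include <linux/rtnetlink.h>

/* Hypothetical classification of rtnetlink message types. */
enum nl_kind { NL_UNKNOWN, NL_LINK, NL_ADDR, NL_ROUTE, NL_NETCONF };

/* One switch decodes the message type; NEW and DEL share a case. */
enum nl_kind classify_nlmsg(unsigned int type)
{
	switch (type) {
	case RTM_NEWLINK:
	case RTM_DELLINK:
		return NL_LINK;
	case RTM_NEWADDR:
	case RTM_DELADDR:
		return NL_ADDR;
	case RTM_NEWROUTE:
	case RTM_DELROUTE:
		return NL_ROUTE;
	case RTM_NEWNETCONF:
	case RTM_DELNETCONF:
		return NL_NETCONF;
	default:
		return NL_UNKNOWN;	/* undecoded types land here */
	}
}
```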

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-16 09:49:00 -07:00
David Ahern db71144c0c Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-15 14:32:10 -07:00
Phil Sutter d67eb4fbf8 testsuite: Add a first ss test validating ssfilter
This tests a few ssfilter expressions by selecting sockets from a TCP
dump file. The dump was created using the following command:

| ss -ntaD testsuite/tests/ss/ss1.dump

It is fed into ss via the TCPDIAG_FILE environment variable.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-15 14:25:18 -07:00
Phil Sutter 744bd07662 testsuite: Prepare for ss tests
This merges the shared bits from ts_tc() and ts_ip() into a common
function for being wrapped by the first ones and adds a third ts_ss()
for testing ss commands.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-15 14:25:18 -07:00
Phil Sutter 38d209ecf2 ss: Review ssfilter
The original problem was ssfilter rejecting single expressions if
enclosed in braces, such as:

| sport = 22 or ( dport = 22 )

This is fixed by allowing 'expr' to be an 'exprlist' enclosed in braces.
The no longer required recursion in 'exprlist' being an 'exprlist'
enclosed in braces is dropped.

In addition to that, a few other things are changed:

* Remove pointless 'null' prefix in 'appled' before 'exprlist'.
* For simple equals matches, '=' operator was required for ports but not
  allowed for hosts. Make this consistent by making '=' operator
  optional in both cases.

Reported-by: Samuel Mannehed <samuel@cendio.se>
Fixes: b2038cc0b2 ("ssfilter: Eliminate shift/reduce conflicts")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-15 14:25:18 -07:00
Phil Sutter 8a03a2f36f man: ip-route: Clarify referenced versions are Linux ones
Versioning scheme of Linux and iproute2 is similar, therefore the
referenced kernel versions are likely to confuse readers. Clarify this
by prefixing each kernel version with 'Linux'.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-15 14:23:48 -07:00
Stephen Hemminger a84639fcb2 Merge branch 'master' of git://git.kernel.org/pub/scm/network/iproute2/iproute2-next 2018-08-15 14:21:45 -07:00
David Ahern ea9f9b910e Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-15 09:56:30 -07:00
Phil Sutter 4d82962ccc Merge common code for conditionally colored output
Instead of calling enable_color() conditionally with identical check in
three places, introduce check_enable_color() which does it in one place.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-15 09:55:27 -07:00
Phil Sutter 5332148deb bridge: Fix check for colored output
There is no point in calling enable_color() conditionally if it was
already called for each time '-color' flag was parsed. Align the
algorithm with that in ip and tc by actually making use of 'color'
variable.

Fixes: e9625d6aea ("Merge branch 'iproute2-master' into iproute2-next")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-15 09:54:51 -07:00
Phil Sutter 0d0e0e0bef tc: Fix typo in check for colored output
The check used binary instead of boolean AND, which means colored output
was enabled only if the number of specified '-color' flags was odd.
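
A minimal illustration, assuming the faulty condition had the form `color & !json` (the exact expression is not quoted in the message): `!json` is 0 or 1, so the bitwise AND keeps only bit 0 of the flag counter.

```c
#include <stdbool.h>

/* Buggy form: !json is 0 or 1, so this tests only bit 0 of color,
 * i.e. it is true only for an odd number of -color flags. */
bool color_check_buggy(int color, int json)
{
	return color & !json;
}

/* Fixed form: logical AND treats any non-zero count as "enabled". */
bool color_check_fixed(int color, int json)
{
	return color && !json;
}
```

With two `-color` flags, `color_check_buggy(2, 0)` evaluates `2 & 1`, which is 0.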

Fixes: 2d165c0811 ("tc: implement color output")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-15 09:54:32 -07:00
Nishanth Devarajan 141b55f854 Add SKB Priority qdisc support in tc(8)
sch_skbprio is a qdisc that prioritizes packets according to their skb->priority
field. Under congestion, it drops already-enqueued lower priority packets to
make space available for higher priority packets. Skbprio was conceived as a
solution for denial-of-service defenses that need to route packets with
different priorities as a means to overcome DoS attacks.
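
A simplified model of that drop policy, for illustration only: it counts packets per priority instead of queuing them, and it assumes 64 priority levels where a larger value is more important and out-of-range priorities are clamped to the top level. `struct prio_q` and `pq_enqueue()` are hypothetical; the kernel qdisc also decides which packet inside the lowest queue is dropped, which this sketch ignores.

```c
#include <stdbool.h>

#define NPRIO 64	/* assumption: priorities 0..63, higher is better */

/* Hypothetical model: only packet counts per priority are tracked. */
struct prio_q {
	unsigned int count[NPRIO];
	unsigned int len;	/* total packets queued */
	unsigned int limit;	/* capacity */
};

/* Enqueue one packet. When full, evict a packet from the lowest
 * non-empty priority below prio; if there is none, drop the arrival.
 * Returns true if the arriving packet was queued. */
bool pq_enqueue(struct prio_q *q, unsigned int prio)
{
	if (prio >= NPRIO)
		prio = NPRIO - 1;

	if (q->len < q->limit) {
		q->count[prio]++;
		q->len++;
		return true;
	}

	for (unsigned int lo = 0; lo < prio; lo++) {
		if (q->count[lo] > 0) {
			q->count[lo]--;		/* evict lower priority */
			q->count[prio]++;	/* len stays the same */
			return true;
		}
	}
	return false;
}
```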

Signed-off-by: Nishanth Devarajan <ndev2021@gmail.com>
Reviewed-by: Michel Machado <michel@digirati.com.br>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-14 07:06:43 -07:00
Stephen Hemminger fa6b90904a Merge branch 'master' of git://git.kernel.org/pub/scm/network/iproute2/iproute2-next 2018-08-13 12:17:53 -07:00
Stephen Hemminger 31ad498a01 v4.18.0 2018-08-13 12:11:32 -07:00
Tobias Klauser b3b7c2a71b tc: bpf: update list of archs with eBPF support in manpage
Update the list of architectures supporting eBPF JIT as of Linux 4.18.
Also mention the Linux version where support for a particular
architecture was introduced. Finally, reformat the list of architectures
as a bullet list in order to make it more readable.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-13 12:09:11 -07:00
David Ahern c044be6b34 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-13 07:47:21 -07:00
Toke Høiland-Jørgensen 23a67b008a sch_cake: Make gso-splitting configurable
This patch makes sch_cake's gso/gro splitting configurable
from userspace.

To disable breaking apart superpackets in sch_cake:

tc qdisc replace dev whatever root cake no-split-gso

to enable:

tc qdisc replace dev whatever root cake split-gso

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-13 07:41:44 -07:00
Stephen Hemminger d97e266e5d ip: show min and max mtu
Add min/max MTU to the link details

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-12 14:24:31 -07:00
David Ahern 74eb09ad56 Update kernel headers
Update kernel headers to commit
78cbac647e61 ("Merge branch 'ip-faster-in-order-IP-fragments'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-12 14:23:31 -07:00
Guillaume Nault bbc1cd0d27 l2tp: drop lns_mode
This option is never set.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-12 14:05:11 -07:00
Guillaume Nault 6022f4dd38 l2tp: drop mtu
This option can't be set by user and is never printed.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-12 14:05:11 -07:00
Guillaume Nault 99d6ff2101 l2tp: drop data_seq
This option can't be set by user and is never printed. Furthermore,
L2TP_ATTR_DATA_SEQ has always been a noop in Linux.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-12 14:05:11 -07:00
Keara Leibovitz e8bd395508 tc: fix bugs for tcp_flags and ip_attr hex output
Fix hex output for both the ip_attr and tcp_flags print functions.

Sample usage:

$ $TC qdisc add dev lo ingress
$ $TC filter add dev lo parent ffff: prio 3 proto ip flower ip_tos 0x8/32
$ $TC filter add dev lo parent ffff: prio 5 proto ip flower ip_proto tcp \
	tcp_flags 0x909/f00

$ $TC filter show dev lo parent ffff:

filter protocol ip pref 3 flower chain 0
filter protocol ip pref 3 flower chain 0 handle 0x1
  eth_type ipv4
  ip_tos 0x8/32
  not_in_hw
filter protocol ip pref 5 flower chain 0
filter protocol ip pref 5 flower chain 0 handle 0x1
  eth_type ipv4
  ip_proto tcp
  tcp_flags 0x909/f00
  not_in_hw

$ $TC -j filter show dev lo parent ffff:

[{
    "protocol":"ip",
    "pref":3,
    "kind":"flower",
    "chain":0
},{
    "protocol":"ip",
    "pref":3,
    "kind":"flower",
    "chain":0,
    "options": {
	"handle":1,
	"keys": {
	    "eth_type":"ipv4",
	    "ip_tos":"0x8/32"
	},
	"not_in_hw":true
    }
},{
    "protocol":"ip",
    "pref":5,
    "kind":"flower",
    "chain":0
},{
    "protocol":"ip",
    "pref":5,
    "kind":"flower",
    "chain":0,
    "options": {
	"handle":1,
	"keys": {
	    "eth_type":"ipv4",
	    "ip_proto":"tcp",
	    "tcp_flags":"0x909/f00"
	},
	"not_in_hw":true
    }
}]
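
A sketch that reproduces the value/mask format visible in the sample output above (a `0x`-prefixed value followed by a bare hex mask, as in `tcp_flags 0x909/f00`). `format_hex_masked()` is a hypothetical helper; whether tc omits the mask in some cases is not covered here.

```c
#include <stdio.h>

/* Format "0x<value>/<mask>", both in lowercase hex. Returns the
 * length snprintf() would have written, as snprintf() does. */
int format_hex_masked(char *buf, size_t len, unsigned int value,
		      unsigned int mask)
{
	return snprintf(buf, len, "0x%x/%x", value, mask);
}
```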

Signed-off-by: Keara Leibovitz <kleib@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-08-12 14:04:00 -07:00
Matteo Croce d56c7dde9d ip link: don't stop batch processing
When 'ip link show dev DEVICE' is processed in batch mode, ip exits
and stops processing further commands.
This is because ipaddr_list_flush_or_save() calls exit() to avoid printing
the link information twice.
Replace the exit with a classic goto out instruction.
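
A sketch of the pattern, with hypothetical `run_show()` and `run_batch()` standing in for the real functions: every path leaves through a shared `out:` label that releases resources and returns a status, so the batch loop continues with the next command instead of the process ending.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for one batch command. */
int run_show(const char *dev, FILE *out)
{
	int ret = -1;
	char *buf = malloc(256);

	if (!buf)
		return -1;

	if (dev == NULL || dev[0] == '\0')
		goto out;		/* error path still frees buf */

	snprintf(buf, 256, "link %s\n", dev);
	fputs(buf, out);
	ret = 0;			/* success: fall through, no exit() */
out:
	free(buf);
	return ret;
}

/* Run every command; a failing one does not stop the rest. */
int run_batch(const char *const *devs, int n, FILE *out)
{
	int errors = 0;

	for (int i = 0; i < n; i++)
		if (run_show(devs[i], out) < 0)
			errors++;
	return errors;
}
```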

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-08 09:24:47 -07:00
Stephen Hemminger d66fdfda71 tc: flush after each command in batch mode
Flush output after each command.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-08 09:23:48 -07:00
Lubomir Rintel 3655f788d3 lib/namespace: avoid double-mounting a /sys
This partly reverts 8f0807023d, bringing
back the umount(/sys) attempt.

In an LXC container we're unable to umount the sysfs instance, nor mount
a read-write one. We are still able to create a new read-only instance.

Nevertheless, it still makes sense to attempt the umount() even though
the sysfs is mounted read-only. Otherwise we may end up attempting to
mount a sysfs with the same flags as is already mounted, resulting in
an EBUSY error (meaning "Already mounted").

Perhaps this is not a very likely scenario in the real world, but we hit
it in the NetworkManager test suite, and the change makes netns_switch()
somewhat more robust. It also fixes the case when /sys wasn't mounted at all.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-27 13:40:12 -07:00
Stephen Hemminger e5faf729cb ip: show min and max mtu
Add min/max MTU to the link details

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-27 13:30:19 -07:00
Stephen Hemminger c8f7a754ed ip/address: fix bracketing in help message
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-27 13:26:21 -07:00
David Ahern a0bc57e1ef Merge branch 'iproute2-master' into iproute2-next
Conflicts:
	include/uapi/linux/bpf.h

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-25 10:08:04 -07:00
Jiri Pirko afcd06991d tc: introduce support for chain templates
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-25 10:00:28 -07:00
Eran Ben Elisha 8c7acf3a7a ip: Add violation counters to VF statistics
Extend VF statistics with receive and transmit violation counters.

Example: "ip -s link show dev enp5s0f0"

6: enp5s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 24:8a:07:a5:28:f0 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    0          0        0       0       0       2
    TX: bytes  packets  errors  dropped carrier collsns
    1406       17       0       0       0       0
    vf 0 MAC 00:00:ca:fe:ca:fe, vlan 5, spoof checking off, link-state auto, trust off, query_rss off
    RX: bytes  packets  mcast   bcast   dropped
    1666       29       14         32      0
    TX: bytes  packets   dropped
    2880       44       2412

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-25 09:59:36 -07:00
David Ahern 8b099da560 Update kernel headers
Update kernel headers to commit
aea5f654e6b7 ("net/sched: add skbprio scheduler")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-25 09:58:00 -07:00
Stephen Hemminger 7327f78565 rdma: uapi update ib_user_verbs.h
Merge in latest sanitized kernel header.
Put sanitized version of current ib_user_verbs.h.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-23 13:49:20 -07:00
Stephen Hemminger 7c16a8da6b uapi: fix tcp.h repair
Upstream define for TCP_REPAIR changed.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-23 13:47:22 -07:00
David Ahern 7f57c8b726 devlink: CTRL_ATTR_FAMILY_ID is a u16
CTRL_ATTR_FAMILY_ID is a u16, not a u32. Update devlink accordingly.
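
A small demonstration of why the read width matters, using a simulated attribute payload rather than a real netlink message; the helpers are hypothetical (in iproute2, libnetlink's `rta_getattr_u16()` plays the role of `read_u16()`). The two padding bytes are deliberately non-zero here. With zeroed padding, a 32-bit read happens to give the right value on little-endian hosts but not on big-endian ones, which is how such a bug can go unnoticed.

```c
#include <stdint.h>
#include <string.h>

/* Simulated payload: a u16 in host byte order followed by two bytes of
 * alignment padding (netlink pads attributes to 4 bytes). */
void make_u16_payload(uint8_t buf[4], uint16_t v)
{
	memcpy(buf, &v, sizeof(v));
	buf[2] = 0xaa;		/* non-zero padding exposes a wide read */
	buf[3] = 0xaa;
}

/* Correct: read exactly the attribute's width. */
uint16_t read_u16(const uint8_t *p)
{
	uint16_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Wrong for a u16 attribute: also pulls in the padding bytes. */
uint32_t read_u32(const uint8_t *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```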

Fixes: a3c4b484a1 ("add devlink tool")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-23 13:44:36 -07:00
David Ahern 5f9c8c6a16 Merge branch 'tc-tunnels-tos-ttl' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-20 08:59:43 -07:00
Or Gerlitz 761ec9e29f tc/flower: Add match on encapsulating tos/ttl
Add matching on tos/ttl of the IP tunnel headers.

For example, here's a decap rule that matches on the tunnel tos:

tc filter add dev vxlan_sys_4789 protocol ip parent ffff: prio 10 flower \
   enc_src_ip 192.168.10.2 enc_dst_ip 192.168.10.1 enc_key_id 100 enc_dst_port 4789 enc_tos 0x30 \
   src_mac e4:11:22:33:44:70 dst_mac e4:11:22:33:44:50  \
   action tunnel_key unset \
   action mirred egress redirect dev eth0_0

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-20 08:59:11 -07:00
Or Gerlitz 9f89b0cc0e tc/act_tunnel_key: Enable setup of tos and ttl
Allow setting tos and ttl for the tunnel.

For example, here's an encap rule that sets the tunnel tos:

tc filter add dev eth0_0 protocol ip parent ffff: prio 10 flower \
   src_mac e4:11:22:33:44:50 dst_mac e4:11:22:33:44:70 \
   action tunnel_key set src_ip 192.168.10.1 dst_ip 192.168.10.2 id 100 dst_port 4789 tos 0x30 \
   action mirred egress redirect dev vxlan_sys_4789

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-20 08:58:31 -07:00
David Ahern 204db84eb8 Update kernel headers
Update kernel headers to
a3eed83a1895 ("Merge branch 'qed-Add-support-for-phy-module-query'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-20 08:57:23 -07:00
Toke Høiland-Jørgensen 77c9fbd06e q_cake: Rename autorate_ingress parameter to use dash as word separator
This is consistent with the other multi-word parameters. Also change the
JSON output to be consistent with the way it is formatted for the other
options.

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-20 08:46:42 -07:00
Jesus Sanchez-Palencia b625e36108 tc: Do not use addattr_nest_compat on mqprio and netem
Here we are partially reverting commit c14f9d92ee
"treewide: Use addattr_nest()/addattr_nest_end() to handle nested
attributes".

As discussed in [1], changing from the 'manually' coded version that
used addattr_l() to addattr_nest_compat() wasn't functionally
equivalent, because now the messages have extra fields appended to them.

This introduced a regression since the implementation of parse_attr()
from both mqprio and netem can't handle this new message format.

Without this fix, mqprio returns an error. netem won't return an error
but its internal configuration ends up wrong.

As an example, this can be reproduced by the following commands when
this patch is not applied:

 1) mqprio
$ tc qdisc replace dev enp3s0 parent root handle 100 mqprio \
	num_tc 3 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
	queues 1@0 1@1 2@2 hw 0

RTNETLINK answers: Numerical result out of range

 2) netem
$ tc qdisc add dev enp3s0 root netem rate 5kbit 20 100 5 \
	distribution normal latency 1 1

$ tc -s qdisc

(...)
qdisc netem 8001: dev enp3s0 root refcnt 9 limit 1000 delay 0us  0us
 Sent 402 bytes 1 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
(...)

With this patch applied, the tc -s qdisc command above for netem instead
reads:

(...)
qdisc netem 8002: dev enp3s0 root refcnt 9 limit 1000 delay 0us  0us \
	rate 5Kbit packetoverhead 20 cellsize 100 celloverhead 5
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
(...)

[1] https://patchwork.ozlabs.org/patch/867860/#1893405

Fixes: c14f9d92ee ("treewide: Use addattr_nest()/addattr_nest_end() to handle nested attributes")
Reported-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-19 15:50:07 -07:00
Toke Høiland-Jørgensen 714444c0cb Add support for CAKE qdisc
sch_cake is intended to squeeze the most bandwidth and latency out of even
the slowest ISP links and routers, while presenting an API simple enough
that even an ISP can configure it.

Example of use on a cable ISP uplink:

tc qdisc add dev eth0 cake bandwidth 20Mbit nat docsis ack-filter

To shape a cable download link (ifb and tc-mirred setup elided):

tc qdisc add dev ifb0 cake bandwidth 200mbit nat docsis ingress wash besteffort

Cake is filled with:

* A hybrid Codel/Blue AQM algorithm, "Cobalt", tied to an FQ_Codel
  derived Flow Queuing system, which autoconfigures based on the bandwidth.
* A novel "triple-isolate" mode (the default) which balances per-host
  and per-flow FQ even through NAT.
* A deficit-based shaper that can also be used in an unlimited mode.
* 8-way set-associative hashing to reduce flow collisions to a minimum.
* A reasonable interpretation of various diffserv latency/loss tradeoffs.
* Support for zeroing diffserv markings for entering and exiting traffic.
* Support for interacting well with Docsis 3.0 shaper framing.
* Support for DSL framing types and shapers.
* Support for ack filtering.
* Extensive statistics for measuring loss, ECN markings, and latency variation.

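A simplified sketch of 8-way set-associative flow lookup, under stated assumptions: 1024 queues grouped into 128 sets of 8, a tag of 0 meaning "free", and collisions resolved by reusing the first way of the set. `struct flow_table` and `find_queue()` are hypothetical; sch_cake's real lookup also ages out idle queues and keeps statistics.

```c
#include <stdint.h>

#define SET_WAYS 8
#define NUM_SETS 128	/* assumption: 1024 queues in 8-way sets */

/* tags[q] holds the flow hash owning queue q; 0 means free. */
struct flow_table {
	uint32_t tags[NUM_SETS * SET_WAYS];
};

/* Return the queue for a flow hash. Only the SET_WAYS queues of the
 * flow's set are searched. */
int find_queue(struct flow_table *t, uint32_t hash)
{
	uint32_t tag = hash ? hash : 1;	/* keep 0 reserved for "free" */
	unsigned int base = (hash % NUM_SETS) * SET_WAYS;
	int free_slot = -1;

	for (unsigned int w = 0; w < SET_WAYS; w++) {
		unsigned int q = base + w;

		if (t->tags[q] == tag)
			return (int)q;		/* existing flow */
		if (t->tags[q] == 0 && free_slot < 0)
			free_slot = (int)q;
	}
	if (free_slot >= 0) {
		t->tags[free_slot] = tag;	/* new flow takes a free way */
		return free_slot;
	}
	return (int)base;	/* set full: share way 0 (a collision) */
}
```

Compared with one queue per hash bucket, a collision in this scheme needs nine active flows mapping to the same set.
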
Various versions have been available as an out-of-tree build for
kernel versions going back to 3.10, as the embedded router world has been
running a few years behind mainline Linux. A stable version has been
generally available on lede-17.01 and later.

sch_cake replaces a combination of iptables, tc filter, htb and fq_codel
in the sqm-scripts, with sane defaults and vastly simpler configuration.

Cake's principal author is Jonathan Morton, with contributions from
Kevin Darbyshire-Bryant, Toke Høiland-Jørgensen, Sebastian Moeller,
Ryan Mounce, Tony Ambardar, Dean Scarff, Nils Andreas Svee, Dave Täht,
and Loganaden Velvindron.

Testing from Pete Heist, Georgios Amanakis, and the many other members of
the cake@lists.bufferbloat.net mailing list.

Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-19 09:23:46 -07:00
Alex Vesker 8b4fbf0bed devlink: Add support for devlink-region access
Devlink region allows access to driver defined address regions.
Each device can create its supported address regions and register
them. A device which exposes a region will allow access to it
using devlink.

This support allows reading and dumping region snapshots as well
as presenting information such as region size and currently available
snapshots.

A snapshot represents a memory image of a region taken by the driver.
If a device collects a snapshot of an address region it can be later
exposed using devlink region read or dump commands.
This functionality allows for later analysis of the snapshots.

The dump command is designed to read the full address space of a
region or of a snapshot unlike the read command which allows
reading only a specific section in a region/snapshot indicated by
an address and a length. Currently, reading and dumping are supported
only for a previously taken snapshot ID.

New commands added:
 devlink region show [ DEV/REGION ]
 devlink region delete DEV/REGION snapshot SNAPSHOT_ID
 devlink region dump DEV/REGION [ snapshot SNAPSHOT_ID ]
 devlink region read DEV/REGION [ snapshot SNAPSHOT_ID ]
                                address ADDRESS length LENGTH

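A minimal model of the difference between `read` and `dump`, assuming a snapshot is a flat byte buffer; `struct snapshot`, `region_read()` and `region_dump()` are hypothetical and say nothing about how devlink transfers the data over netlink. The bounds check is written as `len > size - addr` so that a large `addr + len` cannot wrap around.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a driver-held region snapshot. */
struct snapshot {
	uint32_t id;
	const uint8_t *data;
	uint64_t size;
};

/* "read": copy [addr, addr + len) out of the snapshot.
 * Returns 0 on success, -1 if the range is out of bounds. */
int region_read(const struct snapshot *s, uint64_t addr, uint64_t len,
		uint8_t *out)
{
	if (addr > s->size || len > s->size - addr)
		return -1;
	memcpy(out, s->data + addr, len);
	return 0;
}

/* "dump": the whole snapshot, i.e. a read of [0, size). */
int region_dump(const struct snapshot *s, uint8_t *out)
{
	return region_read(s, 0, s->size, out);
}
```
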
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-19 09:20:15 -07:00
Qiaobin Fu 697dce7b3a net:sched: add action inheritdsfield to skbedit
The new action inheritdsfield copies the DS field of
IPv4 and IPv6 packets into skb->priority. This enables
later classification of packets based on the DS field.
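
A sketch of where the DS field sits in each header. The layouts are standard (RFC 2474 places the DS field in the IPv4 TOS octet and in the IPv6 Traffic Class). The last step assumes that what ends up in skb->priority is the 6-bit DSCP with the two ECN bits shifted off; check act_skbedit before relying on that. All function names are hypothetical.

```c
#include <stdint.h>

/* IPv4: the DS field is the second header byte (old TOS octet). */
uint8_t ipv4_dsfield(const uint8_t *iph)
{
	return iph[1];
}

/* IPv6: the 8-bit Traffic Class sits between the 4-bit version and
 * the 20-bit flow label, straddling the first two header bytes. */
uint8_t ipv6_dsfield(const uint8_t *ip6h)
{
	return (uint8_t)(((ip6h[0] & 0x0f) << 4) | (ip6h[1] >> 4));
}

/* Assumption: priority = DSCP, i.e. DS field without the ECN bits. */
uint32_t priority_from_ds(uint8_t ds)
{
	return ds >> 2;
}
```

For an EF-marked packet (DS field 0xb8), both helpers return 0xb8 and the resulting priority is 46.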

v4:
* Make tc use netlink helper functions

v3:
* Make flag represented in JSON output as a null value

v2:
* Align the output syntax with the input syntax

* Fix the style issues

Original idea by Jamal Hadi Salim <jhs@mojatatu.com>

Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
Reviewed-by: Michel Machado <michel@digirati.com.br>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-19 09:17:56 -07:00
Mathieu Xhonneux 04cb3c0d43 ip: add support for seg6local End.BPF action
This patch adds support for the End.BPF action of the seg6local
lightweight tunnel. Functions from the BPF lightweight tunnel are
re-used in this patch. Example:

$ ip -6 route add fc00::18 encap seg6local action End.BPF endpoint \
obj my_bpf.o sec my_func dev eth0

$ ip -6 route show fc00::18
fc00::18  encap seg6local action End.BPF endpoint my_bpf.o:[my_func]
dev eth0 metric 1024 pref medium

v2: - re-use of print_encap_bpf_prog instead of fprintf
    - introduction of "endpoint" keyword for more consistency with
      other parameters

Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-18 15:56:18 -07:00
Serhey Popovych 8df708afd6 ipaddress: Fix and make consistent label match handling
Since commit 9516823051 ("ipaddress: Improve print_linkinfo()") we
return -1 instead of 0 (as we did before the change) when the ip-address(8)
label does not match the network device name. This causes a regression
when trying to output an ip address matching a label:

     # ip addr add 192.168.192.1/24 dev lo label lo:1
     # ip addr show label lo:1
     <no output>

This is a special case: we return 0 from print_linkinfo() early, after
matching only filter.ifindex and filter.up (if given) but not the remaining
fields in @filter. ipaddr_list_flush_or_save() then calls
print_selected_addrinfo() without calling print_link_stats().

print_selected_addrinfo() then calls print_addrinfo(), which finally
matches the IFA_LABEL attribute in the netlink buffer against filter.label
using ifa_label_match_rta().

On the other hand, there are three conditions checked in print_linkinfo()
to determine the label special case:

    1) filter.label != NULL
    2) filter.family == AF_UNSPEC || filter.family == AF_PACKET
    3) fnmatch(filter.label, name, 0)

Condition 1) is fine: it checks whether filtering by label is enabled
with a pattern given in @filter.label.

Since labels are IPv4-specific and AF_PACKET is used for printing ip-link(8)
information (see ipaddr_link_list() in ipaddress.c as an example), checking
for AF_PACKET in 2) doesn't make much sense: it is better to defer these
checks to print_addrinfo(), which determines valid combinations before
calling ifa_label_match_rta() to finally match IFA_LABEL against the pattern
in filter.label.

For 3) we have the following call for the test case:

    fnmatch(pattern, string, flags) ->
      fnmatch(filter.label, name, 0) ->
        fnmatch("lo:1", "lo", 0) == FNM_NOMATCH (1) or non-zero on error
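
The same comparison can be checked directly with POSIX fnmatch(3); `label_matches()` is a hypothetical wrapper. It shows why matching the label pattern against the device name fails, while the IFA_LABEL value "lo:1" does match.

```c
#include <fnmatch.h>

/* Shell-style match: fnmatch() returns 0 on a match and FNM_NOMATCH
 * when the name does not match the pattern. */
int label_matches(const char *pattern, const char *name)
{
	return fnmatch(pattern, name, 0) == 0;
}
```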

To support the special case in print_linkinfo() for filtering by label, we
only need to check whether a label pattern is given in filter.label and
return 0 to skip print_link_stats() in ipaddr_list_flush_or_save(); the
actual filtering will be done in print_addrinfo().

Before commit 9516823051 ("ipaddress: Improve print_linkinfo()"):
-------------------------------------------------------------------

$ ip addr sh label lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN \
group default qlen 1000
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                          fnmatch("lo", "lo", 0) == 0
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
$ ip addr show label 'lo:*'
    inet 192.168.192.1/24 scope global lo:1
       valid_lft forever preferred_lft forever
$ ip addr sh label lo:1
    inet 192.168.192.1/24 scope global lo:1
       valid_lft forever preferred_lft forever
$ ip -4 addr sh label lo:1
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN \
group default qlen 1000
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                             filter.family == AF_INET
    inet 192.168.192.1/24 scope global lo:1
       valid_lft forever preferred_lft forever

After this change applied:
--------------------------

$ ip/ip addr show label lo
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
$ ip/ip addr show label 'lo:*'
    inet 192.168.192.1/24 scope global lo:1
       valid_lft forever preferred_lft forever
$ ip/ip addr show label lo:1
    inet 192.168.192.1/24 scope global lo:1
       valid_lft forever preferred_lft forever
$ ip/ip -4 addr show label lo:1
    inet 192.168.192.1/24 scope global lo:1
       valid_lft forever preferred_lft forever

Note that link information is no longer shown as it was previously,
because we are filtering by a "label" pattern rather than showing by "dev".

Fixes: commit 9516823051 ("ipaddress: Improve print_linkinfo()")
Reported-by: Vincent Bernat <vincent@bernat.im>
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-18 15:52:55 -07:00
David Ahern b05a68f721 Merge branch 'bpf-btf' into iproute2-next
Daniel Borkmann says:

====================

Main part of this set is to: i) avoid strict af_alg kernel dependency,
ii) add loader support for bpf to bpf calls and iii) add btf loader
support with an option to annotate maps. For details please see the
individual patches. Thanks!

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-17 19:39:06 -07:00
Daniel Borkmann f823f36012 bpf: implement btf handling and map annotation
Implement loading of the .BTF section from the object file and build up
an internal table for retrieving the key/value type ids related to maps
in the BPF program. The latter is done by setting up a struct btf_type
table.

One of the issues is that there's a disconnect between the data
types used in the map and struct bpf_elf_map, meaning the underlying
types are unknown from the map description. One way to overcome
this is to add an annotation such that the loader will recognize
the relation to both. BPF_ANNOTATE_KV_PAIR(map_foo, struct key,
struct val); has been added to the API that programs can use.

The loader will then pick the corresponding key/value type ids and
attach them to the maps at creation time. These can later be dumped via
bpftool for introspection.
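The annotation idea can be sketched in portable C, assuming (this is not taken
from iproute2's headers) that the macro emits an otherwise unused struct that
pairs the key and value types under a name derived from the map, so both types
end up in the debug info that pahole converts to BTF. KV_PAIR_SKETCH and the
kv_pair_<map> naming are hypothetical; the real BPF_ANNOTATE_KV_PAIR may
differ, for example by placing the object in a dedicated ELF section. Standard
uint*_t types stand in for the __u* types used below.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical annotation macro: declares a struct whose two members
 * carry the map's key and value types, plus one instance marked "used"
 * so the compiler keeps it (and thus its type) in the object file's
 * debug info. */
#define KV_PAIR_SKETCH(name, type_key, type_val)   \
    struct kv_pair_##name {                        \
        type_key key;                              \
        type_val value;                            \
    };                                             \
    __attribute__((used)) static struct kv_pair_##name kv_pair_##name##_anchor

struct ctl_value {
    union {
        uint64_t value;
        uint32_t ifindex;
        uint8_t mac[6];
    };
};

static_assert(sizeof(struct ctl_value) == 8,
              "union is as large as its largest member");

/* Pair the ctl_array map with its key and value types. */
KV_PAIR_SKETCH(ctl_array, uint32_t, struct ctl_value);
```

A loader following this scheme could look up the struct kv_pair_<map> type by
name and read the type ids of its key and value members.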

Example with test_xdp_noinline.o from kernel selftests:

  [...]

  struct ctl_value {
        union {
                __u64 value;
                __u32 ifindex;
                __u8 mac[6];
        };
  };

  struct bpf_map_def __attribute__ ((section("maps"), used)) ctl_array = {
        .type		= BPF_MAP_TYPE_ARRAY,
        .key_size	= sizeof(__u32),
        .value_size	= sizeof(struct ctl_value),
        .max_entries	= 16,
        .map_flags	= 0,
  };
  BPF_ANNOTATE_KV_PAIR(ctl_array, __u32, struct ctl_value);

  [...]

The above could also be wrapped further in a macro. Compiling through LLVM
and converting to BTF:

  # llc --version
  LLVM (http://llvm.org/):
    LLVM version 7.0.0svn
    Optimized build.
    Default target: x86_64-unknown-linux-gnu
    Host CPU: skylake

    Registered Targets:
      bpf    - BPF (host endian)
      bpfeb  - BPF (big endian)
      bpfel  - BPF (little endian)
  [...]

  # clang [...] -O2 -target bpf -g -emit-llvm -c test_xdp_noinline.c -o - |
    llc -march=bpf -mcpu=probe -mattr=dwarfris -filetype=obj -o test_xdp_noinline.o
  # pahole -J test_xdp_noinline.o

Checking pahole dump of BPF object file:

  # file test_xdp_noinline.o
  test_xdp_noinline.o: ELF 64-bit LSB relocatable, *unknown arch 0xf7* version 1 (SYSV), with debug_info, not stripped
  # pahole test_xdp_noinline.o
  [...]
  struct ctl_value {
	union {
		__u64              value;                /*     0     8 */
		__u32              ifindex;              /*     0     4 */
		__u8               mac[0];               /*     0     0 */
	};                                               /*     0     8 */

	/* size: 8, cachelines: 1, members: 1 */
	/* last cacheline: 8 bytes */
  };

Now loading into kernel and dumping the map via bpftool:

  # ip -force link set dev lo xdp obj test_xdp_noinline.o sec xdp-test
  # ip a
  1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 xdpgeneric/id:227 qdisc noqueue state UNKNOWN group default qlen 1000
      link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
      inet 127.0.0.1/8 scope host lo
         valid_lft forever preferred_lft forever
      inet6 ::1/128 scope host
         valid_lft forever preferred_lft forever
  [...]
  # bpftool prog show id 227
  227: xdp  tag a85e060c275c5616  gpl
      loaded_at 2018-07-17T14:41:29+0000  uid 0
      xlated 8152B  not jited  memlock 12288B  map_ids 381,385,386,382,384,383
  # bpftool map dump id 386
   [{
        "key": 0,
        "value": {
            "": {
                "value": 0,
                "ifindex": 0,
                "mac": []
            }
        }
    },{
        "key": 1,
        "value": {
            "": {
                "value": 0,
                "ifindex": 0,
                "mac": []
            }
        }
    },{
  [...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-17 19:38:44 -07:00
Daniel Borkmann b5cb33aec6 bpf: implement bpf to bpf calls support
Implement missing bpf to bpf calls support. The loader will
recognize .text section and handle relocation entries that
are emitted by LLVM.

First step is processing of map related relocation entries
for .text section, and in a second step loader will copy .text
section into program section and adjust call instruction
offset accordingly.
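The offset adjustment rests on how eBPF pseudo calls are encoded: the immediate
is relative to the instruction after the call, which matches the noinline dump
in this message (1002: call pc+3 reaches the function starting at 1006). A
minimal sketch of that arithmetic, assuming .text is appended directly after
the program section; the helper names are hypothetical and this is not the
loader's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* eBPF pseudo calls are pc-relative: the callee starts at
 * call_idx + 1 + imm, e.g. "1002: (85) call pc+3" lands on insn 1006.
 * Unsigned wrap-around makes negative imm values work too. */
static size_t call_target(size_t call_idx, int32_t imm)
{
    return call_idx + 1 + (size_t)imm;
}

/* Immediate needed for a call at call_idx to reach target_idx. */
static int32_t call_imm(size_t call_idx, size_t target_idx)
{
    int64_t rel = (int64_t)target_idx - (int64_t)call_idx - 1;

    assert(rel >= INT32_MIN && rel <= INT32_MAX);
    return (int32_t)rel;
}

/* Hypothetical relocation step: .text (the non-inlined functions) is
 * appended right after a program section of prog_len insns, so a call
 * to the function at text_off within .text must reach
 * prog_len + text_off. */
static int32_t reloc_text_call(size_t call_idx, size_t prog_len,
                               size_t text_off)
{
    return call_imm(call_idx, prog_len + text_off);
}
```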

Example with test_xdp_noinline.o from kernel selftests:

 1) Every function marked __attribute__ ((always_inline)), the
    rest left unchanged:

  # ip -force link set dev lo xdp obj test_xdp_noinline.o sec xdp-test
  # ip a
  1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 xdpgeneric/id:233 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
  [...]
  # bpftool prog dump xlated id 233
  [...]
  1669: (2d) if r3 > r2 goto pc+4
  1670: (79) r2 = *(u64 *)(r10 -136)
  1671: (61) r2 = *(u32 *)(r2 +0)
  1672: (63) *(u32 *)(r1 +0) = r2
  1673: (b7) r0 = 1
  1674: (95) exit        <-- 1674 insns total

 2) Every function marked __attribute__ ((noinline)), the
    rest left unchanged:

  # ip -force link set dev lo xdp obj test_xdp_noinline.o sec xdp-test
  # ip a
  1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 xdpgeneric/id:236 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
  [...]
  # bpftool prog dump xlated id 236
  [...]
  1000: (bf) r1 = r6
  1001: (b7) r2 = 24
  1002: (85) call pc+3   <-- pc-relative call insns
  1003: (1f) r7 -= r0
  1004: (bf) r0 = r7
  1005: (95) exit
  1006: (bf) r0 = r1
  1007: (bf) r1 = r2
  1008: (67) r1 <<= 32
  1009: (77) r1 >>= 32
  1010: (bf) r3 = r0
  1011: (6f) r3 <<= r1
  1012: (87) r2 = -r2
  1013: (57) r2 &= 31
  1014: (67) r0 <<= 32
  1015: (77) r0 >>= 32
  1016: (7f) r0 >>= r2
  1017: (4f) r0 |= r3
  1018: (95) exit        <-- 1018 insns total

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-17 19:38:43 -07:00
Daniel Borkmann 6e5094dbb7 bpf: remove strict dependency on af_alg
Do not bail out when AF_ALG is not supported by the kernel; only
do so when a map is requested in object ns, where we're
calculating the hash. Otherwise, the loader can operate just
fine, so let's not fail early when it's not needed.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-17 19:38:40 -07:00
Daniel Borkmann 282a1fe1f8 bpf: move bpf_elf_map fixup notification under verbose
No need to spam the user with this if it can be fixed gracefully
anyway. Therefore, move it under verbose option.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-17 19:38:38 -07:00
David Ahern 5081979176 Import btf.h from kernel headers
Import btf.h from kernel headers at commit
    2aa4a3378ad0 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next")
which is the last sync point.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-17 19:37:50 -07:00
Roopa Prabhu 9c6a6d84ee ipneigh: exclude NTF_EXT_LEARNED from default filter
NUD_NOARP entries are filtered out by default by iproute2.
We don't want NUD_NOARP entries with the NTF_EXT_LEARNED flag filtered out.
This patch extends the default filter check for ip neigh show
to include the NTF_EXT_LEARNED flag.
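A minimal sketch of the resulting default-filter rule, modeling only the
NOARP/EXT_LEARNED condition described above. The flag values are those of
NUD_NOARP and NTF_EXT_LEARNED in linux/neighbour.h, redefined locally with a
SKETCH_ prefix so the example builds without kernel headers; the predicate
name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NUD_NOARP        0x40
#define SKETCH_NTF_EXT_LEARNED  0x10

/* Hypothetical default-filter predicate: hide NOARP entries unless they
 * were externally learned (e.g. installed by a control plane). */
static bool default_filter_shows(uint16_t ndm_state, uint8_t ndm_flags)
{
    if ((ndm_state & SKETCH_NUD_NOARP) &&
        !(ndm_flags & SKETCH_NTF_EXT_LEARNED))
        return false;
    return true;
}
```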

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-17 18:57:21 -07:00
Jakub Kicinski da083b5a48 iplink: add support for reporting multiple XDP programs
The kernel now supports attaching XDP programs in the driver
and hardware at the same time.  Print that information
correctly.

If multiple programs are attached, the kernel will not
provide IFLA_XDP_PROG_ID, so don't expect it to be there
(this also slightly improves the printing for very old
kernels, as it avoids an unnecessary "prog/xdp" line).

In short mode, preserve the current output but don't print
IDs if there are multiple programs:

6: netdevsim0: <BROADCAST,NOARP> mtu 1500 xdpoffload/id:11 qdisc [...]

and:

6: netdevsim0: <BROADCAST,NOARP> mtu 1500 xdpmulti qdisc [...]
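A sketch of the short-mode formatting described above. The xdpgeneric,
xdpoffload and xdpmulti prefixes come from outputs in this log; plain "xdp" for
driver mode is an assumption. The mode numbers are assumed to follow the
kernel's XDP_ATTACHED_* values, which agree with the JSON "mode" fields in this
message. format_xdp_short() is hypothetical, not the iproute2 function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* 1 = driver, 2 = generic (skb), 3 = offload, 4 = multiple programs */
enum { SK_XDP_NONE, SK_XDP_DRV, SK_XDP_SKB, SK_XDP_HW, SK_XDP_MULTI };

/* Hypothetical formatter for the short "ip link" output. */
static void format_xdp_short(char *buf, size_t len, int mode, unsigned int id)
{
    const char *prefix;

    assert(buf != NULL && len > 0);
    switch (mode) {
    case SK_XDP_DRV: prefix = "xdp"; break;
    case SK_XDP_SKB: prefix = "xdpgeneric"; break;
    case SK_XDP_HW:  prefix = "xdpoffload"; break;
    case SK_XDP_MULTI:
        /* several programs: there is no single ID to show */
        snprintf(buf, len, "xdpmulti");
        return;
    default:
        buf[0] = '\0';
        return;
    }
    snprintf(buf, len, "%s/id:%u", prefix, id);
}
```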

ip link output will keep using the prog/xdp prefix if only one program
is attached, but can also print multiple program lines:

    prog/xdp id 8 tag fc7a51d1a693a99e jited

vs:

    prog/xdpdrv id 8 tag fc7a51d1a693a99e jited
    prog/xdpoffload id 9 tag fc7a51d1a693a99e

JSON output gains a new array called "attached" which will
contain the full list of attached programs along with their
attachment modes:

        "xdp": {
            "mode": 3,
            "prog": {
                "id": 11,
                "tag": "fc7a51d1a693a99e",
                "jited": 0
            },
            "attached": [ {
                    "mode": 3,
                    "prog": {
                        "id": 11,
                        "tag": "fc7a51d1a693a99e",
                        "jited": 0
                    }
                } ]
        },

If multiple programs are attached, the general "xdp"
section will not contain program information:

        "xdp": {
            "mode": 4,
            "attached": [ {
                    "mode": 1,
                    "prog": {
                        "id": 10,
                        "tag": "fc7a51d1a693a99e",
                        "jited": 1
                    }
                },{
                    "mode": 3,
                    "prog": {
                        "id": 11,
                        "tag": "fc7a51d1a693a99e",
                        "jited": 0
                    }
                } ]
        },

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-15 13:10:03 -07:00
Jianbo Liu 1f0a5dfd38 tc: flower: Add support for QinQ
To support matching on both outer and inner vlan headers, we add
new cvlan_id/cvlan_prio/cvlan_ethtype keys for the inner vlan header.

Example:
# tc filter add dev eth0 protocol 802.1ad parent ffff: \
    flower vlan_id 1000 vlan_ethtype 802.1q \
        cvlan_id 100 cvlan_ethtype ipv4 \
    action vlan pop \
    action vlan pop \
    action mirred egress redirect dev eth1

# tc filter show dev eth0 ingress
filter protocol 802.1ad pref 1 flower chain 0
filter protocol 802.1ad pref 1 flower chain 0 handle 0x1
  vlan_id 1000
  vlan_ethtype 802.1Q
  cvlan_id 100
  cvlan_ethtype ip
  eth_type ipv4
  in_hw

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-15 13:03:50 -07:00
David Ahern 3eebc1d4f4 Update kernel headers
Update kernel headers to commit
2aa4a3378ad0 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-15 13:02:51 -07:00
David Ahern 5910422d21 Merge branch 'tc-etf' into iproute2-next
Jesus Sanchez-Palencia says:

====================

fixes since v3:
 - Add support for clock names with the "CLOCK_" prefix;
 - Print clock name on print_opt();
 - Use strcasecmp() instead of strncasecmp().

The ETF (earliest txtime first) qdisc was recently merged into net-next
[1], so this patchset adds support for it through the tc command line
tool.

An initial man page is also provided.

The first commit in this series is adding an updated version of
include/uapi/linux/pkt_sched.h and is not meant to be merged. It's
provided here just as a convenience for those who want to easily build
this patchset.

[1] https://patchwork.ozlabs.org/cover/938991/

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-11 17:51:46 -07:00
Jesus Sanchez-Palencia 85d699c3a8 man: Add initial manpage for tc-etf(8)
Add an initial manpage for tc-etf covering all config options, basic
concepts and operation modes.

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-11 17:50:53 -07:00
Vinicius Costa Gomes 7da5ef2200 tc: Add support for the ETF Qdisc
The "Earliest TxTime First" (ETF) queueing discipline allows precise
control of the transmission time of packets by scheduling them in
sorted order of their transmit time.

The syntax is:

tc qdisc add dev DEV parent NODE etf delta <DELTA>
                     clockid <CLOCKID> [offload] [deadline_mode]
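A conceptual sketch of the ordering idea, assuming delta means how long before
its txtime a packet may be dequeued (so it can reach the NIC in time). Packets
are kept sorted by txtime and only the head is considered. This is not the
kernel's qdisc, which also handles late packets, offload and deadline_mode; the
type and function names are hypothetical and the fixed capacity is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETF_SKETCH_CAP 64

/* Hypothetical queue of packet transmit times (ns), kept sorted so the
 * earliest txtime is always at index 0. */
struct etf_sketch {
    uint64_t txtime[ETF_SKETCH_CAP];
    size_t len;
};

static bool etf_enqueue(struct etf_sketch *q, uint64_t txtime)
{
    size_t i;

    if (q->len == ETF_SKETCH_CAP)
        return false;
    /* insertion-sort step: shift later txtimes one slot right */
    for (i = q->len; i > 0 && q->txtime[i - 1] > txtime; i--)
        q->txtime[i] = q->txtime[i - 1];
    q->txtime[i] = txtime;
    q->len++;
    return true;
}

/* The head may leave once now >= txtime - delta; written as
 * now + delta < txtime to avoid unsigned underflow. */
static bool etf_dequeue(struct etf_sketch *q, uint64_t now, uint64_t delta,
                        uint64_t *out)
{
    size_t i;

    assert(out != NULL);
    if (q->len == 0 || now + delta < q->txtime[0])
        return false;
    *out = q->txtime[0];
    for (i = 1; i < q->len; i++)
        q->txtime[i - 1] = q->txtime[i];
    q->len--;
    return true;
}
```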

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-11 17:50:10 -07:00
Stephen Hemminger b49759c0e7 tc: don't double print rate
The conversion of stats printing to JSON forgot to remove the
existing fprintf.

Fixes: 4fcec7f366 ("tc: jsonify stats2")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-09 09:53:45 -07:00
Jesus Sanchez-Palencia 4df5bb1be0 man: Fix typos on tc-cbs
Fix two typos in the man page of the CBS qdisc.

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-07 09:57:45 -07:00
fumihiko kakuma d529ea2ff4 tc: Fix the bug not to display prio and quantum options of htb
A command line like 'tc -d class show dev dev-name' does not
display the values of the prio and quantum options when the htb
qdisc is used. This patch fixes the bug.

Signed-off-by: Fumihiko Kakuma <kakuma@valinux.co.jp>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-07 09:57:45 -07:00
Roi Dayan 425dcc2741 tc: Fix output of ip attributes
This affects, for example, the output of tos and ttl.
Before this fix, the %x format caused the pointer to be printed
instead of the intended string created in the out variable.

Fixes: e28b88a464 ("tc: jsonify flower filter")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-07 09:57:45 -07:00
Stephen Hemminger dc3ef235f3 uapi: update bpf.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-07-07 09:56:27 -07:00
Simon Horman 6217917a38 tc: m_tunnel_key: Add tunnel option support to act_tunnel_key
Allow setting tunnel options using the act_tunnel_key action.

Options are expressed as class:type:data and multiple options
may be listed using a comma delimiter.

 # ip link add name geneve0 type geneve dstport 0 external
 # tc qdisc add dev eth0 ingress
 # tc filter add dev eth0 protocol ip parent ffff: \
     flower indev eth0 \
        ip_proto udp \
        action tunnel_key \
            set src_ip 10.0.99.192 \
            dst_ip 10.0.99.193 \
            dst_port 6081 \
            id 11 \
            geneve_opts 0102:80:00800022,0102:80:00800022 \
    action mirred egress redirect dev geneve0
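A sketch of parsing the class:type:data format used in geneve_opts above,
assuming all three fields are hexadecimal (as in 0102:80:00800022) and that
data is a whole number of bytes. The struct, the 32-byte cap and the function
names are hypothetical; this is not the act_tunnel_key parser in tc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed option; the 32-byte data cap is arbitrary. */
struct geneve_opt_sketch {
    uint16_t opt_class;
    uint8_t type;
    uint8_t data[32];
    size_t data_len;
};

static int hexval(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Parse one "class:type:data" token; every field is hexadecimal. */
static bool parse_geneve_opt(const char *s, struct geneve_opt_sketch *o)
{
    char *end;
    unsigned long v;
    size_t i, n;

    v = strtoul(s, &end, 16);
    if (end == s || *end != ':' || v > 0xffff)
        return false;
    o->opt_class = (uint16_t)v;

    s = end + 1;
    v = strtoul(s, &end, 16);
    if (end == s || *end != ':' || v > 0xff)
        return false;
    o->type = (uint8_t)v;

    s = end + 1;
    n = strlen(s);
    if (n == 0 || n % 2 != 0 || n / 2 > sizeof(o->data))
        return false;
    for (i = 0; i < n / 2; i++) {
        int hi = hexval(s[2 * i]), lo = hexval(s[2 * i + 1]);

        if (hi < 0 || lo < 0)
            return false;
        o->data[i] = (uint8_t)(hi << 4 | lo);
    }
    o->data_len = n / 2;
    return true;
}

/* Split a comma-separated list; returns the number of options parsed,
 * or 0 on any error. */
static size_t parse_geneve_opts(const char *list,
                                struct geneve_opt_sketch *out, size_t max)
{
    char buf[256];
    char *tok;
    size_t count = 0;

    assert(list != NULL && out != NULL);
    if (strlen(list) >= sizeof(buf))
        return 0;
    strcpy(buf, list);
    for (tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ",")) {
        if (count == max || !parse_geneve_opt(tok, &out[count]))
            return 0;
        count++;
    }
    return count;
}
```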

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-06 09:10:05 -07:00
Moshe Shemesh 13925ae9eb devlink: Add param command support
Add support for setting and showing configuration parameters.
Each parameter can be either generic or driver-specific.
The user can retrieve data on these configuration parameters with the
devlink dev param show command and can set a new value for a
configuration parameter with the devlink dev param set command.
The configuration parameters can be set in different configuration
modes:
  runtime - set while driver is running, no reset required.
  driverinit - applied while driver initializes, requires a driver
               restart via the devlink reload command.
  permanent - written to device's non-volatile memory, a hard reset
              is required to apply.

New commands added:
  devlink dev param show [DEV name PARAMETER]
  devlink dev param set DEV name PARAMETER value VALUE
			    cmode { permanent | driverinit | runtime }

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-06 08:43:28 -07:00
David Ahern 22ddbd8204 Update kernel headers
Update kernel headers to commit
ab8565af68001 ("Merge branch 'IP-listification-follow-ups'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-06 08:42:22 -07:00
Nikolay Aleksandrov 05001bcfab bridge: add support for isolated option
This patch adds support for the new isolated port option which, if set,
allows isolated ports to communicate only with non-isolated
ports and the bridge device. The option can be set via the bridge or ip
link type bridge_slave commands, e.g.:
$ ip link set dev eth0 type bridge_slave isolated on
$ bridge link set dev eth0 isolated on
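The isolation rule reduces to a simple predicate between two bridge ports. A
hypothetical sketch (traffic to or from the bridge device itself is not
restricted and is not modeled here):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical check: forwarding between two ports is allowed unless
 * both of them are isolated. */
static bool isolation_allows(bool src_isolated, bool dst_isolated)
{
    return !(src_isolated && dst_isolated);
}
```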

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-06 07:58:41 -07:00
David Ahern f2bfb31bef Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-06-21 08:12:39 -07:00
Keara Leibovitz 4757a54799 tc: jsonify nat action
Add json output support for nat action

Example output:

~$ $TC actions add action nat egress 10.10.10.1 20.20.20.2 index 2
~$ $TC actions add action nat ingress 100.100.100.1/32 200.200.200.2 \
	continue index 99
~$ $TC -j actions ls action nat

[{
	"total acts": 2
}, {
	"actions": [{
		"order": 0,
		"type": "nat",
		"direction": "egress",
		"old_addr": "10.10.10.1/32",
		"new_addr": "20.20.20.2",
		"control_action": {
			"type": "pass"
		},
		"index": 2,
		"ref": 1,
		"bind": 0
	}, {
		"order": 1,
		"type": "nat",
		"direction": "ingress",
		"old_addr": "100.100.100.1/32",
		"new_addr": "200.200.200.2",
		"control_action": {
			"type": "continue"
		},
		"index": 99,
		"ref": 1,
		"bind": 0
	}]
}]

Signed-off-by: Keara Leibovitz <kleib@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-06-20 10:20:34 -07:00
Eric S. Raymond a85f921ae5 devlink.8, translate unparseable callout syntax to parseable form.
Signed-off-by: Eric S. Raymond <esr@thyrsus.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-20 09:41:41 -07:00
Vlad Buslov b133392468 tc: fix batch force option
When sending the accumulated compound command results in an error, check the
'force' option before exiting. Move the return code check after putting batch
bufs and freeing iovs to prevent a memory leak. Break from the loop instead of
returning an error code, to allow cleanup at the end of the batch function.
Don't reset the ret code on each iteration.
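The control flow described above can be sketched as follows. run_batch() is
hypothetical: each element of results stands in for the return code of sending
one accumulated compound command, and the malloc()/free() pair stands in for
the batch buffers and iovs that must be released on every path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical batch driver. Returns 0 if every send succeeded, 1 if
 * any failed, -1 if the stand-in buffers could not be allocated. */
static int run_batch(const int *results, int n, bool force, int *attempted)
{
    int ret = 0;              /* set on failure, never reset per iteration */
    char *bufs = malloc(64);  /* stands in for batch bufs and iovs */
    int i;

    assert(results != NULL || n == 0);
    if (bufs == NULL)
        return -1;
    *attempted = 0;
    for (i = 0; i < n; i++) {
        (*attempted)++;
        if (results[i] < 0) {
            ret = 1;
            if (!force)
                break;        /* leave the loop so cleanup still runs */
        }
    }
    free(bufs);               /* cleanup happens before ret is returned */
    return ret;
}
```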

Fixes: 485d0c6001 ("tc: Add batchsize feature for filter and actions")
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-20 09:32:36 -07:00
Subash Abhinov Kasiviswanathan 2ecb61a0c2 ip-xfrm: Add support for OUTPUT_MARK
This patch adds support for OUTPUT_MARK in xfrm state to exercise the
functionality added by kernel commit 077fbac405bf
("net: xfrm: support setting an output mark.").

Sample output:

(with mark and output-mark)
src 192.168.1.1 dst 192.168.1.2
        proto esp spi 0x00004321 reqid 0 mode tunnel
        replay-window 0 flag af-unspec
        mark 0x10000/0x3ffff output-mark 0x20000
        auth-trunc xcbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b211 96
        enc cbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b233
        anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000

(with mark only)
src 192.168.1.1 dst 192.168.1.2
        proto esp spi 0x00004321 reqid 0 mode tunnel
        replay-window 0 flag af-unspec
        mark 0x10000/0x3ffff
        auth-trunc xcbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b211 96
        enc cbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b233
        anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000

(with output-mark only)
src 192.168.1.1 dst 192.168.1.2
        proto esp spi 0x00004321 reqid 0 mode tunnel
        replay-window 0 flag af-unspec
        output-mark 0x20000
        auth-trunc xcbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b211 96
        enc cbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b233
        anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000

(no mark and output-mark)
src 192.168.1.1 dst 192.168.1.2
        proto esp spi 0x00004321 reqid 0 mode tunnel
        replay-window 0 flag af-unspec
        auth-trunc xcbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b211 96
        enc cbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b233
        anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000

v1->v2: Moved the XFRMA_OUTPUT_MARK print after XFRMA_MARK in
xfrm_xfrma_print() as mentioned by Lorenzo

v2->v3: Fix one help formatting error as mentioned by Lorenzo.
Keep mark and output-mark on the same line and add man page info as
mentioned by David.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-06-18 06:37:00 -07:00
Daniele Palmas 46c16a5d1e ip: add rmnet initial support
This patch adds basic support for Qualcomm rmnet devices.

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-15 11:15:14 -07:00
Patrick Talbert cad73425d8 ipaddress: strengthen check on 'label' input
As mentioned in the ip-address man page, an address label must
be equal to the device name or prefixed by the device name
followed by a colon. Currently the only check on this input is
to see if the device name appears at the beginning of the label
string.

This commit adds a check to ensure that the label either equals the
device name or continues with a colon right after it.
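A minimal sketch of the strengthened check, using a hypothetical helper name:
the label is accepted only if it equals the device name or the device name is
immediately followed by a colon.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical validator for the "label" argument of a device. */
static bool label_valid_for_dev(const char *label, const char *dev)
{
    size_t n;

    assert(label != NULL && dev != NULL);
    n = strlen(dev);
    if (strncmp(label, dev, n) != 0)
        return false;
    return label[n] == '\0' || label[n] == ':';
}
```

A plain prefix test alone would accept "eth01" for device "eth0"; the
character after the prefix is what rules that out.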

Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-15 11:14:19 -07:00
Hoang Le 5887ff0922 rdma: sync some IP headers with glibc
In commit 9a362cc71a, a kernel UAPI header chain:
  (i.e. rdma/rdma_user_cm.h -> linux/in6.h)
is included before the glibc header chain:
  (i.e. utils.h -> resolv.h -> netinet/in.h).

This leaves some IP header definitions out of sync, and compilation
fails with errors about redefinition of some IP structs.

This commit simply reorders the includes to keep them in sync.

Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-15 11:11:51 -07:00
Hoang Le a56e0db7e8 tipc: JSON support for tipc link printouts
Add json output support for tipc link command

Example output:
$tipc -j -p link list

[ {
        "broadcast-link": "up",
        "1.1.1:bridge-1.1.104:eth0": "up",
        "1.1.1:bridge-1.1.105:eth0": "up",
        "1.1.1:bridge-1.1.106:eth0": "up"
    } ]

--------------------
$tipc -j -p link stat show link broadcast-link

[ {
        "link": "broadcast-link",
        "window": 50,
        "rx packets": {
            "rx packets": 0,
            "fragments": 0,
            "fragmented": 0,
            "bundles": 0,
            "bundled": 0
        },
        "tx packets": {
            "tx packets": 0,
            "fragments": 0,
            "fragmented": 0,
            "bundles": 0,
            "bundled": 0
        },
        "rx naks": {
            "rx naks": 0,
            "defs": 0,
            "dups": 0
        },
        "tx naks": {
            "tx naks": 0,
            "acks": 0,
            "retrans": 0
        },
        "congestion link": 0,
        "send queue max": 0,
        "avg": 0
    } ]

v2:
    Replace variable 'json_flag' by 'json' declared in include/utils.h

v3:
    Update manual page

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-06-13 20:45:59 -07:00
Hoang Le 1304f50a5b tipc: JSON support for showing nametable
Add json output support for nametable show

Example output:
$tipc -j -p nametable show

[ {
        "type": 0,
        "lower": 16781313,
        "upper": 16781313,
        "scope": "zone",
        "port": 0,
        "node": ""
    },{
        "type": 0,
        "lower": 16781416,
        "upper": 16781416,
        "scope": "cluster",
        "port": 0,
        "node": ""
    } ]

v2:
    Replace variable 'json_flag' by 'json' declared in include/utils.h
    Add new parameter '-pretty' to support pretty output

v3:
    Update manual page

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-06-13 20:45:38 -07:00
Donald Sharp a313455c6c iproute2: Add support for a few routing protocols
Add support for the following routing protocols to iproute2:

BGP
ISIS
OSPF
RIP
EIGRP
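For illustration, a name-to-number lookup for these protocols, assuming the
kernel's RTPROT_BGP (186), RTPROT_ISIS (187), RTPROT_OSPF (188), RTPROT_RIP
(189) and RTPROT_EIGRP (192) values from linux/rtnetlink.h. The table and
function are hypothetical; iproute2 maps such names through its rt_protos file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* RTPROT_* values copied here so the sketch builds without kernel
 * headers. */
static const struct {
    const char *name;
    unsigned char id;
} proto_table[] = {
    { "bgp",   186 },
    { "isis",  187 },
    { "ospf",  188 },
    { "rip",   189 },
    { "eigrp", 192 },
};

/* Hypothetical lookup: returns the protocol number, or -1 if unknown. */
static int proto_from_name(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(proto_table) / sizeof(proto_table[0]); i++)
        if (strcmp(proto_table[i].name, name) == 0)
            return proto_table[i].id;
    return -1;
}
```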

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-11 11:18:30 -07:00
David Ahern ee095a417e Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-06-10 07:30:32 -07:00
Stephen Hemminger 776f1813b5 uapi: update headers from linux-net
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-08 10:27:13 -07:00
Stephen Hemminger 17678d3059 Merge ../iproute2-next 2018-06-08 10:27:04 -07:00
Stephen Hemminger 2d3dd6f6c1 v4.17.0 2018-06-08 10:11:50 -07:00
Stephen Hemminger 4be85d574e uapi: update bpf.h to include padding
Last minute upstream 4.17 change.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-08 10:09:21 -07:00
Hoang Le 313ce6949c tipc: TIPC_NLA_LINK_NAME value pass on nesting entry TIPC_NLA_LINK
In commit 94f6a80 in net-next, the TIPC_NLA_LINK_NAME attribute is
expected to be retrieved and validated via the TIPC_NLA_LINK nesting
entry in tipc_nl_node_get_link().
Accordingly, the TIPC_NLA_LINK_NAME value passed by the
tipc link get command must follow the above hierarchy.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-08 10:07:13 -07:00
Nicolas Dichtel 974ef93bf1 iplink: enable to specify a name for the link-netns
The 'link-netnsid' argument needs a number. Add 'link-netns' when the user
wants to use the iproute2 netns name instead of the nsid.

Example:
ip link add ipip1 link-netns foo type ipip remote 10.16.0.121 local 10.16.0.249

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-08 10:06:21 -07:00
Nicolas Dichtel 9580bad7b9 ip: display netns name instead of nsid
When iproute2 has a name for the nsid, let's display it. It's more
user friendly than a number.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-08 10:06:21 -07:00
Keara Leibovitz 831b5d40d9 tc: add json support in csum action
Add json output support for checksum action.

Example output:

~$ $TC actions add action csum udp continue index 7
~$ $TC actions add action csum icmp iph igmp pipe index 200 cookie 112233
~$ $TC -j actions ls action csum

[{
    "total acts":2
}, {
    "actions": [{
        "order":0,
        "csum":"udp",
        "control_action": {
            "type":"continue"
        },
        "index":7,
        "ref":1,
        "bind":0
    }, {
        "order":1,
        "csum":"iph, icmp, igmp",
        "control_action": {
            "type":"pipe"
        },
        "index":200,
        "ref":1,
        "bind":0,
        "cookie":"112233"
    }]
}]

v2:
    Don't initialize char buf[64];
    Add output example

Signed-off-by: Keara Leibovitz <kleib@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-06-05 15:30:30 -07:00
David Ahern 55b973329c Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-06-05 14:22:15 -07:00
Ivan Vecera 9e4a92e5aa devlink: don't enforce NETLINK_{CAP,EXT}_ACK sock opts
Since commit 049c58539f ("devlink: mnlg: Add support for extended ack")
devlink requires NETLINK_{CAP,EXT}_ACK. This prevents devlink from
working with older kernels that don't support these features.

host # ./devlink/devlink
Failed to connect to devlink Netlink

Fixes: 049c58539f ("devlink: mnlg: Add support for extended ack")
Cc: Arkadi Sharshevsky <arkadis@mellanox.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
2018-06-01 16:03:59 -04:00
Nicolas Dichtel eaf89d7d52 ip: IFLA_NEW_NETNSID/IFLA_NEW_IFINDEX support
Parse and display those attributes.
Example:
ip l a type dummy
ip netns add foo
ip monitor link&
ip l s dummy1 netns foo
Deleted 6: dummy1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether 66:af:3a:3f:a0:89 brd ff:ff:ff:ff:ff:ff new-nsid 0 new-ifindex 6

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-01 15:59:40 -04:00
Nathan Harold b8e7799003 iproute2: fix 'ip xfrm monitor all' command
Currently, calling 'ip xfrm monitor all' will
actually invoke the 'all-nsid' command because the
soft-match for 'all-nsid' occurs before the precise
match for 'all'. This patch rearranges the checks
so that the 'all' command, itself an alias for
invoking 'ip xfrm monitor' with no argument, can
be called consistent with the syntax for other ip
commands that accept an 'all'.
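The bug is an ordering problem with prefix matching. A minimal sketch, assuming
a hypothetical is_prefix() helper that behaves like the soft match described
above (a non-empty argument matches if it is a prefix of the keyword):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical soft match: non-empty arg is a prefix of keyword. */
static bool is_prefix(const char *arg, const char *keyword)
{
    size_t n = strlen(arg);

    return n > 0 && strncmp(arg, keyword, n) == 0;
}

enum mon_arg { MON_ALL, MON_ALL_NSID, MON_UNKNOWN };

/* Check the exact keyword first; otherwise "all" would be swallowed by
 * the prefix match for "all-nsid". */
static enum mon_arg classify(const char *arg)
{
    assert(arg != NULL);
    if (strcmp(arg, "all") == 0)
        return MON_ALL;
    if (is_prefix(arg, "all-nsid"))
        return MON_ALL_NSID;
    return MON_UNKNOWN;
}
```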

Signed-off-by: Nathan Harold <nharold@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-01 15:57:26 -04:00
David Ahern 4c3493689e iplink_vrf: Save device index from response for return code
A recent commit changed rtnl_talk_* to return the response message in
allocated memory, which callers need to free. The change to name_is_vrf
did not save the device index, which was read from a struct inside the
now allocated and freed memory, resulting in garbage being returned
in some cases.

Fix by using a stack variable to save the return value and only set
it to ifi->ifi_index after all checks are done and before the answer
buffer is freed.
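A minimal sketch of the fixed pattern, with hypothetical names: fake_talk()
stands in for an rtnl_talk_* call that returns a heap-allocated answer, and the
index is copied into a stack variable before the answer is freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the answer message (cf. struct ifinfomsg). */
struct ifi_sketch {
    int ifi_index;
};

/* Stand-in for an rtnl_talk_* call that heap-allocates the answer. */
static struct ifi_sketch *fake_talk(int ifindex)
{
    struct ifi_sketch *ans = malloc(sizeof(*ans));

    if (ans != NULL)
        ans->ifi_index = ifindex;
    return ans;
}

/* Returns the index if the (stubbed) VRF check passed, else 0. The
 * value is copied to a stack variable before free(), so the result
 * never depends on the released buffer. */
static int name_is_vrf_sketch(int ifindex, bool is_vrf)
{
    struct ifi_sketch *answer = fake_talk(ifindex);
    int ret = 0;

    if (answer == NULL)
        return 0;
    if (is_vrf)
        ret = answer->ifi_index;
    free(answer);
    return ret;
}
```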

Fixes: 86bf43c7c2 ("lib/libnetlink: update rtnl_talk to support malloc buff at run time")
Cc: Hangbin Liu <liuhangbin@gmail.com>
Cc: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-01 15:45:09 -04:00
Stephen Hemminger 2884af6d37 rt_protos: drop old experimental gated names
No longer need these petroglyph values.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-06-01 15:44:52 -04:00
Roopa Prabhu 804c7fff76 iproute: ip route get support for sport, dport and ipproto match
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-06-01 08:19:30 -07:00
David Ahern 9107c425ac ip route: print RTA_CACHEINFO if it exists
RTA_CACHEINFO can be sent for non-cloned routes. If the attribute is
present print it. Allows route dumps to print expires times for example
which can exist on FIB entries.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-06-01 08:18:31 -07:00
David Ahern 45c0dd7286 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-06-01 08:17:23 -07:00
David Ahern 78d04c7b27 ipaddress: Add support for address metric
Add support for IFA_RT_PRIORITY using the same keywords as iproute for
RTA_PRIORITY.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-30 08:20:04 -07:00
David Ahern 57ac202c78 Update kernel headers
Update kernel headers to commit
ae40832e53c3 ("bpfilter: fix a build err")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-30 08:06:19 -07:00
Stephen Hemminger 65083b5fe3 ip: defer lookup interface index
The ip command would always look up the network device index
even when not necessary. This slows down operations like creating
lots of VLANs.

David reported the original issue; this is an alternative patch
that solves it in a slightly more general way.

Using iproute2 to create a bridge and add 4094 vlans to it can take from
2 to 3 *minutes*. The reason is the extraneous call to ll_name_to_index.
ll_name_to_index results in an ioctl(SIOCGIFINDEX) call which in turn
invokes dev_load. If the index does not exist, which it won't when
creating a new link, dev_load calls modprobe twice -- once for
netdev-NAME and again for NAME. This is unnecessary overhead for each
link create.

When ip link is invoked for a new device, there is no reason to
call ll_name_to_index for the new device. With this patch, creating
a bridge and adding 4094 vlans takes less than 3 *seconds*.
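Deferring the lookup amounts to resolving the index only on first use. A minimal sketch, assuming a hypothetical `struct lazy_dev` that caches the index and a resolver callback standing in for ll_name_to_index(); the demo resolver counts its calls so the deferral is observable.

```c
#include <stddef.h>

/* Hypothetical resolver type standing in for ll_name_to_index(). */
typedef unsigned int (*resolve_fn)(const char *name);

struct lazy_dev {
	const char *name;
	unsigned int ifindex;	/* 0 = not resolved yet */
	resolve_fn resolve;
};

/* Resolve on first use only; a path that creates a new link and
 * never asks for the index never triggers the lookup. */
unsigned int lazy_dev_index(struct lazy_dev *dev)
{
	if (dev->ifindex == 0 && dev->resolve)
		dev->ifindex = dev->resolve(dev->name);
	return dev->ifindex;
}

/* Demo resolver that counts how often it is called. */
static int resolve_calls;
static unsigned int demo_resolve(const char *name)
{
	(void)name;
	resolve_calls++;
	return 42;
}
```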

	old:
	# time ip -batch ip-vlan.batch
	real    3m13.727s
	user    0m0.076s
	sys     0m1.959s

	new:
	# time ip -batch ip-vlan.batch
	real    0m3.222s
	user    0m0.044s
	sys     0m1.777s

Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-25 07:48:40 -07:00
David Ahern 39d16a02d9 ip route: Print expires as signed int
rta_expires is a signed int; print it as one.

Fixes: 663c3cb231 ("iproute: implement JSON and color output")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-23 15:29:56 -07:00
Pavel Maltsev e2f5ceccda Allow to configure /var/run/netns directory
Currently NETNS_RUN_DIR is hardcoded to /var/run/netns.
However, some systems (e.g. Android) don't have /var,
so attempts to create network namespaces on these
systems fail. This change makes NETNS_RUN_DIR configurable at build time
by allowing an environment variable to be passed to the make command.
This change also makes the /etc/netns directory configurable through the
NETNS_ETC_DIR environment variable.

For example: ./configure && NETNS_RUN_DIR=/mnt/vendor/netns make

Tested: verified that iproute2 with configuration mentioned above
creates namespaces in /mnt/vendor/netns

Signed-off-by: Pavel Maltsev <pavelm@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-23 15:16:53 -07:00
Jiri Pirko 852ed60528 devlink: introduce support for showing port flavours
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-23 12:58:55 -07:00
David Ahern c2a569e63c Update kernel headers
Update kernel headers to commit
e89e59c08d1b ("Merge branch 'net-sfp-small-improvements'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-23 12:58:34 -07:00
David Ahern 24bd5ac6b8 Merge branch 'rdma-resource-tracking' into iproute2-next
Steve Wise  says:

====================

This series enhances the iproute2 rdma tool to include displaying
driver-specific resource attributes.  It is the user-space part of the
kernel driver resource tracking series that has been accepted for merging
into linux-4.18 [1]

If there are no additional review comments, it can now be merged, I think.

Changes since v2:
- resync rdma_netlink.h to fix uapi break

Changes since v1:
- commit log editorial fixes
- cite kernel commits that updated rdma_netlink.h in the
  iproute2 commit syncing this header
- reorder stack definitions à la "reverse Christmas tree"
- correctly handle unknown driver attributes when printing

Changes since v0/rfc:
- changed "provider" to "driver" based on kernel side changes
- updated man pages
- removed "RFC" tag

Thanks,

Steve.

[1] https://www.spinics.net/lists/linux-rdma/msg64199.html

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-18 09:21:42 -07:00
Steve Wise 853d222d78 rdma: update man pages
Update the man pages for the resource attributes as well
as the driver-specific attributes.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-18 09:20:01 -07:00
Steve Wise 331152752a rdma: print driver resource attributes
This enhancement allows printing rdma device-specific state, if provided
by the kernel. This is done in a generic manner, so rdma tool doesn't
need to know about the details of every type of rdma device.

Driver attributes for an rdma resource are in the form of <key,
[print_type], value> tuples, where the key is a string and the value can
be any supported driver attribute. The print_type attribute, if present,
provides a print format to use instead of the standard print format for the type.
For example, the default print type for a PROVIDER_S32 value is "%d ",
but "0x%x " if the print_type of PRINT_TYPE_HEX is included in the tuple.
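The print_type override can be sketched as a small formatter. Only PRINT_TYPE_HEX and the two format strings come from the text above; `enum print_type`, PRINT_TYPE_DEFAULT and `format_s32_attr()` are hypothetical names for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical enum; PRINT_TYPE_HEX is named in the commit text. */
enum print_type { PRINT_TYPE_DEFAULT, PRINT_TYPE_HEX };

/* Format an s32 driver attribute: "%d " by default,
 * "0x%x " when the tuple carries PRINT_TYPE_HEX. */
int format_s32_attr(char *buf, size_t len, int32_t val, enum print_type pt)
{
	if (pt == PRINT_TYPE_HEX)
		return snprintf(buf, len, "0x%x ", (unsigned int)val);
	return snprintf(buf, len, "%d ", (int)val);
}
```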

Driver resources are only printed when the -dd flag is present.
If -p is present, then the output is formatted to not exceed 80 columns,
otherwise it is printed as a single row to be grep/awk friendly.

Example output:

# rdma resource show qp lqpn 1028 -dd -p
link cxgb4_0/- lqpn 1028 rqpn 0 type RC state RTS rq-psn 0 sq-psn 0 path-mig-state MIGRATED pid 0 comm [nvme_rdma]
    sqid 1028 flushed 0 memsize 123968 cidx 85 pidx 85 wq_pidx 106 flush_cidx 85 in_use 0
    size 386 flags 0x0 rqid 1029 memsize 16768 cidx 43 pidx 41 wq_pidx 171 msn 44 rqt_hwaddr 0x2a8a5d00
    rqt_size 256 in_use 128 size 130

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-18 09:17:16 -07:00
Steve Wise 366d20b91f rdma: update rdma_netlink.h to get new driver attributes
Pull in the rdma_netlink.h changes from kernel
commits:

25a0ad85156a ("RDMA/nldev: Add explicit pad attribute")
da5c85078215 ("RDMA/nldev: add driver-specific resource tracking")
0d52d803767e ("RDMA/uapi: Fix uapi breakage")

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-18 09:16:22 -07:00
Jon Maloy 5947046dd9 tipc: fixed node and name table listings
We make it easier for users to correlate between 128-bit node
identities and 32-bit node hash number by extending the 'node list'
command to also show the hash number.

We also improve the 'nametable show' command to show the node identity
instead of the node hash number. Since the former potentially is much
longer than the latter, we make room for it by eliminating the (to the
user) irrelevant publication key. We also reorder some of the columns so
that the node id comes last, since this looks nicer and is more logical.

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-18 09:12:24 -07:00
Roman Mashak 53d34eb66c tc: add missing space symbol in ife output
In order to make TDC tests match the output patterns, the missing space
character must be added in the mode output string.

Fixes: 8744c5d338 ("tc: jsonify ife action")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-18 09:10:48 -07:00
Marcelo Ricardo Leitner ac6a4c2299 tc: flower: add support for verbose logging
Currently there is no way to log offloading errors if the rule is not
explicitly marked as skip_sw, making it hard for other applications such
as Open vSwitch to log why a given rule could not be offloaded.

This patch adds support for signaling the kernel that more verbose
logging is wanted, which now will include such messages.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-18 09:06:04 -07:00
David Ahern 4276e65290 Update kernel headers
Update kernel headers to commit
64a2658b58ab ("net: mscc: Add SPDX identifier")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-18 09:05:07 -07:00
Stephen Hemminger 405e0c4ffe tc: allow 0% for percent options
Allowing 0% is sometimes useful, for example for netem loss and drop,
or perhaps for dropping all traffic in an HTB bin.
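A minimal sketch of a percent parser that accepts the closed range 0% to 100%. `parse_percent_sketch()` is hypothetical and is not tc's actual parser; it converts the value to a fraction and rejects anything without a trailing '%'.

```c
#include <stdlib.h>

/* Hypothetical parser: "0%".."100%" -> 0.0..1.0; returns 0 on success. */
int parse_percent_sketch(double *val, const char *str)
{
	char *end;
	double v = strtod(str, &end);

	/* Require digits followed by exactly one trailing '%'. */
	if (end == str || end[0] != '%' || end[1] != '\0')
		return -1;
	/* 0% is accepted; only out-of-range values are rejected. */
	if (v < 0.0 || v > 100.0)
		return -1;
	*val = v / 100.0;
	return 0;
}
```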

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199745
Reported-by: stuartmarsden@gmail.com
Fixes: 927e3cfb52 ("tc: B.W limits can now be specified in %.")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-17 16:20:50 -07:00
Marcelo Ricardo Leitner 4f59f4a5af tc-netem: fix limit description in man page
As the kernel code says, limit is actually the number of packets it can
hold queued at a time, as per:

static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
                         struct sk_buff **to_free)
{
	...
        if (unlikely(sch->q.qlen >= sch->limit))
                return qdisc_drop_all(skb, sch, to_free);

So let's fix the description of the field in the man page.
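The check quoted above can be modeled in a few lines of C to show that packet size plays no part in it. `struct pkt_queue` and `enqueue_pkt()` are hypothetical; only the `qlen >= limit` comparison mirrors the kernel snippet.

```c
/* Hypothetical queue model: 'limit' caps queued packets, not bytes. */
struct pkt_queue {
	unsigned int qlen;	/* packets currently queued */
	unsigned int limit;	/* max packets, like netem's sch->limit */
};

/* Returns 0 if enqueued, -1 if dropped (mirrors qlen >= limit). */
int enqueue_pkt(struct pkt_queue *q, unsigned int pkt_bytes)
{
	(void)pkt_bytes;	/* the size of the packet is irrelevant */
	if (q->qlen >= q->limit)
		return -1;
	q->qlen++;
	return 0;
}
```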

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-16 15:36:48 -07:00
Marcelo Ricardo Leitner 3f2c23811d tc-netem: fix limit description in man page
As the kernel code says, limit is actually the number of packets it can
hold queued at a time, as per:

static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
                         struct sk_buff **to_free)
{
	...
        if (unlikely(sch->q.qlen >= sch->limit))
                return qdisc_drop_all(skb, sch, to_free);

So let's fix the description of the field in the man page.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-16 14:16:29 -07:00
David Ahern 961d0991bc Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-16 14:10:27 -07:00
Luca Boccassi 9b13cc98f5 ip: do not drop capabilities if net_admin=i is set
Users have reported a regression due to ip now dropping capabilities
unconditionally.
zerotier-one VPN and VirtualBox use ambient capabilities in their
binary and then fork out to ip to set routes and links, and this
does not work anymore.

As a workaround, do not drop caps if CAP_NET_ADMIN (the most common
capability used by ip) is set with the INHERITABLE flag.
Users that want ip vrf exec to work do not need to set INHERITABLE,
which will then only be set when the calling program has the privileges
to give itself the ambient capability.

Fixes: ba2fc55b99 ("Drop capabilities if not running ip exec vrf with libcap")

Signed-off-by: Luca Boccassi <bluca@debian.org>
2018-05-14 21:07:34 -07:00
David Ahern 4aba7dc4f4 Merge branch 'iproute2-master' into iproute2-next
Conflicts:
	rdma/include/uapi/rdma/rdma_netlink.h

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-09 21:04:16 -07:00
GhantaKrishnamurthy MohanKrishna 7d40bdbc8d tipc: Add support to set and get MTU for UDP bearer
In this commit we introduce the ability to set and get
MTU for UDP media and bearer.

To set and get properties such as tolerance, window and priority,
we already use:

    $ tipc media set PROPERTY media MEDIA
    $ tipc media get PROPERTY media MEDIA

    $ tipc bearer set OPTION media MEDIA ARGS
    $ tipc bearer get [OPTION] media MEDIA ARGS

The same syntax has been extended to MTU, with the exception that
only the UDP media type is supported.

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-09 20:53:32 -07:00
David Ahern fd95ec0e8e Update kernel headers
Update kernel headers to commit 53a7bdfb2a27
("dt-bindings: dsa: Remove unnecessary #address/#size-cells")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-09 20:52:52 -07:00
Stephen Hemminger 10f687736b ss: remove non-functional slabinfo
ss was using slabinfo to try to intuit TCP statistics.
The slabinfo format has changed several times since 2.4, and all these
statistics are broken by renames and slab merging. Plus, slabinfo does
not exist at all if the kernel is compiled with the SLUB option.

Rather than trying to fix the kernel, just trim away the no longer
valid statistics.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-09 13:57:08 -07:00
Stephen Hemminger 9b2ab68516 rdma: add ib header files
The iproute2 header files must be complete to allow builds on
other places where some of the headers are not present.

For example, iproute2 is built on Windows Subsystem for Linux
as a test tool. The partial addition of rdma broke that build.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-09 08:14:55 -07:00
Stephen Hemminger 36eece51e3 rdma: align headers with upstream
This makes rdma/include/uapi/rdma headers align with those produced
by doing make headers_install from upstream (Linus) tree.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-09 08:12:13 -07:00
Jakub Kicinski 0c0394ff83 bpf: don't offload perf array maps
Perf arrays are handled specially by the kernel; don't request
offload for them even when they are used by an offloaded program.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-05 11:08:00 -07:00
David Ahern 7732148d1d Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-05-05 11:07:47 -07:00
Ido Schimmel c0ec7c9f87 iproute: Parse last nexthop in a multipath route
Continue parsing a multipath payload as long as another nexthop can fit
in the payload.
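The loop condition can be sketched with a simplified record whose layout is modeled on struct rtnexthop. `struct nh_rec` and `count_nexthops()` are hypothetical. With `>=`, the final record is visited when it exactly fills the remaining payload; a strict `>` would stop one record early.

```c
#include <stddef.h>

/* Simplified nexthop record, layout modeled on struct rtnexthop. */
struct nh_rec {
	unsigned short len;	/* length of this record incl. header */
	unsigned char flags;
	unsigned char hops;
	int ifindex;
};

/* Count nexthops while another full record can fit in 'remaining'. */
int count_nexthops(const struct nh_rec *nh, int remaining)
{
	int n = 0;

	while (remaining >= (int)sizeof(*nh)) {
		if (nh->len < sizeof(*nh) || nh->len > remaining)
			break;	/* malformed record, stop parsing */
		n++;
		remaining -= nh->len;
		nh = (const struct nh_rec *)((const char *)nh + nh->len);
	}
	return n;
}
```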

# ip route add 192.0.2.0/24 nexthop dev dummy0 nexthop dev dummy1

Before:
# ip route show 192.0.2.0/24
192.0.2.0/24
        nexthop dev dummy0 weight 1

After:
# ip route show 192.0.2.0/24
192.0.2.0/24
        nexthop dev dummy0 weight 1
        nexthop dev dummy1 weight 1

Fixes: f48e14880a ("iproute: refactor multipath print")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-01 19:29:44 -07:00
Baruch Siach 37bf5c6fcb arpd: remove pthread dependency
Explicit link with pthread is not needed when linking dynamically. Even
static link with recent libdb does not pull in the code that uses
pthread. Finally, the configure check introduced in commit a25df4887d
(configure: Check for Berkeley DB for arpd compilation) does not add
-lpthread to its link command.

This change allows arpd to build with toolchains that do not provide
threads support.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-01 19:29:03 -07:00
Baruch Siach 3b07981a27 README: update libdb build dependency information
Debian has not distributed libdb4.x-dev for quite some time now. The current
stable release carries libdb5.3-dev. Update the wording accordingly.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-01 19:29:03 -07:00
Toke Høiland-Jørgensen 4db2ff0db4 json_print: Fix hidden 64-bit type promotion
print_uint() will silently promote its variable type to uint64_t, but there
is nothing that ensures that the format string specifier passed along with
it fits (and the function name suggests passing "%u").

Fix this by changing print_uint() to use a native 'unsigned int' type, and
introduce a separate print_u64() function for printing 64-bit values. All
call sites that were actually printing 64-bit values using print_uint() are
converted to use print_u64() instead.

Since print_int() was already using native int types, just add a
print_s64() to match, but don't convert any call sites. For symmetry,
also add a print_luint() method (with no users).
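Why the split matters: passing a uint64_t to printf() where the format says "%u" is undefined behavior in C, because the variadic argument type must match the conversion. A minimal sketch of type-matched helpers follows; `fmt_uint()` and `fmt_u64()` are hypothetical, not the json_print API.

```c
#include <inttypes.h>
#include <stdio.h>

/* Each helper pairs its argument type with a matching specifier. */
int fmt_uint(char *buf, size_t len, unsigned int v)
{
	return snprintf(buf, len, "%u", v);
}

int fmt_u64(char *buf, size_t len, uint64_t v)
{
	/* PRIu64 expands to the correct length modifier for uint64_t. */
	return snprintf(buf, len, "%" PRIu64, v);
}
```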

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-25 11:08:55 -07:00
Toke Høiland-Jørgensen bf717756b5 ingress: Don't break JSON output
The dash printed by the ingress qdisc breaks JSON output, so only print it
in regular output mode.

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-25 11:08:39 -07:00
Hangbin Liu 8f01001abc vxlan: add ttl auto in help message
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-23 19:43:46 -07:00
Sabrina Dubroca 7f520601f5 gre/gre6: allow clearing {,i,o}{key,seq,csum} flags
Currently, iproute allows setting those flags, but it's impossible to
clear them, since their current value is fetched from the kernel and
then we OR in the additional flags passed on the command line.

Add no* variants to allow clearing them.
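Because the starting value comes from the kernel, setting a keyword can only OR bits in, and a no* keyword has to mask them out. A minimal sketch with hypothetical flag constants and keyword table: the real GRE_KEY/GRE_SEQ/GRE_CSUM values live in <linux/if_tunnel.h>, and the real parser also handles the i*/o* variants.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits, not the real big-endian GRE_* values. */
#define SK_KEY  0x1u
#define SK_SEQ  0x2u
#define SK_CSUM 0x4u

/* Apply one keyword to flags fetched from the kernel: the plain
 * keyword ORs the bit in, the "no" variant clears it.
 * Returns 0 on success, -1 for an unknown keyword. */
int apply_flag_keyword(unsigned int *flags, const char *kw)
{
	static const struct { const char *name; unsigned int bit; } tbl[] = {
		{ "key", SK_KEY }, { "seq", SK_SEQ }, { "csum", SK_CSUM },
	};
	size_t i;

	for (i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++) {
		if (strcmp(kw, tbl[i].name) == 0) {
			*flags |= tbl[i].bit;
			return 0;
		}
		if (strncmp(kw, "no", 2) == 0 && strcmp(kw + 2, tbl[i].name) == 0) {
			*flags &= ~tbl[i].bit;
			return 0;
		}
	}
	return -1;
}
```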

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-23 19:42:58 -07:00
Sabrina Dubroca d21c028cf7 man: ip link: document GRE tunnels
GRE tunnels are currently only documented together with IPIP and SIT
tunnels, but they actually have very different configuration
options. Let's separate them.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-23 19:42:44 -07:00
David Ahern 0d93d1e736 Merge branch 'master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-23 19:42:21 -07:00
Jakub Kicinski f5393225f9 iplink_geneve: correct size of message to avoid spurious errors
Commit 6c4b672738 ("iplink_geneve: Get rid of inet_get_addr()")
inadvertently changed the parameter to addattr_l() resulting in:

addattr_l ERROR: message exceeded bound of 4

when remote is specified.

Fixes: 6c4b672738 ("iplink_geneve: Get rid of inet_get_addr()")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
2018-04-20 10:39:53 -07:00
Stephen Hemminger 260a92afe6 bpf: fix warnings on gcc-8 about string truncation
In theory, the path for BPF could exceed the 4K PATH_MAX.
In practice, this is not really possible, but silence gcc anyway.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-20 10:38:00 -07:00
Roman Mashak 0aaf62fcb6 tc: return on invalid smac or dmac in ife action
Return on invalid smac/dmac and use invarg consistently for invalid
arguments report.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2018-04-20 10:35:21 -07:00
Stephen Hemminger 0b01f088ee flower: use 16 bit format where possible
Use print_hu rather than print_uint for 16-bit values.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-20 10:35:00 -07:00
Stephen Hemminger 7cd3f08b6f ipneigh: fix missing format specifier
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-20 10:33:22 -07:00
Hangbin Liu 5b1c363c7b vxlan: fix ttl inherit behavior
As in kernel net-next commit 72f6d71e491e6 ("vxlan: add ttl inherit support"),
vxlan "ttl inherit" should mean inheriting the inner protocol's ttl value.

But currently, when we add a vxlan device with "ttl inherit", we only set
ttl 0, which actually uses the default value instead of inheriting the
inner protocol's ttl value.

To distinguish "ttl inherit" from ttl == 0, we add an attribute
IFLA_VXLAN_TTL_INHERIT when "ttl inherit" is specified, and use "ttl auto"
to mean "use the default value", the same behavior as ttl == 0.
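The three argument forms can be sketched as a small parser. `struct ttl_opt` and `parse_vxlan_ttl()` are hypothetical. The `inherit` field only marks where IFLA_VXLAN_TTL_INHERIT would be added, and no netlink message is built here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of the vxlan "ttl" argument. */
struct ttl_opt {
	int inherit;		/* 1 -> request IFLA_VXLAN_TTL_INHERIT */
	unsigned char ttl;	/* 0 -> kernel default ("auto") */
};

/* Returns 0 on success, -1 on an invalid argument. */
int parse_vxlan_ttl(struct ttl_opt *o, const char *arg)
{
	char *end;
	unsigned long v;

	o->inherit = 0;
	o->ttl = 0;
	if (strcmp(arg, "inherit") == 0) {
		o->inherit = 1;	/* copy the inner packet's TTL */
		return 0;
	}
	if (strcmp(arg, "auto") == 0)
		return 0;	/* same as ttl 0: use the default */
	v = strtoul(arg, &end, 10);
	if (end == arg || *end != '\0' || v > 255)
		return -1;
	o->ttl = (unsigned char)v;
	return 0;
}
```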

Reported-by: Jianlin Shi <jishi@redhat.com>
Suggested-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-19 11:11:27 -07:00
David Ahern 075bf62a70 Update kernel headers
Update kernel headers to commit 292eba02dbb4
("net-next/hinic: add arm64 support")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-19 11:10:27 -07:00
David Ahern d42c7891d2 utils: Do not reset family for default, any, all addresses
Thomas reported a change in behavior with respect to autodetecting
address families. Specifically, the 'ip ro add default via fe80::1'
syntax was failing to treat fe80::1 as an IPv6 address as it did in
prior releases. The root cause appears to be a change in family when
the default keyword is parsed.

'default', 'any' and 'all' are relevant outside of AF_INET. Leave the
family arg as is for these when setting addr.
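A minimal sketch of the behavior, assuming a hypothetical `struct addr_sketch` and `parse_special_addr()`. The keywords keep whatever family the caller passed, so AF_UNSPEC stays unresolved and a later token such as fe80::1 can still select AF_INET6.

```c
#include <string.h>
#include <sys/socket.h>	/* AF_UNSPEC, AF_INET6 */

/* Hypothetical simplified address holder. */
struct addr_sketch {
	int family;
	int is_any;
};

/* Returns 1 if 'name' is one of the special keywords, 0 otherwise.
 * The caller's family is kept instead of being forced to AF_INET. */
int parse_special_addr(struct addr_sketch *a, const char *name, int family)
{
	if (strcmp(name, "default") != 0 && strcmp(name, "any") != 0 &&
	    strcmp(name, "all") != 0)
		return 0;
	a->family = family;	/* e.g. AF_UNSPEC stays AF_UNSPEC */
	a->is_any = 1;
	return 1;
}
```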

Fixes: 93fa12418d ("utils: Always specify family and ->bytelen in get_prefix_1()")
Reported-by: Thomas Deutschmann <whissi@gentoo.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Serhey Popovych <serhe.popovych@gmail.com>
2018-04-16 17:00:48 -07:00
Jakub Sitnicki ee53b42fd8 iproute: Abort if nexthop cannot be parsed
An attempt to add a multipath route where a nexthop definition refers to a
non-existent device causes 'ip' to crash and burn due to a stack buffer
overflow:

  # ip -6 route add fd00::1/64 nexthop dev fake1
  Cannot find device "fake1"
  Cannot find device "fake1"
  Cannot find device "fake1"
  ...
  Segmentation fault (core dumped)

Don't ignore errors from the helper routine that parses the nexthop
definition, and abort immediately if parsing fails.
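The error-propagation part of the fix can be sketched as follows. The device table, `sketch_dev_index()`, `parse_one_nh_sketch()` and `parse_nexthops_sketch()` are all hypothetical, and the actual overflow mechanism in iproute2's multipath parser is not reproduced. The sketch only shows stopping at the first failed nexthop instead of carrying on.

```c
#include <string.h>

/* Hypothetical device table: only "dummy0" exists (ifindex 2). */
static int sketch_dev_index(const char *name)
{
	return strcmp(name, "dummy0") == 0 ? 2 : 0;	/* 0 = not found */
}

/* Parse one "nexthop dev NAME" clause; negative return on failure. */
static int parse_one_nh_sketch(const char *dev, int *ifindex)
{
	int idx = sketch_dev_index(dev);

	if (idx == 0)
		return -1;
	*ifindex = idx;
	return 0;
}

/* Parse all nexthops into out[0..max-1], aborting on the first error. */
int parse_nexthops_sketch(const char **devs, int n, int *out, int max)
{
	int i;

	if (n > max)
		return -1;
	for (i = 0; i < n; i++) {
		if (parse_one_nh_sketch(devs[i], &out[i]) < 0)
			return -1;	/* propagate instead of ignoring */
	}
	return n;
}
```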

Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
2018-04-16 16:58:38 -07:00
Roman Mashak 8744c5d338 tc: jsonify ife action
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-15 17:23:17 -07:00
Roman Mashak 7b17701717 tc: jsonify skbedit action
v2:
   Fixed strings format in print_string()

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-15 17:09:16 -07:00
Stephen Hemminger 811ee8943c uapi/sctp: update header from 4.17-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-10 10:50:00 -07:00
Stephen Hemminger b7d3a4f009 uapi/tipc: update header from 4.17-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-10 10:49:41 -07:00
Stephen Hemminger dcf7997bcd uapi/bpf: update kernel header from 4.17-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-10 10:48:56 -07:00
Guillaume Nault ef36717816 bridge: fix typo in hairpin error message
No 'g' to hairpin.

Fixes: 64108901b7 ("bridge: Add support for setting bridge port attributes")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-09 11:17:50 -07:00
Roman Mashak 8feb516bfc tc: jsonify tunnel_key action
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-08 10:52:33 -07:00
Roman Mashak 1d3c91a7c4 tc: jsonify connmark action
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-08 10:52:32 -07:00
Leon Romanovsky 1525942736 rdma: Print net device name and index for RDMA device
RDMA devices operating in RoCE and iWARP modes have a net device
underneath. Present their names in regular output and their net index
in detailed mode.

[root@nps ~]# rdma link show mlx5_3/1
4/1: mlx5_3/1: state ACTIVE physical_state LINK_UP netdev ens7
[root@nps ~]# rdma link show mlx5_3/1 -d
4/1: mlx5_3/1: state ACTIVE physical_state LINK_UP netdev ens7 netdev_index 7
    caps: <CM, IP_BASED_GIDS>

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-06 09:02:32 -07:00
David Ahern 09e3802b9a Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-06 09:02:02 -07:00
Guillaume Nault 458539ad35 l2tp: no need to export session offsets in JSON output
The offset and peer_offset parameters are only printed to avoid
confusing external scripts that may parse "ip l2tp show session"
output. There's no reason to keep them in JSON.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
2018-04-05 12:43:23 -07:00
Yuval Mintz 0927bf83e7 tc: Correct json output for actions
Commit 9fd3f0b255 ("tc: enable json output for actions") added JSON
support for tc-actions at the expense of breaking other use cases that
reach tc_print_action(), as the latter don't expect the 'actions' array
to be a new object.

Consider the following, taken during a run of the tc_chain.sh selftest,
and note that the output of the latter command is broken:

$ ./tc/tc -j -p actions list action gact | grep -C 3 actions
[ {
        "total acts": 1
    },{
        "actions": [ {
                "order": 0,

$ ./tc/tc -p -j -s filter show dev enp3s0np2 ingress | grep -C 3 actions
            },
            "skip_hw": true,
            "not_in_hw": true,{
                "actions": [ {
                        "order": 1,
                        "kind": "gact",
                        "control_action": {

Relocate the open/close of the JSON object to declare the object only
for the case that needs it.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Tested-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-04 16:41:36 -07:00
Guillaume Nault 2f75c5cf1a ip/l2tp: remove offset and peer-offset options
Ignore options "peer-offset" and "offset" when creating sessions. Keep
them when dumping sessions in order to avoid breaking external scripts.

"peer-offset" has always been a noop in iproute2. "offset" is now
ignored in Linux 4.16 (and was broken before that).

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-04 16:41:11 -07:00
Leon Romanovsky fda0a61dde rdma: Ignore unknown netlink attributes
The check for whether the netlink attributes supplied exceed the maximum
supported is too strict and may lead to backward compatibility issues when
an old application runs on a newer kernel that supports new attributes.
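The forward-compatible approach can be sketched with a simplified TLV format in the spirit of struct nlattr. It ignores netlink's 4-byte attribute alignment, and `struct tlv`, SKETCH_ATTR_MAX and `parse_attrs_sketch()` are hypothetical. Types above the known maximum are skipped, while malformed lengths are still rejected.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical TLV header, similar in spirit to struct nlattr. */
struct tlv {
	uint16_t len;	/* header + payload, in bytes */
	uint16_t type;
};

#define SKETCH_ATTR_MAX 3

/* Index attributes by type into tb[0..SKETCH_ATTR_MAX]. Types newer
 * than this tool knows are skipped rather than treated as errors. */
int parse_attrs_sketch(const unsigned char *buf, size_t len,
		       const struct tlv *tb[SKETCH_ATTR_MAX + 1])
{
	size_t off = 0;

	memset(tb, 0, sizeof(tb[0]) * (SKETCH_ATTR_MAX + 1));
	while (len - off >= sizeof(struct tlv)) {
		const struct tlv *a = (const struct tlv *)(buf + off);

		if (a->len < sizeof(*a) || a->len > len - off)
			return -1;	/* malformed length is still an error */
		if (a->type <= SKETCH_ATTR_MAX)
			tb[a->type] = a;
		/* else: unknown type from a newer kernel, ignore it */
		off += a->len;
	}
	return 0;
}
```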

CC: Steve Wise <swise@opengridcomputing.com>
Fixes: 74bd75c2b6 ("rdma: Add basic infrastructure for RDMA tool")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-04 16:39:58 -07:00
David Ahern 2c62a64d60 Merge branch 'iproute2-master' into iproute2-next
Conflicts:
	bridge/mdb.c
	misc/ss.c
	tc/tc.c

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-02 10:47:34 -07:00
Stephen Hemminger 4b6c4177ee v4.16.0 2018-04-02 10:06:08 -07:00
Jiri Pirko 6b4f03f518 man: fix devlink object list
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-02 09:19:59 -07:00
Stephen Hemminger 200e9d1961 uapi/if_ether: add definition of ether type field
Part of upstream commit
4bbb3e0e8239 ("net: Fix vlan untag for bridge and vlan_dev with reorder_hdr off")

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-02 09:19:08 -07:00
David Ahern 43eb8728b3 devlink: Print size of -1 as unlimited
(u64)-1 essentially means the size is unlimited. Print it as 'unlimited'
as opposed to the current unsigned int range of 4294967295.
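A minimal sketch of the special case, assuming a hypothetical `format_resource_size()` helper rather than devlink's actual printing code. The all-ones u64 value is mapped to the word "unlimited"; every other size is printed with the width-correct PRIu64 specifier.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Print (u64)-1 as "unlimited", any other size as a decimal number. */
int format_resource_size(char *buf, size_t len, uint64_t size)
{
	if (size == (uint64_t)-1)
		return snprintf(buf, len, "unlimited");
	return snprintf(buf, len, "%" PRIu64, size);
}
```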

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-02 07:54:18 -07:00
Roman Mashak 7ada016aeb tc: jsonify sample action
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-01 08:44:31 -07:00
Roman Mashak c2f60f5c8e tc: support oneline mode in action generic printer functions
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-01 08:37:32 -07:00
David Ahern 386e37f543 Merge branch 'rdma-res-tracking' into iproute2-next
Steve Wise  says:

====================

This series enhances the iproute2 rdma tool to include dumping of
connection manager id (cm_id), completion queue (cq), memory region (mr),
and protection domain (pd) rdma resources.  It is the user-space part of
the kernel resource tracking series merged into rdma-next for 4.17 [1]
and [2].

Changes since v3:
- replaced rdma_cma.h inclusion with UAPI rdma_user_cm.h
- display only device names instead of device/port for cq, mr, and pd
since they are not associated with a specific port.

Changes since v2:
- pull in rdma-core:include/rdma/rdma_cma.h
- 80 column reformat
- add reviewed-by tags

Changes since v1/RFC:
- removed RFC tag
- initialize rd properly to avoid passing a garbage port number
- revert accidental change to qp_valid_filters
- removed cm_id dev/network/transport types
- cm_id ip addrs now passed up as __kernel_sockaddr_storage
- cm_id ip address ports printed as "address:port" strings
- only parse/display memory keys and iova if available
- filter on "users" for cqs and pds
- fixed memory leaks
- removed PD_FLAGS attribute
- filter on "mrlen" for mrs
- filter on "poll-ctx" for cqs
- don't require addrs or qp_type for parsing cm_ids
- only filter optional attrs if they are present
- remove PGSIZE MR attr to match kernel

[1] https://www.spinics.net/lists/linux-rdma/msg61720.html
[2] https://www.spinics.net/lists/linux-rdma/msg62979.html
    https://www.spinics.net/lists/linux-rdma/msg62980.html

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-01 08:19:21 -07:00
Steve Wise 4060e4c0d2 rdma: Add PD resource tracking information
Sample output:

Without CAP_NET_ADMIN capability:

dev mlx4_0 users 0 pid 0 comm [ib_srpt]
dev mlx4_0 users 0 pid 0 comm [ib_srp]
dev mlx4_0 users 1 pid 0 comm [ib_core]
dev cxgb4_0 users 0 pid 0 comm [ib_srp]

With CAP_NET_ADMIN capability:

dev mlx4_0 local_dma_lkey 0x8000 users 0 pid 0 comm [ib_srpt]
dev mlx4_0 local_dma_lkey 0x8000 users 0 pid 0 comm [ib_srp]
dev mlx4_0 local_dma_lkey 0x8000 users 1 pid 0 comm [ib_core]
dev cxgb4_0 local_dma_lkey 0x0 users 0 pid 0 comm [ib_srp]

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-01 08:19:01 -07:00
Steve Wise 8958a15c04 rdma: Add MR resource tracking information
Sample output:

Without CAP_NET_ADMIN:

$ rdma resource show mr mrlen 65536
dev mlx4_0 mrlen 65536 pid 0 comm [nvme_rdma]
dev cxgb4_0 mrlen 65536 pid 0 comm [nvme_rdma]

With CAP_NET_ADMIN:

# rdma resource show mr mrlen 65536
dev mlx4_0 rkey 0x12702 lkey 0x12702 iova 0x85724a000 mrlen 65536 pid 0 comm [nvme_rdma]
dev cxgb4_0 rkey 0x68fe4e9 lkey 0x68fe4e9 iova 0x835b91000 mrlen 65536 pid 0 comm [nvme_rdma]

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-01 08:18:56 -07:00
Steve Wise b0b8e32cbf rdma: Add CQ resource tracking information
Sample output:

# rdma resource show cq
dev cxgb4_0 cqe 46 users 2 pid 30503 comm rping
dev cxgb4_0 cqe 46 users 2 pid 30498 comm rping
dev mlx4_0 cqe 63 users 2 pid 30494 comm rping
dev mlx4_0 cqe 63 users 2 pid 30489 comm rping
dev mlx4_0 cqe 1023 users 2 poll_ctx WORKQUEUE pid 0 comm [ib_core]

# rdma resource show cq pid 30489
dev mlx4_0 cqe 63 users 2 pid 30489 comm rping

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-01 08:18:51 -07:00
Steve Wise 9a362cc71a rdma: Add CM_ID resource tracking information
Sample output:

# rdma resource
2: cxgb4_0: pd 5 cq 2 qp 2 cm_id 3 mr 7
3: mlx4_0: pd 7 cq 3 qp 3 cm_id 3 mr 7

# rdma resource show cm_id
link cxgb4_0/- lqpn 0 qp-type RC state LISTEN ps TCP pid 30485 comm rping src-addr 0.0.0.0:7174
link cxgb4_0/2 lqpn 1048 qp-type RC state CONNECT ps TCP pid 30503 comm rping src-addr 172.16.2.1:7174 dst-addr 172.16.2.1:38246
link cxgb4_0/2 lqpn 1040 qp-type RC state CONNECT ps TCP pid 30498 comm rping src-addr 172.16.2.1:38246 dst-addr 172.16.2.1:7174
link mlx4_0/- lqpn 0 qp-type RC state LISTEN ps TCP pid 30485 comm rping src-addr 0.0.0.0:7174
link mlx4_0/1 lqpn 539 qp-type RC state CONNECT ps TCP pid 30494 comm rping src-addr 172.16.99.1:7174 dst-addr 172.16.99.1:43670
link mlx4_0/1 lqpn 538 qp-type RC state CONNECT ps TCP pid 30492 comm rping src-addr 172.16.99.1:43670 dst-addr 172.16.99.1:7174

# rdma resource show cm_id dst-port 7174
link cxgb4_0/2 lqpn 1040 qp-type RC state CONNECT ps TCP pid 30498 comm rping src-addr 172.16.2.1:38246 dst-addr 172.16.2.1:7174
link mlx4_0/1 lqpn 538 qp-type RC state CONNECT ps TCP pid 30492 comm rping src-addr 172.16.99.1:43670 dst-addr 172.16.99.1:7174

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-01 08:18:47 -07:00
Steve Wise 80c0478fdf rdma: initialize the rd struct
Initialize the rd struct so port_idx is 0 unless set otherwise.
Otherwise, strict_port queries end up passing an uninitialized PORT
nlattr.
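A minimal C sketch of the idea. It assumes a cut-down, hypothetical struct rd_sketch; the real struct rd has many more members. It also assumes that a port_idx of 0 means "no port given". Zeroing the whole struct up front guarantees that no field, port_idx included, carries stack garbage into a netlink request.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for rdma's struct rd. */
struct rd_sketch {
	uint32_t dev_idx;
	uint32_t port_idx;	/* assumed: 0 means "no port given" */
	int strict_port;
};

/* Zero every member so port_idx stays 0 unless parsing sets it. */
static void rd_sketch_init(struct rd_sketch *rd)
{
	memset(rd, 0, sizeof(*rd));
}

/* Assumed rule: only a non-zero port_idx is sent as a PORT attribute. */
static int rd_sketch_has_port(const struct rd_sketch *rd)
{
	return rd->port_idx != 0;
}
```

An initializer such as `struct rd_sketch rd = { 0 };` achieves the same at the point of declaration.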

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-01 08:18:43 -07:00
Steve Wise 8d61311611 rdma: add UAPI rdma_user_cm.h
This allows parsing rdma_cm_id UAPI values.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-01 08:18:38 -07:00
Steve Wise 29122c1aae rdma: update rdma_netlink.h
Pull in the latest rdma_netlink.h which has support for
the rdma nldev resource tracking objects being added
with this patch series.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-04-01 08:18:20 -07:00
Roman Mashak 9fd3f0b255 tc: enable json output for actions
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-30 08:55:17 -07:00
Roman Mashak 6e8634eb13 tc: add oneline mode
Add initial support for oneline mode in tc; actions, filters and qdiscs
will be gradually updated in the follow-up patches.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-30 08:18:58 -07:00
David Ahern 8c5bf7f0e6 Merge branch 'tipc-addr' into iproute2-next
Jon Maloy says:

====================

1: We introduce the ability to set/get 128-bit node identities
2: We rename 'net id' to 'cluster id' in the command API,
   of course in a compatible way.
3: We print out all 32-bit node addresses as an integer in hex format,
   i.e., we remove the assumption about an internal structure.
====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-29 10:50:30 -07:00
Alexander Zubkov c121807250 arrange prefix parsing code after redundant patches
A problem was reported with parsing of prefixes all/any/default.
Commit 7696f1097f fixes the problem,
but other patches were also applied:
00b31a6b2e, which were intended to
fix the same problem. They have now become redundant. This patch
reverts the changes introduced by those redundant patches.

Signed-off-by: Alexander Zubkov <green@msu.ru>
2018-03-29 08:42:04 -07:00
Stephen Hemminger 89e3c36b06 namespace: limit the length of namespace name to avoid snprintf overflow
This fixes a problem reported by gcc-8.
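The following is an illustration only, not the actual ip netns code. The sketch rejects over-long names before formatting a path, and it also checks the snprintf() return value. That value reports the length the full output would have needed, so truncation is detected instead of silently producing a wrong path. The helper name, the "/var/run/netns" directory used in the test, and the 255-byte NS_NAME_MAX_SKETCH limit are all assumptions.

```c
#include <stdio.h>
#include <string.h>

#define NS_NAME_MAX_SKETCH 255	/* assumed limit on a namespace name */

/* Hypothetical helper: build "<dir>/<name>" in buf. Returns 0 on success,
 * -1 if the name is too long or the result would not fit in buf. */
static int netns_path_sketch(char *buf, size_t len, const char *dir,
			     const char *name)
{
	int n;

	if (strlen(name) > NS_NAME_MAX_SKETCH)
		return -1;

	/* snprintf() returns the length the full output would need. */
	n = snprintf(buf, len, "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```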

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-29 08:40:26 -07:00
Stephen Hemminger 08a93b32f5 bpf: avoid compiler warnings about strncpy
Use strlcpy to avoid cases where the source fills the whole buffer
(strlen(src) >= sizeof(buf)), which leaves strncpy's result unterminated.
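The difference can be sketched in C. strncpy(dst, src, sizeof(dst)) leaves dst without a terminating NUL when strlen(src) >= sizeof(dst). A strlcpy-style copy always terminates for a non-zero size and returns strlen(src), which makes truncation detectable. sketch_strlcpy below is a hypothetical stand-in, written out here so the example does not rely on the C library providing strlcpy().

```c
#include <string.h>

/* strlcpy-style copy: NUL-terminates whenever size > 0 and returns
 * strlen(src), so callers can detect truncation with ret >= size. */
static size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t srclen = strlen(src);

	if (size) {
		/* Copy at most size - 1 bytes to leave room for the NUL. */
		size_t n = srclen >= size ? size - 1 : srclen;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return srclen;
}
```

With a 16-byte interface-name buffer, a 19-character source is cut to its first 15 characters plus the terminator, and the return value of 19 tells the caller that truncation happened.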

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-29 08:32:48 -07:00
Stephen Hemminger da8034a019 misc: avoid snprintf warnings in ss and nstat
GCC 8 checks that the target buffer is big enough.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-29 08:32:43 -07:00
Stephen Hemminger d5732e3470 ematch: fix possible snprintf overflow
Fixes a gcc 8 warning about a possible snprintf overflow.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-29 08:32:43 -07:00
Stephen Hemminger b8a6088e13 tc_class: fix snprintf warning
Make the buffer big enough to avoid any possible overflow.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-29 08:32:43 -07:00
Stephen Hemminger fcb18aa3d9 tunnel: use strlcpy to avoid strncpy warnings
Fixes warnings about strncpy size by using strlcpy.

tunnel.c: In function ‘tnl_gen_ioctl’:
tunnel.c:145:2: warning: ‘strncpy’ specified bound
 16 equals destination size [-Wstringop-truncation]
  strncpy(ifr.ifr_name, name, IFNAMSIZ);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-29 08:30:28 -07:00
Stephen Hemminger fc9d755a3e ip: use strlcpy() to avoid truncation
This fixes gcc-8 warnings about strncpy bounds by using
strlcpy instead.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-29 08:30:28 -07:00
Stephen Hemminger 95744efac4 pedit: fix strncpy warning
Newer versions of GCC warn about string truncation.
Fix by using strlcpy.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-29 08:30:28 -07:00
Stephen Hemminger 6c6c0291d2 bridge: avoid snprint truncation on time
This fixes a new gcc warning about possible string overflow.

mdb.c: In function ‘__print_router_port_stats’:
mdb.c:61:11: warning: ‘%.2i’ directive output may be truncated
 writing between 2 and 7 bytes into a region of size
 between 0 and 4 [-Wformat-truncation=]
      "%4i.%.2i", (int)tv.tv_sec,
           ^~~~
Note: already fixed in iproute2-next.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-29 08:30:27 -07:00
Jon Maloy 5aad0baa3d tipc: change node address printout formats
Since a node address is now by definition only an unstructured 32-bit
integer, it makes no sense to print it out as a structured string.

In this commit, we replace all occurrences of "<Z.C.N>" printouts with
just an "%x".
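A sketch of the two printouts, with hypothetical function names. It assumes the legacy layout of an 8-bit zone, a 12-bit cluster and a 12-bit node (zone << 24 | cluster << 12 | node). Under that layout, the address 16781313 (0x1001001) renders as <1.1.1> in the old format and as 1001001 with "%x".

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Old style: split the address into zone.cluster.node, assuming the
 * legacy 8/12/12-bit layout. Returns snprintf()'s result. */
static int tipc_addr_legacy_sketch(char *buf, size_t len, uint32_t addr)
{
	return snprintf(buf, len, "<%" PRIu32 ".%" PRIu32 ".%" PRIu32 ">",
			addr >> 24, (addr >> 12) & 0xfff, addr & 0xfff);
}

/* New style: the address is an opaque 32-bit integer printed in hex. */
static int tipc_addr_hex_sketch(char *buf, size_t len, uint32_t addr)
{
	return snprintf(buf, len, "%" PRIx32, addr);
}
```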

Acked-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-28 20:41:15 -07:00
Jon Maloy 725ebfbf62 tipc: introduce command for handling a new 128-bit node identity
We add the possibility to set and get a 128-bit node identifier, as
an alternative to the legacy 32-bit node address we are using now.

We also add an option to set and get 'clusterid' in the node. This
is the same as what we have so far called 'netid' and performs the
same operations. For compatibility the old 'netid' commands are
retained; we just remove them from the help texts.

Acked-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-28 20:38:52 -07:00
Stephen Hemminger 98453b6580 ip/l2tp: add JSON support
Convert ip l2tp to use JSON output routines.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-28 20:37:00 -07:00
Stephen Hemminger 1f483fc618 ip/ila: support json and color
Use json print to enhance ila output.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-28 20:36:58 -07:00
David Ahern 083d782718 Merge branch 'tipc-stats' into iproute2-next
GhantaKrishnamurthy MohanKrishna says:

====================

The following patchset adds user space TIPC socket diagnostics support
to the ss tool of iproute2. It requires the sock_diag framework
for AF_TIPC support in the kernel, commit id: c30b70deb5f
(tipc: implement socket diagnostics for AF_TIPC).

TIPC socket stats are requested with the "--tipc" option. Additional
TIPC-specific info is requested with the "--tipcinfo" option.

This patchset is based on top of iproute2 v4.15.0-100-g4f63187
commitid: f85adc6. It has been co-authored by
Parthasarathy Bhuvaragan.

Example output (the first socket is the internal topology server)

State  Recv-Q  Send-Q     Local Address:Port           Peer Address:Port
UNCONN 0       0               16781313:2809484547                 -             ino:13348 sk:4 users:(("tipc-pipe",pid=292,fd=3))
LISTEN 0       0               16781313:4117673024                 -             ino:13346 sk:5 users:(("tipc-pipe",pid=291,fd=3))
ESTAB  0       0               16781313:484097386          16781313:3203149317   ino:13345 sk:6 users:(("tipc-pipe",pid=294,fd=4))
LISTEN 0       0               16781313:2438310591                 -             ino:13344 sk:7 users:(("tipc-pipe",pid=294,fd=3),("tipc-pipe",pid=290,fd=3))
LISTEN 0       0               16781313:2658440413                 -             ino:12368 sk:3
ESTAB  0       0               16781313:3203149317         16781313:484097386    ino:13349 sk:8 users:(("tipc-pipe",pid=293,fd=3))

State  Recv-Q  Send-Q     Local Address:Port           Peer Address:Port
UNCONN 0       0               16781313:2809484547                 -
type:RDM cong:none  drop:0  publ
LISTEN 0       0               16781313:4117673024                 -
type:SEQPACKET cong:none  drop:0  publ
ESTAB  0       0               16781313:484097386          16781313:3203149317
type:STREAM cong:none  drop:0  via {1000,1000}
LISTEN 0       0               16781313:2438310591                 -
type:STREAM cong:none  drop:0  publ
LISTEN 0       0               16781313:2658440413                 -
type:SEQPACKET cong:none  drop:0  publ
ESTAB  0       0               16781313:3203149317         16781313:484097386
type:STREAM cong:none  drop:0  via {1000,1000}

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-28 20:28:58 -07:00
GhantaKrishnamurthy MohanKrishna 5caf79a0bc ss: Add support for TIPC socket diag in ss tool
For iproute 4.x
Allow TIPC socket statistics to be dumped with --tipc
and tipc specific info with --tipcinfo.

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-28 20:28:06 -07:00
David Ahern 9effc146b7 Update kernel headers
Update kernel headers to commit 5d22d47b9ed9
("Merge branch 'sfc-filter-locking'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-28 20:26:25 -07:00
Stephen Hemminger 83b3c60544 rdma: fix man page typos
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-28 11:06:55 -07:00
Phil Sutter 3e1652c94c ss: Drop filter_default_dbs()
Instead call filter_db_parse(..., "all"). This eliminates the duplicate
default DB definition.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2018-03-27 17:02:38 -07:00
Phil Sutter 67d5fd5587 ss: Put filter DB parsing into a separate function
Use a table for database name parsing. The tricky bit is to allow for
association of a (nearly) arbitrary number of DBs with each name.
Luckily the number is not fully arbitrary as there is an upper bound of
MAX_DB items. Since it is not possible to have a variable length
array inside a variable length array, use this knowledge to make the
inner array of fixed length. But since DB values start from zero, an
explicit end entry needs to be present as well, so the inner array has
to be MAX_DB + 1 in size.
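A minimal C sketch of such a table, using hypothetical DB ids and names rather than the real ss lists. Elements left out of an initializer are zero, and 0 is a valid DB id here. For that reason every inner array ends with an explicit DB_END marker, which is why it needs MAX_DB + 1 slots.

```c
#include <string.h>

/* Hypothetical DB ids; the real ss code has its own list. */
enum { TCP_DB, UDP_DB, RAW_DB, UNIX_DG_DB, UNIX_ST_DB, MAX_DB };

#define DB_END (-1)	/* explicit terminator: 0 is a valid DB id */

struct db_name_map {
	const char *name;
	int dbs[MAX_DB + 1];	/* at most MAX_DB ids plus the terminator */
};

static const struct db_name_map db_map[] = {
	{ "all",  { TCP_DB, UDP_DB, RAW_DB, UNIX_DG_DB, UNIX_ST_DB, DB_END } },
	{ "inet", { TCP_DB, UDP_DB, RAW_DB, DB_END } },
	{ "unix", { UNIX_DG_DB, UNIX_ST_DB, DB_END } },
	{ "tcp",  { TCP_DB, DB_END } },
};

/* Set bit n of *mask for every DB id n associated with name.
 * Returns 0 on success, -1 for an unknown name. */
static int db_name_to_mask(const char *name, unsigned int *mask)
{
	size_t i, j;

	for (i = 0; i < sizeof(db_map) / sizeof(db_map[0]); i++) {
		if (strcmp(db_map[i].name, name) != 0)
			continue;
		for (j = 0; db_map[i].dbs[j] != DB_END; j++)
			*mask |= 1u << db_map[i].dbs[j];
		return 0;
	}
	return -1;
}
```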

Signed-off-by: Phil Sutter <phil@nwl.cc>
2018-03-27 17:02:38 -07:00
Phil Sutter c121111ecb ss: Allow excluding a socket table from being queried
The original problem was that a simple call to 'ss' leads to loading of
the sctp_diag kernel module, which might not be desired. While searching
for a workaround, it became clear how inconvenient it is to exclude a
single socket table from being queried.

This patch allows prefixing an item passed to the '-A' parameter with an
exclamation mark to invert its meaning.
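A sketch of the '!' handling for a single item. It assumes a hypothetical three-table set and a bitmask that starts with every table enabled; the real ss code works on its own DB list. An item such as "!sctp" clears that table's bit, so that table would not be queried at all.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical socket tables, one bit each. */
enum { SK_TCP, SK_UDP, SK_SCTP, SK_MAX };

static const char *const sk_names[SK_MAX] = { "tcp", "udp", "sctp" };

/* Apply one '-A' item to mask: "name" enables a table, "!name"
 * disables it. Returns -1 for an unknown table name. */
static int apply_table_item(const char *item, unsigned int *mask)
{
	bool exclude = false;
	int i;

	if (item[0] == '!') {
		exclude = true;
		item++;		/* skip the '!' before the name lookup */
	}
	for (i = 0; i < SK_MAX; i++) {
		if (strcmp(item, sk_names[i]) != 0)
			continue;
		if (exclude)
			*mask &= ~(1u << i);
		else
			*mask |= 1u << i;
		return 0;
	}
	return -1;
}
```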

Signed-off-by: Phil Sutter <phil@nwl.cc>
2018-03-27 17:02:38 -07:00
Roman Mashak d64a22f393 tc: print index, refcnt & bindcnt for nat action
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2018-03-27 17:00:32 -07:00
Stephen Hemminger fec62c0ec7 tc: help and whitespace cleanup
Break long lines, and clean up the usage message.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-27 15:33:13 -07:00
David Ahern 54eae5f76d Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-27 12:33:02 -07:00
Luca Boccassi ba2fc55b99 Drop capabilities if not running ip exec vrf with libcap
ip vrf exec requires root or CAP_NET_ADMIN, CAP_SYS_ADMIN and
CAP_DAC_OVERRIDE. It is not possible to run unprivileged commands like
ping as non-root or non-cap-enabled due to this requirement.
To allow users and administrators to safely add the required
capabilities to the binary, drop all capabilities on start if not
invoked with "vrf exec".
Update the manpage with the requirements.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-27 11:48:23 -07:00
Phil Sutter b2038cc0b2 ssfilter: Eliminate shift/reduce conflicts
The problematic bit was the 'expr: expr expr' rule. Fix this by making
'expr' token represent a single filter only and introduce a new token
'exprlist' to represent a combination of filters.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2018-03-27 11:41:08 -07:00
Phil Sutter 8ee38d833c man: tc-vlan.8: Fix for incorrect example
This has to be a second match statement in the same u32 filter, not a
second filter (which tc-filter doesn't support at all).

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-27 09:13:28 -07:00
Jiri Pirko da7a1aa7da devlink: fix port new monitoring message typo
s/net/new/

Fixes: a3c4b484a1 ("add devlink tool")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-27 09:13:09 -07:00
Stefano Brivio 32ea3d54b4 ss: Fix rendering of continuous output (-E, --events)
Roman Mashak reported that ss currently shows no output when it
should continuously report information about terminated sockets
(-E, --events switch).

This happens because I missed this case in 691bd854bf ("ss:
Buffer raw fields first, then render them as a table") and the
rendering function is simply not called.

To fix this, we need to:

- call render() every time we need to display new socket events
  from generic_show_sock(), which is only used to follow events.
  Always call it even if specific socket display functions
  return errors to ensure we clean up buffers

- get the screen width every time we have new events to display,
  thus factor out getting the screen width from main() into a
  function we'll call whenever we calculate columns width

- reset the current field pointer after rendering, more output
  might come after render() is called

Reported-by: Roman Mashak <mrv@mojatatu.com>
Fixes: 691bd854bf ("ss: Buffer raw fields first, then render them as a table")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-27 09:09:38 -07:00
Phil Sutter 79f49f58aa man: ip-route.8: ssthresh parameter is NUMBER
The synopsis section was inconsistent with regard to the help text and
the later description of the ssthresh parameter.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2018-03-27 09:07:16 -07:00
Roman Mashak 990b1d90d7 tc: print actual action for connmark action
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2018-03-27 09:03:15 -07:00
Stephen Hemminger 00b31a6b2e Merge branch 'revert' 2018-03-27 08:58:36 -07:00
Alexander Zubkov 7696f1097f treat "default" and "all"/"any" addresses differenty
The Debian maintainer found that the basic command:
	# ip route flush all
no longer worked as expected, which breaks user scripts and
expectations: it no longer flushed all IPv4 routes.

The behavior of the "default" prefix parameter was recently corrected,
but at the same time the behavior of "all"/"any" was altered too,
because they were handled in the same branch of the code. As those
parameters mean different things, they need to be treated differently
in the code too. This patch reflects the difference.

Also, after the mentioned change, the address parsing code was changed
further and the address family was set explicitly even for "all"/"any"
addresses, which broke the matching conditions further. This patch
fixes that too and returns AF_UNSPEC for "all"/"any" addresses.

Now "default" is treated as the top-level prefix (for example 0.0.0.0/0
in IPv4) and "all"/"any" always matches anything in exact, root and
match modes.
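A simplified C sketch of the distinction in exact mode. It uses a hypothetical, IPv4-only struct prefix_sketch rather than iproute2's own prefix type. In the sketch, "all"/"any" parse to AF_UNSPEC and match every route. "default" keeps the requested family with a prefix length of 0, so it matches only 0.0.0.0/0. Root and match modes are not modelled here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>	/* AF_UNSPEC, AF_INET */

/* Hypothetical, simplified prefix type (not iproute2's own). */
struct prefix_sketch {
	int family;	/* AF_UNSPEC means "match anything" */
	int bitlen;
	uint32_t addr;	/* IPv4 only, host byte order, for brevity */
};

/* Parse the special keywords; ordinary addresses are not handled here. */
static int parse_special(const char *arg, int family, struct prefix_sketch *p)
{
	p->bitlen = 0;
	p->addr = 0;
	if (strcmp(arg, "all") == 0 || strcmp(arg, "any") == 0) {
		p->family = AF_UNSPEC;	/* wildcard: no family at all */
		return 0;
	}
	if (strcmp(arg, "default") == 0) {
		p->family = family;	/* top-level prefix, e.g. 0.0.0.0/0 */
		return 0;
	}
	return -1;
}

/* Exact-mode match of a route prefix against a filter prefix. */
static bool match_exact(const struct prefix_sketch *filter,
			const struct prefix_sketch *route)
{
	if (filter->family == AF_UNSPEC)
		return true;	/* "all"/"any" matches everything */
	return filter->family == route->family &&
	       filter->bitlen == route->bitlen &&
	       filter->addr == route->addr;
}
```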

Reported-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Alexander Zubkov <green@msu.ru>
2018-03-27 08:58:26 -07:00
Roi Dayan 17504be81d tc: Fix compilation error with old iptables
The compat_rev field does not exist in old versions of iptables,
e.g. iptables 1.4.

Fixes: dd29621578 ("tc: add em_ipt ematch for calling xtables matches from tc matching context")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-27 06:38:52 -07:00
Leon Romanovsky 2c6962cfaf rdma: Move RDMA UAPI header file to be under RDMA responsibility
In the iproute2 package, the UAPI files are updated
after the needed feature lands in the kernel's net-next tree.

This development flow created delays for the rdma tool developers,
who use the rdma-next tree as the basis for their work.

Move the RDMA UAPI file under the rdma/ folder, so the whole
responsibility for syncing this file lies with them.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-26 07:02:19 -07:00
Roopa Prabhu b4f84bf8c9 bridge: add option extern_learn to set NTF_EXT_LEARNED on fdb entries
NTF_EXT_LEARNED can be set by a user on bridge fdb entry.
Provide a bridge command option to allow a user to set
NTF_EXT_LEARNED on a bridge fdb entry.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-20 08:24:51 -07:00
Alexander Zubkov b8d2619989 treat "default" and "all"/"any" addresses differenty
The Debian maintainer found that the basic command:
	# ip route flush all
no longer worked as expected, which breaks user scripts and
expectations: it no longer flushed all IPv4 routes.

The behavior of the "default" prefix parameter was recently corrected,
but at the same time the behavior of "all"/"any" was altered too,
because they were handled in the same branch of the code. As those
parameters mean different things, they need to be treated differently
in the code too. This patch reflects the difference.

Also, after the mentioned change, the address parsing code was changed
further and the address family was set explicitly even for "all"/"any"
addresses, which broke the matching conditions further. This patch
fixes that too and returns AF_UNSPEC for "all"/"any" addresses.

Now "default" is treated as the top-level prefix (for example 0.0.0.0/0
in IPv4) and "all"/"any" always matches anything in exact, root and
match modes.

Reported-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Alexander Zubkov <green@msu.ru>
2018-03-19 09:17:28 -07:00
Roman Mashak bf7d148803 tc: use get_u32() in psample action to match types
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Acked-by: Yotam Gigi <yotam.gi@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-16 13:38:50 -07:00
Roman Mashak e9fa16583a tc: print actual action for sample action
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-16 13:38:38 -07:00
Toke Høiland-Jørgensen 997f2dc193 tc: Add JSON output of fq_codel stats
Enable proper JSON output support for fq_codel in `tc -s qdisc` output.

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-13 18:05:40 -07:00
Toke Høiland-Jørgensen d7d044ff53 tc: Add missing documentation for codel and fq_codel parameters
Add missing documentation of the memory_limit fq_codel parameter and the
ce_threshold codel and fq_codel parameters.

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-13 18:05:35 -07:00
Pieter Jansen van Vuuren fb4e6abfca tc: f_flower: Add support for matching first frag packets
Add matching support for distinguishing between first and later fragmented
packets.

 # tc filter add dev eth0 protocol ip parent ffff: \
     flower indev eth0 \
     ip_flags firstfrag \
     ip_proto udp \
     action mirred egress redirect dev eth1

 # tc filter add dev eth0 protocol ip parent ffff: \
     flower indev eth0 \
     ip_flags nofirstfrag \
     ip_proto udp \
     action mirred egress redirect dev eth1

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-13 18:03:21 -07:00
David Ahern 4de0a06b34 Update kernel headers
Update kernel headers to commit a870a02cc963
("pktgen: use dynamic allocation for debug print buffer")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-13 17:59:59 -07:00
David Ahern e9625d6aea Merge branch 'iproute2-master' into iproute2-next
Conflicts:
	bridge/mdb.c

Updated bridge/bridge.c per removal of check_if_color_enabled by commit
1ca4341d2c ("color: disable color when json output is requested")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-13 17:48:10 -07:00
Stephen Hemminger 96303c25ee Revert "iproute: "list/flush/save default" selected all of the routes"
This reverts commit 9135c4d603.

The Debian maintainer found that the basic command:
	# ip route flush all
no longer worked as expected, which breaks user scripts and
expectations: it no longer flushed all IPv4 routes.

Reported-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-12 14:02:36 -07:00
David Ahern a121129df9 Merge branch 'mcast-json' into iproute2-next
Stephen Hemminger says:

====================

From: Stephen Hemminger <sthemmin@microsoft.com>

Add some more JSON support and report a better error if the kernel
is configured without multicast.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-11 18:53:36 -07:00
Stephen Hemminger e06e9a6bac ipmroute: better error message if no kernel mroute
If the kernel does not support the IP multicast address family,
then it will report all routes (PF_UNSPEC).
Give the user a better error message and abort the command.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-11 18:52:34 -07:00
Stephen Hemminger 0f1475c268 ipmroute: convert to output JSON
There should be no change for the non-JSON case, except for
putting color on the address if desired.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-11 18:52:23 -07:00
Stephen Hemminger 311dca0aa0 ipmaddr: json and color support
Support printing multicast addresses in json and color mode.
Output format is unchanged for normal use.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-11 18:52:06 -07:00
David Ahern bea42e6c24 Merge branch 'iplink-parse' into iproute2-next
Serhey Popovych says:

====================

iplink_parse() is the main routine to parse ip-link(8) configuration
parameters.

Move all code related to command line parsing and validation to it from
iplink_modify(). As a benefit, we reduce the number of arguments and
check for most of the weird cases in a single place, which benefits
iplink_parse() users.

See individual patch description message for more information.

v4
  Drop patches intended to reduce the number of arguments to
  iplink_parse(): postpone them to the series with real use cases.

  Save only ifi_index in iplink_vxcan.c and link_veth.c: there is no
  need to save the whole ifinfomsg data structure.

  Note that there is no sense in introducing a custom version of
  iplink_parse() to use in iplink_vxcan.c and link_veth.c because
  there are too many parameters we need to support (except VF and
  a few others), which would cause huge code duplication.

v3
  Move vxcan/veth ifinfomsg save/restore to a separate patch to
  make the change that performs most of the request buffer setups
  and checks in iplink_parse() clearer.

  Update commit message descriptions and drop an extra new line from
  the "utils: Introduce and use nodev() helper routine" patch.

v2
  Terminate via exit() when failing to parse command line arguments
  to help identify the failing line in batch mode.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-11 18:46:07 -07:00
Serhey Popovych c58213f69c iplink: Perform most of request buffer setups and checks in iplink_parse()
Let other users of iplink_parse() (e.g. link_veth.c) benefit from the
additional attribute checks and setups made in iplink_modify(). This
catches most weird combinations of parameters in peer device
configuration.

Drop @name, @dev, @link, @group and @index from the iplink_parse()
parameter list: they are not needed outside.

While there, change return -1 to exit(-1) for group parsing errors: we
want to stop further command processing unless the -force option is
given, so that the failing line is easy to find.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2018-03-11 17:59:03 -07:00
Serhey Popovych b06a29603a iplink: Follow documented behaviour when "index" is given
Both ip-link(8) and the error message shown when the "index" parameter
is given in the set/delete case say that index can only be given during
network device creation.

Follow this documented behaviour and get rid of the ambiguous behaviour
when both "dev" and "index" are specified in the ip link delete scenario
(where "index" was actually ignored in favor of "dev").

Prohibit "index" when configuring/deleting a group of network devices.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2018-03-11 17:58:56 -07:00
Serhey Popovych a24315ba46 iplink: Use "dev" and "name" parameters interchangeable when possible
Both of them accept a network device name as argument, but have different
meanings:

  dev  - a device identified by its name,
  name - the name for a specific device.

The only case where they are treated separately is network device
rename, where both the ifindex and the new name need to be specified.
In the rest of the cases we can assume that dev == name.

With this change we do the following:

  1) Kill ambiguity with both "dev" and "name" parameters given the same
     name:

       ip link {add|set} dev veth100a name veth100a ...

  2) Make sure we do not accept "name" more than once.

  3) For VF and XDP treat "name" as "dev". Fail if "dev" is
     given after VF and/or XDP parsing.

  4) Make veth and vxcan to accept both "name" and "dev" as their peer
     parameters, effectively following general ip-link(8) utility
     behaviour on link create:

       ip link add {name|dev} veth1a type veth peer {name|dev} veth1b

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2018-03-11 17:58:51 -07:00
Serhey Popovych fe99adbca4 utils: Introduce and use nodev() helper routine
There are a couple of places where we report an error if no network
device is found. In all of them we output the message in the same format
to stderr and either return -1 or 1 to the caller or exit with -1.

Introduce a new helper function nodev() that takes the name of the
network device that caused the error and returns -1 to its caller.
Either call exit() or return to the caller to preserve the behaviour
from before the change.

Use -nodev() in traffic control (tc) code to return 1.

Simplify expression for checking for argument being 0/NULL in @if
statement.
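A sketch of the helper as described, with hypothetical callers. The exact message wording is an assumption. What matters is the single shared format and the -1 return value, which tc-style callers negate to get 1.

```c
#include <stdio.h>

/* nodev()-style helper: one uniform message, then -1 for the caller.
 * The wording of the message is an assumption. */
static int nodev_sketch(const char *dev)
{
	fprintf(stderr, "Cannot find device \"%s\"\n", dev);
	return -1;
}

/* Hypothetical ip-side caller: propagate -1 unchanged. */
static int ip_caller_sketch(const char *dev, unsigned int ifindex)
{
	if (!ifindex)
		return nodev_sketch(dev);
	return 0;
}

/* Hypothetical tc-side caller, which reports this error as 1. */
static int tc_caller_sketch(const char *dev, unsigned int ifindex)
{
	if (!ifindex)
		return -nodev_sketch(dev);
	return 0;
}
```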

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2018-03-11 17:58:36 -07:00
Tariq Toukan 527f85141c ip-address: Fix negative prints of large TX rate limits
TX rate limit fields are unsigned (__u32).
Use %u and print_uint when printing.

Tested:
$ ip link set ens1 vf 1 rate 2294967296
$ ip link show |grep -iE "vf 1" | grep rate

before:
vf 1 MAC 00:00:00:00:00:00, tx rate -2000000000 (Mbps), max_tx_rate -2000000000Mbps, ...

after:
vf 1 MAC 00:00:00:00:00:00, tx rate 2294967296 (Mbps), max_tx_rate 2294967296Mbps, ...
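A small C sketch of the bug, with hypothetical function names. 2294967296 fits in a __u32 but not in a signed 32-bit int, so reading it back as signed gives 2294967296 - 2^32 = -2000000000, the value seen in the "before" output. Converting an out-of-range value to int32_t is implementation-defined in C; GCC documents that it reduces the value modulo 2^32.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Buggy variant: the u32 is reinterpreted as signed before printing. */
static int format_rate_signed_sketch(char *buf, size_t len, uint32_t rate)
{
	return snprintf(buf, len, "%" PRId32, (int32_t)rate);
}

/* Fixed variant: print the u32 with an unsigned conversion (%u). */
static int format_rate_unsigned_sketch(char *buf, size_t len, uint32_t rate)
{
	return snprintf(buf, len, "%" PRIu32, rate);
}
```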

Fixes: 3fd8663087 ("iproute2: rework SR-IOV VF support")
Fixes: 8c29ae7cc2 ("ip link: Fix crash on older kernels when show VF dev")
Fixes: f89a2a05ff ("Add support to configure SR-IOV VF minimum and maximum Tx rate through ip tool")
Fixes: ae7229d5f9 ("ip: Add support for setting and showing SR-IOV virtual funtion link params")
Fixes: d0e720111a ("ip: ipaddress.c: add support for json output")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
2018-03-10 09:00:27 -08:00
Roopa Prabhu f686f76468 iprule: support for ip_proto, sport and dport match options
Add support for matching on ip_proto and on sport and dport ranges.
For ip_proto, this patch currently enumerates tcp, udp and sctp.
This list can be extended in the future.

example:
$ ip rule add sport 666-777 dport 999 ip_proto tcp table 100
$ ip rule show
0:      from all lookup local
32765:  from all ip_proto 6 sport 666-777 dport 999 lookup 100
32766:  from all lookup main
32767:  from all lookup default
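The "START-END" form used above for sport can be parsed with a few lines of C. This is a hypothetical sketch, not the ip rule parser. It accepts a single port or an inclusive range and rejects values above 65535. It also rejects ranges whose start exceeds the end, which is an assumed rule.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical port range: start == end for a single port. */
struct port_range_sketch {
	uint16_t start;
	uint16_t end;
};

/* Parse one decimal port at s; *endp points just past the digits. */
static int parse_u16(const char *s, char **endp, uint16_t *out)
{
	unsigned long v;

	errno = 0;
	v = strtoul(s, endp, 10);
	if (errno || *endp == s || v > 65535)
		return -1;
	*out = (uint16_t)v;
	return 0;
}

/* Accept "PORT" or "START-END"; returns 0 on success, -1 otherwise. */
static int parse_port_range(const char *arg, struct port_range_sketch *r)
{
	char *end;

	if (parse_u16(arg, &end, &r->start))
		return -1;
	if (*end == '\0') {
		r->end = r->start;	/* single port, e.g. "999" */
		return 0;
	}
	if (*end != '-' || parse_u16(end + 1, &end, &r->end) || *end != '\0')
		return -1;
	return r->start <= r->end ? 0 : -1;
}
```

For "666-777" this yields start 666 and end 777, and for "999" both fields are 999.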

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-08 10:08:18 -08:00
Stephen Hemminger e93d922123 netns: add JSON support
Basic support for JSON output when showing network namespaces.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-08 09:53:11 -08:00
David Ahern 8c278ecad0 Update kernel headers to 4.16.0-rc4+
Update kernel headers to commit 08a24239cd46
("Merge branch 'hns3-next'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-08 09:34:05 -08:00
Leon Romanovsky f2ffa0a0ff rdma: Update device capabilities flags
In kernel commit e1d2e8873369 ("IB/core: Add PCI write
end padding flags for WQ and QP"), we introduced a new
device capability to advertise PCI write end padding.

PCI write end padding is the device's ability to pad the ending of
incoming packets (scatter) to full cache line such that the last
upstream write generated by an incoming packet will be a full cache
line.

This commit updates RDMAtool to present this field.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-08 09:15:28 -08:00
Roman Mashak b80c9af8a4 tc: updated tc-bpf man page
Added description of direct-action parameter.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-07 14:55:08 -08:00
David Ahern 8966c2490f Merge branch 'macsec-json' into iproute2-next
Stephen Hemminger says:

====================

The macsec code didn't really support JSON and had several
pieces of copy/pasted code.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-07 08:43:29 -08:00
Stephen Hemminger c0b904de62 macsec: support JSON
The JSON support in the macsec code was mostly missing, and what was
there was broken. This uses the new json_print utilities to complete
the output.

Compile tested only.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-07 08:41:43 -08:00
Stephen Hemminger d341863839 ipmacsec: collapse common code
Several places copy/paste the same code for printing an array of statistics.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-07 08:41:39 -08:00
Stephen Hemminger c2f260f4eb ip: macsec cleanup
Break long lines and use const as recommended by checkpatch.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-07 08:41:23 -08:00
David Ahern 65745eae83 Merge branch 'more-json' into iproute2-next
Stephen Hemminger says:

====================

The ip command implementation of JSON was very spotty. Only address
and link were originally implemented. After doing route for next,
went ahead and implemented it for a bunch of the other sub commands.

Hopefully will reach full coverage soon.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-06 15:48:22 -08:00
Stephen Hemminger 41b99db1c6 fou: support JSON output
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-06 15:39:34 -08:00
Stephen Hemminger 5c92c2eee5 fou: break long lines
Split up long lines.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-06 15:39:30 -08:00
Stephen Hemminger 689bef5dc9 tuntap: support JSON output
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-06 15:39:25 -08:00
Stephen Hemminger b62ec792a9 token: support JSON
Add JSON output to ip token command.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-06 15:39:19 -08:00
Stephen Hemminger 111f79ad38 ipsr: add json support
Add json flag to ip sr command outputs.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-06 15:39:14 -08:00
Stephen Hemminger 74498126fd tcp_metrics: add json support
Add JSON support to the ip tcp_metrics output.

$ ip -j -p tcp_metrics show
[ {
        "dst": "192.18.1.11",
        "age": 23617.8,
        "ssthresh": 7,
        "cwnd": 3,
        "rtt": 0.039176,
        "rttvar": 0.039176,
        "source": "192.18.1.2"
    }
...

The JSON output scales values differently since there is no good
way to indicate units. The rtt values are displayed in seconds in
JSON and in microseconds in the original (non-JSON) mode. In the example
above, without the -j flag, the output would be
 ... rtt 39176us rttvar 39176us

I did this since all the other values in the JSON record are also in
floating point seconds.
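Converting between the two modes is a plain division by 10^6. The minimal sketch below checks that the 39176us in the non-JSON output matches the 0.039176 seconds in the JSON record above. The helper name is hypothetical; this is not the tcp_metrics code.

```c
/* Convert microseconds (non-JSON display unit) to seconds (JSON unit). */
static double usec_to_sec(unsigned int usec)
{
	return usec / 1000000.0;
}
```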

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-06 15:39:07 -08:00
Stephen Hemminger 8a61d8968c tcp_metrics: make tables const
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-06 15:39:02 -08:00
Stephen Hemminger 96032aaf7d ipnetconf: add JSON support
Basic JSON support for ip netconf command.
Also clean up some checkpatch warnings about long lines.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-06 15:38:57 -08:00
Stephen Hemminger 3c1e087b05 ipntable: add json support
Add JSON (and limited color) to ip neighbor table parameter output.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-06 15:38:50 -08:00
Stephen Hemminger 0dd4ccc56c iprule: add json support
More JSON and colorizing.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-06 15:38:44 -08:00
Stephen Hemminger a7ad1c8a68 ipaddrlabel: add json support
Add missing json and color support to addrlabel display

Example:
$ ip -j -p addrlabel
[ {
        "address": "::1",
        "prefixlen": 128,
        "label": 56
    },{
        "address": "::",
        "prefixlen": 96,
        "label": 56
    },{
...

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-06 15:38:41 -08:00
Stephen Hemminger aac7f725fa ipneigh: add color and json support
Use json_print to provide json (and color) support to
ip neigh command.

Example:
$ ip -j -p neigh
[ {
        "dst": "192.168.1.29",
        "dev": "enp12s0",
        "state": [ "FAILED" ]
    },{
        "dst": "192.168.1.130",
        "dev": "enp12s0",
        "state": [ "FAILED" ]
    },{
        "dst": "192.168.1.131",
        "dev": "enp12s0",
        "lladdr": "00:15:5d:2a:16:4f",
        "state": [ "STALE" ]
    }
...

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-06 15:38:18 -08:00
Stephen Hemminger d9d8c8393e json_writer: add SPDX Identifier (GPL-2/BSD-2)
I wrote this code so put SPDX License on it and intentionally
allow use in BSD code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-06 14:39:19 -08:00
Roman Mashak 9426673910 tc: added tc monitor description in man page
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-05 15:02:12 -08:00
Davide Caratti 75ef7b18d2 tc: fix parsing of the control action
If the user didn't specify any control action, don't pop the command line
arguments: otherwise, parsing of the next argument (typically the 'index'
keyword) results in an error, causing the following 'tc-testing' failures:

 Test a6d6: Add skbedit action with index
 Test 38f3: Delete skbedit action
 Test a568: Add action with ife type
 Test b983: Add action without ife type
 Test 7d50: Add skbmod action to set destination mac
 Test 9b29: Add skbmod action to set source mac
 Test e93a: Delete an skbmod action

Also, add missing parse for 'ok' control action to m_police, to fix the
following 'tc-testing' failure:

 Test 8dd5: Add police action with control ok

tested with:
 # ./tdc.py

test results:
 all tests ok using kernel 4.16-rc2, except 9aa8 "Get a single skbmod
 action from a list" (which is failing also before this commit)
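The rule behind the fix can be sketched as follows. The argument pointer advances only when the current token really is a control action, so a following keyword such as 'index' stays available to the caller. This is a simplified, hypothetical parser that knows only a subset of tc's control keywords. It is not the actual parse_action_control() code.

```c
#include <string.h>

/* A subset of tc control keywords, for illustration only. */
static const char *const control_keywords[] = {
	"reclassify", "pipe", "drop", "continue", "pass", "ok", NULL
};

/* Hypothetical simplified parser: on a match, store the keyword and
 * consume one argument; otherwise leave argc/argv untouched so the
 * caller can parse the token (e.g. "index") itself. */
static int parse_control(int *argc_p, char ***argv_p, const char **result)
{
	int i;

	if (*argc_p <= 0)
		return -1;
	for (i = 0; control_keywords[i]; i++) {
		if (strcmp(**argv_p, control_keywords[i]) == 0) {
			*result = control_keywords[i];
			(*argc_p)--;
			(*argv_p)++;
			return 0;
		}
	}
	return -1;	/* not a control action: nothing consumed */
}
```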

Fixes: 3572e01a09 ("tc: util: Don't call NEXT_ARG_FWD() in __parse_action_control()")
Cc: Michal Privoznik <mprivozn@redhat.com>
Cc: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-04 09:01:38 -08:00
Jean-Philippe Brucker eb8559eff1 ss: fix NULL dereference when rendering without header
When ss is invoked with the no-header flag, if the query doesn't return
any result, render() is called with 'buffer' uninitialized. This
currently leads to a segfault. Ensure that buffer is initialized before
rendering.

The bug can be triggered with: ss -H sport = 100000

Signed-off-by: Jean-Philippe Brucker <jphilippe.brucker@gmail.com>
Acked-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-04 09:01:31 -08:00
David Ahern 3dec72672f libnetlink: __rtnl_talk_iov should only loop max iovlen times
William reported ip hanging and bisected it to a recent commit that added
batching, allowing more than one command to be sent per message. The loop over
recvmsg should never cycle more than iovlen times: one response for
each command in the message.
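A minimal sketch of that bound is shown below. It uses a hypothetical per-response callback in place of the real recvmsg() call and message parsing in libnetlink. The names are illustrative only.

```c
#include <stddef.h>

/* Hypothetical stand-in for receiving and parsing one netlink response:
 * returns 0 on success, a negative value on error. */
typedef int (*recv_one_fn)(void *ctx);

/* One response is expected per batched request, so the receive loop is
 * bounded by iovlen rather than left open-ended. */
static int wait_for_responses(size_t iovlen, recv_one_fn recv_one, void *ctx)
{
	size_t i;

	for (i = 0; i < iovlen; i++) {
		int err = recv_one(ctx);

		if (err < 0)
			return err;
	}
	return 0;
}

/* Example callback for illustration: counts how many responses were read. */
static int count_response(void *ctx)
{
	(*(int *)ctx)++;
	return 0;
}
```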

Fixes: 72a2ff3916 ("lib/libnetlink: Add a new function rtnl_talk_iov")
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-02 13:30:34 -08:00
Phil Sutter 06867c3719 ip-link: Fix use after free in nl_get_ll_addr_len()
Immediately after freeing the buffer returned from rtnl_talk(), it is
accessed again via a pointer in the struct rtattr array. This causes some
builds to refuse to set an interface's MAC address because the
expected length value is garbage.
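The general pattern behind the fix is to copy the needed value out of the reply before releasing it. The sketch below uses a hypothetical reply structure in place of the netlink message and its struct rtattr array.

```c
#include <stdlib.h>

/* Hypothetical reply: the value we need lives inside the reply buffer,
 * as an attribute payload would in a real netlink message. */
struct reply {
	unsigned int addr_len;
};

/* Copy the value out first, then free. Reading reply->addr_len after
 * free() would be a use after free and could return garbage. */
static unsigned int take_addr_len(struct reply *reply)
{
	unsigned int len = reply->addr_len;

	free(reply);
	return len;
}
```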

Fixes: 86bf43c7c2 ("lib/libnetlink: update rtnl_talk to support malloc buff at run time")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2018-03-02 13:29:40 -08:00
Joe Stringer a0405444f7 bpf: Print section name when hitting non ld64 issue
It's useful to be able to tell which section is being processed in the
ELF when this error is triggered, so print that detail.

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-02 13:28:53 -08:00
David Ahern 62964f1a95 Merge branch 'ip-rule-proto' into iproute2-next
Donald Sharp says:

====================

Fix iprule.c to use the actual `struct fib_rule_hdr` and to
allow the end user to see and use the protocol keyword
for rule manipulation.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-28 19:45:56 -08:00
Donald Sharp 33f1e250ec ip: Allow rules to accept a specified protocol
Allow the specification of a protocol when the user
adds/modifies/deletes a rule.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-28 19:32:37 -08:00
Donald Sharp 7c083da77c ip: Display ip rule protocol used
Modify 'ip rule' command to notice when the kernel passes
to us the originating protocol.

Add code to allow the `ip rule flush protocol XXX`
command to be accepted and properly handled.

Modify the documentation to reflect these code changes.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-28 19:32:29 -08:00
Donald Sharp 5baaf07cb3 ip: Use the `struct fib_rule_hdr` for rules
The iprule.c code was using `struct rtmsg` as the data
type to pass into the kernel for the netlink message.
While 'struct rtmsg' and `struct fib_rule_hdr` are
the same size and mostly the same, we should use
the correct data structure.  This commit translates
the data structures to have iprule.c use the correct
one.

Additionally copy over the modified fib_rules.h file
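The claim that the two structures are the same size and mostly the same can be checked against the Linux UAPI headers. This sketch assumes a Linux build environment that provides <linux/rtnetlink.h> and <linux/fib_rules.h>. It shows that the headers match in size and in the position of the table field. Byte 5, however, is rtm_protocol in struct rtmsg but the reserved res1 field in struct fib_rule_hdr.

```c
#include <stddef.h>
#include <linux/rtnetlink.h>
#include <linux/fib_rules.h>

/* Both headers are 12 bytes: eight single-byte fields plus a 32-bit flags word. */
_Static_assert(sizeof(struct rtmsg) == sizeof(struct fib_rule_hdr),
	       "rtmsg and fib_rule_hdr differ in size");
_Static_assert(offsetof(struct rtmsg, rtm_table) ==
	       offsetof(struct fib_rule_hdr, table),
	       "table field offsets differ");

/* Same offset, different meaning: a routing protocol in rtmsg, reserved in
 * fib_rule_hdr. Code that fills rtm_protocol for a rule is therefore
 * writing into a reserved byte. */
static const size_t rtmsg_protocol_off = offsetof(struct rtmsg, rtm_protocol);
static const size_t fib_rule_res1_off = offsetof(struct fib_rule_hdr, res1);
```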

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-28 19:32:21 -08:00
Arkadi Sharshevsky f85adc61dd devlink: Fix error reporting
The current code doesn't set errno in case of extended ack.
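The sketch below is a minimal illustration of the idea. It assumes the error arrives as a struct nlmsgerr from <linux/netlink.h>, whose error field holds a negated errno (0 for a plain ACK). The caller should set errno from that field whether or not an extended ack message is attached. The function name is hypothetical; this is not the mnlg code.

```c
#include <errno.h>
#include <linux/netlink.h>

/* Translate a netlink error message into a return code plus errno.
 * errno must be set even when an extended-ack text message accompanies
 * the error, so that perror()-style reporting shows the real cause. */
static int handle_nlmsgerr(const struct nlmsgerr *err)
{
	if (err->error == 0)
		return 0;
	errno = -err->error;
	return -1;
}
```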

Fixes: 049c58539f ("devlink: mnlg: Add support for extended ack")
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-28 16:10:32 -08:00
David Ahern 7c6e942e84 Merge branch 'tc-ipt-ematch' into iproute2-next
Eyal Birger says:

====================

This patchset extends tc to support the ipt ematch.

The first patch adds the ability for ematch cmdline parsers
to receive argc,argv parameters.
The second patch adds the em_ipt module.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-27 09:44:33 -08:00
Eyal Birger dd29621578 tc: add em_ipt ematch for calling xtables matches from tc matching context
This commit adds support for a new tc ematch that uses netfilter xtables matches.

This allows early classification as well as mirroring/redirecting traffic
based on logic implemented in netfilter extensions.

The currently supported use case is classification based on the incoming
IPSec state used during decapsulation, using the 'policy' iptables extension
(xt_policy).

The matcher uses libxtables for parsing the input parameters.

Example use for matching an IPSec state with reqid 1:

tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ip parent ffff: \
    basic match 'ipt(-m policy --dir in --pol ipsec --reqid 1)' \
    action drop

This is the user-space counterpart of kernel commit ccc007e4a746
("net: sched: add em_ipt ematch for calling xtables matches")

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-27 09:43:16 -08:00
Eyal Birger 526862038e tc: ematch: add parse_eopt_argv() method for providing ematches with argv parameters
ematch uses YACC to parse ematch arguments and places them in struct bstr
linked lists.

It is useful to be able to receive parameters as argc,argv in order to use
getopt (and alike) argument parsers.
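The conversion can be sketched as flattening a linked list of length-delimited strings into a NULL-terminated argv array. The node type below is a simplified, hypothetical stand-in, since the real struct bstr in iproute2 has additional fields. The function name is also hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for the ematch parser's string list. */
struct bstr_node {
	const char *data;
	unsigned int len;	/* data is not NUL-terminated */
	struct bstr_node *next;
};

/* Flatten the list into a NULL-terminated argv array of freshly
 * allocated, NUL-terminated strings. Returns argc, or -1 on failure. */
static int bstr_list_to_argv(const struct bstr_node *head, char ***argv_out)
{
	const struct bstr_node *b;
	char **argv;
	int argc = 0, i;

	for (b = head; b; b = b->next)
		argc++;

	argv = calloc(argc + 1, sizeof(*argv));
	if (!argv)
		return -1;

	for (b = head, i = 0; b; b = b->next, i++) {
		argv[i] = malloc(b->len + 1);
		if (!argv[i]) {
			while (i--)
				free(argv[i]);
			free(argv);
			return -1;
		}
		memcpy(argv[i], b->data, b->len);
		argv[i][b->len] = '\0';
	}
	*argv_out = argv;	/* argv[argc] is already NULL from calloc */
	return argc;
}
```

getopt() treats argv[0] as the program name, so a real caller would likely prepend a placeholder entry. That detail is left out here.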

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-27 09:43:06 -08:00
David Ahern cb4ade6e38 Import tc_em_ipt.h from kernel at commit 08009a760213
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-27 09:42:23 -08:00
David Ahern 02ffee14ae Update kernel headers to 08009a760213
Update kernel headers to commit 08009a760213
("net: make kmem caches as __ro_after_init")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-26 13:24:38 -08:00
Sabrina Dubroca 7ba0a77b7e ip link: add json support for tun attributes
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Fixes: 118eda77d6 ("ip link: add support to display extended tun attributes")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-26 09:28:16 -08:00
Petr Machata f798a8ab52 ip: link_gre6.c: Support IP6_TNL_F_ALLOW_LOCAL_REMOTE flag
For IP-in-IP tunnels, one can specify the [no]allow-localremote command
when configuring a device. Under the hood, this flips the
IP6_TNL_F_ALLOW_LOCAL_REMOTE flag on the netdevice. However, ip6gretap
and ip6erspan devices, where the flag is also relevant, are not IP-in-IP
tunnels, and thus there's no way to configure the flag on these
netdevices. Therefore introduce the command to link_gre6 as well.

The original support was introduced in commit 21440d19d9
("ip: link_ip6tnl.c/ip6tunnel.c: Support IP6_TNL_F_ALLOW_LOCAL_REMOTE flag")

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-25 19:45:39 -08:00
Donald Sharp 728eb8d00b ip: Properly display AF_BRIDGE address information for neighbor events
When a neighbor add/delete event occurs, the vxlan driver sends
NDA_DST filled with a union:

union vxlan_addr {
	struct sockaddr_in sin;
	struct sockaddr_in6 sin6;
	struct sockaddr sa;
};

This eventually calls rt_addr_n2a_r which had no handler for the
AF_BRIDGE family and "???" was being printed.

Add code to properly display this data when requested.
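The idea can be sketched as dispatching on the family stored inside the union, because the message family (AF_BRIDGE) says nothing about the address format. This illustrates the approach only; it is not the actual rt_addr_n2a_r() change. The function name is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Same layout as the union quoted above. */
union vxlan_addr {
	struct sockaddr_in sin;
	struct sockaddr_in6 sin6;
	struct sockaddr sa;
};

/* Format the address according to the sockaddr's own family. */
static const char *vxlan_addr_to_str(const union vxlan_addr *a,
				     char *buf, socklen_t len)
{
	switch (a->sa.sa_family) {
	case AF_INET:
		return inet_ntop(AF_INET, &a->sin.sin_addr, buf, len);
	case AF_INET6:
		return inet_ntop(AF_INET6, &a->sin6.sin6_addr, buf, len);
	default:
		return "???";
	}
}
```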

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 11:27:09 -08:00
Leon Romanovsky 4ac152d003 rdma: Avoid memory leak for skipped resource
The call to get_task_name() allocates memory which is not freed
in case of skipping the object.

Fixes: 8ecac46a60 ("rdma: Add QP resource tracking information")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 08:42:28 -08:00
Arkadi Sharshevsky 58b48c5d75 devlink: Update man pages and add resource man
Add a resource man page, and update the dev man page for the reload command.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 08:36:05 -08:00
Arkadi Sharshevsky ead180274c devlink: Add support for resource/dpipe relation
Dpipe - Each dpipe table can have one resource which is mapped to it.
The resource is presented via its full path. Furthermore, the number
of units consumed by single table entry is presented.

Resource - Each resource presents the dpipe tables that use it.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 08:36:05 -08:00
Arkadi Sharshevsky 06a2cda9b0 devlink: Move dpipe context from heap to stack
Allocate the dpipe context on the stack instead of dynamically.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 08:36:05 -08:00
Arkadi Sharshevsky 06dd94f952 devlink: Add support for hot reload
Add support for hot reload. It should be used in order for resource
updates to take place.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 08:36:05 -08:00
Arkadi Sharshevsky 8cd6440958 devlink: Add support for devlink resource abstraction
Add support for devlink resource abstraction. The resources are
represented by a tree based structure and are identified by a name and
a size. Some resources can present their real time occupancy.

First the resources exposed by the driver can be observed, for example:

$devlink resource show pci/0000:03:00.0
pci/0000:03:00.0:
  name kvd size 245760 unit entry
    resources:
      name linear size 98304 occ 0 unit entry size_min 0 size_max 147456 size_gran 128
      name hash_double size 60416 unit entry size_min 32768 size_max 180224 size_gran 128
      name hash_single size 87040 unit entry size_min 65536 size_max 212992 size_gran 128

Some resources' sizes can be changed. Examples:

$devlink resource set pci/0000:03:00.0 path /kvd/hash_single size 73088
$devlink resource set pci/0000:03:00.0 path /kvd/hash_double size 74368

The changes do not apply immediately; this can be validated by the 'size_new'
attribute, which represents the pending changed size. For example:

$devlink resource show pci/0000:03:00.0
pci/0000:03:00.0:
  name kvd size 245760 unit entry size_valid false
  resources:
    name linear size 98304 size_new 147456 occ 0 unit entry size_min 0 size_max 147456 size_gran 128
    name hash_double size 60416 unit entry size_min 32768 size_max 180224 size_gran 128
    name hash_single size 87040 unit entry size_min 65536 size_max 212992 size_gran 128

In case of a pending change, a resource with nested resources presents an
indication (size_valid) of whether its children's configuration is valid
(the sum of its children's sizes doesn't exceed its own size).

In order for the changes to take place hot reload is needed. The hot
reload through devlink will be introduced in the following patch.
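Take the numbers from the second listing. With linear's size_new of 147456, the kvd children would total 147456 + 60416 + 87040 = 294912. That exceeds kvd's size of 245760, which is consistent with size_valid false. The current sizes (98304 + 60416 + 87040) sum to exactly 245760. The sketch below expresses that check. Its names are hypothetical. The granularity rule is an assumption read off the size_min/size_max/size_gran fields rather than something this log states.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A parent's pending layout is valid when its children's sizes sum to no
 * more than the parent's size. */
static bool children_fit(uint64_t parent, const uint64_t *child, size_t n)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += child[i];
	return sum <= parent;
}

/* Assumed reading of size_min/size_max/size_gran: a requested size must be
 * within range and a multiple of the granularity. */
static bool size_ok(uint64_t size, uint64_t min, uint64_t max, uint64_t gran)
{
	return gran != 0 && size >= min && size <= max && size % gran == 0;
}
```

Under that assumption, both sizes requested in the set examples (73088 for hash_single and 74368 for hash_double) are multiples of 128 within their respective ranges.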

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 08:36:05 -08:00
Arkadi Sharshevsky 049c58539f devlink: mnlg: Add support for extended ack
Add support for extended ack.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 08:36:05 -08:00
Arkadi Sharshevsky 844646a528 devlink: Change empty line indication with indentations
Currently multi-line objects are separated by new-lines. This patch
changes this behavior by using indentations for separation.

Signed-off-by: Arkadi Sharhsevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 08:36:05 -08:00
Masatake YAMATO 97352f1b33 ss: prepare rth when killing inet sock
kill_inet_sock() expects an rtnl_handle instance to be passed
via the inet_diag_arg argument. However, on the following calling path:

    generic_show_sock
    => show_one_inet_sock
       => kill_inet_sock

the rth field of inet_diag_arg is not filled with the address of an
rtnl_handle instance. As a result, ss crashes.

This commit fills the field with a newly created rtnl_handle
instance.

Changes in v2:
Instead of creating rtnl_handle instances for each socket, create
one in an upper layer and reuse it.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 08:32:39 -08:00
Quentin Monnet a883dd8b06 README: re-add updated information link
The "Information" link was removed from README file in commit
d7843207e6 ("README: update location of git repositories, remove
broken info link"), because it redirected to a page that no longer
existed on the Linux Foundation wiki.

This page has just been restored, so we can add the link back again.
Since the previous link was a redirection, use the updated link instead.

Thanks to Luca Boccassi for investigating this issue, restoring and
updating the page.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
2018-02-23 08:19:38 -08:00
Vincent Bernat 1ca4341d2c color: disable color when json output is requested
Instead of declaring -color and -json exclusive, ignore -color when
-json is provided. The rationale is to allow putting -color in an alias
for ip while still being able to use -json. -color is merely a
presentation suggestion and we can assume there is nothing to color in
the JSON output.
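In code, the new rule lets the JSON flag override the color flag instead of rejecting the combination. A minimal sketch follows. The option struct and names are hypothetical, and iproute2's actual globals differ.

```c
#include <stdbool.h>

/* Hypothetical option state. */
struct out_opts {
	bool json;
	bool color;
};

/* -color is only a presentation hint: when JSON output is requested it
 * is silently ignored rather than treated as a conflicting option. */
static bool use_color(const struct out_opts *o)
{
	return o->color && !o->json;
}
```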

Signed-off-by: Vincent Bernat <vincent@bernat.im>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 08:18:33 -08:00
Adam Vyskovsky 2fb854d07c tc: fix an off-by-one error while printing tc actions
The tc_print_action() function did not print all tc actions
when e.g. TCA_ACT_MAX_PRIO actions were defined for a single
tc filter.
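The exact loop in tc_print_action() is not shown in this log. The sketch below illustrates the kind of off-by-one involved. It assumes actions are stored by priority in table slots 1..TCA_ACT_MAX_PRIO, and it mirrors that constant (32 in <linux/pkt_cls.h>) under a hypothetical local name.

```c
#include <stddef.h>

/* Mirrors TCA_ACT_MAX_PRIO from <linux/pkt_cls.h>. */
#define ACT_MAX_PRIO 32

/* Nested actions use attribute types 1..ACT_MAX_PRIO, so the parsed table
 * has ACT_MAX_PRIO + 1 slots with slot 0 unused. The bound must be
 * inclusive; "i < ACT_MAX_PRIO" would skip the last action. */
static int count_actions(const void *tb[ACT_MAX_PRIO + 1])
{
	int i, n = 0;

	for (i = 1; i <= ACT_MAX_PRIO; i++)
		if (tb[i])
			n++;
	return n;
}
```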

Signed-off-by: Adam Vyskovsky <adamvyskovsky@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 08:18:29 -08:00
Timothy Redaelli 7bdd623948 bridge: Prevent a double space in bridge mdb show
Prevent a double space in "bridge mdb show" when the MDB entry is not
marked as "offload".

Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 08:18:18 -08:00
Lubomir Rintel 8f0807023d lib/namespace: don't try to mount rw /sys over a ro one
It will fail with EPERM on Linux 4.15.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 08:18:06 -08:00
Roopa Prabhu 430e05d33f ss: print skmeminfo for packet sockets
before:
$ss --packet -p -m
p_raw    0          0                            *:eth0
          users:(("lldpd",pid=2240,fd=11))

after:
$ss --packet -p -m
p_raw    0          0                            *:eth0
          users:(("lldpd",pid=2240,fd=11))
          skmem:(r0,rb266240,t0,tb266240,f0,w0,o320,bl0,d0)

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-22 14:45:27 -08:00
Leon Romanovsky 486fe5f03c rdma: Add batch command support
Implement an option (-b) to execute RDMAtool commands
from supplied file. This follows the same model as
in use for ip and devlink tools, by expecting
every new command to be on new line.

These commands are expected to be without any -*
(e.g. -d, -j, etc.) global flags, which should be
specified externally.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-22 14:44:46 -08:00
David Ahern 472e59b0eb Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-22 14:43:33 -08:00
Stephen Hemminger 2d165c0811 tc: implement color output
Implement the -color option; in this case -co is ambiguous
since it was already used for -conf.
For now this just means putting device name in color.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-21 09:12:28 -08:00
David Ahern 14f2124a34 Merge branch 'bridge-color-json' into next
Stephen Hemminger says:

====================

From: Stephen Hemminger <sthemmin@microsoft.com>

This set of patches adds color and full JSON support to bridge command.

The output format for bridge link command changes so that
  $ bridge link show
and
  $ ip link show
use same basic format.

The "-c" flag to bridge changes from shortened form of "-compressvlan"
to shortened form of "-color".  Once again this is so that ip
and bridge command take similar options.

Lastly the JSON output format changes slightly but this
could not impact any real user, because in several cases
the current format was invalid JSON!

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-21 08:47:03 -08:00
Stephen Hemminger 4328b687b4 ip: always print interface name in color
Even in brief mode the interface name should be printed
in color if desired. This makes output consistent across
regular and brief mode.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-21 08:42:04 -08:00
Stephen Hemminger 3a1ca9a5b6 bridge: update man page for new color and json changes
Document the color option, and remove the restriction on json.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-21 08:42:02 -08:00
Stephen Hemminger f32e4977dc bridge: add json support for link command
Add json output for bridge link show command and reuse code
from ip command to display interface information.

This also changes the output format slightly for the non JSON case so
that it has same format as the ip link show command.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-21 08:41:38 -08:00
Stephen Hemminger c7c1a1ef51 bridge: colorize output and use JSON print library
Use new functions from json_print to simplify code.
Provide standard flag for colorizing output.

The shortened -c flag is ambiguous: it could mean color or
compressvlan; it is now changed to mean color for consistency
with other iproute2 commands.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-21 08:41:31 -08:00
Stephen Hemminger 01842eb581 bridge: implement json pretty print flag
Make bridge work like other iproute2 commands and accept
same json and pretty flags.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-21 08:41:28 -08:00
Stephen Hemminger 6bfa7a6b0e ip: remove dead code
Remove long dead code (in #if 0) from original iproute2
for numeric names.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-20 16:01:46 -08:00
Stephen Hemminger b68b361b4b ip: don't colorize the master device
Putting whole string "master eth0" in the interface name color
is wrong and confusing. Let's just turn color off for all attributes
of device.

Fixes: d92cc2d087 ("ipaddress: ll_map: Replace ll_idx_n2a() with ll_index_to_name()")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-20 12:16:42 -08:00
Stephen Hemminger a8beadb5f6 uapi: update if_ether compat headers
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-20 10:48:32 -08:00
Sabrina Dubroca 118eda77d6 ip link: add support to display extended tun attributes
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-20 07:19:22 -08:00
David Ahern 07ed8df604 Update kernel headers to 4.16.0-rc2+
Update kernel headers to commit f5c0c6f4299f
("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-19 20:06:04 -08:00
David Ahern 34894a7b96 Merge branch 'print_linkinfo_brief' into next
Serhey Popovych says:

====================

With this series I propose to make print_linkinfo_brief() static in
favor of print_linkinfo() as single point for linkinfo printing.

Changes presented with this series tested using following script:

#!/bin/bash

iproute2_dir="$1"
iface='eth0.2'

pushd "$iproute2_dir" &>/dev/null

for i in new old; do
	DIR="/tmp/$i"
	mkdir -p "$DIR"

	ln -snf ip.$i ip/ip

	# normal
	ip/ip link show                  >"$DIR/ip-link-show"
	ip/ip -4 addr show               >"$DIR/ip-4-addr-show"
	ip/ip -6 addr show               >"$DIR/ip-6-addr-show"
	ip/ip addr show dev "$iface"     >"$DIR/ip-addr-show-$iface"

	# brief
	ip/ip -br link show              >"$DIR/ip-br-link-show"
	ip/ip -br -4 addr show           >"$DIR/ip-br-4-addr-show"
	ip/ip -br -6 addr show           >"$DIR/ip-br-6-addr-show"
	ip/ip -br addr show dev "$iface" >"$DIR/ip-br-addr-show-$iface"
done
rm -f ip/ip

diff -urN /tmp/{old,new} |sed -n -Ee'/^(-{3}|\+{3})[[:space:]]+/!p'
rc=$?

popd &>/dev/null
exit $rc

Expected results : <no output>
Actual results   : <no output>

Although test coverage is far from ideal, in my opinion it covers the most
important aspects of the changes presented by the series.

All this work is done in preparation for iplink_get() enhancements to support
attribute parse that finally will be used to simplify ip/tunnel
RTM_GETLINK code.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-16 08:14:49 -08:00
Serhey Popovych c956e9a934 ipaddress: Make print_linkinfo_brief() static
It shares a lot of code with print_linkinfo(): drop the duplicated part,
change parameters list, make it static and call from print_linkinfo()
after common path.

While there, move SPRINT_BUF() to the function scope from blocks to
avoid duplication and use "%s" to print "\n" to help the compiler optimize
the exit for both print_linkinfo_brief() and normal paths.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-16 08:14:25 -08:00
Serhey Popovych f5b50a18ae utils: Introduce and use print_name_and_link() to print name@link
There are at least three places implementing the same thing: two in
ipaddress.c print_linkinfo() & print_linkinfo_brief() and one in
bridge/link.c.

They diverge from each other very little: bridge/link.c does not
support JSON output at the moment and print_linkinfo_brief() does not
handle the IFLA_LINK_NETNS case.

Introduce and use print_name_and_link() routine to handle name@link
output in all possible variations; respect IFLA_LINK_NETNS attribute to
handle case when link is in different namespace; use ll_idx_n2a() for
interface name instead of "<nil>" to share logic with other code (e.g.
ll_name_to_index() and ll_index_to_name()) supporting such template.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-16 08:14:22 -08:00
Serhey Popovych fcac966526 utils: Introduce and use get_ifname_rta()
Be consistent in handling of the IFLA_IFNAME attribute in all places: if
there is no attribute, report a bug to stderr and use ll_idx_n2a() as a
last resort to get the name in "if%u" format instead of "<nil>".

Use check_ifname() to validate network device name: this catches both
unexpected return from kernel and ll_idx_n2a().

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-16 08:14:20 -08:00
Serhey Popovych 0cec58dac4 lib: Correct object file dependencies
Neither internal libnetlink nor libgenl depends on ll_map.o: prepare for
upcoming changes that bring a much cleaner dependency between
utils.o and ll_map.o.

However ll_map.o depends on libnetlink.o functions so we need to provide
libnetlink.a after libutil.a in LIBNETLINK at global Makefile.

Tested using make clean && make -j4. No problems so far.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-16 08:14:18 -08:00
Serhey Popovych 1bccd1e43b ipaddress: Simplify print_linkinfo_brief() and its usage
Simplify calling code in ipaddr_list_flush_or_save() by introducing
intermediate variable of @struct nlmsghdr, drop duplicated code:
print_linkinfo_brief() never returns values other than <= 0 so we can
move print_selected_addrinfo() outside of each block.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-16 08:14:16 -08:00
Serhey Popovych 9516823051 ipaddress: Improve print_linkinfo()
There are few places to improve:

  1) return -1 instead of zero when an entry is filtered out (zero means
     accept the entry); ipaddr_list_flush_or_save() is the only user of this

  2) use ll_idx_n2a() as a last resort to translate index to name for
     "should never happen" cases when the cache shouldn't be considered

  3) replace open coded access to IFLA_IFNAME attribute data by
     RTA_DATA() with rta_getattr_str()

  4) simplify ifname printing since name is never NULL, thanks to (2).

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-16 08:14:15 -08:00
Serhey Popovych fe269b6e7c utils: Reimplement ll_idx_n2a() and introduce ll_idx_a2n()
Now that all users of ll_idx_n2a() have been replaced with ll_index_to_name(),
we can move its functionality to ll_index_to_name() and implement index to
name conversion using snprintf() and "if%u".

Use the %u specifier in the "if%..." template consistently: network device
indexes are always greater than zero.

Also introduce the ll_idx_n2a() counterpart, ll_idx_a2n(), which is used
to translate name of the "if%u" form to index using sscanf().
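A minimal sketch of the "if%u" round trip described above. sketch_idx_n2a() and sketch_idx_a2n() are hypothetical stand-ins rather than the lib/ll_map.c functions, whose exact signatures may differ; rejecting trailing characters via %n is an extra assumption of this sketch.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for ll_idx_n2a(): format an ifindex as "if<N>".
 * %u is used because valid ifindexes are always > 0. */
static const char *sketch_idx_n2a(unsigned int idx, char *buf, size_t len)
{
	snprintf(buf, len, "if%u", idx);
	return buf;
}

/* Hypothetical stand-in for ll_idx_a2n(): parse "if<N>" back into an index.
 * Returns 0 (never a valid ifindex) when the name is not of that form. */
static unsigned int sketch_idx_a2n(const char *name)
{
	unsigned int idx;
	int end = 0;

	/* %n records how many characters were consumed, so trailing
	 * garbage such as "if3x" can be rejected. */
	if (sscanf(name, "if%u%n", &idx, &end) != 1 || name[end] != '\0')
		return 0;
	return idx;
}
```

Because a valid ifindex is never zero, returning 0 from the parser is an unambiguous "not of the if%u form" result.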

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-16 08:14:13 -08:00
Serhey Popovych d92cc2d087 ipaddress: ll_map: Replace ll_idx_n2a() with ll_index_to_name()
None of the places where ll_idx_n2a() is used need reentrancy or use the
result after a later call, so it is safe to use ll_index_to_name(), which
internally calls ll_idx_n2a() with a static buffer to hold the result.
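To illustrate why the lack of deferred result usage matters, here is a hypothetical wrapper that, like the static-buffer path of ll_index_to_name() described above, formats into one static buffer (the cache lookup the real function performs is omitted). A second call overwrites the first result.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper: always formats into the same static buffer,
 * so the returned pointer is only valid until the next call. */
static const char *sketch_index_to_name(unsigned int idx)
{
	static char buf[16];

	snprintf(buf, sizeof(buf), "if%u", idx);
	return buf;
}
```

So a single printf() that calls such a wrapper twice prints one name twice; call sites that consume each result before the next call are safe.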

While there, print the master network device name using the correct color.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-16 08:14:11 -08:00
Serhey Popovych 17df3d607d ipaddress: Abstract IFA_LABEL matching code
There are at least two places in ip/ipaddress.c where we match IFA_LABEL
against filter.label if one is given.

Get rid of the "common" if () statement for inet_addr_match_rta() and
ifa_label_match_rta(): it is not really common, because the first checks
filter.pfx.family != AF_UNSPEC internally and the second checks that
filter.label is non-NULL.
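A hedged sketch of a self-contained label check in the spirit of ifa_label_match_rta(): the "no filter given" test lives inside the helper, so the caller needs no shared if (). Glob matching via fnmatch(3) is an assumption about how filter.label is compared, and the function name and signature are hypothetical.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical helper: returns 0 to accept the address, -1 to skip it. */
static int sketch_label_match(const char *filter_label, const char *ifa_label)
{
	if (!filter_label)
		return 0;	/* no "label" filter given: accept */
	if (!ifa_label)
		return -1;	/* filter given, but no IFA_LABEL present */
	return fnmatch(filter_label, ifa_label, 0) == 0 ? 0 : -1;
}
```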

This allows us to simplify the code further and prepare for replacing
ll_idx_n2a() with ll_index_to_name() without triggering checkpatch's
80-column warning.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-16 08:14:09 -08:00
Serhey Popovych 5433656705 ip: Use single variable to represent -pretty
After commit a233caa0aa ("json: make pretty printing optional") I get
the following build failure:

    LINK     rtmon
    ../lib/libutil.a(json_print.o): In function `new_json_obj':
    json_print.c:(.text+0x35): undefined reference to `show_pretty'
    collect2: error: ld returned 1 exit status
    make[1]: *** [rtmon] Error 1
    make: *** [all] Error 2

It is caused by the show_pretty variable missing from rtmon.

On the other hand, tc/tc.c has two distinct variables and a single
matches() call that handles the -pretty option, so show_pretty is never
set. Note that since commit 44dcfe8201 ("Change formatting of u32 back
to default") show_pretty has been used in tc/f_u32.c, so this is where
-pretty was first introduced.

Furthermore, other utilities like misc/ifstat.c and misc/nstat.c define
a pretty variable, but only for their own purposes. Both support JSON
output and thus depend on show_pretty in new_json_obj().

Given the above, use a common variable to represent the -pretty option:
define it in utils.c and declare it in the commonly used utils.h. Replace
show_pretty with pretty.

Fixes: a233caa0aa ("json: make pretty printing optional")
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-16 08:13:36 -08:00
David Ahern 3ce21b2d84 Merge branch 'unify-tunnel-endpoint-parsing' into next
Serhey Popovych  says:

====================

Use the get_addr_rta() helper to unify address retrieval from the netlink
message when configuring a tunnel, and get_addr() to parse the endpoint
address into @inet_prefix.

This is the next step towards merging the ip and ipv6 tunnel modules: the
endpoint address parsing code will differ only in the @family constant
passed to get_addr_rta() and get_addr().

====================
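A rough illustration of endpoint parsing that differs only in the family constant. struct sketch_prefix and sketch_get_addr() are simplified, hypothetical stand-ins for @inet_prefix and get_addr(), built on inet_pton(3); the real helpers handle more input forms.

```c
#include <arpa/inet.h>
#include <sys/socket.h>

/* Simplified, hypothetical stand-in for @inet_prefix. */
struct sketch_prefix {
	int family;
	int bytelen;
	int bitlen;
	unsigned char data[16];
};

/* Parse an endpoint address for the given family: 0 on success, -1 on error.
 * Only the bytes used by @family are written to ->data[]. */
static int sketch_get_addr(struct sketch_prefix *p, const char *arg, int family)
{
	if (inet_pton(family, arg, p->data) != 1)
		return -1;
	p->family = family;
	p->bytelen = family == AF_INET6 ? 16 : 4;
	p->bitlen = p->bytelen * 8;
	return 0;
}
```

The ip and ipv6 tunnel code would then call sketch_get_addr(&p, arg, AF_INET) and sketch_get_addr(&p, arg, AF_INET6) respectively, with nothing else differing.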

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-14 09:02:12 -08:00
Serhey Popovych 7bda9fd37a iptnl/ip6tnl: Unify local/remote endpoint and 6rd address parsing
We are going to merge link_iptnl.c and link_ip6tnl.c, and this is the final
step to make their diffs clear and show what needs to be changed during
the merge.

Note that it is safe to omit endpoint address(es) from the netlink create
request, as the kernel is aware of this case and will use zero for the
omitted endpoint(s).

Make sure we initialize the ip6rdprefix and ip6rdrelayprefix bitlen in
link_iptnl.c only when configuring an existing tunnel: if the kernel does
not supply a prefixlen in the corresponding attributes, the preceding
get_addr_rta() call will set bitlen to -1, which is an incorrect value.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-14 09:01:07 -08:00
Serhey Popovych a066cc6623 gre/gre6: Unify local/remote endpoint address parsing
We are going to merge link_gre.c and link_gre6.c, and this is the final step
to make their diffs clear and show what needs to be changed during the merge.

Note that it is safe to omit endpoint address(es) from the netlink create
request, as the kernel is aware of this case and will use zero for the
omitted endpoint(s).

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-14 09:01:03 -08:00
Serhey Popovych 3d4e0db65c vti/vti6: Unify local/remote endpoint address parsing
We are going to merge link_vti.c and link_vti6.c, and this is the final step
to make their diffs clear and show what needs to be changed during the merge.

Note that it is safe to omit endpoint address(es) from the netlink create
request, as the kernel is aware of this case and will use zero for the
omitted endpoint(s).

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-14 09:00:46 -08:00
Serhey Popovych 9cc173d485 utils: Introduce and use inet_prefix_reset()
Initializing @inet_prefix using C initializers or memset() seems
inefficient and unnecessary: only a small part of the ->data[] field is
used to store the address corresponding to ->family.

Instead, initialize only ->flags with zero and assume no other fields are
accessed before checking the corresponding bits in ->flags. For example,
special helpers (e.g. is_addrtype_*()) can be used to ensure that
@inet_prefix contains a valid IPv4 or IPv6 address.
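A minimal sketch of the flags-first approach, using hypothetical names (SKETCH_ADDRTYPE_INET is not the actual flag in utils.h): the reset touches only ->flags, and the accessor checks a flag bit before reading ->family or ->data[].

```c
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

#define SKETCH_ADDRTYPE_INET	0x01	/* ->data[] holds an IPv4/IPv6 address */

struct sketch_prefix {
	unsigned int flags;
	int family;
	int bitlen;
	unsigned char data[16];
};

/* Cheap reset: clear only the flags; other fields are "don't care"
 * until a flag says they are valid. */
static void sketch_prefix_reset(struct sketch_prefix *p)
{
	p->flags = 0;
}

/* Accessor in the spirit of is_addrtype_*(): check flags before ->family. */
static bool sketch_is_inet(const struct sketch_prefix *p)
{
	return (p->flags & SKETCH_ADDRTYPE_INET) &&
	       (p->family == AF_INET || p->family == AF_INET6);
}

/* Store a parsed IPv4 address (network byte order) and mark it valid. */
static void sketch_set_ipv4(struct sketch_prefix *p, const unsigned char addr[4])
{
	p->family = AF_INET;
	memcpy(p->data, addr, 4);
	p->bitlen = 32;
	p->flags |= SKETCH_ADDRTYPE_INET;
}
```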

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-14 09:00:26 -08:00
Phil Sutter 8a237420f2 Remove leftovers from removed Latex documentation
Since there is no documentation in LaTeX format left, there is no need
to check for the commands to build it. There is also no need to ignore
any of the temporary files they created.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2018-02-13 16:43:19 -08:00
Quentin Monnet d7843207e6 README: update location of git repositories, remove broken info link
Reflect the recent change of location for the git repositories, and the
creation of the -next development repo, in README and README.devel.

Also remove the link to the Linux Foundation wiki that contained
information about iproute2. The link is now broken, and I did not find
any alternative page to point to.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: David Ahern <dsahern@gmail.com>
2018-02-13 16:42:51 -08:00
Stephen Hemminger 766fa4ac33 include: update rdma header from 4.16-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-13 16:42:00 -08:00
David Ahern 5a809da4f5 Merge branch 'iproute-json-color' into next
Stephen Hemminger  says:

====================

From: Stephen Hemminger <stephen@networkplumber.org>

This set of patches adds JSON output to route printing.
Tested for the simple cases, but there are many variations,
such as lw tunnels, which have not been tested.

The color formatting may need some additional tweaks. It looks
like for some tags the tag itself is also showing up in color.
This should be fixed in print_color_string rather than having
to do special-case handling in so many places.

This patchset also changes the default JSON output to be compact
(since the purpose of JSON is to make output machine readable),
with optional pretty-print formatting via the -p flag.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-10 08:24:57 -08:00
Stephen Hemminger 663c3cb231 iproute: implement JSON and color output
Add JSON and color output formatting to the ip route command,
similar to the existing address and link output.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-10 08:20:39 -08:00
Stephen Hemminger 6cbd9465bc json: fix newline at end of array
The json print library was toggling pretty print at the end of
an array to work around a bug in the underlying json_writer.
Instead, just fix json_writer to pretty-print arrays correctly.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-10 08:18:49 -08:00
Stephen Hemminger bff0f25241 man: add documentation for json and pretty flags
Add description for -json and -pretty options.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-10 08:16:14 -08:00
Stephen Hemminger a233caa0aa json: make pretty printing optional
Since JSON is intended for programmatic consumption, it makes
sense for the default output format to be as concise as possible.

For programmers and other uses, it is helpful to keep the pretty
whitespace format, so enable it with the -p flag.
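A toy illustration of the two output modes, written against a hypothetical buffer-based writer rather than the real json_writer API: compact by default, with newlines and indentation (including the newline before an array's closing bracket, the case addressed by "json: fix newline at end of array") only when pretty printing is requested.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Append @s at buf[*off]; silently truncates when the buffer is full.
 * Requires *off < len on entry. */
static void sketch_append(char *buf, size_t len, size_t *off, const char *s)
{
	int n = snprintf(buf + *off, len - *off, "%s", s);

	if (n > 0)
		*off += ((size_t)n < len - *off) ? (size_t)n : len - *off - 1;
}

/* Emit an array of unsigned ints as JSON, compact or pretty. */
static const char *sketch_json_uint_array(char *buf, size_t len,
					  const unsigned int *v, size_t n,
					  bool pretty)
{
	size_t off = 0;
	char num[16];

	buf[0] = '\0';
	sketch_append(buf, len, &off, "[");
	for (size_t i = 0; i < n; i++) {
		if (i)
			sketch_append(buf, len, &off, ",");
		if (pretty)
			sketch_append(buf, len, &off, "\n    ");
		snprintf(num, sizeof(num), "%u", v[i]);
		sketch_append(buf, len, &off, num);
	}
	/* In pretty mode, end the last element's line before the bracket. */
	if (pretty && n)
		sketch_append(buf, len, &off, "\n");
	sketch_append(buf, len, &off, "]");
	return buf;
}
```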

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-10 08:15:08 -08:00
Serhey Popovych db2b8b6ef0 ip: Use print_0xhex() where appropriate
gre/gre6 use the 0x%x format for non-JSON output: use print_0xhex()
to get the same format for JSON.

Get rid of the custom _print_hex() in the bridge slave code: print_0xhex()
does the job just as well.

Break long print_uint() calls with long argument lists to fit into 80 columns.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-09 08:05:30 -08:00
Serhey Popovych 5bd937957d ip/tunnel: Minor cleanups
A few minor changes to reduce diffs between the ip and ipv6 tunnel code:

  1) reduce indentation by one level when adding attributes in gre and
     gre6; reorder addattr*() calls to simplify the diff

  2) reorder local variable definitions; change their types (e.g. for
     IFLA_LINK) to match the ones returned by rta_getattr_*()

  3) move "mode" parameter parsing in link_iptnl.c to the same
     position as in link_ip6tnl.c

  4) handle "tc" as a shortcut for "tclass"/"tos" in link_iptnl.c

  5) add whitespace where required

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-09 08:05:18 -08:00
David Ahern 8e7548462e Merge branch 'unify_tunnel_help' into next
Serhey Popovych  says:

====================

To show only the relevant diffs between the ip and ipv6 variants, the help
message printing routines need to be unified and improved.

Get rid of the print_usage() and usage() wrappers: use a single function to
output the help message. As a side effect, we return -1 from the parse
function instead of calling exit(2) when "... tunnel <help|garbage>" is
found.

Additionally, we get a pointer to @struct link_util and can directly access
the ->id information to prepare a customized help message.

Split calls to fprintf() into two groups: one with format strings that
contain specifiers (thus requiring parameters) and another without.
This helps the compiler replace fprintf() calls with fputs() when the
format string has no specifiers. Do not use fputs() directly, to keep the
code formatting nice.

With this series applied, the following diffs:

  # diff -urN ip/link_gre{,6}.c
  # diff -urN ip/link_vti{,6}.c
  # diff -urN ip/link_ip{,6}tnl.c

are reduced to the necessary minimum in the scope of help printing routines.

Tested minimally by compiling and executing "ip link help <kind>" and
"ip link add type help" commands. Looks correct.

See individual patch descriptions for more information.

Reviews, comments and suggestions are welcome.

====================
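A hedged sketch of the pattern described in this cover letter: one help routine parameterized by the link kind, and a parser that returns -1 on "help" or garbage instead of exiting. struct sketch_link_util is a hypothetical stand-in for @struct link_util, and the usage text is illustrative only.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct link_util: only ->id is needed here. */
struct sketch_link_util {
	const char *id;
};

/* One help routine shared by e.g. "gre" and "ip6gre"; ->id picks the kind. */
static void sketch_print_help(const struct sketch_link_util *lu, FILE *f)
{
	/* Has a conversion specifier, so it must stay fprintf(). */
	fprintf(f, "Usage: ... %s [ remote ADDR ] [ local ADDR ]\n", lu->id);
	/* No specifiers: compilers may turn this call into fputs(). */
	fprintf(f, "Where: ADDR := IP_ADDRESS\n");
}

/* Returns -1 on "help" or an unknown keyword instead of calling exit(2). */
static int sketch_parse_opt(const struct sketch_link_util *lu,
			    int argc, char **argv)
{
	while (argc > 0) {
		if ((strcmp(*argv, "remote") == 0 ||
		     strcmp(*argv, "local") == 0) && argc > 1) {
			argc--, argv++;	/* skip the address (parsing omitted) */
		} else {
			sketch_print_help(lu, stderr);
			return -1;
		}
		argc--, argv++;
	}
	return 0;
}
```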

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-09 08:04:39 -08:00
Serhey Popovych 06e3975f4c iptnl/ip6tnl: Unify iptunnel_print_help()
Reduce diff lines between the iptnl and ip6tnl help printing code.

Use the @struct link_util ->id field to print the correct link help: all
callers now pass this data structure to iptunnel_print_help().

Get rid of the custom print_usage() and usage() functions and use
iptunnel_print_help() directly; return from the function on "... type
<help|garbage>" instead of calling exit(2).

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-09 08:04:23 -08:00
Serhey Popovych ae91205c4d gre/gre6: Unify gre_print_help()
Reduce diff lines between the gre and gre6 help printing code.

Use the @struct link_util ->id field to print the correct link help: all
callers now pass this data structure to gre_print_help().

Get rid of the custom print_usage() and usage() functions and use
gre_print_help() directly; return from the function on "... type
<help|garbage>" instead of calling exit(2).

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-09 08:04:19 -08:00
Serhey Popovych 4aa552eac1 vti/vti6: Unify vti_print_help()
Reduce diff lines between the vti and vti6 help printing code.

Use the @struct link_util ->id field to print the correct link help: all
callers now pass this data structure to vti_print_help().

Get rid of the custom print_usage() and usage() functions and use
vti_print_help() directly; return from the function on "... type
<help|garbage>" instead of calling exit(2).

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-09 08:04:05 -08:00
Christian Brauner 375d51caaa netns: allow negative nsid
If the kernel receives a negative nsid, it will automatically assign
the next available nsid. In this case alloc_netid() will set min and
max to 0 for idr_alloc(). When max == 0, idr_alloc() interprets this
as the maximum range, i.e. for nsids it will try to find an id in the
range [0, INT_MAX). This is intentionally supported in the kernel for
nsids.

Commit acbe9118ce ("ip netns: use strtol() instead of atoi()")
regressed ip netns in that respect, although previously the use case
was supported, either accidentally or opaquely, in a way that
triggered the original commit. From what I can gather, it previously
worked as follows: atoi() was called with a string representing a
negative value, which caused it to return -1, which was passed to the
kernel. Let's make this less opaque by introducing the keyword "auto":

ip netns set <netns-name> auto

will cause nsid to be set to -1 and the kernel will select an available
nsid.
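A hedged sketch of the argument handling: "auto" becomes -1 so the kernel picks the nsid, while numbers go through strtol() with full validation. Rejecting explicit negative numbers is an assumption here, and sketch_parse_nsid() is a hypothetical name, not the function in ip/ipnetns.c.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser for the nsid argument of "ip netns set NAME NSID".
 * Returns 0 and stores the nsid on success, -1 on invalid input. */
static int sketch_parse_nsid(const char *arg, int *nsid)
{
	char *end;
	long v;

	if (strcmp(arg, "auto") == 0) {
		*nsid = -1;	/* negative: let the kernel pick a free nsid */
		return 0;
	}

	errno = 0;
	v = strtol(arg, &end, 0);
	if (errno || end == arg || *end != '\0' || v < 0 || v > INT_MAX)
		return -1;	/* garbage, overflow or explicit negative */
	*nsid = (int)v;
	return 0;
}
```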

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-08 07:57:34 -08:00
Stephen Hemminger 432e5f97be iproute: make flush a separate function
Minor refactoring to move flush into separate function to improve
readability and reduce depth of nesting.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:28:23 -08:00
Stephen Hemminger cc5608250a iproute: don't do assignment in condition
Fix checkpatch complaints about assignment in conditions.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:28:19 -08:00
Stephen Hemminger 80d1c528b9 iproute: whitespace fixes
Add whitespace around operators for consistency.
Use tabs for indentation.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:28:11 -08:00
David Ahern 58cf7b6759 Merge branch 'dev_walk' into iproute2-next
Serhey Popovych  says:

====================

In this series I replace the use of /proc/net/dev and /sys/class/net for
walking the network device list in iptunnel/ip6tunnel and iptuntap with a
netlink dump.

The following changed since the RFC was sent:

  1) Treat @struct rtnl_link_stats and @struct rtnl_link_stats64 as
     arrays of __u32 and __u64 elements respectively in
     copy_rtnl_link_stats64(), as suggested by Stephen Hemminger.

  2) Remove the @name and @size parameters from @struct tnl_print_nlmsg_info
     since we can get them easily from other data.

Testing.
========

The following script was used to ensure I didn't break things too much:

#!/bin/bash

iproute2_dir="$1"
iface='gre1'

pushd "$iproute2_dir" &>/dev/null

for i in new old; do
	DIR="/tmp/$i"
	mkdir -p "$DIR"

	ln -snf ip.$i ip/ip

	for o in '' -s -d; do
		ip/ip $o tunnel show           >"$DIR/ip${o}-tunnel-show"
		ip/ip -4 $o tunnel show        >"$DIR/ip-4${o}-tunnel-show"
		ip/ip -6 $o tunnel show        >"$DIR/ip-6${o}-tunnel-show"
		ip/ip $o tunnel show dev "$iface" \
			>"$DIR/ip${o}-tunnel-show-$iface"
		ip/ip $o tuntap show           >"$DIR/ip${o}-tuntap-show"
	done
done
rm -f ip/ip

diff -urN /tmp/{old,new} |sed -n -Ee'/^(-{3}|\+{3})[[:space:]]+/!p'
rc=${PIPESTATUS[0]}	# exit status of diff, not of sed

popd &>/dev/null
exit $rc

Results:
========

...
fopen /sys/class/net/ipip1/tun_flags: No such file or directory
fopen /sys/class/net/ipip2/tun_flags: No such file or directory
fopen /sys/class/net/gre10/tun_flags: No such file or directory
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note that this comes from ip.old
...
diff -urN /tmp/old/ip-d-tuntap-show /tmp/new/ip-d-tuntap-show
@@ -1,4 +1,4 @@
-tun1: tap user 1004 group 27
-	Attached to processes:
 tun0: tun user 1000 group 27
 	Attached to processes:
+tun1: tap user 1004 group 27
+	Attached to processes:
diff -urN /tmp/old/ip-s-tuntap-show /tmp/new/ip-s-tuntap-show
@@ -1,2 +1,2 @@
-tun1: tap user 1004 group 27
 tun0: tun user 1000 group 27
+tun1: tap user 1004 group 27
diff -urN /tmp/old/ip-tuntap-show /tmp/new/ip-tuntap-show
@@ -1,2 +1,2 @@
-tun1: tap user 1004 group 27
 tun0: tun user 1000 group 27
+tun1: tap user 1004 group 27

So basically only the print order for ip tuntap changes. The rest is intact.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:19:12 -08:00
Serhey Popovych 8960d45fba tuntap: Use netlink to walk through tuntap list
It seems a bad idea to depend on sysfs being mounted and reflecting the
current network namespace. The same applies to procfs.

Instead, netlink should be used to talk to the kernel and get the list of
specific network devices along with their parameters.

Support kernel netlink message filtering by passing IFLA_INFO_KIND
in the RTM_GETLINK request: if the kernel does not support filtering by
kind, we check it in the reply anyway. Check that ifi->ifi_type is either
ARPHRD_NONE or ARPHRD_ETHER to speed things up a bit without kernel-level
filtering.

Unfortunately, the tun driver does not implement dumping its configuration
via netlink, so we still need to use read_prop(), which depends on sysfs,
to get additional tun device information.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:15:47 -08:00
Serhey Popovych 3e95393871 iptunnel/ip6tunnel: Use netlink to walk through tunnels list
Both tunnels use the legacy /proc/net/dev interface to get tunnel devices
and their statistics. This may cause problems when procfs is either
not mounted or not unshare(2)d for the given network namespace.

Use netlink to walk through the list of tunnel devices: it is network
namespace aware and provides additional information, such as statistics,
in the dump message.

Since both address-family-specific variants of do_tunnels_list() are nearly
the same, except for tunnel parameter structure initialization,
matching and printing, we can introduce a common one in tunnel.c.

To implement the address-family-specific parts, introduce a new data
structure, @struct tnl_print_nlmsg_info, which contains all necessary
information as well as pointers to the ->init(), ->match() and ->print()
callbacks.
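A simplified sketch of the callback structure: a family-independent walker calls per-family ->init(), ->match() and ->print() hooks. The types and names here are hypothetical and much smaller than the real @struct tnl_print_nlmsg_info, and the walker iterates a plain array of names where the real code would process an RTM_GETLINK dump.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical, heavily simplified tunnel parameters. */
struct sketch_parm {
	char name[16];
	int family;
};

/* Per-family hooks, in the spirit of @struct tnl_print_nlmsg_info. */
struct sketch_tnl_print_info {
	void (*init)(struct sketch_parm *p);
	int (*match)(const struct sketch_parm *want, const struct sketch_parm *p);
	void (*print)(const struct sketch_parm *p);
};

static void sketch_ip4_init(struct sketch_parm *p)
{
	memset(p, 0, sizeof(*p));
	p->family = AF_INET;
}

/* An empty name in @want matches every tunnel of the same family. */
static int sketch_ip4_match(const struct sketch_parm *want,
			    const struct sketch_parm *p)
{
	return want->family == p->family &&
	       (!want->name[0] || strcmp(want->name, p->name) == 0);
}

static void sketch_ip4_print(const struct sketch_parm *p)
{
	printf("%s\n", p->name);
}

static const struct sketch_tnl_print_info sketch_ip4_info = {
	.init = sketch_ip4_init,
	.match = sketch_ip4_match,
	.print = sketch_ip4_print,
};

/* Family-independent walker; returns the number of tunnels printed. */
static int sketch_tunnels_list(const struct sketch_tnl_print_info *info,
			       const struct sketch_parm *want,
			       const char *const *names, size_t n)
{
	int shown = 0;

	for (size_t i = 0; i < n; i++) {
		struct sketch_parm p;

		info->init(&p);
		snprintf(p.name, sizeof(p.name), "%s", names[i]);
		if (!info->match(want, &p))
			continue;
		info->print(&p);
		shown++;
	}
	return shown;
}
```

An IPv6 variant would supply its own three callbacks and reuse sketch_tunnels_list() unchanged.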

Annotate data structures with const where appropriate.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:15:42 -08:00
Serhey Popovych ec89594080 iptunnel/ip6tunnel: Code cleanups
Use switch () instead of if () to compare the tunnel type, to fit into 80
columns and make the code more readable. Print "\n" using fputc().

In iptunnel.c, abstract the tunnel parameter matching code into an
ip_tunnel_parm_match() helper to conform with ip6tunnel.c. Use
memset() to initialize @p1.

In ip6tunnel.c, there is no need to call ll_name_to_index() with the name
twice: just use the previously found index. Do not initialize @p1: this is
done in ip6_tnl_parm_init().

This is to show the real differences between the ip and ipv6
do_tunnels_list() implementations and prepare for their upcoming
unification.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:15:37 -08:00
Serhey Popovych affb361785 tunnel: Split statistic getting and printing
This is the first step in moving the tunnel code to the rtnl dump
interface instead of reading /proc/net/dev.

Make tnl_print_stats() accept a @struct rtnl_link_stats64 parameter, and
introduce tnl_get_stats(), which parses a line from /proc/net/dev into
@struct rtnl_link_stats64.
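A sketch of what a tnl_get_stats()-style parser might look like, assuming the usual 16-column /proc/net/dev layout (8 receive and 8 transmit counters after "NAME:"). The function name and the column-to-field mapping are assumptions of this sketch; struct rtnl_link_stats64 comes from <linux/if_link.h>.

```c
#include <stdio.h>
#include <string.h>
#include <linux/if_link.h>	/* struct rtnl_link_stats64 */

/* Hypothetical parser for one /proc/net/dev line:
 * "  NAME: rx_bytes rx_packets rx_errs rx_drop rx_fifo rx_frame
 *   rx_compressed rx_multicast tx_bytes tx_packets tx_errs tx_drop
 *   tx_fifo tx_colls tx_carrier tx_compressed"
 * Returns 0 on success, -1 on parse error. */
static int sketch_get_stats(const char *line, char *name, size_t namelen,
			    struct rtnl_link_stats64 *s)
{
	unsigned long long v[16];
	const char *colon;
	size_t len;

	while (*line == ' ')
		line++;
	colon = strchr(line, ':');
	if (!colon)
		return -1;
	len = (size_t)(colon - line);
	if (len == 0 || len >= namelen)
		return -1;
	memcpy(name, line, len);
	name[len] = '\0';

	if (sscanf(colon + 1,
		   "%llu %llu %llu %llu %llu %llu %llu %llu "
		   "%llu %llu %llu %llu %llu %llu %llu %llu",
		   &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6], &v[7],
		   &v[8], &v[9], &v[10], &v[11], &v[12], &v[13], &v[14],
		   &v[15]) != 16)
		return -1;

	memset(s, 0, sizeof(*s));
	s->rx_bytes = v[0];
	s->rx_packets = v[1];
	s->rx_errors = v[2];
	s->rx_dropped = v[3];
	s->rx_fifo_errors = v[4];
	s->rx_frame_errors = v[5];
	s->rx_compressed = v[6];
	s->multicast = v[7];
	s->tx_bytes = v[8];
	s->tx_packets = v[9];
	s->tx_errors = v[10];
	s->tx_dropped = v[11];
	s->tx_fifo_errors = v[12];
	s->collisions = v[13];
	s->tx_carrier_errors = v[14];
	s->tx_compressed = v[15];
	return 0;
}
```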

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:15:33 -08:00
Serhey Popovych 9a7bd5442b ip: Introduce get_rtnl_link_stats_rta() to get link statistics
Assume all statistics in ip(8), whether represented by IFLA_STATS64 or
IFLA_STATS, are 64-bit. Clearly, we can store the __u32 counters of
@struct rtnl_link_stats in the __u64 counters of @struct rtnl_link_stats64.

The new get_rtnl_link_stats_rta() follows __print_link_stats() behaviour
in handling the stats attribute: copy no more than the size of the data
structure and no more than the attribute length, zeroing the rest.

Drop print_link_stats32(), as its functionality can be handled by the
64-bit variant. Move code from __print_link_stats() to print_link_stats64()
and finally rename print_link_stats64() to __print_link_stats().

More users of the introduced function will come in the future.
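A sketch of the widening copy and the copy rule described above, treating both UAPI structs as arrays of __u32 and __u64 counters, as the dev_walk cover letter suggests. It assumes both structs declare their counters in the same order and that struct rtnl_link_stats has no more counters than struct rtnl_link_stats64; the function names are hypothetical, and the payload pointer and length stand in for RTA_DATA() and RTA_PAYLOAD() of the attribute.

```c
#include <string.h>
#include <linux/types.h>
#include <linux/if_link.h>	/* struct rtnl_link_stats{,64} */

/* Widen each __u32 counter into the __u64 counter at the same position. */
static void sketch_widen_stats(struct rtnl_link_stats64 *s64,
			       const struct rtnl_link_stats *s32)
{
	__u64 *dst = (__u64 *)s64;
	const __u32 *src = (const __u32 *)s32;
	const __u32 *end = src + sizeof(*s32) / sizeof(*src);

	while (src < end)
		*dst++ = *src++;
}

/* Copy at most sizeof(struct) bytes and at most @plen bytes of payload;
 * anything the kernel did not send stays zero. */
static void sketch_stats_from_attr(struct rtnl_link_stats64 *s64,
				   const void *payload, size_t plen, int is64)
{
	memset(s64, 0, sizeof(*s64));
	if (is64) {
		memcpy(s64, payload, plen < sizeof(*s64) ? plen : sizeof(*s64));
	} else {
		struct rtnl_link_stats s32;

		memset(&s32, 0, sizeof(s32));
		memcpy(&s32, payload, plen < sizeof(s32) ? plen : sizeof(s32));
		sketch_widen_stats(s64, &s32);
	}
}
```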

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:15:28 -08:00
Serhey Popovych fb7b816827 ipaddress: Unify print_link_stats() and print_link_stats64()
To show the real differences between these two variants, adjust whitespace
indentation and use print_uint() instead of print_int(), as all members
of both @struct rtnl_link_stats and @struct rtnl_link_stats64 are
unsigned.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:15:04 -08:00
David Ahern 306d616093 Merge branch 'route_print_refactor' into iproute2-next
Stephen Hemminger  says:

====================

This patch set breaks up the big print_route function into
smaller pieces for readability and to make later changes
to support JSON and color output easier.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:12:52 -08:00
Stephen Hemminger 1506d8d3f8 iproute: refactor printing of interface
For JSON and colorization, make common code a function.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:08:44 -08:00
Stephen Hemminger f48e14880a iproute: refactor multipath print
Make printing of multipath attributes a function to improve
readability.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:08:39 -08:00
Stephen Hemminger a3484a9f20 iproute: refactor newdst, gateway and via printing
Since these fields are printed in both the route and multipath cases,
avoid duplicating code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:08:34 -08:00
Stephen Hemminger 5782965a1e iproute: refactor printing flow info
Use common code for printing flow info.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:08:28 -08:00
Stephen Hemminger 968272e791 iproute: refactor metrics print
Make a separate function to improve readability and enable
easier JSON conversion.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:08:22 -08:00
Stephen Hemminger 6e41810e1b iproute: refactor cacheinfo printing
Make a common function for decoding cacheinfo.
This code may print more info than the old version in some cases.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:08:16 -08:00
Stephen Hemminger daf30f6fde iproute: make printing IPv4 cache flags a function
More refactoring prior to JSON support.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:08:10 -08:00
Stephen Hemminger 8cfc2d4739 iproute: make printing icmpv6 a function
Refactor to reduce size of print_route and improve
readability.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:08:04 -08:00
Stephen Hemminger b3ab1e68e7 iproute: refactor printing flags
Both next hop and route need to decode flags.
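A sketch of a single flags decoder that both the route and the nexthop printers could share. The helper name is hypothetical and the table covers only a subset of the RTNH_F_* bits from <linux/rtnetlink.h>; the exact words ip route prints may differ.

```c
#include <stdio.h>
#include <string.h>
#include <linux/rtnetlink.h>

/* Hypothetical shared decoder for route and nexthop flags. */
static const char *sketch_nh_flags(unsigned int flags, char *buf, size_t len)
{
	static const struct {
		unsigned int bit;
		const char *name;
	} tbl[] = {
		{ RTNH_F_DEAD,      "dead" },
		{ RTNH_F_PERVASIVE, "pervasive" },
		{ RTNH_F_ONLINK,    "onlink" },
		{ RTNH_F_OFFLOAD,   "offload" },
		{ RTNH_F_LINKDOWN,  "linkdown" },
	};
	size_t off = 0;

	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++) {
		int n;

		if (!(flags & tbl[i].bit))
			continue;
		n = snprintf(buf + off, len - off, "%s%s",
			     off ? " " : "", tbl[i].name);
		if (n < 0 || (size_t)n >= len - off)
			break;	/* buffer full; snprintf kept it NUL-terminated */
		off += (size_t)n;
	}
	return buf;
}
```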

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-07 16:07:44 -08:00
Leon Romanovsky 5f8265536f rdma: Check return value of strdup call
Fixes: 74bd75c2b6 ("rdma: Add basic infrastructure for RDMA tool")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-05 17:23:52 -08:00
Leon Romanovsky 860676b424 rdma: Document resource tracking
Spartan version of resource tracking documentation.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-05 17:23:52 -08:00
Leon Romanovsky 8ecac46a60 rdma: Add QP resource tracking information
This patch adds an ss-like interface to view various tracked
resource objects. At this stage, only QPs are presented.

1. Get all QPs for the specific device:
$ rdma res show qp link mlx5_4
link mlx5_4/- lqpn 8 type UD state RESET sq-psn 0 pid 0 comm [ib_ipoib]
link mlx5_4/1 lqpn 7 type UD state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 1 type GSI state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 0 type SMI state RTS sq-psn 0 pid 0 comm [ib_core]

$ rdma res show qp link mlx5_4/
link mlx5_4/- lqpn 8 type UD state RESET sq-psn 0 pid 0 comm [ib_ipoib]
link mlx5_4/1 lqpn 7 type UD state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 1 type GSI state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 0 type SMI state RTS sq-psn 0 pid 0 comm [ib_core]

2. Provide illegal port number (0 is illegal):
$ rdma res show qp link mlx5_4/0
Wrong device name

3. Get QPs of specific port:
$ rdma res show qp link mlx5_4/1
link mlx5_4/1 lqpn 7 type UD state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 1 type GSI state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 0 type SMI state RTS sq-psn 0 pid 0 comm [ib_core]

4. Get QPs which have not been assigned a port yet:
link mlx5_4/- lqpn 8 type UD state RESET sq-psn 0 pid 0 comm [ib_ipoib]

5. Limit to specific Local QPNs:
$ rdma res show qp link mlx5_4/1 lqpn 1-3,7
link mlx5_4/1 lqpn 7 type UD state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 1 type GSI state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 0 type SMI state RTS sq-psn 0 pid 0 comm [ib_core]

6. Filter types (strings):
$ rdma res show qp link mlx5_4/1 type UD,gSi
link mlx5_4/1 lqpn 7 type UD state RTS sq-psn 0 pid 0 comm [ib_core]
link mlx5_4/1 lqpn 1 type GSI state RTS sq-psn 0 pid 0 comm [ib_core]

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-05 17:23:52 -08:00
Leon Romanovsky 923aa825ff rdma: Add resource tracking summary
Show global resource summary information. The object names, current
utilization and maximum numbers are received as-is from the kernel.

$ rdma res
1: mlx5_0: pd 3 cq 5 qp 4
2: mlx5_1: pd 3 cq 5 qp 4
3: mlx5_2: pd 3 cq 5 qp 4
4: mlx5_3: pd 2 cq 3 qp 2
5: mlx5_4: pd 3 cq 5 qp 4

$ rdma res show mlx5_4
5: mlx5_4: pd 3 cq 5 qp 4

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-05 17:23:52 -08:00
Leon Romanovsky 684d82094b rdma: Allow external usage of compare string routine
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-05 17:23:52 -08:00
Leon Romanovsky 6e0de6e886 rdma: Set pointer to device name position
The dev and link execution callbacks expect that the next
command line argument is a device or port name.

Set the pointer to the device or port name position prior to calling
rd_exec_dev()/rd_exec_link().

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-05 17:23:52 -08:00
Leon Romanovsky 1174be72d1 rdma: Add filtering infrastructure
This patch adds general infrastructure to RDMAtool to handle various
filtering options needed for the downstream resource tracking patches.

The infrastructure is generic and stores filters in a list of key<->value
entries. There are three types of filters:

1. Numeric - the values are intended to be digits combined with '-' to
mark a range and ',' to separate multiple entries, e.g. pid 1-100,234,400-401
is a perfectly legitimate filter to limit process ids.

2. String - the values consist of strings with "," as a delimiter.

3. Link - special case that allows '/' in the string to provide a link
name, e.g. link mlx4_1/2.
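A hedged sketch of matching against the numeric and string filter types listed above; the helper names are hypothetical, and the link type is omitted. Case-insensitive string matching is inferred from the "type UD,gSi" example in the QP tracking commit, not from the filter code itself.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>	/* strncasecmp() */

/* Numeric filter: "1-100,234,400-401" style ranges and single values. */
static bool sketch_num_match(const char *spec, unsigned long v)
{
	const char *p = spec;

	while (*p) {
		char *end;
		unsigned long lo, hi;

		lo = strtoul(p, &end, 10);
		if (end == p)
			return false;	/* malformed entry */
		hi = lo;
		if (*end == '-') {
			p = end + 1;
			hi = strtoul(p, &end, 10);
			if (end == p)
				return false;
		}
		if (v >= lo && v <= hi)
			return true;
		if (*end == ',')
			end++;
		else if (*end != '\0')
			return false;
		p = end;
	}
	return false;
}

/* String filter: comma-separated, case-insensitive ("UD,gSi" matches "GSI"). */
static bool sketch_str_match(const char *spec, const char *s)
{
	size_t n = strlen(s);
	const char *p = spec;

	for (;;) {
		const char *comma = strchr(p, ',');
		size_t len = comma ? (size_t)(comma - p) : strlen(p);

		if (len == n && strncasecmp(p, s, n) == 0)
			return true;
		if (!comma)
			return false;
		p = comma + 1;
	}
}
```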

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-05 17:23:52 -08:00
Leon Romanovsky 5f4892e2c8 rdma: Make visible the number of arguments
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-05 17:23:52 -08:00
Leon Romanovsky 6416d1a01f rdma: Add option to provide "-" sign for the port number
According to the IBTA spec [1], the physically connected port is provided
for the QP in the RTR-to-INIT stage performed by modify_qp(). As a result,
newly created QPs do not have a port number.

This patch adds a "-" sign to represent the absence of a port, because
QPs are going to be associated with the rdmatool link object, which needs
a port number as an index.
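A hypothetical parser for the link argument, following the behaviour shown in the examples of the QP tracking commit: "mlx5_4" and "mlx5_4/" select all ports, "mlx5_4/1" a specific port, "mlx5_4/-" QPs without a port, and "mlx5_4/0" is rejected. The numeric encoding (-1 for all ports, 0 for no port) is an assumption of this sketch, not rdmatool's internal representation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser for "DEV[/PORT]". On success stores the device name
 * and sets *port to -1 (no port given: all ports), 0 ("-": QP has no port
 * yet) or the port number (>= 1). Returns -1 on error, including port 0. */
static int sketch_parse_link(const char *arg, char *dev, size_t devlen,
			     long *port)
{
	const char *slash = strchr(arg, '/');
	size_t n = slash ? (size_t)(slash - arg) : strlen(arg);
	char *end;

	if (n == 0 || n >= devlen)
		return -1;
	memcpy(dev, arg, n);
	dev[n] = '\0';

	if (!slash || slash[1] == '\0') {
		*port = -1;
		return 0;
	}
	if (strcmp(slash + 1, "-") == 0) {
		*port = 0;
		return 0;
	}
	*port = strtol(slash + 1, &end, 10);
	if (*end != '\0' || *port <= 0)
		return -1;
	return 0;
}
```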

[1] InfiniBand Architecture Release 1.3 -
	"Table 96 QP State Transition Properties"

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-05 17:23:52 -08:00
Stephen Hemminger c30c7b6e35 include: update UAPI types.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-05 17:21:27 -08:00
Stephen Hemminger 25226777b8 include: update interface UAPI from 4.15-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-05 17:21:01 -08:00
Stephen Hemminger a0fc63ed68 include: update rdma uapi from 4.15-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-05 17:20:14 -08:00
Stephen Hemminger d707207f4d include: update netfilter headers from 4.15-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-05 17:19:32 -08:00
Stephen Hemminger d857a7fd4b include: update uapi with BPF from 4.15-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-05 17:18:53 -08:00
Serhey Popovych c14f9d92ee treewide: Use addattr_nest()/addattr_nest_end() to handle nested attributes
We have helper routines to support adding nested attributes to a
netlink buffer: use them instead of open coding.

Use addattr_nest_compat()/addattr_nest_compat_end() where appropriate.
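A self-contained sketch of what such nest helpers do, using hypothetical names (sketch_nest()/sketch_nest_end()) rather than the real addattr_nest()/addattr_nest_end() signatures: opening a nest appends an empty attribute, and closing it rewrites its rta_len to cover everything appended since. The example builds an RTM_GETLINK dump request carrying IFLA_LINKINFO { IFLA_INFO_KIND }, the kind of kernel-side filter the tuntap change describes.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

/* Tail of the message, where the next attribute will be written. */
#define SKETCH_TAIL(n) \
	((struct rtattr *)((char *)(n) + NLMSG_ALIGN((n)->nlmsg_len)))

static struct rtattr *sketch_addattr(struct nlmsghdr *n, size_t maxlen,
				     unsigned short type,
				     const void *data, size_t alen)
{
	size_t len = RTA_LENGTH(alen);
	struct rtattr *rta;

	if (NLMSG_ALIGN(n->nlmsg_len) + RTA_ALIGN(len) > maxlen)
		return NULL;
	rta = SKETCH_TAIL(n);
	rta->rta_type = type;
	rta->rta_len = len;
	if (alen)
		memcpy(RTA_DATA(rta), data, alen);
	n->nlmsg_len = NLMSG_ALIGN(n->nlmsg_len) + RTA_ALIGN(len);
	return rta;
}

/* Open a nest: an empty attribute whose length is fixed up later. */
static struct rtattr *sketch_nest(struct nlmsghdr *n, size_t maxlen,
				  unsigned short type)
{
	return sketch_addattr(n, maxlen, type, NULL, 0);
}

/* Close a nest: its length now spans everything appended since it opened. */
static void sketch_nest_end(struct nlmsghdr *n, struct rtattr *nest)
{
	nest->rta_len = (char *)SKETCH_TAIL(n) - (char *)nest;
}

struct sketch_req {
	struct nlmsghdr n;
	struct ifinfomsg ifi;
	char buf[256];
};

/* RTM_GETLINK dump request asking the kernel to filter by link kind. */
static int sketch_build_kind_dump(struct sketch_req *req, const char *kind)
{
	struct rtattr *linkinfo;

	memset(req, 0, sizeof(*req));
	req->n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
	req->n.nlmsg_type = RTM_GETLINK;
	req->n.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
	req->ifi.ifi_family = AF_UNSPEC;

	linkinfo = sketch_nest(&req->n, sizeof(*req), IFLA_LINKINFO);
	if (!linkinfo ||
	    !sketch_addattr(&req->n, sizeof(*req), IFLA_INFO_KIND,
			    kind, strlen(kind) + 1))
		return -1;
	sketch_nest_end(&req->n, linkinfo);
	return 0;
}
```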

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-02 15:01:09 -08:00
Serhey Popovych 28254695d1 ip: Minor cleanups
  1) Rename the @hdr parameter to @n to be consistent with the rest of the
     parsing code.

  2) Use NLMSG_DATA() to get a pointer to the data after nlmsghdr instead
     of calculating it directly in the ip/tunnel code.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-02 14:58:26 -08:00
Serhey Popovych f7af0dc580 ip: Consolidate ip, xdp and lwtunnel parse/dump prototypes in ip_common.h
Having iplink_parse() and @struct iplink_req in include/utils.h does not
reflect their IP nature: move them to ip/ip_common.h.

Move the contents of ip/iplink_xdp.h and ip/iproute_lwtunnel.h to
ip/ip_common.h since they are small (i.e. only two function prototypes):
the ip/iplink_bridge.c and ip/iplink_vrf.c prototypes are already there.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-02 14:55:12 -08:00
Serhey Popovych 2cd0578fcb Revert "ip address: Change print_linkinfo_brief to take filter as an input"
This reverts commit 63891c7013.

It seems print_linkinfo_brief() never receives a filter other than the
default one, and David Ahern suggested reverting it instead of making a
new change that effectively does the revert.

Conflicts:
	ip/ipaddress.c
	ip/iplink.c

These are caused by JSON support being added after the commit we are reverting.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-01 08:27:29 -08:00
David Ahern 1e24e773f1 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-29 08:24:57 -08:00
Stephen Hemminger 50b8a842e8 v4.15.0 2018-01-29 08:08:52 -08:00
Jakub Kicinski 44c7655186 tc: fix second printing of requeues
Non-JSON tc qdisc output used to print the "requeues" statistic
twice.  Commit 4fcec7f366 ("tc: jsonify stats2") tried to preserve
this behaviour for both standard output and JSON, but used the wrong
statistic (q.qlen).  Also duplicating keys in JSON is not allowed,
so the second occurrence should be completely skipped with JSON.

Fixes: 4fcec7f366 ("tc: jsonify stats2")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-27 16:06:54 -08:00
Jakub Kicinski 7f536df7f3 ip: address: fix stats64 JSON object name
The JSON object name for statistics in ip link show is "stats644".
Looks like a typo, commit d0e720111a ("ip: ipaddress.c: add support
for json output") contains an example with the expected "stats64" name.

The fact that no one has noticed until now is probably an indication
that no one is using this object.  Hopefully it's not too late to fix
this, although IIUC this has already been in 4.13 and 4.14 releases :S

Fixes: d0e720111a ("ip: ipaddress.c: add support for json output")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-27 16:06:54 -08:00
Jakub Kicinski c061b75895 tc: prio: JSON-ify prio output
Make JSON output work with prio Qdiscs.  This also makes JSON output
work for other qdiscs that reuse print_qopt, like mqprio or
pfifo_fast.

Note that there is a double space between "priomap" and first
prio number.  Keep this original behaviour.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-26 13:00:18 -08:00
Jakub Kicinski 097415d510 tc: red: JSON-ify RED output
Make JSON output work with RED Qdiscs.  Float/double printing
helpers have to be added/uncommented to print the probability.
Since TC stats in general are not split out to a separate object
the xstats printed by this patch are not separated either.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-26 12:59:55 -08:00
David Ahern db9fd71038 Merge branch 'get_addr_rta' into iproute2-next
Serhey Popovych  says:

====================

Now that get_addr() has been enhanced to return additional information
about an address (e.g. whether it is unspecified or multicast), we want
the same functionality for attributes in netlink messages.

Introduce and use get_addr_rta(), which parses a given netlink attribute
into the @inet_prefix data structure in the same way get_addr()
parses an address from its string representation.

Use the attribute length to guess the address family; force a specific
family by giving a non-AF_UNSPEC @family to get_addr_rta() to ensure
the address is of the expected family.

Introduce and use inet_addr_match_rta() to further simplify and unify
code where get_addr_rta() is intended to be used together with
inet_addr_match().

This is the next step in unifying the ipv4 and ipv6 modules in
preparation for merging them in the future.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-25 09:32:27 -08:00
Serhey Popovych b761fc4113 ip/tunnel: Unify local/remote endpoint address printing
Introduce and use the tnl_print_endpoint() helper to print the tunnel
endpoint address.

Note that for AF_INET and AF_INET6, inet_ntop(3) is used, which may return
NULL on failure, and, while unlikely, format_host_rta() might
return NULL too. Handle this case when passing local/remote to
print_string().

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-25 09:31:29 -08:00
Serhey Popovych 228f2e97ba tcp_metric: Use get_addr_rta()
While there, remove & from inet_prefix.data since it is an array.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-25 09:31:27 -08:00
Serhey Popovych 62f9f94acf ipl2tp: Use get_addr_rta()
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-25 09:31:25 -08:00
Serhey Popovych a4270fd8ae ipneigh: Use inet_addr_match_rta()
While there, check the return value of get_prefix() for the filter address.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-25 09:31:24 -08:00
Serhey Popovych ba6052df6d ipmroute: Use inet_addr_match_rta()
While there, check the return value of get_prefix() for the filter address.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-25 09:31:22 -08:00
Serhey Popovych 746035b4d1 iprule: Use inet_addr_match_rta()
While there, check the return value of get_prefix() for the filter address.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-25 09:31:21 -08:00
Serhey Popovych c4de9adaf5 ipaddress: Use inet_addr_match_rta()
While there, check the return value of get_prefix() for the filter address.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-25 09:31:19 -08:00
Serhey Popovych 27c523e209 utils: Introduce get_addr_rta() and inet_addr_match_rta()
The first is used to get an address from a netlink attribute into the
inet_prefix data structure. Use memcpy() with a constant
size to let the compiler optimize it by replacing the call with
inlined load/store instructions.

The second is used to match the address in a given netlink attribute
against a reference address. It matches successfully if
no attribute is given (@rta is NULL), the reference address
family is AF_UNSPEC or its length isn't given; it fails if
get_addr_rta() can't get the attribute or its family does
not match the reference; otherwise it calls inet_addr_match()
to get the final verdict.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-25 09:31:16 -08:00
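The family-guessing and matching rules above can be sketched as follows. This is not the iproute2 code: it works on a raw payload pointer and length instead of struct rtattr, only knows AF_INET and AF_INET6, and uses a hypothetical struct prefix in place of inet_prefix. As described above, 0 means "match".

```c
#include <string.h>
#include <sys/socket.h>

/* Hypothetical, simplified stand-in for iproute2's inet_prefix. */
struct prefix {
	int family;
	int bytelen;
	int bitlen;
	unsigned char data[16];
};

/*
 * get_addr_rta()-style sketch: fill @dst from an attribute payload.
 * AF_UNSPEC means "guess the family from the payload length";
 * any other @family must agree with that length.
 */
static int addr_from_payload(struct prefix *dst, const void *payload,
			     int len, int family)
{
	int guessed;

	switch (len) {
	case 4:
		guessed = AF_INET;
		break;
	case 16:
		guessed = AF_INET6;
		break;
	default:
		return -1;
	}
	if (family != AF_UNSPEC && family != guessed)
		return -1;

	memset(dst, 0, sizeof(*dst));
	dst->family = guessed;
	dst->bytelen = len;
	dst->bitlen = len * 8;
	memcpy(dst->data, payload, len);
	return 0;
}

/* Compare the leading @bits bits of two addresses; 0 means match. */
static int prefix_bits_match(const unsigned char *a, const unsigned char *b,
			     int bits)
{
	int bytes = bits / 8;
	int rem = bits % 8;

	if (memcmp(a, b, bytes) != 0)
		return -1;
	if (rem) {
		unsigned char mask = (unsigned char)(0xFF << (8 - rem));

		if ((a[bytes] & mask) != (b[bytes] & mask))
			return -1;
	}
	return 0;
}

/*
 * inet_addr_match_rta()-style sketch: 0 on match, -1 otherwise.
 * A missing attribute, an AF_UNSPEC reference or a reference with
 * no prefix length all count as a match.
 */
static int addr_match_payload(const struct prefix *ref,
			      const void *payload, int len)
{
	struct prefix dst;

	if (!payload || ref->family == AF_UNSPEC || ref->bitlen <= 0)
		return 0;
	if (addr_from_payload(&dst, payload, len, ref->family) < 0)
		return -1;
	if (ref->bitlen > dst.bytelen * 8)
		return -1;
	return prefix_bits_match(dst.data, ref->data, ref->bitlen);
}
```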
David Ahern 68944e6f93 Merge branch 'unify_external' into iproute2-next
Serhey Popovych  says:

====================

With this series I want to unify collect metadata
handling in tunnels:

  1) Use "external" name for JSON and non-JSON output.

     Do not *print* any options when the tunnel is in
     collect metadata mode: gre6 already does
     this, so just apply it to the others.

  2) Do not *add* any attributes when configuring
     a gre tunnel in collect metadata mode.

     Other tunnels (e.g. gre6, iptnl, ip6tnl)
     already do that.

This is the next step in unifying the ipv4 and ipv6
modules in preparation for merging them in the future.

Any comments, suggestions and criticism are, as always,
welcome.

v2
  For all tunnels implementing collect metadata,
  use the "external" keyword for both JSON and non-JSON
  output. Thanks to Jiri Benc for the detailed explanation.
====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-24 10:02:27 -08:00
Serhey Popovych de54cdd3de gre/gre6: Unify attribute addition to netlink buffer
There are a couple of minor improvements:

  1) Check erspan_ver == 2 in gre6. It still could
     be 1 if erspan_idx is 0.

  2) Add tunnel encapsulation attributes only when
     collect metadata is not in effect in gre.

  3) Trivial: address checkpatch issues.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-24 10:01:30 -08:00
Serhey Popovych 00ff4b8e31 ip/tunnel: Be consistent when printing tunnel collect metadata
Print only "external" if the collect metadata attribute
is given: the rest of the parameters are irrelevant. This
follows gre6.

For both JSON and non-JSON output use "external" for
all tunnels including vxlan and geneve.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-24 10:01:26 -08:00
David Ahern 6517b5c0ac Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-24 09:59:03 -08:00
Wolfgang Bumiller 7ac29190db tc/lexer: let quotes actually start strings
The lexer goes with the longest match, so previously
the opening double quote of a string would be swallowed by
the [^ \t\r\n()]+ pattern, leaving the user no way to
actually use strings with escape sequences.
Fix this by not allowing this pattern to start with a double
quote.

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2018-01-24 08:49:10 -08:00
Serhey Popovych 7a14358b16 iplink: Use ll_name_to_index() instead of if_nametoindex()
While the benefit of using ll_name_to_index() with a populated
cache can be exploited in only a few places
(e.g. bridge fdb/mdb/vlan show routines), there is another
advantage of ll_name_to_index() over plain if_nametoindex():

  if if_nametoindex() fails, ll_name_to_index() will
  attempt to get the index from the generic name in the form
  "if%d" that may be returned by ll_index_to_name().

This makes the output of ip(8) coherent with its input.

Note that most of the code has already switched from plain
if_nametoindex() to the cached ll_name_to_index() variant.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-23 14:50:59 -08:00
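A minimal sketch of the fallback described above, without iproute2's cache. Only if_nametoindex(3) is a real API here; parse_if_fallback() and name_to_index() are hypothetical names.

```c
#include <ctype.h>
#include <limits.h>
#include <net/if.h>
#include <stdlib.h>
#include <string.h>

/* Parse the "if%u" form that an index-to-name fallback produces. */
static unsigned int parse_if_fallback(const char *name)
{
	unsigned long idx;
	char *end;

	if (strncmp(name, "if", 2) != 0 || !isdigit((unsigned char)name[2]))
		return 0;
	idx = strtoul(name + 2, &end, 10);
	if (*end != '\0' || idx > UINT_MAX)
		return 0;
	return (unsigned int)idx;
}

/* Ask the kernel first; fall back to the "if%u" form. 0 = not found. */
static unsigned int name_to_index(const char *name)
{
	unsigned int idx = if_nametoindex(name);

	if (idx == 0)
		idx = parse_if_fallback(name);
	return idx;
}
```

Round-tripping matters because a name printed as "if42" (for an interface that has since disappeared from the cache) can then be fed back on the command line.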
Serhey Popovych 9dc04a4fd2 vti/vti6: Minor improvements
In preparation for merging link_vti.c and link_vti6.c:

  1) Make @fwmark of __u32 type instead of unsigned int
     in vti to match the rest of the tunneling code.

  2) Report when unable to translate the @link network device
     name to an index instead of silently exiting in vti6.

  3) Remove the newline separating the local/remote attributes
     from ikey/okey in vti6 to match the vti module.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-23 14:50:59 -08:00
Serhey Popovych c4743c4d9b iptnl/ip6tnl: Unify ttl/hoplimit parsing routines
Handle the "inherit" case properly for gre6 and ip6tnl.

Use get_u8() in gre to parse ttl/hoplimit.

Be consistent about supporting the "hlim" alias for
ttl/hoplimit.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-23 14:50:59 -08:00
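A minimal sketch of the parsing rule, assuming "inherit" is stored as 0 and numeric values must fit in a u8 as with get_u8() (so an explicit 0 also means inherit). parse_ttl() is a hypothetical name, not the helper added by this commit.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical ttl/hoplimit parser: "inherit" maps to 0 (copy the
 * value from the encapsulated packet); otherwise accept 0..255,
 * with base auto-detection like get_u8(..., 0).
 * Returns 0 on success, -1 on invalid input.
 */
static int parse_ttl(const char *arg, unsigned char *ttl)
{
	unsigned long val;
	char *end;

	if (strcmp(arg, "inherit") == 0) {
		*ttl = 0;
		return 0;
	}
	errno = 0;
	val = strtoul(arg, &end, 0);
	if (errno || end == arg || *end != '\0' || val > 255)
		return -1;
	*ttl = (unsigned char)val;
	return 0;
}
```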
Serhey Popovych b53835de38 tunnel: Add space between encap-dport and encap-sport in non-JSON output
Fixes: bad76e6b1f ("ip/tunnel: Abstract tunnel encapsulation options printing")
Fixes: e2d4588331 ("ip: link_gre.c: add json output support")
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-23 14:50:59 -08:00
Serhey Popovych 3cb92eb9ab gre/gre6: Post merge fixes
A few minor changes after the merge of 'master' into the 'net-next' branch:

  1) Follow the 80-column line limit when printing the erspan_index
     parameter, as we did in master with commit 2a8d0f6e9c ("gre/tunnel:
     Print erspan_index using print_uint()").

  2) Remove remnants of encapsulation option printing: it is now
     done using the tnl_print_encap() helper from commit bad76e6b1f
     ("ip/tunnel: Abstract tunnel encapsulation options printing").

Fixes: 8c75f69411 ("Merge branch 'master' into net-next")
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2018-01-22 09:30:09 -08:00
David Ahern c2e368df0a Merge branch 'shared_block' into net-next
Jiri Pirko  says:

====================

From: Jiri Pirko <jiri@mellanox.com>

The kernel allows sharing all filters between qdiscs by using
a shared block.

Example:

Block number 22 ("22" is just an identifier):
$ tc qdisc add dev ens7 ingress_block 22 ingress
                        ^^^^^^^^^^^^^^^^
$ tc qdisc add dev ens8 ingress_block 22 ingress
                        ^^^^^^^^^^^^^^^^

If we don't specify a block option on the command line, no shared block
is created:
$ tc qdisc add dev ens9 ingress

Now if we list the qdiscs, we will see the block index in the output:

$ tc qdisc
qdisc ingress ffff: dev ens7 parent ffff:fff1 ingress_block 22
qdisc ingress ffff: dev ens8 parent ffff:fff1 ingress_block 22
qdisc ingress ffff: dev ens9 parent ffff:fff1

To make it more visual, the situation looks like this:

   ens7 ingress qdisc                 ens8 ingress qdisc
          |                                  |
          |                                  |
          +---------->  block 22  <----------+

An unlimited number of qdiscs may share the same block.

Block sharing is also supported for clsact qdisc:
$ tc qdisc add dev ens10 ingress_block 23 egress_block 24 clsact
$ tc qdisc show dev ens10
qdisc clsact ffff: dev ens10 parent ffff:fff1 ingress_block 23 egress_block 24

We can add a filter using the block index:

$ tc filter add block 22 protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop

Note that we cannot use the qdisc to manipulate the filters of shared blocks:

$ tc filter add dev ens8 ingress protocol ip pref 1 flower dst_ip 192.168.100.2 action drop
Error: This filter block is shared. Please use the block index to manipulate the filters.

We will see the same filters if we list them for the ingress qdisc of
ens7 or ens8, or for block 22 itself:

$ tc filter show block 22
filter protocol ip pref 25 flower chain 0
filter protocol ip pref 25 flower chain 0 handle 0x1
...

$ tc filter show dev ens7 ingress
filter block 22 protocol ip pref 25 flower chain 0
filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
...

$ tc filter show dev ens8 ingress
filter block 22 protocol ip pref 25 flower chain 0
filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
...

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 11:20:56 -08:00
Jiri Pirko 063463efd7 tc: implement ingress/egress block index attributes for qdiscs
During qdisc creation it is possible to specify a shared block for both
ingress and egress. Pass these values to the kernel according to the
command line options.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 10:42:57 -08:00
Jiri Pirko 0c7cef9669 tc: introduce support for block-handle for filter operations
So far, the qdisc was the only handle that could be used to manipulate
filters. The kernel added support for using a block to manipulate them, so
add support for using the block index to manipulate filters. The magic
TCM_IFINDEX_MAGIC_BLOCK indicates that the block index is in use.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 10:42:53 -08:00
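A hedged sketch of how a filter request targets a block instead of a device. It relies on our reading of <linux/rtnetlink.h>, where TCM_IFINDEX_MAGIC_BLOCK is 0xFFFFFFFF and tcm_block_index is an alias for tcm_parent. To stay self-contained, it uses a local mirror of struct tcmsg rather than the kernel header; target_block() and targets_block() are hypothetical helpers.

```c
#include <stdint.h>

/* Local mirror of struct tcmsg from <linux/rtnetlink.h>. */
struct tcmsg_mirror {
	unsigned char  tcm_family;
	unsigned char  tcm__pad1;
	unsigned short tcm__pad2;
	int            tcm_ifindex;
	uint32_t       tcm_handle;
	uint32_t       tcm_parent;  /* the kernel header aliases tcm_block_index to this */
	uint32_t       tcm_info;
};

#define MAGIC_BLOCK 0xFFFFFFFFU  /* same value as TCM_IFINDEX_MAGIC_BLOCK */

/* Address a filter request to shared block @block_index. */
static void target_block(struct tcmsg_mirror *t, uint32_t block_index)
{
	t->tcm_ifindex = -1;           /* (uint32_t)-1 == MAGIC_BLOCK */
	t->tcm_parent = block_index;   /* read as the block index */
}

static int targets_block(const struct tcmsg_mirror *t)
{
	return (uint32_t)t->tcm_ifindex == MAGIC_BLOCK;
}
```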
Jiri Pirko d0bcedd549 tc: introduce tc_qdisc_block_exists helper
This helper uses a qdisc dump to list all qdiscs and find whether the block
index in question is used by any of them. If so, the block with the
specified index exists.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 10:42:35 -08:00
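The check reduces to a scan over the dumped qdiscs. The sketch below replaces the netlink dump with a hypothetical array of struct qdisc_info and assumes block index 0 means "no shared block". It reuses the qdiscs from the shared-block cover letter above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one qdisc from a dump; 0 = no shared block. */
struct qdisc_info {
	uint32_t ingress_block;
	uint32_t egress_block;
};

/* A block "exists" if any dumped qdisc references its index. */
static int block_exists(const struct qdisc_info *q, size_t n, uint32_t block)
{
	if (block == 0)
		return 0;  /* assumption: 0 is never a valid shared block */
	for (size_t i = 0; i < n; i++)
		if (q[i].ingress_block == block || q[i].egress_block == block)
			return 1;
	return 0;
}
```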
David Ahern 40cf5b0959 Merge branch 'inet_get_addr' into net-next
Serhey Popovych  says:

====================

It looks confusing to have multiple independent
routines to get an internet address from its string
representation: get_addr() and inet_get_addr().

The most complicated users of inet_get_addr() are
iplink_geneve.c and iplink_vxlan.c because they
are required to handle both AF_INET and AF_INET6
for their local/remote endpoints.

On the other hand, get_addr() does not provide
additional information like the address type: we
need to address this to get rid of current and
possible future code duplication. Note that
this functionality is the first step towards
protocol-independent handling of local/remote
endpoints in ip/tunnel code (there will be an
additional series based on this one).

Also fix get_addr_1() and get_prefix() to make
sure they always provide the correct ->family and
->bitlen.

As always comments, suggestions and criticism
are welcome.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 10:11:07 -08:00
Serhey Popovych 6caad8f505 ip: Get rid of inet_get_addr()
Now that both the geneve and vxlan modules have been converted
to use get_addr(), we can replace inet_get_addr()
in the remaining, less problematic places and finally get
rid of inet_get_addr().

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 09:38:26 -08:00
Serhey Popovych 1e9b8072de iplink_vxlan: Get rid of inet_get_addr()
Now that we have additional information about the address
class from get_addr(), we can use it in place of
inet_get_addr().

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 09:38:23 -08:00
Serhey Popovych 6c4b672738 iplink_geneve: Get rid of inet_get_addr()
Now that we have additional information about the address
class from get_addr(), we can use it in place of
inet_get_addr().

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 09:38:22 -08:00
Serhey Popovych 7bf5e876d0 utils: Fast inet address classification after get_addr()
It is very useful to receive additional information
about an address from get_addr_1() and get_addr() to simplify
callers and get rid of code duplication.

For now, the following information can be returned:

  1) address is unspecified (zero)
  2) address is multicast
  3) address is internet: family is either AF_INET or
     AF_INET6.

More information can be added in the future.

Introduce inline helpers to make code using this new
address classification interface more self explaining:

  bool is_addrtype_inet(inet_prefix *addr)
    true if @addr is inet address

  bool is_addrtype_inet_unspec(inet_prefix *addr)
    true if @addr is unspecified inet address

  bool is_addrtype_inet_multi(inet_prefix *addr)
    true if @addr is multicast inet address

  bool is_addrtype_inet_not_unspec(inet_prefix *addr)
    true if @addr is not unspecified inet address
    false if @addr is not inet or unspecified inet

  bool is_addrtype_inet_not_multi(inet_prefix *addr)
    true if @addr is not multicast inet address
    false if @addr is not inet or multicast inet

The last two are useful when we need an inet address
that is not unspecified, or not multicast.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 09:38:21 -08:00
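A hedged sketch of the classification described above. The flag names, struct prefix and the is_inet*() helpers are hypothetical stand-ins that mirror the semantics of the listed is_addrtype_inet*() helpers; they are not the iproute2 definitions. Multicast is detected as 224.0.0.0/4 for IPv4 and ff00::/8 for IPv6.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical flag bits mirroring the classification above. */
enum {
	ADDR_INET   = 1 << 0,  /* AF_INET or AF_INET6 */
	ADDR_UNSPEC = 1 << 1,  /* all-zero address */
	ADDR_MULTI  = 1 << 2,  /* multicast address */
};

struct prefix {
	int family;
	int bytelen;            /* 4 or 16 for inet families */
	unsigned char data[16];
	unsigned int flags;
};

/* Set the flags once, right after parsing. */
static void classify(struct prefix *p)
{
	static const unsigned char zero[16];

	p->flags = 0;
	if (p->family != AF_INET && p->family != AF_INET6)
		return;
	p->flags |= ADDR_INET;
	if (memcmp(p->data, zero, p->bytelen) == 0)
		p->flags |= ADDR_UNSPEC;
	else if (p->family == AF_INET ? (p->data[0] & 0xF0) == 0xE0
				      : p->data[0] == 0xFF)
		p->flags |= ADDR_MULTI;
}

static bool is_inet(const struct prefix *p)        { return p->flags & ADDR_INET; }
static bool is_inet_unspec(const struct prefix *p) { return is_inet(p) && (p->flags & ADDR_UNSPEC); }
static bool is_inet_multi(const struct prefix *p)  { return is_inet(p) && (p->flags & ADDR_MULTI); }

/* False for non-inet addresses as well as for the excluded class. */
static bool is_inet_not_unspec(const struct prefix *p) { return is_inet(p) && !(p->flags & ADDR_UNSPEC); }
static bool is_inet_not_multi(const struct prefix *p)  { return is_inet(p) && !(p->flags & ADDR_MULTI); }
```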
Serhey Popovych 93fa12418d utils: Always specify family and ->bytelen in get_prefix_1()
Handle default/all/any special case in get_addr_1() to setup
->family and ->bytelen correctly.

Make get_addr_1() return ->bitlen == -2 instead of -1 to
distinguish the default/all/any special case from the rest:
it is safe because all callers check ->bitlen < 0, not the
explicit value -1.

Reduce indentation by one level and get rid of the goto/label
to make the code more readable.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 09:38:19 -08:00
Serhey Popovych f2522007d8 utils: Always specify family for address in get_addr_1()
Set ->family correctly when the string representing the address
is "default", "all" or "any": get_addr_1() might be called
with AF_UNSPEC (e.g. get_addr() -> get_addr_1()).

Extend support for the zero address to all address families,
not only AF_INET and AF_INET6, when one is explicitly given
as @family: use af_byte_len() to correctly set the address length.

Still assume AF_INET when @family is AF_UNSPEC.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 09:38:17 -08:00
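The special-case handling from the two commits above can be sketched as follows. The real code uses af_byte_len(), which knows more families; this sketch's family_byte_len() covers only AF_INET and AF_INET6, and parse_special_addr() and struct prefix are hypothetical.

```c
#include <string.h>
#include <sys/socket.h>

struct prefix {
	int family;
	int bytelen;
	int bitlen;
	unsigned char data[16];
};

/* Byte length per family; only the two inet families are covered. */
static int family_byte_len(int family)
{
	switch (family) {
	case AF_INET:
		return 4;
	case AF_INET6:
		return 16;
	default:
		return -1;
	}
}

/*
 * "default", "all" and "any" yield a zero address whose family and
 * byte length are always set. AF_UNSPEC is treated as AF_INET, and
 * bitlen -2 marks the special case.
 * Returns 1 if @arg was a special keyword, 0 if not, -1 on error.
 */
static int parse_special_addr(struct prefix *dst, const char *arg, int family)
{
	if (strcmp(arg, "default") != 0 && strcmp(arg, "all") != 0 &&
	    strcmp(arg, "any") != 0)
		return 0;
	if (family == AF_UNSPEC)
		family = AF_INET;

	int len = family_byte_len(family);

	if (len < 0)
		return -1;
	memset(dst, 0, sizeof(*dst));
	dst->family = family;
	dst->bytelen = len;
	dst->bitlen = -2;
	return 1;
}
```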
David Ahern 8c75f69411 Merge branch 'master' into net-next
Conflicts:
	ip/link_gre.c
	ip/link_gre6.c

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 09:37:39 -08:00
Jakub Kicinski 5691e6bc58 bpf: support map offload
When a program is loaded with a specified ifindex, use that
ifindex also when creating maps.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-19 12:35:41 -08:00
Jakub Kicinski e0850bdedc tc: red: allow setting th_min and th_max to the same value
Setting th_min and th_max to the same value may be useful for DCTCP
deployments.  The original DCTCP paper describes it as the simplest way
of achieving ECN threshold marking.  Indeed, there doesn't seem
to be any simpler qdisc in Linux which would allow such a setup today.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-19 12:35:23 -08:00
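Why equal thresholds give threshold marking can be seen from the textbook RED curve. The sketch below is not the kernel's fixed-point implementation (and ignores the "gentle" variant and packet counting). With th_min == th_max the linear ramp disappears, so packets are either never or always marked/dropped, and the division is never reached with a zero denominator.

```c
/*
 * Textbook RED mark/drop probability for average queue length @avg.
 */
static double red_mark_prob(double avg, double th_min, double th_max,
			    double max_p)
{
	if (avg < th_min)
		return 0.0;
	if (avg >= th_max)
		return 1.0;
	/* Only reached when th_min <= avg < th_max, so th_max > th_min. */
	return max_p * (avg - th_min) / (th_max - th_min);
}

/* Accept th_min == th_max; reject only th_min > th_max. */
static int red_thresholds_valid(unsigned int th_min, unsigned int th_max)
{
	return th_min <= th_max;
}
```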
David Ahern c0788a09d4 Update kernel headers to 4.15-rc8
Update kernel headers to commit 30c3e9d47035
("l2tp: remove switch block in l2tp_nl_cmd_session_create()")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-19 12:33:41 -08:00
Serhey Popovych c9391f120e tunnel: Return constant string without copying it
We return a constant string from tnl_strproto(), so there is no need
to copy it to a temporary buffer and then return that
buffer as const: return the constant string directly.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-18 16:34:41 -08:00
Serhey Popovych b8dc6c5b0e vti6/tunnel: Unify and simplify link type help functions
Both of these changes are missing from link_vti6.c:

  commit 8b47135474 ("ip: link: Unify link type help functions a bit")
  commit 561e650eff ("ip link: Shortify printing the usage of link type")

Replay them on link_vti6.c to bring its link type help functions
in line with the other tunneling code.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-18 16:34:41 -08:00
Serhey Popovych 34a8c54d6d vti/tunnel: Unify ikey/okey printing
For the vti6 tunnel we print [io]key in dotted-quad notation
(ipv4 address), while for vti we do that in hex format.

For the vti tunnel we print [io]key only if the value is not
zero, while for vti6 we miss such a check.

Unify the vti and vti6 tunnel [io]key output.

While here, enlarge the s2 buffer to the same size as in the rest
of the tunnel support code (64 bytes) and check the return value of
inet_ntop().

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-18 16:34:41 -08:00
Serhey Popovych 2a8d0f6e9c gre/tunnel: Print erspan_index using print_uint()
It is missing from the JSON output because fprintf()
is used instead of print_uint().

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-18 16:34:40 -08:00
Serhey Popovych bad76e6b1f ip/tunnel: Abstract tunnel encapsulation options printing
Get rid of code duplication and consolidate encapsulation
options printing in a single function, tnl_print_encap().

Introduce and use tnl_encap_str() to format the encapsulation
option string according to a template and the given values, to avoid
code duplication and simplify the code.

Use print_string() instead of fputs() and fprintf() to
print encapsulation for !is_json_context().

Print the "unknown" parameter for the "encap" type in PRINT_FP
context using the "%s " format specifier and benefit from
compile-time string merging.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-18 16:34:40 -08:00
Serhey Popovych e97ad3d248 ip/tunnel: Use print_0xhex() instead of print_string()
There is no need for a custom SPRINT_BUF() and for snprintf()-ing
the 0x%x value into this buffer: we can use print_0xhex() instead
of print_string().

In link_iptnl.c, use the s2 buffer instead of s1 and remove
s1.

While there, adjust the fwmark option print order in iptnl
and ip6tnl so that they match each other.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-18 16:34:40 -08:00
Serhey Popovych 3caa526c7b ip/tunnel: Simplify and unify tos printing
For ip tunnels, tos can be 0 when not configured, 1 when
inherited from the encapsulated packet, and any other value
specifies diffserv (rfc2474) or tos (rfc1349) bits. It is stored
in the packet's tos/diffserv field and returned in the tos
netlink attribute to userspace.

Simplify and unify tos printing by using print_0xhex()
and print_string() instead of fprintf() to output values.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-18 16:34:40 -08:00
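The three cases above can be sketched as a formatting function. It assumes that an unconfigured tos (0) is simply omitted. format_tos() is hypothetical and writes to a buffer instead of using iproute2's print_*() helpers.

```c
#include <stdio.h>

/*
 * 0 means "not configured" (print nothing), 1 means "inherit",
 * anything else is shown in hex. Returns the number of characters
 * written, as snprintf() does (0 when nothing is printed).
 */
static int format_tos(unsigned int tos, char *buf, size_t len)
{
	if (tos == 0) {
		if (len)
			buf[0] = '\0';
		return 0;
	}
	if (tos == 1)
		return snprintf(buf, len, "tos inherit");
	return snprintf(buf, len, "tos 0x%x", tos);
}
```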
Serhey Popovych 375560c4ab ip/tunnel: Correct and unify ttl/hoplimit printing
Both ttl and hoplimit range from 1 to 255. Zero has a special meaning:
use the encapsulated packet's value. In ip-link(8) -d output this
looks like "ttl/hoplimit inherit". In JSON we have the "int" type
for ttl and therefore values from 0 (inherit) to 255.

To handle ttl/hoplimit as well as possible, we need to accept
both cases: a missing attribute in the netlink dump and a zero value
for the "inherit" case. The latter has been broken since JSON output
was introduced for the gre/iptnl versions and never worked for
gre6/ip6tnl.

For all tunnels except ip6tnl, change the JSON type from "int" to
"uint" to reflect the true nature of the ttl.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-18 16:34:40 -08:00
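A minimal sketch of the "accept both cases" rule above: a missing attribute (NULL here) and an explicit zero both print as inherit. format_ttl() is a hypothetical helper that writes to a buffer rather than through print_uint().

```c
#include <stdio.h>

/* Missing attribute and zero value both mean "inherit". */
static int format_ttl(const unsigned char *ttl, char *buf, size_t len)
{
	if (!ttl || *ttl == 0)
		return snprintf(buf, len, "ttl inherit");
	return snprintf(buf, len, "ttl %u", (unsigned int)*ttl);
}
```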
Serhey Popovych 45d3a6efb2 iplink: Use ll_index_to_name() instead of if_indextoname()
There are two reasons for switching to the cached variant:

  1) ll_index_to_name() may return the result from the cache,
     eliminating an expensive ioctl() to the kernel.

     Note that most of the code has already switched from plain
     if_indextoname() to the cached ll_index_to_name() variant
     in the print path, because in most cases the cache is populated.

  2) It always returns a name, in the form "if%d" if the
     entry is not in the cache and ioctl() fails. This drops
     "link_index" from the JSON output.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-18 16:34:37 -08:00
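The "always returns a name" property in point 2 can be sketched without the cache. Only if_indextoname(3) is a real API here; index_to_name() is hypothetical. It is the inverse of the "if%u" fallback on the name-to-index side.

```c
#include <net/if.h>
#include <stdio.h>

/*
 * Resolve @idx to a name, falling back to "if%u" when the kernel
 * lookup fails, so the result is never NULL. The kernel lookup is
 * only attempted when @buf holds at least IF_NAMESIZE bytes.
 */
static const char *index_to_name(unsigned int idx, char *buf, size_t len)
{
	if (len >= IF_NAMESIZE && if_indextoname(idx, buf))
		return buf;
	snprintf(buf, len, "if%u", idx);
	return buf;
}
```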
Arkadi Sharshevsky c619f2be8b devlink: Ignore unknown attributes
Otherwise, old packages would break whenever the UAPI is extended.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-18 16:30:36 -08:00
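The general technique, independent of devlink's actual parser, is to skip attribute types newer than the build knows about instead of treating them as an error. The types and ATTR_MAX_KNOWN below are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified attribute view: type plus a pointer to its payload. */
struct attr_view {
	uint16_t type;
	const void *data;
};

#define ATTR_MAX_KNOWN 4  /* highest attribute type this build understands */

/*
 * Fill @tb (indexed by type, ATTR_MAX_KNOWN + 1 slots) from @attrs.
 * Types beyond ATTR_MAX_KNOWN are skipped, so a newer kernel UAPI
 * does not break an older tool. Returns the number of ignored attrs.
 */
static size_t parse_attrs(const struct attr_view *attrs, size_t n,
			  const void *tb[ATTR_MAX_KNOWN + 1])
{
	size_t ignored = 0;

	for (size_t i = 0; i <= ATTR_MAX_KNOWN; i++)
		tb[i] = NULL;
	for (size_t i = 0; i < n; i++) {
		if (attrs[i].type > ATTR_MAX_KNOWN) {
			ignored++;
			continue;
		}
		tb[attrs[i].type] = attrs[i].data;
	}
	return ignored;
}
```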
Serhey Popovych c4fb35bdfc iplink: Fix "alias" parameter length calculations
We need NEXT_ARG() to get *argv pointing to the "alias"
parameter value. Otherwise we get and check the length of
the "alias" keyword itself.

Fixes: f88becf35e ("iplink: Process "alias" parameter correctly")
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-18 16:24:43 -08:00
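The bug pattern is easy to reproduce with a plain argv scan. In this sketch, find_alias() is hypothetical, the "++i" step plays the role of NEXT_ARG(), and ALIAS_MAX is assumed to equal IFALIASZ from <linux/if.h>.

```c
#include <string.h>

#define ALIAS_MAX 256  /* assumed to match IFALIASZ from <linux/if.h> */

/*
 * Find "alias VALUE" in @argv. The index must be advanced to the
 * value before its length is checked; otherwise the keyword "alias"
 * itself would be measured.
 * Returns 0 and sets *alias on success, 1 if absent, -1 on error.
 */
static int find_alias(int argc, char **argv, const char **alias)
{
	for (int i = 0; i < argc; i++) {
		if (strcmp(argv[i], "alias") != 0)
			continue;
		if (++i >= argc)
			return -1;  /* keyword without a value */
		if (strlen(argv[i]) >= ALIAS_MAX)
			return -1;  /* value too long */
		*alias = argv[i];
		return 0;
	}
	return 1;
}
```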
Gal Pressman c7db3921ec man: Document the meaning of zero in min/max_tx_rate parameters
Zero value in min/max_tx_rate has a special meaning of no rate limit,
document it.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-01-17 10:44:42 -08:00
Gal Pressman 39315157ab ipaddress: Make sure VF min/max rate API is supported before using it
When using the new minimum rate API and providing only one parameter
(minimum rate or maximum rate), we query the VF min and max rate regardless
of kernel support.
This resulted in a segmentation fault in ipaddr_loop_each_vf, which tries
to access a NULL pointer.

This patch identifies such cases by testing the VF table for a NULL
pointer in IFLA_VF_RATE, and aborts the operation.
Aborting on the first VF is valid since, if the kernel does not support
the new API for the first VF, it will not support it for the other VFs
either.

Fixes: f89a2a05ff ("Add support to configure SR-IOV VF minimum and maximum Tx rate through ip tool")
Cc: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-01-17 10:44:42 -08:00
Gal Pressman 04be08e0bd iplink: Validate minimum tx rate is less than maximum tx rate
According to the documentation (man ip-link), the minimum TXRATE should
always be <= the maximum TXRATE, but commit f89a2a05ff ("Add support to
configure SR-IOV VF minimum and maximum Tx rate through ip tool") didn't
enforce it.

Fixes: f89a2a05ff ("Add support to configure SR-IOV VF minimum and maximum Tx rate through ip tool")
Cc: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-01-17 10:44:42 -08:00
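A hedged sketch of the check. It assumes, per the man page change above, that 0 means "no limit", so a zero maximum never conflicts with a nonzero minimum. vf_rates_valid() is hypothetical, and the commit's exact condition may differ.

```c
/*
 * Returns 0 if the (min, max) Tx rate pair is acceptable, -1 otherwise.
 * A max of 0 means "unlimited" and is compatible with any min.
 */
static int vf_rates_valid(unsigned int min_rate, unsigned int max_rate)
{
	if (max_rate != 0 && min_rate > max_rate)
		return -1;
	return 0;
}
```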
Serhey Popovych a3e0229e25 ipaddress: Use family_name() for better code reuse
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2018-01-17 10:42:17 -08:00
Phil Sutter 6f7df6b2a1 tc: Optimize gact action lookup
When adding a filter with a gact action such as 'drop', tc first tries
to open a shared object with an equivalent name (m_drop.so in this case)
before trying gact. Avoid this by matching the action name against those
handled by gact prior to calling get_action_kind().

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2018-01-17 10:27:47 -08:00
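The short-circuit can be sketched as a name check before any module lookup. The verb list below is illustrative only; treat the exact set that gact accepts as an assumption. resolve_action_kind() is a hypothetical stand-in for the step placed before get_action_kind().

```c
#include <stdbool.h>
#include <string.h>

/* Illustrative subset of control verbs handled by the gact action. */
static const char *const gact_verbs[] = {
	"continue", "drop", "shot", "pass", "ok", "reclassify", "pipe",
};

static bool is_gact_verb(const char *name)
{
	for (size_t i = 0; i < sizeof(gact_verbs) / sizeof(gact_verbs[0]); i++)
		if (strcmp(name, gact_verbs[i]) == 0)
			return true;
	return false;
}

/* Pick the action kind without first probing for m_<name>.so. */
static const char *resolve_action_kind(const char *name)
{
	return is_gact_verb(name) ? "gact" : name;
}
```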
David Ahern e209d4c83d Merge branch 'tc-batch' into net-next
Chris Mi says:

====================
Currently in tc batch mode, only one command is read from the batch
file and sent to kernel to process. With this patchset, at most 128
commands can be accumulated before sending to kernel.

We introduced a new function in patch 1 to support for sending
multiple messages. In patch 2, we add this support for filter
add/delete/change/replace and actions add/change/replace commands.

But please note that the kernel still processes the requests one by one.
Processing the requests in parallel in the kernel is a separate effort.
The time we save with this patchset comes from fewer user mode/kernel
mode context switches, so this patchset works on top of the current kernel.

Using the following script from the kernel tree, we can generate 1,000,000 rules.
	tools/testing/selftests/tc-testing/tdc_batch.py

Without this patchset, 'tc -b $file' execution time is:

real    0m15.555s
user    0m7.211s
sys     0m8.284s

With this patchset, 'tc -b $file' execution time is:

real    0m12.360s
user    0m6.082s
sys     0m6.213s

The insertion rate is improved by more than 10% (real time drops from
15.555s to 12.360s, about 20% less).
====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-15 08:27:20 -08:00
Chris Mi 485d0c6001 tc: Add batchsize feature for filter and actions
Currently in tc batch mode, only one command is read from the batch
file and sent to the kernel to process. With this support, at most 128
commands can be accumulated before sending them to the kernel.

Now it only works for the following successive commands:
1. filter add/delete/change/replace
2. actions add/change/replace

Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-14 09:03:35 -08:00
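A minimal sketch of the accumulate-then-flush logic, assuming a limit of 128 queued commands as stated in the commit above. The `struct batch` type, `BATCH_MAX` and the helper functions are hypothetical and only count messages; a real implementation would hand the queued netlink messages to the kernel in a single send.

```c
#include <stddef.h>

#define BATCH_MAX 128 /* hypothetical name; limit taken from the commit */

/* Hypothetical batch state: counts queued commands and sends. */
struct batch {
	size_t queued;  /* commands waiting to be sent */
	size_t flushes; /* number of sends to the kernel */
	size_t sent;    /* total commands sent */
};

static void batch_flush(struct batch *b)
{
	if (b->queued == 0)
		return;
	/* A real implementation would pass b->queued messages to one send. */
	b->sent += b->queued;
	b->flushes++;
	b->queued = 0;
}

static void batch_add(struct batch *b)
{
	b->queued++;
	if (b->queued == BATCH_MAX)
		batch_flush(b);
}

/* Queue @n batchable commands, then flush the remainder at end of file. */
static struct batch run_batch(size_t n)
{
	struct batch b = { 0, 0, 0 };
	size_t i;

	for (i = 0; i < n; i++)
		batch_add(&b);
	batch_flush(&b);
	return b;
}
```

With 300 batchable commands this performs three sends (128 + 128 + 44).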
Chris Mi 72a2ff3916 lib/libnetlink: Add a new function rtnl_talk_iov
rtnl_talk can only send a single message to the kernel. Add a new
function, rtnl_talk_iov, that can send multiple messages to the kernel.
rtnl_talk_iov takes a struct iovec * and iovlen as arguments.

Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-14 09:03:33 -08:00
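The core idea behind rtnl_talk_iov(), handing several buffers to the kernel in one sendmsg() call through a struct iovec array, can be sketched on a local AF_UNIX socketpair. This only illustrates scatter-gather sending; `send_iov_once()` is a hypothetical helper that does not speak netlink or wait for ACKs.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send @iovlen buffers as a single datagram with one sendmsg() call and
 * return the number of bytes the peer received (-1 on error). */
static ssize_t send_iov_once(struct iovec *iov, size_t iovlen,
			     char *out, size_t outlen)
{
	int sv[2];
	struct msghdr msg;
	ssize_t n;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = iov;       /* all buffers go out in one call */
	msg.msg_iovlen = iovlen;

	n = sendmsg(sv[0], &msg, 0);
	if (n >= 0)
		n = recv(sv[1], out, outlen, 0); /* peer sees them concatenated */

	close(sv[0]);
	close(sv[1]);
	return n;
}
```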
Christian Ehrhardt 9bed02a5d5 tests: make sure rand_dev suffix has 6 chars
The change to limit the read size from /dev/urandom is a tradeoff.
Reading too much can trigger an issue, but so can reading too little:
the suggested 250 random chars might not contain enough [:alpha:] chars.
If they contain:
 a) >=6 all is ok
 b) [1-5] the devname would be shorter than expected (non fatal).
 c) 0 things would break

In loops of hundreds of thousands of runs it was always (a) for me, but
since such a failure in a test might be hard to track down, avoid this
issue by retrying until the length condition is met.

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-10 08:29:51 -08:00
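The test scripts do this in shell with tr and head; the same bounded-read-and-retry idea is sketched below in C purely for illustration. `rand_suffix()` and its constants are hypothetical; only the 6-character suffix and the 250-byte read come from the commit messages.

```c
#include <stdio.h>
#include <string.h>

#define SUFFIX_LEN 6   /* characters wanted in the device name suffix */
#define READ_LEN   250 /* bounded read size, as in the test-suite fix */

static int is_ascii_alpha(unsigned char c)
{
	return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* Fill @buf with SUFFIX_LEN letters from a bounded read of /dev/urandom,
 * retrying while a read yields fewer than SUFFIX_LEN letters.
 * Returns 0 on success, -1 if /dev/urandom cannot be read. */
static int rand_suffix(char buf[SUFFIX_LEN + 1])
{
	unsigned char raw[READ_LEN];
	size_t got, i, n;
	FILE *f = fopen("/dev/urandom", "rb");

	if (!f)
		return -1;
	do {
		n = 0;
		got = fread(raw, 1, sizeof(raw), f);
		for (i = 0; i < got && n < SUFFIX_LEN; i++)
			if (is_ascii_alpha(raw[i]))
				buf[n++] = (char)raw[i];
	} while (n < SUFFIX_LEN && got == sizeof(raw)); /* retry on short suffix */
	fclose(f);

	if (n < SUFFIX_LEN)
		return -1;
	buf[n] = '\0';
	return 0;
}
```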
Christian Ehrhardt 4afbeaeeaf tests: read limited amount from /dev/urandom
In some test environments, e.g. Ubuntu & Debian autopkgtest, it
can happen that while generating random device names the pipe
between tr and head is considered dead during processing.
That prints (non fatal) issues like:
  Running ip/link/new_link.t [iproute2-this/4.13.0-17-generic]: tr: write error: Broken pipe
  tr: write error
  PASS

This only happens when reading an unbounded amount of chars from
urandom, so reading a defined amount fixes the issue.

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-10 08:29:51 -08:00
Mike Frysinger a8b970d7d2 ifcfg/rtpr: convert to POSIX shell
These files are already mostly written in POSIX shell, so convert their
shebangs to /bin/sh and tweak the few bashisms in here.

URL: https://crbug.com/756559
Reported-by: Pat Erley <perley@chromium.org>
Signed-off-by: Mike Frysinger <vapier@chromium.org>
2018-01-10 08:26:09 -08:00
Mike Frysinger 54f5991acd mark shell scripts +x
This makes it easier to execute locally for testing.

Signed-off-by: Mike Frysinger <vapier@chromium.org>
2018-01-10 08:23:49 -08:00
Stephen Hemminger 7d63671030 tc: remove no longer relevant README
This document described how kernel and tc used to handle
timing. In last two years, kernel has switched over to using
ktime. Nothing to see here, move along.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-10 08:21:22 -08:00
Serhey Popovych cc899123cc ip6tnl/tunnel: Output hoplimit before encapsulation limit
To match gre6 output, print hoplimit before encapsulation
limit in link_ip6tnl.c.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-10 08:06:12 -08:00
Serhey Popovych 763cf4956d gre6/tunnel: Output flowlabel after tclass
To match ip6tnl output, print flowlabel after tclass
in link_gre6.c.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-10 08:06:12 -08:00
Serhey Popovych e3945d92b0 ip6/tunnel: Unify encap_limit printing
Use the %u format specifier to print it in link_gre6.c and
make the code more readable.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-10 08:06:12 -08:00
Serhey Popovych a0fd0c3a30 ip6/tunnel: Unify flowlabel printing
Use @s2 buffer to store string representation of
flowlabel and get rid of extra SPRINT_BUF(): no
need to preserve @s2 contents for later.

Use print_string(PRINT_ANY, ...) with a string prepared
by snprintf() for both PRINT_JSON and PRINT_FP
cases.

Omit flowlabel from output if no flowinfo attribute
is given and IP6_TNL_F_USE_ORIG_FLOWLABEL isn't set.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-10 08:06:12 -08:00
Serhey Popovych 090524f899 ip6/tunnel: Unify tclass printing
Use @s2 buffer to store string representation of
tclass and get rid of extra SPRINT_BUF(): no
need to preserve @s2 contents for later.

Use print_string(PRINT_ANY, ...) with a string prepared
by snprintf() for both PRINT_JSON and PRINT_FP
cases.

While there, use __u32 for flowinfo in link_gre6.c
and check for IFLA_GRE_FLOWINFO attribute presence.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-10 08:06:12 -08:00
Serhey Popovych 4dc6665b6b ip6tnl/tunnel: Do not print obscure flowinfo
It is an implementation-internal value, and the main
purpose of printing it seems to be debugging.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-10 08:06:12 -08:00
Serhey Popovych b76b24006c ip6/tunnel: Fix tclass output
In link_gre6.c this seems to be a copy-paste error: tclass is 8 bits,
not 20 bits like flowlabel.

In link_iptnl.c rename "flowinfo_tclass" to "tclass", as that is the
correct name: flowinfo is an implementation-internal name for the
u32 attribute that combines tclass and flowlabel.

Fixes: 1facc1c61c ("ip: link_ip6tnl.c: add json output support")
Fixes: 2e706e12d9 ("Merge branch 'master' into net-next")
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-10 08:06:11 -08:00
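As a reference for the field widths discussed in the tclass fix above, here is a sketch of splitting a host-order IPv6 flowinfo word into its 8-bit traffic class and 20-bit flow label. The mask names are made up for the example. The kernel keeps flowinfo in network byte order, so a value read from an attribute would need ntohl() first. The output format mirrors the `tclass 0x00 flowlabel 0x00000` style seen in `ip -d link` output.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Host-order flowinfo layout: 4 bits version (ignored here),
 * 8 bits traffic class, 20 bits flow label. Names are hypothetical. */
#define FLOWINFO_TCLASS_MASK    0x0FF00000u
#define FLOWINFO_FLOWLABEL_MASK 0x000FFFFFu
#define FLOWINFO_TCLASS_SHIFT   20

static uint8_t flowinfo_tclass(uint32_t flowinfo)
{
	return (uint8_t)((flowinfo & FLOWINFO_TCLASS_MASK) >> FLOWINFO_TCLASS_SHIFT);
}

static uint32_t flowinfo_flowlabel(uint32_t flowinfo)
{
	return flowinfo & FLOWINFO_FLOWLABEL_MASK;
}

/* Print tclass with 2 hex digits (8 bits) and flowlabel with 5 (20 bits). */
static int format_flowinfo(char *buf, size_t len, uint32_t flowinfo)
{
	return snprintf(buf, len, "tclass 0x%02x flowlabel 0x%05x",
			(unsigned int)flowinfo_tclass(flowinfo),
			(unsigned int)flowinfo_flowlabel(flowinfo));
}
```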
Jamal Hadi Salim 24a5a48e27 tc: Fix filter protocol output
Fixes: 249284ff5a ("tc: jsonify filter core")
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
2018-01-09 08:09:10 -08:00
Stephen Hemminger a366508913 include: update ethernet headers
Incorporate upstream changes to fix compilation with MUSL.
See commit 6926e041a892
 ("uapi/if_ether.h: prevent redefinition of struct ethhdr")

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-09 08:08:03 -08:00
Antonio Quartulli ebbb219c92 ss: fix NULL pointer access when parsing unix sockets with oldformat
When parsing and printing the unix sockets in unix_show(),
if the oldformat is detected, the peer_name member of the sockstat
object is left uninitialized (NULL).
For this reason, if a filter has been specified on the command line,
a strcmp() will crash when trying to access it.

Avoid the crash by checking that peer_name is not NULL before
passing it to strcmp().

Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-09 08:02:46 -08:00
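The shape of the ss unix-socket fix, reduced to a stand-alone sketch: test the pointer before handing it to strcmp(). `struct sock_stub` and `peer_matches()` are hypothetical stand-ins, not the structures in ss.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the socket state; only the field
 * relevant to the fix is modelled. */
struct sock_stub {
	const char *peer_name; /* NULL when parsed from the old format */
};

/* Return true if the socket's peer matches @want. A NULL peer_name
 * never matches, instead of being passed to strcmp(). */
static bool peer_matches(const struct sock_stub *s, const char *want)
{
	return s->peer_name != NULL && strcmp(s->peer_name, want) == 0;
}
```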
Antonio Quartulli 192be8fccb ss: fix crash when skipping disabled header field
When the first header field is disabled (i.e. when passing the -t
option), field_flush() is invoked with the `buffer` global variable
still zeroed.
However, in field_flush() we try to access buffer.cur->len
during variable initialization, thus leading to a SIGSEGV.

It's interesting to note that this bug appears only when the code
is compiled with -O0, because the compiler is smart
enough to immediately jump to the return statement if optimizations
are enabled and skip the faulty instruction.

Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-09 08:02:46 -08:00
Filip Moc 33f6dd23a5 ip fou: pass family attribute as u8
This fixes fou on big-endian systems.

Signed-off-by: Filip Moc <dev@moc6.cz>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-09 07:58:37 -08:00
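Why the attribute width matters on big-endian hosts can be shown without netlink. If a two-byte value is written but the reader only consumes the first byte as a u8, a little-endian host happens to work while a big-endian host reads 0. This sketch assumes the kernel reads the fou family attribute as a single byte; the helper functions are hypothetical, and 2 is used because it is AF_INET on Linux.

```c
#include <stdint.h>

/* Lay out a 16-bit value as a big-endian host would store it. */
static void put_u16_be(uint8_t out[2], uint16_t v)
{
	out[0] = (uint8_t)(v >> 8);
	out[1] = (uint8_t)(v & 0xff);
}

/* Lay out a 16-bit value as a little-endian host would store it. */
static void put_u16_le(uint8_t out[2], uint16_t v)
{
	out[0] = (uint8_t)(v & 0xff);
	out[1] = (uint8_t)(v >> 8);
}

/* A reader expecting a u8 attribute looks only at the first payload byte. */
static uint8_t read_as_u8(const uint8_t *payload)
{
	return payload[0];
}
```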
David Ahern ac6561417a Restore --no-print-directory option for silent builds
Commit 69fed534a5 ("change how Config is used in Makefile's") removed
Config from Makefile. Config had the checks to set VERBOSE based on user
request and VERBOSE is used to add the --no-print-directory argument.
Since Config is gone, add the relevant setup for VERBOSE to Makefile
to restore quieter builds by default.

Fixes: 69fed534a5 ("change how Config is used in Makefile's")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-09 07:58:00 -08:00
David Ahern 6a21ca8a4a Merge branch 'master' into net-next
Conflicts:
	man/man8/ip-link.8.in

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-08 10:10:45 -08:00
Serhey Popovych 9deb754283 link_iptnl: Open "encap" JSON object
The iptnl implementation seems to be missing a pair of
open_json_object()/close_json_object() calls.

Note that ip6tnl already opens an "encap" JSON object.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2018-01-05 16:35:47 -08:00
Serhey Popovych d9aefbc0b8 link_iptnl: Print tunnel mode
Tunnel mode does not appear in the parameters printed for
iptnl-supported tunnels like ipip and sit, while it is printed
for ip6tnl.

Print the tunnel mode with the field name "proto" for JSON and
without any name when printing to the CLI, to follow ip6tnl
behaviour.

For non JSON output we have:

   $ ip -d link show dev sit1

Before:
-------
17: sit1@NONE: <NOARP> mtu 1480 qdisc noop state DOWN ...
    link/sit X.X.X.X brd 0.0.0.0 promiscuity 0
    sit remote any local X.X.X.X ...
        ~~~

After:
------
17: sit1@NONE: <NOARP> mtu 1480 qdisc noop state DOWN ...
    link/sit X.X.X.X brd 0.0.0.0 promiscuity 0
    sit any remote any local X.X.X.X ...
        ^^^

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2018-01-05 16:35:47 -08:00
Serhey Popovych 68a7f5ed47 link_iptnl: Kill code duplication
The "mode" parameter handling for sit and ipip is nearly the same,
except that sit also has the "ip6ip" mode: check it only when
configuring sit.

Note that there is no need for strcmp(lu->id, "ipip"): if it is
not sit, it is "ipip", because only these two link utils are
defined in the module.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2018-01-05 16:35:47 -08:00
Matthias Schiffer cfd6ccbfd0 devlink, rdma, tipc: properly define TARGETS without HAVE_MNL
Leaving a variable with a generic name such as TARGETS undefined would lead
to Make picking up its value from the environment. Avoid this by always
defining TARGETS in the Makefiles.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
2018-01-05 16:32:17 -08:00
Jakub Kicinski 7d424c7193 ip: link: add support for netdevsim device type
netdevsim is a new software device for testing kernel APIs
without any hardware attached.  Allow users to create such
devices.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
2018-01-02 20:46:19 -08:00
Luca Boccassi c7d6cbaf85 man: fix small formatting errors
Lintian detected the following formatting errors:

 man/man8/devlink-sb.8.gz 230: warning: macro `b' not defined
 man/man8/ip-link.8.gz 1243: warning: macro `in-8' not defined
  (possibly missing space after `in')
 man/man8/tc-u32.8.gz `R' is a string (producing the registered sign),
  not a macro.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-02 11:29:39 -08:00
Luca Boccassi 36c1d2383a man: routel/routef: don't mention filesystem paths
The filesystem paths to these scripts might be different on various
distros, so don't mention it in the manpages. It is not really useful
information anyway.

Originally submitted as Debian bug:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561424

Reported-by: jidanni@jidanni.org
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-30 09:43:47 -08:00
Luca Boccassi fe2ab15d2c man: ip-address: document 15-char limit for LABEL
Trying to set a label longer than 15 characters returns an error:
 RTNETLINK answers: Numerical result out of range

Document the limit in the manpage.

Originally reported as a Debian bug:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=661886

Reported-by: Gabor Kiss <kissg@ssg.ki.iif.hu>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-30 09:43:47 -08:00
Luca Boccassi be78fade55 man: add more keywords to ip.8 short description
A Debian user suggested adding more network-related keywords to the
ip manpage, so that manpage-scraping and indexing software like
apropos can do a better job of categorizing the programs.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=877983

Suggested-by: Lynoure Braakman <lynoure@gmail.com>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-30 09:43:47 -08:00
Luca Boccassi cd25876440 man: drop references to Debian-specific paths
Documentation should be distribution-agnostic - any specific quirks
should be handled by downstream maintainers, if necessary.
Remove mentions of Debian paths and package names.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-30 09:43:47 -08:00
Serhey Popovych b760a8823a ip/tunnel: Document "external" parameter
Add it to the ip-link(8) "type gre" help message
as well as to the ip-link(8) man page.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2017-12-28 09:40:02 -08:00
Serhey Popovych 8c0b19d178 vxcan,veth: Forbid "type" for peer device
It is already given for the original device for which we
configure this peer.

Results from the following command before/after the change
are shown below:

  $ ip link add dev veth1a type veth peer name veth1b \
                           type veth peer name veth1c

Before:
-------

<no output, no netdevs created>

After:
------

Error: duplicate "type": "veth" is the second value.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-28 09:35:27 -08:00
Yuval Mintz b97c6fa71d qdisc: print offload indication
Use the newly added TCA_HW_OFFLOAD indication from kernel
to print a consistent 'offloaded' message to user when listing qdiscs.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-27 13:55:16 -08:00
Serhey Popovych afdf9277eb gre6/tunnel: Do not submit garbage in flowinfo
We always send flowinfo to the kernel. If flowlabel/tclass
was first set to a non-inherit value and then reset to
inherit, we do not clear the flowlabel/tclass part of flowinfo,
so we send it to the kernel and can get it back from the kernel.

Even though we check for IP6_TNL_F_USE_ORIG_TCLASS and
IP6_TNL_F_USE_ORIG_FLOWLABEL when printing options,
sending invalid flowlabel/tclass to the kernel seems
a bad idea.

Note that ip6tnl always cleans the corresponding flowinfo
parts on inherit.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2017-12-27 13:45:37 -08:00
Serhey Popovych 147ade01b0 gre,ip6tnl/tunnel: Fix noencap- support
We must clear the given bit, not set all bits except the given one.

Fixes: 858dbb208e ("ip link: Add support for remote checksum offload to IP tunnels")
Fixes: 73516e128a ("ip6tnl: Support for fou encapsulation")
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2017-12-27 13:41:42 -08:00
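The difference between the buggy and the fixed expression in the noencap- commit above is easiest to see side by side. The flag values below are illustrative and the helpers are hypothetical; only the `|=` versus `&=` pattern is the point.

```c
#include <stdint.h>

/* Illustrative encapsulation flag bits. */
#define ENCAP_FLAG_CSUM    (1u << 0)
#define ENCAP_FLAG_CSUM6   (1u << 1)
#define ENCAP_FLAG_REMCSUM (1u << 2)

/* Buggy form (flags |= ~flag): sets every bit except @flag. */
static uint16_t clear_flag_buggy(uint16_t flags, uint16_t flag)
{
	return (uint16_t)(flags | (uint16_t)~flag);
}

/* Fixed form (flags &= ~flag): clears only @flag, keeps the rest. */
static uint16_t clear_flag(uint16_t flags, uint16_t flag)
{
	return (uint16_t)(flags & (uint16_t)~flag);
}
```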
Leon Romanovsky 874c734c1c rdma: Move link execution logic to common code
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2017-12-27 07:47:35 -08:00
Leon Romanovsky 5fc17280b1 rdma: Rename rd_free_devmap to be rd_free
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2017-12-27 07:47:35 -08:00
Leon Romanovsky 7109f4b212 rdma: Rename free function to be rd_cleanup
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2017-12-27 07:47:35 -08:00
Leon Romanovsky b1e6bc437f rdma: Get rid of dev_map_free call
The dev_map_free() is called once only and it is short,
so it is better to integrate it into the caller's site.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2017-12-27 07:47:35 -08:00
Leon Romanovsky b55644412f rdma: Print supplied device name in case of wrong name
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2017-12-27 07:47:35 -08:00
Leon Romanovsky e3dee3c81f rdma: Check that port index exists before operate on link layer
The link object operates on ports, hence it should check
that the port exists before executing commands.

Fixes: da990ab40a ("rdma: Add link object")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2017-12-27 07:47:35 -08:00
Leon Romanovsky 4e2eb9fdf9 rdma: Fix misspelled SYS_IMAGE_GUID
SYS_IMAGE_GUIG is actually SYS_IMAGE_GUID.

Fixes: da990ab40a ("rdma: Add link object")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2017-12-27 07:47:35 -08:00
Leon Romanovsky 8b92a2c930 rdma: Move per-device handler function to generic code
Most of the proposed objects are working in the scope "dev"
and will implement the same logic. Move the code to utils.c,
so other objects will be able to reuse the code.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2017-12-27 07:47:35 -08:00
Leon Romanovsky 99da90326e rdma: Protect dev_map_lookup from wrong input
Although all callers of dev_map_lookup ensure that a device name
is always present before calling that function, it is better and
safer to check that in dev_map_lookup itself.

Fixes: 40df8263a0 ("rdma: Add dev object")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2017-12-27 07:47:35 -08:00
Leon Romanovsky 0fc8c30b4e rdma: Reduce scope of _dev_map_lookup call
There are no external users of the _dev_map_lookup function,
so let's limit its scope to be local.

Fixes: 40df8263a0 ("rdma: Add dev object")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2017-12-27 07:47:35 -08:00
William Tu 1e4be52bf2 erspan: add erspan usage description
The patch adds an erspan usage description, so 'ip link help erspan'
and 'ip link help ip6erspan' show the options.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2017-12-27 07:47:35 -08:00
Serhey Popovych 08ede25fda ip/tunnel: No need to free answer after rtnl_talk() on error
Since rtnl_talk() never returns with the answer buffer allocated
on error, we do not need to release it manually. As a result,
initializing answer to NULL before rtnl_talk() is unnecessary.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2017-12-26 09:07:43 -08:00
Serhey Popovych 1ed8a5ca87 utils: ll_addr: Handle ARPHRD_IP6GRE in ll_addr_n2a()
ll_addr_n2a() correctly prints tunnel endpoints for gre, ipip, sit
and ip6tnl, but not for ip6gre. Fix this by adding ARPHRD_IP6GRE to
the IPv6 tunnel endpoint address conversion.

Before:
-------

$ ip link show
...
18: ip6tnl0: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default
    link/tunnel6 :: brd ::
19: ip6gre0: <NOARP> mtu 1456 qdisc noop state DOWN mode DEFAULT group default
    link/gre6 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00 brd \
00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00

After:
------

$ ip link show
...
18: ip6tnl0: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default
    link/tunnel6 :: brd ::
19: ip6gre0: <NOARP> mtu 1456 qdisc noop state DOWN mode DEFAULT group default
    link/gre6 :: brd ::

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2017-12-26 09:07:42 -08:00
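A sketch of the formatting choice in the ll_addr_n2a() commit above: when the 16-byte hardware address of an IPv6 tunnel is treated as an IPv6 address, inet_ntop() prints the all-zero address as `::`, while the generic path prints 16 hex bytes. `lladdr_to_str()` and its `is_ipv6_tunnel` flag are hypothetical stand-ins for the ARPHRD-type check in ll_addr_n2a().

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Format a link-layer address. IPv6 tunnel endpoints go through
 * inet_ntop(); everything else is printed as colon-separated hex. */
static const char *lladdr_to_str(const unsigned char *addr, size_t len,
				 int is_ipv6_tunnel, char *buf, size_t blen)
{
	size_t i, off = 0;

	if (is_ipv6_tunnel && len == 16)
		return inet_ntop(AF_INET6, addr, buf, (socklen_t)blen);

	if (blen)
		buf[0] = '\0';
	for (i = 0; i < len && off < blen; i++)
		off += (size_t)snprintf(buf + off, blen - off,
					i ? ":%02x" : "%02x", addr[i]);
	return buf;
}
```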
William Tu 2897636267 erspan: add erspan version II support
The patch adds support for configuring the erspan v2, for both
ipv4 and ipv6 erspan implementation.  Three additional fields
are added: 'erspan_ver' for distinguishing v1 or v2, 'erspan_dir'
for specifying direction of the mirrored traffic, and 'erspan_hwid'
for users to set ERSPAN engine ID within a system.

As for the man page, the ERSPAN descriptions used to be under the GRE,
IPIP, SIT Type paragraph.  Since IP6GRE/IP6GRETAP also support ERSPAN,
the patch removes the old one, creates a separate ERSPAN paragraph,
and adds an example.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2017-12-20 15:18:04 -08:00
David Ahern f82517f80d Update headers from 4.15-rc3
Update kernel headers to commit f39a5c01c3d2 ("Merge branch
'nfp-flower-add-Geneve-tunnel-support'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2017-12-20 15:17:59 -08:00
Alexander Zubkov 9135c4d603 iproute: "list/flush/save default" selected all of the routes
When running "ip route list default" without specifying an address family,
one gets all of the routes instead of just the default ones. The same
applies to "exact default" and "match default".

It behaves this way because a default route with unspecified family
has the same all-zeroes value as no prefix specified at all. Thus the
following code blindly ignores the fact that a prefix was actually
specified.

This patch sets the PREFIXLEN_SPECIFIED flag for the default route too,
and then checks it when filtering routes.

Signed-off-by: Alexander Zubkov <green@msu.ru>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-19 08:23:09 -08:00
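A reduced sketch of the "list default" filtering logic above, assuming a filter structure that records whether a prefix was given. The flag name PREFIXLEN_SPECIFIED comes from the commit message; its value, `struct prefix_filter` and the helpers are hypothetical, and address comparison is omitted.

```c
#include <stdbool.h>

#define PREFIXLEN_SPECIFIED 0x1 /* name from the commit; value illustrative */

/* Hypothetical filter with only the fields needed for this sketch. */
struct prefix_filter {
	unsigned int flags;
	int bitlen; /* prefix length; 0 for "default" */
};

/* "default" is an all-zero /0 prefix that was nevertheless given explicitly. */
static void set_default_prefix(struct prefix_filter *f)
{
	f->bitlen = 0;
	f->flags |= PREFIXLEN_SPECIFIED;
}

/* "exact" match on prefix length only. Without PREFIXLEN_SPECIFIED every
 * route matches, which is what "list default" did before the fix. */
static bool route_matches_exact(const struct prefix_filter *f, int route_bitlen)
{
	if (!(f->flags & PREFIXLEN_SPECIFIED))
		return true;
	return route_bitlen == f->bitlen;
}
```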
Alexander Zubkov ed7fdc950d iproute: list/flush/save filter also by metric
Metric is one of the "unique key" fields of a route in Linux, but
its value still cannot be used as a filter when running ip route list.
This makes writing checks in scripts, for example, inconvenient.

Signed-off-by: Alexander Zubkov <green@msu.ru>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-19 08:16:10 -08:00
Serhey Popovych 560cf61253 link_vti6: Always add local/remote endpoint attributes
All tunnels already support parsing/adding zero
endpoints, and vti6 is no exception.

This check was added as part of commit 2a80154fde
(vti6: fix local/remote any addr handling) and looks
too restrictive, as the purpose of that change was to
avoid configuring endpoints from uninitialized data.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-19 08:14:59 -08:00
Serhey Popovych 95614cc8a3 link_ip6tnl: Use IN6ADDR_ANY_INIT to initialize local/remote endpoints
Use specialized helper to initialize endpoint addresses with
zeros instead of open coding this. This unifies initialization
style with other ipv6 tunnel variants (i.e. gre6 and vti6).

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-19 08:14:01 -08:00
Serhey Popovych 1f44b93744 ip/tunnel: Use tnl_parse_key() to parse tunnel key
It was added with
commit a7ed1520ee ("ip/tunnel: introduce tnl_parse_key()")
to avoid code duplication in ip6?tunnel.c.

Reuse it in the gre/gre6 and vti/vti6 tunnel rtnl
configuration interface for the same purpose it
serves in the tunnel ioctl interface in ip6?tunnel.c.

While there, change the type of key variables from
unsigned integer to __be32 to reflect nature of the
value they store and place error message in
tnl_parse_key() on a single line to make single
call to fprintf().

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-19 08:14:01 -08:00
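For illustration, a key parser in the spirit of tnl_parse_key(). A key may be written as a dotted quad or as a plain number, and either way it ends up as a 32-bit value in network byte order (__be32). This is a hypothetical sketch using inet_pton() and strtoul(), not iproute2's implementation.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Parse @arg into a network-byte-order key. Returns 0 on success. */
static int parse_tunnel_key(const char *arg, uint32_t *key_be)
{
	struct in_addr a;
	unsigned long v;
	char *end;

	if (strchr(arg, '.')) {
		/* Dotted-quad form: inet_pton() already yields network order. */
		if (inet_pton(AF_INET, arg, &a) != 1)
			return -1;
		*key_be = a.s_addr;
		return 0;
	}

	/* Plain number (decimal, or hex with 0x); reject signs and junk. */
	if (arg[0] == '\0' || arg[0] == '-')
		return -1;
	errno = 0;
	v = strtoul(arg, &end, 0);
	if (errno == ERANGE || *end != '\0' || v > UINT32_MAX)
		return -1;
	*key_be = htonl((uint32_t)v);
	return 0;
}
```

Both spellings of the same key, "1.2.3.4" and 16909060 (0x01020304), produce the same __be32 value.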
Serhey Popovych dac9ff35ea iplink: Kill redundant network device name checks
Since commit 625df645b7 (Check user supplied interface name lengths)
iplink_parse() validates network device name using check_ifname()
helpers.

Remove redundant "name" length checks from iplink_parse() callers.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-19 08:12:41 -08:00
Serhey Popovych f88becf35e iplink: Process "alias" parameter correctly
Do not stop parameter processing after the "alias" parameter: it might
not be the last one. This seems copy-pasted from the "type" parameter code.

Check that its length does not exceed IFALIASZ - 1: better to warn
than to get an RTNL error.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-19 08:11:38 -08:00
Serhey Popovych b7ea12ae43 iplink: Improve index parameter handling
Correctly check for a valid network device index supplied on the
command line: indexes are always greater than zero. Check
for a duplicate "index" argument.

Initialize @index to 0 to simplify handling it in iplink_modify().
Other callers (link_veth.c, iplink_vxcan.c) already did so.

No need to initialize ifi_index with 0 since it is already
initialized at the @struct req initialization time and not
modified in iplink_parse().

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-19 08:11:38 -08:00
Stephen Hemminger bd9cea5d8c utils: fix makeargs stack overflow
The makeargs() function did not handle end of string correctly
and would read past the end of the string.

Found by fuzzing with ASAN.

Reported-by: Bug Basher <iamliketohack@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-18 11:19:48 -08:00
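A minimal bounds-safe argument splitter, shown only to illustrate the end-of-string handling the makeargs() fix is about: the scan checks for the terminating NUL before moving past any character. `split_args()` is hypothetical and does not handle quoting.

```c
#include <stddef.h>
#include <string.h>

/* Split @line in place on spaces, tabs and newlines, storing at most
 * @maxargs pointers in @argv. Returns argc, or -1 if there are more
 * than @maxargs words. Never reads past the terminating NUL. */
static int split_args(char *line, char *argv[], int maxargs)
{
	int argc = 0;
	char *p = line;

	while (*p != '\0') {
		/* Skip separators; '\0' is not a separator, so this stops there. */
		while (*p == ' ' || *p == '\t' || *p == '\n')
			p++;
		if (*p == '\0')
			break;
		if (argc == maxargs)
			return -1;
		argv[argc++] = p;
		/* Advance to the end of the word. */
		while (*p != '\0' && *p != ' ' && *p != '\t' && *p != '\n')
			p++;
		if (*p != '\0')
			*p++ = '\0'; /* terminate word, step past separator */
	}
	return argc;
}
```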
Stephen Hemminger 5073581835 ss: fix crash with invalid command input file
If given an invalid input file with the -F flag, ss would crash.
Examples of invalid input are a line that is too long, or a null file.

Found by fuzzing with ASAN.

Reported-by: Bug Basher <iamliketohack@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-18 11:18:55 -08:00
Stephen Hemminger ae8e1cb83b ip: validate vlan value for vlan info
The VLAN tag must be 0..4095 to be valid.
Better to trap it here.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-16 13:14:38 -08:00
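A sketch of the VLAN range check above, assuming the VLAN ID arrives as a string argument. `parse_vlan_id()` is a hypothetical helper, not the function in ip.

```c
#include <stdbool.h>
#include <stdlib.h>

#define VLAN_ID_MAX 4095 /* the VLAN ID is a 12-bit field */

/* Parse @arg as a VLAN ID and reject anything outside 0..4095. */
static bool parse_vlan_id(const char *arg, unsigned int *vid)
{
	unsigned long v;
	char *end;

	if (arg[0] == '\0' || arg[0] == '-')
		return false;
	v = strtoul(arg, &end, 0);
	if (*end != '\0' || v > VLAN_ID_MAX) /* overflow yields ULONG_MAX */
		return false;
	*vid = (unsigned int)v;
	return true;
}
```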
Serhey Popovych a6addd5cdc ip: gre: fix IFLA_GRE_LINK attribute sizing
Attribute IFLA_GRE_LINK is 32 bit long, not 8 bit.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2017-12-16 10:08:54 -08:00
Serhey Popovych 9aceaad71b ip/tunnel: Use get_addr() instead of get_prefix() for local/remote endpoints
Manual page ip-link(8) states that both local and remote accept
IPADDR not PREFIX. Use get_addr() instead of get_prefix() to
parse local/remote endpoint address correctly.

Force the corresponding address family instead of using preferred_family
to catch weird cases like the ones shown below.

Before this patch it is possible to create tunnel with commands:

  ip    li add dev ip6gre2 type ip6gre local fe80::1/64 remote fe80::2/64
  ip -4 li add dev ip6gre2 type ip6gre local 10.0.0.1/24 remote 10.0.0.2/24

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2017-12-16 10:08:54 -08:00
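Why plain address parsing rejects the ip6gre examples above can be shown with inet_pton(), which accepts only a bare address of the requested family. `parse_endpoint()` is a hypothetical helper standing in for get_addr() with a forced family.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Accept a tunnel endpoint only if it is a plain address of @family.
 * Prefix notation ("fe80::1/64") and an IPv4 address for an IPv6
 * tunnel are both rejected. */
static bool parse_endpoint(int family, const char *arg, void *out)
{
	return inet_pton(family, arg, out) == 1;
}
```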
Serhey Popovych 57daab1e70 ip/tunnel: Unify setup and accept zero address for local/remote endpoints
It is fully legal to submit zero (INADDR_ANY/IN6ADDR_ANY_INIT)
value for local and/or remote endpoints for all tunnel drivers:
there is no need to additionally check this in userspace.

Note that all tunnel specific code already can pass zero address
to the kernel.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2017-12-16 10:08:54 -08:00
Oliver Hartkopp 1eccc57341 ip: add vxcan/veth to ip-link man page
veth and vxcan both create a virtual tunnel between a pair of virtual network
devices. This patch adds the content for the now supported vxcan netdevices
and the documentation to create peer devices for vxcan and veth.

Additionally, remove 'can', which accidentally was on the list of link types
that can be created by 'ip link add', as 'can' devices are real network devices.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-16 10:04:33 -08:00
Roman Mashak 3d791a326b ss: add missing path MTU parameter
v3:
   Rebase and use out() instead of printf().
v2:
   Print the path MTU immediately after the MSS, as it is easier to parse
   for humans (suggested by Neal Cardwell).

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-16 10:02:34 -08:00
Stephen Hemminger 2c6aaad949 include: qdisc offload defines
UAPI changes from upstream:
	net: sched: Add TCA_HW_OFFLOAD
	pkt_sched: Remove TC_RED_OFFLOADED from uapi

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-16 10:00:43 -08:00
Stephen Hemminger c189177efc Merge branch 'master' into net-next 2017-12-14 21:19:54 -08:00
William Tu 6231c5bec6 gre6: add collect metadata support
The patch adds 'external' option to support collect metadata
gre6 tunnel.  The 'external' keyword is already used to set the
device into collect metadata mode such as vxlan, geneve, ipip,
etc.  This patch extends support for ipv6 gre and gretap.
Example of L3 and L2 gre device:
bash:~# ip link add dev ip6gre123 type ip6gre external
bash:~# ip link add dev ip6gretap123 type ip6gretap external

Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
2017-12-14 21:19:49 -08:00
Chris Mi 83cf5bc73b tc: fix command "tc actions del" hang issue
If the command is RTM_DELACTION, a non-NULL pointer is passed to rtnl_talk().
Then the NLM_F_ACK flag is not set in n->nlmsg_flags and netlink_ack() will
not be called, so tc waits for the reply forever.

Fixes: 86bf43c7c2 ("lib/libnetlink: update rtnl_talk to support malloc buff at run time")
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-14 21:17:04 -08:00
Stephen Hemminger 08f9d166c3 iplink: add definitions for GSO_MAX
Until kernel exports these, add GSO_MAX values into iplink
rather than assuming they are UINT_MAX + 1

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-14 18:22:56 -08:00
Solio Sarabia 051274b4db iplink: validate maximum gso_max_size
Validate the upper limit for gso_max_size, valid range is [0-65,536]
inclusive. Fix minor whitespace in iplink man page.

Signed-off-by: Solio Sarabia <solio.sarabia@intel.com>
2017-12-14 18:12:14 -08:00
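A sketch of the upper-bound checks in the two GSO commits above. The limits assumed here are 65536 for gso_max_size (from the commit message) and 65535 for gso_max_segs (the value shown in `ip -details link` output in this log). The macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical names for the limits hard-coded until the kernel exports them. */
#define GSO_MAX_SIZE_LIMIT 65536
#define GSO_MAX_SEGS_LIMIT 65535

/* Parse an unsigned value and accept it only if it lies in 0..@max. */
static bool parse_bounded_uint(const char *arg, unsigned long max,
			       unsigned int *out)
{
	unsigned long v;
	char *end;

	if (arg[0] == '\0' || arg[0] == '-')
		return false;
	v = strtoul(arg, &end, 0);
	if (*end != '\0' || v > max) /* overflow yields ULONG_MAX */
		return false;
	*out = (unsigned int)v;
	return true;
}
```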
Jiri Pirko 1876ab0779 tc: fix json array closing
Fixes: 2704bd6255 ("tc: jsonify actions core")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-12-13 18:16:27 -08:00
Oliver Hartkopp 7827b37603 ip: add vxcan to help text
Add the 'vxcan' tag to the help text; it was missing from commit
efe459c76d ('ip: link add vxcan support').

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
2017-12-13 18:16:22 -08:00
Phil Dibowitz 7b17832445 Show 'external' link mode in output
Recently `external` support was added to the tunnel drivers, but there is no way
to introspect this from userspace. This adds support for that.

Now `ip -details link` shows it:

```
7: tunl60@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group
default qlen 1
    link/tunnel6 :: brd :: promiscuity 0
    ip6tnl external any remote :: local :: encaplimit 0 hoplimit 0 tclass 0x00 flowlabel 0x00000 (flowinfo 0x00000000) addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
```

Signed-off-by: Phil Dibowitz <phil@ipom.com>
2017-12-13 18:15:51 -08:00
Stephen Hemminger 3a2fbf007b Merge branch 'master' into net-next 2017-12-12 12:12:20 -08:00
Davide Caratti 88b428f03f tc: bash-completion: add missing 'classid' keyword
Users of the 'matchall' filter can specify a value for the class id: update
bash-completion accordingly.

Fixes: b32c0b64fa ("tc: bash-completion: Add support for matchall")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2017-12-12 12:11:37 -08:00
Stefano Brivio 87b1a7aec7 ss: Implement automatic column width calculation
Group fitting fields into lines and space them equally using the
remaining screen width for each line. If columns don't fit on
one line, break them into the fewest possible lines and
keep them aligned across lines.

This is done by:
 - recording the length of the longest item in each column during
   formatting and buffering (which was added in the previous patch)
 - fitting as many fields as possible on each line of output
 - distributing the remaining padding space equally between the
   columns
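
A simplified sketch of the fitting and padding steps above, assuming each column's width is already known and that columns are placed greedily from left to right. The function names are hypothetical and this is not the ss implementation:

```c
/* Count how many leading columns fit on one line of screen_width
 * characters, given each column's longest-item width. */
static int fields_per_line(const int *width, int ncols, int screen_width)
{
	int used = 0, i;

	for (i = 0; i < ncols; i++) {
		if (used + width[i] > screen_width)
			break;
		used += width[i];
	}
	return i > 0 ? i : 1;	/* always place at least one column */
}

/* Spread the leftover space equally across n columns; the leftmost
 * columns absorb any remainder one character at a time. */
static void distribute_padding(const int *width, int n, int screen_width,
			       int *pad)
{
	int used = 0, rem, i;

	if (n <= 0)
		return;
	for (i = 0; i < n; i++)
		used += width[i];
	rem = screen_width > used ? screen_width - used : 0;
	for (i = 0; i < n; i++)
		pad[i] = rem / n + (i < rem % n ? 1 : 0);
}
```

For widths {10, 20, 30, 40} on a 65-column screen, the first three columns fit (60 characters) and the 5 spare characters become padding of 2, 2 and 1.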

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
2017-12-12 12:11:37 -08:00
Stefano Brivio 691bd854bf ss: Buffer raw fields first, then render them as a table
This allows us to measure the maximum field length for each
column before printing fields and will permit us to apply
optimal field spacing and distribution. Structure of the output
buffer with chunked allocation is described in comments.

Output is still unchanged, original spacing is used.

Running over one million sockets with -tul options by simply
modifying main() to loop 50,000 times over the *_show()
functions, buffering the whole output and rendering it at the
end, with 10 UDP sockets, 10 TCP sockets, while throwing
output away, doesn't show significant changes in execution time
on my laptop with an Intel i7-6600U CPU:

- before this patch:
$ time ./ss -tul > /dev/null
real	0m29.899s
user	0m2.017s
sys	0m27.801s

- after this patch:
$ time ./ss -tul > /dev/null
real	0m29.827s
user	0m1.942s
sys	0m27.812s

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
2017-12-12 12:11:37 -08:00
Stefano Brivio 59f46b7b5b ss: Introduce columns lightweight abstraction
Instead of embedding spacing directly while printing contents,
logically declare columns and functions to buffer their content,
to print left and right spacing around fields, to flush them to
screen, and to print headers.

This makes it a bit easier to handle layout changes and prepares
for full output buffering, needed for optimal spacing in field
output layout.

Columns are currently set up to retain exactly the same output
as before. This needs some slight adjustments of the values
previously calculated in main(), as the width value introduced
here already includes the width of left delimiters and spacing
is not explicitly printed anymore whenever a field is printed.
These calculations will go away altogether once automatic width
calculation is implemented.

We can also remove explicit printing of newlines after the final
content for a given line is printed: flushing the last field on
a line will cause field_flush() to print newlines where
appropriate.

No changes in output expected here.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
2017-12-12 12:11:37 -08:00
Stefano Brivio 90351722cb ss: Replace printf() calls for "main" output by calls to helper
This is preparation work for output buffering, which will allow
us to use optimal spacing and alignment of logical "columns".

The new out() function is just a re-implementation of a typical
libc printf(), except that the return value of vfprintf() is
ignored as no callers use it. This implementation will be
replaced in the next patches to provide column width adjustment
and adequate spacing.
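
A sketch of such a helper. The out_stream pointer is an addition for this example only, so the output can be captured; the description above says nothing about redirecting output:

```c
#include <stdarg.h>
#include <stdio.h>

/* Destination for out(); NULL means stdout. Added only for this sketch. */
static FILE *out_stream;

/* printf()-like helper: formats via vfprintf() and, like the helper
 * described above, ignores vfprintf()'s return value. */
static void out(const char *fmt, ...)
{
	va_list args;

	va_start(args, fmt);
	vfprintf(out_stream ? out_stream : stdout, fmt, args);
	va_end(args);
}
```

Routing every socket-list printf() through one function is what later lets the output be buffered and measured before anything reaches the screen.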

All printf() calls that output parts of the socket list are now
replaced by calls to out(). Output of summary and version is
excluded from this.

No functional differences here, output not affected.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
2017-12-12 12:11:37 -08:00
Stephen Hemminger 81724d6142 Merge branch 'master' into net-next 2017-12-11 16:06:11 -08:00
Stephen Hemminger 4b072e9b49 uapi: tun add eBPF based queue selection method
Upstream commit 96f84061620c6325a2ca9a9a05b410e6461d03c3
    tun: add eBPF based queue selection method

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-11 16:03:27 -08:00
Stephen Hemminger b7f5fd3698 uapi: add access to snd_cwnd and other sock_ops
From upstream kernel commit f19397a5c65665d66e3866b42056f1f58b7a366b
    bpf: Add access to snd_cwnd and others in sock_ops

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-11 16:01:17 -08:00
Roman Mashak 9f1a9ae888 ss: remove duplicate assignment
Fixes: 8250bc9ff4 ("ss: Unify inet sockets output")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-11 15:56:10 -08:00
Stephen Hemminger c2db423f7c iplink: allow configuring GSO max values
This allows sending GSO maximum values when configuring a device.
The values are advisory. Most devices will ignore them but for some
pseudo devices such as veth pairs they can be set.

Example:
	# ip link add dev vm1 type veth peer name vm2 gso_max_size 32768

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2017-12-08 21:33:08 -08:00
Stephen Hemminger 5c6e3478ac Merge branch 'master' into net-next 2017-12-08 21:32:33 -08:00
Michal Privoznik 3572e01a09 tc: util: Don't call NEXT_ARG_FWD() in __parse_action_control()
Not all callers want parse_action_control*() to advance the
arguments. For instance act_parse_police() does the argument
advancing itself.

Fixes: e67aba5595 ("tc: actions: add helpers to parse and print control actions")
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-12-08 10:29:01 -08:00
Wei Wang 00ac78d39c ss: print tcpi_rcv_ssthresh
tcpi_rcv_ssthresh is an important statistic when debugging receive-side
behavior.
Add it to the ss output.

Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
2017-12-08 10:27:57 -08:00
Stephen Hemminger 39be47fb5e update headers from 4.15-rc2
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-05 17:30:29 -08:00
Phil Sutter 6bf156415a man: tc-csum.8: Fix inconsistency in example description
Commit 6bbe5e6290 ("man: tc-csum.8: Fix example") changed both source
and destination IP addresses in example code but did not update the
example's description accordingly.

Fixes: 6bbe5e6290 ("man: tc-csum.8: Fix example")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-11-29 10:14:51 -08:00
Stephen Hemminger b38778bb5e update bpf header from net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-28 18:16:51 -08:00
Stephen Hemminger f6351157b9 Merge branch 'master' into net-next 2017-11-28 09:53:28 -08:00
Jiri Pirko 615634c30e man: add -json option to tc manpage
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-28 09:52:26 -08:00
Robert Shearman b6fae7887f vxlan: Make id optional when modifying a link
Specifying the IFLA_VXLAN_ID attribute on a vxlan link modify is
optional in the kernel, so make the id argument optional for "ip link
set ..." to avoid a user needing to specify it when changing another
attribute.

Signed-off-by: Robert Shearman <rs823p@att.com>
2017-11-28 09:48:26 -08:00
Robert Shearman 079e67816e gre: Fix ttl inherit option
Specifying "... ttl inherit" currently does nothing on a GRE link
modify, since the previous ttl value is retrieved up front. Fix this by
explicitly setting ttl to 0 when "inherit" is specified, since a ttl
of 0 means inherit.
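
A sketch of the fixed behavior, assuming a helper that turns the ttl argument into the byte sent to the kernel. parse_gre_ttl() is hypothetical; the point is that "inherit" always overwrites the previously retrieved value with 0:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Map a ttl argument to the value sent to the kernel: 0 on success,
 * -1 on error. "inherit" is encoded as ttl 0. */
static int parse_gre_ttl(const char *arg, uint8_t *ttl)
{
	char *end;
	unsigned long val;

	if (strcmp(arg, "inherit") == 0) {
		*ttl = 0;	/* overwrite, never keep the old value */
		return 0;
	}
	val = strtoul(arg, &end, 0);
	if (end == arg || *end != '\0' || val > 255)
		return -1;
	*ttl = (uint8_t)val;
	return 0;
}
```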

Signed-off-by: Robert Shearman <rs823p@att.com>
2017-11-28 09:48:22 -08:00
Phil Sutter 56708ae7c9 link_gre6: Detect invalid encaplimit values
Looks like a typo: get_u8() returns 0 on success and -1 on error, so the
error checking here was ineffective.
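
A sketch of the correct check. sketch_get_u8() is a stand-in with the contract described above (0 on success, -1 on error), not the iproute2 function; parse_encaplimit() is likewise hypothetical:

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-in with the described contract: 0 on success, -1 on error. */
static int sketch_get_u8(uint8_t *val, const char *arg, int base)
{
	char *end;
	unsigned long res;

	if (!arg || !*arg)
		return -1;
	res = strtoul(arg, &end, base);
	if (*end != '\0' || res > 0xFF)
		return -1;
	*val = (uint8_t)res;
	return 0;
}

/* Any non-zero return is an error, so test for != 0. */
static int parse_encaplimit(const char *arg, uint8_t *limit)
{
	if (sketch_get_u8(limit, arg, 0) != 0)
		return -1;	/* rejects e.g. "300" or "abc" */
	return 0;
}
```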

Fixes: a11b7b71a6 ("link_gre6: really support encaplimit option")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-11-28 09:48:13 -08:00
Stephen Hemminger c6a656f4f9 m_mirred: style cleanups
Fix whitespace and long lines.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-26 12:42:17 -08:00
Stephen Hemminger 5c235ac27e m_gact: whitespace cleanup
Fix whitespace errors reported by checkpatch

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-26 12:38:21 -08:00
Stephen Hemminger ed4856919f m_action: style cleanup
Break long lines, and use bool where possible.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-26 12:36:15 -08:00
Stephen Hemminger eb4bccf12b m_vlan: style cleanups
Break long lines and make duplicated code into function.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-26 12:28:55 -08:00
Jiri Pirko b021ee40f6 tc: jsonify vlan action
Add json output to vlan action.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko 502c4adf19 tc: jsonify mirred action
Add json output to mirred action.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko 66fedb6df0 tc: jsonify gact action
Add json output to gact action.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko 2704bd6255 tc: jsonify actions core
Add json output to actions core.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko 619ca351e3 tc: jsonify matchall filter
Add json output to matchall filter.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko e28b88a464 tc: jsonify flower filter
Add json output to flower filter.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko 249284ff5a tc: jsonify filter core
Add json output to filter core.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko f354fa6aa5 tc: jsonify htb qdisc
Add json output to htb qdisc.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko 378ac491f5 tc: jsonify fq_codel qdisc
Add json output to fq_codel qdisc.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko 4fcec7f366 tc: jsonify stats2
Add json output to stats2.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko c91d262f41 tc: jsonify qdisc core
Add json output to qdisc core.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko 81051c60c2 tc: remove action cookie len from printout
Make the output the same as the input and avoid printing the unnecessary len.

Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Fixes: fd8b3d2c1b ("actions: Add support for user cookies")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:18:38 -08:00
Jiri Pirko abff45b802 tc: move action cookie print out of the stats if
Cookie print was made dependent on show_stats for no good reason. Fix
this by moving the cookie print out of the stats if block.

Fixes: fd8b3d2c1b ("actions: Add support for user cookies")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:18:38 -08:00
Jakub Kicinski 4f2eb14f71 iplink: communicate ifindex for xdp offload
When xdpoffload option is used, communicate the ifindex down
to the kernel to trigger device-specific load.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:58 -08:00
Jakub Kicinski eb91c55731 f_bpf: communicate ifindex for eBPF offload
Split parsing and loading of the eBPF program and, if skip_sw is set,
load the program for the ifindex of the device the qdisc is attached to.

Note that the ifindex will be ignored for programs which are already
loaded (e.g. when using pinned programs), but in that case we just
trust the user knows what he's doing.  Hopefully we will get extack
soon in the driver to help debugging this case.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:57 -08:00
Jakub Kicinski 01ea76b1cf tc_filter: resolve device name before parsing filter
Move resolving device name into an ifindex before calling filter
specific callbacks.  This way if filters need the ifindex, they
can read it from the request.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:57 -08:00
Jakub Kicinski 67c857df80 {f, m}_bpf: don't allow specifying multiple bpf programs
Both the BPF filter and action currently allow users to specify
the program multiple times, and only the last one will be considered by
the kernel.  Explicitly refuse such command lines.
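
A sketch of such a check, assuming the parser walks the argument list once. Only the "obj" and "bytecode" keywords are recognized here for brevity, and check_single_program() is a hypothetical name:

```c
#include <stdbool.h>
#include <string.h>

/* Return -1 if more than one program is named on the command line. */
static int check_single_program(int argc, const char *const *argv)
{
	bool seen = false;
	int i;

	for (i = 0; i < argc; i++) {
		if (strcmp(argv[i], "obj") != 0 &&
		    strcmp(argv[i], "bytecode") != 0)
			continue;
		if (seen)
			return -1;	/* a second program: refuse */
		seen = true;
		i++;			/* skip the keyword's value */
	}
	return 0;
}
```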

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:57 -08:00
Jakub Kicinski 65fdae3d18 bpf: allow loading programs for a specific ifindex
For BPF offload we need to specify the ifindex when the program is
loaded now.  Extend the bpf common code to accommodate that.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:57 -08:00
Jakub Kicinski 4a847fcb51 bpf: expose bpf_parse_common() and bpf_load_common()
Expose bpf_parse_common() and bpf_load_common() functions
for those users who may want to modify the parameters to
load after parsing is done.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:57 -08:00
Jakub Kicinski 399db8392b bpf: rename bpf_parse_common() to bpf_parse_and_load_common()
bpf_parse_common() parses and loads the program.  Rename it
accordingly.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:57 -08:00
Jakub Kicinski 3f0b9e620c bpf: split parse from program loading
Parsing command line is currently done together with potentially
loading a new eBPF program.  This makes it more difficult to
provide additional parameters for loading (which may come after
the eBPF program info on the command line).

Split the two (only internally for now).  Verbose parameter
has to be saved in struct bpf_cfg_in to be carried between
the stages.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:57 -08:00
Jakub Kicinski 51be754690 bpf: allocate opcode table in struct bpf_cfg_in
struct bpf_cfg_in already carries a pointer to sock_filter ops.
It's currently set to a local variable in bpf_parse_opt_tbl(),
shared between parsing and loading stages.  Move the array
entirely to struct bpf_cfg_in, this will allow us to split
parsing and loading.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:57 -08:00
Jakub Kicinski f20ff2f195 bpf: keep parsed program mode in struct bpf_cfg_in
bpf_parse() will parse command line arguments to find out the
program mode.  This mode will later be needed at loading time.
Instead of keeping it locally add it to struct bpf_cfg_in,
this will allow splitting parsing and loading stages.

enum bpf_mode has to be moved to the header file, because C
doesn't allow forward declaration of enums.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:57 -08:00
Jakub Kicinski 658cfebc27 bpf: pass program type in struct bpf_cfg_in
Program type is needed both for parsing and loading of
the program.  Parsing may also infer the type based on
signatures from __bpf_prog_meta.  Instead of passing
the type around keep it in struct bpf_cfg_in.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:57 -08:00
Stephen Hemminger 6054c1ebf7 SPDX license identifiers
For all files in iproute2 which do not have an obvious license
identification, mark them with SPDX GPL-2

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-24 12:21:35 -08:00
Stephen Hemminger 859af0a5dc tc: break long lines
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-24 11:31:36 -08:00
Nishanth Devarajan 927e3cfb52 tc: B.W limits can now be specified in %.
This patch adapts the tc command line interface to allow bandwidth limits
to be specified as a percentage of the interface's capacity.

Adding this functionality requires passing the specified device string to
each class/qdisc which changes the prototype for a couple of functions: the
.parse_qopt and .parse_copt interfaces. The device string is a required
parameter for tc-qdisc and tc-class, and when not specified, the kernel
returns ENODEV. In this patch, if the user tries to specify a bandwidth
percentage without naming the device, we return an error from userspace.
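
A sketch of the conversion, assuming the caller has already looked up the device's link speed in Mbit/s and that the result is wanted in bits per second. percent_to_rate() is hypothetical:

```c
#include <stdlib.h>

/* Convert "NN%" into bits per second for a device running at
 * speed_mbit Mbit/s: 0 on success, -1 on error. */
static int percent_to_rate(const char *str, const char *dev,
			   unsigned long long speed_mbit,
			   unsigned long long *rate_bps)
{
	char *end;
	double perc;

	if (!dev)
		return -1;	/* a percentage needs a device to refer to */
	perc = strtod(str, &end);
	if (end == str || end[0] != '%' || end[1] != '\0')
		return -1;	/* must look like "NN%" */
	if (perc < 0.0 || perc > 100.0)
		return -1;
	*rate_bps = (unsigned long long)((double)speed_mbit * 1000000.0 *
					 perc / 100.0);
	return 0;
}
```

For example, "50%" on a 1000 Mbit/s link yields 500000000 bit/s.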

Signed-off-by: Nishanth Devarajan<ndev2021@gmail.com>
2017-11-24 11:22:13 -08:00
Stephen Hemminger b317557f58 tc: replace magic constant 16 with #define
For places where tc is expecting device name use IFNAMSIZ.
For others where it is a filter name, introduce a new constant.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-24 11:19:18 -08:00
Stephen Hemminger b2a2d9530c update bpf header from net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-24 09:23:13 -08:00
Stephen Hemminger a03c704b2f ila: fix formatting of help message
Make ip ila help look like ip route help

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-24 09:21:43 -08:00
Tom Herbert 010260a717 ila: create ila_common.h
Move common functions related to checksum, identifier and hook-type
parsing to a common include file.

Signed-off-by: Tom Herbert <tom@quantonium.net>
2017-11-24 09:14:13 -08:00
Tom Herbert 86905c8f05 ila: support for configuring identifier and hook types
Expose identifier type and hook types in ILA configuration
and reporting. This adds support in both ip ila and ILA LWT.

Signed-off-by: Tom Herbert <tom@quantonium.net>
2017-11-24 09:14:13 -08:00
Tom Herbert 1177552398 ila: support to configure checksum neutral-map-auto
Configuration support in both ip ila and ip LWT for checksum
neutral-map-auto. This is a mode of ILA where checksum
neutral mapping is assumed for packets (there is no C-bit
in the identifier to indicate checksum neutral).

Signed-off-by: Tom Herbert <tom@quantonium.net>
2017-11-24 09:14:13 -08:00
Tom Herbert 2a1bc2fb7c ila: added csum neutral support to ipila
Add checksum neutral to ip ila configuration. This controls whether
the C-bit is interpreted as the checksum neutral bit.

Signed-off-by: Tom Herbert <tom@quantonium.net>
2017-11-24 09:14:13 -08:00
Tom Herbert d3357cfc7b ila: Fix reporting of ILA locators and locator match
Fix retrieval of locator value for RTA to get 64 bits instead of 32.
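
A sketch of the difference, assuming the locator arrives as an 8-byte attribute payload. Both helper names are hypothetical; they only show that a 32-bit read keeps half of the locator:

```c
#include <stdint.h>
#include <string.h>

/* Correct: copy all 8 bytes of the locator. */
static uint64_t locator_from_payload(const void *payload)
{
	uint64_t loc;

	memcpy(&loc, payload, sizeof(loc));
	return loc;
}

/* Buggy variant: copies only 4 bytes, so half of the locator is lost
 * (which half depends on host byte order). */
static uint32_t locator_from_payload_buggy(const void *payload)
{
	uint32_t loc;

	memcpy(&loc, payload, sizeof(loc));
	return loc;
}
```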

Signed-off-by: Tom Herbert <tom@quantonium.net>
2017-11-24 09:14:13 -08:00
Stephen Hemminger 7b8c436c30 update headers from 4.15-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-24 09:07:42 -08:00
Jakub Kicinski f6a54d72a5 bpf: initialize the verifier log
If program loading fails before verifier prints its first
message, the verifier log will not be initialized.  Always
set the first character of the log buffer to zero to make
sure we don't dump non-printable characters to the terminal.
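
A minimal sketch, assuming the log buffer is caller-provided; init_verifier_log() is a hypothetical name:

```c
#include <stddef.h>

/* Make the log a valid empty string before loading, so printing it
 * after an early failure prints nothing rather than stale bytes. */
static void init_verifier_log(char *log, size_t size)
{
	if (log && size > 0)
		log[0] = '\0';
}
```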

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-23 20:47:38 -08:00
Simon Ruderich de3ddbc27d man: document ip xfrm policy nosock
Signed-off-by: Simon Ruderich <simon@ruderich.org>
2017-11-20 10:40:33 -08:00
Simon Ruderich 7662e20161 man: document ip fou show
This was forgotten in cf4caf336a (2017-11-16, Add "show" subcommand to
"ip fou").

Signed-off-by: Simon Ruderich <simon@ruderich.org>
2017-11-20 10:40:33 -08:00
Simon Ruderich 2fc8883b9a man: document ip route get mark
Signed-off-by: Simon Ruderich <simon@ruderich.org>
2017-11-20 10:40:33 -08:00
Lorenzo Colitti 05b3b344b2 iproute2: fixes to compile on some systems.
1. Put the declarations of strlcpy and strlcat inside
   an #ifdef NEED_STRLCPY. Their declarations were already in a
   similar #ifdef.
2. In bpf_scm.h, include sys/un.h for struct sockaddr_un.
3. In utils.h, include time.h for struct timeval.

Tested: builds on ubuntu 14.04 with "make clean distclean; ./configure && make -j64"
Tested: 4.14.1 builds on Android with Android-specific #ifndefs for missing library code
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
2017-11-20 10:38:58 -08:00
Amritha Nambiar 2e67b57a43 man: tc-flower: add explanation for hw_tc option
Add details explaining the hw_tc option.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
2017-11-18 15:59:00 -08:00
Amritha Nambiar f63783c7bf man: tc-mqprio: add documentation for new offload options
This patch adds documentation for additional offload modes and
associated parameters in tc-mqprio.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
2017-11-18 15:59:00 -08:00
Greg Greenway cf4caf336a Add "show" subcommand to "ip fou"
Sample output:

$ sudo ./ip/ip fou add port 111 ipproto 11
$ sudo ./ip/ip fou add port 222 ipproto 22 -6
$ ./ip/ip fou show
port 222 ipproto 22 -6
port 111 ipproto 11

Signed-off-by: Greg Greenway <ggreenway@apple.com>
2017-11-16 17:05:07 -08:00
Phil Sutter 66942e522e tc_util: Silence spurious compiler warning
GCC version 7.2.1 complains that 'result1' may be used uninitialized in
parse_action_control_slash_spaces(). This should not be possible in
practice, so the actual value 'result1' is initialized with does not
matter.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-11-16 16:01:48 -08:00
Phil Sutter b7c61286de tc_util: Drop needless pointer check
The function parse_action_control_slash() returns early if 'p' is NULL,
so after the first call to action_a2n(), 'p' is guaranteed not to be
NULL. Otherwise, the assignment '*p = 0' above would dereference the
NULL pointer already anyway, so just drop this check here.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-11-16 16:01:48 -08:00
Jon Maloy aab3661bd2 tipc: change family attribute from u32 to u16
commit 28033ae4e0f ("net: netlink: Update attr validation to require
exact length for some types") introduces a stricter control on attributes
of type NLA_U* and NLA_S*.

Since the tipc tool is sending a family attribute of u32 instead of as
expected u16 the tool is now effectively broken.

We fix this by changing the type of the said attribute.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
2017-11-16 15:58:48 -08:00
Stephen Hemminger a60742aaf4 Merge branch 'master' into net-next 2017-11-13 10:35:17 -08:00
Stephen Hemminger 212b52299e v4.14.1 2017-11-13 10:09:57 -08:00
Stephen Hemminger b867d46daf utils: remove duplicate include of ctype.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-13 10:08:54 -08:00
Leon Romanovsky aba736dc25 ip: Fix compilation break on old systems
As was reported [1], iproute2 fails to compile on old systems,
in Cong's case, it was Fedora 19, in our case it was RedHat 7.2, which
failed with the following errors during compilation:

ipxfrm.c: In function ‘xfrm_selector_print’:
ipxfrm.c:479:7: error: ‘IPPROTO_MH’ undeclared (first use in this
function)
  case IPPROTO_MH:
       ^
ipxfrm.c:479:7: note: each undeclared identifier is reported only once
for each function it appears in
ipxfrm.c: In function ‘xfrm_selector_upspec_parse’:
ipxfrm.c:1345:8: error: ‘IPPROTO_MH’ undeclared (first use in this
function)
   case IPPROTO_MH:
        ^
make[1]: *** [ipxfrm.o] Error 1

The cause is the order of header files. IPPROTO_MH is defined in
the kernel's UAPI header file (in6.h), but only if
__UAPI_DEF_IPPROTO_V6 is set beforehand. That define comes from another
kernel header file (libc-compat.h) and is set only when libc has not
already provided the relevant declarations.

In ip code, the include of <netdb.h> indirectly includes
<netinet/in.h>, which sets __UAPI_DEF_IPPROTO_V6 to zero and prevents the
IPPROTO_MH declaration.

This patch takes the simplest possible approach to fix the compilation
error by checking if IPPROTO_MH was defined before and in case it
wasn't, it defines it to be the same as in the kernel.
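
A sketch of the fallback described above. 135 is the IANA protocol number for the IPv6 Mobility Header; where exactly the guard lives in the tree is not shown here:

```c
#include <netinet/in.h>

/* Only define IPPROTO_MH if the libc/kernel header combination did not
 * already declare it. */
#ifndef IPPROTO_MH
#define IPPROTO_MH 135
#endif
```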

[1] https://www.spinics.net/lists/netdev/msg463980.html

Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Riad Abo Raed <riada@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-11-13 10:07:25 -08:00
Stephen Hemminger 9edf7016e8 Merge branch 'master' into net-next 2017-11-12 16:30:14 -08:00
Stephen Hemminger 7d14d00795 v4.14.0 2017-11-12 16:29:43 -08:00
Stephen Hemminger 913352fe54 drop unneeded include of syslog.h
Only arpd uses syslog

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-12 16:22:36 -08:00
Stephen Hemminger d72ac5a17b Merge branch 'master' into net-next 2017-11-12 16:17:37 -08:00
Ivan Vecera 3e897912cb devlink: add batch command support
The patch adds support to batch devlink commands.

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
2017-11-12 16:15:23 -08:00
Ivan Vecera 6648853975 lib: make resolve_hosts variable common
Any iproute utility that uses any function from lib/utils.c needs
to declare its own resolve_hosts variable instance although it does
not need/use hostname resolving functionality (currently only the 'ip'
and 'ss' commands use it).
The patch declares single common instance of resolve_hosts directly
in utils.c so the existing ones can be removed (the same approach
that is used for timestamp_short).

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2017-11-12 16:15:23 -08:00
Stephen Hemminger cd458a7764 update kernel headers from 4.14 net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-12 15:58:11 -08:00
Roman Mashak 274b63ae21 tc: distinguish Add/Replace qdisc operations
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-11-12 15:57:08 -08:00
Stephen Hemminger 840d95d348 update kernel headers
To 4.14 final kernel version
Note: SPDX tag added by upstream

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-12 15:55:49 -08:00
Jesus Sanchez-Palencia 1915af404f man: Clarify idleslope calculation for tc-cbs
In order to calculate the idleSlope parameter of CBS correctly, users
must take into account the entire packet size, including the overhead
from all layers.

Add some more details to the man page to clarify that, giving one
simple example and pointing users to the correct 802.1Q section for
further clarifications if needed.

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
2017-11-12 15:51:23 -08:00
William Tu 8595cc40e9 ip6_gre: add support for ERSPAN tunnel
The patch adds ERSPAN type II tunnel support for IPv6.

Signed-off-by: William Tu <u9012063@gmail.com>
2017-11-09 09:53:34 +09:00
David Ahern 844c37b423 libnetlink: Handle extack messages for non-error case
The kernel can now return non-fatal error messages via the extack facility.
Update iproute2 to dump them to the user if present.
- rename nl_dump_ext_err to nl_dump_ext_ack
- rename errmsg to msg
- add call to nl_dump_ext_ack in rtnl_dump_done and __rtnl_talk for
  non-error path

Signed-off-by: David Ahern <dsahern@gmail.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
2017-11-09 09:46:50 +09:00
Stephen Hemminger b158c1790f Merge branch 'master' into net-next 2017-11-09 09:45:17 +09:00
Stephen Hemminger e4beb52787 netem: use fixed rather than floating point for scaling
Don't need to do floating point math to compute scaled random.
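
The commit does not spell out its formula. As an illustration only, here is one common fixed-point technique for scaling a 32-bit random value into a range with a 64-bit multiply and shift; scale_random() is hypothetical and not necessarily what netem uses:

```c
#include <stdint.h>

/* Map a uniform 32-bit random value onto [0, range) without doubles:
 * (rnd * range) / 2^32, computed in 64-bit arithmetic. */
static uint32_t scale_random(uint32_t rnd, uint32_t range)
{
	return (uint32_t)(((uint64_t)rnd * range) >> 32);
}
```

The 64-bit product cannot overflow because both operands fit in 32 bits.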

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-07 11:15:34 +09:00
Thomas Egerer 0c7d651b38 xfrm_{state, policy}: Allow to deleteall polices/states with marks
Using 'ip deleteall' with policies that have marks fails unless you
explicitly specify the mark values. This is very inconvenient when
bulk-deleting policies and states. With this patch all relevant states
and policies are wiped by 'ip deleteall' regardless of their mark
values.

Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
2017-11-07 11:12:30 +09:00
Thomas Egerer 5474d440b8 xfrm_policy: Do not attempt to deleteall a socket policy
Socket policies are added to a socket using setsockopt(2). They cannot be
deleted by iproute2. The attempt to delete them causes an error
(EINVAL).
To avoid this unnecessary error message all socket policies are skipped
in xfrm_policy_keep.

Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
2017-11-07 11:12:30 +09:00
Thomas Egerer 20e4840a0a xfrm_policy: Add filter option for socket policies
Listing policies on systems with a lot of socket policies can be
confusing due to the number of returned policies. Even if socket policies
are not of interest, they cannot be filtered. This patch adds an option
to filter all socket policies from the output.

Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
2017-11-07 11:12:30 +09:00
Amritha Nambiar 0d575c4dac flower: Represent HW traffic classes as classid values
This patch was previously submitted as RFC. Submitting this as
non-RFC now that the classid reservation scheme for hardware
traffic classes and offloads to route packets to a hardware
traffic class are accepted in net-next.

HW traffic classes 0 through 15 are represented using the
reserved classid values :ffe0 - :ffef.
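
A sketch of the mapping, assuming hardware TC n maps to minor 0xffe0 + n as described above. The macro and function names are hypothetical:

```c
#include <stdint.h>

#define SKETCH_HW_TC_BASE 0xFFE0u	/* minor for hardware TC 0 */
#define SKETCH_HW_TC_MAX  15u		/* highest hardware TC */

/* Build the classid minor for hardware TC tc; -1 if out of range. */
static int hw_tc_to_minor(unsigned int tc, uint16_t *minor)
{
	if (tc > SKETCH_HW_TC_MAX)
		return -1;
	*minor = (uint16_t)(SKETCH_HW_TC_BASE + tc);
	return 0;
}

/* Inverse mapping, e.g. for printing "hw_tc N"; -1 if not in range. */
static int minor_to_hw_tc(uint16_t minor, unsigned int *tc)
{
	if (minor < SKETCH_HW_TC_BASE ||
	    minor > SKETCH_HW_TC_BASE + SKETCH_HW_TC_MAX)
		return -1;
	*tc = minor - SKETCH_HW_TC_BASE;
	return 0;
}
```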

Example:
Match Dst IPv4,Dst Port and route to TC1:
# tc filter add dev eth0 protocol ip parent ffff:\
  prio 1 flower dst_ip 192.168.1.1/32\
  ip_proto udp dst_port 12000 skip_sw\
  hw_tc 1

# tc filter show dev eth0 parent ffff:
filter pref 1 flower chain 0
filter pref 1 flower chain 0 handle 0x1 hw_tc 1
  eth_type ipv4
  ip_proto udp
  dst_ip 192.168.1.1
  dst_port 12000
  skip_sw
  in_hw

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
2017-11-07 11:04:54 +09:00
Stephen Hemminger ba914908eb Update kernel headers with new SPDK identifier
The kernel header sanitization process now puts an SPDX GPLv2
license comment on files.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-07 11:02:41 +09:00
Stephen Hemminger 665ef5a5c0 Update kernel headers from 4.14-rc8 nete-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-07 11:02:08 +09:00
Roopa Prabhu 86d0988b16 bridge: fdb: print NDA_SRC_VNI if available
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2017-11-01 22:31:50 +01:00
Vinicius Costa Gomes d652988920 man: Add initial manpage for tc-cbs(8)
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-11-01 22:22:48 +01:00
Vinicius Costa Gomes c9681ac1b3 tc: Add support for the CBS qdisc
The Credit Based Shaper (CBS) queueing discipline allows bandwidth
reservation with sub-millisecond precision. It is defined by the
802.1Q-2014 specification (section 8.6.8.2 and Annex L).

The syntax is:

tc qdisc add dev DEV parent NODE cbs locredit <LOCREDIT>
		hicredit <HICREDIT> sendslope <SENDSLOPE>
		idleslope <IDLESLOPE>

(The order is not important)

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-11-01 22:22:48 +01:00
Amritha Nambiar e1ac5b06f2 tc/mqprio: Offload mode and shaper options in mqprio
This patch was previously submitted as RFC. Submitting this as
non-RFC now that the tc/mqprio changes are accepted in net-next.

Adds new mqprio options for 'mode' and 'shaper'. The 'mode'
option selects an offload mode, such as 'dcb' (the default) or
'channel', and is used with the 'hw' option set to 1. The new
'channel' mode supports offloading TCs and other queue
configurations. The 'shaper' option selects a HW shaper ('dcb' by
default) and accepts the value 'bw_rlimit' for bandwidth rate
limiting. The parameters of the bw_rlimit shaper are the minimum
and maximum bandwidth rates. New HW shapers can be supported in
the future through the shaper attribute.

# tc qdisc add dev eth0 root mqprio num_tc 2  map 0 0 0 0 1 1 1 1\
  queues 4@0 4@4 hw 1 mode channel shaper bw_rlimit\
  min_rate 1Gbit 2Gbit max_rate 4Gbit 5Gbit

# tc qdisc show dev eth0

qdisc mqprio 804a: root  tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
             queues:(0:3) (4:7)
             mode:channel
             shaper:bw_rlimit   min_rate:1Gbit 2Gbit   max_rate:4Gbit 5Gbit
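
Each 'queues' token has the form count@offset, and the show output prints it as an inclusive range, so "4@4" appears as (4:7). A minimal sketch of that parsing step follows. The helper is hypothetical and is not mqprio's actual parser.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parse "count@offset" into an inclusive queue range. */
static int parse_queue_range(const char *tok, unsigned int *first,
			     unsigned int *last)
{
	unsigned int count, offset;
	char extra;

	/* Exactly two numbers separated by '@', with nothing trailing. */
	if (sscanf(tok, "%u@%u%c", &count, &offset, &extra) != 2 || count == 0)
		return -1;
	*first = offset;
	*last = offset + count - 1;
	return 0;
}
```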

v2: Avoid buffer overrun and minor cleanup.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
2017-11-01 22:20:06 +01:00
Mahesh Bandewar 1ef5c95201 ip/ipvlan: enhance ability to add mode flags to existing modes
IPvlan supported bridge-only functionality prior to commits
a190d04db937 ('ipvlan: introduce 'private' attribute for all
existing modes.') and fe89aa6b250c ('ipvlan: implement VEPA mode').
These two commits make it possible to configure the VEPA and private modes.
This patch adds those options in ip command.

e.g.
  bash:~# ip link add link eth0 name ipvl0 type ipvlan mode l2 private
  -or-
  bash:~# ip link add link eth0 name ipvl0 type ipvlan mode l2 vepa

Also the output will reflect the mode and the mode-flag accordingly.
e.g.
  bash:~# ip -details link show ipvl0
  4: ipvl0@eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc ...
     link/ether 00:1a:11:44:a5:3e brd ff:ff:ff:ff:ff:ff promiscuity 0
     ipvlan  mode l2 private addrgenmode eui64 numtxqueues 1 ...
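
The mode flag is carried separately from the mode itself. Below is a sketch of the keyword-to-flag mapping. It assumes the IPVLAN_F_PRIVATE (0x01) and IPVLAN_F_VEPA (0x02) values from <linux/if_link.h>, and it treats "bridge" as meaning that neither flag is set. The helper names are hypothetical, not iplink_ipvlan.c functions.

```c
#include <assert.h>
#include <string.h>

#ifndef IPVLAN_F_PRIVATE
#define IPVLAN_F_PRIVATE 0x01	/* assumed to match <linux/if_link.h> */
#define IPVLAN_F_VEPA    0x02
#endif

/* Hypothetical helper: mode-flag keyword -> flag bits. */
static int ipvlan_flag_from_str(const char *s, unsigned short *flags)
{
	if (strcmp(s, "bridge") == 0)
		*flags = 0;
	else if (strcmp(s, "private") == 0)
		*flags = IPVLAN_F_PRIVATE;
	else if (strcmp(s, "vepa") == 0)
		*flags = IPVLAN_F_VEPA;
	else
		return -1;
	return 0;
}

/* Hypothetical helper: flag bits -> keyword for display. */
static const char *ipvlan_flag_str(unsigned short flags)
{
	switch (flags & (IPVLAN_F_PRIVATE | IPVLAN_F_VEPA)) {
	case 0:
		return "bridge";
	case IPVLAN_F_PRIVATE:
		return "private";
	case IPVLAN_F_VEPA:
		return "vepa";
	default:
		return "unknown";
	}
}
```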

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
2017-11-01 22:17:01 +01:00
Stephen Hemminger fe388b9e0c update kernel headers from 4.14-rc7 net-next 2017-11-01 22:15:50 +01:00
Stephen Hemminger 5ee63855dc Merge branch 'master' into net-next 2017-11-01 22:15:00 +01:00
Stefano Brivio 4357f5c31a ss: Fix width calculations when Netid or State columns are missing
If Netid or State columns are missing, we must not subtract one
for each of these two columns from the remaining screen width,
while distributing available space to columns. This one
character corresponding to one delimiting space has to be
subtracted only if the columns are actually printed.

Further, in the existing implementation, if the screen width is
an odd number, one additional character is added to the width of
one of the two columns.

But if neither column is printed, this filling character needs to
be added somewhere else, in order to keep the spacing right and
fill lines completely.

Address and port fields are printed in pairs (local and remote),
so we can't distribute the space to any of them, because it
would be doubled. Instead, print this additional space to the
right of the Send-Q column, to keep code changes to a minimum.
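
The width rule can be sketched as follows. This is a simplified, hypothetical model, not ss's real layout code. It tracks only the optional Netid and State columns, the padding to the right of Send-Q, and the width left for the remaining columns. That remaining width is kept even because local and remote fields are printed in pairs.

```c
#include <assert.h>

struct ss_layout {
	int netid_w;	/* 0 if the column is hidden */
	int state_w;	/* 0 if the column is hidden */
	int sendq_pad;	/* extra space printed right of Send-Q */
	int rest_w;	/* even width left for the paired columns */
};

/* Hypothetical helper modelling the width distribution described above. */
static struct ss_layout compute_layout(int screen_w, int netid_w, int state_w,
				       int show_netid, int show_state)
{
	struct ss_layout l = { show_netid ? netid_w : 0,
			       show_state ? state_w : 0, 0, 0 };
	int avail = screen_w;

	/* Charge a column and its delimiting space only if it is printed. */
	if (show_netid)
		avail -= netid_w + 1;
	if (show_state)
		avail -= state_w + 1;

	/* Paired fields would double an odd character, so put it elsewhere. */
	if (avail % 2) {
		if (show_netid)
			l.netid_w++;
		else if (show_state)
			l.state_w++;
		else
			l.sendq_pad = 1;
		avail--;
	}
	l.rest_w = avail;
	return l;
}
```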

This is particularly visible with 'ss -f netlink -Z'. Before
this patch, with an 80 column terminal, we have:

$ ss -f netlink -Z|head -n3
Recv-Q Send-Q Local Address:Port                 Peer Address:Port
0      0            rtnl:evolution-calen/2049           *                     pr
oc_ctx=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
0      0            rtnl:clock-applet/1944              *                     pr
oc_ctx=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

and with an 81 column terminal:

$ ss -f netlink -Z|head -n3
Recv-Q Send-Q Local Address:Port                 Peer Address:Port
0      0            rtnl:evolution-calen/2049           *                     pro
c_ctx=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
0      0            rtnl:clock-applet/1944              *                     pro
c_ctx=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

After this patch, in both cases, the output is:
$ ss -f netlink -Z|head -n3
Recv-Q Send-Q Local Address:Port                 Peer Address:Port
0      0             rtnl:evolution-calen/2049            *
 proc_ctx=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
0      0             rtnl:clock-applet/1944               *
 proc_ctx=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2017-11-01 22:10:52 +01:00
Stefano Brivio 22658ff53a ss: Streamline process context printing in netlink_show_one()
There's no need to check 'pid_context' before calling free().

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2017-11-01 22:10:52 +01:00
Stefano Brivio 38509fa903 ss: Remove useless width specifier in process context print
Both local address and service, and remote address and service
fields are already printed out in netlink_show_one() before we
start printing process context, by calling sock_addr_print()
twice.

At this point, sock_addr_print() has already forced the remote
service field to be 'serv_width' wide -- that is, 'serv_width'
width has already been consumed, before we print process
context.

Hence, it makes no sense to force the display width of process
context to be 'serv_width' wide again: previous prints have
filled up the line already. Remove the width specifier and
prefix with a space instead, to keep this consistent with fields
which are displayed after the first output line.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2017-11-01 22:10:52 +01:00
Christoph Paasch e54ed38074 ip: add fastopen_no_cookie option to ip route
This patch adds fastopen_no_cookie option to enable/disable TCP fastopen
without a cookie on a per-route basis.

Support in Linux was added with 71c02379c762 (tcp: Configure TFO without
cookie per socket and/or per route).

Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
2017-11-01 22:07:51 +01:00
Roman Mashak acbe9118ce ip netns: use strtol() instead of atoi()
Use strtol-based API to parse and validate integer input; atoi() does
not detect errors and may yield undefined behaviour if the result can't be
represented.

v2: use get_unsigned() since network namespace is really an unsigned value.
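
Unlike atoi(), a strtoul()-based parser can reject trailing garbage, empty input and out-of-range values. Below is a minimal sketch in the spirit of get_unsigned(). The helper name and its exact rules (base 0, so decimal, hex or octal; no sign allowed) are our assumptions, not the iproute2 implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse an unsigned int, rejecting bad input. */
static int parse_uint(unsigned int *val, const char *arg)
{
	char *end;
	unsigned long res;

	/* Require a leading digit: rejects "", "-1", " 5" and similar. */
	if (!arg || !isdigit((unsigned char)*arg))
		return -1;

	errno = 0;
	res = strtoul(arg, &end, 0);
	if (*end != '\0' || errno == ERANGE || res > UINT_MAX)
		return -1;

	*val = (unsigned int)res;
	return 0;
}
```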

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-11-01 22:06:05 +01:00
Shmulik Ladkani 21440d19d9 ip: link_ip6tnl.c/ip6tunnel.c: Support IP6_TNL_F_ALLOW_LOCAL_REMOTE flag
IP6_TNL_F_ALLOW_LOCAL_REMOTE allows tunnel traffic on ip6tnl devices
where the remote endpoint is a local host address.

Specifying "[no]allow-localremote" controls the
IP6_TNL_F_ALLOW_LOCAL_REMOTE flag on ip6tnl interfaces.

This is the user-space counterpart for kernel
commit 908d140a87a7 ("ip6_tunnel: Allow rcv/xmit even if remote address is a local address")

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
2017-10-31 18:15:30 +01:00
Roopa Prabhu 8652eeb3ab bridge: vlan: support for per vlan tunnel info
This patch uses kernel bridge vlan attribute
IFLA_BRIDGE_VLAN_TUNNEL_INFO to set/delete/show per vlan tunnel info.

$bridge vlan add dev vxlan0 vid 2000 tunnel_info id 2000
$bridge vlan add dev vxlan0 vid 1000-1001 tunnel_info id 1000-1001

$bridge vlan tunnelshow
port    vlan ids        tunnel id
vxlan0   1000-1001       1000-1001
         2000            2000

$bridge  -j vlan tunnelshow
{
    "dummy0": [],
    "dummy1": [],
    "bridge": [],
    "vxlan0": [{
            "vlan": 1000,
            "vlanEnd": 1001,
            "tunid": 1000,
            "tunidEnd": 1001
        },{
            "vlan": 2000,
            "tunid": 2000
        }
    ]
}

This patch also fixes a json termination bug in print_vlan
when filter vlan is provided by the user.
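
In the tunnelshow output above, vlans 1000-1001 map one-to-one onto tunnel ids 1000-1001. Below is a sketch of that range mapping. The struct and helpers are hypothetical. Requiring equal-length ranges is an assumption of this sketch and is not stated in the commit text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one vlan range mapped onto tunnel ids. */
struct vlan_tun_map {
	uint16_t vid_start;
	uint16_t vid_end;
	uint32_t tunid_start;
	uint32_t tunid_end;
};

/* Assumption of this sketch: both ranges must have the same length. */
static int vlan_tun_map_valid(const struct vlan_tun_map *m)
{
	return m->vid_start <= m->vid_end &&
	       m->tunid_start <= m->tunid_end &&
	       (uint32_t)(m->vid_end - m->vid_start) ==
		       m->tunid_end - m->tunid_start;
}

/* Look up the tunnel id for vid; returns -1 if vid is outside the range. */
static int vlan_to_tunid(const struct vlan_tun_map *m, uint16_t vid,
			 uint32_t *tunid)
{
	if (vid < m->vid_start || vid > m->vid_end)
		return -1;
	*tunid = m->tunid_start + (uint32_t)(vid - m->vid_start);
	return 0;
}
```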

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2017-10-31 18:04:30 +01:00
Roopa Prabhu 8cfde5c97f iplink: bridge: support bridge port vlan_tunnel attribute
This config maps to IFLA_BRPORT_VLAN_TUNNEL bridge port netlink
flag attribute. This flag enables vlan to tunnel mapping on a bridge
port. It is off by default.

set vlan_tunnel attribute on bridge port vxlan0:

$ip link set dev vxlan0 type bridge_slave vlan_tunnel on
$ip link set dev vxlan0 type bridge_slave vlan_tunnel off

or via bridge command

$bridge link set dev vxlan0 vlan_tunnel on
$bridge link set dev vxlan0 vlan_tunnel off

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2017-10-31 18:04:30 +01:00
Stephen Hemminger 0ac0017a1a Update kernel headers from net-next (4.14-rc6)
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-31 18:04:13 +01:00
Stephen Hemminger c1606c44b3 Merge branch 'master' into net-next 2017-10-31 18:03:12 +01:00
Stephen Hemminger e348889289 Update kernel headers based on 4.14-rc7
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-31 18:01:51 +01:00
Alexander Aring 25a24934ab tc: m_ife: fix match tcindex parsing
This patch changes ife_prio to ife_tcindex, which is the correct variable
to assign the argument to in this case.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
2017-10-31 17:56:58 +01:00
Roman Mashak 103bc5f11d ip: added missing newline in man page
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-10-31 17:24:45 +01:00
Stephen Hemminger 106753c937 Merge branch 'master' into net-next 2017-10-27 09:27:43 +02:00
Stephen Hemminger bcddcddd29 bridge: checkpatch related cleanups
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-27 09:15:23 +02:00
Stephen Hemminger 21fef525fa iproute: source code cleanup
Break long lines.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-27 08:52:48 +02:00
Stephen Hemminger 1d2cfcf8b5 update kernel headers
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-27 08:31:26 +02:00
Stephen Hemminger 7fde8cfddc include: add TCP fastopen option
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-27 08:30:48 +02:00
Stephen Hemminger fa19d6bc01 bpf: update header file
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-27 08:28:36 +02:00
Roman Mashak fab9a18a2e bridge: request vlans along with link information
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-10-26 12:35:04 +02:00
Roman Mashak 52fd1fe36c bridge: dump vlan table information for link
Kernel also reports vlans a port is member of, so print it. Since vlan
table can be quite large, dump it only when detailed information is
requested.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-10-26 12:35:04 +02:00
Roman Mashak b97c679c9f bridge: isolate vlans parsing code in a separate API
IFLA_BRIDGE_VLAN_INFO parsing logic will be used in link and vlan
processing code, so it makes sense to move it into a separate function.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-10-26 12:35:04 +02:00
Hangbin Liu 86bf43c7c2 lib/libnetlink: update rtnl_talk to support malloc buff at run time
This is a follow-up to 460c03f3f3 ("iplink: double the buffer size also in
iplink_get()"). After this update, we no longer need to double the buffer
size every time the number of VFs increases.

For calls like rtnl_talk(&rth, &req.n, NULL, 0), we can simply remove the
length parameter.

For calls like rtnl_talk(&rth, nlh, nlh, sizeof(req)), I add a new variable,
answer, to avoid overwriting data in nlh, because the buffer may hold more
data after nlh. This also avoids the nlh buffer being too small for the reply.

The caller needs to free answer after use.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-10-26 12:29:29 +02:00
Hangbin Liu 2d34851cd3 lib/libnetlink: re malloc buff if size is not enough
With commit 72b365e8e0 ("libnetlink: Double the dump buffer size")
we doubled the buffer size to support more VFs. But the number of VFs
keeps increasing; some customers now use more than 200 VFs.

We cannot keep doubling it every time the buffer turns out to be too
small. Instead, stop hard-coding the buffer size and malloc() the
required size at run time.

Introduce function rtnl_recvmsg() to always return a newly allocated buffer.
The caller needs to free it after use.
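
One common way to size the buffer at run time is to peek at the pending datagram first. With MSG_PEEK | MSG_TRUNC, Linux returns the datagram's real length without consuming it. This works for netlink sockets, and for AF_UNIX datagram sockets since Linux 3.4. The sketch below shows that pattern with recv() on a plain socket. The helper name is hypothetical. We believe rtnl_recvmsg() is built around recvmsg() instead, so it may differ in detail.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: receive one datagram into a buffer malloc'd to fit.
 * The caller owns *bufp and must free() it. */
static ssize_t recv_alloc(int fd, char **bufp)
{
	char probe;
	ssize_t len;
	char *buf;

	/* Peek without consuming; MSG_TRUNC makes Linux report the full length. */
	len = recv(fd, &probe, sizeof(probe), MSG_PEEK | MSG_TRUNC);
	if (len < 0)
		return -1;

	buf = malloc(len > 0 ? (size_t)len : 1);
	if (!buf)
		return -1;

	/* Now consume the datagram into a buffer that is large enough. */
	len = recv(fd, buf, len > 0 ? (size_t)len : 1, 0);
	if (len < 0) {
		free(buf);
		return -1;
	}
	*bufp = buf;
	return len;
}
```

Usage: create a socketpair(AF_UNIX, SOCK_DGRAM, 0, sv), send() a large message on sv[0], call recv_alloc(sv[1], &buf), and free(buf) when done.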

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-10-26 12:29:29 +02:00
yupeng 5a9bca7145 man: add additional explanations for ss
Add detailed explanations of the -m, -o, -e and -i options, which were not documented anywhere.

Signed-off-by: yupeng <yupeng0921@gmail.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2017-10-26 12:25:42 +02:00
Stephen Hemminger 66e40a4a86 update headers for TC and TIPC from net-next 2017-10-25 12:40:47 +02:00
Stephen Hemminger 2ac0c6c2c1 Merge branch 'master' into net-next 2017-10-25 12:39:18 +02:00
Jamal Hadi Salim 35f2a7639d tc/actions: introduce support for jump action
Sample use case:

... add ingress qdisc
sudo $TC qdisc add dev $ETH ingress

 ... if we exceed a rate of 1kbit (burst of 90K), do an absolute jump of 2 actions
sudo $TC actions add action police rate 1kbit burst 90k conform-exceed jump 2 / pipe

sudo $TC -s actions ls action police
 action order 0:  police 0x4 rate 1Kbit burst 23440b mtu 2Kb action jump 2/pipe overhead 0b
 ref 1 bind 0 installed 41 sec used 41 sec
 Action statistics:
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0

... let's add a couple of marks so we can use them to mark exceed/not exceed
sudo $TC actions add action skbedit mark 11 ok index 11
sudo $TC actions add action skbedit mark 12 ok index 12

... if we don't exceed our rate we get a mark of 11, else a mark of 12
sudo $TC filter add dev $ETH parent ffff: protocol ip prio 8 u32 \
match ip dst 127.0.0.8/32 flowid 1:10 \
action police index 4 \
action skbedit index 11 \
action skbedit index 12

... OK, let's keep this thing a little busy
sudo ping -f -c 10000 127.0.0.8

... now let's see the filters
sudo $TC -s filter ls dev $ETH parent ffff: protocol ip
filter pref 8 u32 chain 0
filter pref 8 u32 chain 0 fh 800: ht divisor 1
filter pref 8 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10 not_in_hw  (rule hit 20000 success 10000)
  match 7f000008/ffffffff at 16 (success 10000 )
	action order 1:  police 0x4 rate 1Kbit burst 23440b mtu 2Kb action jump 2/pipe overhead 0b
	ref 2 bind 1 installed 198 sec used 2 sec
	Action statistics:
	Sent 840000 bytes 10000 pkt (dropped 0, overlimits 9721 requeues 0)
	backlog 0b 0p requeues 0

	action order 2:  skbedit mark 11 pass
	 index 11 ref 2 bind 1 installed 127 sec used 2 sec
	Action statistics:
	Sent 23436 bytes 279 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

	action order 3:  skbedit mark 12 pass
	 index 12 ref 2 bind 1 installed 127 sec used 2 sec
	Action statistics:
	Sent 816564 bytes 9721 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

As can be seen, 97.21% of the packets were marked as exceeding the allocated
rate; you could do something clever with the skb mark after this.
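
The absolute-jump behaviour can be modelled in a few lines. This is a simplified user-space model based on our reading of the kernel's action execution loop: a "jump N" restarts the action list and skips its first N actions. The verdict names and helpers are hypothetical and do not use the kernel's TC_ACT_* encoding. With police as the first action, "jump 2" lands on the third action (skbedit mark 12), which matches the statistics above.

```c
#include <assert.h>

/* Simplified verdicts; not the kernel's TC_ACT_* encoding. */
enum verdict { V_PIPE, V_PASS, V_JUMP };

struct outcome {
	enum verdict v;
	int jump;		/* number of actions to skip for V_JUMP */
};

typedef struct outcome (*action_fn)(void *pkt);

/* Run the list; a jump restarts from the top and skips `jump` actions.
 * Returns the index of the action that ended processing, n if processing
 * fell off the end, or -1 for an invalid jump or too many jumps. */
static int run_actions(action_fn *acts, int n, void *pkt, int max_jumps)
{
	int skip = 0;

restart:
	for (int i = 0; i < n; i++) {
		if (skip > 0) {
			skip--;
			continue;
		}
		struct outcome o = acts[i](pkt);
		if (o.v == V_JUMP) {
			if (o.jump <= 0 || o.jump > n || --max_jumps <= 0)
				return -1;
			skip = o.jump;
			goto restart;
		}
		if (o.v != V_PIPE)
			return i;
	}
	return n;
}

/* Example actions mirroring the commit: police, then two skbedit marks. */
struct pkt {
	int exceeds;
	int mark;
};

static struct outcome police(void *p)
{
	struct outcome o = { V_PIPE, 0 };

	if (((struct pkt *)p)->exceeds) {
		o.v = V_JUMP;	/* conform-exceed jump 2 / pipe */
		o.jump = 2;
	}
	return o;
}

static struct outcome mark11(void *p)
{
	struct outcome o = { V_PASS, 0 };

	((struct pkt *)p)->mark = 11;
	return o;
}

static struct outcome mark12(void *p)
{
	struct outcome o = { V_PASS, 0 };

	((struct pkt *)p)->mark = 12;
	return o;
}
```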

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-25 12:33:46 +02:00
Nikolay Aleksandrov a5e3f41b4d ip: bridge_slave: add neigh_suppress to the type help and
Add neigh_suppress to the type help and document it in ip-link's man page.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-23 14:46:24 +02:00
Stephen Hemminger 702631416e Merge branch 'master' into net-next 2017-10-23 14:44:55 +02:00
Roman Mashak c4be5febaa ss: initialize 'fackets' member of tcpstat structure
'fackets' was never initialized with information extracted from the kernel,
and was therefore never actually printed.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-10-23 14:43:11 +02:00
Michal Kubecek 21503ed2af ip maddr: fix filtering by device
Commit 530903dd90 ("ip: fix igmp parsing when iface is long") uses the
variable len to exclude the trailing colon from the interface name
comparison. This variable is local to the loop body, but we set it in one
pass and use it in the following one(s), so we are actually using a
(pseudo)random length for the comparison. This became apparent since commit
b48a1161f5 ("ipmaddr: Avoid accessing uninitialized data") always
initializes len to zero, so the name comparison is always true. As a
result, "ip maddr show dev eth0" shows IPv4 multicast addresses for all
interfaces.

Instead of keeping the length, let's simply replace the trailing colon with
a null byte. As a bonus, ma.name now holds the correct interface name.
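
The comparison became "always true" because strncmp(a, b, 0) compares zero characters and returns 0 for any inputs. Stripping the colon in place lets a plain strcmp() do the job. Below is a minimal sketch. The helper names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Turn "eth0:" into "eth0" in place. */
static void strip_trailing_colon(char *name)
{
	size_t len = strlen(name);

	if (len > 0 && name[len - 1] == ':')
		name[len - 1] = '\0';
}

/* Hypothetical helper: does a parsed "ifname:" field name the wanted device? */
static int ifname_matches(char *field, const char *wanted)
{
	strip_trailing_colon(field);
	return strcmp(field, wanted) == 0;
}
```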

Fixes: 530903dd90 ("ip: fix igmp parsing when iface is long")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Phil Sutter <phil@nwl.cc>
Acked-by: Petr Vorel <pvorel@suse.cz>
2017-10-21 15:02:24 +02:00
Phil Sutter 572e893613 ss: Detect IPPROTO_ICMPV6 sockets
Prefix IPPROTO_ICMPV6 sockets with 'icmp6' instead of '???'.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-10-21 15:00:16 +02:00
Phil Sutter 1267c0b924 ss: Distinguish between IPv4 and IPv6 wildcard sockets
Commit aba9c23a6e ("ss: enclose IPv6 address in brackets") unified
display of wildcard sockets in IPv4 and IPv6 to print the unspecified
address as '*'. Users then complained that they can't distinguish
between address families anymore, so change this again to what Stephen
Hemminger suggested:

| *:80    << both IPV6 and IPV4
| [::]:80 << IPV6_ONLY
| 0.0.0.0:80  << IPV4_ONLY

Note that on older kernels which don't support INET_DIAG_SKV6ONLY
attribute, pure IPv6 sockets will still show as '*'.
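
The display rule above can be written as a small function. The sketch takes a tri-state v6only argument, where -1 means the kernel did not report INET_DIAG_SKV6ONLY. The function name is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

/* v6only: 1 = IPv6-only, 0 = dual-stack, -1 = unknown (older kernel). */
static const char *wildcard_str(int family, int v6only)
{
	if (family == AF_INET)
		return "0.0.0.0";
	if (v6only == 1)
		return "[::]";
	return "*";	/* dual-stack, or v6only state not reported */
}
```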

Cc: Humberto Alves <hjalves@live.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-10-21 14:59:29 +02:00
Stephen Hemminger 4b4dde0ae6 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2 2017-10-18 17:11:50 -07:00
Nikolay Aleksandrov fdbdd356f0 ip: bridge_slave: add support for per-port group_fwd_mask
This patch adds the iproute2 support for getting and setting the
per-port group_fwd_mask. It also tries to resolve the value into a more
human-friendly format by printing the known protocols instead of only
the raw value.
The man page is also updated with the new option.
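
Each bit n of group_fwd_mask stands for the link-local group address 01:80:C2:00:00:0n, so decoding the mask is a table lookup. The sketch uses a small, illustrative table (STP 00, LACP 02, 802.1X 03, LLDP 0E) and prints unknown bits in hex. iproute2's own table and labels may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Bit n <-> 01:80:C2:00:00:0n; illustrative subset with our own labels. */
static const struct {
	unsigned int bit;
	const char *name;
} fwd_names[] = {
	{ 0x00, "STP" },
	{ 0x02, "LACP" },
	{ 0x03, "802.1X" },
	{ 0x0e, "LLDP" },
};

/* Hypothetical helper: render the mask as "name,name,0xREST". */
static void fwd_mask_to_str(uint16_t mask, char *buf, size_t len)
{
	size_t used = 0;
	int n;

	if (len == 0)
		return;
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(fwd_names) / sizeof(fwd_names[0]); i++) {
		unsigned int bit = 1u << fwd_names[i].bit;

		if (!(mask & bit))
			continue;
		mask &= (uint16_t)~bit;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "," : "", fwd_names[i].name);
		if (n < 0 || (size_t)n >= len - used)
			return;	/* output truncated */
		used += (size_t)n;
	}
	if (mask)	/* bits without a known name */
		snprintf(buf + used, len - used, "%s0x%x",
			 used ? "," : "", (unsigned int)mask);
}
```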

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2017-10-16 09:26:05 -07:00
Stephen Hemminger 75209f840b Merge branch 'master' into net-next 2017-10-16 09:25:56 -07:00
Petr Vorel 4b73d52f8a color: Rename enum
COLOR_NONE is more descriptive than COLOR_CLEAR.

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
2017-10-16 09:24:11 -07:00
Petr Vorel 99b89c518e color: Cleanup code to remove "magic" offset + 7
Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
2017-10-16 09:24:11 -07:00
Petr Vorel 24b058a2a4 color: Fix another ip segfault when using --color switch
Commit 959f1428 ("color: add new COLOR_NONE and disable_color function")
introduced the color enum COLOR_NONE, which not only duplicates
COLOR_CLEAR, but also caused a segfault when running ip with the --color
switch, as 'attr + 8' in color_fprintf() accesses an array item out of
bounds. Thus, remove it and restore the "magic" offset + 7.

Reproduce with:
$ ip -c a

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
2017-10-16 09:24:11 -07:00
Petr Vorel e6849a5722 color: Fix ip segfault when using --color switch
Commit d0e72011 ("ip: ipaddress.c: add support for json output")
introduced passing -1 as enum color_attr. This is not only wrong, as no
color_attr has the value -1, but it also causes another segfault in
color_fprintf() in this setup, as there is no item with index -1 in the
attr_colors[] array. Use COLOR_CLEAR instead, which is a valid option.

Reproduce with:
$ COLORFGBG='0;15' ip -c a

NOTE: COLORFGBG is an environment variable used to indicate whether the
user has a light or dark background.
COLORFGBG="0;15" asks for a color set suitable for a light background;
COLORFGBG="15;0" asks for a color set suitable for a dark background.

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
2017-10-16 09:24:11 -07:00
Petr Vorel f1241a7e3b tests: Revert back /bin/sh in shebang
This was added by mistake in commit ecd44e68
("tests: Remove bashisms (s/source/.)")

Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
2017-10-16 09:22:01 -07:00
Stephen Hemminger 4c6080b5c4 Merge branch 'master' into net-next 2017-10-12 09:06:10 -07:00
Stephen Hemminger 268a9eee98 netem: fix code indentation
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-11 18:08:15 -07:00
Stephen Hemminger 4999c57733 Merge branch 'master' into net-next 2017-10-11 11:07:20 -07:00
Ivan Delalande da9cc6ab90 ss: print MD5 signature keys configured on TCP sockets
These keys are reported by kernel 4.14 and later under the
INET_DIAG_MD5SIG attribute, when INET_DIAG_INFO is requested (ss -i)
and we have CAP_NET_ADMIN. The additional output looks like:

	md5keys:fe80::/64=signing_key,10.1.2.0/24=foobar,::1/128=Test

Signed-off-by: Ivan Delalande <colona@arista.com>
2017-10-11 11:04:47 -07:00
Ivan Delalande 7c72df5a95 utils: add print_escape_buf to format and print arbitrary bytes
Keep it as simple as possible for now: escape, as an octal escape
sequence, anything that is not isprint()-able, appears in the "escape"
parameter, or is '\'. This should be easy to extend if any other user
needs something more complex in the future.
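
Below is a minimal sketch of the escaping rule described above. It writes into a caller-supplied buffer rather than printing. The function name and signature are hypothetical and are not those of print_escape_buf(). With "," and "=" in the escape set, a key such as a=b comes out as a\075b.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy buf into out, replacing each non-printable
 * byte, each byte found in `escape`, and the backslash itself with a
 * 3-digit octal escape. Returns the number of characters written. */
static size_t escape_bytes(const unsigned char *buf, size_t len,
			   const char *escape, char *out, size_t outlen)
{
	size_t o = 0;

	if (outlen == 0)
		return 0;

	for (size_t i = 0; i < len; i++) {
		unsigned char c = buf[i];
		int needs_escape = !isprint(c) || c == '\\' ||
				   strchr(escape, c) != NULL;

		if (needs_escape) {
			if (o + 4 >= outlen)	/* room for "\ooo" plus NUL */
				break;
			snprintf(out + o, outlen - o, "\\%03o", c);
			o += 4;
		} else {
			if (o + 1 >= outlen)
				break;
			out[o++] = (char)c;
		}
	}
	out[o] = '\0';
	return o;
}
```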

Signed-off-by: Ivan Delalande <colona@arista.com>
2017-10-11 11:04:47 -07:00
Baruch Siach 4f6b73380d lib: fix multiple strlcpy definition
Some C libraries, like uClibc and musl, provide BSD compatible
strlcpy(). Add check_strlcpy() to configure, and avoid defining strlcpy
and strlcat when the C library provides them.

This fixes the following static link error with uClibc-ng:

.../sysroot/usr/lib/libc.a(strlcpy.os): In function `strlcpy':
strlcpy.c:(.text+0x0): multiple definition of `strlcpy'
../lib/libutil.a(utils.o):utils.c:(.text+0x1ddc): first defined here
collect2: error: ld returned 1 exit status
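
The conditional definition can look like the following sketch. HAVE_STRLCPY is a placeholder for whatever macro the configure check sets, because the text does not name it. The fallback follows the usual BSD semantics: it always NUL-terminates when size > 0, and it returns strlen(src) so callers can detect truncation.

```c
#include <assert.h>
#include <string.h>

/* HAVE_STRLCPY: hypothetical macro set when libc already has strlcpy(). */
#ifndef HAVE_STRLCPY
static size_t fallback_strlcpy(char *dst, const char *src, size_t size)
{
	size_t srclen = strlen(src);

	if (size > 0) {
		size_t n = srclen < size ? srclen : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return srclen;	/* >= size means the copy was truncated */
}
#define strlcpy fallback_strlcpy
#endif
```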

Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
2017-10-11 11:02:13 -07:00
Petr Vorel ecd44e6805 tests: Remove bashisms (s/source/.)
Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
2017-10-11 10:59:50 -07:00
Roopa Prabhu 41973a47dd iplink: new option to set neigh suppression on a bridge port
Neighbour suppression can be used to suppress ARP and ND flooding
to bridge ports. It maps to the recently added kernel support for
the bridge port flag IFLA_BRPORT_NEIGH_SUPPRESS.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2017-10-11 10:56:36 -07:00
Yotam Gigi 2055bf15f1 ip: mroute: Print offload indication
Since kernel net-next commit c7c0bbeae950 ("net: ipmr: Add MFC offload
indication") the kernel indicates on an MFC entry whether it was offloaded
using the RTNH_F_OFFLOAD flag. Update the "ip mroute show" command to
indicate when a route is offloaded, similarly to the "ip route show"
command.

Example output:
$ ip mroute
(0.0.0.0, 239.255.0.1)      Iif: sw1p7  Oifs: t_br0 State: resolved offload
(192.168.1.1, 239.255.0.1)  Iif: sw1p7  Oifs: sw1p4 State: resolved offload

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
2017-10-11 10:54:27 -07:00
Stefan Hajnoczi c759116a0b ss: add AF_VSOCK support
The AF_VSOCK address family is a host<->guest communications channel
supported by VMware, KVM, and Hyper-V.  Initial VMware support was
released in Linux 3.9 in 2013 and transports for other hypervisors were
added later.

AF_VSOCK addresses are <u32 cid, u32 port> tuples.  The 32-bit cid
integer is comparable to an IP address.  AF_VSOCK ports work like
TCP/UDP ports.

Both SOCK_STREAM and SOCK_DGRAM socket types are available.

This patch adds AF_VSOCK support to ss(8) so that sockets can be
observed.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-10-11 10:51:03 -07:00
Stefan Hajnoczi b338a3e7e7 ss: allow AF_FAMILY constants >32
Linux has more than 32 address families defined in <bits/socket.h>.  Use
a 64-bit type so all of them can be represented in the filter->families
bitmask.

It's easy to introduce bugs when using (1 << AF_FAMILY) because the
value is 32-bit.  This can produce incorrect results from bitmask
operations so introduce the FAMILY_MASK() macro to eliminate these bugs.
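
The pitfall is that 1 << family has type int. For AF_VSOCK (40 on Linux), the shift count exceeds the width of that type, which is undefined behaviour in C. Below is a sketch of a 64-bit mask macro. The exact definition in ss may differ, and family_enabled() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>

#ifndef AF_VSOCK
#define AF_VSOCK 40	/* Linux value; older libc headers may lack it */
#endif

/* 64-bit so that families >= 32 (e.g. AF_VSOCK) fit in the bitmask. */
#define FAMILY_MASK(family) ((uint64_t)1 << (family))

static int family_enabled(uint64_t families, int family)
{
	return (families & FAMILY_MASK(family)) != 0;
}
```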

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-10-11 10:50:20 -07:00
Stephen Hemminger e9b0d82dfa uapi: add include linux/vm_sockets_diag.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-11 10:49:25 -07:00
Stephen Hemminger 07682b88d8 Merge branch 'master' into net-next 2017-10-11 10:47:55 -07:00
Stephen Hemminger 237a52731b rdma: move headers to uapi
And update with version from upstream.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-11 10:47:28 -07:00
Stephen Hemminger f53da99ad7 update uapi headers from 4.14-rc4 net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-11 10:43:38 -07:00
Stephen Hemminger 92503441cc Merge branch 'master' into net-next 2017-10-11 10:43:13 -07:00
Lorenzo Colitti 596b1c94aa iproute: build more easily on Android
iproute2 contains a bunch of kernel headers, including uapi ones.
Android's libc uses uapi headers almost directly, and uses a
script to fix kernel types that don't match what userspace
expects.

For example: https://issuetracker.google.com/36987220 reports
that our struct ip_mreq_source contains "__be32 imr_multiaddr"
rather than "struct in_addr imr_multiaddr". The script addresses
this by replacing the uapi struct definition with a #include
<bits/ip_mreq.h> which contains the traditional userspace
definition.

Unfortunately, when we compile iproute2, this definition
conflicts with the one in iproute2's linux/in.h.

Historically we've just solved this problem by running "git rm"
on all the iproute2 include/linux headers that break Android's
libc.  However, deleting the files in this way makes it harder to
keep up with upstream, because every upstream change to
an include file causes a merge conflict with the delete.

This patch fixes the problem by moving the iproute2 linux headers
from include/linux to include/uapi/linux.

Tested: compiles on ubuntu trusty (glibc)

Signed-off-by: Elliott Hughes <enh@google.com>
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
2017-10-11 10:35:45 -07:00
Stephen Hemminger b0af8fc1aa tipc: don't need custom CFLAGS
Since libmnl CFLAGS are now handled by config.mk

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-11 10:35:38 -07:00
Stephen Hemminger 60509b997d Merge branch 'master' into net-next 2017-10-02 08:04:13 -07:00
Stephen Hemminger 1db903def7 update headers from net-next rc
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-02 08:03:45 -07:00
Phil Sutter 625df645b7 Check user supplied interface name lengths
The original problem was that something like:

| strncpy(ifr.ifr_name, *argv, IFNAMSIZ);

might leave ifr.ifr_name unterminated if length of *argv exceeds
IFNAMSIZ. In order to fix this, I thought about replacing all those
cases with (equivalent) calls to snprintf() or even introducing
strlcpy(). But as Ulrich Drepper correctly pointed out when rejecting
the latter from being added to glibc, truncating a string without
notifying the user is not to be considered good practice. So let's
exercise what he suggested and reject empty, overlong or otherwise
invalid interface names right from the start - this way calls to
strncpy() like shown above become safe and the user has a chance to
reconsider what he was trying to do.

Note that this doesn't add calls to check_ifname() to all places where
user supplied interface name is parsed. In many cases, the interface
must exist already and is therefore looked up using ll_name_to_index(),
so if_nametoindex() will perform the necessary checks already.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-10-02 08:01:21 -07:00
Phil Sutter ee474849c8 tc: flower: No need to cache indev arg
Since addattrstrz() will copy the provided string into the attribute
payload, there is no need to cache the data.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-10-02 08:01:21 -07:00
Phil Sutter 26111ab1db ip{6, }tunnel: Avoid copying user-supplied interface name around
In both files' parse_args() functions as well as in iptunnel's do_prl()
and do_6rd() functions, a user-supplied 'dev' parameter is uselessly
copied into a temporary buffer before passing it to ll_name_to_index()
or copying into a struct ifreq.  Avoid this by just caching the argv
pointer value until the later lookup/strcpy.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-10-02 08:01:21 -07:00
Michal Kubecek 4c0939a29e ip xfrm: use correct key length for netlink message
When SA is added manually using "ip xfrm state add", xfrm_state_modify()
uses alg_key_len field of struct xfrm_algo for the length of key passed to
kernel in the netlink message. However alg_key_len is bit length of the key
while we need byte length here. This is usually harmless as kernel ignores
the excess data but when the bit length of the key exceeds 512
(XFRM_ALGO_KEY_BUF_SIZE), it can result in buffer overflow.

We can simply divide by 8 here as the only place setting alg_key_len is in
xfrm_algo_parse() where it is always set to a multiple of 8 (and there are
already multiple places using "algo->alg_key_len / 8").
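
The bits-versus-bytes mix-up is easy to quantify. A 1024-bit key occupies 128 bytes, well within the 512-byte XFRM_ALGO_KEY_BUF_SIZE buffer, but using the bit length as a byte count would send 1024 bytes. The sketch redeclares a struct with the same layout as struct xfrm_algo so that it stands alone. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define KEY_BUF_SIZE 512	/* mirrors XFRM_ALGO_KEY_BUF_SIZE from the text */

/* Same layout as struct xfrm_algo in <linux/xfrm.h>, redeclared here. */
struct algo {
	char alg_name[64];
	unsigned int alg_key_len;	/* key length in BITS */
	char alg_key[];
};

/* Bytes for the netlink attribute: header plus key bytes (bits / 8). */
static size_t algo_payload_len(const struct algo *a)
{
	return sizeof(*a) + a->alg_key_len / 8;
}

/* Would a key of `bits` bits fit in the fixed-size key buffer? */
static int key_fits(unsigned int bits)
{
	return bits / 8 <= KEY_BUF_SIZE;
}
```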

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
2017-10-01 13:44:38 -07:00
Yulia Kartseva 73451259da tc: fix ipv6 filter selector attribute for some prefix lengths
The TCA_U32_SEL attribute was packed incorrectly when (prefixLen & 0x1f) == 0x1f,
i.e. for the /31, /63, /95 and /127 prefix lengths.

Example:
ip6 dst face:b00f::/31
filter parent b: protocol ipv6 pref 2307 u32
filter parent b: protocol ipv6 pref 2307 u32 fh 800: ht divisor 1
filter parent b: protocol ipv6 pref 2307 u32 fh 800::800 order 2048
key ht 800 bkt 0
  match faceb00f/ffffffff at 24
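
The per-word masks for an IPv6 prefix follow directly from the prefix length: each 32-bit word is fully masked, partially masked, or unmasked. The hypothetical helper below works in host byte order. It yields 0xfffffffe for the first word of a /31, so face:b00f::/31 should be matched with mask fffffffe. We read the "match faceb00f/ffffffff" line above as the incorrect pre-fix output, although the text does not say so explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: host-order mask for 32-bit word `word` (0..3)
 * of an IPv6 prefix of length plen (0..128). */
static uint32_t ip6_word_mask(unsigned int plen, unsigned int word)
{
	unsigned int start = word * 32;

	if (plen <= start)
		return 0;		/* word lies entirely past the prefix */
	if (plen >= start + 32)
		return 0xffffffffu;	/* word lies entirely inside the prefix */
	/* Partial word: 1..31 leading bits set, shift count 1..31. */
	return 0xffffffffu << (32 - (plen - start));
}
```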

v2: previous patch was made with a wrong repo

Signed-off-by: Yulia Kartseva <hex@fb.com>
2017-10-01 13:41:29 -07:00
Stephen Hemminger f412357017 Merge branch 'master' into net-next 2017-09-29 12:03:16 -07:00
Phil Sutter e4139268ba ip-route: Fix for listing routes with RTAX_LOCK attribute
This fixes a corner-case for routes with a certain metric locked to
zero:

| ip route add 192.168.7.0/24 dev eth0 window 0
| ip route add 192.168.7.0/24 dev eth0 window lock 0

Since the kernel doesn't dump the attribute if it is zero, both routes
added above would appear as if they were equal although they are not.

Fix this by taking the mxlock value for the given metric into account before
skipping a metric whose attribute is not present.
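
The fix reduces to a bit test on the RTAX_LOCK bitmask, where bit (1 << metric) marks a locked metric. Below is a sketch with hypothetical helper names. DEMO_RTAX_WINDOW mirrors RTAX_WINDOW (3) from <linux/rtnetlink.h>.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_RTAX_WINDOW 3	/* mirrors RTAX_WINDOW in <linux/rtnetlink.h> */

/* mxlock is the RTAX_LOCK bitmask: bit (1 << metric) set means locked. */
static int metric_is_locked(uint32_t mxlock, unsigned int metric)
{
	return metric < 32 && (mxlock & (1u << metric)) != 0;
}

/* A zero metric is not dumped by the kernel, but a locked one must show. */
static int should_print_metric(int attr_present, uint32_t mxlock,
			       unsigned int metric)
{
	return attr_present || metric_is_locked(mxlock, metric);
}
```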

Reported-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-09-29 12:02:09 -07:00
Stephen Hemminger ee7bfb52a7 Merge branch 'master' into net-next 2017-09-29 10:51:25 -07:00
Stephen Hemminger b2fd7a0e6e doc: drop old ip command documentation
The old IP cross reference manual was very out of date, barely updated
since 1999.  The correct documentation is in the man pages.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-09-29 10:51:02 -07:00
Julien Fortin 429f314ef7 lib: json_print: rework 'new_json_obj' drop FILE* argument
As Stephen Hemminger mentioned on the last submission, the new_json_obj
function is always called with fp == stdout, so right now there's no
need for this extra argument.

The background for the rework is the following:
ip monitor didn't call `new_json_obj` (not even in a non-JSON context),
so the static FILE* _fp variable wasn't initialized, which raised a
SIGSEGV in ipaddress.c. This patch should fix this issue for good; new
paths won't have to call `new_json_obj`.

How to reproduce:

$ ip -t mon label link
(gdb) bt
.#0  _IO_vfprintf_internal (s=s@entry=0x0, format=format@entry=0x45460d "%d: ", ap=ap@entry=0x7fffffff7f18) at vfprintf.c:1278
.#1  0x0000000000451310 in color_fprintf (fp=0x0, attr=<optimized out>, fmt=0x45460d "%d: ") at color.c:108
.#2  0x000000000044a856 in print_color_int (t=t@entry=PRINT_ANY, color=color@entry=4294967295, key=key@entry=0x4545fc "ifindex",
    fmt=fmt@entry=0x45460d "%d: ", value=<optimized out>) at ip_print.c:132
.#3  0x000000000040ccd2 in print_int (value=<optimized out>, fmt=0x45460d "%d: ", key=0x4545fc "ifindex", t=PRINT_ANY) at ip_common.h:189
.#4  print_linkinfo (who=<optimized out>, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipaddress.c:1107
.#5  0x0000000000422e13 in accept_msg (who=0x7fffffff8320, ctrl=0x7fffffff8310, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipmonitor.c:89
.#6  0x000000000044c58f in rtnl_listen (rtnl=0x672160 <rth>, handler=handler@entry=0x422c70 <accept_msg>, jarg=0x7ffff77a82a0 <_IO_2_1_stdout_>)
    at libnetlink.c:761
.#7  0x00000000004233db in do_ipmonitor (argc=<optimized out>, argv=0x7fffffffe5a0) at ipmonitor.c:310
.#8  0x0000000000408f74 in do_cmd (argv0=0x7fffffffe7f5 “mon”, argc=3, argv=0x7fffffffe588) at ip.c:116
.#9  0x0000000000408a94 in main (argc=4, argv=0x7fffffffe580) at ip.c:311

Fixes: 6377572f ("ip: ip_print: add new API to print JSON or regular format output")
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-09-29 10:10:47 -07:00
Stephen Hemminger a4cda980bb doc: remove outdated IPv6 flow label document
Not updated since Linux 2.2

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-09-29 10:06:50 -07:00
Stephen Hemminger bbf2a3634e doc: remove outdated tc-filters documentation
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-09-29 10:05:25 -07:00
Stephen Hemminger fd1aa86741 ignore generated Config file
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-09-29 10:02:45 -07:00
Stephen Hemminger 3e83c095e8 doc: remove outdated nstat/rtstat documentation
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-09-29 10:01:15 -07:00
Stephen Hemminger 760e9830fc doc: remove outdated arpd documentation
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-09-29 10:00:12 -07:00
Stephen Hemminger d77ce080d3 doc: remove outdated ss documentation
The current version is well documented on man page.
The latex documentation is very old and was never updated.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-09-29 09:58:39 -07:00
Stephen Hemminger 1298403e26 doc: remove obsolete ip-tunnels documentation
This file has not been updated since conversion to git
and is really old and outdated.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-09-29 09:58:02 -07:00
Julien Fortin 70556c1632 lib: json_print: rework 'new_json_obj' drop FILE* argument
As Stephen Hemminger mentioned on the last submission, the new_json_obj
function is always called with fp == stdout, so right now there's no
need for this extra argument.

The background for the rework is the following:
The ip monitor didn't call `new_json_obj` (even in non-JSON context),
so the static FILE* _fp variable wasn't initialized, thus raising a
SIGSEGV in ipaddress.c. This patch should fix this issue for good; new
paths won't have to call `new_json_obj`.

How to reproduce:

$ ip -t mon label link
(gdb) bt
.#0  _IO_vfprintf_internal (s=s@entry=0x0, format=format@entry=0x45460d "%d: ", ap=ap@entry=0x7fffffff7f18) at vfprintf.c:1278
.#1  0x0000000000451310 in color_fprintf (fp=0x0, attr=<optimized out>, fmt=0x45460d "%d: ") at color.c:108
.#2  0x000000000044a856 in print_color_int (t=t@entry=PRINT_ANY, color=color@entry=4294967295, key=key@entry=0x4545fc "ifindex",
    fmt=fmt@entry=0x45460d "%d: ", value=<optimized out>) at ip_print.c:132
.#3  0x000000000040ccd2 in print_int (value=<optimized out>, fmt=0x45460d "%d: ", key=0x4545fc "ifindex", t=PRINT_ANY) at ip_common.h:189
.#4  print_linkinfo (who=<optimized out>, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipaddress.c:1107
.#5  0x0000000000422e13 in accept_msg (who=0x7fffffff8320, ctrl=0x7fffffff8310, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipmonitor.c:89
.#6  0x000000000044c58f in rtnl_listen (rtnl=0x672160 <rth>, handler=handler@entry=0x422c70 <accept_msg>, jarg=0x7ffff77a82a0 <_IO_2_1_stdout_>)
    at libnetlink.c:761
.#7  0x00000000004233db in do_ipmonitor (argc=<optimized out>, argv=0x7fffffffe5a0) at ipmonitor.c:310
.#8  0x0000000000408f74 in do_cmd (argv0=0x7fffffffe7f5 "mon", argc=3, argv=0x7fffffffe588) at ip.c:116
.#9  0x0000000000408a94 in main (argc=4, argv=0x7fffffffe580) at ip.c:311

Fixes: 6377572f ("ip: ip_print: add new API to print JSON or regular format output")
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-09-27 09:21:54 +01:00
Stephen Hemminger b7a38c397d Merge branch 'master' into net-next 2017-09-22 10:10:01 -07:00
Thomas Haller 01777e055d man: fix documentation for range of route table ID
Signed-off-by: Thomas Haller <thaller@redhat.com>
2017-09-22 10:09:04 -07:00
Daniel Borkmann bc2d4d838f bpf: properly output json for xdp
After merging net-next branch into master, Stephen asked
to fix up json dump for XDP. Thus, rework the json dump a
bit, such that 'ip -json l' looks as below.

  [{
        "ifindex": 1,
        "ifname": "lo",
        "flags": ["LOOPBACK","UP","LOWER_UP"],
        "mtu": 65536,
        "xdp": {
            "mode": 2,
            "prog": {
                "id": 5,
                "tag": "e1e9d0ec0f55d638",
                "jited": 1
            }
        },
        "qdisc": "noqueue",
        "operstate": "UNKNOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "txqlen": 1000,
        "link_type": "loopback",
        "address": "00:00:00:00:00:00",
        "broadcast": "00:00:00:00:00:00"
    },[...]
  ]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-09-22 10:07:15 -07:00
Daniel Borkmann 0b4b35e1e8 json: move json printer to common library
Move the json printer which is based on json writer into the
iproute2 library, so it can be used by library code and tools
other than ip. This should probably have been done from the beginning,
given that the json writer is already in the library anyway.
No functional changes.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Julien Fortin <julien@cumulusnetworks.com>
2017-09-22 10:06:43 -07:00
Stephen Hemminger 58677cc2d3 tc: flower remove unused variable
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-09-20 18:08:43 -07:00
Benjamin LaHaise 7638ee13c1 tc: flower: support for matching MPLS labels
This patch adds support to the iproute2 tc filter command for matching MPLS
labels in the flower classifier.  Options to match the Time To Live,
Bottom Of Stack, Traffic Class and Label fields are added to the flower
filter.

e.g.:
  tc filter add dev eth0 protocol 0x8847 parent ffff: \
    flower mpls_label 1 mpls_tc 2 mpls_ttl 3 mpls_bos 0 \
    action drop
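For reference, the four matchable fields all live in a single 32-bit MPLS
label stack entry (RFC 3032; RFC 5462 renamed the EXP bits to Traffic
Class). The sketch below is illustrative only and not part of this patch:
struct mpls_fields, mpls_decode() and mpls_encode() are hypothetical
helpers working on a host-byte-order value, showing where mpls_label,
mpls_tc, mpls_bos and mpls_ttl sit in the entry.

```c
#include <stdint.h>

/* Hypothetical view of one label stack entry:
 * | label (20 bits) | TC (3 bits) | BOS (1 bit) | TTL (8 bits) | */
struct mpls_fields {
	uint32_t label;
	uint8_t tc;
	uint8_t bos;
	uint8_t ttl;
};

/* Split a host-byte-order label stack entry into its fields. */
struct mpls_fields mpls_decode(uint32_t lse)
{
	struct mpls_fields f = {
		.label = lse >> 12,
		.tc = (lse >> 9) & 0x7,
		.bos = (lse >> 8) & 0x1,
		.ttl = lse & 0xff,
	};
	return f;
}

/* Pack the fields back, masking each one to its width. */
uint32_t mpls_encode(struct mpls_fields f)
{
	return (f.label & 0xFFFFFu) << 12 |
	       (uint32_t)(f.tc & 0x7) << 9 |
	       (uint32_t)(f.bos & 0x1) << 8 |
	       f.ttl;
}
```

With the values from the example above (label 1, tc 2, bos 0, ttl 3) the
packed entry is 0x00001403.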

Signed-off-by: Benjamin LaHaise <benjamin.lahaise@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2017-09-20 18:07:21 -07:00
Julien Fortin 6335c5ff67 ip: ipaddress: fix missing space after prefixlen
Fixes: d0e720111a ("ip: ipaddress.c: add support for json output")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-09-20 18:05:03 -07:00
Davide Caratti bc6ba66047 tc: fix typo in tc-tcindex man page
fix mis-typed 'pass_on' keyword.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2017-09-20 18:01:02 -07:00
Stephen Hemminger 44cf841560 BPF: update headers from 4.14-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-09-20 18:00:55 -07:00
Eric Dumazet ff28b7519d tc: fq: support low_rate_threshold attribute
TCA_FQ_LOW_RATE_THRESHOLD sch_fq attribute was added in linux-4.9

Tested:

lpaa5:/tmp# tc -qd add dev eth1 root fq
lpaa5:/tmp# tc -s qd sh dev eth1
qdisc fq 8003: root refcnt 5 limit 10000p flow_limit 1000p buckets 4096 \
 orphan_mask 4095 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 quantum 3648 \
 initial_quantum 18240 low_rate_threshold 550Kbit refill_delay 40.0ms
 Sent 62139 bytes 395 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  116 flows (114 inactive, 0 throttled)
  1 gc, 0 highprio, 0 throttled

lpaa5:/tmp# ./netperf -H lpaa6 -t TCP_RR -l10 -- -q 500000 -r 300,300 -o P99_LATENCY
99th Percentile Latency Microseconds
7081

lpaa5:/tmp# tc qd replace dev eth1 root fq low_rate_threshold 10Mbit
lpaa5:/tmp# ./netperf -H lpaa6 -t TCP_RR -l10 -- -q 500000 -r 300,300 -o P99_LATENCY
99th Percentile Latency Microseconds
858

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
2017-09-12 21:33:31 -07:00
Phil Sutter 1cfcf62c68 ipaddress: Fix segfault in 'addr showdump'
Apparently, the 'addr showdump' feature wasn't adjusted for JSON output
support. As a consequence, calls to print_string() in print_addrinfo()
tried to dereference a NULL FILE pointer.

Fixes: d0e720111a ("ip: ipaddress.c: add support for json output")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-09-12 21:27:36 -07:00
Arkadi Sharshevsky b2947f8b2c devlink: Add support for protocol IPv4/IPv6/Ethernet special formats
Add support for protocol IPv4/IPv6/Ethernet special formats.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-09-07 15:10:25 -07:00
Arkadi Sharshevsky 31639589f3 devlink: Add support for special format protocol headers
For a global header (protocol header), the header:field IDs are used
to look up a special format printer. If no such printer exists, fall
back to printing the plain value.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-09-07 15:10:25 -07:00
Arkadi Sharshevsky 92b2a5bb76 devlink: Make match/action parsing more flexible
This patch decouples the match/action parsing from printing. This is
done as a preparation for adding the ability to print global header
values, for example IPv4 addresses, which require special formatting.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-09-07 15:10:25 -07:00
Phil Sutter 50ea3c6438 utils: strlcpy() and strlcat() don't clobber dst
As David Laight correctly pointed out, the first version of strlcpy()
modified dst buffer behind the string copied into it. Fix this by
writing NUL to the byte immediately following src string instead of to
the last byte in dst. Doing so also allows reducing overhead by using
memcpy().

Improve strlcat() by avoiding the call to strlcpy() if dst string is
already full, not just as sanity check.
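The behaviour described above can be sketched roughly as follows. This is
an illustration under stated assumptions, not the code from this patch:
sketch_strlcpy() and sketch_strlcat() are hypothetical names chosen so they
don't clash with C libraries that already ship strlcpy()/strlcat().

```c
#include <stddef.h>
#include <string.h>

/* Copy at most size - 1 bytes of src and write the NUL right after them,
 * so bytes of dst beyond the terminator are left untouched. Returns
 * strlen(src); a return value >= size means the copy was truncated. */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t srclen = strlen(src);

	if (size) {
		size_t n = srclen < size - 1 ? srclen : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return srclen;
}

/* Append src to dst. When dst already fills the buffer there is no room
 * left, so skip the copy and only report the would-be length. */
size_t sketch_strlcat(char *dst, const char *src, size_t size)
{
	size_t dlen = strlen(dst);

	if (dlen + 1 >= size)
		return dlen + strlen(src);
	return dlen + sketch_strlcpy(dst + dlen, src, size - dlen);
}
```

Writing the NUL at dst[n] instead of dst[size - 1] is what keeps the tail of
dst intact, and memcpy() is enough because the length is already known.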

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-09-07 15:06:47 -07:00
Stephen Hemminger 01e5409371 Merge branch 'net-next' 2017-09-05 09:48:36 -07:00
Stephen Hemminger 39740278a8 v4.13.0 2017-09-05 09:39:32 -07:00
Stephen Hemminger 4a5b3035de update headers from 4.14 merge
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-09-05 09:38:31 -07:00
Stephen Hemminger a17a01145f Merge branch 'master' into net-next 2017-09-05 09:33:29 -07:00
Daniel Borkmann a0b5b7cf5c bpf: consolidate dumps to use bpf_dump_prog_info
Consolidate dump of prog info to use bpf_dump_prog_info() when possible.
Moving forward, we want to have a consistent output for BPF progs when
being dumped. E.g. in cls/act case we used to dump tag as a separate
netlink attribute before we had BPF_OBJ_GET_INFO_BY_FD bpf(2) command.

Move dumping tag into bpf_dump_prog_info() as well, and only dump the
netlink attribute for older kernels. Also, reuse bpf_dump_prog_info()
for XDP case, so we can dump tag and whether program was jited, which
we currently don't show.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-09-05 09:26:34 -07:00
Daniel Borkmann 1b736dc469 bpf: minor cleanups for bpf_trace_pipe
Just minor nits, e.g. no need to fflush() and instead of returning
right away, just break and close the fd.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-09-05 09:26:34 -07:00
Simon Horman b75e0f6f4b tc actions: store and dump correct length of user cookies
Correct two errors which cancel each other out:
* Do not send twice the length of the actual cookie provided by the user to the kernel
* Do not dump half the length of the cookie provided by the kernel

As the cookie is now stored in the kernel at its correct length rather
than at double that length, cookies of up to the maximum size of 16 bytes
may now be stored, rather than a maximum of half that length.

Output of dump is the same before and after this change,
but the data stored in the kernel is now exactly the cookie
rather than the cookie followed by an equal number of trailing zero bytes.
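To illustrate the length relationship: a cookie given as 32 hex characters
decodes to 16 bytes, and it is the byte count that belongs in the netlink
attribute. The sketch below is hypothetical (hex2bytes() and hexval() are
not iproute2 functions) and only shows that conversion.

```c
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Decode an even-length hex string into out[]. Returns the number of
 * bytes written (strlen(hex) / 2), or -1 on bad input or overflow. */
int hex2bytes(const char *hex, unsigned char *out, size_t outlen)
{
	size_t len = strlen(hex);

	if (len % 2 || len / 2 > outlen)
		return -1;
	for (size_t i = 0; i < len / 2; i++) {
		int hi = hexval(hex[2 * i]);
		int lo = hexval(hex[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		out[i] = (unsigned char)(hi << 4 | lo);
	}
	return (int)(len / 2);
}
```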

Before:
 # tc filter add dev eth0 protocol ip parent ffff: \
       flower ip_proto udp action drop \
       cookie 0123456789abcdef0123456789abcdef
 RTNETLINK answers: Invalid argument

After:
 # tc filter add dev eth0 protocol ip parent ffff: \
       flower ip_proto udp action drop \
       cookie 0123456789abcdef0123456789abcdef
 # tc filter show dev eth0 ingress
   eth_type ipv4
   ip_proto udp
   not_in_hw
	 action order 1: gact action drop
	  random type none pass val 0
	  index 1 ref 1 bind 1 installed 1 sec used 1 sec
	 Action statistics:
	 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	 backlog 0b 0p requeues 0
	 cookie len 16 0123456789abcdef0123456789abcdef

Fixes: fd8b3d2c1b ("actions: Add support for user cookies")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-09-05 09:25:46 -07:00
Phil Sutter 7c87c7fed1 lib/bpf: Fix bytecode-file parsing
The signedness of char type is implementation dependent, and there are
architectures on which it is unsigned by default. In that case, the
check whether fgetc() returned EOF failed because the return value was
assigned an (unsigned) char variable prior to comparison with EOF (which
is defined to -1). Fix this by using int as type for 'c' variable, which
also matches the declaration of fgetc().
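A minimal illustration of the type issue (not the parser from this patch;
count_bytes() is a hypothetical helper). fgetc() returns either a byte value
converted to int (0..255) or EOF, so the variable it is stored in has to be
an int for the EOF comparison to work everywhere:

```c
#include <stdio.h>

/* Count bytes until end of file. 'c' is an int: storing fgetc()'s
 * result in a char first would, where char is unsigned, turn EOF into
 * 255 and the loop would never terminate. */
size_t count_bytes(FILE *fp)
{
	size_t n = 0;
	int c;

	while ((c = fgetc(fp)) != EOF)
		n++;
	return n;
}
```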

While being at it, fix the parser logic to correctly handle multiple
empty lines and consecutive whitespace and tab characters to further
improve the parser's robustness. Note that this will still detect double
separator characters, so doesn't soften up the parser too much.

Fixes: 3da3ebfca8 ("bpf: Make bytecode-file reading a little more robust")
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-09-04 12:06:50 -07:00
Stephen Hemminger 731e28cc28 Merge branch 'master' into net-next 2017-09-01 14:15:31 -07:00
Michal Kubecek 460c03f3f3 iplink: double the buffer size also in iplink_get()
Commit 72b365e8e0 ("libnetlink: Double the dump buffer size") increased
the buffer size for the "ip link show" command to 32 KB to handle NICs with
a large number of VFs. With the "dev" filter, a different code path is taken and
iplink_get() still uses only 16 KB buffer.

The size of 32768 is not very future-proof as NICs supporting 120-128 VFs
are already in use so that single RTM_NEWLINK message in the dump can
exceed 30000 bytes. But it's what rtnl_talk() and rtnl_dump_filter_l() use
so let's be consistent. Once this proves insufficient, all three sizes
should be increased.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
2017-09-01 14:15:00 -07:00
Michal Kubecek 6599162b95 iplink: check for message truncation in iplink_get()
If message length exceeds maxlen argument of rtnl_talk(), it is truncated
to maxlen but unlike in the case of truncation to the length of local
buffer in rtnl_talk(), the caller doesn't get any indication of a problem.

In particular, iplink_get() passes the truncated message on and parsing it
results in various warnings and sometimes even a segfault (observed with
"ip link show dev ..." for a NIC with 125 VFs).

Handle message truncation in iplink_get() the same way as truncation in
rtnl_talk() would be handled: return an error.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
2017-09-01 14:15:00 -07:00
Stephen Hemminger 2e706e12d9 Merge branch 'master' into net-next
Needed to add JSON support to tclass.
2017-09-01 12:17:48 -07:00
Phil Sutter bc4a57b879 lnstat_util: Make sure buffer is NUL-terminated
Can't use strlcpy() here since lnstat is not linked against libutil.

While being at it, fix coding style in that chunk as well.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-09-01 12:10:54 -07:00
Phil Sutter 9376314b49 tc_util: No need to terminate an snprintf'ed buffer
snprintf() won't leave the buffer unterminated, so manually terminating
is not necessary here.
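As a quick illustration of that guarantee: for size > 0, snprintf() always
writes a terminating NUL, even when the output is truncated, and returns the
length the complete string would have had. format_handle() below is a
hypothetical helper that prints a tc-style major:minor handle in hex.

```c
#include <stdio.h>

/* The result in buf is always NUL-terminated when size > 0; a return
 * value >= size tells the caller the output was truncated. */
int format_handle(char *buf, size_t size, unsigned int major,
		  unsigned int minor)
{
	return snprintf(buf, size, "%x:%x", major, minor);
}
```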

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-09-01 12:10:54 -07:00
Phil Sutter 44cc6c792a ipxfrm: Replace STRBUF_CAT macro with strlcat()
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-09-01 12:10:54 -07:00
Phil Sutter 532b8874fe Convert harmful calls to strncpy() to strlcpy()
This patch converts spots where manual buffer termination was missing to
strlcpy() since that does what is needed.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-09-01 12:10:54 -07:00
Phil Sutter 18f156bfec Convert the obvious cases to strlcpy()
This converts the typical idiom of manually terminating the buffer after
a call to strncpy().

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-09-01 12:10:54 -07:00
Phil Sutter 8d15e012a3 utils: Implement strlcpy() and strlcat()
By making use of strncpy(), both implementations are really simple so
there is no need to add libbsd as additional dependency.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-09-01 12:10:54 -07:00
Phil Sutter 50f81afd4d link_gre6: Print the tunnel's tclass setting
Print the value analogous to flowlabel. While being at it, also break
the overlong lines so they stay within the 80-character boundary.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-09-01 12:09:42 -07:00
Phil Sutter e7fefb3214 link_gre6: Fix for changing tclass/flowlabel
When trying to change tclass or flowlabel of a GREv6 tunnel which has
the respective value set already, the code accidentally bitwise OR'ed
the old and the new value, leading to unexpected results. Fix this by
clearing the relevant bits of flowinfo variable prior to assigning the
new value.
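The clear-then-set pattern can be sketched as below. The masks are local to
this sketch and in host byte order: the traffic class occupies bits 20-27
and the flow label bits 0-19 of the IPv6 flowinfo word (linux/in6.h defines
the equivalent IP6_FLOWINFO_TCLASS/IP6_FLOWINFO_FLOWLABEL masks in network
byte order). set_tclass() and set_flowlabel() are hypothetical helpers.

```c
#include <stdint.h>

#define SKETCH_TCLASS_MASK    0x0FF00000u
#define SKETCH_FLOWLABEL_MASK 0x000FFFFFu

/* Replace the traffic class: clear the old bits first, otherwise OR-ing
 * in the new value would merge it with the previous one. */
uint32_t set_tclass(uint32_t flowinfo, uint8_t tclass)
{
	flowinfo &= ~SKETCH_TCLASS_MASK;
	flowinfo |= ((uint32_t)tclass << 20) & SKETCH_TCLASS_MASK;
	return flowinfo;
}

/* Same for the 20-bit flow label. */
uint32_t set_flowlabel(uint32_t flowinfo, uint32_t label)
{
	flowinfo &= ~SKETCH_FLOWLABEL_MASK;
	flowinfo |= label & SKETCH_FLOWLABEL_MASK;
	return flowinfo;
}
```

Changing the traffic class from 0x0f to 0xf0 then yields 0xf0, not the
0xff that a plain OR would produce.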

Fixes: af89576d7a ("iproute2: GRE over IPv6 tunnel support.")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-09-01 12:09:42 -07:00
David Lebrun 9d563d52f6 man: add documentation for seg6 l2encap mode
This patch adds documentation for the seg6 L2ENCAP encapsulation mode.

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
2017-08-30 08:29:36 -07:00
David Lebrun cf87da417b iproute: add support for seg6 l2encap mode
This patch adds support for the L2ENCAP seg6 mode, which makes it possible
to encapsulate L2 frames within SRv6 packets.

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
2017-08-30 08:29:36 -07:00
Alexander Aring 3ee52855a0 man: tc-ife: add default type note
This patch updates the tc-ife man page to note that the default IFE
ethertype is used if none is specified.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
2017-08-30 08:26:46 -07:00
Alexander Aring 38060de1eb tc: m_ife: report about kernels default type
This patch reports that the default IFE type is used if the ethertype
for IFE is not specified.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
2017-08-30 08:26:46 -07:00
Alexander Aring 664f35aa7c tc: m_ife: print IEEE ethertype format
This patch uses the usual IEEE format to display an ethertype: four hex
digits, all in upper case.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
2017-08-30 08:26:46 -07:00
Alexander Aring bf338b60d4 tc: m_ife: allow ife type to zero
This patch allows setting an IFE ethertype of zero. There is no
kernel-side validation that forbids a zero type.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
2017-08-30 08:26:46 -07:00
Stephen Hemminger 8707fd8c93 update headers from net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-30 08:26:43 -07:00
Stephen Hemminger c5e2692b66 Merge branch 'master' into net-next 2017-08-30 08:24:57 -07:00
Phil Sutter 6c6bbc30f4 ss: Fix for added diag support check
Commit 9f66764e30 ("libnetlink: Add test for error code returned from
netlink reply") changed rtnl_dump_filter_l() to return an error in case
NLMSG_DONE would contain one, even if it was ENOENT.

This in turn breaks ss when it tries to dump DCCP sockets on a system
without support for it: The function tcp_show(), which is shared between
TCP and DCCP, will start parsing /proc since inet_show_netlink() returns
an error - yet it parses /proc/net/tcp which doesn't make sense for DCCP
sockets at all.

On my system, a call to 'ss' without further arguments prints the list
of connected TCP sockets twice.

Fix this by introducing a dedicated function dccp_show() which does not
have a fallback to /proc, just like sctp_show(). And since tcp_show()
is no longer "multi-purpose", drop its socktype parameter.

Fixes: 9f66764e30 ("libnetlink: Add test for error code returned from netlink reply")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-30 08:18:13 -07:00
Stephen Hemminger b43b5b9acc devlink: header update
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-24 15:31:57 -07:00
Stephen Hemminger f474588028 Merge branch 'master' into net-next 2017-08-24 15:30:32 -07:00
Stephen Hemminger c4fc474b88 tc: use named initializer for default mqprio options
Use C99 initializer

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-24 15:28:15 -07:00
Phil Sutter 893deac4c4 lib/libnetlink: Don't pass NULL parameter to memcpy()
Both addattr_l() and rta_addattr_l() may be called with NULL data
pointer and 0 alen parameters. Avoid calling memcpy() in that case.
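The guard boils down to the pattern below. Passing a NULL pointer to
memcpy() is undefined behaviour even with a length of 0, so the copy is
skipped for empty payloads. append_payload() is a hypothetical helper, not
the libnetlink code.

```c
#include <stddef.h>
#include <string.h>

/* Append alen bytes of data to buf at offset off and return the new
 * offset. data may be NULL when alen is 0; memcpy() is then not called. */
size_t append_payload(unsigned char *buf, size_t off, const void *data,
		      size_t alen)
{
	if (alen)
		memcpy(buf + off, data, alen);
	return off + alen;
}
```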

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 15:22:10 -07:00
Phil Sutter ac3415f5c1 lib/fs: Fix and simplify make_path()
Calling stat() before mkdir() is racy: The entry might change in
between. Also, the call to stat() seems to exist only to check if the
directory exists already. So simply call mkdir() unconditionally and
catch only errors other than EEXIST.
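A rough sketch of that approach, assuming a make_path()-like helper that
creates every directory component of a path (comparable to "mkdir -p").
make_path_sketch() and its 4096-byte working buffer are assumptions of this
sketch, not the lib/fs code.

```c
#include <errno.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create each component of path. mkdir() is called unconditionally and
 * EEXIST is treated as success, so there is no stat()/mkdir() window in
 * which the entry could change. */
int make_path_sketch(const char *path, mode_t mode)
{
	char buf[4096];
	size_t len = strlen(path);

	if (len == 0 || len >= sizeof(buf)) {
		errno = len ? ENAMETOOLONG : ENOENT;
		return -1;
	}
	memcpy(buf, path, len + 1);

	for (char *p = buf + 1; ; p++) {
		if (*p == '/' || *p == '\0') {
			char saved = *p;

			*p = '\0';	/* temporarily cut the path here */
			if (mkdir(buf, mode) && errno != EEXIST)
				return -1;
			*p = saved;
			if (saved == '\0')
				break;
		}
	}
	return 0;
}
```

Calling it a second time on the same path succeeds as well, because every
mkdir() then fails with EEXIST.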

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 15:22:10 -07:00
Phil Sutter b5c78e1b2c lib/bpf: Check return value of write()
This is merely to silence the compiler warning. If write to stderr
failed, assume that printing an error message will fail as well so don't
even try.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 15:22:10 -07:00
Phil Sutter 92963d136d netem/maketable: Check return value of fscanf()
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 15:22:10 -07:00
Phil Sutter 0aa03350c0 ss: Make sure scanned index value to unix_state_map is sane
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 15:22:10 -07:00
Phil Sutter 4cbf5224f2 ss: Make struct tcpstat fields 'timer' and 'timeout' unsigned
Both 'timer' and 'timeout' variables of struct tcpstat are either
scanned as unsigned values from /proc/net/tcp{,6} or copied from
'idiag_timer' and 'idiag_expires' fields of struct inet_diag_msg, which
are themselves unsigned. Therefore they may be unsigned as well, which
eliminates the need to check for negative values.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 15:22:09 -07:00
Stephen Hemminger 0b5eadc54f bpf: drop unused parameter to bpf_report_map_in_map
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-24 15:02:58 -07:00
Stephen Hemminger 0efa625765 libnetlink: drop unused parameter to rtnl_dump_done
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-24 15:02:48 -07:00
Stephen Hemminger 8f478ec2b3 rdma: fix duplicate initialization in port_names
Building with warnings enabled spotted this.
link.c:51:58: note: (near initialization for ‘rdma_port_names[23]’)
   rdma_port_names[] = { RDMA_PORT_FLAGS(RDMA_BITMAP_NAMES) };

Assume that fields were in order and 25 is the missing value.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-24 15:02:16 -07:00
Phil Sutter 4b9e917822 lib/ll_map: Choose size of new cache items at run-time
Instead of having a fixed buffer of 16 bytes for the interface name,
tailor size of new ll_cache entry using the interface name's actual
length. This also makes sure the following call to strcpy() is safe.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 14:53:14 -07:00
Phil Sutter 56270e5466 tc/m_xt: Fix for potential string buffer overflows
- Use strncpy() when writing to target->t->u.user.name and make sure the
  final byte remains untouched (xtables_calloc() set it to zero).
- 'tname' length sanitization was completely wrong: If its length
  exceeded the 16 bytes available in 'k', passing a length value of 16
  to strncpy() would overwrite the previously NULL'ed 'k[15]'. Also, the
  sanitization has to happen if 'tname' is exactly 16 bytes long as
  well.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 14:53:14 -07:00
Phil Sutter bc27878d21 lnstat_util: Simplify alloc_and_open() a bit
Relying upon callers and using unsafe strcpy() is probably not the best
idea. Aside from that, using snprintf() allows formatting the string for
lf->path in one go.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 14:53:14 -07:00
Phil Sutter cfda500a7d lib/inet_proto: Review inet_proto_{a2n,n2a}()
The original intent was to make sure strings written by those functions
are NUL-terminated at all times, though it was suggested to get rid of
the 15 char protocol name limit as well which this patch accomplishes.

In addition to that, simplify inet_proto_a2n() a bit: Use the error
checking in get_u8() to find out whether passed 'buf' contains a valid
decimal number instead of checking the first character's value manually.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 14:53:14 -07:00
Phil Sutter eab4507898 lib/fs: Fix format string in find_fs_mount()
A field width of 4096 allows fscanf() to store that many characters
into the given buffer, though that doesn't include the terminating NUL
byte. Decrease the value by one to leave space for it.
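In other words, for an N-byte buffer the %s width must be N - 1. A sketch of
scanning mounts-style lines that way follows; find_mount_of_type() is a
hypothetical helper, and the caller is assumed to pass a mntdir buffer of
at least 4096 bytes.

```c
#include <stdio.h>
#include <string.h>

/* Scan "device mountpoint fstype options dump pass" lines and copy the
 * mount point of the first entry whose type matches into mntdir.
 * Widths are buffer size - 1: fscanf() appends the NUL itself. */
int find_mount_of_type(FILE *fp, const char *type, char *mntdir)
{
	char fstype[100];

	while (fscanf(fp, "%*s %4095s %99s %*s %*d %*d\n",
		      mntdir, fstype) == 2) {
		if (strcmp(fstype, type) == 0)
			return 0;
	}
	return -1;
}
```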

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 14:53:14 -07:00
Phil Sutter 45c2ec9e95 ipntable: Avoid memory allocation for filter.name
The original issue was that filter.name might end up unterminated if
user provided string was too long. But in fact it is not necessary to
copy the commandline parameter at all: just make filter.name point to it
instead.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 14:53:14 -07:00
Phil Sutter 70a6df3962 tipc/bearer: Prevent NULL pointer dereference
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 14:49:44 -07:00
Phil Sutter 75716932a0 tc/tc_filter: Make sure filter name is not empty
The later check for 'k[0] != 0' requires a non-empty filter name,
otherwise NULL pointer dereference in 'q' might happen.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 14:49:44 -07:00
Phil Sutter a754de3ccd tc/q_netem: Don't dereference possibly NULL pointer
Assuming 'opt' might be NULL, move the call to RTA_PAYLOAD to after the
check since it dereferences its parameter.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 14:49:44 -07:00
Phil Sutter 6d02518fdc ifstat, nstat: Check fdopen() return value
Prevent passing NULL FILE pointer to fgets() later.

Fix both tools in a single patch since the code changes are basically
identical.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 14:49:43 -07:00
Andreas Henriksson ae4e21c93f ss: fix help/man TCP-STATE description for listening
There's some misleading information in --help and ss(8) manpage about
TCP-STATE named 'listen'.
ss doesn't know such a state, but it knows 'listening' state.

$ ss -tua state listen
ss: wrong state name: listen

$ ss -tua state listening
[...]

Addresses: https://bugs.debian.org/872990
Reported-by: Pavel Lyulchenko <p.lyulchenko@gmail.com>
Signed-off-by: Andreas Henriksson <andreas@fatal.se>
2017-08-24 11:01:34 -07:00
William Tu 9a1381d509 gre: add support for ERSPAN tunnel
The patch adds ERSPAN type II tunnel support. The implementation is
based on the draft at
 https://tools.ietf.org/html/draft-foschiano-erspan-01.

One of the purposes is for a Linux box to be able to receive ERSPAN
monitoring traffic sent from a Cisco switch, by creating an ERSPAN
tunnel device. In addition, the patch also adds ERSPAN TX, so traffic
can also be encapsulated into ERSPAN and sent out.

The implementation reuses the key as the ERSPAN session ID, and the
field 'erspan' as the ERSPAN Index field:
./ip link add dev ers11 type erspan seq key 100 erspan 123 \
		local 172.16.1.200 remote 172.16.1.100

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Meenakshi Vohra <mvohra@vmware.com>
2017-08-23 10:06:54 -07:00
Stephen Hemminger fb14560b76 add ERSPAN headers
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-23 10:05:08 -07:00
Stephen Hemminger 5f1df307b4 config: put CFLAGS/LDLIBS in config.mk
This renames Config to config.mk and includes more Make input.
Now configure generates all the required CFLAGS and LDLIBS for
the optional libraries.

Also, use pkg-config to test for libelf, rather than using a test
program. This makes it consistent with other libraries.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-23 10:03:09 -07:00
Stephen Hemminger 51186362ba Merge branch 'master' into net-next 2017-08-21 17:37:15 -07:00
Phil Sutter c3724e4bc3 lib/bpf: Don't leak fp in bpf_find_mntpt()
If fopen() succeeded but len != PATH_MAX, the function leaks the open
FILE pointer. Fix this by checking len value before calling fopen().

Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-08-21 17:35:07 -07:00
Phil Sutter 6e33f7b0f6 devlink: Check return code of strslashrsplit()
This function shouldn't fail because all callers of
__dl_argv_handle_port() make sure the passed string contains enough
slashes already, but better make sure that, if this changes in the
future, the function won't access uninitialized data.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-21 17:28:03 -07:00
Phil Sutter 84b6a3f4b5 iplink_vrf: Complain if main table is not found
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: David Ahern <dsahern@gmail.com>
2017-08-21 17:28:03 -07:00
Phil Sutter 7c66d89828 iproute: Check mark value input
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-21 17:28:03 -07:00
Phil Sutter 82ed9ffa2b tc/q_multiq: Don't pass garbage in TCA_OPTIONS
multiq_parse_opt() doesn't change 'opt' at all. So at least make sure
it doesn't fill TCA_OPTIONS attribute with garbage from stack.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-21 17:17:00 -07:00
Phil Sutter d304b05c12 netem/maketable: Check return value of fstat()
Otherwise info.st_size may contain garbage.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-21 17:17:00 -07:00
Phil Sutter 301826beb3 ss: Use C99 initializer in netlink_show_one()
This has the additional benefit of initializing st.ino to zero which is
used later in is_sctp_assoc() function.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-21 17:17:00 -07:00
Phil Sutter b48a1161f5 ipmaddr: Avoid accessing uninitialized data
Looks like this can only happen if /proc/net/igmp is malformed, but
better be sure.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-21 17:17:00 -07:00
Phil Sutter 258b7c0fa7 iplink_can: Prevent overstepping array bounds
can_state_names array contains at most CAN_STATE_MAX fields, so allowing
an index to it to be equal to that number is wrong. While here, also
make sure the array is indeed that big so nothing bad happens if
CAN_STATE_MAX ever increases.
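Both points can be sketched as follows. The enum is a local stand-in for
the kernel's enum can_state (same order of states), and can_state_name() is
a hypothetical lookup helper; this is not the iplink_can code itself.

```c
/* Local stand-in for the kernel's enum can_state. */
enum sketch_can_state {
	SK_CAN_STATE_ERROR_ACTIVE,
	SK_CAN_STATE_ERROR_WARNING,
	SK_CAN_STATE_ERROR_PASSIVE,
	SK_CAN_STATE_BUS_OFF,
	SK_CAN_STATE_STOPPED,
	SK_CAN_STATE_SLEEPING,
	SK_CAN_STATE_MAX
};

/* Sized by the enum, so every valid state has a slot even if new states
 * are added later; slots without a name stay NULL. */
static const char *can_state_names[SK_CAN_STATE_MAX] = {
	[SK_CAN_STATE_ERROR_ACTIVE]  = "ERROR-ACTIVE",
	[SK_CAN_STATE_ERROR_WARNING] = "ERROR-WARNING",
	[SK_CAN_STATE_ERROR_PASSIVE] = "ERROR-PASSIVE",
	[SK_CAN_STATE_BUS_OFF]       = "BUS-OFF",
	[SK_CAN_STATE_STOPPED]       = "STOPPED",
	[SK_CAN_STATE_SLEEPING]      = "SLEEPING",
};

const char *can_state_name(unsigned int state)
{
	/* Valid indexes are 0 .. MAX - 1, hence '<' rather than '<='. */
	if (state < SK_CAN_STATE_MAX && can_state_names[state])
		return can_state_names[state];
	return "UNKNOWN";
}
```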

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-21 17:17:00 -07:00
Phil Sutter d044ea3e78 ipaddress: Avoid accessing uninitialized variable lcl
If no address was given, ipaddr_modify() accesses uninitialized data
when assigning to req.ifa.ifa_prefixlen.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-21 17:17:00 -07:00
Stephen Hemminger a4b8e88d87 Merge branch 'master' into net-next 2017-08-21 17:14:19 -07:00
Phil Sutter 73aa988868 tc/m_gact: Drop dead code
The use of the 'ok' variable in parse_gact() is ineffective: the second
conditional increments it either if *argv is 'gact' or if
parse_action_control() doesn't fail (if it does, exit() is called).
So this is effectively an unconditional increment, and since no decrement
happens anywhere, all remaining checks for 'ok != 0' can be dropped.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-21 17:12:21 -07:00
Phil Sutter e469523e8e ss: Drop useless assignment
After '*b = *a', 'b->next' already has the same value as 'a->next'.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-21 17:12:21 -07:00
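A tiny illustration with a hypothetical list node (not ss's real filter structure): struct assignment copies every member, including the next pointer, so a following 'b->next = a->next' changes nothing.

```c
#include <stddef.h>

/* Hypothetical singly linked node. */
struct demo_node {
	int value;
	struct demo_node *next;
};

/* '*b = *a' already copies 'next'; no separate assignment is needed. */
static void demo_copy_node(struct demo_node *b, const struct demo_node *a)
{
	*b = *a;
}
```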
Phil Sutter 44448a90ea ss: Skip useless check in parse_hostcond()
The passed 'addr' parameter is dereferenced by the caller beforehand and
multiple times within parse_hostcond() before this check, so assume the
check is always true.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-21 17:12:21 -07:00
Phil Sutter b3c5f84493 lib/rt_names: Drop dead code in rtnl_rttable_n2a()
Since 'id' is a 32-bit unsigned value, it can never exceed RT_TABLE_MAX
(which is defined as 0xFFFFFFFF). Therefore drop that never-matching
conditional.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-21 17:12:21 -07:00
Phil Sutter 2a86625619 iproute: Fix for missing 'Oifs:' display
Covscan complained about dead code but after reading it, I assume the
author's intention was to prefix the interface list with 'Oifs: '.
Initializing 'first' to 1 and setting it to 0 after the prefix has been
printed should fix it.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-21 17:12:21 -07:00
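A minimal sketch of the "print a prefix once" idiom the fix restores, written as a hypothetical demo_format_oifs() that fills a buffer instead of printing route output. With 'first' starting at 0 the prefix branch would be dead code, which is what Covscan flagged.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical formatter: "Oifs: " once, then each name plus a space.
 * 'len' is the full buffer size; output is always NUL-terminated. */
static void demo_format_oifs(char *buf, size_t len,
			     const char *const *ifs, size_t n)
{
	int first = 1;   /* starting at 0 would make the prefix dead code */

	if (len == 0)
		return;
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		if (first) {
			strncat(buf, "Oifs: ", len - strlen(buf) - 1);
			first = 0;
		}
		strncat(buf, ifs[i], len - strlen(buf) - 1);
		strncat(buf, " ", len - strlen(buf) - 1);
	}
}
```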
Phil Sutter 2869262144 ipntable: No need to check and assign to parms_rta
This variable is initialized at declaration and nowhere else does any
assignment to it happen, so just drop the check.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-21 17:12:21 -07:00
Phil Sutter 8579a398c5 devlink: No need for this self-assignment
dl_argv_handle_both() will either assign to handle_bit or error out in
which case the variable is not used by the caller.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Jiri Pirko <jiri@mellanox.com>
2017-08-21 17:12:21 -07:00
Leon Romanovsky dbc76eb6cc rdma: Add initial manual for the tool
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-21 17:07:44 -07:00
Leon Romanovsky 7fc75744c0 rdma: Add json output to link object
An example of the JSON output on a system with two devices:

root@mtr-leonro:~# rdma link -d -p -j
[{
        "ifindex": 1,
        "port": 1,
        "ifname": "mlx5_0/1",
        "subnet_prefix": "fe80:0000:0000:0000",
        "lid": 13399,
        "sm_lid": 49151,
        "lmc": 0,
        "state": "ACTIVE",
        "physical_state": "LINK_UP",
        "caps": ["AUTO_MIG"
        ]
    },{
        "ifindex": 2,
        "port": 1,
        "ifname": "mlx5_1/1",
        "subnet_prefix": "fe80:0000:0000:0000",
        "lid": 13400,
        "sm_lid": 49151,
        "lmc": 0,
        "state": "ACTIVE",
        "physical_state": "LINK_UP",
        "caps": ["AUTO_MIG"
        ]
    }
]

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-21 17:07:44 -07:00
Leon Romanovsky ef353e2e94 rdma: Implement json output for dev object
Example output on a machine with two devices:

root@mtr-leonro:~# rdma dev -j -p
[{
	"ifindex": 1,
	"ifname": "mlx5_0",
	"node_type": "ca",
	"fw": "2.8.9999",
	"node_guid": "5254:00c0:fe12:3457",
	"sys_image_guid": "5254:00c0:fe12:3457",
	"caps": [ "BAD_PKEY_CNTR", "BAD_QKEY_CNTR", "CHANGE_PHY_PORT",
		  "PORT_ACTIVE_EVENT", "SYS_IMAGE_GUID", "RC_RNR_NAK_GEN",
		  "MEM_WINDOW", "UD_IP_CSUM", "UD_TSO", "XRC",
		  "MEM_MGT_EXTENSIONS", "BLOCK_MULTICAST_LOOPBACK",
		  "MEM_WINDOW_TYPE_2B", "RAW_IP_CSUM",
		  "MANAGED_FLOW_STEERING", "RESIZE_MAX_WR" ]
	},{
	"ifindex": 2,
	"ifname": "mlx5_1",
	"node_type": "ca",
	"fw": "2.8.9999",
	"node_guid": "5254:00c0:fe12:3458",
	"sys_image_guid": "5254:00c0:fe12:3458",
	"caps": [ "BAD_PKEY_CNTR", "BAD_QKEY_CNTR", "CHANGE_PHY_PORT",
		  "PORT_ACTIVE_EVENT", "SYS_IMAGE_GUID", "RC_RNR_NAK_GEN",
		  "MEM_WINDOW", "UD_IP_CSUM", "UD_TSO", "XRC",
		  "MEM_MGT_EXTENSIONS", "BLOCK_MULTICAST_LOOPBACK",
		  "MEM_WINDOW_TYPE_2B", "RAW_IP_CSUM",
		  "MANAGED_FLOW_STEERING", "RESIZE_MAX_WR" ]
	}
]

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-21 17:07:44 -07:00
Leon Romanovsky ab6e2b7bdb rdma: Add json and pretty outputs
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-21 17:07:44 -07:00
Leon Romanovsky da990ab40a rdma: Add link object
Link (port) object represents struct ib_port to user space.

Link properties:
 * Port capabilities
 * IB subnet prefix
 * LID, SM_LID and LMC
 * Port state
 * Physical state

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-21 17:07:44 -07:00
Leon Romanovsky 40df8263a0 rdma: Add dev object
Device (dev) object represents struct ib_device to user space.

Device properties:
 * Device capabilities
 * FW version to the device output
 * node_guid and sys_image_guid
 * node_type

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-21 17:07:44 -07:00
Leon Romanovsky 74bd75c2b6 rdma: Add basic infrastructure for RDMA tool
RDMA devices are cross-functional devices on the one hand,
but very tailored to specific markets on the other.

Such diversity has caused RDMA-related configuration to spread
across various tools, e.g. devlink, ip, ethtool, IB-specific and
vendor-specific solutions.

This patch adds the ability to fill in device and port information
by reading RDMA netlink.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-21 17:07:44 -07:00
Leon Romanovsky afdc119410 utils: Move BIT macro to common header
The BIT() macro has so far been implemented and used only by devlink, but
the following rdmatool patches will reuse the same macro, so move it to a
common header file.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-21 17:07:44 -07:00
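For reference, a common definition of such a macro is sketched below; treat it as an assumption, since the exact form in the shared header is not shown here. The DEMO_CAP_* flags and demo_has_cap() are hypothetical.

```c
/* Typical single-bit mask helper; using unsigned long avoids shifting
 * into the sign bit of a plain int for bit 31. */
#define BIT(nr) (1UL << (nr))

/* Hypothetical capability flags built from it. */
#define DEMO_CAP_A BIT(0)
#define DEMO_CAP_B BIT(3)

static int demo_has_cap(unsigned long caps, unsigned long cap)
{
	return (caps & cap) != 0;
}
```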
Stephen Hemminger 18d7817c60 update kernel headers from net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-21 16:55:15 -07:00
Stephen Hemminger fa93d9a8aa Merge branch 'master' into net-next 2017-08-18 09:43:00 -07:00
Phil Sutter be55416add tipc/bearer: Fix resource leak in error path
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-18 09:39:52 -07:00
Phil Sutter 46131577cf ss: Fix potential memleak in unix_stats_print()
Fixes: 2d0e538f3e ("ss: Drop list traversal from unix_stats_print()")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-18 09:39:52 -07:00
Phil Sutter b530cef0e3 ifstat: Fix memleak in dump_kern_db() for json output
Looks like this was forgotten when converting to common json output
formatter.

Fixes: fcc16c2287 ("provide common json output formatter")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-18 09:39:52 -07:00
Phil Sutter 35f6adefb8 ifstat: Fix memleak in error case
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-18 09:39:52 -07:00
Phil Sutter 6ac5943bdd ipvrf: Fix error path of vrf_switch()
Apart from trying to close(-1), this also leaked memory.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-18 09:39:52 -07:00
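A sketch of the usual single-exit cleanup shape for an error path that owns both heap memory and a file descriptor. demo_read_first_byte() is hypothetical and not vrf_switch(); it only shows how one label can release whatever was acquired while never calling close(-1).

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical function: read the first byte of 'path' into *out.
 * Returns 0 on success, -1 on any failure, leaking nothing. */
static int demo_read_first_byte(const char *path, int *out)
{
	int ret = -1;
	int fd = -1;
	unsigned char *buf = malloc(16);

	if (!buf)
		return -1;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		goto out;          /* buf must still be freed */

	if (read(fd, buf, 1) != 1)
		goto out;          /* both fd and buf must be released */

	*out = buf[0];
	ret = 0;
out:
	if (fd >= 0)               /* never close(-1) */
		close(fd);
	free(buf);
	return ret;
}
```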
Phil Sutter 3e587d9f43 tc/em_ipset: Don't leak sockfd on error path
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-18 09:16:59 -07:00
Phil Sutter 4b45ae221e ss: Don't leak fd in tcp_show_netlink_file()
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-18 09:16:59 -07:00
Phil Sutter 08806fb019 iproute_lwtunnel: csum_mode value checking was ineffective
ila_csum_name2mode() returns -1 on error but is declared as returning
__u8, so the error value silently becomes 255 and the caller's check
never fires. Change the code to correctly detect this issue. Checking
for __u8 overruns shouldn't be necessary, though, since
ila_csum_name2mode() return values are well-defined.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-18 09:13:17 -07:00
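A minimal illustration of the problem, using hypothetical demo_* functions with an illustrative mode table: returning -1 through an 8-bit unsigned type yields 255, so a '< 0' check can never succeed, while an int return type preserves the error value.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical name table; index = mode value. */
static const char *const demo_modes[] = {
	"adj-transport", "neutral-map", "no-action",
};
#define DEMO_NMODES (sizeof(demo_modes) / sizeof(demo_modes[0]))

/* Buggy shape: the -1 error value is converted to 255 on return. */
static uint8_t demo_name2mode_u8(const char *name)
{
	for (unsigned int i = 0; i < DEMO_NMODES; i++)
		if (strcmp(name, demo_modes[i]) == 0)
			return (uint8_t)i;
	return (uint8_t)-1;   /* becomes 255 */
}

/* Fixed shape: int return keeps -1, so callers can test '< 0'. */
static int demo_name2mode(const char *name)
{
	for (unsigned int i = 0; i < DEMO_NMODES; i++)
		if (strcmp(name, demo_modes[i]) == 0)
			return (int)i;
	return -1;
}
```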
Phil Sutter 58a15e6c7e iproute_lwtunnel: Argument to strerror must be positive
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-18 09:13:17 -07:00
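Netlink error messages carry negative errno values, while strerror() expects the positive code. A minimal sketch with a hypothetical demo_err_str() helper that accepts either sign:

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: hand strerror() the positive errno value. */
static const char *demo_err_str(int err)
{
	return strerror(err < 0 ? -err : err);
}
```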
Phil Sutter 436270a45d tipc/node: Fix socket fd check in cmd_node_get_addr()
socket() returns -1 on error, not 0.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-18 09:13:17 -07:00
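A minimal sketch of the correct check, wrapped in a hypothetical demo_open_socket(): socket(2) signals failure with -1, and any non-negative value, including 0, is a valid descriptor. A check for 0 would both miss real errors and reject descriptor 0.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical wrapper: returns a descriptor (>= 0) or -errno. */
static int demo_open_socket(int domain, int type)
{
	int fd = socket(domain, type, 0);

	if (fd < 0)               /* not 'if (!fd)' */
		return -errno;
	return fd;
}
```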
Phil Sutter 1e3197e0fd ifcfg: Quote left-hand side of [ ] expression
This prevents word-splitting and therefore leads to more accurate error
message in case 'grep -c' prints something other than a number.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-18 09:11:00 -07:00
Phil Sutter 2313b6bfe4 examples: Some shell fixes to cbq.init
This addresses the following issues:

- $@ is an array, so don't use it in quoted strings - use $* instead.

- Add missing quotes to components of [ ] expressions. These are not
  strictly necessary since the output of 'wc -l' should be a single word
  only, but in case of errors, bash prints "integer expression expected"
  instead of "too many arguments".

- Use -print0/-0 when piping from find to xargs to allow for filenames
  which contain whitespace.

- Quote arguments to 'eval' to prevent word-splitting.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-18 09:11:00 -07:00
David Ahern e5fa0e6fe7 libnetlink: Fix extack attribute parsing
Initialize tb in nl_dump_ext_err since not all attributes will be
sent in the messages.

Add error checking on mnl_attr_parse and print messages on the off
chance the ext ack attributes fail to validate.

Signed-off-by: David Ahern <dsahern@gmail.com>
2017-08-18 08:47:34 -07:00
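A sketch of why the attribute table must be initialized, using hypothetical types instead of libmnl: the table is indexed by attribute type, and an attribute that is absent from a given message must read back as NULL rather than leftover stack contents.

```c
#include <stddef.h>

/* Hypothetical attribute types; index 0 is unused as in netlink. */
enum {
	DEMO_ATTR_UNSPEC,
	DEMO_ATTR_MSG,
	DEMO_ATTR_OFFS,
	DEMO_ATTR_COUNT
};

struct demo_attr {
	int type;
	const char *data;
};

static void demo_parse_attrs(const struct demo_attr *attrs, size_t n,
			     const char *tb[DEMO_ATTR_COUNT])
{
	/* Reset every slot first: missing attributes stay NULL. */
	for (size_t i = 0; i < DEMO_ATTR_COUNT; i++)
		tb[i] = NULL;

	for (size_t i = 0; i < n; i++) {
		int type = attrs[i].type;

		if (type > DEMO_ATTR_UNSPEC && type < DEMO_ATTR_COUNT)
			tb[type] = attrs[i].data;
	}
}
```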
Julien Fortin 43bc20ae73 ip: iplink_vlan.c: add json output support
Schema:
{
    "protocol": {
        "type": "string",
        "attr": "IFLA_VLAN_PROTOCOL"
    },
    "id": {
        "type": "uint",
        "attr": "IFLA_VLAN_ID"
    },
    "flags": {
        "type": "array",
        "attr": "IFLA_VLAN_FLAGS",
        "array": [
            {
                "type": "string"
            }
        ]
    },
    "ingress_qos": {
        "type": "array",
        "attr": "IFLA_VLAN_INGRESS_QOS",
        "array": [
            {
                "type": "dict",
                "dict": {
                    "from": {
                        "type": "uint"
                    },
                    "to": {
                        "type": "uint"
                    }
                }
            }
        ]
    },
    "egress_qos": {
        "type": "array",
        "attr": "IFLA_VLAN_EGRESS_QOS",
        "array": [
            {
                "type": "dict",
                "dict": {
                    "from": {
                        "type": "uint"
                    },
                    "to": {
                        "type": "uint"
                    }
                }
            }
        ]
    }
}

$ ip link add name eth0.42 link eth0 type vlan id 42
$ ip -details -json link show
[{
        "ifindex": 30,
        "ifname": "eth0.42",
        "link": "eth0",
        "flags": ["BROADCAST","MULTICAST"],
        "mtu": 1500,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ether",
        "address": "08:00:27:db:31:88",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "vlan",
            "info_data": {
                "protocol": "802.1Q",
                "id": 42,
                "flags": ["REORDER_HDR"]
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    }
]

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin 92b7454c31 ip: link_macvlan.c: add json output support
Schema:
{
    "mode": {
        "type": "string",
        "attr": "IFLA_MACVLAN_MODE"
    },
    "nopromisc": {
        "type": "bool",
        "attr": "MACVLAN_FLAG_NOPROMISC"
    },
    "macaddr_count": {
        "type": "int",
        "attr": "IFLA_MACVLAN_MACADDR_COUNT"
    },
    "macaddr_data": {
        "type": "array",
        "attr": "IFLA_MACVLAN_MACADDR_DATA",
        "array": [
            {
                "type": "string"
            }
        ]
    },
}

$ ip link add name peth0 link eth0 type macvlan
$ ip -details -json link show peth0
[{
        "ifindex": 26,
        "ifname": "peth0",
        "link": "eth0",
        "flags": ["BROADCAST","MULTICAST"],
        "mtu": 1500,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ether",
        "address": "7a:84:48:3e:7b:1c",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "macvlan",
            "info_data": {
                "mode": "vepa"
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    }
]

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin 063dd06cc1 ip: link_vti6.c: add json output support
Schema:
{
    "remote": {
        "type": "string",
        "attr": "IFLA_VTI_REMOTE"
    },
    "local": {
        "type": "string",
        "attr": "IFLA_VTI_LOCAL"
    },
    "link": {
        "type": "string",
        "attr": "IFLA_VTI_LINK",
        "mutually_exclusive": {
            "link_index": {
                "type": "uint"
            }
        }
    },
    "ikey": {
        "type": "string",
        "attr": "IFLA_VTI_IKEY"
    },
    "okey": {
        "type": "string",
        "attr": "IFLA_VTI_OKEY"
    }
}

$ ip -6 tunnel add name vti6 mode vti6 local 2001:db8:1::1/64 remote 2001:0db8:85a3:0000:0000:8a2e:0370:7334
$ ip link show
10: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default
    link/tunnel6 :: brd ::
11: ip6_vti0@NONE: <NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default
    link/tunnel6 :: brd ::
12: vti6@NONE: <POINTOPOINT,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default
    link/tunnel6 2001:db8:1::1 peer 2001:db8:85a3::8a2e:370:7334
$ ./ip -details -json link show
[{
        "ifindex": 10,
        "ifname": "ip6tnl0",
        "link": null,
        "flags": ["NOARP"],
        "mtu": 1452,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "tunnel6",
        "address": "::",
        "broadcast": "::",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "ip6tnl",
            "info_data": {
                "proto": "ip6ip6",
                "remote": "::",
                "local": "::",
                "encap_limit": 0,
                "ttl": 0,
                "flowinfo_tclass": "0x00",
                "flowlabel": "0x00000",
                "flowinfo": "0x00000000"
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 11,
        "ifname": "ip6_vti0",
        "link": null,
        "flags": ["NOARP"],
        "mtu": 1500,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "tunnel6",
        "address": "::",
        "broadcast": "::",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "vti6",
            "info_data": {
                "remote": "::",
                "local": "::",
                "ikey": "0.0.0.0",
                "okey": "0.0.0.0"
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 12,
        "ifname": "vti6",
        "link": null,
        "flags": ["POINTOPOINT","NOARP"],
        "mtu": 1500,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "tunnel6",
        "address": "2001:db8:1::1",
        "link_pointtopoint": true,
        "broadcast": "2001:db8:85a3::8a2e:370:7334",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "vti6",
            "info_data": {
                "remote": "2001:db8:85a3::8a2e:370:7334",
                "local": "2001:db8:1::1",
                "ikey": "0.0.0.0",
                "okey": "0.0.0.0"
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    }
]

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin 4c42a1c103 ip: link_vti.c: add json output support
Schema:
{
    "remote": {
        "type": "string",
        "attr": "IFLA_VTI_REMOTE"
    },
    "local": {
        "type": "string",
        "attr": "IFLA_VTI_LOCAL"
    },
    "link": {
        "type": "string",
        "attr": "IFLA_VTI_LINK",
        "mutually_exclusive": {
            "link_index": {
                "type": "uint",
            }
        }
    },
    "ikey": {
        "type": "string",
        "attr": "IFLA_VTI_IKEY"
    },
    "okey": {
        "type": "string",
        "attr": "IFLA_VTI_OKEY"
    }
}

$ ip tunnel add vti0 mode vti local 192.0.2.1 remote 198.51.100.3
$ ip link show
10: ip_vti0@NONE: <NOARP> mtu 1428 qdisc noop state DOWN mode DEFAULT group default
    link/ipip 0.0.0.0 brd 0.0.0.0
11: vti0@NONE: <POINTOPOINT,NOARP> mtu 1428 qdisc noop state DOWN mode DEFAULT group default
    link/ipip 192.0.2.1 peer 198.51.100.3
$ ./ip -details -json link show
[{
        "ifindex": 10,
        "ifname": "ip_vti0",
        "link": null,
        "flags": ["NOARP"],
        "mtu": 1428,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ipip",
        "address": "0.0.0.0",
        "broadcast": "0.0.0.0",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "vti",
            "info_data": {
                "remote": "any",
                "local": "any",
                "ikey": "0.0.0.0",
                "okey": "0.0.0.0"
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 11,
        "ifname": "vti0",
        "link": null,
        "flags": ["POINTOPOINT","NOARP"],
        "mtu": 1428,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ipip",
        "address": "192.0.2.1",
        "link_pointtopoint": true,
        "broadcast": "198.51.100.3",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "vti",
            "info_data": {
                "remote": "198.51.100.3",
                "local": "192.0.2.1",
                "ikey": "0.0.0.0",
                "okey": "0.0.0.0"
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    }
]

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin 2539a407bb ip: link_iptnl.c: add json output support
Schema
{
    "remote": {
        "type": "string",
        "attr": "IFLA_IPTUN_REMOTE"
    },
    "local": {
        "type": "string",
        "attr": "IFLA_IPTUN_LOCAL"
    },
    "link": {
        "type": "string",
        "attr": "IFLA_IPTUN_LINK",
        "mutually_exclusive": {
            "link_index": {
                "type": "uint",
            }
        }
    },
    "ttl": {
        "type": "int",
        "attr": "IFLA_IPTUN_TTL"
    },
    "tos": {
        "type": "string",
        "attr": "IFLA_IPTUN_TOS"
    },
    "pmtudisc": {
        "type": "bool",
        "attr": "IFLA_IPTUN_PMTUDISC"
    },
    "isatap": {
        "type": "bool",
        "attr": "SIT_ISATAP & IFLA_IPTUN_FLAGS"
    },
    "6rd": {
        "type": "dict",
        "attr": "IFLA_IPTUN_6RD_PREFIXLEN",
        "dict": {
            "prefix": {
                "type": "string"
            },
            "prefixlen": {
                "type": "uint",
                "attr": "IFLA_IPTUN_6RD_PREFIXLEN"
            },
            "relay_prefix": {
                "type": "string"
            },
            "relay_prefixlen": {
                "type": "uint",
                "attr": "IFLA_IPTUN_6RD_RELAY_PREFIXLEN"
            }
        }
    },
    "encap": {
        "type": "dict",
        "attr": "IFLA_IPTUN_ENCAP_TYPE",
        "dict": {
            "type": {
                "type": "string",
                "attr": "IFLA_IPTUN_ENCAP_TYPE"
            },
            "sport": {
                "type": "uint",
                "attr": "IFLA_IPTUN_ENCAP_SPORT"
            },
            "dport": {
                "type": "uint",
                "attr": "IFLA_IPTUN_ENCAP_DPORT"
            },
            "csum": {
                "type": "bool",
                "attr": "TUNNEL_ENCAP_FLAG_CSUM"
            },
            "csum6": {
                "type": "bool",
                "attr": "TUNNEL_ENCAP_FLAG_CSUM6"
            },
            "remcsum": {
                "type": "bool",
                "attr": "TUNNEL_ENCAP_FLAG_REMCSUM"
            }
        }
    }
}

$ ip tunnel add tun0 mode ipip local 192.0.2.1 remote 198.51.100.3
$ ip link show
10: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default
    link/ipip 0.0.0.0 brd 0.0.0.0
11: tun0@NONE: <POINTOPOINT,NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default
    link/ipip 192.0.2.1 peer 198.51.100.3
$ ip -details -json link show
[{
        "ifindex": 10,
        "ifname": "tunl0",
        "link": null,
        "flags": ["NOARP"],
        "mtu": 1480,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ipip",
        "address": "0.0.0.0",
        "broadcast": "0.0.0.0",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "ipip",
            "info_data": {
                "remote": "any",
                "local": "any",
                "ttl": 0,
                "pmtudisc": false
            }
        },
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 11,
        "ifname": "tun0",
        "link": null,
        "flags": ["POINTOPOINT","NOARP"],
        "mtu": 1480,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ipip",
        "address": "192.0.2.1",
        "link_pointtopoint": true,
        "broadcast": "198.51.100.3",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "ipip",
            "info_data": {
                "remote": "198.51.100.3",
                "local": "192.0.2.1",
                "ttl": 0,
                "pmtudisc": true
            }
        },
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    }
]

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin 1facc1c61c ip: link_ip6tnl.c: add json output support
Schema
{
    "proto": {
        "type": "string",
        "attr": "IFLA_IPTUN_PROTO"
    },
    "remote": {
        "type": "string",
        "attr": "IFLA_IPTUN_REMOTE"
    },
    "local": {
        "type": "string",
        "attr": "IFLA_IPTUN_LOCAL"
    },
    "link": {
        "type": "string",
        "attr": "IFLA_IPTUN_LINK",
        "mutually_exclusive": {
            "link_index": {
                "type": "uint",
            }
        }
    },
    "ip6_tnl_f_ign_encap_limit": {
        "type": "bool",
        "attr": "IP6_TNL_F_IGN_ENCAP_LIMIT"
    },
    "encap_limit": {
        "type": "uint",
        "attr": "IFLA_IPTUN_ENCAP_LIMIT"
    },
    "ttl": {
        "type": "uint",
        "attr": "IFLA_IPTUN_TTL"
    },
    "ip6_tnl_f_use_orig_tclass": {
        "type": "bool",
        "attr": "IP6_TNL_F_USE_ORIG_TCLASS"
    },
    "flowinfo_tclass": {
        "type": "string",
        "attr": "IP6_FLOWINFO_TCLASS"
    },
    "ip6_tnl_f_use_orig_flowlabel": {
        "type": "bool",
        "attr": "IP6_TNL_F_USE_ORIG_FLOWLABEL"
    },
    "flowlabel": {
        "type": "string",
        "attr": "IP6_FLOWINFO_FLOWLABEL"
    },
    "flowinfo": {
        "type": "string"
    },
    "ip6_tnl_f_rcv_dscp_copy": {
        "type": "bool",
        "attr": "IP6_TNL_F_RCV_DSCP_COPY"
    },
    "ip6_tnl_f_mip6_dev": {
        "type": "bool",
        "attr": "IP6_TNL_F_MIP6_DEV"
    },
    "ip6_tnl_f_use_orig_fwmark": {
        "type": "bool",
        "attr": "IP6_TNL_F_USE_ORIG_FWMARK"
    },
    "encap": {
        "type": "dict",
        "attr": "IFLA_IPTUN_ENCAP_TYPE",
        "dict": {
            "type": {
                "type": "string",
                "attr": "IFLA_IPTUN_ENCAP_TYPE"
            },
            "sport": {
                "type": "uint",
                "attr": "IFLA_IPTUN_ENCAP_SPORT"
            },
            "dport": {
                "type": "uint",
                "attr": "IFLA_IPTUN_ENCAP_DPORT"
            },
            "csum": {
                "type": "bool",
                "attr": "TUNNEL_ENCAP_FLAG_CSUM"
            },
            "csum6": {
                "type": "bool",
                "attr": "TUNNEL_ENCAP_FLAG_CSUM6"
            },
            "remcsum": {
                "type": "bool",
                "attr": "TUNNEL_ENCAP_FLAG_REMCSUM"
            }
        }
    }
}

$ ip link show
$ ip -6 tunnel add name tun6 mode ip6gre local 2001:db8:1::1/64 remote 2001:0db8:85a3:0000:0000:8a2e:0370:7334
$ ip link show
10: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default
    link/tunnel6 :: brd ::
11: ip6gre0@NONE: <NOARP> mtu 1448 qdisc noop state DOWN mode DEFAULT group default
    link/gre6 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00 brd 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
12: tun6@NONE: <POINTOPOINT,NOARP> mtu 1448 qdisc noop state DOWN mode DEFAULT group default
    link/gre6 20:01:0d:b8:00:01:00:00:00:00:00:00:00:00:00:01 peer 20:01:0d:b8:85:a3:00:00:00:00:8a:2e:03:70:73:34
$ ./ip -details -json link show
[{
        "ifindex": 10,
        "ifname": "ip6tnl0",
        "link": null,
        "flags": ["NOARP"],
        "mtu": 1452,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "tunnel6",
        "address": "::",
        "broadcast": "::",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "ip6tnl",
            "info_data": {
                "proto": "ip6ip6",
                "remote": "::",
                "local": "::",
                "encap_limit": 0,
                "ttl": 0,
                "flowinfo_tclass": "0x00",
                "flowlabel": "0x00000",
                "flowinfo": "0x00000000"
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 11,
        "ifname": "ip6gre0",
        "link": null,
        "flags": ["NOARP"],
        "mtu": 1448,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "gre6",
        "address": "00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00",
        "broadcast": "00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "ip6gre",
            "info_data": {
                "remote": "any",
                "local": "any",
                "ttl": 0,
                "encap_limit": 0,
                "flowlabel": "0x00000"
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 12,
        "ifname": "tun6",
        "link": null,
        "flags": ["POINTOPOINT","NOARP"],
        "mtu": 1448,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "gre6",
        "address": "20:01:0d:b8:00:01:00:00:00:00:00:00:00:00:00:01",
        "link_pointtopoint": true,
        "broadcast": "20:01:0d:b8:85:a3:00:00:00:00:8a:2e:03:70:73:34",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "ip6gre",
            "info_data": {
                "remote": "2001:db8:85a3::8a2e:370:7334",
                "local": "2001:db8:1::1",
                "ttl": 64,
                "encap_limit": 4,
                "flowlabel": "0x00000"
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    }
]

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin 6856fb6548 ip: link_gre6.c: add json output support
Schema
{
    "remote": {
        "type": "string",
        "attr": "IFLA_GRE_REMOTE"
    },
    "local": {
        "type": "string",
        "attr": "IFLA_GRE_LOCAL"
    },
    "link": {
        "type": "string",
        "attr": "IFLA_GRE_LINK",
        "mutually_exclusive": {
            "link_index": {
                "type": "uint",
            }
        }
    },
    "ttl": {
        "type": "int",
        "attr": "IFLA_GRE_TTL"
    },
    "ip6_tnl_f_ign_encap_limit": {
        "type": "bool",
        "attr": "IP6_TNL_F_IGN_ENCAP_LIMIT"
    },
    "encap_limit": {
        "type": "int",
        "attr": "IFLA_GRE_ENCAP_LIMIT"
    },
    "ip6_tnl_f_use_orig_flowlabel": {
        "type": "bool",
        "attr": "IP6_TNL_F_USE_ORIG_FLOWLABEL"
    },
    "flowlabel": {
        "type": "string",
        "attr": "IP6_FLOWINFO_FLOWLABEL"
    },
    "ip6_tnl_f_rcv_dscp_copy": {
        "type": "bool",
        "attr": "IP6_TNL_F_RCV_DSCP_COPY"
    },
    "ikey": {
        "type": "string",
        "attr": "IFLA_GRE_IKEY"
    },
    "okey": {
        "type": "string",
        "attr": "IFLA_GRE_OKEY"
    },
    "iseq": {
        "type": "bool",
        "attr": "IFLA_GRE_IFLAGS & GRE_SEQ"
    },
    "oseq": {
        "type": "bool",
        "attr": "IFLA_GRE_OFLAGS & GRE_SEQ"
    },
    "icsum": {
        "type": "bool",
        "attr": "IFLA_GRE_IFLAGS & GRE_CSUM"
    },
    "ocsum": {
        "type": "bool",
        "attr": "IFLA_GRE_OFLAGS & GRE_CSUM"
    },
    "encap": {
        "type": "dict",
        "attr": "IFLA_GRE_ENCAP_TYPE != TUNNEL_ENCAP_NONE",
        "dict": {
            "type": {
                "type": "string",
                "attr": "IFLA_GRE_ENCAP_TYPE"
            },
            "sport": {
                "type": "uint",
                "attr": "IFLA_GRE_ENCAP_SPORT"
            },
            "dport": {
                "type": "uint",
                "attr": "IFLA_GRE_ENCAP_DPORT"
            },
            "csum": {
                "type": "bool",
                "attr": "TUNNEL_ENCAP_FLAG_CSUM"
            },
            "csum6": {
                "type": "bool",
                "attr": "TUNNEL_ENCAP_FLAG_CSUM6"
            },
            "remcsum": {
                "type": "bool",
                "attr": "TUNNEL_ENCAP_FLAG_REMCSUM"
            }
        }
    }
}

$ ip link show
$ ip -6 tunnel add name tun6 mode ip6gre local 2001:db8:1::1/64 remote 2001:0db8:85a3:0000:0000:8a2e:0370:7334
$ ip link show
10: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default
    link/tunnel6 :: brd ::
11: ip6gre0@NONE: <NOARP> mtu 1448 qdisc noop state DOWN mode DEFAULT group default
    link/gre6 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00 brd 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
12: tun6@NONE: <POINTOPOINT,NOARP> mtu 1448 qdisc noop state DOWN mode DEFAULT group default
    link/gre6 20:01:0d:b8:00:01:00:00:00:00:00:00:00:00:00:01 peer 20:01:0d:b8:85:a3:00:00:00:00:8a:2e:03:70:73:34
$ ip -details -json link show
[{
        "ifindex": 10,
        "ifname": "ip6tnl0",
        "link": null,
        "flags": ["NOARP"],
        "mtu": 1452,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "tunnel6",
        "address": "::",
        "broadcast": "::",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "ip6tnl",
            "info_data": {
                "proto": "ip6ip6",
                "remote": "::",
                "local": "::",
                "encap_limit": 0,
                "ttl": 0,
                "flowinfo_tclass": "0x00",
                "flowlabel": "0x00000",
                "flowinfo": "0x00000000"
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 11,
        "ifname": "ip6gre0",
        "link": null,
        "flags": ["NOARP"],
        "mtu": 1448,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "gre6",
        "address": "00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00",
        "broadcast": "00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "ip6gre",
            "info_data": {
                "remote": "any",
                "local": "any",
                "ttl": 0,
                "encap_limit": 0,
                "flowlabel": "0x00000"
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 12,
        "ifname": "tun6",
        "link": null,
        "flags": ["POINTOPOINT","NOARP"],
        "mtu": 1448,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "gre6",
        "address": "20:01:0d:b8:00:01:00:00:00:00:00:00:00:00:00:01",
        "link_pointtopoint": true,
        "broadcast": "20:01:0d:b8:85:a3:00:00:00:00:8a:2e:03:70:73:34",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "ip6gre",
            "info_data": {
                "remote": "2001:db8:85a3::8a2e:370:7334",
                "local": "2001:db8:1::1",
                "ttl": 64,
                "encap_limit": 4,
                "flowlabel": "0x00000"
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    }
]

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin e2d4588331 ip: link_gre.c: add json output support
Schema
{
    "external": {
        "type": "bool",
        "comment": "!tb[IFLA_GRE_COLLECT_METADATA]"
    },
    "remote": {
        "type": "string",
        "attr": "IFLA_GRE_REMOTE"
    },
    "local": {
        "type": "string",
        "attr": "IFLA_GRE_LOCAL"
    },
    "link": {
        "type": "string",
        "attr": "IFLA_GRE_LINK",
        "mutually_exclusive": {
            "link_index": {
                "type": "uint"
            }
        }
    },
    "ttl": {
        "type": "int",
        "attr": "IFLA_GRE_TTL"
    },
    "tos": {
        "type": "string",
        "attr": "IFLA_GRE_TOS"
    },
    "pmtudisc": {
        "type": "bool",
        "attr": "IFLA_GRE_PMTUDISC"
    },
    "ikey": {
        "type": "string",
        "attr": "IFLA_GRE_IKEY"
    },
    "okey": {
        "type": "string",
        "attr": "IFLA_GRE_OKEY"
    },
    "iseq": {
        "type": "bool",
        "attr": "IFLA_GRE_IFLAGS & GRE_SEQ"
    },
    "oseq": {
        "type": "bool",
        "attr": "IFLA_GRE_OFLAGS & GRE_SEQ"
    },
    "icsum": {
        "type": "bool",
        "attr": "IFLA_GRE_IFLAGS & GRE_CSUM"
    },
    "ocsum": {
        "type": "bool",
        "attr": "IFLA_GRE_OFLAGS & GRE_CSUM"
    },
    "ignore_df": {
        "type": "bool",
        "attr": "IFLA_GRE_IGNORE_DF"
    },
    "encap": {
        "type": "dict",
        "attr": "IFLA_GRE_ENCAP_TYPE != TUNNEL_ENCAP_NONE",
        "dict": {
            "type": {
                "type": "string",
                "attr": "IFLA_GRE_ENCAP_TYPE"
            },
            "sport": {
                "type": "uint",
                "attr": "IFLA_GRE_ENCAP_SPORT"
            },
            "dport": {
                "type": "uint",
                "attr": "IFLA_GRE_ENCAP_DPORT"
            },
            "csum": {
                "type": "bool",
                "attr": "TUNNEL_ENCAP_FLAG_CSUM"
            },
            "csum6": {
                "type": "bool",
                "attr": "TUNNEL_ENCAP_FLAG_CSUM6"
            },
            "remcsum": {
                "type": "bool",
                "attr": "TUNNEL_ENCAP_FLAG_REMCSUM"
            }
        }
    }
}

$ ip link show
$ ip tunnel add tun42 mode gre local 192.0.2.42 remote 203.0.113.42 key 42
$ ip link show
10: gre0@NONE: <NOARP> mtu 1476 qdisc noop state DOWN mode DEFAULT group default
    link/gre 0.0.0.0 brd 0.0.0.0
11: gretap0@NONE: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
12: tun42@NONE: <POINTOPOINT,NOARP> mtu 1472 qdisc noop state DOWN mode DEFAULT group default
    link/gre 192.0.2.42 peer 203.0.113.42
$ ip -details -json link show
[{
        "ifindex": 10,
        "ifname": "gre0",
        "link": null,
        "flags": ["NOARP"],
        "mtu": 1476,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "gre",
        "address": "0.0.0.0",
        "broadcast": "0.0.0.0",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "gre",
            "info_data": {
                "remote": "any",
                "local": "any",
                "ttl": 0,
                "pmtudisc": false
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 11,
        "ifname": "gretap0",
        "link": null,
        "flags": ["BROADCAST","MULTICAST"],
        "mtu": 1462,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "txqlen": 1000,
        "link_type": "ether",
        "address": "00:00:00:00:00:00",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "gretap",
            "info_data": {
                "remote": "any",
                "local": "any",
                "ttl": 0,
                "pmtudisc": false
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 12,
        "ifname": "tun42",
        "link": null,
        "flags": ["POINTOPOINT","NOARP"],
        "mtu": 1472,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "gre",
        "address": "192.0.2.42",
        "link_pointtopoint": true,
        "broadcast": "203.0.113.42",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "gre",
            "info_data": {
                "remote": "203.0.113.42",
                "local": "192.0.2.42",
                "ttl": 0,
                "pmtudisc": true,
                "ikey": "0.0.0.42",
                "okey": "0.0.0.42"
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    }
]

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin c339834682 ip: ipmacsec.c: add json output support
Schema
{
    "sci": {
        "type": "string",
        "attr": "IFLA_MACSEC_SCI"
    },
    "protect": {
        "type": "string",
        "attr": "IFLA_MACSEC_PROTECT"
    },
    "cipher_suite": {
        "type": "string",
        "attr": "IFLA_MACSEC_CIPHER_SUITE"
    },
    "icv_len": {
        "type": "uint",
        "attr": "IFLA_MACSEC_ICV_LEN"
    },
    "encoding_sa": {
        "type": "uint",
        "attr": "IFLA_MACSEC_ENCODING_SA"
    },
    "validation": {
        "type": "string",
        "attr": "IFLA_MACSEC_VALIDATION"
    },
    "encrypt": {
        "type": "string",
        "attr": "IFLA_MACSEC_ENCRYPT"
    },
    "inc_sci": {
        "type": "string",
        "attr": "IFLA_MACSEC_INC_SCI"
    },
    "es": {
        "type": "string",
        "attr": "IFLA_MACSEC_ES"
    },
    "scb": {
        "type": "string",
        "attr": "IFLA_MACSEC_SCB"
    },
    "replay_protect": {
        "type": "string",
        "attr": "IFLA_MACSEC_REPLAY_PROTECT"
    },
    "window": {
        "type": "int",
        "attr": ""
    }
}

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin 2f26065e25 ip: iplink_xdp.c: add json output support
Schema
{
    "attached": {
        "type": "uint",
        "attr": "IFLA_XDP_ATTACHED"
    },
    "prog_id": {
        "type": "uint",
        "attr": "IFLA_XDP_PROG_ID"
    }
}

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin 3b98d9b804 ip: iplink_vxlan.c: add json output support
Schema:
{
    "id": {
        "type": "uint",
        "attr": "IFLA_VXLAN_ID"
    },
    "group": {
        "type": "string",
        "attr": "IFLA_VXLAN_GROUP"
    },
    "remote": {
        "type": "string",
        "attr": "IFLA_VXLAN_GROUP"
    },
    "group6": {
        "type": "string",
        "attr": "IFLA_VXLAN_GROUP6"
    },
    "remote6": {
        "type": "string",
        "attr": "IFLA_VXLAN_GROUP6"
    },
    "local": {
        "type": "string",
        "attr": "IFLA_VXLAN_LOCAL"
    },
    "local6": {
        "type": "string",
        "attr": "IFLA_VXLAN_LOCAL6"
    },
    "link": {
        "type": "string",
        "attr": "IFLA_VXLAN_LINK",
        "mutually_exclusive": {
            "link_index": {
                "type": "uint",
                "comment": "if not ifname for ifindex"
            }
        }
    },
    "port_range": {
        "type": "dict",
        "attr": "IFLA_VXLAN_PORT_RANGE",
        "dict": {
            "low": {
                "type": "uint"
            },
            "high": {
                "type": "uint"
            }
        }
    },
    "port": {
        "type": "uint",
        "attr": "IFLA_VXLAN_PORT"
    },
    "learning": {
        "type": "bool",
        "attr": "IFLA_VXLAN_LEARNING"
    },
    "proxy": {
        "type": "bool",
        "attr": "IFLA_VXLAN_PROXY"
    },
    "rsc": {
        "type": "bool",
        "attr": "IFLA_VXLAN_RSC"
    },
    "l2miss": {
        "type": "bool",
        "attr": "IFLA_VXLAN_L2MISS"
    },
    "l3miss": {
        "type": "bool",
        "attr": "IFLA_VXLAN_L3MISS"
    },
    "tos": {
        "type": "string",
        "attr": "IFLA_VXLAN_TOS"
    },
    "ttl": {
        "type": "int",
        "attr": "IFLA_VXLAN_TTL"
    },
    "label": {
        "type": "string",
        "attr": "IFLA_VXLAN_LABEL"
    },
    "ageing": {
        "type": "uint",
        "attr": "IFLA_VXLAN_AGEING"
    },
    "limit": {
        "type": "uint",
        "attr": "IFLA_VXLAN_LIMIT"
    },
    "udp_csum": {
        "type": "bool",
        "attr": "IFLA_VXLAN_UDP_CSUM"
    },
    "udp_zero_csum6_tx": {
        "type": "bool",
        "attr": "IFLA_VXLAN_UDP_ZERO_CSUM6_TX"
    },
    "udp_zero_csum6_rx": {
        "type": "bool",
        "attr": "IFLA_VXLAN_UDP_ZERO_CSUM6_RX"
    },
    "remcsum_tx": {
        "type": "bool",
        "attr": "IFLA_VXLAN_REMCSUM_TX"
    },
    "remcsum_rx": {
        "type": "bool",
        "attr": "IFLA_VXLAN_REMCSUM_RX"
    },
    "collect_metadata": {
        "type": "bool",
        "attr": "IFLA_VXLAN_COLLECT_METADATA"
    },
    "gbp": {
        "type": "bool",
        "attr": "IFLA_VXLAN_GBP"
    },
    "gpe": {
        "type": "bool",
        "attr": "IFLA_VXLAN_GPE"
    }
}

$ ip link add name vxlan42 type vxlan id 42 dev eth0 remote 203.0.113.6 local 192.0.2.1 dstport 4789
$ ip link add name vxlan43 type vxlan id 43 dev eth0 group 239.0.0.1 dstport 4789
$ ip -details -json link show
[{
        "ifindex": 17,
        "ifname": "vxlan42",
        "flags": ["BROADCAST","MULTICAST"],
        "mtu": 1450,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ether",
        "address": "b2:92:0e:1a:c6:42",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "vxlan",
            "info_data": {
                "id": 42,
                "remote": "203.0.113.6",
                "local": "192.0.2.1",
                "link": "eth0",
                "port_range": {
                    "low": 0,
                    "high": 0
                },
                "port": 4789,
                "learning": true,
                "ttl": 0,
                "ageing": 300,
                "udp_csum": false,
                "udp_zero_csum6_tx": false,
                "udp_zero_csum6_rx": false
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 18,
        "ifname": "vxlan43",
        "flags": ["BROADCAST","MULTICAST"],
        "mtu": 1450,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ether",
        "address": "c6:51:4d:7f:f9:2f",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "vxlan",
            "info_data": {
                "id": 43,
                "group": "239.0.0.1",
                "link": "eth0",
                "port_range": {
                    "low": 0,
                    "high": 0
                },
                "port": 4789,
                "learning": true,
                "ttl": 0,
                "ageing": 300,
                "udp_csum": false,
                "udp_zero_csum6_tx": false,
                "udp_zero_csum6_rx": false
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    }
]

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin d9e84ec27c ip: iplink_vrf.c: add json output support
Schema:
{
    "table": {
        "type": "uint",
        "attr": "IFLA_VRF_TABLE"
    }
}

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin 8f24afc9d4 ip: iplink_ipvlan.c: add json output support
Schema:
{
    "mode": {
        "type": "string",
        "attr": "IFLA_IPVLAN_MODE"
    }
}

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin 3bec1cf84e ip: iplink_ipoib.c: add json output support
Schema:
{
    "key": {
        "type": "string",
        "attr": "IFLA_IPOIB_PKEY"
    },
    "mode": {
        "type": "string",
        "attr": "IFLA_IPOIB_MODE"
    },
    "umcast": {
        "type": "string",
        "attr": "IFLA_IPOIB_UMCAST"
    }
}

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin 119fae0aa5 ip: iplink_geneve.c: add json output support
Schema:
{
    "id": {
        "type": "uint",
        "attr": "IFLA_GENEVE_ID"
    },
    "remote": {
        "type": "string",
        "attr": "IFLA_GENEVE_REMOTE"
    },
    "remote6": {
        "type": "string",
        "attr": "IFLA_GENEVE_REMOTE6"
    },
    "ttl": {
        "type": "int",
        "attr": "IFLA_GENEVE_TTL"
    },
    "tos": {
        "type": "string",
        "attr": "IFLA_GENEVE_TOS"
    },
    "label": {
        "type": "string",
        "attr": "IFLA_GENEVE_LABEL"
    },
    "port": {
        "type": "uint",
        "attr": "IFLA_GENEVE_PORT"
    },
    "collect_metadata": {
        "type": "bool",
        "attr": "IFLA_GENEVE_COLLECT_METADATA"
    },
    "udp_csum": {
        "type": "bool",
        "attr": "IFLA_GENEVE_UDP_CSUM"
    },
    "udp_zero_csum6_tx": {
        "type": "bool",
        "attr": "IFLA_GENEVE_UDP_ZERO_CSUM6_TX"
    },
    "udp_zero_csum6_rx": {
        "type": "bool",
        "attr": "IFLA_GENEVE_UDP_ZERO_CSUM6_RX"
    }
}

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin 529226009f ip: iplink_can.c: add json output support
Schema: IFLA_INFO_DATA
{
    "ctrlmode": {
        "type": "array",
        "attr": "IFLA_CAN_CTRLMODE",
        "array": [
            {
                "type": "string"
            }
        ]
    },
    "state": {
        "type": "string",
        "attr": "IFLA_CAN_STATE"
    },
    "berr_counter": {
        "type": "dict",
        "attr": "IFLA_CAN_BERR_COUNTER",
        "dict": {
            "tx": {
                "type": "int"
            },
            "rx": {
                "type": "int"
            }
        }
    },
    "restart_ms": {
        "type": "int",
        "attr": "IFLA_CAN_RESTART_MS"
    },
    "bittiming": {
        "type": "dict",
        "attr": "IFLA_CAN_BITTIMING",
        "dict": {
            "bitrate": {
                "type": "int"
            },
            "sample_point": {
                "type": "float"
            },
            "tq": {
                "type": "int"
            },
            "prop_seg": {
                "type": "int"
            },
            "phase_seg1": {
                "type": "int"
            },
            "phase_seg2": {
                "type": "int"
            },
            "sjw": {
                "type": "int"
            }
        }
    },
    "bittiming_const": {
        "type": "dict",
        "attr": "IFLA_CAN_BITTIMING_CONST",
        "dict": {
            "name": {
                "type": "string"
            },
            "tseg1": {
                "type": "dict",
                "dict": {
                    "min": {
                        "type": "int"
                    },
                    "max": {
                        "type": "int"
                    }
                }
            },
            "tseg2": {
                "type": "dict",
                "dict": {
                    "min": {
                        "type": "int"
                    },
                    "max": {
                        "type": "int"
                    }
                }
            },
            "sjw": {
                "type": "dict",
                "dict": {
                    "min": {
                        "type": "int"
                    },
                    "max": {
                        "type": "int"
                    }
                }
            },
            "brp": {
                "type": "dict",
                "dict": {
                    "min": {
                        "type": "int"
                    },
                    "max": {
                        "type": "int"
                    }
                }
            },
            "brp_inc": {
                "type": "int"
            }
        }
    },
    "bittiming_bitrate": {
        "type": "uint",
        "attr": "IFLA_CAN_BITTIMING"
    },
    "bitrate_const": {
        "type": "array",
        "attr": "IFLA_CAN_BITRATE_CONST",
        "array": [
            {
                "type": "uint"
            }
        ]
    },
    "data_bittiming": {
        "type": "dict",
        "attr": "IFLA_CAN_DATA_BITTIMING",
        "dict": {
            "bitrate": {
                "type": "int"
            },
            "sample_point": {
                "type": "float"
            },
            "tq": {
                "type": "int"
            },
            "prop_seg": {
                "type": "int"
            },
            "phase_seg1": {
                "type": "int"
            },
            "phase_seg2": {
                "type": "int"
            },
            "sjw": {
                "type": "int"
            }
        }
    },
    "data_bittiming_const": {
        "type": "dict",
        "attr": "IFLA_CAN_DATA_BITTIMING_CONST",
        "dict": {
            "name": {
                "type": "string"
            },
            "tseg1": {
                "type": "dict",
                "dict": {
                    "min": {
                        "type": "int"
                    },
                    "max": {
                        "type": "int"
                    }
                }
            },
            "tseg2": {
                "type": "dict",
                "dict": {
                    "min": {
                        "type": "int"
                    },
                    "max": {
                        "type": "int"
                    }
                }
            },
            "sjw": {
                "type": "dict",
                "dict": {
                    "min": {
                        "type": "int"
                    },
                    "max": {
                        "type": "int"
                    }
                }
            },
            "brp": {
                "type": "dict",
                "dict": {
                    "min": {
                        "type": "int"
                    },
                    "max": {
                        "type": "int"
                    }
                }
            },
            "brp_inc": {
                "type": "int"
            }
        }
    },
    "data_bittiming_bitrate": {
        "type": "uint",
        "attr": "IFLA_CAN_DATA_BITTIMING"
    },
    "data_bitrate_const": {
        "type": "array",
        "attr": "IFLA_CAN_DATA_BITRATE_CONST",
        "array": [
            {
                "type": "uint"
            }
        ]
    },
    "termination": {
        "type": "unsigned short",
        "attr": "IFLA_CAN_TERMINATION"
    },
    "termination_const": {
        "type": "array",
        "attr": "IFLA_CAN_TERMINATION_CONST",
        "array": [
            {
                "type": "unsigned short"
            }
        ]
    },
    "clock": {
        "type": "int",
        "attr": "IFLA_CAN_CLOCK"
    }
}

IFLA_INFO_XSTATS
{
    "restarts": {
        "type": "int"
    },
    "bus_error": {
        "type": "int"
    },
    "arbitration_lost": {
        "type": "int"
    },
    "error_warning": {
        "type": "int"
    },
    "error_passive": {
        "type": "int"
    },
    "bus_off": {
        "type": "int"
    }
}

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin 165a703909 ip: iplink_bridge_slave.c: add json output support
Schema:
bridge_slave: IFLA_INFO_SLAVE_DATA
{
    "state": {
        "type": "string",
        "attr": "IFLA_BRPORT_STATE",
        "mutually_exclusive": {
            "state_index": {
                "type": "uint",
                "comment": "if (state > BR_STATE_BLOCKING)"
            }
        }
    },
    "priority": {
        "type": "int",
        "attr": "IFLA_BRPORT_PRIORITY"
    },
    "cost": {
        "type": "int",
        "attr": "IFLA_BRPORT_COST"
    },
    "hairpin": {
        "type": "bool",
        "attr": "IFLA_BRPORT_MODE"
    },
    "guard": {
        "type": "bool",
        "attr": "IFLA_BRPORT_GUARD"
    },
    "root_block": {
        "type": "bool",
        "attr": "IFLA_BRPORT_PROTECT"
    },
    "fastleave": {
        "type": "bool",
        "attr": "IFLA_BRPORT_FAST_LEAVE"
    },
    "learning": {
        "type": "bool",
        "attr": "IFLA_BRPORT_LEARNING"
    },
    "flood": {
        "type": "bool",
        "attr": "IFLA_BRPORT_UNICAST_FLOOD"
    },
    "id": {
        "type": "string",
        "attr": "IFLA_BRPORT_ID"
    },
    "no": {
        "type": "string",
        "attr": "IFLA_BRPORT_NO"
    },
    "designated_port": {
        "type": "uint",
        "attr": "IFLA_BRPORT_DESIGNATED_PORT"
    },
    "designated_cost": {
        "type": "uint",
        "attr": "IFLA_BRPORT_DESIGNATED_COST"
    },
    "bridge_id": {
        "type": "string",
        "attr": "IFLA_BRPORT_BRIDGE_ID"
    },
    "root_id": {
        "type": "string",
        "attr": "IFLA_BRPORT_ROOT_ID"
    },
    "hold_timer": {
        "type": "float",
        "attr": "IFLA_BRPORT_HOLD_TIMER"
    },
    "message_age_timer": {
        "type": "float",
        "attr": "IFLA_BRPORT_MESSAGE_AGE_TIMER"
    },
    "forward_delay_timer": {
        "type": "float",
        "attr": "IFLA_BRPORT_FORWARD_DELAY_TIMER"
    },
    "topology_change_ack": {
        "type": "uint",
        "attr": "IFLA_BRPORT_TOPOLOGY_CHANGE_ACK"
    },
    "config_pending": {
        "type": "uint",
        "attr": "IFLA_BRPORT_CONFIG_PENDING"
    },
    "proxy_arp": {
        "type": "bool",
        "attr": "IFLA_BRPORT_PROXYARP"
    },
    "proxy_arp_wifi": {
        "type": "bool",
        "attr": "IFLA_BRPORT_PROXYARP_WIFI"
    },
    "multicast_router": {
        "type": "uint",
        "attr": "IFLA_BRPORT_MULTICAST_ROUTER"
    },
    "mcast_flood": {
        "type": "bool",
        "attr": "IFLA_BRPORT_MCAST_FLOOD"
    }
}

$ ip link add dev br42 type bridge
$ ip link add dev bond42 type bond
$ ip link set dev bond42 master br42
$ ip link set dev bond42 up
$ ip link set dev br42 up
$ ip -details link show
15: br42: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
UP mode DEFAULT group default
    link/ether 22:8f:91:bb:9f:09 brd ff:ff:ff:ff:ff:ff promiscuity 0
    bridge forward_delay 1500 hello_time 200 max_age 2000 ageing_time
30000 stp_state 0 priority 32768 vlan_filtering 0 vlan_protocol 802.1Q
bridge_id 8000.22:8f:91:bb:9f:9 designated_root 8000.22:8f:91:bb:9f:9
root_port 0 root_path_cost 0 topology_change 0 topology_change_detected 0
hello_timer    0.00 tcn_timer    0.00 topology_change_timer    0.00
gc_timer  199.11 vlan_default_pvid 1 vlan_stats_enabled 0 group_fwd_mask 0
group_address 01:80:c2:00:00:00 mcast_snooping 1 mcast_router 1
mcast_query_use_ifaddr 0 mcast_querier 0 mcast_hash_elasticity 4096
mcast_hash_max 4096 mcast_last_member_count 2 mcast_startup_query_count 2
mcast_last_member_interval 100 mcast_membership_interval 26000
mcast_querier_interval 25500 mcast_query_interval 12500
mcast_query_response_interval 1000 mcast_startup_query_interval 3125
mcast_stats_enabled 0 mcast_igmp_version 2 mcast_mld_version 1
nf_call_iptables 0 nf_call_ip6tables 0 nf_call_arptables 0 addrgenmode
eui64
16: bond42: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc
noqueue master br42 state UNKNOWN mode DEFAULT group default
    link/ether 22:8f:91:bb:9f:09 brd ff:ff:ff:ff:ff:ff promiscuity 1
    bond mode 802.3ad miimon 100 updelay 0 downdelay 0 use_carrier 1
arp_interval 0 arp_validate none arp_all_targets any primary_reselect
always fail_over_mac none xmit_hash_policy layer3+4 resend_igmp 1
num_grat_arp 1 all_slaves_active 0 min_links 1 lp_interval 1
packets_per_slave 1 lacp_rate fast ad_select stable ad_actor_sys_prio
65535 ad_user_port_key 0 ad_actor_system 00:00:00:00:00:00
    bridge_slave state forwarding priority 8 cost 100 hairpin off guard
off root_block off fastleave off learning on flood on port_id 0x8001
port_no 0x1 designated_port 32769 designated_cost 0 designated_bridge
8000.22:8f:91:bb:9f:9 designated_root 8000.22:8f:91:bb:9f:9 hold_timer
0.00 message_age_timer    0.00 forward_delay_timer    0.00
topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off
mcast_router 1 mcast_fast_leave off mcast_flood on neigh_suppress off
addrgenmode eui64

$ ip -details -json link show
[{
        "ifindex": 15,
        "ifname": "br42",
        "flags": ["BROADCAST","MULTICAST","UP","LOWER_UP"],
        "mtu": 1500,
        "qdisc": "noqueue",
        "operstate": "UP",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ether",
        "address": "22:8f:91:bb:9f:09",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "bridge",
            "info_data": {
                "forward_delay": 1500,
                "hello_time": 200,
                "max_age": 2000,
                "ageing_time": 30000,
                "stp_state": 0,
                "priority": 32768,
                "vlan_filtering": 0,
                "vlan_protocol": "802.1Q",
                "bridge_id": "8000.22:8f:91:bb:9f:9",
                "root_id": "8000.22:8f:91:bb:9f:9",
                "root_port": 0,
                "root_path_cost": 0,
                "topology_change": 0,
                "topology_change_detected": 0,
                "hello_timer": 0.00,
                "tcn_timer": 0.00,
                "topology_change_timer": 0.00,
                "gc_timer": 298.27,
                "vlan_default_pvid": 1,
                "vlan_stats_enabled": 0,
                "group_fwd_mask": "0",
                "group_addr": "01:80:c2:00:00:00",
                "mcast_snooping": 1,
                "mcast_router": 1,
                "mcast_query_use_ifaddr": 0,
                "mcast_querier": 0,
                "mcast_hash_elasticity": 4096,
                "mcast_hash_max": 4096,
                "mcast_last_member_cnt": 2,
                "mcast_startup_query_cnt": 2,
                "mcast_last_member_intvl": 100,
                "mcast_membership_intvl": 26000,
                "mcast_querier_intvl": 25500,
                "mcast_query_intvl": 12500,
                "mcast_query_response_intvl": 1000,
                "mcast_startup_query_intvl": 3125,
                "mcast_stats_enabled": 0,
                "mcast_igmp_version": 2,
                "mcast_mld_version": 1,
                "nf_call_iptables": 0,
                "nf_call_ip6tables": 0,
                "nf_call_arptables": 0
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 16,
        "ifname": "bond42",
        "flags": ["BROADCAST","MULTICAST","MASTER","UP","LOWER_UP"],
        "mtu": 1500,
        "qdisc": "noqueue",
        "master": "br42",
        "operstate": "UNKNOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ether",
        "address": "22:8f:91:bb:9f:09",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 1,
        "linkinfo": {
            "info_kind": "bond",
            "info_data": {
                "mode": "802.3ad",
                "miimon": 100,
                "updelay": 0,
                "downdelay": 0,
                "use_carrier": 1,
                "arp_interval": 0,
                "arp_validate": null,
                "arp_all_targets": "any",
                "primary_reselect": "always",
                "fail_over_mac": "none",
                "xmit_hash_policy": "layer3+4",
                "resend_igmp": 1,
                "num_peer_notif": 1,
                "all_slaves_active": 0,
                "min_links": 1,
                "lp_interval": 1,
                "packets_per_slave": 1,
                "ad_lacp_rate": "fast",
                "ad_select": "stable",
                "ad_actor_sys_prio": 65535,
                "ad_user_port_key": 0,
                "ad_actor_system": "00:00:00:00:00:00"
            },
            "info_slave_kind": "bridge",
            "info_slave_data": {
                "state": "forwarding",
                "priority": 8,
                "cost": 100,
                "hairpin": false,
                "guard": false,
                "root_block": false,
                "fastleave": false,
                "learning": true,
                "flood": true,
                "id": "0x8001",
                "no": "0x1",
                "designated_port": 32769,
                "designated_cost": 0,
                "bridge_id": "8000.22:8f:91:bb:9f:9",
                "root_id": "8000.22:8f:91:bb:9f:9",
                "hold_timer": 0.00,
                "message_age_timer": 0.00,
                "forward_delay_timer": 11.97,
                "topology_change_ack": 0,
                "config_pending": 0,
                "proxy_arp": false,
                "proxy_arp_wifi": false,
                "multicast_router": 1,
                "mcast_flood": true
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 16,
        "num_rx_queues": 16,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    }
]

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:41 -07:00
Julien Fortin 20aeecfbf5 ip: iplink_bridge.c: add json output support
Schema and live example:
bridge: IFLA_INFO_DATA
{
    "forward_delay": {
        "type": "uint",
        "attr": "IFLA_BR_FORWARD_DELAY"
    },
    "hello_time": {
        "type": "uint",
        "attr": "IFLA_BR_HELLO_TIME"
    },
    "max_age": {
        "type": "uint",
        "attr": "IFLA_BR_MAX_AGE"
    },
    "ageing_time": {
        "type": "uint",
        "attr": "IFLA_BR_AGEING_TIME"
    },
    "stp_state": {
        "type": "uint",
        "attr": "IFLA_BR_STP_STATE"
    },
    "priority": {
        "type": "uint",
        "attr": "IFLA_BR_PRIORITY"
    },
    "vlan_filtering": {
        "type": "uint",
        "attr": "IFLA_BR_VLAN_FILTERING"
    },
    "vlan_protocol": {
        "type": "string",
        "attr": "IFLA_BR_VLAN_PROTOCOL"
    },
    "bridge_id": {
        "type": "string",
        "attr": "IFLA_BR_BRIDGE_ID"
    },
    "root_id": {
        "type": "string",
        "attr": "IFLA_BR_ROOT_ID"
    },
    "root_port": {
        "type": "uint",
        "attr": "IFLA_BR_ROOT_PORT"
    },
    "root_path_cost": {
        "type": "uint",
        "attr": "IFLA_BR_ROOT_PATH_COST"
    },
    "topology_change": {
        "type": "uint",
        "attr": "IFLA_BR_TOPOLOGY_CHANGE"
    },
    "topology_change_detected": {
        "type": "uint",
        "attr": "IFLA_BR_TOPOLOGY_CHANGE_DETECTED"
    },
    "hello_timer": {
        "type": "float",
        "attr": "IFLA_BR_HELLO_TIMER"
    },
    "tcn_timer": {
        "type": "float",
        "attr": "IFLA_BR_TCN_TIMER"
    },
    "topology_change_timer": {
        "type": "float",
        "attr": "IFLA_BR_TOPOLOGY_CHANGE_TIMER"
    },
    "gc_timer": {
        "type": "float",
        "attr": "IFLA_BR_GC_TIMER"
    },
    "vlan_default_pvid": {
        "type": "uint",
        "attr": "IFLA_BR_VLAN_DEFAULT_PVID"
    },
    "vlan_stats_enabled": {
        "type": "uint",
        "attr": "IFLA_BR_VLAN_STATS_ENABLED"
    },
    "group_fwd_mask": {
        "type": "string",
        "attr": "IFLA_BR_GROUP_FWD_MASK"
    },
    "group_addr": {
        "type": "string",
        "attr": "IFLA_BR_GROUP_ADDR"
    },
    "mcast_snooping": {
        "type": "uint",
        "attr": "IFLA_BR_MCAST_SNOOPING"
    },
    "mcast_router": {
        "type": "uint",
        "attr": "IFLA_BR_MCAST_ROUTER"
    },
    "mcast_query_use_ifaddr": {
        "type": "uint",
        "attr": "IFLA_BR_MCAST_QUERY_USE_IFADDR"
    },
    "mcast_querier": {
        "type": "uint",
        "attr": "IFLA_BR_MCAST_QUERIER"
    },
    "mcast_hash_elasticity": {
        "type": "uint",
        "attr": "IFLA_BR_MCAST_HASH_ELASTICITY"
    },
    "mcast_hash_max": {
        "type": "uint",
        "attr": "IFLA_BR_MCAST_HASH_MAX"
    },
    "mcast_last_member_cnt": {
        "type": "uint",
        "attr": "IFLA_BR_MCAST_LAST_MEMBER_CNT"
    },
    "mcast_startup_query_cnt": {
        "type": "uint",
        "attr": "IFLA_BR_MCAST_STARTUP_QUERY_CNT"
    },
    "mcast_last_member_intvl": {
        "type": "lluint",
        "attr": "IFLA_BR_MCAST_LAST_MEMBER_INTVL"
    },
    "mcast_membership_intvl": {
        "type": "lluint",
        "attr": "IFLA_BR_MCAST_MEMBERSHIP_INTVL"
    },
    "mcast_querier_intvl": {
        "type": "lluint",
        "attr": "IFLA_BR_MCAST_QUERIER_INTVL"
    },
    "mcast_query_intvl": {
        "type": "lluint",
        "attr": "IFLA_BR_MCAST_QUERY_INTVL"
    },
    "mcast_query_response_intvl": {
        "type": "lluint",
        "attr": "IFLA_BR_MCAST_QUERY_RESPONSE_INTVL"
    },
    "mcast_startup_query_intvl": {
        "type": "lluint",
        "attr": "IFLA_BR_MCAST_STARTUP_QUERY_INTVL"
    },
    "mcast_stats_enabled": {
        "type": "uint",
        "attr": "IFLA_BR_MCAST_STATS_ENABLED"
    },
    "mcast_igmp_version": {
        "type": "uint",
        "attr": "IFLA_BR_MCAST_IGMP_VERSION"
    },
    "mcast_mld_version": {
        "type": "uint",
        "attr": "IFLA_BR_MCAST_MLD_VERSION"
    },
    "nf_call_iptables": {
        "type": "uint",
        "attr": "IFLA_BR_NF_CALL_IPTABLES"
    },
    "nf_call_ip6tables": {
        "type": "uint",
        "attr": "IFLA_BR_NF_CALL_IP6TABLES"
    },
    "nf_call_arptables": {
        "type": "uint",
        "attr": "IFLA_BR_NF_CALL_ARPTABLES"
    }
}

$ ip link add dev br42 type bridge
$ ip link add dev bond42 type bond
$ ip link set dev bond42 master br42
$ ip link set dev bond42 up
$ ip link set dev br42 up

$ ip -details link show
15: br42: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
UP mode DEFAULT group default
    link/ether 22:8f:91:bb:9f:09 brd ff:ff:ff:ff:ff:ff promiscuity 0
    bridge forward_delay 1500 hello_time 200 max_age 2000 ageing_time
30000 stp_state 0 priority 32768 vlan_filtering 0 vlan_protocol 802.1Q
bridge_id 8000.22:8f:91:bb:9f:9 designated_root 8000.22:8f:91:bb:9f:9
root_port 0 root_path_cost 0 topology_change 0 topology_change_detected 0
hello_timer    0.00 tcn_timer    0.00 topology_change_timer    0.00
gc_timer  199.11 vlan_default_pvid 1 vlan_stats_enabled 0 group_fwd_mask 0
group_address 01:80:c2:00:00:00 mcast_snooping 1 mcast_router 1
mcast_query_use_ifaddr 0 mcast_querier 0 mcast_hash_elasticity 4096
mcast_hash_max 4096 mcast_last_member_count 2 mcast_startup_query_count 2
mcast_last_member_interval 100 mcast_membership_interval 26000
mcast_querier_interval 25500 mcast_query_interval 12500
mcast_query_response_interval 1000 mcast_startup_query_interval 3125
mcast_stats_enabled 0 mcast_igmp_version 2 mcast_mld_version 1
nf_call_iptables 0 nf_call_ip6tables 0 nf_call_arptables 0 addrgenmode
eui64
16: bond42: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc
noqueue master br42 state UNKNOWN mode DEFAULT group default
    link/ether 22:8f:91:bb:9f:09 brd ff:ff:ff:ff:ff:ff promiscuity 1
    bond mode 802.3ad miimon 100 updelay 0 downdelay 0 use_carrier 1
arp_interval 0 arp_validate none arp_all_targets any primary_reselect
always fail_over_mac none xmit_hash_policy layer3+4 resend_igmp 1
num_grat_arp 1 all_slaves_active 0 min_links 1 lp_interval 1
packets_per_slave 1 lacp_rate fast ad_select stable ad_actor_sys_prio
65535 ad_user_port_key 0 ad_actor_system 00:00:00:00:00:00
    bridge_slave state forwarding priority 8 cost 100 hairpin off guard
off root_block off fastleave off learning on flood on port_id 0x8001
port_no 0x1 designated_port 32769 designated_cost 0 designated_bridge
8000.22:8f:91:bb:9f:9 designated_root 8000.22:8f:91:bb:9f:9 hold_timer
0.00 message_age_timer    0.00 forward_delay_timer    0.00
topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off
mcast_router 1 mcast_fast_leave off mcast_flood on neigh_suppress off
addrgenmode eui64

$ ip -details -json link show
[{
        "ifindex": 15,
        "ifname": "br42",
        "flags": ["BROADCAST","MULTICAST","UP","LOWER_UP"],
        "mtu": 1500,
        "qdisc": "noqueue",
        "operstate": "UP",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ether",
        "address": "22:8f:91:bb:9f:09",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "bridge",
            "info_data": {
                "forward_delay": 1500,
                "hello_time": 200,
                "max_age": 2000,
                "ageing_time": 30000,
                "stp_state": 0,
                "priority": 32768,
                "vlan_filtering": 0,
                "vlan_protocol": "802.1Q",
                "bridge_id": "8000.22:8f:91:bb:9f:9",
                "root_id": "8000.22:8f:91:bb:9f:9",
                "root_port": 0,
                "root_path_cost": 0,
                "topology_change": 0,
                "topology_change_detected": 0,
                "hello_timer": 0.00,
                "tcn_timer": 0.00,
                "topology_change_timer": 0.00,
                "gc_timer": 298.27,
                "vlan_default_pvid": 1,
                "vlan_stats_enabled": 0,
                "group_fwd_mask": "0",
                "group_addr": "01:80:c2:00:00:00",
                "mcast_snooping": 1,
                "mcast_router": 1,
                "mcast_query_use_ifaddr": 0,
                "mcast_querier": 0,
                "mcast_hash_elasticity": 4096,
                "mcast_hash_max": 4096,
                "mcast_last_member_cnt": 2,
                "mcast_startup_query_cnt": 2,
                "mcast_last_member_intvl": 100,
                "mcast_membership_intvl": 26000,
                "mcast_querier_intvl": 25500,
                "mcast_query_intvl": 12500,
                "mcast_query_response_intvl": 1000,
                "mcast_startup_query_intvl": 3125,
                "mcast_stats_enabled": 0,
                "mcast_igmp_version": 2,
                "mcast_mld_version": 1,
                "nf_call_iptables": 0,
                "nf_call_ip6tables": 0,
                "nf_call_arptables": 0
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 16,
        "ifname": "bond42",
        "flags": ["BROADCAST","MULTICAST","MASTER","UP","LOWER_UP"],
        "mtu": 1500,
        "qdisc": "noqueue",
        "master": "br42",
        "operstate": "UNKNOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ether",
        "address": "22:8f:91:bb:9f:09",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 1,
        "linkinfo": {
            "info_kind": "bond",
            "info_data": {
                "mode": "802.3ad",
                "miimon": 100,
                "updelay": 0,
                "downdelay": 0,
                "use_carrier": 1,
                "arp_interval": 0,
                "arp_validate": null,
                "arp_all_targets": "any",
                "primary_reselect": "always",
                "fail_over_mac": "none",
                "xmit_hash_policy": "layer3+4",
                "resend_igmp": 1,
                "num_peer_notif": 1,
                "all_slaves_active": 0,
                "min_links": 1,
                "lp_interval": 1,
                "packets_per_slave": 1,
                "ad_lacp_rate": "fast",
                "ad_select": "stable",
                "ad_actor_sys_prio": 65535,
                "ad_user_port_key": 0,
                "ad_actor_system": "00:00:00:00:00:00"
            },
            "info_slave_kind": "bridge",
            "info_slave_data": {
                "state": "forwarding",
                "priority": 8,
                "cost": 100,
                "hairpin": false,
                "guard": false,
                "root_block": false,
                "fastleave": false,
                "learning": true,
                "flood": true,
                "id": "0x8001",
                "no": "0x1",
                "designated_port": 32769,
                "designated_cost": 0,
                "bridge_id": "8000.22:8f:91:bb:9f:9",
                "root_id": "8000.22:8f:91:bb:9f:9",
                "hold_timer": 0.00,
                "message_age_timer": 0.00,
                "forward_delay_timer": 11.97,
                "topology_change_ack": 0,
                "config_pending": 0,
                "proxy_arp": false,
                "proxy_arp_wifi": false,
                "multicast_router": 1,
                "mcast_flood": true
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 16,
        "num_rx_queues": 16,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    }
]

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:40 -07:00
Julien Fortin 69ffd27325 ip: iplink_hsr.c: add json output support
Schema:
hsr: IFLA_INFO_DATA
{
    "slave1": {
        "type": "string",
        "attr": "IFLA_HSR_SLAVE1"
    },
    "slave2": {
        "type": "string",
        "attr": "IFLA_HSR_SLAVE2"
    },
    "seq_nr": {
        "type": "int",
        "attr": "IFLA_HSR_SEQ_NR"
    },
    "supervision_addr": {
        "type": "int",
        "attr": "IFLA_HSR_SUPERVISION_ADDR"
    }
}

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:40 -07:00
Julien Fortin 707cce5a63 ip: iplink_bond_slave.c: add json output support (info_slave_data)
Schema and live example:
bond_slave: IFLA_INFO_SLAVE_DATA
{
    "state": {
        "type": "string",
        "attr": "IFLA_BOND_SLAVE_STATE",
        "mutually_exclusive": {
            "state_index": {
                "type": "int",
                "comment": "if (state >= ARRAY_SIZE(slave_states))"
            }
        }
    },
    "mii_status": {
        "type": "string",
        "attr": "IFLA_BOND_SLAVE_MII_STATUS",
        "mutually_exclusive": {
            "mii_status_index": {
                "type": "int",
                "comment": "if (status >= ARRAY_SIZE(slave_mii_status))"
            }
        }
    },
    "link_failure_count": {
        "type": "int",
        "attr": "IFLA_BOND_SLAVE_LINK_FAILURE_COUNT"
    },
    "perm_hwaddr": {
        "type": "string",
        "attr": "IFLA_BOND_SLAVE_PERM_HWADDR"
    },
    "queue_id": {
        "type": "int",
        "attr": "IFLA_BOND_SLAVE_QUEUE_ID"
    },
    "ad_aggregator_id": {
        "type": "int",
        "attr": "IFLA_BOND_SLAVE_AD_AGGREGATOR_ID"
    },
    "ad_actor_oper_port_state": {
        "type": "int",
        "attr": "IFLA_BOND_SLAVE_AD_ACTOR_OPER_PORT_STATE"
    },
    "ad_partner_oper_port_state": {
        "type": "int",
        "attr": "IFLA_BOND_SLAVE_AD_PARTNER_OPER_PORT_STATE"
    }
}

$ ip link add dev bond42 type bond
$ ip link set dev swp5 master bond42
$ ip link set dev bond42 up
$ ip link set dev swp5 up
$ ip -details -json link show
[{
        "ifindex": 7,
        "ifname": "swp5",
        "flags": ["BROADCAST","MULTICAST","SLAVE","UP","LOWER_UP"],
        "mtu": 1500,
        "qdisc": "pfifo_fast",
        "master": "bond42",
        "operstate": "UP",
        "linkmode": "DEFAULT",
        "group": "default",
        "txqlen": 1000,
        "link_type": "ether",
        "address": "08:00:27:5c:03:c6",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 0,
        "linkinfo": {
            "info_slave_kind": "bond",
            "info_slave_data": {
                "state": "BACKUP",
                "mii_status": "UP",
                "link_failure_count": 0,
                "perm_hwaddr": "08:00:27:5c:03:c6",
                "queue_id": 0,
                "ad_aggregator_id": 1,
                "ad_actor_oper_port_state": 79,
                "ad_partner_oper_port_state": 1
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 14,
        "ifname": "bond42",
        "flags": ["NO-CARRIER","BROADCAST","MULTICAST","MASTER","UP"],
        "mtu": 1500,
        "qdisc": "noqueue",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ether",
        "address": "08:00:27:5c:03:c6",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "bond",
            "info_data": {
                "mode": "802.3ad",
                "miimon": 100,
                "updelay": 0,
                "downdelay": 0,
                "use_carrier": 1,
                "arp_interval": 0,
                "arp_validate": null,
                "arp_all_targets": "any",
                "primary_reselect": "always",
                "fail_over_mac": "none",
                "xmit_hash_policy": "layer3+4",
                "resend_igmp": 1,
                "num_peer_notif": 1,
                "all_slaves_active": 0,
                "min_links": 1,
                "lp_interval": 1,
                "packets_per_slave": 1,
                "ad_lacp_rate": "fast",
                "ad_select": "stable",
                "ad_info": {
                    "aggregator": 1,
                    "num_ports": 1,
                    "actor_key": 0,
                    "partner_key": 1,
                    "partner_mac": "00:00:00:00:00:00"
                },
                "ad_actor_sys_prio": 65535,
                "ad_user_port_key": 0,
                "ad_actor_system": "00:00:00:00:00:00"
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 16,
        "num_rx_queues": 16,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    }
]

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:40 -07:00
Julien Fortin 7ff60b090f ip: iplink_bond.c: add json output support
Schema and live example:
bond: IFLA_INFO_DATA
{
    "mode": {
        "type": "string",
        "attr": "IFLA_BOND_MODE"
    },
    "active_slave": {
        "type": "string",
        "attr": "IFLA_BOND_ACTIVE_SLAVE",
        "mutually_exclusive": {
            "active_slave_index": {
                "type": "int",
                "comment": "if active slave doesn't have a valid ifname"
            }
        }
    },
    "miimon": {
        "type": "uint",
        "attr": "IFLA_BOND_MIIMON"
    },
    "updelay": {
        "type": "uint",
        "attr": "IFLA_BOND_UPDELAY"
    },
    "downdelay": {
        "type": "uint",
        "attr": "IFLA_BOND_DOWNDELAY"
    },
    "use_carrier": {
        "type": "uint",
        "attr": "IFLA_BOND_USE_CARRIER"
    },
    "arp_interval": {
        "type": "uint",
        "attr": "IFLA_BOND_ARP_INTERVAL"
    },
    "arp_ip_target": {
        "type": "array",
        "attr": "IFLA_BOND_ARP_IP_TARGET",
        "array": [
            {
                "type": "string"
            }
        ]
    },
    "arp_validate": {
        "type": "string",
        "attr": "IFLA_BOND_ARP_VALIDATE"
    },
    "arp_all_targets": {
        "type": "string",
        "attr": "IFLA_BOND_ARP_ALL_TARGETS"
    },
    "primary": {
        "type": "string",
        "attr": "IFLA_BOND_PRIMARY",
        "mutually_exclusive": {
            "primary_index": {
                "type": "int",
                "comment": "if primary doesn't have a valid ifname"
            }
        }
    },
    "primary_reselect": {
        "type": "string",
        "attr": "IFLA_BOND_PRIMARY_RESELECT"
    },
    "fail_over_mac": {
        "type": "string",
        "attr": "IFLA_BOND_FAIL_OVER_MAC"
    },
    "xmit_hash_policy": {
        "type": "string",
        "attr": "IFLA_BOND_XMIT_HASH_POLICY"
    },
    "resend_igmp": {
        "type": "uint",
        "attr": "IFLA_BOND_RESEND_IGMP"
    },
    "num_peer_notif": {
        "type": "uint",
        "attr": "IFLA_BOND_NUM_PEER_NOTIF"
    },
    "all_slaves_active": {
        "type": "uint",
        "attr": "IFLA_BOND_ALL_SLAVES_ACTIVE"
    },
    "min_links": {
        "type": "uint",
        "attr": "IFLA_BOND_MIN_LINKS"
    },
    "lp_interval": {
        "type": "uint",
        "attr": "IFLA_BOND_LP_INTERVAL"
    },
    "packets_per_slave": {
        "type": "uint",
        "attr": "IFLA_BOND_PACKETS_PER_SLAVE"
    },
    "ad_lacp_rate": {
        "type": "string",
        "attr": "IFLA_BOND_AD_LACP_RATE"
    },
    "ad_select": {
        "type": "string",
        "attr": "IFLA_BOND_AD_SELECT"
    },
    "ad_info": {
        "type": "dict",
        "attr": "IFLA_BOND_AD_INFO",
        "dict": {
            "aggregator": {
                "type": "int",
                "attr": "IFLA_BOND_AD_INFO_AGGREGATOR"
            },
            "num_ports": {
                "type": "int",
                "attr": "IFLA_BOND_AD_INFO_NUM_PORTS"
            },
            "actor_key": {
                "type": "int",
                "attr": "IFLA_BOND_AD_INFO_ACTOR_KEY"
            },
            "partner_key": {
                "type": "int",
                "attr": "IFLA_BOND_AD_INFO_PARTNER_KEY"
            },
            "partner_mac": {
                "type": "string",
                "attr": "IFLA_BOND_AD_INFO_PARTNER_MAC"
            }
        }
    },
    "ad_actor_sys_prio": {
        "type": "uint",
        "attr": "IFLA_BOND_AD_ACTOR_SYS_PRIO"
    },
    "ad_user_port_key": {
        "type": "uint",
        "attr": "IFLA_BOND_AD_USER_PORT_KEY"
    },
    "ad_actor_system": {
        "type": "string",
        "attr": "IFLA_BOND_AD_ACTOR_SYSTEM"
    },
    "tlb_dynamic_lb": {
        "type": "uint",
        "attr": "IFLA_BOND_TLB_DYNAMIC_LB"
    }
}

$ ip link add dev bond42 type bond
$ ip link set dev swp5 master bond42
$ ip link set dev bond42 up
$ ip link set dev swp5 up
$ ip -details -json link show
[{
        "ifindex": 7,
        "ifname": "swp5",
        "flags": ["BROADCAST","MULTICAST","SLAVE","UP","LOWER_UP"],
        "mtu": 1500,
        "qdisc": "pfifo_fast",
        "master": "bond42",
        "operstate": "UP",
        "linkmode": "DEFAULT",
        "group": "default",
        "txqlen": 1000,
        "link_type": "ether",
        "address": "08:00:27:5c:03:c6",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 0,
        "linkinfo": {
            "info_slave_kind": "bond",
            "info_slave_data": {
                "state": "BACKUP",
                "mii_status": "UP",
                "link_failure_count": 0,
                "perm_hwaddr": "08:00:27:5c:03:c6",
                "queue_id": 0,
                "ad_aggregator_id": 1,
                "ad_actor_oper_port_state": 79,
                "ad_partner_oper_port_state": 1
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 14,
        "ifname": "bond42",
        "flags": ["NO-CARRIER","BROADCAST","MULTICAST","MASTER","UP"],
        "mtu": 1500,
        "qdisc": "noqueue",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ether",
        "address": "08:00:27:5c:03:c6",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "bond",
            "info_data": {
                "mode": "802.3ad",
                "miimon": 100,
                "updelay": 0,
                "downdelay": 0,
                "use_carrier": 1,
                "arp_interval": 0,
                "arp_validate": null,
                "arp_all_targets": "any",
                "primary_reselect": "always",
                "fail_over_mac": "none",
                "xmit_hash_policy": "layer3+4",
                "resend_igmp": 1,
                "num_peer_notif": 1,
                "all_slaves_active": 0,
                "min_links": 1,
                "lp_interval": 1,
                "packets_per_slave": 1,
                "ad_lacp_rate": "fast",
                "ad_select": "stable",
                "ad_info": {
                    "aggregator": 1,
                    "num_ports": 1,
                    "actor_key": 0,
                    "partner_key": 1,
                    "partner_mac": "00:00:00:00:00:00"
                },
                "ad_actor_sys_prio": 65535,
                "ad_user_port_key": 0,
                "ad_actor_system": "00:00:00:00:00:00"
            }
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 16,
        "num_rx_queues": 16,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    }
]

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:40 -07:00
Julien Fortin e4a1216aeb ip: iplink.c: open/close json obj for ip -brief -json link show dev DEV
Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:40 -07:00
Julien Fortin d0e720111a ip: ipaddress.c: add support for json output
This patch converts all output (mostly fprintf calls) to the new ip_print API,
which handles both regular and JSON output.
A json_writer is initialized and an array object is opened if -json was specified.
Note that the JSON attribute naming follows the netlink attribute naming.

In many places throughout the code, ip matches integer values against
hard-coded string tables, such as link mode, link operstate or link
family.
In JSON context, this results in a named string field. In the
very unlikely event that the requested index is out of bounds, ip
displays the raw integer value. In JSON context this results in
a different integer field, for example:

if (mode >= ARRAY_SIZE(link_modes))
	print_int(PRINT_ANY, "linkmode_index", "mode %d ", mode);
else
	print_string(PRINT_ANY, "linkmode", "mode %s ",
		     link_modes[mode]);

The "_index" suffix is open to discussion; it is something I came
up with. The bottom line is that a string field must not become an
int field in specific cases. Programs written in strongly typed
languages (like C) might break if they expect a string value and
get an integer instead. We don't want to confuse anybody or make the code
even more complicated by handling these specific cases.
Hence the extra "_index" field, which is easy to check for and deal with.
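For illustration, a minimal self-contained sketch of the same fallback,
appending to a caller-supplied buffer instead of going through the
ip_print API. append_linkmode() and its two-entry link_modes table are
hypothetical stand-ins, not iproute2 code; the table assumes mode 0 is
DEFAULT and mode 1 is DORMANT.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical table: assumes 0 = DEFAULT, 1 = DORMANT. */
static const char *const link_modes[] = { "DEFAULT", "DORMANT" };

/*
 * Hypothetical helper: append the link mode to buf, either in ip's text
 * format or as a single JSON member. An out-of-range mode goes under a
 * separate integer key, "linkmode_index", so "linkmode" is always a string.
 */
static void append_linkmode(char *buf, size_t len, bool json, unsigned int mode)
{
	size_t used = strlen(buf);

	if (used >= len)
		return;
	buf += used;
	len -= used;

	if (mode >= ARRAY_SIZE(link_modes)) {
		if (json)
			snprintf(buf, len, "\"linkmode_index\": %u", mode);
		else
			snprintf(buf, len, "mode %u ", mode);
	} else if (json) {
		snprintf(buf, len, "\"linkmode\": \"%s\"", link_modes[mode]);
	} else {
		snprintf(buf, len, "mode %s ", link_modes[mode]);
	}
}
```

A consumer can then read "linkmode" as a string whenever it is present
and treat "linkmode_index" as the out-of-range case, without ever having
to test the type of a single key.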

JSON schema, followed by live example:

Live config used:
$ ip link add dev vxlan42 type vxlan id 42
$ ip link add dev bond0 type bond
$ ip link add name swp1.50 link swp1 type vlan id 50
$ ip link add dev br0 type bridge
$ ip link set dev vxlan42 master br0
$ ip link set dev bond0 master br0
$ ip link set dev swp1.50 master br0
$ ip link set dev br0 up

$ ip -d link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode
DEFAULT group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0
addrgenmode eui64
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 1000
    link/ether 08:00:27:db:31:88 brd ff:ff:ff:ff:ff:ff promiscuity 0
addrgenmode eui64
3: swp1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
DEFAULT group default qlen 1000
    link/ether 08:00:27:5b:b1:75 brd ff:ff:ff:ff:ff:ff promiscuity 0
addrgenmode eui64
10: vxlan42: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master br0 state
DOWN mode DEFAULT group default
    link/ether 4a:d9:91:42:a2:d2 brd ff:ff:ff:ff:ff:ff promiscuity 1
    vxlan id 42 srcport 0 0 dstport 8472 ageing 300
    bridge_slave state disabled priority 8 cost 100 hairpin off guard off
root_block off fastleave off learning on flood on port_id 0x8001 port_no
0x1 designated_port 32769 designated_cost 0 designated_bridge
8000.8:0:27:5b:b1:75 designated_root 8000.8:0:27:5b:b1:75 hold_timer
0.00 message_age_timer    0.00 forward_delay_timer    0.00
topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off
mcast_router 1 mcast_fast_leave off mcast_flood on neigh_suppress off
addrgenmode eui64
11: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop master br0
state DOWN mode DEFAULT group default
    link/ether e2:aa:7b:17:c5:14 brd ff:ff:ff:ff:ff:ff promiscuity 1
    bond mode 802.3ad miimon 100 updelay 0 downdelay 0 use_carrier 1
arp_interval 0 arp_validate none arp_all_targets any primary_reselect
always fail_over_mac none xmit_hash_policy layer3+4 resend_igmp 1
num_grat_arp 1 all_slaves_active 0 min_links 1 lp_interval 1
packets_per_slave 1 lacp_rate fast ad_select stable ad_actor_sys_prio
65535 ad_user_port_key 0 ad_actor_system 00:00:00:00:00:00
    bridge_slave state disabled priority 8 cost 100 hairpin off guard off
root_block off fastleave off learning on flood on port_id 0x8002 port_no
0x2 designated_port 32770 designated_cost 0 designated_bridge
8000.8:0:27:5b:b1:75 designated_root 8000.8:0:27:5b:b1:75 hold_timer
0.00 message_age_timer    0.00 forward_delay_timer    0.00
topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off
mcast_router 1 mcast_fast_leave off mcast_flood on neigh_suppress off
addrgenmode eui64
12: swp1.50@swp1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop master
br0 state DOWN mode DEFAULT group default
    link/ether 08:00:27:5b:b1:75 brd ff:ff:ff:ff:ff:ff promiscuity 1
    vlan protocol 802.1Q id 50 <REORDER_HDR>
    bridge_slave state disabled priority 8 cost 100 hairpin off guard off
root_block off fastleave off learning on flood on port_id 0x8003 port_no
0x3 designated_port 32771 designated_cost 0 designated_bridge
8000.8:0:27:5b:b1:75 designated_root 8000.8:0:27:5b:b1:75 hold_timer
0.00 message_age_timer    0.00 forward_delay_timer    0.00
topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off
mcast_router 1 mcast_fast_leave off mcast_flood on neigh_suppress off
addrgenmode eui64
13: br0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state
DOWN mode DEFAULT group default
    link/ether 08:00:27:5b:b1:75 brd ff:ff:ff:ff:ff:ff promiscuity 0
    bridge forward_delay 1500 hello_time 200 max_age 2000 ageing_time
30000 stp_state 0 priority 32768 vlan_filtering 0 vlan_protocol 802.1Q
bridge_id 8000.8:0:27:5b:b1:75 designated_root 8000.8:0:27:5b:b1:75
root_port 0 root_path_cost 0 topology_change 0 topology_change_detected 0
hello_timer    0.00 tcn_timer    0.00 topology_change_timer    0.00
gc_timer  244.44 vlan_default_pvid 1 vlan_stats_enabled 0 group_fwd_mask 0
group_address 01:80:c2:00:00:00 mcast_snooping 1 mcast_router 1
mcast_query_use_ifaddr 0 mcast_querier 0 mcast_hash_elasticity 4096
mcast_hash_max 4096 mcast_last_member_count 2 mcast_startup_query_count 2
mcast_last_member_interval 100 mcast_membership_interval 26000
mcast_querier_interval 25500 mcast_query_interval 12500
mcast_query_response_interval 1000 mcast_startup_query_interval 3125
mcast_stats_enabled 0 mcast_igmp_version 2 mcast_mld_version 1
nf_call_iptables 0 nf_call_ip6tables 0 nf_call_arptables 0 addrgenmode
eui64

// Schema for: ip -brief link show
[
    {
        "deleted": {
            "type": "bool",
            "attr": "RTM_DELLINK"
        },
        "link": {
            "type": "string",
            "attr": "IFLA_LINK"
        },
        "ifname": {
            "type": "string",
            "attr": "IFLA_IFNAME"
        },
        "operstate": {
            "type": "string",
            "attr": "IFLA_OPERSTATE",
            "mutually_exclusive": {
                "operstate_index": {
                    "type": "uint",
                    "comment": "if state >= ARRAY_SIZE(oper_states)"
                }
            }
        },
        "address": {
            "type": "string",
            "attr": "IFLA_ADDRESS"
        },
        "flags": {
            "type": "array",
            "attr": "IFF_LOOPBACK, IFF_BROADCAST...IFF_*"
        },
        "addr_info": {
            "type": "array",
            "array": [
                {
                    "deleted": {
                        "type": "bool",
                        "attr": "RTM_DELADDR"
                    },
                    "family": {
                        "type": "string",
                        "attr": "ifa->ifa_family",
                        "mutually_exclusive": {
                            "family_index": {
                                "type": "uint",
                                "comment": "if family is not known"
                            }
                        }
                    },
                    "local": {
                        "type": "string",
                        "attr": "IFA_LOCAL"
                    },
                    "address": {
                        "type": "string",
                        "attr": "IFA_LOCAL && IFA_ADDRESS"
                    },
                    "prefixlen": {
                        "type": "int",
                        "attr": "IFA_LOCAL"
                    }
                }
            ]
        }
    }
]
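
The "mutually_exclusive" entries mean a consumer sees either the string key or the numeric *_index key, never both. A minimal C sketch of that fallback for "operstate", assuming the IF_OPER_* numbering from linux/if.h (UNKNOWN = 0 through UP = 6); format_operstate is a hypothetical helper, not the iproute2 code.

```c
#include <stdio.h>

/* Operational states indexed by IF_OPER_* value (RFC 2863 order). */
static const char *const oper_states[] = {
	"UNKNOWN", "NOTPRESENT", "DOWN", "LOWERLAYERDOWN",
	"TESTING", "DORMANT", "UP"
};

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Emit the JSON member for an operstate: the name when it is known,
 * otherwise the raw number under "operstate_index". Returns what
 * snprintf returns. */
static int format_operstate(char *buf, size_t len, unsigned int state)
{
	if (state >= ARRAY_SIZE(oper_states))
		return snprintf(buf, len, "\"operstate_index\":%u", state);
	return snprintf(buf, len, "\"operstate\":\"%s\"", oper_states[state]);
}
```

For example, state 6 yields "operstate":"UP", while an out-of-range value such as 42 yields "operstate_index":42.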

$ ip -json -brief link show
[{
        "ifname": "lo",
        "operstate": "UNKNOWN",
        "address": "00:00:00:00:00:00",
        "flags": ["LOOPBACK","UP","LOWER_UP"]
    },{
        "ifname": "eth0",
        "operstate": "UP",
        "address": "08:00:27:db:31:88",
        "flags": ["BROADCAST","MULTICAST","UP","LOWER_UP"]
    },{
        "ifname": "swp1",
        "operstate": "DOWN",
        "address": "08:00:27:5b:b1:75",
        "flags": ["BROADCAST","MULTICAST"]
    },{
        "ifname": "vxlan42",
        "operstate": "DOWN",
        "address": "4a:d9:91:42:a2:d2",
        "flags": ["BROADCAST","MULTICAST"]
    },{
        "ifname": "bond0",
        "operstate": "DOWN",
        "address": "e2:aa:7b:17:c5:14",
        "flags": ["BROADCAST","MULTICAST","MASTER"]
    },{
        "link": "swp1",
        "ifname": "swp1.50",
        "operstate": "DOWN",
        "address": "08:00:27:5b:b1:75",
        "flags": ["BROADCAST","MULTICAST","M-DOWN"]
    },{
        "ifname": "br0",
        "operstate": "DOWN",
        "address": "08:00:27:5b:b1:75",
        "flags": ["NO-CARRIER","BROADCAST","MULTICAST","UP"]
    }
]

Schema for normal plus -details: ip -json -details link show

[
    {
        "deleted": {
            "type": "bool",
            "attr": "RTM_DELLINK"
        },
        "ifindex": {
            "type": "int"
        },
        "ifname": {
            "type": "string",
            "attr": "IFLA_IFNAME"
        },
        "link": {
            "type": "string",
            "attr": "IFLA_LINK",
            "mutually_exclusive": {
                "link_index": {
                    "type": "int",
                    "comment": "if IFLA_LINK_NETNSID exists"
                }
            }
        },
        "flags": {
            "type": "array",
            "attr": "IFF_LOOPBACK, IFF_BROADCAST...IFF_*"
        },
        "mtu": {
            "type": "int",
            "attr": "IFLA_MTU"
        },
        "xdp": {
            "type": "object",
            "attr": "IFLA_XDP",
            "object": {
                "mode": {
                    "type": "uint",
                    "attr": "IFLA_XDP_ATTACHED"
                },
                "prog_id": {
                    "type": "uint",
                    "attr": "IFLA_XDP_PROG_ID"
                }
            }
        },
        "qdisc": {
            "type": "string",
            "attr": "IFLA_QDISC"
        },
        "master": {
            "type": "string",
            "attr": "IFLA_MASTER"
        },
        "operstate": {
            "type": "string",
            "attr": "IFLA_OPERSTATE",
            "mutually_exclusive": {
                "operstate_index": {
                    "type": "uint",
                    "comment": "if state >= ARRAY_SIZE(oper_states)"
                }
            }
        },
        "linkmode": {
            "type": "string",
            "attr": "IFLA_LINKMODE",
            "mutually_exclusive": {
                "linkmode_index": {
                    "type": "uint",
                    "comment": "if mode >= ARRAY_SIZE(link_modes)"
                }
            }
        },
        "group": {
            "type": "string",
            "attr": "IFLA_GROUP"
        },
        "txqlen": {
            "type": "int",
            "attr": "IFLA_TXQLEN"
        },
        "event": {
            "type": "string",
            "attr": "IFLA_EVENT",
            "mutually_exclusive": {
                "event_index": {
                    "type": "uint",
                    "attr": "IFLA_EVENT",
                    "comment": "if event >= ARRAY_SIZE(link_events)"
                }
            }
        },
        "link_type": {
            "type": "string",
            "attr": "ifi_type"
        },
        "address": {
            "type": "string",
            "attr": "IFLA_ADDRESS"
        },
        "link_pointtopoint": {
            "type": "bool",
            "attr": "IFF_POINTOPOINT"
        },
        "broadcast": {
            "type": "string",
            "attr": "IFLA_BROADCAST"
        },
        "link_netnsid": {
            "type": "int",
            "attr": "IFLA_LINK_NETNSID"
        },
        "proto_down": {
            "type": "bool",
            "attr": "IFLA_PROTO_DOWN"
        },

        //
        // if -details
        //

        "promiscuity": {
            "type": "uint",
            "attr": "IFLA_PROMISCUITY"
        },
        "linkinfo": {
            "type": "dict",
            "attr": "IFLA_LINKINFO",
            "dict": {
                "info_kind": {
                    "type": "string",
                    "attr": "IFLA_INFO_KIND"
                },
                "info_data": {
                    "type": "dict",
                    "attr": "IFLA_INFO_DATA",
                    "dict": {}
                },
                "info_xstats": {
                    "type": "dict",
                    "attr": "IFLA_INFO_XSTATS",
                    "dict": {}
                },
                "info_slave_data": {
                    "type": "dict",
                    "attr": "IFLA_INFO_SLAVE_DATA",
                    "dict": {}
                }
            }
        },
        "inet6_addr_gen_mode": {
            "type": "string",
            "attr": "IFLA_INET6_ADDR_GEN_MODE"
        },
        "num_tx_queues": {
            "type": "uint",
            "attr": "IFLA_NUM_TX_QUEUES"
        },
        "num_rx_queues": {
            "type": "uint",
            "attr": "IFLA_NUM_RX_QUEUES"
        },
        "gso_max_size": {
            "type": "uint",
            "attr": "IFLA_GSO_MAX_SIZE"
        },
        "gso_max_segs": {
            "type": "uint",
            "attr": "IFLA_GSO_MAX_SEGS"
        },
        "phys_port_name": {
            "type": "string",
            "attr": "IFLA_PHYS_PORT_NAME"
        },
        "phys_port_id": {
            "type": "string",
            "attr": "IFLA_PHYS_PORT_ID"
        },
        "phys_switch_id": {
            "type": "string",
            "attr": "IFLA_PHYS_SWITCH_ID"
        },
        "ifalias": {
            "type": "string",
            "attr": "IFLA_IFALIAS"
        },
        "stats": {
            "type": "dict",
            "attr": "IFLA_STATS",
            "dict": {
                "rx": {
                    "type": "dict",
                    "dict": {
                        "bytes": {
                            "type": "uint"
                        },
                        "packets": {
                            "type": "uint"
                        },
                        "errors": {
                            "type": "uint"
                        },
                        "dropped": {
                            "type": "uint"
                        },
                        "over_errors": {
                            "type": "uint"
                        },
                        "multicast": {
                            "type": "uint"
                        },
                        "compressed": {
                            "type": "uint"
                        },
                        "length_errors": {
                            "type": "uint"
                        },
                        "crc_errors": {
                            "type": "uint"
                        },
                        "frame_errors": {
                            "type": "uint"
                        },
                        "fifo_errors": {
                            "type": "uint"
                        },
                        "missed_errors": {
                            "type": "uint"
                        },
                        "nohandler": {
                            "type": "uint"
                        }
                    }
                },
                "tx": {
                    "type": "dict",
                    "dict": {
                        "bytes": {
                            "type": "uint"
                        },
                        "packets": {
                            "type": "uint"
                        },
                        "errors": {
                            "type": "uint"
                        },
                        "dropped": {
                            "type": "uint"
                        },
                        "carrier_errors": {
                            "type": "uint"
                        },
                        "collisions": {
                            "type": "uint"
                        },
                        "compressed": {
                            "type": "uint"
                        },
                        "aborted_errors": {
                            "type": "uint"
                        },
                        "fifo_errors": {
                            "type": "uint"
                        },
                        "window_errors": {
                            "type": "uint"
                        },
                        "heartbeat_errors": {
                            "type": "uint"
                        },
                        "carrier_changes": {
                            "type": "uint"
                        }
                    }
                }
            }
        },
        "stats64": {
            "type": "dict",
            "attr": "IFLA_STATS64",
            "dict": {
                "rx": {
                    "type": "dict",
                    "dict": {
                        "bytes": {
                            "type": "uint"
                        },
                        "packets": {
                            "type": "uint"
                        },
                        "errors": {
                            "type": "uint"
                        },
                        "dropped": {
                            "type": "uint"
                        },
                        "over_errors": {
                            "type": "uint"
                        },
                        "multicast": {
                            "type": "uint"
                        },
                        "compressed": {
                            "type": "uint"
                        },
                        "length_errors": {
                            "type": "uint"
                        },
                        "crc_errors": {
                            "type": "uint"
                        },
                        "frame_errors": {
                            "type": "uint"
                        },
                        "fifo_errors": {
                            "type": "uint"
                        },
                        "missed_errors": {
                            "type": "uint"
                        },
                        "nohandler": {
                            "type": "uint"
                        }
                    }
                },
                "tx": {
                    "type": "dict",
                    "dict": {
                        "bytes": {
                            "type": "uint"
                        },
                        "packets": {
                            "type": "uint"
                        },
                        "errors": {
                            "type": "uint"
                        },
                        "dropped": {
                            "type": "uint"
                        },
                        "carrier_errors": {
                            "type": "uint"
                        },
                        "collisions": {
                            "type": "uint"
                        },
                        "compressed": {
                            "type": "uint"
                        },
                        "aborted_errors": {
                            "type": "uint"
                        },
                        "fifo_errors": {
                            "type": "uint"
                        },
                        "window_errors": {
                            "type": "uint"
                        },
                        "heartbeat_errors": {
                            "type": "uint"
                        },
                        "carrier_changes": {
                            "type": "uint"
                        }
                    }
                }
            }
        },
        "vfinfo_list": {
            "type": "array",
            "attr": "IFLA_VFINFO_LIST",
            "array": [
                {
                    "vf": {
                        "type": "int"
                    },
                    "mac": {
                        "type": "string"
                    },
                    "vlan_list": {
                        "type": "array",
                        "attr": "IFLA_VF_VLAN_LIST",
                        "array": [
                            {
                                "vlan": {
                                    "type": "int"
                                },
                                "qos": {
                                    "type": "int"
                                },
                                "protocol": {
                                    "type": "string"
                                }
                            }
                        ]
                    },
                    "vlan": {
                        "type": "int",
                        "attr": "!IFLA_VF_VLAN_LIST && IFLA_VF_VLAN"
                    },
                    "qos": {
                        "type": "int",
                        "attr": "!IFLA_VF_VLAN_LIST && IFLA_VF_VLAN"
                    },
                    "tx_rate": {
                        "type": "int"
                    },
                    "rate": {
                        "type": "dict",
                        "attr": "IFLA_VF_RATE",
                        "dict": {
                            "max_tx": {
                                "type": "int"
                            },
                            "min_tx": {
                                "type": "int"
                            }
                        }
                    },
                    "spoofchk": {
                        "type": "bool",
                        "attr": "IFLA_VF_SPOOFCHK"
                    },
                    "link_state": {
                        "type": "string",
                        "attr": "IFLA_VF_LINK_STATE"
                    },
                    "trust": {
                        "type": "bool",
                        "attr": "IFLA_VF_TRUST"
                    },
                    "query_rss_en": {
                        "type": "bool",
                        "attr": "IFLA_VF_RSS_QUERY_EN"
                    },
                    "stats": {
                        "type": "dict",
                        "attr": "IFLA_VF_STATS",
                        "dict": {
                            "rx": {
                                "type": "dict",
                                "dict": {
                                    "bytes": {
                                        "type": "uint",
                                        "attr": "IFLA_VF_STATS_RX_BYTES"
                                    },
                                    "packets": {
                                        "type": "uint",
                                        "attr": "IFLA_VF_STATS_RX_PACKETS"
                                    },
                                    "multicast": {
                                        "type": "uint",
                                        "attr": "IFLA_VF_STATS_MULTICAST"
                                    },
                                    "broadcast": {
                                        "type": "uint",
                                        "attr": "IFLA_VF_STATS_BROADCAST"
                                    }
                                }
                            },
                            "tx": {
                                "type": "dict",
                                "dict": {
                                    "bytes": {
                                        "type": "uint",
                                        "attr": "IFLA_VF_STATS_TX_BYTES"
                                    },
                                    "packets": {
                                        "type": "uint",
                                        "attr": "IFLA_VF_STATS_TX_PACKETS"
                                    }
                                }
                            }
                        }
                    }
                }
            ]
        }
    }
]

Example with the configuration given previously. Note that the linkinfo
attributes are not populated here; their schemas are provided in each
link type's patch.

$ ip -details -json link show
[{
        "ifindex": 1,
        "ifname": "lo",
        "flags": ["LOOPBACK","UP","LOWER_UP"],
        "mtu": 65536,
        "qdisc": "noqueue",
        "operstate": "UNKNOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "loopback",
        "address": "00:00:00:00:00:00",
        "broadcast": "00:00:00:00:00:00",
        "promiscuity": 0,
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 2,
        "ifname": "eth0",
        "flags": ["BROADCAST","MULTICAST","UP","LOWER_UP"],
        "mtu": 1500,
        "qdisc": "pfifo_fast",
        "operstate": "UP",
        "linkmode": "DEFAULT",
        "group": "default",
        "txqlen": 1000,
        "link_type": "ether",
        "address": "08:00:27:db:31:88",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 0,
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 3,
        "ifname": "swp1",
        "flags": ["BROADCAST","MULTICAST"],
        "mtu": 1500,
        "qdisc": "noop",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "txqlen": 1000,
        "link_type": "ether",
        "address": "08:00:27:5b:b1:75",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 0,
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 10,
        "ifname": "vxlan42",
        "flags": ["BROADCAST","MULTICAST"],
        "mtu": 1500,
        "qdisc": "noop",
        "master": "br0",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ether",
        "address": "4a:d9:91:42:a2:d2",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 1,
        "linkinfo": {
            "info_kind": "vxlan",
            "info_data": {},
            "info_slave_kind": "bridge",
            "info_slave_data": {}
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 11,
        "ifname": "bond0",
        "flags": ["BROADCAST","MULTICAST","MASTER"],
        "mtu": 1500,
        "qdisc": "noop",
        "master": "br0",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ether",
        "address": "e2:aa:7b:17:c5:14",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 1,
        "linkinfo": {
            "info_kind": "bond",
            "info_data": {},
            "info_slave_kind": "bridge",
            "info_slave_data": {}
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 16,
        "num_rx_queues": 16,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 12,
        "ifname": "swp1.50",
        "link": "swp1",
        "flags": ["BROADCAST","MULTICAST","M-DOWN"],
        "mtu": 1500,
        "qdisc": "noop",
        "master": "br0",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ether",
        "address": "08:00:27:5b:b1:75",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 1,
        "linkinfo": {
            "info_kind": "vlan",
            "info_data": {},
            "info_slave_kind": "bridge",
            "info_slave_data": {}
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    },{
        "ifindex": 13,
        "ifname": "br0",
        "flags": ["NO-CARRIER","BROADCAST","MULTICAST","UP"],
        "mtu": 1500,
        "qdisc": "noqueue",
        "operstate": "DOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "link_type": "ether",
        "address": "08:00:27:5b:b1:75",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "promiscuity": 0,
        "linkinfo": {
            "info_kind": "bridge",
            "info_data": {}
        },
        "inet6_addr_gen_mode": "eui64",
        "num_tx_queues": 1,
        "num_rx_queues": 1,
        "gso_max_size": 65536,
        "gso_max_segs": 65535
    }
]

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:40 -07:00
Julien Fortin 6377572f0a ip: ip_print: add new API to print JSON or regular format output
To avoid code duplication and have a lighter impact on most of the files,
these functions handle both stdout (FP context) and JSON output. Using
this API, the changes are easier to read and the code stays as compact
as possible.

Include json_writer.h in ip_common.h to make the lib/json_writer.c
functions available to the new "ip_print" API.
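
A minimal sketch of the idea, assuming a single mode switch: one call emits either a "key":"value" JSON member or the plain-text format. print_str and enum out_mode are hypothetical names for illustration, not the ip_print API, and the sketch does not escape JSON strings, which a real writer must do.

```c
#include <stdio.h>

/* Hypothetical output selector: plain text (FP) or JSON. */
enum out_mode { OUT_FP, OUT_JSON };

/* Write either "key":"value" (JSON mode) or fmt applied to value
 * (plain-text mode) into buf. Returns what snprintf returns.
 * Note: value is not JSON-escaped in this sketch. */
static int print_str(enum out_mode mode, char *buf, size_t len,
		     const char *key, const char *fmt, const char *value)
{
	if (mode == OUT_JSON)
		return snprintf(buf, len, "\"%s\":\"%s\"", key, value);
	return snprintf(buf, len, fmt, value);
}
```

With key "ifname", format "%s: " and value "eth0", plain-text mode produces eth0: and JSON mode produces "ifname":"eth0", so each call site is written once.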

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:40 -07:00
Julien Fortin 7252f16b2d json_writer: add new json handlers (null, float with format, lluint, hu)
Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:40 -07:00
Julien Fortin 5df6077259 ip: add new command line argument -json (mutually exclusive with -color)
Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:40 -07:00
Julien Fortin 959f142863 color: add new COLOR_NONE and disable_color function
Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
2017-08-17 18:02:40 -07:00
David Lebrun 0439990238 man: add documentation for seg6local lwt
This patch adds documentation in the ip-route man page
about the seg6local lightweight tunnel.

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
2017-08-15 16:44:23 -07:00
David Lebrun 8db158b9ca iproute: add support for SRv6 local segment processing
This patch adds support for the seg6local lightweight tunnel
("ip route add ... encap seg6local ...").

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
2017-08-15 16:44:23 -07:00
David Lebrun 00e76d4da3 iproute: add helper functions for SRH processing
This patch adds two helper functions to print and parse
Segment Routing Headers.
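
A minimal C sketch of the parsing side, assuming segments are given as a comma-separated list of IPv6 addresses (e.g. "fc00::1,fc00::2"); parse_segment_list is a hypothetical helper, not the function added by this patch. An on-the-wire SRH stores its segment list in reverse order (RFC 8754), which this sketch leaves to the caller.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Parse a comma-separated list of IPv6 segments into segs[].
 * Returns the number of segments, or -1 on an empty element,
 * a malformed address, or more than max entries. */
static int parse_segment_list(const char *list, struct in6_addr *segs, int max)
{
	char buf[INET6_ADDRSTRLEN];
	int n = 0;

	for (;;) {
		const char *comma = strchr(list, ',');
		size_t len = comma ? (size_t)(comma - list) : strlen(list);

		if (n >= max || len == 0 || len >= sizeof(buf))
			return -1;
		memcpy(buf, list, len);
		buf[len] = '\0';
		if (inet_pton(AF_INET6, buf, &segs[n]) != 1)
			return -1;
		n++;
		if (!comma)
			return n;
		list = comma + 1; /* continue after the separator */
	}
}
```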

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
2017-08-15 16:44:23 -07:00
Stephen Hemminger b7f7c1b817 include: add pfkeyv2.h drop ipv6.h
pfkeyv2.h is included by ipsec.
linux/ipv6.h is not included by any code in current tree.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-15 16:43:16 -07:00
Stephen Hemminger e0495b84ab seg6: add include/linux/seg6_local.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-15 16:35:30 -07:00
Stephen Hemminger 3af3d358a3 more BPF headers update
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-10 16:42:35 -07:00
Stephen Hemminger 16ab6c47ba Merge branch 'master' into net-next 2017-08-10 16:41:59 -07:00
Daniel Borkmann 8cc360fe48 bpf: unbreak libelf linkage for bpf obj loader
Commit 69fed534a5 ("change how Config is used in Makefile's") moved
HAVE_MNL specific CFLAGS/LDLIBS for building with libmnl out of the
top level Makefile into sub-Makefiles. However, it also removed the
HAVE_ELF specific CFLAGS/LDLIBS entirely, which breaks the BPF object
loader for tc and ip with "No ELF library support compiled in." despite
having libelf detected in the configure script. Fix it for HAVE_ELF
the same way 69fed534a5 did for HAVE_MNL.

Fixes: 69fed534a5 ("change how Config is used in Makefile's")
Reported-by: Jeffrey Panneman <jeffrey.panneman@tno.nl>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-08-10 16:40:02 -07:00
Stephen Hemminger 5926276a2a Merge branch 'master' into net-next 2017-08-09 09:15:30 -07:00
David Ahern fb6cb30774 lib: Dump ext-ack string by default
In time, errfn can be implemented for link, route, etc. commands to
give a much more detailed response (e.g., point to the attribute
that failed). Doing so is much more complicated, since it requires
processing the message and converting attribute IDs to names.

In any case the error string returned by the kernel should be dumped
to the user, so make that happen now.

Signed-off-by: David Ahern <dsahern@gmail.com>
2017-08-09 09:14:01 -07:00
Stephen Hemminger 7ef36c8cea Merge branch 'master' into net-next 2017-08-09 09:11:48 -07:00
Stephen Hemminger fcfcc40b7d vti: print keys in hex not dotted notation
The ikey and okey value are normal u32 values. The input accepts
them in dotted, hex or decimal form. For output, hex seems like
the best form since they are not really addresses.
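
For illustration, a sketch of the hex formatting, assuming the key arrives in network byte order as a 32-bit netlink attribute; format_key_hex is a hypothetical name, not the vti code.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>

/* Format a tunnel key (network byte order) as hex instead of as a
 * dotted quad. Returns what snprintf returns. */
static int format_key_hex(char *buf, size_t len, uint32_t key_be)
{
	return snprintf(buf, len, "0x%x", ntohl(key_be));
}
```

A key entered as 42 (or in dotted form as 0.0.0.42) would then print as 0x2a.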

Suggested-by: Christian Langrock <christian.langrock@secunet.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-09 09:11:02 -07:00
Stephen Hemminger 69fed534a5 change how Config is used in Makefile's
The recent LIBMNL changes were made more difficult to debug because
of how Config is handled in a clean make. The Config file is generated
by the top-level make, but since it is not recursive, the generated values
would not be visible on a clean make.

The change is to not include Config in the top-level make, and to move
all the conditionals down into the sub-makefiles. Not ideal, but better
than going the full autoconf route, or forcing a separate configure
step.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-09 09:10:52 -07:00
Stephen Hemminger e9155685b7 Merge branch 'master' into net-next 2017-08-09 08:41:34 -07:00
Stephen Hemminger 2a80154fde vti6: fix local/remote any addr handling
Consistent with the IPv4 behavior of 'ip', it should be possible
to omit the local and remote address arguments.
Without this patch, omitting these parameters would lead to
uninitialized memory being interpreted as IPv6 addresses.
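
A minimal sketch of the idea behind the fix, assuming a hypothetical struct tun6_params: initialize both endpoints to the unspecified address (::) before parsing, so an omitted argument means "any" rather than stack garbage.

```c
#include <netinet/in.h>

/* Hypothetical tunnel endpoint pair, for illustration only. */
struct tun6_params {
	struct in6_addr laddr;
	struct in6_addr raddr;
};

/* Default both endpoints to :: ("any") so that any argument the user
 * omits is well defined when the request is built. */
static void tun6_params_init(struct tun6_params *p)
{
	p->laddr = in6addr_any;
	p->raddr = in6addr_any;
}
```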

Reported-by: Christian Langrock <christian.langrock@secunet.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-09 08:39:27 -07:00
Stephen Hemminger 6ff66acc60 tc, ip: more Makefile updates for LIBMNL
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-09 08:38:51 -07:00
Stephen Hemminger 96421f92ef include: update headers from net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-09 08:35:26 -07:00
Stephen Hemminger ee66d60ee8 Merge branch 'master' into net-next 2017-08-09 08:34:38 -07:00
Alexander Alemayhu 4a340fe9b4 examples/bpf: update list of examples
Remove deleted examples and add the new map in map example.

Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-08-09 08:34:13 -07:00
Stephen Hemminger 089f85694a lib: need to pass LIBMNL flag
Missed on earlier conversion.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-09 08:33:31 -07:00
Stephen Hemminger 9d08319d08 Merge branch 'master' into net-next 2017-08-07 12:29:19 -07:00
Stephen Hemminger 7d23fa5591 lib: fix extended ack with and without libmnl
The code was always building without libmnl support, so it was
doing nothing.

Fixes: b6432e68ac ("iproute: Add support for extended ack to rtnl_talk")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-07 12:01:49 -07:00
Jamal Hadi Salim 5c8176ddbc actions: update the man page to describe the "since" time filter
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2017-08-04 13:16:51 -07:00
Jamal Hadi Salim 9e71352581 tc actions: Improved batching and time filtered dumping
Dump more than TCA_ACT_MAX_PRIO actions per batch when the kernel
supports it.

Introduce the keyword "since" for time-based filtering of actions.
For example, with 400 actions bound to 400 filters, right after
installation the updated tc, with the time of interest set to
120 seconds earlier, lists all 400 actions:
prompt$ hackedtc actions ls action gact since 120000| grep index | wc -l
400

Go get some coffee, wait more than 120 seconds, and try again:

prompt$ hackedtc actions ls action gact since 120000 | grep index | wc -l
0

Let's see a filter bound to one of these actions:
....
filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10  (rule hit 2 success 1)
  match 7f000002/ffffffff at 12 (success 1 )
    action order 1: gact action pass
     random type none pass val 0
     index 23 ref 2 bind 1 installed 1145 sec used 802 sec
    Action statistics:
    Sent 84 bytes 1 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0
...

That coffee took a while, no? It was good.

Now let's ping -c 1 127.0.0.2, then list the actions again:
prompt$ hackedtc actions ls action gact since 120 | grep index | wc -l
1

More details please:
prompt$ hackedtc -s actions ls action gact since 120000

    action order 0: gact action pass
     random type none pass val 0
     index 23 ref 2 bind 1 installed 1270 sec used 30 sec
    Action statistics:
    Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

And the filter?
filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10  (rule hit 4 success 2)
  match 7f000002/ffffffff at 12 (success 2 )
    action order 1: gact action pass
     random type none pass val 0
     index 23 ref 2 bind 1 installed 1324 sec used 84 sec
    Action statistics:
    Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2017-08-04 13:16:51 -07:00
Stephen Hemminger c936d85e19 Merge branch 'master' into net-next 2017-08-04 13:16:47 -07:00
Casey Callendrello d6a4076b6b netns: make /var/run/netns bind-mount recursive
When ip netns {add|delete} is first run, it bind-mounts /var/run/netns
on top of itself, then marks it as shared. However, if there are already
bind-mounts in the directory from other tools, these would not be
propagated. Fix this by recursively bind-mounting.
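
Here is a minimal sketch of the fixed sequence. It assumes the logic is: make the directory a shared mount, and if it is not a mount point yet, bind-mount it onto itself first. The function name is hypothetical, and the function needs CAP_SYS_ADMIN to succeed. The key detail is that MS_REC turns the self bind mount into a recursive one, so mounts already under the directory are carried along.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/mount.h>

/* Make dir a shared mount point. If dir is not a mount point yet
 * (mount() fails with EINVAL), bind-mount it onto itself first.
 * Returns 0 on success, -1 on failure. */
static int make_dir_shared_mount(const char *dir)
{
	int made_mount = 0;

	while (mount("", dir, "none", MS_SHARED | MS_REC, NULL)) {
		if (errno != EINVAL || made_mount) {
			perror("mount (make shared)");
			return -1;
		}
		/* MS_REC makes this an rbind: existing sub-mounts come along */
		if (mount(dir, dir, "none", MS_BIND | MS_REC, NULL)) {
			perror("mount (bind)");
			return -1;
		}
		made_mount = 1;
	}
	return 0;
}
```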

Signed-off-by: Casey Callendrello <casey.callendrello@coreos.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
2017-08-04 12:08:52 -07:00
Stephen Hemminger 0562af4a07 Merge branch 'master' into net-next 2017-08-04 12:05:31 -07:00
Stephen Hemminger aba9c23a6e ss: enclose IPv6 address in brackets
Based on patch by Lehner Florian <dev@der-flo.net>

Adds support for RFC2732 IPv6 address format with brackets.
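
Here is a minimal sketch of the formatting, assuming the address is already in binary form. `format_endpoint()` is a hypothetical helper, not the ss code. It brackets any textual address that contains a colon, so the port separator stays unambiguous:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Write "addr:port" into buf, wrapping IPv6 addresses in brackets
 * (RFC 2732 style). Returns snprintf()'s result, or -1 on error. */
static int format_endpoint(int family, const void *addr, unsigned int port,
			   char *buf, size_t len)
{
	char a[INET6_ADDRSTRLEN];

	if (!inet_ntop(family, addr, a, sizeof(a)))
		return -1;
	/* a textual address containing ':' is IPv6 and needs brackets */
	if (strchr(a, ':'))
		return snprintf(buf, len, "[%s]:%u", a, port);
	return snprintf(buf, len, "%s:%u", a, port);
}
```

For example, 2001:db8::1 with port 22 becomes [2001:db8::1]:22, while 10.0.0.1 with port 80 stays 10.0.0.1:80.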

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-04 12:04:04 -07:00
Stephen Hemminger 566421c8f7 Merge branch 'master' into net-next 2017-08-04 09:54:44 -07:00
Stephen Hemminger b6432e68ac iproute: Add support for extended ack to rtnl_talk
Add support for extended ack error reporting via libmnl.
Add a new function rtnl_talk_extack that takes a callback as an input
arg. If a netlink response contains extack attributes, the callback
is invoked with the error string, the offset in the message and a
pointer to the message returned by the kernel.

If iproute2 is built without libmnl, it will still work but
extended error reports from kernel will not be available.
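
To illustrate what the callback receives, here is a minimal sketch that extracts the extended ack attributes (NLMSGERR_ATTR_MSG and NLMSGERR_ATTR_OFFS from <linux/netlink.h>) from an ack without libmnl. The callback type, `parse_extack()` and `store_extack()` are hypothetical and are not the rtnl_talk_extack interface. The sketch assumes the caller has already located the attribute area that follows struct nlmsgerr in the ack.

```c
#include <linux/netlink.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback: error string, offset of the offending attribute
 * in our request, the echoed request header (may be NULL), user data. */
typedef int (*extack_cb_t)(const char *errmsg, uint32_t off,
			   const struct nlmsghdr *inner, void *arg);

/* Walk the attributes that follow struct nlmsgerr in an ack and pass
 * NLMSGERR_ATTR_MSG / NLMSGERR_ATTR_OFFS to cb. Returns 0 if there is no
 * error message, otherwise cb's return value. */
static int parse_extack(const void *attrs, size_t len,
			const struct nlmsghdr *inner, extack_cb_t cb, void *arg)
{
	const unsigned char *p = attrs;
	const char *msg = NULL;
	uint32_t off = 0;

	while (len >= NLA_HDRLEN) {
		const struct nlattr *nla = (const struct nlattr *)p;
		size_t step;

		if (nla->nla_len < NLA_HDRLEN || nla->nla_len > len)
			break;		/* malformed attribute: stop */
		switch (nla->nla_type & NLA_TYPE_MASK) {
		case NLMSGERR_ATTR_MSG:		/* NUL-terminated string */
			msg = (const char *)(p + NLA_HDRLEN);
			break;
		case NLMSGERR_ATTR_OFFS:	/* u32 byte offset */
			if (nla->nla_len >= NLA_HDRLEN + sizeof(off))
				memcpy(&off, p + NLA_HDRLEN, sizeof(off));
			break;
		}
		step = NLA_ALIGN(nla->nla_len);
		if (step >= len)
			break;
		p += step;
		len -= step;
	}
	return msg ? cb(msg, off, inner, arg) : 0;
}

/* Example callback: copy the report into a caller-provided struct. */
struct extack_report {
	char msg[128];
	uint32_t off;
};

static int store_extack(const char *errmsg, uint32_t off,
			const struct nlmsghdr *inner, void *arg)
{
	struct extack_report *r = arg;

	(void)inner;
	snprintf(r->msg, sizeof(r->msg), "%s", errmsg);
	r->off = off;
	return 1;
}
```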

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-04 09:54:00 -07:00
Ido Schimmel 4b3409d863 iproute: Display offload indication per-nexthop
Since kernel commit 475abbf1ef67 ("ipv4: fib: Set offload indication
according to nexthop flags") offload indication is reported on a
per-nexthop basis.

Adjust iproute2 to display it.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
2017-08-03 16:13:16 -07:00
Stephen Hemminger 72e4ea5eb6 update headers from 4.13 net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-03 16:12:40 -07:00
Stephen Hemminger 89bcb455a1 Merge branch 'master' into net-next 2017-08-03 16:11:22 -07:00
Stephen Hemminger 620fc6696d tc: fix m_simple usage
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-03 16:10:18 -07:00
Phil Sutter e2a055dd23 tc-simple: Fix documentation
- CONTROL has to come last, otherwise 'index' applies to gact and not
  simple itself.
- Man page wasn't updated to reflect syntax changes.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-03 16:02:44 -07:00
Phil Sutter 34705c807a Really fix get_addr() and get_prefix() error messages
Both functions take the desired address family as a parameter. So using
that to notify the user what address family was expected is correct,
unlike using dst->family which will tell the user only what address
family was specified.

The situation which commit 334af76143 tried to fix was when 'ip'
would accept addresses from multiple families. In that case, the family
parameter is set to AF_UNSPEC so that get_addr_1() may accept any valid
address.

This patch introduces a wrapper around family_name() which returns the
string "any valid" for AF_UNSPEC instead of the three question marks
unsuitable for use in error messages.
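
Here is a minimal sketch of the wrapper. It uses a simplified stand-in for family_name() that only knows the two families used in the messages below. The function names and the message-building helper are illustrative:

```c
#include <stdio.h>
#include <sys/socket.h>

/* Simplified stand-in for family_name(); other families are omitted. */
static const char *family_name(int family)
{
	switch (family) {
	case AF_INET:	return "inet";
	case AF_INET6:	return "inet6";
	default:	return "???";
	}
}

/* For error messages, AF_UNSPEC means "any valid" instead of "???". */
static const char *family_name_verbose(int family)
{
	return family == AF_UNSPEC ? "any valid" : family_name(family);
}

/* Build the "<family> <what> is expected rather than ..." message. */
static int format_addr_error(char *buf, size_t len, int family,
			     const char *what, const char *arg)
{
	return snprintf(buf, len, "Error: %s %s is expected rather than \"%s\".",
			family_name_verbose(family), what, arg);
}
```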

Tests for AF_UNSPEC:

| # ip a a 256.10.166.1/24 dev d0
| Error: any valid prefix is expected rather than "256.10.166.1/24".

| # ip neighbor add proxy 2001:db8::g dev d0
| Error: any valid address is expected rather than "2001:db8::g".

Tests for explicit address family:

| # ip -6 addrlabel add prefix 1.1.1.1/24 label 123
| Error: inet6 prefix is expected rather than "1.1.1.1/24".

| # ip -4 addrlabel add prefix dead:beef::1/24 label 123
| Error: inet prefix is expected rather than "dead:beef::1/24".

Reported-by: Jaroslav Aster <jaster@redhat.com>
Fixes: 334af76143 ("fix get_addr() and get_prefix() error messages")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-03 16:01:03 -07:00
Stephen Hemminger cc21ebe843 update headers from 4.13-rc4
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-03 15:57:26 -07:00
Phil Sutter 3da3ebfca8 bpf: Make bytecode-file reading a little more robust
bpf_parse_string() will now correctly handle:

- Extraneous whitespace,
- OPs on multiple lines and
- overlong file names.

The added ability to have OPs on multiple lines (as e.g. tcpdump
prints them) is mostly a side effect of fixing the detection of
malformed bytecode files that have random content on a second line,
e.g.:

| 4,40 0 0 12,21 0 1 2048,6 0 0 262144,6 0 0 0
| foobar
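
Here is a minimal sketch of a stricter parser for this format, based on the classic `struct sock_filter` from <linux/filter.h>. `parse_bpf_ops()` is hypothetical and is not bpf_parse_string(). It also leaves out the file-name handling. It treats a comma or a newline as the op separator and tolerates extra whitespace. Anything other than whitespace after the last op is rejected, so the "foobar" line above makes it fail.

```c
#include <ctype.h>
#include <linux/filter.h>
#include <stdint.h>
#include <stdlib.h>

/* Skip blanks, then consume exactly one op separator: ',' or '\n'. */
static const char *skip_sep(const char *p)
{
	while (*p == ' ' || *p == '\t' || *p == '\r')
		p++;
	return (*p == ',' || *p == '\n') ? p + 1 : NULL;
}

/* Parse "N,code jt jf k,..." into ops[]. Returns N, or -1 if the input
 * is malformed or has trailing garbage. */
static int parse_bpf_ops(const char *s, struct sock_filter *ops, int max_ops)
{
	unsigned long n, v[4];
	const char *p = s;
	char *end;

	n = strtoul(p, &end, 10);
	if (end == p || n == 0 || n > (unsigned long)max_ops)
		return -1;
	p = end;

	for (unsigned long i = 0; i < n; i++) {
		p = skip_sep(p);
		if (!p)
			return -1;
		for (int j = 0; j < 4; j++) {
			/* strtoul() skips leading whitespace itself */
			v[j] = strtoul(p, &end, 10);
			if (end == p)
				return -1;
			p = end;
		}
		if (v[0] > UINT16_MAX || v[1] > UINT8_MAX ||
		    v[2] > UINT8_MAX || v[3] > UINT32_MAX)
			return -1;
		ops[i] = (struct sock_filter){ .code = v[0], .jt = v[1],
					       .jf = v[2], .k = v[3] };
	}
	while (isspace((unsigned char)*p))
		p++;
	return *p == '\0' ? (int)n : -1;	/* e.g. rejects "foobar" */
}
```

Because a newline counts as a separator, the one-op-per-line layout (count first, then one "code jt jf k" per line) is accepted as well.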

Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-08-03 15:56:48 -07:00
Stephen Hemminger f73ac674d0 ip: change flag names to an array
For most of the address flags, use a table of values rather than
open coding every value. This makes the inevitable addition of new
flags easier.

This also fixes the missing stable-privacy flag.
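
Here is a minimal sketch of the table-driven approach, using the IFA_F_* constants from <linux/if_addr.h>. The table contents and `format_ifa_flags()` are illustrative rather than the actual ip code. Bits without a table entry are printed in hex, so nothing is silently dropped.

```c
#include <linux/if_addr.h>
#include <stdio.h>

/* Flag name table: supporting a new flag is a one-line change. */
static const struct {
	unsigned int mask;
	const char *name;
} ifa_flag_names[] = {
	{ IFA_F_SECONDARY,	"secondary" },
	{ IFA_F_NODAD,		"nodad" },
	{ IFA_F_OPTIMISTIC,	"optimistic" },
	{ IFA_F_DADFAILED,	"dadfailed" },
	{ IFA_F_HOMEADDRESS,	"home" },
	{ IFA_F_DEPRECATED,	"deprecated" },
	{ IFA_F_TENTATIVE,	"tentative" },
	{ IFA_F_PERMANENT,	"permanent" },
	{ IFA_F_MANAGETEMPADDR,	"mngtmpaddr" },
	{ IFA_F_NOPREFIXROUTE,	"noprefixroute" },
	{ IFA_F_MCAUTOJOIN,	"autojoin" },
	{ IFA_F_STABLE_PRIVACY,	"stable-privacy" },
};

/* Render known flags by name into buf (len > 0); leftover bits in hex. */
static void format_ifa_flags(unsigned int flags, char *buf, size_t len)
{
	size_t used = 0;
	int n;

	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(ifa_flag_names) / sizeof(ifa_flag_names[0]); i++) {
		if (!(flags & ifa_flag_names[i].mask))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", ifa_flag_names[i].name);
		if (n < 0 || (size_t)n >= len - used)
			return;		/* out of space: output truncated */
		used += n;
		flags &= ~ifa_flag_names[i].mask;
	}
	if (flags)
		snprintf(buf + used, len - used, "%sflags 0x%x",
			 used ? " " : "", flags);
}
```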

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-01 08:37:53 -07:00
Stephen Hemminger c369dc803b Update headers from net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-07-31 20:55:27 -07:00
Stephen Hemminger 1f215a308b Merge branch 'master' into net-next 2017-07-31 17:05:09 -07:00
Hangbin Liu 5ce897a03b utils: return default family when rtm_family is not RTNL_FAMILY_IPMR/IP6MR
When we get a multicast route, the rtm_type is RTN_MULTICAST, but the
rtm_family may be AF_INET. If we only compare the family against
RTNL_FAMILY_IPMR, we print a malformed address, e.g.:

+ ip -4 route add multicast 172.111.1.1 dev em1 table main

Before fix:
+ ip route list type multicast table main
multicast ac6f:101:800:400:400:0:3c00:0 dev em1 scope link

After fix:
+ ip route list type multicast table main
multicast 172.111.1.1 dev em1 scope link
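
Here is a sketch of the corrected mapping, using RTNL_FAMILY_IPMR and RTNL_FAMILY_IP6MR from <linux/rtnetlink.h>. The function name is illustrative. Only the multicast-routing pseudo families are translated, and any other family is returned unchanged:

```c
#include <sys/socket.h>
#include <linux/rtnetlink.h>

/* Map a route's rtm_family to the address family used for printing.
 * An AF_INET multicast route must stay AF_INET. */
static int get_real_family(int rtm_type, int rtm_family)
{
	if (rtm_type != RTN_MULTICAST)
		return rtm_family;
	if (rtm_family == RTNL_FAMILY_IPMR)
		return AF_INET;
	if (rtm_family == RTNL_FAMILY_IP6MR)
		return AF_INET6;
	return rtm_family;
}
```

By contrast, treating every non-IPMR multicast route as AF_INET6 decodes the bytes of 172.111.1.1 (ac 6f 01 01) as the start of an IPv6 address. This is consistent with the ac6f:101:... output shown above.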

Fixes: 56e3eb4c34 ("ip: route: fix multicast route dumps")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Phil Sutter <phil@nwl.cc>
2017-07-27 11:27:17 -07:00
Matteo Croce d3f0b09197 netns: more input validation
ip netns accepts invalid input as namespace name like an empty string or a
string longer than the maximum file name length.
Check that the netns name is not empty and that its length does not exceed NAME_MAX.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
2017-07-27 11:25:20 -07:00
Girish Moodalbail c2a85c3bcd geneve: support for modifying geneve device
Ability to change geneve device attributes was added to kernel through
commit 5b861f6baa3a ("geneve: add rtnl changelink support"), however one
cannot do the same through the ip-link(8) command. Changing the allowed
geneve device attributes using 'ip link set <geneve_name> type geneve id
<geneve_id> <allowed_attributes>' currently fails with 'operation not
supported' error.  This patch adds support for it.

Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
2017-07-27 11:22:50 -07:00
Stephen Hemminger 08f5fdb201 Merge branch 'master' into net-next 2017-07-25 11:59:13 -07:00
Daniel Borkmann 95ae9a4870 bpf: fix mnt path when from env
When the bpf fs mount path comes from the environment, the behavior is
currently broken because we continue to search the default paths. Fix this.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-07-25 11:43:28 -07:00
Daniel Borkmann ecb05c0f99 bpf: improve error reporting around tail calls
Currently, it's still quite hard to figure out if a prog passed the
verifier, but later gets rejected due to different tail call ownership.
Figure out whether that is the case and provide appropriate error
messages to the user.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-07-25 11:43:28 -07:00
Élie Bouttier 2f406f2d0b ip route: replace exits with returns
This patch replaces exits with returns in ip route
commands.

This allows processing to continue when invoked with ip -batch.

Signed-off-by: Élie Bouttier <elie@bouttier.eu>
2017-07-25 11:37:49 -07:00
Philip Prindeville adbb296594 iproute2: add support for GRE ignore-df knob
In the presence of firewalls which improperly block ICMP Unreachable
(including Fragmentation Required) messages, Path MTU Discovery is
prevented from working.

The workaround is to handle IPv4 payloads opaquely, ignoring the DF
bit.

Kernel commit 22a59be8b7693eb2d0897a9638f5991f2f8e4ddd ("net: ipv4:
Add ability to have GRE ignore DF bit in IPv4 payloads") is
complemented by this user-space changeset which exposes control of
this setting.

Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Philip Prindeville <philipp@redfish-solutions.com>
2017-07-20 17:25:54 -07:00
Matteo Croce 79928fd055 netns: avoid directory traversal
ip netns keeps track of created namespaces with bind mounts named
/var/run/netns/<namespace>. No input sanitization is done, allowing creation and
deletion of files relative to /var/run/netns or, if the path is nonexistent or
invalid, the creation of "untracked" namespaces (invisible to the tool).

This commit denies creation or deletion of namespaces with names containing
"/" or matching exactly "." or "..".

Signed-off-by: Matteo Croce <mcroce@redhat.com>
2017-07-20 17:23:52 -07:00
Nikhil Gajendrakumar 44e0f6f3cd bridge: this patch adds json support for bridge mdb show
This patch adds json output to bridge mdb show

Normal Output:
$ bridge -d -s mdb show
dev br0 port swp3 grp 239.0.0.1 temp  vid 128 172.26
dev br0 port swp3 grp 239.0.0.1 temp  vid 64 172.26
dev br0 port swp2 grp 239.0.0.2 temp  vid 1024 172.26
dev br0 port swp2 grp 239.0.0.2 temp  vid 256 172.26
dev br0 port swp2 grp 239.0.0.2 temp  vid 1 172.26
dev br0 port swp3 grp 239.0.0.1 temp  vid 1 172.26
router ports on br0: swp4    0.00 permanent
router ports on br0: swp5    0.00 permanent

Json Output:
$ bridge -d -s -j mdb show
{
    "mdb": [{
            "dev": "br0",
            "port": "swp3",
            "grp": "239.0.0.1",
            "state": "temp",
            "vid": 128,
            "timer": " 166.74"
        },{
            "dev": "br0",
            "port": "swp3",
            "grp": "239.0.0.1",
            "state": "temp",
            "vid": 64,
            "timer": " 166.74"
        },{
            "dev": "br0",
            "port": "swp2",
            "grp": "239.0.0.2",
            "state": "temp",
            "vid": 1024,
            "timer": " 166.74"
        },{
            "dev": "br0",
            "port": "swp2",
            "grp": "239.0.0.2",
            "state": "temp",
            "vid": 256,
            "timer": " 166.74"
        },{
            "dev": "br0",
            "port": "swp2",
            "grp": "239.0.0.2",
            "state": "temp",
            "vid": 1,
            "timer": " 166.74"
        },{
            "dev": "br0",
            "port": "swp3",
            "grp": "239.0.0.1",
            "state": "temp",
            "vid": 1,
            "timer": " 166.74"
        }
    ],
    "router": {
        "br0": [{
                "port": "swp4",
                "timer": "   0.00",
                "type": "permanent"
            },{
                "port": "swp5",
                "timer": "   0.00",
                "type": "permanent"
            }
        ]
    }
}

Signed-off-by: Nikhil Gajendrakumar <nikhil@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2017-07-18 17:32:38 -07:00
Stephen Hemminger edadd6b076 Merge branch 'master' into net-next 2017-07-18 17:31:09 -07:00
Matteo Croce b09515553f tc: fix typo in manpage
Fix a typo in the 'tc' manpage and reword some sentences.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
2017-07-18 17:25:59 -07:00
Daniel Borkmann 779525cd77 bpf: dump id/jited info for cls/act programs
Make use of TCA_BPF_ID/TCA_ACT_BPF_ID that we exposed and print the ID
of the programs loaded and use the new BPF_OBJ_GET_INFO_BY_FD command
for dumping further information about the program, currently whether
the attached program is jited.
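
Here is a minimal sketch of querying a program fd with BPF_OBJ_GET_INFO_BY_FD through the raw bpf(2) syscall. The helper name is hypothetical. Treating a non-zero `jited_prog_len` as "jited" is an assumption about how the dump decides. The kernel may also report 0 to unprivileged callers.

```c
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Query a BPF program fd: store its ID in *id and return 1 if it is
 * JITed, 0 if not, or -1 if the query fails (bad fd, old kernel, ...). */
static int bpf_prog_query_jited(int prog_fd, uint32_t *id)
{
	struct bpf_prog_info info;
	union bpf_attr attr;

	memset(&info, 0, sizeof(info));
	memset(&attr, 0, sizeof(attr));
	attr.info.bpf_fd = prog_fd;
	attr.info.info_len = sizeof(info);
	attr.info.info = (uint64_t)(unsigned long)&info;

	if (syscall(__NR_bpf, BPF_OBJ_GET_INFO_BY_FD, &attr, sizeof(attr)) < 0)
		return -1;
	*id = info.id;
	/* a non-zero JITed image length is taken to mean "jited" */
	return info.jited_prog_len > 0;
}
```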

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-07-18 17:20:45 -07:00
Daniel Borkmann 612ff099a1 bpf: support loading map in map from obj
Add support for map in map in the loader and add a small example program.
The outer map uses inner_id to reference a bpf_elf_map with a given ID
as the inner type. Loading maps is done in three passes: i) all non-map-in-map
maps are loaded, ii) all map-in-map maps are loaded based on the
inner_id map spec of a non-map-in-map map with the corresponding id, and iii)
the related inner maps are attached to the map in map at the given inner_idx
key. Pinned objects are assumed to be managed externally, so they are
only retrieved from BPF fs.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-07-18 17:20:45 -07:00
Daniel Borkmann 23b2ed2d64 bpf: remove obsolete samples
Remove old samples that were added in pre BPF fs days and used
file descriptor passing. This method is long obsolete and discouraged,
since BPF fs is the default way, as used in the other samples.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-07-18 17:20:45 -07:00
Roopa Prabhu 2e86ed542d iproute: extend route get for mpls routes
This patch extends route get to support mpls specific
route attributes like RTA_NEWDST.

Input:
RTA_DST - input label
RTA_NEWDST - labels in packet for multipath selection

By default, the getroute handler returns the matched
nexthop label, via and oif.

With the fibmatch keyword (RTM_F_FIB_MATCH flag), the full matched
route is returned.

example:
$ip -f mpls route show
101
        nexthop as to 102/103 via inet 172.16.2.2 dev virt1-2
        nexthop as to 302/303 via inet 172.16.12.2 dev virt1-12
201
        nexthop as to 202/203 via inet6 2001:db8:2::2 dev virt1-2
        nexthop as to 402/403 via inet6 2001:db8:12::2 dev virt1-12

$ip -f mpls route get 103
RTNETLINK answers: Network is unreachable

$ip -f mpls route get 101
101 as to 102/103 via inet 172.16.2.2 dev virt1-2

$ip -f mpls route get as to 302/303 101
101 as to 302/303 via inet 172.16.12.2 dev virt1-12

$ip -f mpls route get fibmatch 103
RTNETLINK answers: Network is unreachable

$ip -f mpls route get fibmatch 101
101
        nexthop as to 102/103 via inet 172.16.2.2 dev virt1-2
        nexthop as to 302/303 via inet 172.16.12.2 dev virt1-12

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2017-07-18 17:17:27 -07:00
Stephen Hemminger 89ec74a3ea remove duplicated #include's
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-07-18 17:17:15 -07:00
Stephen Hemminger 517771e271 update headers to 4.13-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-07-18 17:16:56 -07:00
Stephen Hemminger f0b9b79572 update kernel headers from net-next
Just as the net-next merge window opens.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-07-17 18:32:03 -07:00
Stephen Hemminger ef513fb04e Merge branch 'master' into net-next 2017-07-05 09:12:16 -07:00
Stephen Hemminger cdb90ce406 v4.12.0 2017-07-05 09:07:31 -07:00
Stephen Hemminger 79e7918a2a Merge branch 'master' into net-next 2017-07-05 09:07:30 -07:00
Krister Johansen 288c28bc11 iptunnel: add support for mpls/ip to ipip tunnels
Original-Author: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
2017-07-05 09:04:59 -07:00
Krister Johansen f005b700cf iptunnel: add support for mpls/ip to sit tunnels
Original-Author: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
2017-07-05 09:04:59 -07:00
Krister Johansen 7baca946c4 iptunnel: document mode parameter for sit tunnels
Original-Author: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
2017-07-05 09:04:58 -07:00
Lucas Bates 2ce280de9f Add new man page for tc actions.
This page is to highlight all operations and options that are
applicable to all tc actions.

Signed-off-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2017-07-05 09:00:37 -07:00
Roman Mashak 81ba3e6fbd tc: updated ife man page.
Explain when skbmark encoding may fail.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-06-30 15:27:07 -07:00
Jakub Kicinski 1b5e809466 bpf: allow requesting XDP HW offload
Let XDP link set command request that the program be offloaded.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-06-27 16:13:55 -07:00
Jakub Kicinski 1468381415 bpf: add xdpdrv for requesting XDP driver mode
Allow user to select XDP DRV_MODE flag by using xdpdrv keyword
instead of xdp or xdpgeneric.
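
Here is a minimal sketch of the keyword-to-flag mapping, using the XDP_FLAGS_* bits from <linux/if_link.h>. `xdp_mode_flags()` is hypothetical. The "xdpoffload" keyword for the hardware offload request added in 1b5e809466 above is an assumption:

```c
#include <linux/if_link.h>
#include <stdint.h>
#include <string.h>

/* Map an ip-link XDP keyword to the XDP_FLAGS_* mode bit requested from
 * the kernel. Returns -1 for an unknown keyword. */
static int xdp_mode_flags(const char *kw, uint32_t *flags)
{
	if (strcmp(kw, "xdp") == 0)
		*flags = 0;			/* driver mode if supported, else generic */
	else if (strcmp(kw, "xdpgeneric") == 0)
		*flags = XDP_FLAGS_SKB_MODE;	/* generic, skb-based path */
	else if (strcmp(kw, "xdpdrv") == 0)
		*flags = XDP_FLAGS_DRV_MODE;	/* native driver mode */
	else if (strcmp(kw, "xdpoffload") == 0)
		*flags = XDP_FLAGS_HW_MODE;	/* offload to the NIC */
	else
		return -1;
	return 0;
}
```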

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-06-27 16:13:55 -07:00
Jakub Kicinski 2de3379701 bpf: print xdp offloaded mode
Add interpretation of XDP_ATTACHED_HW mode on dump.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-06-27 16:13:55 -07:00
Martin KaFai Lau 0b4ea60b5a bpf: Add support for IFLA_XDP_PROG_ID
This patch adds support to the newly added IFLA_XDP_PROG_ID.

./ip link show dev eth0
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric/id:2 qdisc [...]
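
Here is a minimal sketch of how the mode and the new program ID could be combined into the string shown above. The example output confirms only "xdpgeneric/id:2". The other mode names (including "xdpoffload" for XDP_ATTACHED_HW) and `format_xdp()` are assumptions:

```c
#include <linux/if_link.h>
#include <stdint.h>
#include <stdio.h>

/* Format the XDP part of an ip-link line from the IFLA_XDP_ATTACHED mode
 * and IFLA_XDP_PROG_ID (id == 0: not reported). buf must have len > 0. */
static int format_xdp(char *buf, size_t len, uint8_t mode, uint32_t id)
{
	const char *name;

	switch (mode) {
	case XDP_ATTACHED_NONE:
		buf[0] = '\0';		/* nothing attached: print nothing */
		return 0;
	case XDP_ATTACHED_DRV:	name = "xdp"; break;
	case XDP_ATTACHED_SKB:	name = "xdpgeneric"; break;
	case XDP_ATTACHED_HW:	name = "xdpoffload"; break;
	default:
		return snprintf(buf, len, "xdp[%u]", (unsigned int)mode);
	}
	if (id)
		return snprintf(buf, len, "%s/id:%u", name, id);
	return snprintf(buf, len, "%s", name);
}
```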

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-06-27 16:13:55 -07:00
Stephen Hemminger 35a004dc8a update kernel headers from net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-06-27 16:11:12 -07:00
Stephen Hemminger 1fd8a8e23d Merge branch 'master' into net-next 2017-06-27 16:10:55 -07:00
Daniel Borkmann c9c3720d14 bpf: indicate lderr when bpf_apply_relo_data fails
When LLVM wrongly generates a rodata relo entry (llvm BZ #33599),
just bail out instead of probing for the prog w/o relocs, which
will fail in this case anyway.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-06-27 16:08:52 -07:00
Lukas Braun 3288e9b426 man: ip-route.8: Mention that lower metric means higher priority
This is quite counter-intuitive when using the 'preference' keyword.

Signed-off-by: Lukas Braun <koomi@moshbit.net>
2017-06-27 16:07:28 -07:00
Phil Sutter f2ca4a7a6f man: Collect names of man pages automatically
As it turned out, forgetting to add a man page to the respective
Makefile when introducing it is a common mistake. Overcome this once and
for all by using $(wildcard) function in Makefiles.

Fixes: 7124942942 ("genl: add manpage")
Fixes: 958cd21094 ("ifcfg: add manpage")
Fixes: e1b7f883e5 ("man: add documentation for IPv6 SR commands")
Fixes: 1949f82cdf ("Introduce ip vrf command")
Fixes: 535194a172 ("tipc: add peer remove functionality")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-06-27 16:00:09 -07:00
Roman Mashak 7cca407e28 tc: updated tc-u32 man page to reflect skip_sw and skip_hw parameters.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-06-21 08:34:29 -07:00
Roman Mashak fb12cea8d9 tc: fixed typo in usage text.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-06-21 08:34:28 -07:00
Jiri Benc 59eb271d1d tc: m_tunnel_key: add csum/nocsum option
Allows control of UDP zero checksum.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2017-06-16 09:11:42 -07:00
Jiri Benc 50907a8245 tc: m_tunnel_key: reformat the usage text
Adding new tunnel key fields would make the usage line overflow 80 chars.
Make the usage text similar to other commands.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2017-06-16 09:11:42 -07:00
Jiri Pirko c794b7b179 tc: don't print error message on miss when parsing action with default
When parsing with a default control action, a miss is acceptable,
so don't print an error message.

Fixes: e67aba5595 ("tc: actions: add helpers to parse and print control actions")
Reported-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Jiri Benc <jbenc@redhat.com>
2017-06-16 09:07:31 -07:00
Stephen Hemminger 39f3776b50 update headers to get TCA_TUNNEL_CSUM
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-06-16 09:06:47 -07:00
Stephen Hemminger 236211a763 Merge branch 'master' into net-next 2017-06-16 09:05:53 -07:00
David Lebrun e4319590f7 iproute: fix compilation issue with older glibc
If a header that includes linux/in6.h is included before
iproute's utils.h, then iproute2 fails to compile on older
glibc versions.

Fixes: e8493916a8 ("iproute: add support for SR-IPv6 lwtunnel encapsulation")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
2017-06-16 09:03:48 -07:00
Hangbin Liu ad0a6a2c63 ip neigh: allow flush FAILED neighbour entry
After upstream commit 5071034e4af7 ('neigh: Really delete an arp/neigh entry
on "ip neigh delete" or "arp -d"'), we can delete a single FAILED neighbour
entry, but `ip neigh flush` still skips FAILED entries.

Move the filter after the first flush round, so that FAILED entries are flushed
on fixed kernels and we do not keep retrying on old kernels.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2017-06-16 09:01:02 -07:00
Stephen Hemminger be8b93e3e2 Merge branch 'master' into net-next 2017-06-15 08:32:53 -07:00
Donald Sharp 3dc98cf2f5 ip: mroute: Add table output to show command
When the user specifies `table all` or `table 0` to
the `ip mroute show` command, we dump the entirety of
the known mroute tables. Without some sort of
separator telling us which table we are looking at,
the output is useless.

Add `Table: <vrf name>` to the output of 'ip mroute show table 0'.

Follow the convention established by 'ip route show table 0'
for when to display it.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2017-06-15 08:29:30 -07:00
Nicolas Dichtel a11b7b71a6 link_gre6: really support encaplimit option
This option is documented in gre6 help, but was not supported.

Fixes: af89576d7a ("iproute2: GRE over IPv6 tunnel support.")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2017-06-15 08:29:30 -07:00
Stephen Hemminger a9ae195a21 xfrm: get #define's from linux includes
Use linux/ipsec.h and linux/in.h to get the definition of IP related
protocols.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-06-14 10:25:39 -07:00
Jakub Sitnicki 7b201d6019 iproute: Remove useless check for nexthop keyword when setting RTA_OIF
When modifying a route, we set the RTA_OIF attribute only if a device was
specified with the "dev" or "oif" keyword. But for some unknown reason, an
earlier check also accepts the presence of the "nexthop" keyword as an
alternative, even though it has no effect. So remove the pointless check.

Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
2017-06-14 09:56:05 -07:00
Stephen Hemminger b68581d43e more bpf header updates
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-06-14 09:52:44 -07:00
Arkadi Sharshevsky 8a38e44fad bridge: Distinguish between externally learned vs offloaded FDBs
Distinguish between externally learned vs offloaded FDBs. This is done
in order to indicate that FDBs added by software were successfully
offloaded.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-06-14 09:50:25 -07:00
Jiri Pirko d5ebd6fdde tc: add support for TRAP action
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-06-08 11:03:12 -07:00
Jiri Pirko 18f05d0601 tc: gact: fix control action parsing
The parse_action_control helper advances the arg itself, so don't
advance it again outside.

Fixes: e67aba5595 ("tc: actions: add helpers to parse and print control actions")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-06-08 11:03:12 -07:00
Or Gerlitz 6ea2c2b1cf tc: flower: add support for matching on ip tos and ttl
Allow users to set flower classifier filter rules which
include matches for ip tos and ttl.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
2017-06-08 10:59:53 -07:00
Stephen Hemminger 410556ad99 update headers from net-next (bpf and tc)
More BPF and tc_action values.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-06-08 10:56:14 -07:00
Vlad Yasevich 735a52ceda ip: Add IFLA_EVENT output to ip monitor
Add IFLA_EVENT output so that event types can be viewed with
'monitor' command.  This gives a little more information for why
a given message was received.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
2017-06-05 12:38:19 -07:00
Roopa Prabhu aa883d86c0 ip: extend route get to return matching fib route
Uses newly introduced RTM_GETROUTE flag RTM_F_FIB_MATCH
to return a matching fib route. Introduces 'fibmatch'
keyword to ip route get.

ipv4:
----
$ip route show
default via 192.168.0.2 dev eth0
10.0.14.0/24
        nexthop via 172.16.0.3  dev dummy0 weight 1
        nexthop via 172.16.1.3  dev dummy1 weight 1

$ip route get 10.0.14.2
10.0.14.2 via 172.16.1.3 dev dummy1  src 172.16.1.1
    cache

$ip route get fibmatch 10.0.14.2
10.0.14.0/24
        nexthop via 172.16.0.3  dev dummy0 weight 1
        nexthop via 172.16.1.3  dev dummy1 weight 1

ipv6:
----
$ip -6 route show
2001:db9:100::/120  metric 1024
        nexthop via 2001:db8:2::2  dev dummy0 weight 1
        nexthop via 2001:db8:12::2  dev dummy1 weight 1

$ip -6 route get 2001:db9:100::1
2001:db9:100::1 from :: via 2001:db8:12::2 dev dummy1  \
                src 2001:db8:12::1  metric 1024  pref medium

$ip -6 route get fibmatch 2001:db9:100::1
2001:db9:100::/120  metric 1024
        nexthop via 2001:db8:12::2  dev dummy1 weight 1
        nexthop via 2001:db8:2::2  dev dummy0 weight 1
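
Here is a minimal sketch of where the fibmatch keyword ends up: it sets RTM_F_FIB_MATCH in rtm_flags of the RTM_GETROUTE request. The request struct and `init_route_get()` are hypothetical. Appending RTA_DST and sending the request are left out:

```c
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* RTM_GETROUTE request: netlink header, route message, room for attrs. */
struct route_get_req {
	struct nlmsghdr n;
	struct rtmsg r;
	char buf[256];
};

/* Prepare an "ip route get" request; with fibmatch set, ask the kernel
 * for the matching FIB entry instead of the resolved nexthop. */
static void init_route_get(struct route_get_req *req, int family, bool fibmatch)
{
	memset(req, 0, sizeof(*req));
	req->n.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg));
	req->n.nlmsg_type = RTM_GETROUTE;
	req->n.nlmsg_flags = NLM_F_REQUEST;
	req->r.rtm_family = family;
	if (fibmatch)
		req->r.rtm_flags |= RTM_F_FIB_MATCH;
	/* RTA_DST etc. would be appended here before sending */
}
```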

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: David Ahern <dsahern@gmail.com>
2017-06-05 12:33:50 -07:00
Stephen Hemminger d9bcafb4fe updated headers from net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-06-05 12:31:52 -07:00
Stephen Hemminger a5445c56e1 Merge branch 'master' into net-next 2017-06-05 12:31:19 -07:00
Eli Cohen 5a3ec4ba64 iplink: Update usage in help message
Add to the usage message a description of how to configure Infiniband node
and port GUIDs. Also modify the man page to emphasize that the GUIDs are
configured for Infiniband VFs.

Fixes: d91fb3f4c7 ("Add support for configuring Infiniband GUIDs")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
2017-06-05 12:29:36 -07:00
Oliver Hartkopp efe459c76d ip: link add vxcan support
Since commit a8f820a380a2a06 ('can: add Virtual CAN Tunnel driver (vxcan)')
for Linux 4.12, a virtual CAN tunnel driver analogous to veth is available in
Linux.

This patch adds the ability to create vxcan device pairs.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
2017-06-05 12:27:32 -07:00
Stephen Hemminger 309d5c2f83 Merge branch 'master' into net-next 2017-05-30 17:55:17 -07:00
David Ahern 1dddb60503 ip vrf: Add show command
Add a show command to list all configured VRFs and their table ids.

Signed-off-by: David Ahern <dsahern@gmail.com>
2017-05-30 17:54:03 -07:00
David Ahern 63891c7013 ip address: Change print_linkinfo_brief to take filter as an input
Change print_linkinfo_brief to take the filter as an input arg.
If the arg is NULL, use the global filter in ipaddress.c.

Signed-off-by: David Ahern <dsahern@gmail.com>
2017-05-30 17:54:03 -07:00
David Ahern 741dd5cd9c ip address: Move filter struct to ip_common.h
Move filter struct to ip_common.h as struct link_filter.

Signed-off-by: David Ahern <dsahern@gmail.com>
2017-05-30 17:54:03 -07:00
David Ahern 4ad875944f ip address: Export ip_linkaddr_list
ipaddr_list_flush_or_save generates a list of nlmsg's for links and
optionally for addresses. Move the code into ip_linkaddr_list and
export it along with the supporting infrastructure.

API to use this function is:
        struct nlmsg_chain linfo = { NULL, NULL};
        struct nlmsg_chain ainfo = { NULL, NULL};

        ip_linkaddr_list(family, filter_req, &linfo, &ainfo);

        ... error checking and code looping over linfo/ainfo ...

        free_nlmsg_chain(&linfo);
        free_nlmsg_chain(&ainfo);
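
Here is a minimal sketch of the kind of list behind struct nlmsg_chain. It assumes a head and a tail pointer, as the `{ NULL, NULL }` initializer suggests. The field names and the `chain_append()`/`chain_free()` helpers are illustrative, not the exported iproute2 API:

```c
#include <linux/netlink.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Singly linked list of copied netlink messages (field names assumed). */
struct nlmsg_list {
	struct nlmsg_list *next;
	struct nlmsghdr h;	/* full message copy continues past h */
};

struct nlmsg_chain {
	struct nlmsg_list *head;
	struct nlmsg_list *tail;
};

/* Append a copy of n; returns 0, or -1 on a bad header or no memory. */
static int chain_append(struct nlmsg_chain *c, const struct nlmsghdr *n)
{
	struct nlmsg_list *l;

	if (n->nlmsg_len < sizeof(*n))
		return -1;
	l = malloc(offsetof(struct nlmsg_list, h) + n->nlmsg_len);
	if (!l)
		return -1;
	l->next = NULL;
	memcpy(&l->h, n, n->nlmsg_len);
	if (c->tail)
		c->tail->next = l;
	else
		c->head = l;
	c->tail = l;
	return 0;
}

/* Free every entry and leave the chain empty. */
static void chain_free(struct nlmsg_chain *c)
{
	struct nlmsg_list *l, *next;

	for (l = c->head; l; l = next) {
		next = l->next;
		free(l);
	}
	c->head = c->tail = NULL;
}
```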

Signed-off-by: David Ahern <dsahern@gmail.com>
2017-05-30 17:54:03 -07:00
Stephen Hemminger 6ef590ed88 Merge branch 'master' into net-next 2017-05-30 17:50:47 -07:00
Daniel Borkmann 218560185d bpf: dump error to the user when retrieving pinned prog fails
I noticed we currently don't dump an error message when a pinned
program couldn't be retrieved, thus add a hint to the user.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-05-30 17:49:09 -07:00
Daniel Borkmann 077bb1803c bpf: update printing of generic xdp mode
Follow-up to d67b9cd28c1d ("xdp: refine xdp api with regards to
generic xdp") in order to update the XDP dumping part.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-05-30 17:49:09 -07:00
Jiri Pirko 0c30d14d0a tc: flower: add support for tcp flags
Allow user to insert a flower classifier filter rule which includes
match for tcp flags.
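
Flower matches of this kind take a value and an optional mask. Here is a minimal sketch of parsing "VALUE[/MASK]" and applying it. It assumes that a missing mask means all bits of the field must match. The helper names and that default are assumptions, not flower's documented behavior. The same pattern appears to apply to the ip tos and ttl matches from 6ea2c2b1cf above.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse "VALUE[/MASK]" (e.g. "0x12/0x3f"), base auto-detected, both
 * limited to max. Without a mask, all bits up to max must match.
 * Returns 0, or -1 on malformed input. */
static int parse_val_mask(const char *s, uint32_t max,
			  uint32_t *val, uint32_t *mask)
{
	unsigned long v, m = max;
	char *end;

	errno = 0;
	v = strtoul(s, &end, 0);
	if (end == s || errno || v > max)
		return -1;
	if (*end == '/') {
		const char *ms = end + 1;

		m = strtoul(ms, &end, 0);
		if (end == ms || errno || m > max)
			return -1;
	}
	if (*end != '\0')
		return -1;
	*val = v;
	*mask = m;
	return 0;
}

/* A field matches when it agrees with val on every bit set in mask. */
static int val_mask_match(uint32_t field, uint32_t val, uint32_t mask)
{
	return (field & mask) == (val & mask);
}
```

For example, "0x02/0x02" matches any segment with SYN (0x02) set, whatever the other flag bits are.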

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-05-30 17:41:32 -07:00
Stephen Hemminger 2ecb169280 Merge branch 'master' into net-next 2017-05-30 17:40:57 -07:00
Remigiusz Kołłątaj 759fa6086e ip: add handling for new CAN netlink interface
This patch adds handling for new CAN netlink interface introduced in
4.11 kernel:
- IFLA_CAN_TERMINATION,
- IFLA_CAN_TERMINATION_CONST,
- IFLA_CAN_BITRATE_CONST,
- IFLA_CAN_DATA_BITRATE_CONST

Output example:
$ip -d link show can0
6: can0: <NOARP,ECHO> mtu 16 qdisc noop state DOWN mode DEFAULT group default qlen 10
    link/can  promiscuity 0
    can state STOPPED (berr-counter tx 0 rx 0) restart-ms 0
          bitrate 80000
             [   20000,    33333,    50000,    80000,    83333,   100000,
                125000,   150000,   175000,   200000,   225000,   250000,
                275000,   300000,   500000,   625000,   800000,  1000000 ]
          termination 0 [ 0, 120 ]
          clock 0numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

Signed-off-by: Remigiusz Kołłątaj <remigiusz.kollataj@mobica.com>
2017-05-30 17:39:33 -07:00
Phil Sutter f6fc1055e4 tc: m_xt: Prevent a segfault in libipt
This happens with NAT targets, such as SNAT, DNAT and MASQUERADE. These
are still not usable with this patch, but at least tc doesn't crash
anymore when one tries to use them.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-05-30 17:38:19 -07:00
Roi Dayan d315b706e9 devlink: Add option to set and show eswitch encapsulation support
This is an e-switch global knob to enable HW support for applying
encapsulation/decapsulation to VF traffic as part of SRIOV e-switch offloading.

The actual encap/decap is carried out (along with the matching and other
actions) per offloaded e-switch rule, e.g. as done when offloading the TC tunnel
key action.

Possible values are enable/disable.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
2017-05-30 17:36:52 -07:00
David Ahern 05a14fc121 netlink: Change rtnl_dump_done to always show error
The original code which became rtnl_dump_done only shows netlink errors
if the protocol is NETLINK_SOCK_DIAG, but netlink dumps always append
the length field, which carries any error encountered during the dump. Update
rtnl_dump_done to always show the error if there is one.

As an *example* without this patch, dumping a route object that exceeds
the internal buffer size terminates with no message to the user -- the
dump just ends because the NLMSG_DONE message was received. With this
patch the user at least gets a message that the dump was aborted.

$ ip ro ls
default via 10.0.2.2 dev eth0
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
10.10.0.0/16 dev veth1 proto kernel scope link src 10.10.0.1
172.16.1.0/24 dev br0.11 proto kernel scope link src 172.16.1.1
Error: Buffer too small for object
Dump terminated

The point of this patch is to notify the user of a failure versus
silently exiting on a partial dump. Because the NLMSG_DONE attribute
was received, the entire dump needs to be restarted to use a larger
buffer for EMSGSIZE errors. That could be done automatically but it
has other user impacts (e.g., duplicate output if the dump is
restarted) and should be the subject of a different patch.
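A minimal sketch of the idea, assuming only the standard <linux/netlink.h> macros: the NLMSG_DONE message that ends a dump carries an int in its payload, and a negative value is an errno saying why the dump stopped. check_dump_done() is a hypothetical name, not the actual rtnl_dump_done() code, and its message text is generic.

```c
#include <errno.h>          /* errno values such as EMSGSIZE */
#include <linux/netlink.h>
#include <stdio.h>
#include <string.h>

/* Sketch: read the int at the start of an NLMSG_DONE payload. A negative
 * value is an errno explaining why the dump stopped; report it instead of
 * ending silently. Returns the error, or 0 on a clean finish. */
static int check_dump_done(const struct nlmsghdr *h)
{
	int err = 0;

	if (h->nlmsg_len >= NLMSG_LENGTH(sizeof(int)))
		memcpy(&err, NLMSG_DATA(h), sizeof(err));
	if (err < 0) {
		fprintf(stderr, "Error: %s\nDump terminated\n", strerror(-err));
		return err;
	}
	return 0;
}
```

The output above suggests ip prints its own wording for EMSGSIZE rather than the plain strerror() text used here.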

Signed-off-by: David Ahern <dsahern@gmail.com>
2017-05-30 17:32:38 -07:00
Baruch Siach 98447086f8 ip: include libc headers first
Including libc headers first helps as a workaround to redefinition of struct
ethhdr with a suitably patched musl libc that suppresses the kernel
if_ether.h.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
2017-05-30 17:27:58 -07:00
Stephen Hemminger 8612ca2f13 update headers to get IFLA_EVENT
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-05-30 10:14:01 -07:00
Stephen Hemminger 0071f3d058 update headers to get changes for TCA_FLOWER
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-05-26 17:12:41 -07:00
Stephen Hemminger d4473c0257 update to current net-next headers 2017-05-26 17:11:02 -07:00
Roman Mashak cba134ae70 tc: fix Makefile to build skbmod
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-05-22 13:33:51 -07:00
Jiri Pirko d19f72f789 tc/actions: introduce support for goto chain action
Allow user to set control action "goto" with filter chain index as
a parameter.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-05-22 13:31:51 -07:00
Jiri Pirko e67aba5595 tc: actions: add helpers to parse and print control actions
Each tc action is terminated by a control action. Each action currently
parses and prints them individually. Introduce a set of helpers so that
this code can be shared.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-05-22 13:31:51 -07:00
Jiri Pirko 732f03461b tc_filter: add support for chain index
Allow the user to add a filter to a specific chain identified by index.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-05-22 13:31:51 -07:00
Stephen Hemminger cda81a4ea5 include: remove no longer used iptables_common.h
Reported-by: Baruch Siach <baruch@tkos.co.il>

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-05-22 13:22:22 -07:00
Khem Raj ae717baf15 tc: include stdint.h explicitly for UINT16_MAX
Fixes
| tc_core.c:190:29: error: 'UINT16_MAX' undeclared (first use in this function); did you mean '__INT16_MAX__'?
|    if ((sz >> s->size_log) > UINT16_MAX) {
|                              ^~~~~~~~~~
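In short, the fix is to include <stdint.h> directly wherever UINT16_MAX is used instead of relying on another header to pull it in. A minimal sketch of the failing check; size_fits_u16() is a hypothetical helper, not a tc_core.c function.

```c
#include <stdint.h>   /* defines UINT16_MAX; do not rely on indirect includes */

/* Hypothetical helper mirroring the check in tc_core.c: does a size,
 * scaled down by size_log, still fit in a 16-bit slot? */
static int size_fits_u16(unsigned int sz, unsigned int size_log)
{
	return (sz >> size_log) <= UINT16_MAX;
}
```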

Signed-off-by: Khem Raj <raj.khem@gmail.com>
2017-05-22 11:41:53 -07:00
Stephen Hemminger a2325adf0f update headers from 4.12-rc2
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-05-22 11:06:29 -07:00
David Ahern 4af4471606 ip: add support for more MPLS labels
The kernel now supports up to 30 labels, but that limit is not defined as
part of the uapi. iproute2 handles up to 8 labels, but inconsistently.
Update ip to handle more labels, in a more programmatic way.

For the MPLS address family, the data field in inet_prefix is used for
labels.  Increase that field to 64 u32's -- 64 as nothing more than a
convenient power of 2 number.

Update mpls_pton to take the length of the address field, convert that
length to number of labels and add better error handling to the parsing
of the user supplied string.
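A minimal sketch of parsing bounded by the buffer length rather than a fixed label count. parse_mpls_labels() is a hypothetical stand-in for mpls_pton(), and the 20-bit check reflects the MPLS label field width.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: parse "101/102/103" into at most len / sizeof(uint32_t)
 * labels. Returns the number of labels parsed, or -1 on error. */
static int parse_mpls_labels(const char *str, uint32_t *labels, size_t len)
{
	size_t max = len / sizeof(uint32_t);
	size_t count = 0;

	for (;;) {
		char *end;
		unsigned long label;

		if (count >= max)
			return -1;            /* more labels than the buffer holds */
		errno = 0;
		label = strtoul(str, &end, 0);
		if (errno || end == str || label > 0xFFFFF)
			return -1;            /* not a number, or wider than 20 bits */
		labels[count++] = (uint32_t)label;
		if (*end == '\0')
			return (int)count;
		if (*end != '/')
			return -1;            /* unexpected separator */
		str = end + 1;
	}
}
```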

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
2017-05-22 11:03:02 -07:00
Amir Vadai f3e1b2448a pedit: Introduce ipv6 support
Add support for modifying IPv6 headers using pedit.

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-15 15:05:20 -07:00
Amir Vadai a13426fe1a pedit: Check for extended capability in protocol parser
Do not allow using eth and udp header types if non-extended pedit kABI
is being used. Other protocol parsers already have this check.

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-15 15:05:20 -07:00
Amir Vadai cdca191862 pedit: Do not allow using retain for too big fields
Using retain for fields longer than 32 bits is not supported.
Do not allow the user to do so.

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-15 15:05:20 -07:00
Amir Vadai 290cdc058d pedit: Fix a typo in warning
'ex' attribute should be placed after 'action pedit' and not after
'munge'.

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-15 15:05:20 -07:00
Girish Moodalbail 01c659955a vxlan: Add support for modifying vxlan device attributes
The ability to change vxlan device attributes was added to the kernel through
commit 8bcdc4f3a20b ("vxlan: add changelink support"), however one
cannot do the same through ip(8) command.  Changing the allowed vxlan
device attributes using 'ip link set dev <vxlan_name> type vxlan
<allowed_attributes>' currently fails with 'operation not supported'
error.  This failure is due to the incorrect rtnetlink message
construction for the 'ip link set' operation.

The vxlan_parse_opt() callback function is called for parsing options
for both 'ip link add' and 'ip link set'. For the 'add' case, we pass
down default values for those attributes that were not provided as CLI
options. However, for the 'set' case we should be only passing down the
explicitly provided attributes and not any other (default) attributes.
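The add/set distinction can be sketched as a filter over parsed options. Everything here (struct vxlan_opt, select_attrs(), the "given" flag) is hypothetical and only illustrates the rule, not the vxlan_parse_opt() code.

```c
#include <stdbool.h>

/* Hypothetical option: a value plus whether the user typed it. */
struct vxlan_opt {
	const char *name;
	int value;
	bool given;     /* true only if supplied on the command line */
};

/* For "add", every option is sent (defaults included); for "set", only
 * explicitly given options are sent so other settings stay untouched. */
static int select_attrs(const struct vxlan_opt *opts, int n, bool is_add,
			const struct vxlan_opt **out)
{
	int count = 0;

	for (int i = 0; i < n; i++) {
		if (is_add || opts[i].given)
			out[count++] = &opts[i];
	}
	return count;
}
```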

Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
2017-05-11 11:11:08 -07:00
David Ahern aac40403ea ip: mpls: fix printing of mpls labels
If the kernel returns more labels than iproute2 expects, none of
the labels are printed and (null) is shown instead:
    $ ip -f mpls ro ls
    101 as to (null) via inet 172.16.2.2 dev virt12
    201 as to 202/203 via inet6 2001:db8:2::2 dev virt12

Remove the use of MPLS_MAX_LABELS and rely on the buffer length that is
passed to mpls_ntop. With this change ip can print the label stack
returned by the kernel up to 255 characters (limit is due to size of
buf passed in) which amounts to 31 labels with a separator.

With this change the above is:
    $ ip/ip -f mpls ro ls
    101 as to 102/103/104/105/106/107/108/109/110 via inet 172.16.2.2 dev virt12
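A minimal sketch of formatting a label stack bounded only by the caller's buffer. format_mpls_labels() is a hypothetical stand-in for mpls_ntop(); it returns NULL when the stack does not fit rather than printing nothing useful.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch: write labels as "a/b/c" into buf, limited by len.
 * Returns buf, or NULL if the stack does not fit. */
static const char *format_mpls_labels(const uint32_t *labels, size_t n,
				      char *buf, size_t len)
{
	size_t off = 0;

	if (len == 0)
		return NULL;
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		int w = snprintf(buf + off, len - off, "%s%u",
				 i ? "/" : "", (unsigned int)labels[i]);
		if (w < 0 || (size_t)w >= len - off)
			return NULL;    /* out of room */
		off += (size_t)w;
	}
	return buf;
}
```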

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
2017-05-11 11:08:02 -07:00
Alexander Alemayhu 5be9971c73 tc: bpf: add ppc64 and sparc64 to list of archs with eBPF support
sparc64 support was added in 7a12b5031c6b (sparc64: Add eBPF JIT., 2017-04-17)[0]
and ppc64 in 156d0e290e96 (powerpc/ebpf/jit: Implement JIT compiler for extended BPF, 2016-06-22)[1].

[0]: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=7a12b5031c6b
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=156d0e290e96
Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-05-08 23:05:35 -07:00
Or Gerlitz e57285b81a tc: Reflect HW offload status
Currently there is no way of querying whether a filter is
offloaded to HW or not when using "both" policy (where none
of skip_sw or skip_hw flags are set by user-space).

Add two new flags, "in hw" and "not in hw" such that user
space can determine if a filter is actually offloaded to
hw or not. The "in hw" UAPI semantics were chosen so they are
similar to the "skip hw" flag logic.

If neither of these two flags is set, this signals that tc is running
over an older kernel.
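A minimal sketch of interpreting the two flags. The bit values match the kernel's <linux/pkt_cls.h> and are redefined only so the sketch compiles alone; filter_in_hw() is a hypothetical helper.

```c
#include <stdint.h>

/* Values as in <linux/pkt_cls.h>, copied for a self-contained sketch. */
#define TCA_CLS_FLAGS_IN_HW     (1 << 2)  /* filter is offloaded to HW */
#define TCA_CLS_FLAGS_NOT_IN_HW (1 << 3)  /* filter isn't offloaded to HW */

/* Returns 1 if offloaded, 0 if not, -1 if the kernel did not report it. */
static int filter_in_hw(uint32_t flags)
{
	if (flags & TCA_CLS_FLAGS_IN_HW)
		return 1;
	if (flags & TCA_CLS_FLAGS_NOT_IN_HW)
		return 0;
	return -1;   /* neither flag: older kernel */
}
```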

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
2017-05-05 09:49:25 -07:00
Stephen Hemminger 76557951f5 update kernel headers during 4.12 merge window
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-05-05 09:48:54 -07:00
Arkadi Sharshevsky 153c1a9b21 devlink: Add support for pipeline debug (dpipe)
Add support for pipeline debug (dpipe). The headers are used both to
gain visibility into the headers supported by the hardware and to
build the header/field database which is used by other commands.

Examples:

First we can see the headers supported by the hardware:

$devlink dpipe header show pci/0000:03:00.0

pci/0000:03:00.0:
  name mlxsw_meta
  field:
    name erif_port bitwidth 32 mapping_type ifindex
    name l3_forward bitwidth 1
    name l3_drop bitwidth 1

Note that mapping_type is presented only if relevant. Although the
header/field IDs are reported by the kernel, they are not shown by
default; they can be observed by using the -v option, which also shows
the header scope (global/local).

$devlink -v dpipe header show pci/0000:03:00.0

pci/0000:03:00.0:
  name mlxsw_meta id 0 global false
  field:
    name erif_port id 0 bitwidth 32 mapping_type ifindex
    name l3_forward id 1 bitwidth 1
    name l3_drop id 2 bitwidth 1

Second we can examine the tables supported by the hardware. In order
to dump all the tables no table name should be provided:
$devlink dpipe table show pci/0000:03:00.0

To examine a specific table, specify its name:
$devlink dpipe table show pci/0000:03:00.0 name erif

pci/0000:03:00.0:
  name mlxsw_erif size 800 counters_enabled true
  match:
    type field_exact header mlxsw_meta field erif_port mapping ifindex
  action:
    type field_modify header mlxsw_meta field l3_forward
    type field_modify header mlxsw_meta field l3_drop

To enable/disable counters on the table:
$devlink dpipe table set pci/0000:03:00.0 name erif counters enable
$devlink dpipe table set pci/0000:03:00.0 name erif counters disable

To see the current hardware entries of a specific table:
$devlink dpipe table dump pci/0000:03:00.0 name erif

pci/0000:03:00.0:
  index 0 counter 0
  match_value:
    type field_exact header mlxsw_meta field erif_port mapping ifindex mapping_value 383 value 0
  action_value:
    type field_modify header mlxsw_meta field l3_forward value 1

  index 1 counter 0
  match_value:
    type field_exact header mlxsw_meta field erif_port mapping ifindex mapping_value 381 value 1
  action_value:
    type field_modify header mlxsw_meta field l3_forward value 1

In the above example the table contains two entries, which match on
the erif port and either forward or drop the packet (currently only the
forward count is implemented). The counter values are shown as an
example; if counting is not enabled on the table, the counters are not
available.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-05-03 09:29:43 -07:00
Arkadi Sharshevsky 4f10cede93 devlink: Change netlink attribute validation
Currently, netlink attribute validation is done by a sequence of if
statements. Change the attribute validation to a table lookup.
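A minimal sketch of table-driven validation replacing an if chain. The attribute IDs and kinds are hypothetical; the real devlink code uses the devlink UAPI IDs and libmnl types, which are not reproduced here.

```c
#include <stddef.h>

/* Hypothetical payload kinds and attribute IDs. */
enum attr_kind { KIND_NONE = 0, KIND_U8, KIND_U16, KIND_U32, KIND_STRING };
enum { ATTR_BUS_NAME = 1, ATTR_DEV_NAME, ATTR_PORT_INDEX, ATTR_PORT_TYPE, ATTR_MAX };

/* One entry per attribute instead of one if statement per attribute. */
static const enum attr_kind attr_table[ATTR_MAX] = {
	[ATTR_BUS_NAME]   = KIND_STRING,
	[ATTR_DEV_NAME]   = KIND_STRING,
	[ATTR_PORT_INDEX] = KIND_U32,
	[ATTR_PORT_TYPE]  = KIND_U16,
};

/* Returns 0 if the attribute is acceptable, -1 if its size is wrong. */
static int validate_attr(int type, size_t payload_len)
{
	static const size_t fixed_len[] = {
		[KIND_U8] = 1, [KIND_U16] = 2, [KIND_U32] = 4,
	};
	enum attr_kind kind;

	if (type <= 0 || type >= ATTR_MAX)
		return 0;               /* unknown attributes are ignored */
	kind = attr_table[type];
	if (kind == KIND_NONE)
		return 0;
	if (kind == KIND_STRING)
		return payload_len > 0 ? 0 : -1;
	return payload_len == fixed_len[kind] ? 0 : -1;
}
```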

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
2017-05-03 09:29:42 -07:00
Phil Sutter 6a78ef97b6 man: ip.8: Document -brief flag
Brief output is especially useful for new users, so at least mention
its existence in the ip man page.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-05-03 09:28:40 -07:00
Stephen Hemminger 5bffc12ef4 Merge branch 'net-next' 2017-05-03 09:28:10 -07:00
Stephen Hemminger cbc7929b21 v4.11.0 2017-05-01 09:32:25 -07:00
Boris Pismenny cfd2e727f0 ip xfrm: Add xfrm state crypto offload
syntax:
ip xfrm state .... offload dev <if-name> dir <in or out>

Example to add inbound offload:
  ip xfrm state .... offload dev mlx0 dir in
Example to add outbound offload:
  ip xfrm state .... offload dev mlx0 dir out

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
2017-05-01 09:30:25 -07:00
Daniel Borkmann a872b870a5 bpf: add support for generic xdp
Follow-up to commit c7272ca720 ("bpf: add initial support for
attaching xdp progs") to also support generic XDP. This adds an
indicator for loaded generic XDP programs when programs are loaded
as shown in c7272ca720, but the driver still lacks native XDP
support.

  # ip link
  [...]
  3: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric qdisc [...]
      link/ether 0c:c4:7a:03:f9:25 brd ff:ff:ff:ff:ff:ff
  [...]

In case the driver does support native XDP, but the user wants
to load the program as generic XDP (e.g. for testing purposes),
then this can be done with the same semantics as in c7272ca720,
but with 'xdpgeneric' instead of 'xdp' command for loading:

  # ip -force link set dev eno1 xdpgeneric obj xdp.o

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David S. Miller <davem@davemloft.net>
2017-05-01 09:28:19 -07:00
Stephen Hemminger 7ff1fce549 update headers to 4.11 net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-05-01 09:27:46 -07:00
Stephen Hemminger d2b9100a08 Merge branch 'master' into net-next 2017-05-01 09:26:51 -07:00
Stephen Hemminger 1e600da057 pedit: fix whitespace
Add newlines to break long lines.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-05-01 09:25:22 -07:00
Or Gerlitz 3d2a7781ec tc/pedit: p_udp: introduce pedit udp support
For example, forward UDP traffic destined to port 999 to veth0 and set
the UDP destination port to 888:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
    flower \
      ip_proto udp \
      dst_port 999 \
    action pedit ex munge \
      udp dport set 888 \
    action mirred egress \
      redirect dev veth0

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Amir Vadai 2c6eb12ab8 tc/pedit: p_tcp: introduce pedit tcp support
For example, forward TCP traffic destined to port 80 to veth0 and set
the TCP destination port to 8080:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
    flower \
      ip_proto tcp \
      dst_port 80 \
    action pedit ex munge \
      tcp dport set 8080 \
    action mirred egress \
      redirect dev veth0

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Amir Vadai 3cd5149ecd tc/pedit: p_eth: ETH header editor
For example, forward tcp traffic to veth0 and set
destination mac address to 11:22:33:44:55:66 :
$ tc filter add dev enp0s9 protocol ip parent ffff: \
    flower \
      ip_proto tcp \
    action pedit ex munge \
      eth dst set 11:22:33:44:55:66 \
    action mirred egress \
      redirect dev veth0

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Amir Vadai fa4652ff3b tc/pedit: Support fields bigger than 32 bits
Make parse_val() accept fields up to 128 bits long, this should be
enough for current use cases and involves a minimal change to code.
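128-bit fields in practice are IPv6 addresses, which pedit edits as 32-bit words. A minimal sketch, assuming that split; parse_u128_ipv6() is a hypothetical helper, not parse_val().

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical sketch: parse an IPv6 address (a 128-bit field) into four
 * 32-bit words in network byte order, one per 32-bit pedit key. */
static int parse_u128_ipv6(const char *str, uint32_t words[4])
{
	struct in6_addr addr;

	if (inet_pton(AF_INET6, str, &addr) != 1)
		return -1;
	memcpy(words, &addr, sizeof(addr));
	return 0;
}
```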

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Amir Vadai 8d193d9607 tc/pedit: p_ip: introduce editing ttl header
Enable user to edit IP header ttl field.

For example, to forward any TCP packet and decrease its TTL by one:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
    flower \
      ip_proto tcp \
    action pedit ex munge \
      ip ttl add 0xff pipe \
    action mirred egress \
      redirect dev veth0
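Why "add 0xff" decrements the TTL: for an 8-bit field, addition wraps modulo 256, so adding 0xff equals subtracting 1. This models only the arithmetic; it is not pedit's kernel code, which works on masked 32-bit words.

```c
#include <stdint.h>

/* Illustration: 8-bit addition discards the carry, so field + 0xff is
 * field - 1 (and 0 wraps to 255). */
static uint8_t add_u8(uint8_t field, uint8_t value)
{
	return (uint8_t)(field + value);
}
```

Note that a TTL of 0 wraps to 255 under this operation.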

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Amir Vadai c05ddaf9e0 tc/pedit: Introduce 'add' operation
This operation is useful for increasing or decreasing a field's value.

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Amir Vadai 7c71a40cbd tc/pedit: Extend pedit to specify offset relative to mac/transport headers
Utilize the extended pedit netlink to set an offset relative to a
specific header type. The old netlink only enabled the user to set an
approximate offset relative to the IPv4 header.

To use this extended functionality, the 'ex' keyword must be given after
'pedit' and before any 'munge', e.g.:
$ tc filter add dev ens9 protocol ip parent ffff: \
    flower \
      ip_proto udp \
      dst_port 80 \
    action pedit ex munge \
      ip dst set 1.1.1.1 \
      pipe \
    action mirred egress redirect dev veth0

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Amir Vadai 51536ebbe8 tc/pedit: Fix a typo in pedit usage message
Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Stephen Hemminger bb6ab47b16 iplink: whitespace cleanup
Break lines to conform to 80 col guideline.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-05-01 09:13:09 -07:00
Zhang Shengju 432b92a702 iplink: add support for IFLA_CARRIER attribute
Add support to set IFLA_CARRIER attribute.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
2017-05-01 09:06:54 -07:00
Michal Kubeček 6ec14f1abb routel: fix infinite loop in line parser
As noticed by one of the few users of the routel script, it ends up in an
infinite loop when they pull out the cable from the NIC used for some
route. This is caused by its parser expecting each line of "ip route show"
output to consist of "key value" pairs (except for the initial target
range), together with an old trap of Bourne-style shells: "shift 2" does
nothing if only one argument is left. Some keywords, e.g. "linkdown",
are not followed by a value.

Improve the parser to

  (1) only set variables for keywords we care about
  (2) recognize (currently) known keywords without value

This is still far from perfect (and certainly not future proof) but to
fully fix the script, one would probably have to rewrite the logic
completely (and I'm not sure it's worth the effort).
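The parsing rule can be sketched in C (routel itself is a shell script): value-less keywords advance by one token, everything else by two, so the index always moves forward. The keyword list and route_field() are illustrative assumptions, not routel's actual code.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative value-less keywords; the set routel knows may differ. */
static const char *const flag_words[] = { "linkdown", "onlink", "dead", NULL };

static int is_flag_word(const char *w)
{
	for (size_t i = 0; flag_words[i]; i++)
		if (strcmp(w, flag_words[i]) == 0)
			return 1;
	return 0;
}

/* Return the value of `key` in a tokenized route line, or NULL.
 * tok[0] is the destination prefix. The index always advances, so a
 * trailing keyword without a value cannot cause an endless loop. */
static const char *route_field(const char **tok, size_t ntok, const char *key)
{
	size_t i = 1;

	while (i < ntok) {
		if (is_flag_word(tok[i])) {
			i += 1;
			continue;
		}
		if (i + 1 < ntok && strcmp(tok[i], key) == 0)
			return tok[i + 1];
		i += 2;
	}
	return NULL;
}
```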

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
2017-04-27 16:42:29 -07:00
Phil Sutter 843fc90068 man: ip-rule.8: Further clarify how to interpret priority value
Despite the past changes, users seemed to get confused by the seemingly
contradictory relation of priority value and actual rule priority.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-04-24 11:43:09 -07:00
Craig Gallek ad4b1425c3 iplink: Expose IFLA_*_FWMARK attributes for supported link types
This attribute allows the administrator to adjust the packet marking
attribute of tunnels that support policy based routing.

Signed-off-by: Craig Gallek <kraig@google.com>
2017-04-23 09:14:46 -07:00
Stephen Hemminger 590dde3a98 Merge branch 'master' into net-next 2017-04-23 09:14:35 -07:00
Craig Gallek 35893864c8 gre6: fix copy/paste bugs in GREv6 attribute manipulation
Fixes: af89576d7a8c("iproute2: GRE over IPv6 tunnel support.")
Signed-off-by: Craig Gallek <kraig@google.com>
2017-04-23 09:13:07 -07:00
Jamal Hadi Salim fd8b3d2c1b actions: Add support for user cookies
Make use of 128b user cookies

Introduce optional 128-bit action cookie.
Like all other cookie schemes in the networking world (eg in protocols
like http or existing kernel fib protocol field, etc) the idea is to
save user state that when retrieved serves as a correlator. The kernel
_should not_ interpret it. The user can store whatever they wish in the
128 bits.

Sample exercise (showing variable-length use of the cookie):

.. create an accept action with cookie a1b2c3d4
sudo $TC actions add action ok index 1 cookie a1b2c3d4

.. dump all gact actions..
sudo $TC -s actions ls action gact

    action order 0: gact action pass
     random type none pass val 0
     index 1 ref 1 bind 0 installed 5 sec used 5 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0
    cookie a1b2c3d4

.. bind the accept action to a filter..
sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1

... send some traffic..
$ ping 127.0.0.1 -c 3
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms
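A minimal sketch of turning a variable-length hex cookie such as a1b2c3d4 into at most 16 bytes. parse_cookie() is hypothetical, and requiring an even number of hex digits is an assumption of this sketch.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COOKIE_MAX_LEN 16   /* 128 bits */

static int hex_nibble(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	c = (char)tolower((unsigned char)c);
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/* Returns the cookie length in bytes, or -1 if the string is empty, has an
 * odd digit count, contains non-hex characters, or exceeds 128 bits. */
static int parse_cookie(const char *hex, uint8_t out[COOKIE_MAX_LEN])
{
	size_t n = strlen(hex);

	if (n == 0 || n % 2 || n / 2 > COOKIE_MAX_LEN)
		return -1;
	for (size_t i = 0; i < n / 2; i++) {
		int hi = hex_nibble(hex[2 * i]);
		int lo = hex_nibble(hex[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		out[i] = (uint8_t)((hi << 4) | lo);
	}
	return (int)(n / 2);
}
```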

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2017-04-23 09:10:02 -07:00
Stephen Hemminger 0e3cdd9ce0 remove unused header file sysctl.h
Not referred to in current source tree.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-04-21 17:47:30 -07:00
Stephen Hemminger 5b0aa88737 update kernel headers from net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-04-21 17:41:33 -07:00
David Lebrun e1b7f883e5 man: add documentation for IPv6 SR commands
This patch adds information about seg6 encapsulation in the ip-route
manual, as well as the ip-sr manual page.

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
2017-04-16 10:21:43 -07:00
David Lebrun e8493916a8 iproute: add support for SR-IPv6 lwtunnel encapsulation
This patch adds support for SEG6 encapsulation type
("ip route add ... encap seg6 ...").

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
2017-04-16 10:21:43 -07:00
David Lebrun 9386332823 ip: add ip sr command to control SR-IPv6 internal structures
This patch adds commands to support the tunnel source properties
("ip sr tunsrc") and the HMAC key -> secret, algorithm binding
("ip sr hmac").

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
2017-04-16 10:21:43 -07:00
Stephen Hemminger 85dd6ab510 add seg6.h kernel headers
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-04-16 10:21:34 -07:00
Stephen Hemminger 2c6a0636e2 Update kernel headers from 4.11 net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-04-16 10:19:32 -07:00
David Ahern 0da8250be8 ip vrf: Add command name next to pid
'ip vrf pids' is used to list processes bound to a vrf, but it only
shows the pid leaving a lot of work for the user. Add the command
name to the output. With this patch the output is more user-friendly:

    $ ip vrf pids mgmt
     1121  ntpd
     1418  gdm-session-wor
     1488  gnome-session
     1491  dbus-launch
     1492  dbus-daemon
     1565  sshd
     ...

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-04-16 10:19:32 -07:00
David Ahern f443565f8d ip vrf: Add command name next to pid
'ip vrf pids' is used to list processes bound to a vrf, but it only
shows the pid leaving a lot of work for the user. Add the command
name to the output. With this patch the output is more user-friendly:

    $ ip vrf pids mgmt
     1121  ntpd
     1418  gdm-session-wor
     1488  gnome-session
     1491  dbus-launch
     1492  dbus-daemon
     1565  sshd
     ...

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-04-16 10:06:33 -07:00
David Ahern c6858ef431 ip netconf: show all families on dev request
Currently, when a device is specified, ip netconf dumps only the IPv4
values. Change this to dump data for all families unless a specific
family is given.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-04-14 16:00:15 -07:00
David Ahern f052f5dfe0 ip netconf: Show all address families by default in dumps
Currently, 'ip netconf' only shows ipv4 and ipv6 netconf settings. If IPv6
is not enabled, the dump ends with
    RTNETLINK answers: Operation not supported

when the IPv6 request is attempted. Further, if the mpls_router module is also
loaded a separate request is needed to get MPLS settings.

To make this better going forward, use the new PF_UNSPEC dump all option
if the kernel supports it. If the kernel does not, it replies with an
NLMSG_ERROR carrying EOPNOTSUPP; this is trapped and we fall back to the
existing output to maintain compatibility with existing kernels.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-04-14 16:00:15 -07:00
David Ahern 3ad6d17638 netlink: Add flag to suppress print of nlmsg error
Allow callers of the dump API to handle nlmsg errors (e.g., an
unsupported feature). Setting RTNL_HANDLE_F_SUPPRESS_NLERR in the
rtnl_handle avoids unnecessary messages to the user in some cases.
For example,

  RTNETLINK answers: Operation not supported

when probing for support of a new feature.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-04-14 16:00:15 -07:00
Stephen Hemminger dfb60ddd29 Merge branch 'master' into net-next 2017-04-14 15:59:12 -07:00
Stephen Hemminger 2d3af1675d netem: fix out of bounds access in maketable
The maketable program used to generate one of the configuration
files at build time for netem would access past the end of the array
for one input value. This is a bug inherited from original NISTnet.
Just fold the value, like other code there.

This is not a runtime error or a security problem.
It only impacts the build process if the build machine
had extra hardening enabled.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-04-12 10:13:14 -07:00
Robert Shearman 9688cf3b7a iproute: Add support for MPLS LWT ttl attribute
Add support for setting and displaying the ttl attribute
for MPLS IP lightweight tunnels.

Signed-off-by: Robert Shearman <rshearma@brocade.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
2017-04-12 10:02:15 -07:00
Robert Shearman c44d18ea96 iproute: Add support for ttl-propagation attribute
Add support for setting and displaying the ttl-propagation attribute
initially used by MPLS to control propagation of MPLS TTL to IPv4/IPv6
TTL/hop-limit on popping final label on a per-route basis.

Signed-off-by: Robert Shearman <rshearma@brocade.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
2017-04-12 10:02:15 -07:00
Stephen Hemminger 19beb1aa16 Merge branch 'master' into net-next 2017-04-12 10:02:07 -07:00
Timothy Redaelli 5551ed44d3 ip-route: Prevent some other double spaces in output
Print spaces only after text.

CC: Phil Sutter <phil@nwl.cc>
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
2017-04-12 09:53:23 -07:00
Stephen Hemminger 45f78b4dec update kernel headers from net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-04-04 14:57:46 -07:00
Stephen Hemminger f4878dfae4 Merge branch 'master' into net-next 2017-04-04 14:56:41 -07:00
Phil Sutter 058d28b44c man: ip-link: Specify min/max values for bridge slave priority and cost
The values are parsed as u16/u32, but the kernel limits the allowed values.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-04-04 14:54:44 -07:00
Phil Sutter 9fd7b86c2d ip: link: Add missing link type help texts
These are basically stubs: The types which lacked their own help text
simply don't accept any options (yet). Still it might be a bit confusing
to users if they are presented with the generic 'ip link' help text
instead of something saying there are no type specific options.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-04-04 14:51:29 -07:00
Phil Sutter 8b47135474 ip: link: Unify link type help functions a bit
Take help function in iplink_bridge.c as an example and make other link
types' help functions similar:

* Use a single fprintf() call (if possible).
* Don't state a full command line, just "... type OPTIONS".
* Put every option on its own line, align options by column.
* List mandatory options first.

link_veth.c is intentionally left untouched because its 'peer' option
eats all kinds of generic link options and the help text points this out
without duplicating all the options there again.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-04-04 14:51:29 -07:00
Phil Sutter e336868e09 ip: link: macvlan: Add newline to help output
A newline between synopsis and variable definition looks nice and is
consistent with others.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-04-04 14:51:29 -07:00
Phil Sutter be985020ab ip: link: bond: Fix whitespace in help text
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-04-04 14:51:29 -07:00
Sabrina Dubroca 3fbb5d43bb man: ip-link.8: document bridge options
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
2017-04-04 14:50:02 -07:00
Roman Mashak 878babffec tc: print skbedit action when dumping actions.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-04-04 14:48:54 -07:00
Alexander Alemayhu 5caba410c2 man: fix man page warnings
While generating PDFs from the man pages, I saw the warning below from
several files. Compared the tc-matchall.8 with bridge.8 and used .RI
instead of .R. It should have no effect on the man page rendering.

    `R' is a string (producing the registered sign), not a macro.

Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
2017-04-04 14:46:34 -07:00
Stephen Hemminger b285ba9ea4 update headers from net-next (post 4.11-rc3)
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-03-20 10:18:50 -07:00
Stephen Hemminger 10b9d499b6 Merge branch 'master' into net-next 2017-03-20 10:18:17 -07:00
Stephen Hemminger cfca3b356a update headers from 4.11-rc3
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-03-20 10:17:01 -07:00
Vincent Bernat 97d564b90c vxlan: use preferred address family when neither group or remote is specified
When neither group nor remote is specified (or if they are specified with
the any address), nothing is sent to the kernel. In this case, the
kernel defaults to IPv4. This makes it impossible to use IPv6 with an
unspecified unicast remote ("bridge fdb add" will return
EAFNOSUPPORT).

If the user specifies a preferred address family (eg, "ip -6 link add"),
then send either IFLA_VXLAN_GROUP or IFLA_VXLAN_GROUP6 to enforce the
use of the appropriate family.
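A minimal sketch of the decision, assuming the family comes from the -4/-6 preference. vxlan_default_group_attr() and its return codes are hypothetical; the attribute payload sent in each case is not shown.

```c
#include <sys/socket.h>

/* Pick which group attribute pins the address family when neither group
 * nor remote was given. Returns 6 for IFLA_VXLAN_GROUP6, 4 for
 * IFLA_VXLAN_GROUP, or 0 to send nothing (kernel defaults to IPv4). */
static int vxlan_default_group_attr(int preferred_family)
{
	switch (preferred_family) {
	case AF_INET6:
		return 6;
	case AF_INET:
		return 4;
	default:
		return 0;   /* AF_UNSPEC: leave the choice to the kernel */
	}
}
```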

Signed-off-by: Vincent Bernat <vincent@bernat.im>
2017-03-20 10:16:09 -07:00
David Ahern 3e14fd0411 ip route: Add missing space between nexthop and via for mpls multipath routes
MPLS multipath routes are missing a space between 'nexthop' and 'via':

$ ip -net ns1 -f mpls ro ls
100
	nexthopvia inet 172.16.2.2  dev virt12
	nexthopvia inet 172.16.3.2  dev br0

Add it.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-03-20 10:14:44 -07:00
Stephen Hemminger 4303c0dd99 Merge branch 'master' into net-next 2017-03-14 16:42:59 -07:00
Alexander Alemayhu 0db70c59e1 man: add examples to ip.8
Having some examples in the top level man page might make it a little bit easier
for new users to get started. Reused some words / sentences from the existing
man pages.

Suggested-by: 積丹尼 Dan Jacobson <jidanni@jidanni.org>
Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
2017-03-14 16:41:13 -07:00
Jiri Kosina 7c581a124d iproute2: add support for invisible qdisc dumping
Support the new TCA_DUMP_INVISIBLE netlink attribute that allows asking
the kernel to perform a 'full qdisc dump', since for historical reasons
some of the default qdiscs are hidden by the kernel.

The command syntax is extended by an optional 'invisible' argument to
'tc qdisc show'.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-03-14 16:37:08 -07:00
Stephen Hemminger 2099b98385 update headers from net-next
Get TCA_DUMP_INVISIBLE and SCTP changes.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-03-14 16:36:36 -07:00
Stephen Hemminger 8fded9ffad update kernel headers from net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-03-13 08:32:13 -07:00
Stephen Hemminger a4280cfa72 update headers from 4.11-rc2
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-03-13 08:30:55 -07:00
Robert Shearman ad0e37403f man: Fix formatting of vrf parameter of ip-link show command
Add missing opening " [" for the vrf parameter.

Signed-off-by: Robert Shearman <rshearma@brocade.com>
2017-03-10 08:58:17 -08:00
Stephen Hemminger 60ccfcd7f2 pie: remove always false condition
When built with GCC warnings enabled:
q_pie.c: In function ‘pie_parse_opt’:
q_pie.c:78:38: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
        (alpha > ALPHA_MAX) || (alpha < ALPHA_MIN)) {
                                      ^
q_pie.c:85:35: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
        (beta > BETA_MAX) || (beta < BETA_MIN)) {
                                   ^

This is because MIN is 0 and an unsigned number can never be less than 0.
Therefore, just remove the _MIN values.
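A minimal sketch of the resulting check: for an unsigned value, only the upper bound needs testing. The ALPHA_MAX value here is hypothetical, chosen only for illustration.

```c
#define ALPHA_MAX 32   /* hypothetical bound, for illustration only */

/* An unsigned value is never below 0, so a "< 0" test is always false
 * (and trips -Wtype-limits); checking the maximum is sufficient. */
static int alpha_valid(unsigned int alpha)
{
	return alpha <= ALPHA_MAX;
}
```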

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-03-10 08:58:01 -08:00
Robert Shearman 837552b445 iplink: add support for afstats subcommand
Add support for the new afstats subcommand. This uses the new
IFLA_STATS_AF_SPEC attribute of RTM_GETSTATS messages to show
per-device, AF-specific stats. At the moment the kernel only supports
MPLS AF stats, so that is all that's implemented here.

The print_num function is exposed from ipaddress.c to be used for
printing the new stats so that the human-readable option, if set, can
be respected.

Example of use:

    $ ./ip/ip -f mpls link afstats dev eth1
    3: eth1
        mpls:
            RX: bytes  packets  errors  dropped  noroute
            9016       98       0       0        0
            TX: bytes  packets  errors  dropped
            7232       113      0       0

Signed-off-by: Robert Shearman <rshearma@brocade.com>
2017-03-10 08:44:55 -08:00
Phil Sutter 32b1a12713 man: ss.8: Add missing protocols to description of -A
The list was missing dccp and sctp protocols.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-03-10 08:42:13 -08:00
Roi Dayan 639785ff30 devlink: Add json and pretty options to help and man
While at it also fixed missing double dash for long opts.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
2017-03-08 17:59:01 -08:00
Daniel Borkmann 51361a9f1c bpf: test for valid type in bpf_get_work_dir
Jan-Erik reported that an assertion in bpf_prog_to_subdir() failed because
type was BPF_PROG_TYPE_UNSPEC, which is only used in bpf_init_env()
to auto-mount and cache the bpf fs mount point.

Therefore, when bpf_init_env() is called multiple times (e.g. an eBPF
classifier with an eBPF action attached) and bpf_mnt_cached is already
set, make sure that the type is also valid. In bpf_init_env(), we're
only interested in the mount point, not a type-specific subdir.

Fixes: e42256699c ("bpf: make tc's bpf loader generic and move into lib")
Reported-by: Jan-Erik Rediger <janerik@rediger.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-03-08 17:57:00 -08:00
Petr Vorel 54eab4c79a color: use "light" colors for dark background
COLORFGBG environment variable is used to detect dark background.

Idea and a bit of code is borrowed from Vim, thanks.
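
A sketch of the detection, assuming the heuristic Vim is known to use:
COLORFGBG holds ';'-separated colour indices whose last field is the
background, and a single digit 0-6 or 8 is treated as dark. The function
name is hypothetical and iproute2's color.c may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if a COLORFGBG value such as "15;0" indicates a dark
 * background; val is typically getenv("COLORFGBG") and may be NULL. */
static bool colorfgbg_is_dark(const char *val)
{
	const char *p;

	if (val == NULL)
		return false;
	p = strrchr(val, ';');	/* background is the last field */
	if (p == NULL)
		return false;
	p++;
	return ((p[0] >= '0' && p[0] <= '6') || p[0] == '8') && p[1] == '\0';
}
```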

Signed-off-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-03-03 09:58:05 -08:00
Stephen Hemminger d896797c7b bpf: remove unnecessary cast
No need to cast RTA_DATA

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-02-24 15:25:02 -08:00
Stephen Hemminger a59b616200 tc: use rta_getattr_u32
Don't cast RTA_DATA; use the newish accessors instead.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-02-24 15:24:34 -08:00
Stephen Hemminger 84da4099e9 xfrm: remove unnecessary casts
Since RTA_DATA() returns void * no need to cast it.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-02-24 15:23:14 -08:00
Jiri Kosina be67f81297 iproute2: tc: introduce build dependency on libnetlink
Rebuilding libnetlink doesn't trigger rebuild of tc, which is wrong
(especially so for builds where libnetlink.a gets statically linked into
tc). Fix that by introducing an explicit dependency.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-02-24 15:11:32 -08:00
Stephen Hemminger 9f1370c0e5 netlink route attribute cleanup
Use the new helper functions rta_getattr_u* instead of directly
casting RTA_DATA(). Where RTA_DATA() points to a structure, remove
the unnecessary cast, since RTA_DATA() is already void *.
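
A sketch of an accessor in the style of rta_getattr_u32(), using the
RTA_DATA() macro from <linux/rtnetlink.h>. attr_get_u32() is a hypothetical
name; it uses memcpy() rather than a pointer cast so it makes no alignment
assumptions about the payload.

```c
#include <assert.h>
#include <linux/rtnetlink.h>
#include <stdint.h>
#include <string.h>

/* RTA_DATA() yields a void * to the attribute payload, so no cast is
 * needed at the call site. */
static uint32_t attr_get_u32(const struct rtattr *rta)
{
	uint32_t v;

	memcpy(&v, RTA_DATA(rta), sizeof(v));
	return v;
}
```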

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-02-24 08:56:38 -08:00
Daniel Borkmann e37d706b56 {f,m}_bpf: dump tag over insns
The kernel already exports TCA_BPF_TAG (for filters) and TCA_ACT_BPF_TAG
(for actions) since commit f1f7714ea51c ("bpf: rework prog_digest into
prog_tag"), so also dump the tag when filters and actions are shown.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-02-23 09:02:19 -08:00
Roi Dayan 164a9ff401 tc: flower: Fix parsing ip address
Fix order of arguments when passed to __flower_parse_ip_addr.

Fixes: f888f4e20534 ("tc: flower: Support matching ARP")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
2017-02-23 09:01:15 -08:00
David Ahern 76f7d89d4d ip: Add support for MPLS netconf
Add support for MPLS netconf to ip monitor and ip netconf commands.
Changes to header files are not included, as those are typically pulled
in by a header sync with the kernel.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-02-23 08:58:40 -08:00
Stephen Hemminger 3f34574d0f Update headers based on 4.11 merge window
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-02-23 08:58:11 -08:00
Stephen Hemminger ae429903d7 update headers from net-next
updated sctp.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-02-20 08:53:50 -08:00
Stephen Hemminger 2b99748a60 add missing iplink_xstats.c
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-02-20 08:53:40 -08:00
Stephen Hemminger 29926015ea Merge branch 'master' into net-next 2017-02-20 08:51:22 -08:00
Stephen Hemminger f36ba8a4cd v4.10.0 2017-02-20 08:47:52 -08:00
Jiri Pirko cdd2f7ccd7 devlink: use DEVLINK_CMD_ESWITCH_* instead of DEVLINK_CMD_ESWITCH_MODE_*
Sync with kernel and don't use the obsolete enum values.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-02-19 12:01:47 -08:00
Nikolay Aleksandrov 217264a079 iplink: bridge_slave: add support for displaying xstats
This patch adds support to the bridge_slave link type for displaying
xstats by reusing the previously added bridge xstats callbacks.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2017-02-18 16:37:24 -08:00
Nikolay Aleksandrov 60ec0ecf0f iplink: bridge: add support for displaying xstats
Add support for the new parse/print_ifla_xstats callbacks and use them to
print the per-bridge multicast stats.
Example:
$ ip link xstats type bridge
br0
                    IGMP queries:
                      RX: v1 0 v2 0 v3 0
                      TX: v1 0 v2 0 v3 0
                    IGMP reports:
                      RX: v1 0 v2 0 v3 0
                      TX: v1 0 v2 0 v3 0
                    IGMP leaves: RX: 0 TX: 0
                    IGMP parse errors: 0
                    MLD queries:
                      RX: v1 0 v2 0
                      TX: v1 0 v2 0
                    MLD reports:
                      RX: v1 0 v2 0
                      TX: v1 0 v2 0
                    MLD leaves: RX: 0 TX: 0
                    MLD parse errors: 0

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2017-02-18 16:36:58 -08:00
Nikolay Aleksandrov 94f1a22aa7 iplink: add support for xstats subcommand
This patch adds support for a new xstats link subcommand which uses the
specified link type's new parse/print_ifla_xstats callbacks to display
extended statistics.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-02-18 16:36:01 -08:00
Stephen Hemminger bb8771573a Merge branch 'master' into net-next 2017-02-18 16:32:16 -08:00
Leon Romanovsky b77c77d294 devlink: Call dl_free in early exit case
Prior to parsing command options, the devlink tool allocates memory
to store results. In case of early exit (wrong parameters or version
check), this memory wasn't freed.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
2017-02-18 16:29:56 -08:00
Lucas Bates 5e4dc1951e man page: add page for skbmod action
Signed-off-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
2017-02-18 16:27:41 -08:00
Stephen Hemminger d250da9c68 Merge branch 'master' into net-next 2017-02-18 16:21:20 -08:00
Stephen Hemminger 2bf1a81a2f utils: hex2mem get rid of unnecessary goto
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-02-18 16:18:55 -08:00
Stephen Hemminger c72dab6624 Merge branch 'master' into net-next 2017-02-18 16:07:32 -08:00
Stephen Hemminger 835784525a update headers from 4.10-rc8
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-02-18 16:05:37 -08:00
Stephen Hemminger b6d8c4a606 Merge branch 'merge-4.10' of /tmp/iproute2 2017-02-18 16:04:25 -08:00
Stephen Hemminger ac94e16ca2 Merge branch 'merge-4.10' into next-merge 2017-02-17 15:34:24 -08:00
David Ahern b5377431df ip vrf: Detect invalid vrf name in pids command
Verify VRF name is valid before attempting to read cgroups files.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-02-17 15:33:24 -08:00
David Ahern 6a9783831c ip vrf: Handle VRF nesting in namespace
Since cgroups are not namespace aware, the directory hierarchy used by
ip vrf should account for network namespaces. In this case, change the
path from CGRP/BASE/vrf/NAME to CGRP/BASE/NETNS/vrf/NAME, where CGRP is
the cgroup2 mount path, BASE is any base hierarchy inherited before the
VRF is applied, and NAME is the VRF name.

The intent is as follows: a user logs into the box in some namespace
with a name known to iproute2. Some other policy may have put the
process into a BASE hierarchy. From there the user executes a task in
a VRF, and in doing so the task hierarchy becomes CGRP/BASE/NETNS/vrf/NAME.
The namespace level is omitted for the default namespace.
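
A sketch of the path layout described above. vrf_cgroup_path() is a
hypothetical helper, and the cgroup2 mount point used in the tests is only
an example. The NETNS level is skipped for the default namespace (NULL or
empty), and the BASE level is skipped when none was inherited.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build CGRP[/BASE][/NETNS]/vrf/NAME into buf.
 * Returns 0 on success, -1 if the path does not fit. */
static int vrf_cgroup_path(char *buf, size_t len, const char *cgrp,
			   const char *base, const char *netns,
			   const char *vrf)
{
	const char *b = (base && base[0]) ? base : NULL;
	const char *ns = (netns && netns[0]) ? netns : NULL;
	int n;

	n = snprintf(buf, len, "%s%s%s%s%s/vrf/%s", cgrp,
		     b ? "/" : "", b ? b : "",
		     ns ? "/" : "", ns ? ns : "", vrf);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```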

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-02-17 15:33:24 -08:00
David Ahern 9c49438a67 ip netns: refactor netns_identify
Move guts of netns_identify into a standalone function that returns
the netns name in a given buffer.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-02-17 15:33:24 -08:00
David Ahern 46afa6947b ip vrf: Handle vrf in a cgroup hierarchy
Add support for VRF in a pre-existing hierarchy. For example, if the
current process is running in CGRP/foo/bar, the 'ip vrf exec NAME CMD'
should run CMD in the cgroup CGRP/foo/bar/vrf/NAME.

When listing process ids in a VRF, search for the directory vrf/NAME
regardless of the base path: tasks in foo/bar/vrf/NAME and in vrf/NAME
are both running against the same VRF NAME.
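
A sketch of a base-agnostic match, assuming the task's cgroup path is
already known (for example, read from /proc/PID/cgroup). cgroup_in_vrf()
is a hypothetical name; it only checks that the path ends in "/vrf/NAME".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if path ends in "/vrf/<vrf>", whatever base hierarchy precedes it. */
static bool cgroup_in_vrf(const char *path, const char *vrf)
{
	static const char marker[] = "/vrf/";
	size_t mlen = sizeof(marker) - 1;
	size_t plen = strlen(path), vlen = strlen(vrf);
	const char *tail;

	if (plen < mlen + vlen)
		return false;
	tail = path + plen - vlen - mlen;
	return strncmp(tail, marker, mlen) == 0 && strcmp(tail + mlen, vrf) == 0;
}
```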

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-02-17 15:33:24 -08:00
Stephen Hemminger 732b18af97 Merge branch 'merge-4.10' into next-merge 2017-02-17 15:32:28 -08:00
Simon Horman 6374961a00 tc: flower: support masked ICMP code and type match
Extend ICMP code and type match to support masks.

Also add missing documentation to synopsis in manpage.

tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ipv6 parent ffff: flower \
	indev eth0 ip_proto icmpv6 type 128/240 code 0 action drop

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-02-17 15:32:03 -08:00
Simon Horman 9d36e54f36 tc: flower: provide generic masked u8 print helper
Provide generic masked u8 print helper and use it to print arp operations.

Also:
* Make name parameter of arp op print helper const.
* Consistently use __u8 rather than uint8_t, in keeping with the
  pervasive style in the file.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-02-17 15:32:03 -08:00
Simon Horman 180136e540 tc: flower: provide generic masked u8 parser helper
Provide generic masked u8 parser helper and use it to parse arp operations.

Also consistently use __u8 rather than uint8_t, in keeping with the
pervasive style in the file.
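
A sketch of masked u8 parsing and matching for arguments such as
"type 128/240". parse_masked_u8() and masked_u8_match() are hypothetical
names and not the helpers added by this commit. A missing mask is assumed
to mean "match all bits" (0xff), and both parts accept decimal or 0x-hex.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parse "VALUE[/MASK]" where both parts fit in a u8. Returns 0 on success. */
static int parse_masked_u8(const char *str, uint8_t *value, uint8_t *mask)
{
	char buf[32];
	char *slash, *end;
	unsigned long v, m = 0xff;

	if (strlen(str) >= sizeof(buf))
		return -1;
	strcpy(buf, str);
	slash = strchr(buf, '/');
	if (slash)
		*slash = '\0';

	errno = 0;
	v = strtoul(buf, &end, 0);
	if (errno || end == buf || *end || v > 0xff)
		return -1;
	if (slash) {
		m = strtoul(slash + 1, &end, 0);
		if (errno || end == slash + 1 || *end || m > 0xff)
			return -1;
	}
	*value = (uint8_t)v;
	*mask = (uint8_t)m;
	return 0;
}

/* A field matches when the bits selected by mask agree. */
static int masked_u8_match(uint8_t field, uint8_t value, uint8_t mask)
{
	return (field & mask) == (value & mask);
}
```

With 128/240 (0x80/0xf0), every type whose upper nibble is 8 matches,
i.e. ICMPv6 types 128 through 143.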

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-02-17 15:32:03 -08:00
Stephen Hemminger cad5493448 update headers from net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-02-17 15:30:50 -08:00
Stephen Hemminger 9b0d47e58a Merge branch 'master' into next-merge 2017-02-17 15:29:24 -08:00
Asbjørn Sloth Tønnesen d754a64aed testsuite: search for kernel config in /boot
Add support for finding the kernel config in Debian
and derivatives.

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
2017-02-17 15:26:30 -08:00
Asbjørn Sloth Tønnesen 3064a44c69 testsuite: refactor kernel config search
Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
2017-02-17 15:26:30 -08:00
Or Gerlitz afdc1fed24 tc: matchall: Print skip flags when dumping a filter
Print the skip flags when we dump a filter.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
2017-02-17 15:25:24 -08:00
David Ahern 1ca2e08bd0 ip route: Make name of protocol 0 consistent
iproute2 can inconsistently show the name of protocol 0 if a route with
a custom protocol is added. For example:
  dsa@cartman:~$ ip -6 ro ls table all | egrep 'proto none|proto unspec'
  local ::1 dev lo  table local  proto none  metric 0  pref medium
  local fe80::225:90ff:fecb:1c18 dev lo  table local  proto none  metric 0  pref medium
  local fe80::92e2:baff:fe5c:da5d dev lo  table local  proto none  metric 0  pref medium

protocol 0 is pretty printed as "none". Add a route with a custom protocol:
  dsa@cartman:~$ sudo ip -6 ro add  2001:db8:200::1/128 dev eth0 proto 123

And now display has switched from "none" to "unspec":
  dsa@cartman:~$ ip -6 ro ls table all | egrep 'proto none|proto unspec'
  local ::1 dev lo  table local  proto unspec  metric 0  pref medium
  local fe80::225:90ff:fecb:1c18 dev lo  table local  proto unspec  metric 0  pref medium
  local fe80::92e2:baff:fe5c:da5d dev lo  table local  proto unspec  metric 0  pref medium

The rt_protos file has the id to name mapping as "unspec" while
rtnl_rtprot_tab[0] has "none". The presence of a custom protocol id
triggers reading the rt_protos file and overwriting the string in
rtnl_rtprot_tab. All of this is logic from 2004 and earlier.

Update rtnl_rtprot_tab to "unspec" to match the enum value.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-02-17 15:12:29 -08:00
Hangbin Liu e83435fcd7 man: ip-link.8: Document bridge_slave fdb_flush option
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2017-02-09 17:31:43 -08:00
Phil Sutter 3cef95926b testsuite: Search kernel config in modules dir also
At least in Fedora there is no /proc/config.gz but instead
/lib/modules/`uname -r`/config, so use that as a fallback.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-02-09 17:28:48 -08:00
Phil Sutter 886f2c43b5 testsuite: Generate nlmsg blob at runtime
Since netlink messages are in host byte order, shipping a pre-generated
nlmsg blob won't work on systems with a different endianness. Therefore,
generate the blob at runtime, so its content matches the host's endianness.

Note that the generated message will contain only a single interface
featuring two VFs instead of the full list before. Yet this is
sufficient, as it triggers the crash with iproute versions prior to
commit 8c29ae7cc2 ("ip link: Fix crash on older kernels when show VF
dev").

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-02-09 17:28:48 -08:00
Simon Horman c7ec052bb8 tc: flower: Update documentation to indicate ARP takes IPv4 prefixes
Unlike other PREFIXes documented in the usage for tc flower, which accept
both IPv4 and IPv6 prefixes, arp_sip and arp_tip only accept IPv4
prefixes.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-02-08 11:39:33 -08:00
Simon Horman 81f6e5a727 tc: flower: use correct type when calling flower_icmp_attr_type
Use enum flower_icmp_field rather than bool as type of third parameter
when calling flower_icmp_attr_type.

Fixes: eb3b5696f1 ("tc: flower: support matching on ICMP type and code")
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-02-08 11:37:44 -08:00
Hangbin Liu 1e5b0e80ff man: ip-link.8: Document bridge_slave fdb_flush option
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2017-02-08 11:36:22 -08:00
Stephen Hemminger f0337c4475 tc: add missing sample file
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-02-07 11:53:24 -08:00
Stephen Hemminger 985091aa8c update headers from bridge tunnel metadata
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-02-07 11:52:49 -08:00
Yotam Gigi b32c0b64fa tc: bash-completion: Add support for matchall
Add support for the matchall classifier and its parameters.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
2017-02-07 11:44:53 -08:00
Yotam Gigi a9d2f4d861 tc: bash-completion: Add support for filter actions
Previously, the autocomplete routine did not complete actions after a
filter keyword, for example:

$ tc filter add dev eth0 u32 [...] action <TAB>

did not suggest the actions list, and:

$ tc filter add dev eth0 u32 [...] action mirred <TAB>

did not suggest the specific mirred parameters. Add the support for this
kind of completion by adding the _tc_filter_action_options routine and
invoking it from inside _tc_filter_options.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
2017-02-07 11:44:53 -08:00
Yotam Gigi 57086f7b25 tc: bash-completion: Make the *_KIND variables global
The QDISC_KIND, FILTER_KIND, ACTION_KIND variables may be used by other
routines, thus make them global variables.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
2017-02-07 11:44:53 -08:00
Yotam Gigi f62b54a106 tc: bash-completion: Prepare action autocomplete to support several actions
The action autocomplete routine (_tc_action_options) currently does not
support multiple action statements in one tc command line, as it uses
_tc_once_attr and _tc_one_from_list.

For example, in that case:

$ tc filter add dev eth0 handle ffff: u32 [...]  \
	   action sample group 5 rate 12 	 \
	   action sample <TAB>

the _tc_once_attr function, when invoked with "group rate", will not
suggest those as they already exist on the command line.

Fix the function to use the _from variant, thus allowing each action's
autocompletion to start from the action keyword, not from the beginning
of the command line.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
2017-02-07 11:44:53 -08:00
Yotam Gigi 26e0996a87 tc: bash-completion: Add the _from variant to _tc_one* funcs
The _tc_one_of_list and _tc_once_attr functions simplify the bash
completion task by validating that each attr exists only once on the
command line.

For example, for the command line:

$ a b c d e

and the call to _tc_once_attr with "a f g", the function will suggest
"f g", as "a" already appears on the command line at argument index 0.

Add the _from variant to those functions, which checks that an option
appears only once starting from a specified index. In the previous
example, calling _tc_once_attr with 4 and "a f g" will suggest "a f g".

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
2017-02-07 11:44:53 -08:00
Yotam Gigi 787317f50a tc: man: matchall: Update examples to include sample
Add an example of packet sampling to the tc-matchall man page examples
section. The example uses the matchall classifier and the sample action to
create packet sampling on a port.

Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
2017-02-06 14:24:52 -08:00
Yotam Gigi 515e943d76 tc: man: Add man entry for the tc-sample action
In addition to general information about the tc action, the man entry
contains common usage examples and information about the TLV fields packed
within each sampled packet.

Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
2017-02-06 14:24:52 -08:00
Yotam Gigi 0b1abd84fb tc: Add support for the sample tc action
The sample tc action allows sampling packets matching a classifier. It
randomly picks packets and samples them using the psample netlink
channel. The user can specify the psample group the packet will be
sampled to, the sampling rate, and the packet truncation size (to save
kernel-user traffic).

The sampled packets contain informative metadata, for example, the input
interface and the original packet length.

The action syntax:
tc filter add [...] \
	action sample rate <RATE> group <GROUP> [trunc <SIZE>]
	[...]

Where:
  RATE := The sampling rate which is the ratio of packets observed at the
	  data source to the samples generated
  GROUP := the psample module sampling group
  SIZE := optional truncation size

An example of a common use case for the sample tc action: to sample ingress
traffic from interface eth1, one may use the commands:

tc qdisc add dev eth1 handle ffff: ingress

tc filter add dev eth1 parent ffff: \
       matchall action sample rate 12 group 4

Where the first command adds an ingress qdisc and the second starts
sampling randomly with an average of one sampled packet per 12 packets
on dev eth1 to psample group 4.
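
A sketch of "one sampled packet per RATE packets on average": each packet
is sampled independently with probability 1/RATE. The xorshift32 generator
is an assumption chosen only to keep the sketch deterministic; the kernel's
random source and exact arithmetic may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Small deterministic PRNG (xorshift32); state must be nonzero. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Sample with probability 1/rate; rate 1 samples every packet. */
static bool should_sample(uint32_t *state, uint32_t rate)
{
	if (rate == 0)
		return false;
	return xorshift32(state) % rate == 0;
}
```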

Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
2017-02-06 14:24:52 -08:00
Stephen Hemminger 818a10a77f Merge branch 'master' into net-next 2017-02-06 14:13:27 -08:00
Stephen Hemminger 17c4c446bd tcp: header file update
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-02-06 14:08:07 -08:00
Stephen Hemminger b5de688592 Merge branch 'master' into net-next 2017-02-06 14:07:13 -08:00
Phil Sutter 72dfff6e11 man: ip-route.8: Fix 'expires' indenting
Descriptions of each route sub-command's arguments are enclosed in
.RS/.RE pairs. For 'replace' sub-command, '.RE' was incorrectly put
before the last argument ('expires').

Fixes: 3fbe7ca847 ("iproute2: ip-route.8.in: Add expires option for ip route")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-02-06 13:52:52 -08:00
Eric Dumazet 38e6dbc4b3 ss: print tcpi_rcv_mss and tcpi_advmss
tcpi_rcv_mss and tcpi_advmss tcp info fields were not yet reported
by ss.

While adding GRO support to packetdrill, I found this was useful.

Signed-off-by: Eric Dumazet <edumazet@google.com>
2017-02-06 13:50:29 -08:00
Ralf Baechle e7867c34e8 ip: HSR: Fix cut and paste error
Fixes: 5c0aec93a5 ("ip: Add HSR support")
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-02-06 13:49:02 -08:00
Nogah Frankel aaacdfd570 ifstat: Add xstat to ifstat man page
Add documentation about the extended statistics to the ifstat man page.
Add the ifstat man page to the man8 Makefile.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
2017-02-03 09:20:15 -08:00
Nogah Frankel 1c2df61344 ifstat: Add "sw only" extended statistics to ifstat
Add support for extended statistics of the SW-only type, counting only the
packets that went via the CPU (useful for systems with forwarding
offload). It reads them from filter type IFLA_STATS_LINK_OFFLOAD_XSTATS
and sub type IFLA_OFFLOAD_XSTATS_CPU_HIT.

It is available under the name 'cpu_hits'
(or any prefix of it, such as 'cpu' or simply 'c').

For example:
ifstat -x c

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
2017-02-03 09:20:15 -08:00
Nogah Frankel 5a52102b7c ifstat: Add extended statistics to ifstat
Extended stats are part of the RTM_GETSTATS method. This patch adds them
to ifstat.
While extended stats can come in many forms, we support only the
rtnl_link_stats64 struct for them (the 64-bit version of struct
rtnl_link_stats).
We support stats in the main nesting level, or one level lower.
The extension can be called by its name or any prefix of it. If more
than one name matches, the first one is picked.

To get the extended stats the flag -x <stats type> is used.
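
A sketch of the prefix lookup with first-match-wins semantics. The name
table is hypothetical: only 'cpu_hits' comes from this log, and 'counters'
is invented to show how an ambiguous prefix resolves.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical xstats names, in lookup order. */
static const char *const xstats_names[] = { "cpu_hits", "counters" };

/* Return the first name that arg is a prefix of, or NULL if none. */
static const char *match_xstats(const char *arg)
{
	size_t len = strlen(arg);
	size_t i;

	if (len == 0)
		return NULL;
	for (i = 0; i < sizeof(xstats_names) / sizeof(xstats_names[0]); i++)
		if (strncmp(xstats_names[i], arg, len) == 0)
			return xstats_names[i];
	return NULL;
}
```

Here "c" selects "cpu_hits" even though "counters" also starts with "c",
because it appears first in the table.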

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
2017-02-03 09:20:15 -08:00
Nogah Frankel 3d8048dcc3 ifstat: Includes reorder
Reorder the includes in misc/ifstat.c to match convention.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
2017-02-03 09:20:15 -08:00
Yotam Gigi d65a744cdb tc: man: matchall: Fix example indentation
The man page contains two examples, which have different indentation. Fix
the indentation of the two examples to match.

Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
2017-01-31 16:18:33 -08:00
Stephen Hemminger b479a7d75b update kernel headers from net-next 2017-01-29 20:31:31 -08:00
Stephen Hemminger fefc93bb28 Merge branch 'master' into net-next 2017-01-29 20:30:05 -08:00
Roman Mashak 31951c47e9 tc: distinguish Add/Replace action operations.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Phil Sutter <phil@nwl.cc>
2017-01-29 20:26:44 -08:00
Phil Sutter 6bbe5e6290 man: tc-csum.8: Fix example
This fixes two issues with the provided example:

- Add missing 'dev' keyword to second command.
- Use a real IPv4 address instead of a bogus hex value since that will
  be rejected by get_addr_ipv4().

Fixes: dbfb17a67f ("man: tc-csum.8: Add an example")
Reported-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-01-29 20:25:35 -08:00
Benjamin LaHaise 4f7d406f5d f_flower: don't set TCA_FLOWER_KEY_ETH_TYPE for "protocol all"
v2 - update to address changes in 00697ca19a.

When using the tc flower filter, rules marked with "protocol all" do not
actually match all packets.  This is due to a bug in f_flower.c that passes
in ETH_P_ALL in the TCA_FLOWER_KEY_ETH_TYPE attribute when adding a rule.
Fix this by omitting TCA_FLOWER_KEY_ETH_TYPE if the protocol is set to
ETH_P_ALL.

Fixes: 488b41d020 ("tc: flower no need to specify the ethertype")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Benjamin LaHaise <benjamin.lahaise@netronome.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: Roi Dayan <roid@mellanox.com>
2017-01-29 20:23:58 -08:00
Paul Blakey 08f66c80c0 tc: flower: Refactor matching flags to be more user friendly
Instead of "magic numbers", we can now specify each flag
by name. A "no" prefix (e.g. nofrag) unsets the flag;
otherwise it will be set.

Example:
    # add a flower filter that will drop fragmented packets
    tc filter add dev ens4f0 protocol ip parent ffff: \
            flower \
            src_mac e4:1d:2d:fd:8b:01 \
            dst_mac e4:1d:2d:fd:8b:02 \
            indev ens4f0 \
            ip_flags frag \
    action drop

    # add a flower filter that will drop non-fragmented packets
    tc filter add dev ens4f0 protocol ip parent ffff: \
            flower \
            src_mac e4:1d:2d:fd:8b:01 \
            dst_mac e4:1d:2d:fd:8b:02 \
            indev ens4f0 \
            ip_flags nofrag \
    action drop
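
A sketch of name-based flag parsing with a "no" prefix. The table and bit
values are hypothetical and are not the kernel's TCA_FLOWER_KEY_FLAGS_*
constants. Each named flag is added to the mask. It is set in the value
only when the name has no "no" prefix, so "nofrag" matches packets whose
frag bit is explicitly clear.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct flag_name {
	const char *name;
	uint32_t bit;
};

/* Hypothetical flag table, for illustration only. */
static const struct flag_name flag_names[] = {
	{ "frag",      1u << 0 },
	{ "firstfrag", 1u << 1 },
};

/* Parse one token such as "frag" or "nofrag". Returns -1 if unknown. */
static int parse_flag(const char *tok, uint32_t *flags, uint32_t *mask)
{
	int negate = strncmp(tok, "no", 2) == 0;
	const char *name = negate ? tok + 2 : tok;
	size_t i;

	for (i = 0; i < sizeof(flag_names) / sizeof(flag_names[0]); i++) {
		if (strcmp(name, flag_names[i].name) != 0)
			continue;
		*mask |= flag_names[i].bit;	/* we care about this bit */
		if (negate)
			*flags &= ~flag_names[i].bit;
		else
			*flags |= flag_names[i].bit;
		return 0;
	}
	return -1;
}
```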

Fixes: 22a8f01989 ('tc: flower: support matching flags')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-01-20 10:36:45 -08:00
Hangbin Liu d1b41236e1 iplink: bridge_slave: add support for IFLA_BRPORT_FLUSH
This patch implements support for the IFLA_BRPORT_FLUSH attribute
in iproute2 so it can flush bridge slave's fdb dynamic entries.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2017-01-20 10:32:34 -08:00
Hangbin Liu c3f1e3c425 iplink: bridge: add support for IFLA_BR_MCAST_MLD_VERSION
This patch implements support for the IFLA_BR_MCAST_MLD_VERSION
attribute in iproute2 so it can change the mcast mld version.

Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2017-01-20 10:32:34 -08:00
Hangbin Liu 19756950f7 iplink: bridge: add support for IFLA_BR_MCAST_IGMP_VERSION
This patch implements support for the IFLA_BR_MCAST_IGMP_VERSION
attribute in iproute2 so it can change the mcast igmp version.

Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2017-01-20 10:32:34 -08:00
Hangbin Liu 6ddad009e2 iplink: bridge: add support for IFLA_BR_MCAST_STATS_ENABLED
This patch implements support for the IFLA_BR_MCAST_STATS_ENABLED
attribute in iproute2 so it can enable/disable mcast stats accounting.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2017-01-20 10:32:34 -08:00
Hangbin Liu 7d93b567ea iplink: bridge: add support for IFLA_BR_VLAN_STATS_ENABLED
This patch implements support for the IFLA_BR_VLAN_STATS_ENABLED
attribute in iproute2 so it can enable/disable vlan stats accounting.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2017-01-20 10:32:34 -08:00
Hangbin Liu f3372d62eb iplink: bridge: add support for IFLA_BR_FDB_FLUSH
This patch implements support for the IFLA_BR_FDB_FLUSH attribute
in iproute2 so it can flush bridge fdb dynamic entries.

Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2017-01-20 10:32:34 -08:00
Nikolay Aleksandrov e763e3310e ipmroute: add support for RTNH_F_UNRESOLVED
This patch adds a new field, printed at the end of the line, which
denotes the real entry state. Before this patch, an entry's IIF could
disappear and it would look like an unresolved one (iif = unresolved):
(3.0.16.1, 225.11.16.1)          Iif: unresolved

with no way to really distinguish it from an unresolved entry.
After the patch if the dumped entry has RTNH_F_UNRESOLVED set we get:
(3.0.16.1, 225.11.16.1)          Iif: unresolved  State: unresolved

for unresolved entries and:
(0.0.0.0, 225.11.11.11)          Iif: eth4       Oifs: eth3  State: resolved

for resolved entries after the OIF list. Note that "State:" has ':' in
it so it cannot be mistaken for an interface name.

And for the example above, we'd get:
(0.0.0.0, 225.11.11.11)          Iif: unresolved     State: resolved

Also, when dumping all routes via ip route show table all,
it will show up as:
multicast 225.11.16.1/32 from 3.0.16.1/32 table default proto 17 unresolved

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-01-20 09:41:26 -08:00
David Ahern 11f2c75315 ip route: error out on multiple via without nexthop keyword
To specify multiple nexthops in a route the user is expected to use the
"nexthop" keyword which ip route uses to create the RTA_MULTIPATH.
However, ip route always accepts multiple 'via' keywords where only the
last one is used in the route leading to confusion. For example, ip
accepts this syntax:
    $ ip ro add vrf red  1.1.1.0/24 via 10.100.1.18 via 10.100.2.18

but the route inserted by the kernel uses just the last gateway:
    1.1.1.0/24 via 10.100.2.18 dev eth2

which is not the full request from the user. Detect the presence of
multiple 'via' keywords and give the user a hint to use nexthop:

    $ ip ro add vrf red  1.1.1.0/24 via 10.100.1.18 via 10.100.2.18
    Error: argument "via" is wrong: use nexthop syntax to specify multiple via
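
A simplified sketch of the detection. check_single_via() is a hypothetical
function, not the parser in iproute.c. It rejects a second "via" and stops
scanning once "nexthop" appears, assuming nexthop clauses are parsed
separately into RTA_MULTIPATH.

```c
#include <assert.h>
#include <string.h>

/* Returns 0 if the arguments are acceptable, -1 on a repeated "via". */
static int check_single_via(int argc, const char *argv[])
{
	int seen_via = 0;
	int i;

	for (i = 0; i < argc; i++) {
		if (strcmp(argv[i], "nexthop") == 0)
			return 0;	/* multipath syntax takes over from here */
		if (strcmp(argv[i], "via") == 0) {
			if (seen_via)
				return -1;
			seen_via = 1;
		}
	}
	return 0;
}
```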

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-01-20 09:38:20 -08:00
Davide Caratti 6561cb28f2 tc: m_csum: add support for SCTP checksum
'sctp' parameter can now be used as 'csum' target to enable CRC32c
computation on SCTP packets.
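
For reference, a compact bitwise CRC32c (Castagnoli) as used by the SCTP
checksum: reflected polynomial 0x82F63B78, initial value 0xFFFFFFFF, final
inversion. This sketch only illustrates the algorithm. The kernel uses its
own (table- or hardware-accelerated) implementation, and the byte order in
which SCTP stores the result is not covered here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time CRC32c; slow but easy to verify. */
static uint32_t crc32c(const void *data, size_t len)
{
	const uint8_t *p = data;
	uint32_t crc = 0xFFFFFFFFu;
	int k;

	while (len--) {
		crc ^= *p++;
		for (k = 0; k < 8; k++)
			/* -(crc & 1u) is all ones when the low bit is set */
			crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
	}
	return ~crc;
}
```

The standard check value is crc32c("123456789", 9) == 0xE3069283.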

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2017-01-20 09:32:08 -08:00
Stephen Hemminger a044b36af3 update kernel headers from 4.10 net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-01-20 09:28:36 -08:00
Stephen Hemminger 9174b4cf3e Merge branch 'master' into net-next 2017-01-20 09:27:57 -08:00
Roi Dayan 00697ca19a tc: flower: Fix incorrect error msg about eth type
addattr16 may return an error about the nl msg size,
but not about an incorrect eth type.

Fixes: 488b41d020 ("tc: flower no need to specify the ethertype")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
2017-01-20 09:27:34 -08:00
Roi Dayan c85609b25f tc: flower: Add missing err check when parsing flower options
addattr32 may return an error.

Fixes: cfcabf18d8 ("tc: flower: Add skip_{hw|sw} support")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
2017-01-20 09:27:34 -08:00
Stephen Hemminger 6166cc35be update kernel headers (from 4.10-rc4)
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-01-20 09:26:27 -08:00
Alexander Heinlein d5eb0564da ip/xfrm: Fix deleteall when having many policies installed
Fix "Policy buffer overflow" when trying to use deleteall with many
policies installed.

Signed-off-by: Alexander Heinlein <alexander.heinlein@secunet.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-01-20 09:21:02 -08:00
Jiri Benc c3d09fba93 Revert "man pages: add man page for skbmod action"
This reverts commit a40995d1c7.

The patch is missing the actual tc-skbmod.8 file which causes 'make
install' to fail:

install -m 0755 -d /tmp/ip/usr/share/man/man8
install -m 0644 ip-address.8 ip-link.8 ip-route.8 ip.8 arpd.8 lnstat.8
routel.8 rtacct.8 rtmon.8 rtpr.8 ss.8 tc.8 tc-bfifo.8 tc-bpf.8 tc-cbq.8
tc-cbq-details.8 tc-choke.8 tc-codel.8 tc-fq.8 tc-drr.8 tc-ematch.8
tc-fq_codel.8 tc-hfsc.8 tc-htb.8 tc-pie.8 tc-mqprio.8 tc-netem.8 tc-pfifo.8
tc-pfifo_fast.8 tc-prio.8 tc-red.8 tc-sfb.8 tc-sfq.8 tc-stab.8 tc-tbf.8
bridge.8 rtstat.8 ctstat.8 nstat.8 routef.8 ip-addrlabel.8 ip-fou.8 ip-gue.8
ip-l2tp.8 ip-macsec.8 ip-maddress.8 ip-monitor.8 ip-mroute.8 ip-neighbour.8
ip-netns.8 ip-ntable.8 ip-rule.8 ip-tunnel.8 ip-xfrm.8 ip-tcp_metrics.8
ip-netconf.8 ip-token.8 tipc.8 tipc-bearer.8 tipc-link.8 tipc-media.8
tipc-nametable.8 tipc-node.8 tipc-socket.8 tc-basic.8 tc-cgroup.8 tc-flow.8
tc-flower.8 tc-fw.8 tc-route.8 tc-tcindex.8 tc-u32.8 tc-matchall.8
tc-connmark.8 tc-csum.8 tc-mirred.8 tc-nat.8 tc-pedit.8 tc-police.8
tc-simple.8 tc-skbedit.8 tc-vlan.8 tc-xt.8  tc-ife.8 tc-skbmod.8
tc-tunnel_key.8 devlink.8 devlink-dev.8 devlink-monitor.8 devlink-port.8
devlink-sb.8 /tmp/ip/usr/share/man/man8
install: cannot stat ‘tc-skbmod.8’: No such file or directory
make[2]: *** [install] Error 1
make[1]: *** [install] Error 2

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2017-01-18 08:59:54 -08:00
Roi Dayan b2141de1ad tc: flower: Fix flower output for src and dst ports
This fixes a missing use case after the introduction of enum flower_endpoint.

Fixes: 6910d65661 ("tc: flower: introduce enum flower_endpoint")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Paul Blakey <paulb@mellanox.com>
2017-01-17 08:45:22 -08:00
Jamal Hadi Salim 1c570c50a3 utils: make hex2mem available to all users
hex2mem() api is useful for parsing hexstrings which are then packed in
a stream of chars.
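The actual hex2mem() signature is not shown in this log. The following hypothetical hex2mem_sketch() illustrates the same idea under that assumption: pack a hex string, two digits per byte, into a byte buffer, and reject odd lengths, non-hex digits and overflow.

```c
#include <stddef.h>

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_nibble(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Hypothetical sketch: pack "hex" into at most "count" bytes of "mem".
 * Returns the number of bytes written, or -1 on error. */
static int hex2mem_sketch(const char *hex, unsigned char *mem, size_t count)
{
	size_t n = 0;

	while (hex[0] != '\0') {
		int hi = hex_nibble(hex[0]);
		int lo;

		if (hex[1] == '\0' || n >= count)
			return -1;	/* odd length or buffer too small */
		lo = hex_nibble(hex[1]);
		if (hi < 0 || lo < 0)
			return -1;	/* not a hex digit */
		mem[n++] = (unsigned char)(hi << 4 | lo);
		hex += 2;
	}
	return (int)n;
}
```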

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2017-01-17 08:45:22 -08:00
Petr Vorel 530903dd90 ip: fix igmp parsing when iface is long
Entries with long interface names in /proc/net/igmp have no whitespace
between the name and the colon, so sscanf() includes the colon in the
parsed name and 'ip maddr show IFACE' does not include the inet entries.
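A sketch of one way to tolerate both layouts, assuming the interface header line looks like "2\teth0      :     1      V3" (the name is left-padded to a fixed width, so names at or beyond that width run straight into the colon). parse_igmp_dev_line() is a hypothetical helper, not the ipmaddr.c code; it simply strips a trailing ':' after scanning.

```c
#include <stdio.h>
#include <string.h>

/* Parse the index and interface name from a /proc/net/igmp header
 * line. "%s" stops at whitespace, so for long names it also captures
 * the colon; remove it if present. Returns 0 on success, -1 on error. */
static int parse_igmp_dev_line(const char *line, int *index,
			       char *name, size_t namelen)
{
	char tmp[64];
	size_t len;

	if (sscanf(line, "%d%63s", index, tmp) != 2)
		return -1;
	len = strlen(tmp);
	if (len > 0 && tmp[len - 1] == ':')
		tmp[--len] = '\0';
	if (len + 1 > namelen)
		return -1;
	memcpy(name, tmp, len + 1);
	return 0;
}
```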

Signed-off-by: Petr Vorel <pvorel@suse.cz>
2017-01-17 08:39:55 -08:00
Phil Sutter a05b9557f4 tc: m_xt: Drop needless parentheses from #if checks
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-01-13 16:33:54 -08:00
Stephen Hemminger facfc5c1c0 include: remove unused header
not used by any source here

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-01-13 14:11:12 -08:00
Stephen Hemminger 65047fa641 add more uapi header files
In order to ensure no backward/forward compatibility problems,
make sure that all kernel headers used come from the local copy.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-01-12 17:54:39 -08:00
Simon Horman f888f4e205 tc: flower: Support matching ARP
Support matching on ARP operation, and hardware and protocol addresses
for Ethernet hardware and IPv4 protocol addresses.

Example usage:

tc qdisc add dev eth0 ingress

tc filter add dev eth0 protocol arp parent ffff: flower indev eth0 \
	arp_op request arp_sip 10.0.0.1 action drop
tc filter add dev eth0 protocol rarp parent ffff: flower indev eth0 \
	arp_op reply arp_tha 52:54:3f:00:00:00/24 action drop

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-01-12 17:46:37 -08:00
Stephen Hemminger e2ade8cefb kernel headers update
For flower, etc.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-01-12 17:45:30 -08:00
Stephen Hemminger 51dd3455a3 Merge branch 'master' into net-next 2017-01-12 17:44:44 -08:00
Simon Horman aeeaae2fa9 tc: ife: correct spelling of prio in example
Correct typo in example in ife man page.

Fixes: 06f9a59170 ("man: tc-ife.8: man page for ife action")
Cc: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-01-12 17:40:19 -08:00
Nikolay Aleksandrov 7f10090b9f bridge: fdb: add state filter support
This patch adds a new argument to the bridge fdb show command that allows
filtering by entry state.
Also update the man page to include all available show arguments.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2017-01-12 17:38:55 -08:00
David Ahern 5ffbf4508c rttable: Fix invalid range checking when table id is converted to u32
Frank reported that very large table ids are not properly
detected:
$ ip li add foobar type vrf table 98765432100123456789

command succeeds and resulting table id is actually:

21: foobar: <NOARP,MASTER> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether da:ea:d4:77:38:2a brd ff:ff:ff:ff:ff:ff promiscuity 0
    vrf table 4294967295 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

Make the temp variable 'i' unsigned long and let the typecast to u32
happen on assignment to id.
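The fix can be sketched as follows: parse into an unsigned long, reject anything that does not fit in 32 bits, and only then narrow to u32. parse_table_id() is a hypothetical name, not the rt_names.c function. On platforms where unsigned long is itself 32 bits, strtoul() reports ERANGE for such inputs instead.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 0 and stores the id on success, -1 if "arg" is not a number
 * or does not fit in 32 bits. */
static int parse_table_id(const char *arg, uint32_t *id)
{
	unsigned long i;	/* wider temporary, as in the fix */
	char *end;

	errno = 0;
	i = strtoul(arg, &end, 0);
	if (end == arg || *end != '\0' || errno == ERANGE || i > UINT32_MAX)
		return -1;
	*id = (uint32_t)i;	/* narrowing happens only after the check */
	return 0;
}
```

With this check, the "98765432100123456789" example from the report is rejected instead of silently becoming table 4294967295.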

Reported-by: Frank Kellermann <frank.kellermann@atos.net>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-01-12 17:34:22 -08:00
David Forster 40f9070d94 ip6tunnel: Align ipv6 tunnel key display with ipv4
Show ipv6 tunnel keys on presence of GRE_KEY flag for tunnel types
other than GRE. Aligns ipv6 behaviour with ipv4.

Signed-off-by: dforster@brocade.com
2017-01-12 17:34:02 -08:00
Phil Sutter 97a02cabef tc: m_xt: Fix segfault with iptables-1.6.0
iptables 1.6.0 introduced the struct xtables_globals field
'compat_rev', a function pointer. Initializing it is mandatory since
libxtables calls it without checking that it is set.

Without this, tc segfaults when using the xt action like so:

| tc filter add dev d0 parent ffff: u32 match u32 0 0 \
|	action xt -j MARK --set-mark 20

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-01-12 17:32:26 -08:00
Stephen Hemminger 3bad1dbb20 whitespace cleanup
Get rid of blanks at end of line and extra lines at eof

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-01-12 17:31:20 -08:00
David Ahern 719e331ff6 Add support for rt_protos.d
Add support for reading proto id/name mappings from rt_protos.d
directory. Allows users to have custom protocol values converted
to human friendly names.

Each file under rt_protos.d has the 'id name' format used by
rt_protos. Only .conf files are read and parsed.
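A sketch of the directory walk, assuming decimal ids, '#' comments and a plain "id name" layout per line; has_conf_suffix(), parse_proto_line() and read_protos_dir() are hypothetical helpers, not the rt_names.c implementation, and they print entries instead of filling a lookup table.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Only files whose names end in ".conf" (with a non-empty base name). */
static int has_conf_suffix(const char *name)
{
	size_t len = strlen(name);

	return len > 5 && strcmp(name + len - 5, ".conf") == 0;
}

/* Parse one "id name" line. Returns 1 for an entry, 0 for a blank or
 * comment line, -1 for a malformed line. */
static int parse_proto_line(const char *line, unsigned int *id,
			    char *name, size_t namelen)
{
	char tmp[64];

	while (*line == ' ' || *line == '\t')
		line++;
	if (*line == '\0' || *line == '\n' || *line == '#')
		return 0;
	if (sscanf(line, "%u %63s", id, tmp) != 2 || strlen(tmp) >= namelen)
		return -1;
	strcpy(name, tmp);
	return 1;
}

/* Read every .conf file in "dir"; returns the number of entries found,
 * or -1 if the directory cannot be opened. */
static int read_protos_dir(const char *dir)
{
	char path[512], line[256], name[64];
	struct dirent *de;
	unsigned int id;
	int count = 0;
	DIR *d = opendir(dir);

	if (!d)
		return -1;
	while ((de = readdir(d)) != NULL) {
		FILE *fp;

		if (!has_conf_suffix(de->d_name))
			continue;
		snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
		fp = fopen(path, "r");
		if (!fp)
			continue;
		while (fgets(line, sizeof(line), fp))
			if (parse_proto_line(line, &id, name, sizeof(name)) == 1) {
				printf("%u -> %s\n", id, name);
				count++;
			}
		fclose(fp);
	}
	closedir(d);
	return count;
}
```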

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-01-12 17:31:18 -08:00
David Ahern 9b036afd3c ip vrf: Improve bpf error messages
Next, a non-root user gets various BPF-related error messages:

$ ip vrf exec mgmt bash
Failed to load BPF prog: 'Operation not permitted'
Kernel compiled with CGROUP_BPF enabled?

Catch the EPERM error and do not show the kernel config option.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-01-09 12:13:09 -08:00
David Ahern 2bbc5b0726 ip vrf: Improve cgroup2 error messages
Currently, if a non-root user attempts to run ip vrf exec, an unhelpful
error is returned:

$ ip vrf exec mgmt bash
Failed to mount cgroup2. Are CGROUPS enabled in your kernel?

Only show the CGROUPS kernel hint for the ENODEV error and for the
rest show the strerror for the errno. So now:
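The errno-based message selection can be sketched as below. format_cgroup2_mount_error() is a hypothetical helper written for illustration, not the ipvrf.c code; it only shows the rule that ENODEV alone gets the kernel-config hint.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Write the error text for a failed cgroup2 mount into "buf". ENODEV
 * (filesystem type not known to the kernel) suggests a config problem;
 * any other errno is reported with strerror(). */
static void format_cgroup2_mount_error(int err, char *buf, size_t len)
{
	if (err == ENODEV)
		snprintf(buf, len,
			 "Failed to mount cgroup2. Are CGROUPS enabled in your kernel?");
	else
		snprintf(buf, len, "Failed to mount cgroup2: %s", strerror(err));
}
```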

$ ip/ip vrf exec mgmt bash
Failed to mount cgroup2: Operation not permitted

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-01-09 12:13:08 -08:00
David Ahern edbae5e0b2 ip vrf: Fix run-on error message on mkdir failure
Andy reported a missing newline if a non-root user attempts to run
'ip vrf exec':

$ ./ip/ip vrf exec default /bin/echo asdf
mkdir failed for /var/run/cgroup2: Permission deniedFailed to setup vrf cgroup2 directory

Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2017-01-09 12:13:08 -08:00
Simon Horman a5ae170ed8 tc: flower: Update dest UDP port documentation
Since 41aa17ff46 ("tc/cls_flower: Add dest UDP port to tunnel params")
tc flower supports setting the dest UDP port.

* Use "port_number" to be consistent with other man-page text
* Re-add "enc_dst_port" documentation to manpage which was
  accidentally removed by b2a1f740aa ("tc: flower: document that *_ip
  parameters take a PREFIX as an argument.")

Cc: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-01-09 12:09:46 -08:00
Stephen Hemminger e467a283b1 minor kernel header update
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-01-09 12:09:26 -08:00
Stephen Hemminger 1693e4f257 Merge branch 'master' into net-next 2017-01-09 12:08:34 -08:00
David Michael bb18c98198 tc: make tc linking depend on libtc.a
There was a race condition where the command to link the tc binary
could (rarely) run before the libtc.a archive existed.
2017-01-09 12:06:58 -08:00
Paul Blakey 22a8f01989 tc: flower: support matching flags
Enhance flower to support matching on flags.

The first flag allows matching on whether the packet is
an IP fragment.

Example:

	# add a flower filter that will drop fragmented packets
	# (bit 0 of control flags)
	tc filter add dev ens4f0 protocol ip parent ffff: \
		flower \
		src_mac e4:1d:2d:fd:8b:01 \
		dst_mac e4:1d:2d:fd:8b:02 \
		indev ens4f0 \
		matching_flags 0x1/0x1 \
	action drop
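A sketch of how a "VALUE/MASK" argument such as "0x1/0x1" can be parsed and applied. parse_flags_mask() and flags_match() are hypothetical helpers; in particular, defaulting a missing mask to all ones is an assumption of this sketch, not documented flower behaviour.

```c
#include <stdint.h>
#include <stdlib.h>

/* Parse "VALUE" or "VALUE/MASK" into 32-bit value and mask.
 * Returns 0 on success, -1 on malformed input. */
static int parse_flags_mask(const char *arg, uint32_t *val, uint32_t *mask)
{
	char *end;
	unsigned long v, m = 0xffffffffUL;	/* assumed default: all bits */

	v = strtoul(arg, &end, 0);
	if (end == arg)
		return -1;
	if (*end == '/') {
		const char *ms = end + 1;

		m = strtoul(ms, &end, 0);
		if (end == ms)
			return -1;	/* "/" with no mask */
	}
	if (*end != '\0' || v > 0xffffffffUL || m > 0xffffffffUL)
		return -1;
	*val = (uint32_t)v;
	*mask = (uint32_t)m;
	return 0;
}

/* A packet's control flags match when the masked bits agree. */
static int flags_match(uint32_t pkt_flags, uint32_t val, uint32_t mask)
{
	return (pkt_flags & mask) == (val & mask);
}
```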

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
2016-12-29 10:42:08 -08:00
Stephen Hemminger d34adf67b5 Merge branch 'master' into net-next 2016-12-29 10:31:44 -08:00
Alexey Kodanev 7f97744777 fix typo in ip-xfrm man page, rmd610 -> rmd160
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
2016-12-29 10:24:35 -08:00
Baruch Siach d421bb4efe tc: add missing limits.h header
This fixes build issues under musl such as:

f_matchall.c: In function ‘matchall_parse_opt’:
f_matchall.c:48:12: error: ‘LONG_MIN’ undeclared (first use in this function)
   if (h == LONG_MIN || h == LONG_MAX) {
            ^
f_matchall.c:48:12: note: each undeclared identifier is reported only once for each function it appears in
f_matchall.c:48:29: error: ‘LONG_MAX’ undeclared (first use in this function)
   if (h == LONG_MIN || h == LONG_MAX) {
                             ^

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
2016-12-29 10:24:35 -08:00
Hadar Hen Zion f6d3126ef9 tc/m_tunnel_key: Add to the usage encapsulation dest UDP port
The tunnel key set parameters also include the dest UDP port; add it to
the usage.

Fixes: 449c709c38 ("tc/m_tunnel_key: Add dest UDP port to tunnel key action")
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reported-by: Simon Horman <simon.horman@netronome.com>
2016-12-22 11:02:00 -08:00
Hadar Hen Zion bf73c650ac tc/cls_flower: Add to the usage encapsulation dest UDP port
The encapsulation dest UDP port is part of the classifier matching
parameters; add it to the usage.

Fixes: 41aa17ff46 ("tc/cls_flower: Add dest UDP port to tunnel params")
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reported-by: Simon Horman <simon.horman@netronome.com>
2016-12-22 11:02:00 -08:00
Simon Horman c2078f8dc4 tc: flower: Allow *_mac options to accept a mask
* The argument to src_mac and dst_mac may now take an optional mask
  to limit the scope of matching.
* This address is documented as a LLADDR in keeping with ip-link(8).
* The formats accepted match those already output when dumping flower
  filters from the kernel.

Example of use of LLADDR with and without a mask:

tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \
	src_mac 52:54:01:00:00:00/ff:ff:00:00:00:01 action drop
tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \
	src_mac 52:54:00:00:00:00/23 action drop
tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \
	src_mac 52:54:00:00:00:00 action drop
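The prefix-length form in the second example is shorthand for a byte mask. A sketch of the expansion, using a hypothetical mac_prefix_to_mask() helper (not the f_flower.c code): "/23" sets the first 23 bits, giving ff:ff:fe:00:00:00.

```c
#include <string.h>

#define MAC_LEN_SKETCH 6	/* Ethernet address length in bytes */

/* Expand a prefix length (0..48) into a 6-byte MAC mask.
 * Returns 0 on success, -1 if "bits" is out of range. */
static int mac_prefix_to_mask(int bits, unsigned char mask[MAC_LEN_SKETCH])
{
	int i;

	if (bits < 0 || bits > 8 * MAC_LEN_SKETCH)
		return -1;
	memset(mask, 0, MAC_LEN_SKETCH);
	for (i = 0; i < bits; i++)
		mask[i / 8] |= 0x80 >> (i % 8);	/* most significant bit first */
	return 0;
}
```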

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2016-12-21 16:07:53 -08:00
Simon Horman b2a1f740aa tc: flower: document that *_ip parameters take a PREFIX as an argument.
* The argument to src_ip, dst_ip, enc_src_ip and enc_dst_ip take an
  optional prefix length which is used to provide a mask to limit the scope
  of matching.
* This is documented as a PREFIX in keeping with ip-route(8).

Examples of IPv4 and IPv6 prefixes:

tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ip parent ffff: flower \
    indev eth0 dst_ip 192.168.1.1 action drop
tc filter add dev eth0 protocol ip parent ffff: flower \
    indev eth0 src_ip 10.0.0.0/8 action drop
tc filter add dev eth0 protocol ipv6 parent ffff: flower \
    indev eth0 src_ip 2001:DB8:1::/48 action drop
tc filter add dev eth0 protocol ipv6 parent ffff: flower \
    indev eth0 dst_ip 2001:DB8::1 action drop

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2016-12-21 16:07:41 -08:00
Stephen Hemminger 8578bb731d Revert "tc: flower: Allow *_mac options to accept a mask"
This reverts commit 0390185078.
2016-12-21 16:06:49 -08:00
Stephen Hemminger 10da552800 Revert "tc: flower: document that *_ip parameters take a PREFIX as an argument."
This reverts commit a8a1dccd2a.
2016-12-21 16:06:35 -08:00
Stephen Hemminger 176b6b7329 update kernel headers 2016-12-21 15:58:49 -08:00
Roman Mashak 00fe039dd5 tc: updated man page to reflect filter-id use in filter GET command.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2016-12-21 15:56:39 -08:00
Roman Mashak 17b9668a86 tc: fixed man page fonts for keywords and variable values
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2016-12-21 15:56:39 -08:00
Julien Fortin fd4ca03935 ip: vfinfo: remove code duplication for IFLA_VF_RSS_QUERY_EN
Fixes: 4fb4a10e12 ("ipaddress: Print IFLA_VF_QUERY_RSS_EN setting")

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
Acked-by: Phil Sutter <phil@nwl.cc>
2016-12-21 15:56:39 -08:00
Simon Horman 0390185078 tc: flower: Allow *_mac options to accept a mask
* The argument to src_mac and dst_mac may now take an optional mask
  to limit the scope of matching.
* This address is documented as a LLADDR in keeping with ip-link(8).
* The formats accepted match those already output when dumping flower
  filters from the kernel.

Example of use of LLADDR with and without a mask:

tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \
	src_mac 52:54:01:00:00:00/ff:ff:00:00:00:01 action drop
tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \
	src_mac 52:54:00:00:00:00/23 action drop
tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \
	src_mac 52:54:00:00:00:00 action drop

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2016-12-21 15:56:39 -08:00
Simon Horman a8a1dccd2a tc: flower: document that *_ip parameters take a PREFIX as an argument.
* The argument to src_ip, dst_ip, enc_src_ip and enc_dst_ip take an
  optional prefix length which is used to provide a mask to limit the scope
  of matching.
* This is documented as a PREFIX in keeping with ip-route(8).

Examples of IPv4 and IPv6 prefixes:

tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ip parent ffff: flower \
    indev eth0 dst_ip 192.168.1.1 action drop
tc filter add dev eth0 protocol ip parent ffff: flower \
    indev eth0 src_ip 10.0.0.0/8 action drop
tc filter add dev eth0 protocol ipv6 parent ffff: flower \
    indev eth0 src_ip 2001:DB8:1::/48 action drop
tc filter add dev eth0 protocol ipv6 parent ffff: flower \
    indev eth0 dst_ip 2001:DB8::1 action drop

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2016-12-21 15:56:39 -08:00
David Ahern ee9369a05f ip netns: Reset vrf to default VRF on namespace switch
A vrf is local to a namespace. Drop any VRF association before trying
to exec a command in the new namespace.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-12-21 15:56:39 -08:00
David Ahern 2917b4f41a ip vrf: Fix reset to default VRF
Path in vrf_switch for "default" VRF is supposed to be MNT/vrf not
MNT/default. Also, default_vrf flag is redundant with ifindex. Remove
the flag in favor of ifindex != 0.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-12-21 15:56:39 -08:00
David Ahern b5efa59763 ip vrf: Refactor ipvrf_identify
Split ipvrf_identify into arg processing and a function that does the
actual cgroup file parsing. The latter function is used in a follow-on
patch.

In the process, convert the reading of the cgroups file to use fopen
and fgets just in case the file ever grows beyond 4k. Move printing
of any error message and the vrf name to the caller of the new
vrf_identify.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-12-21 15:56:39 -08:00
David Ahern c94112faf5 ip vrf: Move kernel config hint to prog_load failure
Move the hint about CGROUP_BPF enabled to prog_load failure since
it fails before the attach. Update the existing error message to
print to stderr.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-12-21 15:56:39 -08:00
Stephen Hemminger 9f9ccc89f7 configure: fix elftest when warnings enabled
If compile testing with -W then elftest.c would fail because
of unused variables.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-12-14 19:11:02 -08:00
David Ahern 8b59612f99 Fix compile warning in get_addr_1
A recent cleanup causes a compile warning on Debian jessie:

    CC       utils.o
utils.c: In function ‘get_addr_1’:
utils.c:486:21: warning: passing argument 1 of ‘ll_addr_a2n’ from incompatible pointer type
   len = ll_addr_a2n(&addr->data, sizeof(addr->data), name);
                     ^
In file included from utils.c:34:0:
../include/rt_names.h:27:5: note: expected ‘char *’ but argument is of type ‘__u32 (*)[8]’
 int ll_addr_a2n(char *lladdr, int len, const char *arg);
     ^

Revert the removal of the typecast

Fixes: e1933b9281 ("utils: cleanup style")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-12-14 19:00:36 -08:00
Roman Mashak 530753184a tc: pass correct conversion specifier to print 'unsigned int' action index.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-12-14 19:00:36 -08:00
Stephen Hemminger ab91aee4b0 ipvrf: cleanup style issues
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-12-13 10:43:24 -08:00
Stephen Hemminger e1933b9281 utils: cleanup style
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-12-13 10:41:36 -08:00
Stephen Hemminger 892a25e286 libnetlink: break up dump function
Indentation is deep here.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-12-13 10:41:29 -08:00
David Ahern 1949f82cdf Introduce ip vrf command
'ip vrf' follows the user semantics established by 'ip netns'.

The 'ip vrf' subcommand supports 3 usages:

1. Run a command against a given vrf:
       ip vrf exec NAME CMD

   Uses the recently committed cgroup/sock BPF option. A vrf directory
   is added to the cgroup2 mount and individual vrfs are created under
   it. A BPF filter attached to the vrf/NAME cgroup2 sets sk_bound_dev_if
   to the VRF device index. The current process (ip's pid) is then added
   to the cgroup.procs file and the given command is executed. In doing
   so, all AF_INET/AF_INET6 (ipv4/ipv6) sockets are automatically bound
   to the VRF domain.

   The association is inherited parent to child allowing the command to
   be a shell from which other commands are run relative to the VRF.

2. Show the VRF a process is bound to:
       ip vrf id
   This command essentially looks at /proc/pid/cgroup for a "::/vrf/"
   entry with the VRF name following.

3. Show process ids bound to a VRF
       ip vrf pids NAME
   This command dumps the file MNT/vrf/NAME/cgroup.procs since that file
   shows the process ids in the particular vrf cgroup.
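The lookup behind 'ip vrf id' can be sketched as a line scan of /proc/PID/cgroup, assuming a cgroup2 entry of the form "0::/vrf/NAME". vrf_from_cgroup_line() and vrf_identify_sketch() are hypothetical helpers, not the ipvrf.c functions; reading with fopen()/fgets() avoids any fixed limit on the file size.

```c
#include <stdio.h>
#include <string.h>

/* Extract NAME from a "...::/vrf/NAME" line. Returns 1 on a match. */
static int vrf_from_cgroup_line(const char *line, char *name, size_t len)
{
	const char *p = strstr(line, "::/vrf/");
	size_t n;

	if (!p)
		return 0;
	p += strlen("::/vrf/");
	n = strcspn(p, "/\n");	/* name ends at '/' or end of line */
	if (n == 0 || n >= len)
		return 0;
	memcpy(name, p, n);
	name[n] = '\0';
	return 1;
}

/* Returns 1 if the process is in a VRF, 0 if not, -1 if the cgroup
 * file cannot be opened. */
static int vrf_identify_sketch(int pid, char *name, size_t len)
{
	char path[64], line[512];
	int found = 0;
	FILE *fp;

	snprintf(path, sizeof(path), "/proc/%d/cgroup", pid);
	fp = fopen(path, "r");
	if (!fp)
		return -1;
	while (!found && fgets(line, sizeof(line), fp))
		found = vrf_from_cgroup_line(line, name, len);
	fclose(fp);
	return found;
}
```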

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-12-13 10:20:16 -08:00
David Ahern 463d9efaa2 libnetlink: Add variant of rtnl_talk that does not display RTNETLINK answers error
iplink_vrf has 2 functions used to validate a user given device name is
a VRF device and to return the table id. If the user string is not a
device name, ip commands with a vrf keyword show a confusing error
message: "RTNETLINK answers: No such device".

Add a variant of rtnl_talk that does not display the "RTNETLINK answers"
message and update iplink_vrf to use it.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-12-13 10:20:16 -08:00
David Ahern 2330490f0e change name_is_vrf to return index
An index of 0 means the name is not a valid vrf.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-12-13 10:20:16 -08:00
David Ahern 1dafceb1c9 Add filesystem APIs to lib
Add make_path to recursively call mkdir as needed to create a given
path with the given mode.

Add find_cgroup2_mount to lookup path where cgroup2 is mounted. If it
is not already mounted, cgroup2 is mounted under /var/run/cgroup2 for
use by iproute2.
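A minimal recursive-mkdir sketch in the spirit of make_path. The real helper's signature is not shown here, so make_path_sketch() is a hypothetical name. It creates each path component in turn and treats components that already exist as success; it does not check that an existing component is actually a directory.

```c
#include <errno.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create every component of "path" with "mode", like "mkdir -p".
 * Returns 0 on success, -1 with errno set on failure. */
static int make_path_sketch(const char *path, mode_t mode)
{
	char buf[4096];
	char *p;
	size_t len = strlen(path);

	if (len == 0) {
		errno = ENOENT;
		return -1;
	}
	if (len >= sizeof(buf)) {
		errno = ENAMETOOLONG;
		return -1;
	}
	memcpy(buf, path, len + 1);

	/* Temporarily cut the string at each '/' after the first char. */
	for (p = buf + 1; *p; p++) {
		if (*p != '/')
			continue;
		*p = '\0';
		if (mkdir(buf, mode) != 0 && errno != EEXIST)
			return -1;
		*p = '/';
	}
	if (mkdir(buf, mode) != 0 && errno != EEXIST)
		return -1;
	return 0;
}
```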

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-12-13 10:20:16 -08:00
David Ahern 08bd33d77f move cmd_exec to lib utils
Code move only; no functional change intended.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-12-13 10:20:16 -08:00
David Ahern 10e51a76a9 bpf: Add BPF_ macros
Based on version in kernel repo, samples/bpf/libbpf.h

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2016-12-13 10:20:15 -08:00
David Ahern 869d889eed bpf: export bpf_prog_load
Code move only; no functional change intended.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2016-12-13 10:20:15 -08:00
David Ahern fc4ccce038 lib bpf: Add support for BPF_PROG_ATTACH and BPF_PROG_DETACH
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2016-12-13 10:20:15 -08:00
Roi Dayan 7d59d6354f tc: tunnel_key: Add tc-tunnel_key man page to Makefile
To be installed with the other man pages.

Fixes: d57639a475 ("tc/act_tunnel: Introduce ip tunnel action")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Amir Vadai <amir@vadai.me>
2016-12-13 10:15:11 -08:00
Roi Dayan 5c46a8fd61 tc: flower: Fix typo and style in flower man page
Replace vlan_eth_type with vlan_ethtype.

Fixes: 745d917260 ("tc: flower: Introduce vlan support")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
2016-12-13 10:15:11 -08:00
Hadar Hen Zion 449c709c38 tc/m_tunnel_key: Add dest UDP port to tunnel key action
Enhance tunnel key action parameters by adding destination UDP port.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
2016-12-13 10:15:11 -08:00
Hadar Hen Zion 41aa17ff46 tc/cls_flower: Add dest UDP port to tunnel params
Enhance IP tunnel parameters by adding destination UDP port.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
2016-12-13 10:15:11 -08:00
Stephen Hemminger b723368caa lwtunnel: style cleanup
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-12-12 15:37:00 -08:00
Thomas Graf b15f440e78 lwt: BPF support for LWT
Adds support to configure BPF programs as nexthop actions via the LWT
framework.

Example:
   ip route add 192.168.253.2/32 \
     encap bpf out obj lwt_len_hist_kern.o section len_hist \
     dev veth0

Signed-off-by: Thomas Graf <tgraf@suug.ch>
2016-12-12 15:32:54 -08:00
Stephen Hemminger ba2a2124ec update to net-next headers (pre 4.10 rc)
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-12-12 15:26:55 -08:00
Stephen Hemminger 2a56c090e4 Merge branch 'master' into net-next 2016-12-12 15:24:40 -08:00
Stephen Hemminger ae0969c893 v4.9.0 2016-12-12 15:07:42 -08:00
Stephen Hemminger dc5622cb66 update to 4.9 release headers
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-12-12 15:06:18 -08:00
David Ahern eebc7cc192 Makefile: really suppress printing of directories
Makefile adds --no-print-directory to MAKEFLAGS if VERBOSE is not
defined; however, Config always defines VERBOSE. Update the check to
test whether VERBOSE is 0.

Fixes: 57bdf8b764 ("Make builds default to quiet mode")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-12-09 12:49:44 -08:00
Simon Horman eb3b5696f1 tc: flower: support matching on ICMP type and code
Support matching on ICMP type and code.

Example usage:

tc qdisc add dev eth0 ingress

tc filter add dev eth0 protocol ip parent ffff: flower \
	indev eth0 ip_proto icmp type 8 code 0 action drop

tc filter add dev eth0 protocol ipv6 parent ffff: flower \
	indev eth0 ip_proto icmpv6 type 128 code 0 action drop

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2016-12-09 12:46:34 -08:00
Simon Horman 6910d65661 tc: flower: introduce enum flower_endpoint
Introduce enum flower_endpoint and use it instead of a bool
as the type for parameterising source and destination.

This is intended to improve readability and provide some type
checking of endpoint parameters.
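A sketch of the pattern. The enumerator names and the attribute ids below are assumptions made for illustration, standing in for the real TCA_FLOWER_KEY_* values. A call such as port_attr_for(FLOWER_ENDPOINT_SRC) reads more clearly than port_attr_for(true), and a switch over the enum lets the compiler warn about unhandled cases.

```c
/* Endpoint selector replacing a bool parameter (enumerators assumed). */
enum flower_endpoint {
	FLOWER_ENDPOINT_SRC,
	FLOWER_ENDPOINT_DST
};

/* Hypothetical attribute ids, not the kernel's TCA_FLOWER_KEY_* values. */
enum { ATTR_PORT_SRC = 1, ATTR_PORT_DST = 2 };

static int port_attr_for(enum flower_endpoint endpoint)
{
	switch (endpoint) {
	case FLOWER_ENDPOINT_SRC:
		return ATTR_PORT_SRC;
	case FLOWER_ENDPOINT_DST:
		return ATTR_PORT_DST;
	}
	return -1;	/* unreachable for valid enum values */
}
```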

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2016-12-09 12:45:59 -08:00
Daniel Borkmann c7272ca720 bpf: add initial support for attaching xdp progs
Now that we made the BPF loader generic as a library, reuse it
for loading XDP programs as well. This basically adds a minimal
start of a facility for iproute2 to load XDP programs. There
currently only exists the xdp1_user.c sample code in the kernel
tree that sets up netlink directly and an iovisor/bcc front-end.

Since we have all the necessary infrastructure in place already
from tc side, we can just reuse its loader back-end and thus
facilitate migration and usability among the two for people
familiar with tc/bpf already. Sharing maps, performing tail calls,
etc., work the same way as with tc. Naturally, as the kernel
configuration API evolves, we will add new XDP features here as
well, and extend the dumping of related netlink attributes.

Minimal example:

  clang -target bpf -O2 -Wall -c prog.c -o prog.o
  ip [-force] link set dev em1 xdp obj prog.o       # attaching
  ip [-d] link                                      # dumping
  ip link set dev em1 xdp off                       # detaching

For the dump, intention is that in the first line for each ip
link entry, we'll see "xdp" to indicate that this device has an
XDP program attached. Once we dump some more useful information
via netlink (digest, etc), idea is that 'ip -d link' will then
display additional relevant program information below the "link/
ether [...]" output line for such devices, for example.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2016-12-09 12:44:12 -08:00
Daniel Borkmann fb24802b9c bpf: check for owner_prog_type and notify users when differ
Kernel commit 21116b7068b9 ("bpf: add owner_prog_type and accounted mem
to array map's fdinfo") added support for telling the owner prog type in
case of prog arrays. Give a notification to the user when they differ,
and the program eventually fails to load.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2016-12-09 12:44:12 -08:00
Thomas Graf 0f74d0f3a9 bpf: Fix number of retries when growing log buffer
The log buffer is automatically grown when the verifier output does not
fit into the default buffer size. So far, the number of growing attempts
was not sufficient to reach the maximum buffer size.

Perform 9 iterations to reach max and let the 10th one fail.

j:0     i:65536         max:16777215
j:1     i:131072        max:16777215
j:2     i:262144        max:16777215
j:3     i:524288        max:16777215
j:4     i:1048576       max:16777215
j:5     i:2097152       max:16777215
j:6     i:4194304       max:16777215
j:7     i:8388608       max:16777215
j:8     i:16777216      max:16777215
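The schedule in the table can be reproduced with a small doubling loop. grow_iterations() is a hypothetical illustration using the start size and maximum shown above, not the bpf loader code itself; it counts iterations up to and including the one whose size reaches the maximum.

```c
#include <limits.h>
#include <stdio.h>

/* Count grow iterations (j = 0, 1, ...) until the size i reaches or
 * exceeds max, including that last iteration. If verbose, print one
 * row per iteration in the format of the table above. */
static int grow_iterations(unsigned int start, unsigned int max, int verbose)
{
	unsigned int i = start;
	int j;

	if (start == 0)
		return -1;	/* doubling zero never terminates */
	for (j = 0; ; j++) {
		if (verbose)
			printf("j:%d\ti:%u\tmax:%u\n", j, i, max);
		/* Stop once the size reaches the limit or cannot double. */
		if (i >= max || i > UINT_MAX / 2)
			return j + 1;
		i <<= 1;
	}
}
```

Calling grow_iterations(65536, 16777215, 1) prints the nine rows j:0 to j:8 (tab-separated) and returns 9.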

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2016-12-09 12:42:11 -08:00
Roi Dayan 6566ca8cdb devlink: Add option to set and show eswitch inline mode
This is needed for some HWs to do proper matching and steering.
Possible values are none, link, network, transport.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
2016-12-09 12:41:03 -08:00
Roi Dayan a93b6bb3a2 devlink: Add usage help for eswitch subcommand
Add missing usage help for devlink dev eswitch subcommand.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
2016-12-09 12:40:52 -08:00
Stephen Hemminger 3dd0bb51d7 update kernel headers from net-next
Net-next now closed.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-12-09 12:40:07 -08:00
Stephen Hemminger e6fee79104 Merge branch 'master' into net-next 2016-12-09 12:38:51 -08:00
Stephen Hemminger e49aef96bb update kernel headers 2016-12-09 12:38:35 -08:00
Stephen Hemminger b95e5c55a9 Revert "devlink: Add usage help for eswitch subcommand"
This reverts commit 11f4cd31d2.
2016-12-09 12:37:39 -08:00
Stephen Hemminger d646916993 Revert "devlink: Add option to set and show eswitch inline mode"
This reverts commit b9dcf9c282.

Intended for net-next
2016-12-09 12:37:19 -08:00
Simon Horman 6bd5b80cdc tc: flower: make use of flower_port_attr_type() safe and silent
Make use of flower_port_attr_type() safe:
* flower_port_attr_type() may return a valid index into tb[] or -1.
  Only access tb[] in the case of the former.
* Do not access null entries in tb[]

Also make usage silent - it is valid for ip_proto to be invalid,
for example if it is not specified as part of the filter.
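A sketch of the guarded lookup described above. port_attr_type(), port_value() and the attribute numbers are hypothetical stand-ins for flower_port_attr_type() and the TCA_FLOWER_* attributes. The index is used only if it is non-negative, the selected slot is checked for NULL, and both "not applicable" cases return quietly.

```c
#include <stddef.h>

/* Hypothetical attribute ids and protocol numbers for illustration. */
enum { ATTR_TCP_SRC = 1, ATTR_UDP_SRC = 2, ATTR_MAX = 2 };
#define PROTO_TCP 6
#define PROTO_UDP 17

/* Returns an index into tb[] or -1 when ip_proto has no port field. */
static int port_attr_type(int ip_proto)
{
	if (ip_proto == PROTO_TCP)
		return ATTR_TCP_SRC;
	if (ip_proto == PROTO_UDP)
		return ATTR_UDP_SRC;
	return -1;	/* e.g. ip_proto not specified in the filter */
}

/* Returns 1 and stores the port if present, 0 (silently) otherwise. */
static int port_value(const int *tb[ATTR_MAX + 1], int ip_proto, int *port)
{
	int type = port_attr_type(ip_proto);

	if (type < 0 || !tb[type])
		return 0;
	*port = *tb[type];
	return 1;
}
```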

Fixes: a1fb0d4842 ("tc: flower: Support matching on SCTP ports")
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2016-12-05 10:13:26 -08:00
Simon Horman 61dff9ac10 tc: flower: correct name of ip_proto parameter to flower_parse_port()
This corrects a typo.

Fixes: a1fb0d4842 ("tc: flower: Support matching on SCTP ports")
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2016-12-05 10:13:26 -08:00
Simon Horman 6ad7e60c1f tc: flower: document SCTP ip_proto
Add SCTP ip_proto to help text and man page.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2016-12-05 10:13:26 -08:00
Simon Horman 730381fede tc: flower: remove references to eth_type in manpage
Remove references to eth_type and ether_type (spelling error) in
the tc flower manpage.

Also correct formatting of boldface text with whitespace.

Cc: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-12-02 14:59:43 -08:00
Stephen Hemminger 143a704bf8 update kernel headers from net-next
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-12-02 14:54:33 -08:00
Stephen Hemminger f2df31170f Merge branch 'master' into net-next 2016-12-02 14:19:08 -08:00
Simon Horman 1dd0cca7fa ss: initialise variables outside of for loop
Initialise loop variables outside of for loops. GCC flags this as being
out of spec unless C99 or C11 mode is used.

With this change the entire tree appears to compile cleanly with -Wall.

$ gcc --version
gcc (Debian 4.9.2-10) 4.9.2
...
$ make
...
ss.c: In function ‘unix_show_sock’:
ss.c:3128:4: error: ‘for’ loop initial declarations are only allowed in C99 or C11 mode
...

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2016-12-02 14:17:09 -08:00
Amir Vadai d57639a475 tc/act_tunnel: Introduce ip tunnel action
This action could be used before redirecting packets to a shared tunnel
device, or when redirecting packets arriving from such a device.

The 'unset' action is optional. It is used to explicitly unset the
metadata created by the tunnel device during decap. If not used, the
metadata will be released automatically by the kernel.
The 'set' operation sets the metadata with the specified values for
the encap.

For example, the following flower filter will forward all ICMP packets
destined to 11.11.11.2 through the shared vxlan device 'vxlan0'. Before
redirecting, metadata for the vxlan tunnel is created using the
tunnel_key action and its arguments:

$ tc filter add dev net0 protocol ip parent ffff: \
    flower \
      ip_proto 1 \
      dst_ip 11.11.11.2 \
    action tunnel_key set \
      src_ip 11.11.0.1 \
      dst_ip 11.11.0.2 \
      id 11 \
    action mirred egress redirect dev vxlan0

Signed-off-by: Amir Vadai <amir@vadai.me>
2016-12-02 14:12:09 -08:00
Amir Vadai bb9b63b18e tc/cls_flower: Classify packet in ip tunnels
Introduce classifying by metadata extracted by the tunnel device.
Outer header fields - source/dest ip and tunnel id, are extracted from
the metadata when classifying.

For example, the following adds a filter on the ingress Qdisc of the
shared vxlan device 'vxlan0' that forwards packets with outer src ip
11.11.0.2, dst ip 11.11.0.1 and tunnel id 11 to the tap device
'vnet0':

$ tc filter add dev vxlan0 protocol ip parent ffff: \
    flower \
      enc_src_ip 11.11.0.2 \
      enc_dst_ip 11.11.0.1 \
      enc_key_id 11 \
      dst_ip 11.11.11.1 \
    action mirred egress redirect dev vnet0

Signed-off-by: Amir Vadai <amir@vadai.me>
2016-12-02 14:12:09 -08:00
Amir Vadai aab0f61043 libnetlink: Introduce rta_getattr_be*()
Add the utility functions rta_getattr_be16() and rta_getattr_be32(), and
change existing code to use them.
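A minimal sketch of what such helpers do: read a big-endian (network order) value from an attribute payload and convert it to host order. For self-containment it takes a raw payload pointer instead of libnetlink's struct rtattr, so the names getattr_be16()/getattr_be32() are hypothetical, not the library's API.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Hypothetical stand-ins: the real helpers take a struct rtattr *
 * and locate the payload with RTA_DATA(). */
static uint16_t getattr_be16(const void *payload)
{
	uint16_t v;

	memcpy(&v, payload, sizeof(v));	/* payload may be unaligned */
	return ntohs(v);		/* network -> host byte order */
}

static uint32_t getattr_be32(const void *payload)
{
	uint32_t v;

	memcpy(&v, payload, sizeof(v));
	return ntohl(v);
}
```

Using memcpy() rather than a pointer cast keeps the sketch safe on architectures that fault on unaligned loads.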

Signed-off-by: Amir Vadai <amir@vadai.me>
2016-12-02 14:12:09 -08:00
Phil Sutter 039b3620cf ss: unix_show: No need to initialize members of calloc'ed structs
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:47 -08:00
Phil Sutter b710a72254 ss: Make sstate_namel local to scan_state()
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:47 -08:00
Phil Sutter 1882c0db02 ss: Make sstate_name local to sock_state_print()
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:47 -08:00
Phil Sutter 96d45daa92 ss: Make unix_state_map local to unix_show()
Also make it const, since there won't be any write access happening.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:47 -08:00
Phil Sutter 2f938ce1fa ss: Get rid of single-fielded struct snmpstat
A struct with only a single field does not make much sense. Besides
that, it was used by print_summary() only.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:47 -08:00
Phil Sutter 6b224dad23 ss: Get rid of useless goto in handle_follow_request()
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:46 -08:00
Phil Sutter b3535dd61d ss: Make slabstat_ids local to get_slabstat()
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:46 -08:00
Phil Sutter 95eafe438a ss: Make some variables function-local
addrp_width and screen_width are used in main() only, so no need to have
them globally available.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:46 -08:00
Phil Sutter b25bad2ffe ss: Make user_ent_hash_build_init local to user_ent_hash_build()
By having it statically defined, there is no need for it to be global.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:46 -08:00
Phil Sutter 86dfa1be4a ss: Make tmr_name local to tcp_timer_print()
It's used only there, so no need to have it globally defined.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:46 -08:00
Phil Sutter 0cb74a8610 ss: Turn generic_proc_open() wrappers into macros
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:46 -08:00
Phil Sutter f25062e9e7 ss: Eliminate unix_use_proc()
This function is now used in only a single place, so replace the
call to it with its content, which makes that specific part of
unix_show() consistent with e.g. tcp_show().

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:46 -08:00
Phil Sutter 2d0e538f3e ss: Drop list traversal from unix_stats_print()
Although this complicates the dedicated procfs-based code path in
unix_show() a bit, it's the only sane way to get rid of unix_show_sock()
output diverging from other socket types in that it prints all socket
details in a new line.

As a side effect, it allows eliminating all procfs specific code in
the same function.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:46 -08:00
Phil Sutter 5f27ac1db9 ss: introduce proc_ctx_print()
This consolidates identical code in three places. While the function
name is not quite perfect as there is different proc_ctx printing code
in netlink_show_one() as well, I sadly didn't find a more suitable one.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:46 -08:00
Phil Sutter be7e4d20b9 ss: Use sockstat->type in all socket types
Unix sockets used that field already to hold info about the socket type.
By replicating this approach in all other socket types, we can get rid
of protocol parameter in inet_stats_print() and have sock_state_print()
figure things out by itself.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:46 -08:00
Phil Sutter 4519999708 ss: Add missing tab when printing UNIX details
When dumping UNIX sockets and show_details is active but not show_mem
(ss -xne), the socket details are printed without being prefixed by tab.
Fix this by printing the tab character when either one of '-e' or '-m'
has been specified.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:46 -08:00
Phil Sutter 6babc649a9 ss: Drop empty lines in UDP output
When dumping UDP sockets and show_tcpinfo (-i) is active but not
show_mem (-m), print_tcpinfo() does not output anything leading to an
empty line being printed after every socket. Fix this by skipping the
call to print_tcpinfo() and the previous newline printing in that case.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:46 -08:00
Phil Sutter 36df1a6e92 ss: Mark fall through in arg parsing switch()
As there is a certain chance of overlooking this, better add a comment
to draw readers' attention.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-02 14:07:46 -08:00
Yuchung Cheng b6c7fc61fa ss: print new tcp_info fields: busy, rwnd-limited, sndbuf-limited times
Dump some new fields added to tcp_info in v4.10: tcpi_busy_time,
tcpi_rwnd_limited, tcpi_sndbuf_limited.

Example output for a flow busy for 110ms but never measurably limited by
receive window or send buffer:
   busy:110ms

Example output for a flow usually limited by receive window:
   busy:111ms rwnd_limited:101ms(91.0%)

Example output for a flow sometimes limited by send buffer:
   busy:50ms sndbuf_limited:10ms(20.0%)
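A sketch of how the percentages above relate to the raw fields: each limited time is divided by the busy time. It assumes the inputs are in microseconds (as tcp_info reports them) and that a zero limited time is omitted; format_busy() is a hypothetical helper, not the code in ss.c.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical formatter reproducing the sample output format.
 * All times are in microseconds; zero limited times are skipped. */
static int format_busy(char *buf, size_t len, uint64_t busy_us,
		       uint64_t rwnd_us, uint64_t sndbuf_us)
{
	int n = snprintf(buf, len, "busy:%llums",
			 (unsigned long long)(busy_us / 1000));

	if (rwnd_us && n >= 0 && (size_t)n < len)
		n += snprintf(buf + n, len - n, " rwnd_limited:%llums(%.1f%%)",
			      (unsigned long long)(rwnd_us / 1000),
			      100.0 * rwnd_us / busy_us);
	if (sndbuf_us && n >= 0 && (size_t)n < len)
		n += snprintf(buf + n, len - n, " sndbuf_limited:%llums(%.1f%%)",
			      (unsigned long long)(sndbuf_us / 1000),
			      100.0 * sndbuf_us / busy_us);
	return n;
}
```

For the second example, 101 ms / 111 ms is about 0.9099, which prints as 91.0%.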

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
2016-12-01 11:00:28 -08:00
Neal Cardwell 2f579872fb ss: print new tcp_info fields: delivery_rate and app_limited
Dump the new delivery_rate and delivery_rate_app_limited fields that
were added to tcp_info in Linux v4.9.

Example output:
  pacing_rate 65.7Mbps delivery_rate 62.9Mbps

And for the application-limited case this looks like:
  pacing_rate 1031.1Mbps delivery_rate 87.4Mbps app_limited
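A sketch of the unit conversion behind these numbers, assuming delivery_rate is reported in bytes per second and printed in bits per second. The thresholds are an assumption chosen to match the samples (note 1031.1Mbps is not switched to Gbps); format_rate() is hypothetical, not ss's own helper.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical rate formatter: bytes/s in, "<n>Mbps"-style text out.
 * Only Mbps/Kbps/bps units, matching the sample output above. */
static void format_rate(char *buf, size_t len, uint64_t bytes_per_sec)
{
	double bps = (double)bytes_per_sec * 8.0;

	if (bps > 1000000.0)
		snprintf(buf, len, "%.1fMbps", bps / 1000000.0);
	else if (bps > 1000.0)
		snprintf(buf, len, "%.1fKbps", bps / 1000.0);
	else
		snprintf(buf, len, "%.0fbps", bps);
}
```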

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
2016-12-01 11:00:28 -08:00
Cyrill Gorcunov 41fe6c34de ss: Add inet raw sockets information gathering via netlink diag interface
unix, tcp, udp[lite], packet, netlink sockets already support diag
interface for their collection and killing. Implement support
for raw sockets.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
2016-12-01 10:55:56 -08:00
Cyrill Gorcunov 9f66764e30 libnetlink: Add test for error code returned from netlink reply
If some diag module is not present in the system (say, the
kernel is not modern enough), we simply skip the reported
error code. Instead, we should check the data length in
NLMSG_DONE and handle the unsupported case.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
2016-12-01 10:55:56 -08:00
Stephen Hemminger bf9a0aff36 Update kernel headers for XDP and tcp_info
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-12-01 10:52:30 -08:00
Stephen Hemminger d6ad31db57 Merge branch 'master' into net-next 2016-12-01 10:48:05 -08:00
Phil Sutter f5f760b812 man: ip-route.8: Add notes about dropped IPv4 route cache
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-12-01 10:47:11 -08:00
Stephen Hemminger 328374dcfe Merge branch 'master' into net-next 2016-12-01 10:29:12 -08:00
Roi Dayan b9dcf9c282 devlink: Add option to set and show eswitch inline mode
This is needed for some HWs to do proper matching and steering.
Possible values are none, link, network, transport.
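A sketch of mapping the four keywords to values with a lookup table. The enum and parse_inline_mode() are hypothetical; the real mode constants come from the kernel's devlink UAPI header.

```c
#include <string.h>

/* Hypothetical enum mirroring the four documented modes. */
enum inline_mode {
	INLINE_MODE_NONE,
	INLINE_MODE_LINK,
	INLINE_MODE_NETWORK,
	INLINE_MODE_TRANSPORT,
};

/* Map a command-line keyword to a mode; returns -1 if unknown. */
static int parse_inline_mode(const char *s, enum inline_mode *mode)
{
	static const char *const names[] = {
		"none", "link", "network", "transport",
	};
	size_t i;

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		if (strcmp(s, names[i]) == 0) {
			*mode = (enum inline_mode)i;
			return 0;
		}
	}
	return -1;
}
```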

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
2016-11-29 19:17:20 -08:00
Roi Dayan 11f4cd31d2 devlink: Add usage help for eswitch subcommand
Add missing usage help for devlink dev eswitch subcommand.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
2016-11-29 19:17:20 -08:00
Zhang Shengju 6bd1ea28c5 link: add team and team_slave link type
Add missing team and team_slave link type.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
2016-11-29 14:03:00 -08:00
Stephen Hemminger 281db53ff8 l2tp: style cleanup
Make l2tp conform to kernel style guidelines
2016-11-29 13:40:06 -08:00
Asbjørn Sloth Tønnesen 51a9d01aaa man: ip-l2tp.8: document UDP checksum options
Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
2016-11-29 13:31:30 -08:00
Asbjørn Sloth Tønnesen f7982f5c95 l2tp: show tunnel: expose UDP checksum state
Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
2016-11-29 13:31:30 -08:00
Asbjørn Sloth Tønnesen 8a11421a5d l2tp: support sequence numbering
This patch implements and documents the user interface for
sequence numbering.

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
2016-11-29 13:31:30 -08:00
Asbjørn Sloth Tønnesen 35cc6ded4f l2tp: read IPv6 UDP checksum attributes from kernel
In case of an older kernel that doesn't set L2TP_ATTR_UDP_ZERO_CSUM6_{RX,TX},
the old hard-coded value is preserved, since the attribute flag will be
missing.

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
2016-11-29 13:31:30 -08:00
Asbjørn Sloth Tønnesen c73fad7860 l2tp: fix L2TP_ATTR_UDP_CSUM handling
L2TP_ATTR_UDP_CSUM is read by the kernel as an NLA_FLAG value,
but is validated as an NLA_U8, so we write it as a u8,
although its value isn't actually read by the kernel.

It is written by the kernel as a NLA_U8, so we will read as
such.

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
2016-11-29 13:31:30 -08:00
Asbjørn Sloth Tønnesen 4d51b3331e l2tp: fix L2TP_ATTR_{RECV,SEND}_SEQ handling
L2TP_ATTR_RECV_SEQ and L2TP_ATTR_SEND_SEQ are declared as NLA_U8
attributes in the kernel, so let's treat them accordingly.

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
2016-11-29 13:31:30 -08:00
Asbjørn Sloth Tønnesen 31f63e7c42 l2tp: fix integers with too few significant bits
udp6_csum_{tx,rx}, tunnel and session are the only ones
currently used.

recv_seq, send_seq, lns_mode and data_seq are partially
implemented in a useless way.

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
2016-11-29 13:31:30 -08:00
Asbjørn Sloth Tønnesen d0baf5cac8 man: ip-l2tp.8: remove non-existent tunnel parameter name
The name parameter is only valid for sessions, not tunnels.

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
2016-11-29 13:31:30 -08:00
Asbjørn Sloth Tønnesen 222c4dab8e man: ip-l2tp.8: fix l2spec_type documentation
Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
2016-11-29 13:31:30 -08:00
Roman Mashak 98df0c81da tc: distinguish Add/Replace filter operations
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-11-29 13:26:10 -08:00
Daniel Hopf 3a4df03913 macsec: Nr. of packets and octets for macsec tx stats were swapped
Acked-by: Rami Rosen <roszenrami@gmail.com>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Daniel Hopf <daniel.hopf@continental-corporation.com>
2016-11-29 13:22:12 -08:00
Mike Frysinger eca7a74219 ifstat/nstat: fix help output alignment
Some lines use tabs while others use spaces.  Use spaces everywhere.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2016-11-29 13:17:08 -08:00
Stephen Hemminger 2c500a4dc2 libnetlink: style cleanups
Follow kernel style related cleanups:
 * break long lines
 * remove unnecessary void * cast
2016-11-29 13:15:08 -08:00
Zhang Shengju 1b109a30bf libnetlink: reduce size of message sent to kernel
Fixes commit 246f57c4086d99fa ("ip link: Add support for kernel
side filtering").

This patch reduces the size of the message sent to kernel space. Before
this patch, for the command 'ip link show', we sent 1056 bytes. With
this patch, we only need to send 40 bytes.
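The two byte counts can be reproduced with the standard netlink macros, assuming the request structure carried a 1024-byte attribute buffer, the old code sent the whole structure, and the fixed code sends only the used length with one u32 attribute (such as IFLA_EXT_MASK) appended. The struct link_req layout is hypothetical.

```c
#include <stddef.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical request layout: header, ifinfomsg, attribute space. */
struct link_req {
	struct nlmsghdr nlh;	/* 16 bytes */
	struct ifinfomsg ifm;	/* 16 bytes */
	char buf[1024];		/* room for attributes, mostly unused */
};

/* Bytes actually used once one u32 attribute has been appended. */
static size_t used_len(void)
{
	size_t len = NLMSG_LENGTH(sizeof(struct ifinfomsg));	/* 32 */

	len = NLMSG_ALIGN(len) + RTA_ALIGN(RTA_LENGTH(sizeof(__u32))); /* +8 */
	return len;
}
```

Sending sizeof(struct link_req) gives 16 + 16 + 1024 = 1056 bytes, while sending the used length gives 32 + 8 = 40 bytes.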

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
2016-11-29 13:03:00 -08:00
Zhang Shengju 2d98dd4821 iproute2: fix the link group name getting error
In the situation where more than one entry live in the same hash bucket,
loop to get the correct one.

Before:
$ cat /etc/iproute2/group
0	default
256     test

$ sudo ip link set group test dummy1

$ ip link show type dummy
11: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group 0 qlen 1000
    link/ether 4e:3b:d3:6c:f0:e6 brd ff:ff:ff:ff:ff:ff
12: dummy1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group test qlen 1000
    link/ether d6:9c:a4:1f:e7:e5 brd ff:ff:ff:ff:ff:ff

After:
$ ip link show type dummy
11: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 4e:3b:d3:6c:f0:e6 brd ff:ff:ff:ff:ff:ff
12: dummy1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group test qlen 1000
    link/ether d6:9c:a4:1f:e7:e5 brd ff:ff:ff:ff:ff:ff
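A sketch of the fix, assuming a 256-bucket table indexed by id & 255, so groups 0 and 256 share bucket 0. Checking only the bucket head finds 256 ("test") when looking up 0, so the caller falls back to printing the number, as in the "Before" output; walking the chain finds the right entry. The table and function names are hypothetical.

```c
#include <stdlib.h>

#define GROUP_HASH_SIZE 256	/* assumed bucket count (power of two) */

struct group_entry {
	unsigned int id;
	const char *name;
	struct group_entry *next;	/* entries sharing a bucket */
};

static struct group_entry *group_hash[GROUP_HASH_SIZE];

/* Insert at the head of the bucket, so later entries shadow earlier
 * ones when only the head is inspected. */
static void group_add(unsigned int id, const char *name)
{
	struct group_entry *e = malloc(sizeof(*e));

	if (!e)
		return;
	e->id = id;
	e->name = name;
	e->next = group_hash[id & (GROUP_HASH_SIZE - 1)];
	group_hash[id & (GROUP_HASH_SIZE - 1)] = e;
}

/* Loop over the whole chain until the id matches. */
static const char *group_name(unsigned int id)
{
	struct group_entry *e;

	for (e = group_hash[id & (GROUP_HASH_SIZE - 1)]; e; e = e->next)
		if (e->id == id)
			return e->name;
	return NULL;	/* caller prints the numeric id */
}
```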

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
2016-11-29 12:48:07 -08:00
david decotigny ba7b97776e iproute2: a non-expected rtnl message is an error 2016-11-29 12:44:30 -08:00
david decotigny 8be2955816 iproute2: avoid exit in case of error.
Be consistent with how non-zero print_route() return values are handled
elsewhere: return -1.
2016-11-29 12:44:30 -08:00
michael-dev@fami-braun.de aa1b44ca77 iproute2: macvlan: add "source" mode
Adjusting iproute2 utility to support new macvlan link type mode called
"source".

Example of commands that can be applied:
  ip link add link eth0 name macvlan0 type macvlan mode source
  ip link set link dev macvlan0 type macvlan macaddr add 00:11:11:11:11:11
  ip link set link dev macvlan0 type macvlan macaddr del 00:11:11:11:11:11
  ip link set link dev macvlan0 type macvlan macaddr flush
  ip -details link show dev macvlan0

Based on previous work of Stefan Gula <steweg@gmail.com>

Signed-off-by: Michael Braun <michael-dev@fami-braun.de>

Cc: steweg@gmail.com

v5:
 - rebase and fix checkpatch

v4:
 - add MACADDR_SET support
 - skip FLAG_UNICAST / FLAG_UNICAST_ALL as this is not upstream
 - fix man page
2016-11-29 12:41:42 -08:00
Daniel Borkmann e42256699c bpf: make tc's bpf loader generic and move into lib
This work moves the bpf loader into the iproute2 library and reworks
the tc specific parts into generic code. It's useful as we can then
more easily support new program types by just having the same ELF
loader backend. Joint work with Thomas Graf. I hacked a rough start
of a test suite to make sure nothing breaks [1], and everything looks good.

  [1] https://github.com/borkmann/clsact/blob/master/test_bpf.sh

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
2016-11-29 12:35:32 -08:00
Lorenzo Colitti 82252cdc50 ip: support UID range routing.
- Support adding, deleting and showing IP rules with UID ranges.
- Support querying per-UID routes via "ip route get uid <UID>".

UID range routing was added to net-next in 4fb7450683 ("Merge
branch 'uid-routing'")

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
2016-11-29 12:26:37 -08:00
Stephen Hemminger 512caeb273 tc: flower checkpatch cleanups
break long lines and minor whitespace changes.
2016-11-29 11:48:52 -08:00
Simon Horman a1fb0d4842 tc: flower: Support matching on SCTP ports
Support matching on SCTP ports in the same way that matching
on TCP and UDP ports is already supported.

Example usage:

tc qdisc add dev eth0 ingress

tc filter add dev eth0 protocol ip parent ffff: \
        flower indev eth0 ip_proto sctp dst_port 80 \
        action drop

Signed-off-by: Simon Horman <simon.horman@netronome.com>
2016-11-29 11:44:46 -08:00
Stephen Hemminger 1a97748be4 update net-next headers 2016-11-29 11:43:40 -08:00
Stephen Hemminger 6e2e71e16e update headers based on 4.9-rc7 2016-11-29 11:41:58 -08:00
Stephen Hemminger b932e6f372 tc: cleanup style of qdisc code
Get rid of lingering mismatches with kernel style.
2016-11-29 11:41:58 -08:00
Roman Mashak d42e1444f2 tc: print raw qdisc handle.
This is v2 patch with fixed code indentation.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-11-29 11:41:58 -08:00
Roman Mashak 4b5451c4cd tc: improved usage help for fw classifier.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-11-29 11:41:58 -08:00
Phil Sutter 4fb4a10e12 ipaddress: Print IFLA_VF_QUERY_RSS_EN setting
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-11-29 11:41:58 -08:00
Roman Mashak 7bdcc0d942 tc: updated man page to reflect GET command to retrieve a single filter.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-11-29 11:41:58 -08:00
Stephen Hemminger 468fa020f1 ip: style cleanup
Make code more in line with current kernel style
2016-11-29 11:41:58 -08:00
Phil Sutter ff9463e048 ipaddress: Simplify vf_info parsing
Commit 7b8179c780 ("iproute2: Add new command to ip link to
enable/disable VF spoof check") tried to add support for
IFLA_VF_SPOOFCHK in a backwards-compatible manner, but apparently overdid
it: parse_rtattr_nested() handles missing attributes perfectly fine in
that it will leave the relevant field unassigned so calling code can
just compare against NULL. There is no need to step from the previous
(IFLA_VF_TX_RATE) attribute to the next to check if IFLA_VF_SPOOFCHK is
present or not. To the contrary, it establishes a potentially incorrect
assumption of these two attributes directly following each other which
may not be the case (although up to now, kernel aligns them this way).

This patch cleans up the code to adhere to the common way of checking
for attribute existence. It has been tested to return correct results
regardless of whether the kernel exports IFLA_VF_SPOOFCHK or not.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Greg Rose <grose@lightfleet.com>
2016-11-29 11:41:58 -08:00
Stephen Hemminger 168d97f97b ss: break really long lines 2016-11-29 11:41:58 -08:00
Phil Sutter f89d46ad63 ss: Add support for SCTP protocol
This makes use of the sctp_diag interface recently added to the kernel.

Joint work with Xin Long who provided the PoC implementation which I
merely polished up a bit.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-11-29 11:41:57 -08:00
Phil Sutter 5dec02d7b4 include: Add linux/sctp.h
Add sanitized UAPI linux/sctp.h header file.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-11-29 11:41:57 -08:00
Paul Blakey d9c3995ab7 tc: flower: Fix usage message
Remove leftover usage from the removal of the eth_type argument.

Fixes: 488b41d020 ('tc: flower no need to specify the ethertype')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
2016-11-12 10:19:06 +03:00
Isaac Boukris 878dadc79d iproute2: ss: escape all null bytes in abstract unix domain socket
Abstract unix domain socket names may embed null characters;
these should be translated to '@' when printed by ss, the
same way the null prefix is currently translated.
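A minimal sketch of the translation: every NUL byte in the name, not only the leading one, is printed as '@'. escape_abstract_name() is a hypothetical helper; the caller must supply an output buffer of at least len + 1 bytes.

```c
#include <stddef.h>

/* Hypothetical helper: copy a len-byte abstract socket name into out,
 * replacing each embedded NUL with '@', then NUL-terminate. */
static void escape_abstract_name(const char *name, size_t len, char *out)
{
	size_t i;

	for (i = 0; i < len; i++)
		out[i] = name[i] ? name[i] : '@';
	out[len] = '\0';
}
```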

Signed-off-by: Isaac Boukris <iboukris@gmail.com>
2016-11-12 10:16:24 +03:00
stefan@datenfreihafen.org 8ae2c5382b ip: update link types to show 6lowpan and ieee802.15.4 monitor
Both types have been missing here and thus ip always showed
only the numbers.

Based on a suggestion from Alexander Aring.

Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2016-11-12 10:14:03 +03:00
Shmulik Ladkani 5eca0a3701 tc: m_mirred: Add support for ingress redirect/mirror
So far, only the 'egress' direction was implemented.

Allow specifying 'ingress' as the direction packet appears on the target
interface.

For example, this takes incoming 802.1q frames on veth0 and redirects
them for input on dummy0:

 # tc filter add dev veth0 parent ffff: pref 1 protocol 802.1q basic \
     action mirred ingress redirect dev dummy0

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
2016-10-26 11:20:47 -07:00
Stephen Hemminger e770979cf1 update kernel headers to 4.9-net-next 2016-10-26 11:20:29 -07:00
Stephen Hemminger f3f339e959 cleanup debris from revert
Last revert didn't come out clean.
2016-10-26 11:19:11 -07:00
Stephen Hemminger c07a36c3db Revert "iproute2: macvlan: add "source" mode"
This reverts commit f33b727610.

The upstream changes are not in 4.9
2016-10-26 11:15:09 -07:00
Daniel Borkmann 4710e46ec3 tc, ipt: don't enforce iproute2 dependency on iptables-devel
Since 5cd1adba79 ("Update to current iptables headers") compilation
of iproute2 broke for systems without iptables-devel package [1].
Reason is that even though we fall back to build m_ipt.c, the include
depends on a xtables-version.h header, which only ships with
iptables-devel. Machines not having this package fail compilation with:

    [...]
    CC       m_ipt.o
In file included from ../include/iptables.h:5:0,
                 from m_ipt.c:17:
../include/xtables.h:34:29: fatal error: xtables-version.h: No such file or directory
compilation terminated.
../Config:31: recipe for target 'm_ipt.o' failed
make[1]: *** [m_ipt.o] Error 1

The configure script only barks that package xtables was not found in
the pkg-config search path. The generated Config then only contains, e.g.,
TC_CONFIG_IPSET. In tc's Makefile we thus fall back to adding m_ipt.o
to TCMODULES. m_ipt.c then includes the local include/iptables.h header
copy, which includes the include/xtables.h copy. The latter then includes
xtables-version.h, which only ships with iptables-devel.

One way to resolve this is to skip this whole mess when pkg-config has
no xtables config available. I've carried something along these lines
locally for a while now, but it's just too annoying. :/ Build works fine
now also when xtables.pc is not available.

  [1] http://www.spinics.net/lists/netdev/msg366162.html

Fixes: 5cd1adba79 ("Update to current iptables headers")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-10-26 10:58:22 -07:00
Hangbin Liu 7a34b9d098 devlink: Convert conditional in dl_argv_handle_port() to switch()
Discovered by Phil's covscan. The final return statement is never reached.
This is not inherently clear from looking at the code, so change the
conditional to a switch() statement which should clarify this.

CC: Phil Sutter <phil@nwl.cc>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Phil Sutter <phil@nwl.cc>
2016-10-17 05:30:45 -07:00
Nikolay Aleksandrov 9208b4e7c9 bridge: add support for the multicast flood flag
Recently a new per-port flag was added which controls the flooding of
unknown multicast, this patch adds support for controlling it via iproute2.
It also updates the man pages with information about the new flag.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-10-17 05:29:24 -07:00
Jakub Kicinski 87e46a5198 tc: cls_bpf: handle skip_sw and skip_hw flags
Add support for controlling hardware offload using (now standard)
skip_sw and skip_hw flags in cls_bpf.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
2016-10-17 05:27:59 -07:00
Nikolay Aleksandrov 660afec25f bridge: vlan: remove wrong stats help
When I did the per-vlan stats iproute2 support, I left out a hunk from a
previous version of the patch that was using a special subcommand "stats".
Since the latest version uses the -s switch remove the help for the stats
subcommand.

Fixes: 7abf5de677 ("bridge: vlan: add support to display per-vlan statistics")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-10-17 05:22:47 -07:00
Stephen Hemminger 7409334b87 ip: macvlan style cleanup
break long lines.
2016-10-12 15:23:27 -07:00
michael-dev@fami-braun.de f33b727610 iproute2: macvlan: add "source" mode
Adjusting iproute2 utility to support new macvlan link type mode called
"source".

Example of commands that can be applied:
  ip link add link eth0 name macvlan0 type macvlan mode source
  ip link set link dev macvlan0 type macvlan macaddr add 00:11:11:11:11:11
  ip link set link dev macvlan0 type macvlan macaddr del 00:11:11:11:11:11
  ip link set link dev macvlan0 type macvlan macaddr flush
  ip -details link show dev macvlan0

Based on previous work of Stefan Gula <steweg@gmail.com>

Signed-off-by: Michael Braun <michael-dev@fami-braun.de>

Cc: steweg@gmail.com
2016-10-12 15:22:14 -07:00
Lucas Bates a40995d1c7 man pages: add man page for skbmod action
Signed-off-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-10-12 15:21:55 -07:00
Stephen Hemminger ec2e005fe5 tc_filter: style cleanup
Break long lines and make whitespace changes.
2016-10-12 15:21:13 -07:00
Jamal Hadi Salim 120f556d15 tc filters: add support to get individual filters by handle
sudo $TC filter add dev $ETH parent ffff: prio 2 protocol ip \
u32 match u32 0 0 flowid 1:1 \
action ok
sudo $TC filter add dev $ETH parent ffff: prio 1 protocol ip \
u32 match ip protocol 1 0xff flowid 1:10 \
action ok

now dump to see all rules..
$TC -s filter ls dev $ETH parent ffff: protocol ip
 ....
filter pref 1 u32
filter pref 1 u32 fh 801: ht divisor 1
filter pref 1 u32 fh 801::800 order 2048 key ht 801 bkt 0 flowid 1:10  (rule hit 0 success 0)
  match 00010000/00ff0000 at 8 (success 0 )
        action order 1: gact action drop
         random type none pass val 0
         index 6 ref 1 bind 1 installed 4 sec used 4 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

filter pref 2 u32
filter pref 2 u32 fh 800: ht divisor 1
filter pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1  (rule hit 336 success 336)
  match 00000000/00000000 at 0 (success 336 )
        action order 1: gact action pass
         random type none pass val 0
         index 5 ref 1 bind 1 installed 38 sec used 4 sec
        Action statistics:
        Sent 24864 bytes 336 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
 ....

..get filter 801::800
$TC -s filter get dev $ETH parent ffff: protocol ip \
handle 801:0:800 prio 2  u32

 ....
filter parent ffff: protocol ip pref 1 u32 fh 801::800 order 2048 key ht 801 bkt 0 flowid 1:10  (rule hit 260 success 130)
  match 00010000/00ff0000 at 8 (success 130 )
        action order 1: gact action drop
         random type none pass val 0
         index 6 ref 1 bind 1 installed 348 sec used 0 sec
        Action statistics:
        Sent 11440 bytes 130 pkt (dropped 130, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
 ....

..get other one
$TC -s filter get dev $ETH parent ffff: protocol ip \
handle 800:0:800 prio 2  u32

....
filter parent ffff: protocol ip pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1  (rule hit 514 success 514)
  match 00000000/00000000 at 0 (success 514 )
        action order 1: gact action pass
         random type none pass val 0
         index 5 ref 1 bind 1 installed 506 sec used 4 sec
        Action statistics:
        Sent 35544 bytes 514 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
....

..try something that doesn't exist
$TC -s filter get dev $ETH parent ffff: protocol ip  handle 800:0:803 prio 2  u32

.....
RTNETLINK answers: No such file or directory
We have an error talking to the kernel
.....

Note: NLM_F_ECHO is added for backward compatibility. Old kernels
(before Eric's patch) will not respond without it, and newer kernels
(after Eric's patch) will ignore it.
In old kernels there is a side effect:
In addition to a response to the GET you will receive an event (if you do tc mon).
But this is still better than what it was before (not working at all).

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-10-12 15:14:47 -07:00
Stephen Hemminger 557b705445 tc: skbmod style cleanup
break long lines
2016-10-12 15:12:51 -07:00
Jamal Hadi Salim 46871dc9c6 man pages: Add tc-ife to Makefile
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-10-12 15:09:52 -07:00
Lucas Bates d491a3480f man pages: update ife action to include tcindex
Signed-off-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-10-12 15:09:52 -07:00
Jamal Hadi Salim da65128998 actions: add skbmod action
This action is intended to be an upgrade from a usability perspective
from pedit (as well as operational debuggability).
Compare this:

sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \
u32 match ip protocol 1 0xff flowid 1:2 \
action pedit munge offset -14 u8 set 0x02 \
    munge offset -13 u8 set 0x15 \
    munge offset -12 u8 set 0x15 \
    munge offset -11 u8 set 0x15 \
    munge offset -10 u16 set 0x1515 \
    pipe

to:

sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \
u32 match ip protocol 1 0xff flowid 1:2 \
action skbmod dmac 02:15:15:15:15:15

Or worse, try to debug a policy with destination mac, source mac and
ethertype. Then make that a hundred rules and you'll get my point.
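To see that the two commands above do the same thing, assume an untagged 14-byte Ethernet header, so offsets -14 through -9 (relative to the IP header) are bytes 0..5 of the frame, i.e. the destination MAC. The sketch below applies the five pedit munges and the single skbmod dmac write to a buffer; both function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* The five pedit munges from the example, applied to a 14-byte
 * Ethernet header (offset -14 is byte 0 of the header). */
static void pedit_style(uint8_t eth[14])
{
	eth[0] = 0x02;	/* munge offset -14 u8 set 0x02 */
	eth[1] = 0x15;	/* munge offset -13 u8 set 0x15 */
	eth[2] = 0x15;	/* munge offset -12 u8 set 0x15 */
	eth[3] = 0x15;	/* munge offset -11 u8 set 0x15 */
	eth[4] = 0x15;	/* munge offset -10 u16 set 0x1515, first byte */
	eth[5] = 0x15;	/* munge offset -10 u16 set 0x1515, second byte */
}

/* The skbmod equivalent: one 6-byte destination MAC write. */
static void skbmod_style(uint8_t eth[14])
{
	static const uint8_t dmac[6] = { 0x02, 0x15, 0x15, 0x15, 0x15, 0x15 };

	memcpy(eth, dmac, sizeof(dmac));
}
```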

The most important ethernet use case at the moment is when redirecting or
mirroring packets to a remote machine. The dst mac address needs a re-write
so that the packet isn't dropped by, or doesn't confuse, an interconnecting
(learning) switch, and isn't dropped by the target machine (which looks at the dst mac).

In the future, common pedit use cases can be migrated to this action
(as an example different fields in ip v4/6, transports like tcp/udp/sctp
etc). For this first cut, this allows modifying basic ethernet header.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-10-12 15:09:52 -07:00
Craig Dillabaugh 883c6708e4 action gact: list pipe as a valid action
Signed-off-by: Craig Dillabaugh <cdillaba@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-10-12 15:09:52 -07:00
Jamal Hadi Salim 8da6ff35cd actions ife: Introduce encoding and decoding of tcindex metadata
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-10-12 15:09:52 -07:00
Roman Mashak 1b600f4b54 ife: improve help text
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-10-12 15:09:52 -07:00
Roman Mashak 57ee4430f9 ife: print prio, mark and hash as unsigned
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-10-12 15:09:52 -07:00
Roman Mashak 9a56cca3f3 ife action: allow specifying index in hex
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-10-12 15:09:52 -07:00
Stephen Hemminger e147161b1a ip: iprule style cleanup
Trivial whitespace cleanup to iprule
2016-10-09 19:29:24 -07:00
Hangbin Liu ca89c52143 ip rule: add selector support
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2016-10-09 19:25:59 -07:00
Hangbin Liu cb294a1de6 ip rule: merge ip rule flush and list, save together
iprule_flush() and iprule_list_or_save() both call the functions
rtnl_wilddump_request() and rtnl_dump_filter(), so merge them
together just as other files do.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2016-10-09 19:25:59 -07:00
Stephen Hemminger 6773bcc227 iplink: cleanup style errors
Fix long strings causing checkpatch warnings
2016-10-09 19:24:38 -07:00
Moshe Shemesh 56e9f0ab19 ip link: Add support to configure SR-IOV VF to vlan protocol 802.1ad (VST QinQ)
Introduce a new API that exposes a list of VLANs per VF (IFLA_VF_VLAN_LIST),
giving user-space applications the ability to specify it for the VF as
an option to support 802.1ad (VST QinQ).

We introduce struct vf_vlan_info, which extends struct vf_vlan and adds
an optional VF VLAN proto parameter.
Default VLAN-protocol is 802.1Q.

Add IFLA_VF_VLAN_LIST in addition to IFLA_VF_VLAN to keep backward
compatibility with older kernel versions.

Suitable ip link tool command examples:
 - Set vf vlan protocol 802.1ad (S-TAG)
	ip link set eth0 vf 1 vlan 100 proto 802.1ad
 - Set vf vlan S-TAG and vlan C-TAG (VST QinQ)
	ip link set eth0 vf 1 vlan 100 proto 802.1ad vlan 30 proto 802.1Q
 - Set vf to VST (802.1Q) mode
	ip link set eth0 vf 1 vlan 100 proto 802.1Q
 - Or by omitting the new parameter (backward compatible)
	ip link set eth0 vf 1 vlan 100

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
2016-10-09 19:17:15 -07:00
Eric Dumazet 39f8caeb96 tc: fq: display unthrottle latency
In linux-4.9 the fq packet scheduler got a new stat:

unthrottle_latency, in nanoseconds.

It gives a good indication of system load or timer implementation
latencies.

Signed-off-by: Eric Dumazet <edumazet@google.com>
2016-10-09 19:15:13 -07:00
Shmulik Ladkani 4654173e90 tc: m_vlan: Add vlan modify action
The 'vlan modify' action allows replacing an existing 802.1Q tag
according to user-provided settings.
It accepts the same arguments as the 'vlan push' action.

For example, this replaces vid 6 with vid 5:

 # tc filter add dev veth0 parent ffff: pref 1 protocol 802.1q \
      basic match 'meta(vlan mask 0xfff eq 6)' \
      action vlan modify id 5 continue

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
2016-10-09 19:11:34 -07:00
Nikolay Aleksandrov 590bf22a34 ipmroute: add support for age dumping
Add support to dump the mroute cache entry age if the show_stats (-s)
switch is provided.
Example:
$ ip -s mroute
(0.0.0.0, 239.10.10.10)          Iif: eth0       Oifs: eth0
  0 packets, 0 bytes, Age  245.44

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-10-09 19:09:31 -07:00
Stephen Hemminger b96306f8d9 Merge branch 'master' into net-next 2016-10-09 19:04:50 -07:00
Stephen Hemminger 63ec17a3da v4.8.0 2016-10-09 19:00:11 -07:00
Anton Aksola e29a8e0537 iproute2: build nsid-name cache only for commands that need it
Calling netns_map_init() before command parsing introduced
a performance issue with a large number of namespaces.

As commands such as add, del and exec do not need to iterate through
/var/run/netns, it would be good not to build the cache before executing
these commands.

Example:
unpatched:
time seq 1 1000 | xargs -n 1 ip netns add

real    0m16.832s
user    0m1.350s
sys    0m15.029s

patched:
time seq 1 1000 | xargs -n 1 ip netns add

real    0m3.859s
user    0m0.132s
sys    0m3.205s

Signed-off-by: Anton Aksola <aakso@iki.fi>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2016-10-09 18:56:47 -07:00
Stephen Hemminger d99272470a update headers from pre 4.9 (net-next) 2016-10-09 18:55:58 -07:00
Stephen Hemminger d54e3ab985 Merge branch 'master' into net-next 2016-10-09 18:53:52 -07:00
Sushma Sitaram 58d93d0030 tc: f_u32: Fill in 'linkid' provided by user
Currently, the 'linkid' input by the user is parsed, but 'handle' is appended to the netlink message instead.

# tc filter add dev enp1s0f1 protocol ip parent ffff: prio 99 u32 ht 800: \
	order 1 link 1: offset at 0 mask 0f00 shift 6 plus 0 eat match ip \
	protocol 6 ff

resulted in:
filter protocol ip pref 99 u32 fh 800::1 order 1 key ht 800 bkt 0
  match 00060000/00ff0000 at 8
    offset 0f00>>6 at 0  eat

This patch results in:
filter protocol ip pref 99 u32 fh 800::1 order 1 key ht 800 bkt 0 link 1:
  match 00060000/00ff0000 at 8
    offset 0f00>>6 at 0  eat

Signed-off-by: Sushma Sitaram <sushma.sitaram@intel.com>
2016-10-09 18:51:00 -07:00
anuradhak afd3921ea9 bridge: Fix garbled json output seen if a vlan filter is specified
JSON objects were started but not completed if the fdb VLAN did not
match the specified filter VLAN.

Sample output:
$ bridge -j fdb show vlan 111
[{
        "mac": "44:38:39:00:69:88",
        "dev": "br0",
        "vlan": 111,
        "master": "br0",
        "state": "permanent"
    }
]
$ bridge -j fdb show vlan 100
[]
$
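A minimal sketch of the ordering that avoids the half-open objects: decide whether the entry passes the VLAN filter before anything is printed. The struct, function name and compact JSON layout are hypothetical; the real bridge code uses iproute2's own JSON writer.

```c
#include <stdio.h>

/* Hypothetical fdb entry with only the fields this sketch needs. */
struct fdb_entry {
	const char *mac;
	const char *dev;
	int vlan;
};

/* Emit one entry as a JSON object. The filter check happens before the
 * opening brace, so a non-matching entry writes nothing at all.
 * Returns 1 if the entry was printed, 0 if it was filtered out. */
static int print_fdb_json(FILE *out, const struct fdb_entry *e,
			  int filter_vlan, int *first)
{
	if (filter_vlan && e->vlan != filter_vlan)
		return 0;
	fprintf(out, "%s{\"mac\":\"%s\",\"dev\":\"%s\",\"vlan\":%d}",
		*first ? "" : ",", e->mac, e->dev, e->vlan);
	*first = 0;
	return 1;
}
```

With this shape, `bridge -j fdb show vlan 100` on a bridge with no VLAN 100 entries yields only the surrounding `[]`, matching the sample output above.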

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2016-10-09 18:49:32 -07:00
Igor Ryzhov 6cf2609ddb fix netlink message length checks
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
2016-10-09 18:48:30 -07:00
Hangbin Liu 22a84711f4 ip: Use specific slave id
The original bond/bridge/vrf and slave ids are the same, which confuses
people. Using bond/bridge/vrf_slave as the id name makes the code clearer.

Acked-by: Phil Sutter <psutter@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2016-09-22 16:39:55 -07:00
Hangbin Liu 77089b583a misc/ss: tcp cwnd should be unsigned
tcp->snd_cwd is a u32, but ss treats it as a signed int. This may
result in negative bandwidth calculations.
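A small sketch of how the sign mix-up produces a negative figure once the counter exceeds INT_MAX. The helper names and the rate formula (cwnd * MSS / RTT) are illustrative assumptions, not the exact ss code path; the conversion of an out-of-range u32 to int is implementation-defined in C and wraps negative on common ABIs.

```c
#include <stdint.h>

/* Hypothetical helper: rough send rate in bytes/s from cwnd (segments),
 * MSS (bytes) and RTT (microseconds). Reproduces the bug by squeezing
 * the u32 counter into a signed int first. */
static double rate_signed_bug(uint32_t snd_cwnd, uint32_t mss, double rtt_us)
{
	int cwnd = (int)snd_cwnd; /* implementation-defined above INT_MAX */

	return (double)cwnd * mss * 1e6 / rtt_us;
}

/* Fixed variant: keep the counter unsigned all the way through. */
static double rate_fixed(uint32_t snd_cwnd, uint32_t mss, double rtt_us)
{
	return (double)snd_cwnd * mss * 1e6 / rtt_us;
}
```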

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Phil Sutter <phil@nwl.cc>
2016-09-22 16:39:08 -07:00
Hangbin Liu d1f338b318 misc/ss: tcp cwnd should be unsigned
tcp->snd_cwd is a u32, but ss treats it as a signed int. This may
result in negative bandwidth calculations.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Phil Sutter <phil@nwl.cc>
2016-09-22 16:38:22 -07:00
Lorenzo Colitti ec75249b14 ss: Support displaying and filtering on socket marks.
This allows the user to dump sockets with a given mark (via
"fwmark = 0x1234/0x1234" or "fwmark = 12345", etc.), and to
display the socket marks of dumped sockets.

The relevant kernel commits are:
- d545caca827b ("net: inet: diag: expose the socket mark to privileged processes.")
- a52e95abf772 ("net: diag: allow socket bytecode filters to match socket marks")
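A sketch of the VALUE or VALUE/MASK syntax, assuming the usual convention that a socket matches when (mark & mask) == value and that a bare value compares all 32 bits. The struct and function names are hypothetical; ss itself compiles the condition into inet_diag bytecode for the kernel to evaluate.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parsed form of an "fwmark = ..." condition. */
struct mark_cond {
	uint32_t mark;
	uint32_t mask;
};

/* Parse "VALUE" or "VALUE/MASK"; base 0 accepts 0x-prefixed hex and decimal. */
static bool parse_mark_cond(const char *s, struct mark_cond *c)
{
	char *end;
	unsigned long v = strtoul(s, &end, 0);

	if (end == s || v > UINT32_MAX)
		return false;
	c->mark = (uint32_t)v;
	c->mask = UINT32_MAX; /* bare value: compare every bit */
	if (*end == '/') {
		const char *m = end + 1;

		v = strtoul(m, &end, 0);
		if (end == m || v > UINT32_MAX)
			return false;
		c->mask = (uint32_t)v;
	}
	return *end == '\0';
}

/* Assumed match rule: masked socket mark equals the requested value. */
static bool mark_matches(const struct mark_cond *c, uint32_t sk_mark)
{
	return (sk_mark & c->mask) == c->mark;
}
```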

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
2016-09-22 16:34:40 -07:00
Alexei Starovoitov 4bfe682536 iptnl: add support for collect_md flag in IPv4 and IPv6 tunnels
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2016-09-21 16:36:24 -07:00
Stephen Hemminger a9c990b6d7 Merge branch 'master' into net-next 2016-09-21 16:35:56 -07:00
Jiri Benc 1f4c51c0e4 tunnels: use macros for IPv6 address comparison
Replace open coded comparison of IPv6 addresses with appropriate macros.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2016-09-21 16:35:05 -07:00
Liping Zhang c44003f7e7 ipmonitor: fix ip monitor can't work when NET_NS is not enabled
In ip monitor, netns_map_init() checks whether getnsid is supported.
But when /proc/self/ns/net does not exist, it just prints error
messages and exits, so the user cannot use ip monitor at all when
CONFIG_NET_NS is disabled:
  # ip monitor
  open("/proc/self/ns/net"): No such file or directory

If opening "/proc/self/ns/net" fails, set have_rtnl_getnsid to false.
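A minimal sketch of the "degrade instead of exit" behaviour. Only the flag name comes from the commit; the function name and the path parameter (so a missing file can be simulated) are assumptions.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

static bool have_rtnl_getnsid = true; /* flag name taken from the commit */

/* Hypothetical probe: if the netns file cannot be opened (for example with
 * CONFIG_NET_NS disabled), turn the nsid feature off and keep going. */
static void probe_getnsid(const char *path)
{
	int fd = open(path, O_RDONLY);

	if (fd < 0) {
		have_rtnl_getnsid = false;
		return;
	}
	close(fd);
	/* Real code would go on to test RTM_GETNSID support here. */
}
```

A caller would then run `probe_getnsid("/proc/self/ns/net")` and let ip monitor continue with nsid reporting disabled.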

Fixes: d652ccbf81 ("netns: allow to dump and monitor nsid")
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2016-09-21 16:32:44 -07:00
Neal Cardwell 2f0f9aef94 ss: output TCP BBR diag information
Dump useful TCP BBR state information from a struct tcp_bbr_info that
was grabbed using the inet_diag API.

We tolerate info that is shorter or longer than expected, in case the
kernel is older or newer than the ss binary. We simply print the
minimum of what ss expects and what the kernel provides.
We use the same trick as for struct tcp_info:
when the info from the kernel is shorter than we hoped, we pad the end
with zeroes and don't print fields that are zero.
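A C sketch of the copy-the-minimum, zero-pad-the-rest technique described above, assuming a payload pointer and length taken from the diag attribute. The struct is a hypothetical stand-in with illustrative field names; the real layout is in the kernel's inet_diag UAPI header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's BBR diag struct. */
struct bbr_info_sketch {
	uint32_t bw_lo;
	uint32_t bw_hi;
	uint32_t min_rtt;
	uint32_t pacing_gain;
	uint32_t cwnd_gain;
};

/* Copy min(len, sizeof(*out)) bytes. An older kernel's shorter payload
 * leaves the tail zeroed (and those fields can be skipped when printing);
 * a newer kernel's extra bytes are ignored. */
static void copy_tolerant(struct bbr_info_sketch *out,
			  const void *payload, size_t len)
{
	size_t n = len < sizeof(*out) ? len : sizeof(*out);

	memset(out, 0, sizeof(*out));
	memcpy(out, payload, n);
}
```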

The BBR output looks like:
  bbr:(bw:1.2Mbps,mrtt:18.965,pacing_gain:2.88672,cwnd_gain:2.88672)

The motivation here is to be consistent with DCTCP, which looks like:
  dctcp(ce_state:23,alpha:23,ab_ecn:23,ab_tot:23)

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
2016-09-21 16:29:35 -07:00
Stephen Hemminger 16c2a51dc4 update bpf.h 2016-09-21 16:28:56 -07:00
Hangbin Liu bffb68b6c2 ip route: check ftell, fseek return value
ftell() may return -1 in the error case, which is not handled and
therefore passes a negative offset to fseek(). The return code of
fseek() is also not checked.
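A minimal sketch of the checked pattern: remember a position for a second pass, refuse to seek to a negative offset, and report fseek() failure. The helper names are hypothetical and the actual ip route restore code is structured differently.

```c
#include <stdio.h>

/* Save the current offset; -1 means the stream is not seekable (e.g. a pipe). */
static long save_pos(FILE *fp)
{
	return ftell(fp);
}

/* Seek back to a saved offset. Never pass ftell()'s -1 on to fseek(),
 * and surface fseek()'s own failure instead of ignoring it. */
static int restore_pos(FILE *fp, long pos)
{
	if (pos < 0)
		return -1;
	return fseek(fp, pos, SEEK_SET) == 0 ? 0 : -1;
}
```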

Reported-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2016-09-20 09:52:35 -07:00
Stephen Hemminger 36923f4e69 Merge branch 'master' into net-next 2016-09-20 09:50:53 -07:00
Mahesh Bandewar b7c1488034 ip: (ipvlan) introduce L3s mode
The new mode 'l3s' can be set like this:

  ip link add link <master> dev <IPvlan-slave> type ipvlan mode l3s

  e.g. ip link add link eth0 dev ipvl0 type ipvlan mode l3s

Also did some trivial code restructuring.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
2016-09-20 09:50:45 -07:00
Davide Caratti f20f5f7990 macsec: fix input range of 'icvlen' parameter
The maximum possible ICV length in a MACsec frame is 16 octets, not 32:
fix get_icvlen() accordingly, so that a proper error message is displayed
if the input 'icvlen' is greater than 16.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
2016-09-20 09:48:26 -07:00
Jiri Benc e2cfe5501f vxlan: group address requires net device
This is now enforced in the kernel, check also in iproute to get a better
error message.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2016-09-20 09:46:41 -07:00
Davide Caratti 087dec7fcf tc: don't accept qdisc 'handle' greater than ffff
Since get_qdisc_handle() truncates the input value to 16 bits, return an
error and print "invalid qdisc ID" if the input 'handle' parameter needs
more than 16 bits to be stored.
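A simplified sketch of the range check, assuming the major number is parsed as hex with an optional trailing ':' and stored in the upper 16 bits. The function name is hypothetical; iproute2's get_qdisc_handle() handles more cases (such as "none").

```c
#include <stdint.h>
#include <stdlib.h>

/* Parse a qdisc major such as "1:" or "ffff:" (hex). On success store
 * MAJOR << 16 in *h and return 0; return -1 on bad syntax or when the
 * value would need more than 16 bits instead of silently truncating. */
static int parse_qdisc_major(const char *s, uint32_t *h)
{
	char *end;
	unsigned long maj = strtoul(s, &end, 16);

	if (end == s)
		return -1;
	if (*end == ':')
		end++;
	if (*end != '\0' || maj > 0xffff)
		return -1; /* "invalid qdisc ID" */
	*h = (uint32_t)maj << 16;
	return 0;
}
```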

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
2016-09-20 09:44:59 -07:00
Phil Sutter 003f0fde69 iproute: fix documentation for ip rule scan order
Looks like the real issue is the missing definition of priority.
2016-09-20 09:36:45 -07:00
Stephen Hemminger e8a67bc4cf update kernel headers from net-next 2016-09-20 09:31:42 -07:00
Stephen Hemminger f3af3074fd tipc: cleanup style issues
Fix style issues reported by checkpatch.
2016-09-20 09:25:42 -07:00
Parthasarathy Bhuvaragan 76fee71bf3 tipc: update man page for link monitor
Add description for the new link monitor commands.

Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
2016-09-20 09:13:09 -07:00
Parthasarathy Bhuvaragan 5b748f094b tipc: add link monitor list
In this commit, we list the monitor attributes. By default it lists
the attributes for all bearers, otherwise the specified bearer.

A sample usage is shown below:
$ tipc link monitor list

bearer eth:data0
node          status monitored generation applied_node_status [non_applied_node:status]
1.1.1         up     direct    16         UU []
1.1.2         up     direct    16         UU []
1.1.3         up     direct    16         UU []

bearer eth:data1
node          status monitored generation applied_node_status [non_applied_node:status]
1.1.1         up     direct    2          UU []
1.1.2         up     direct    3          UU []
1.1.3         up     direct    3          UU []

$ tipc link monitor list media eth device data0

bearer eth:data0
node          status monitored generation applied_node_status [non_applied_node:status]
1.1.1         up     direct    16         UU []
1.1.2         up     direct    16         UU []
1.1.3         up     direct    16         UU []

$ tipc link monitor list -h
Usage: tipc monitor list [ media MEDIA ARGS...]

MEDIA
 udp                   - User Datagram Protocol
 ib                    - Infiniband
 eth                   - Ethernet

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
2016-09-20 09:13:09 -07:00
Parthasarathy Bhuvaragan d2ba0b0bbb tipc: refactor bearer to facilitate link monitor
In this commit, we:
1. Export print_bearer_media()
2. Move the bearer name handling from nl_add_bearer_name() into
   a new function cmd_get_unique_bearer_name().

These exported functions will be used by the link monitor in
subsequent commits.

Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
2016-09-20 09:13:09 -07:00
Parthasarathy Bhuvaragan 80e9807dff tipc: add link monitor summary
The monitor summary command prints the basic attributes
specific to the local node.
A sample usage is shown below:
$ tipc link monitor summary
bearer eth:data0
    table_generation 15
    cluster_size 8
    algorithm overlapping-ring

bearer eth:data1
    table_generation 15
    cluster_size 8
    algorithm overlapping-ring

$ tipc link monitor summary -h
Usage: tipc monitor summary

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
2016-09-20 09:13:09 -07:00
Parthasarathy Bhuvaragan 7da7ef9bd8 tipc: add link monitor get threshold
The command prints the monitor activation threshold.
A sample usage is shown below:
$ tipc link monitor get threshold
32

$ tipc link monitor get -h
Usage: tipc monitor get PPROPERTY

PROPERTIES
 threshold      - Get monitor activation threshold

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
2016-09-20 09:13:09 -07:00
Parthasarathy Bhuvaragan b33a69005e tipc: add link monitor set threshold
The command sets the activation threshold for the new
cluster ring supervision.
A sample usage is shown below:
$ tipc link monitor set threshold 4

$ tipc link monitor set -h
Usage: tipc monitor set PPROPERTY

PROPERTIES
 threshold SIZE - Set activation threshold for monitor

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
2016-09-20 09:13:09 -07:00
Parthasarathy Bhuvaragan 5f944e47ea tipc: remove dead code
Remove dead code and a newline.

Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
2016-09-20 09:13:09 -07:00
Stephen Hemminger 6831acc8ef Merge branch 'master' into net-next 2016-09-20 09:13:03 -07:00
Phil Sutter 31a29009c5 iproute: fix documentation for ip rule scan order
Hi,

On Thu, Sep 08, 2016 at 11:59:55AM +0200, Michal Kubecek wrote:
> On Thu, Sep 01, 2016 at 09:04:54AM -0700, Stephen Hemminger wrote:
> > On Tue, 30 Aug 2016 17:32:52 -0700
> > Iskren Chernev <iskren@imo.im> wrote:
> >
> > > From 416f45b62f33017d19a9b14e7b0179807c993cbe Mon Sep 17 00:00:00 2001
> > > From: Iskren Chernev <iskren@imo.im>
> > > Date: Tue, 30 Aug 2016 17:08:54 -0700
> > > Subject: [PATCH bug-fix] iproute: fix documentation for ip rule scan order
> > >
> > > ---
> > >  man/man8/ip-rule.8 | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/man/man8/ip-rule.8 b/man/man8/ip-rule.8
> > > index 1774ae3..3508d80 100644
> > > --- a/man/man8/ip-rule.8
> > > +++ b/man/man8/ip-rule.8
> > > @@ -93,7 +93,7 @@ Each policy routing rule consists of a
> > >  .B selector
> > >  and an
> > >  .B action predicate.
> > > -The RPDB is scanned in order of decreasing priority. The selector
> > > +The RPDB is scanned in order of increasing priority. The selector
> > >  of each rule is applied to {source address, destination address,
> > > incoming
> > >  interface, tos, fwmark} and, if the selector matches the packet,
> > >  the action is performed. The action predicate may return with success.
> > > --
> > > 2.4.5
> >
> > Applied
>
> I'm sorry I didn't notice before but this just reverts the change done
> by commit 4957250166 ("iproute2: clarification of various man8 pages").
> IMHO the problem is that both versions are equally confusing as the word
> "priority" can be understood in two different senses.
>
> How about more explicit formulation, e.g.
>
>   ... in order of decreasing logical priority (i.e. increasing numeric
>   values).
>
> Would that be better?

Looks like the real issue is the missing definition of priority. What about
this:
2016-09-20 09:08:56 -07:00
Thomas Graf 113fab78e4 tuntap: Add name attribute to usage text
Signed-off-by: Thomas Graf <tgraf@suug.ch>
2016-09-08 14:31:33 -07:00
Hangbin Liu 12f92e2e4f gitignore: Ignore 'tags' file generated by ctags
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2016-09-08 14:30:44 -07:00
Hangbin Liu 45a0dc164a nstat: add sctp snmp support
The SCTP module is not loaded by default. This should be OK, since we do not
load the table if fdopen() fails, and opening the proc file won't load the SCTP
kernel module.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2016-09-08 14:29:36 -07:00
Stephen Hemminger 88ba11bc08 Merge branch 'master' into net-next 2016-09-01 09:11:10 -07:00
Stephen Hemminger 3cad6e5f25 update kernel headers from 4.8-rc4 2016-09-01 09:10:43 -07:00
Davide Caratti 0330f49ea0 macsec: fix byte ordering on input/display of 'sci'
Use get_be64() in place of get_u64() when parsing the input 'sci' parameter,
so that 'sci' can be entered in network byte order regardless of the
endianness of the target system; use ntohll() when printing out 'sci'. While
at it, improve the documentation of 'sci' in ip-link.8.
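Why shifting on values sidesteps host endianness: the sketch below writes and reads a 64-bit value as eight network-order bytes using only shifts and masks, so the result is the same on little- and big-endian hosts. The helper names are hypothetical and are not iproute2's get_be64(), which parses a string.

```c
#include <stdint.h>

/* Store v as 8 big-endian (network order) bytes: most significant first. */
static void put_be64_bytes(uint8_t out[8], uint64_t v)
{
	for (int i = 7; i >= 0; i--) {
		out[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/* Inverse: rebuild the host value from 8 network-order bytes, which is
 * what ntohll() achieves for a u64 read straight from the wire. */
static uint64_t get_be64_bytes(const uint8_t in[8])
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | in[i];
	return v;
}
```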

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2016-09-01 09:08:50 -07:00
Davide Caratti d0baa1389f man: ip.8: add missing 'macsec' item to OBJECT list
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2016-09-01 09:08:50 -07:00
Davide Caratti 5898bd667a macsec: fix input of 'port', improve documentation of 'address'
Remove hardcoded base 10 parsing of the 'port' parameter, update the man page
and fix the usage() functions as well. Fix a misleading line in the man page that
theoretically allowed specifying the 'port' keyword right after the 'sci' keyword.
Document the 'address' parameter in the man pages and in the usage()
functions as well.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2016-09-01 09:08:50 -07:00
Stephen Hemminger cc28aad1e6 ip: iptuntap cleanup
Minor whitespace changes
2016-09-01 09:03:40 -07:00
Stephen Hemminger ae810982cc remove useless return statement
Get rid of:
void foo() {
...
	return;
}
2016-09-01 08:44:20 -07:00
Iskren Chernev 4a564d914d iproute: fix documentation for ip rule scan order 2016-09-01 08:41:37 -07:00
Andrey Jr. Melnikov 67a990b811 iproute: disallow ip rule del without parameters
Disallow running `ip rule del` without any parameters, to avoid deleting
whichever rule happens to be first in the table.

Signed-off-by: Andrey Jr. Melnikov <temnota.am@gmail.com>
2016-09-01 08:41:37 -07:00
Hannes Frederic Sowa 567e696072 iptuntap: show processes using tuntap interface
Show which processes are using which tun/tap devices, e.g.:

$ ip -d tuntap
tun0: tun
	Attached to processes: vpnc(9531)
vnet0: tap vnet_hdr
	Attached to processes: qemu-system-x86(10442)
virbr0-nic: tap UNKNOWN_FLAGS:800
	Attached to processes:

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
2016-09-01 08:41:37 -07:00
Nikolay Aleksandrov 56e3eb4c34 ip: route: fix multicast route dumps
If we have multicast routes and do ip route show table all we'll get the
following output:
 ...
 multicast ???/32 from ???/32  table default  proto static  iif eth0
The "???" appear because rtm_family is set to RTNL_FAMILY_IPMR
(or RTNL_FAMILY_IP6MR for IPv6) instead of the real family. Add a simple
workaround that returns the real family based on the rtm_type (always
RTN_MULTICAST for ipmr routes) and the rtm_family. A similar workaround is
already used in ipmroute, and we can use this helper there as well.
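A sketch of the mapping the workaround performs. The constant values mirror those in <linux/rtnetlink.h> and are duplicated under hypothetical SKETCH_ names so the example compiles without kernel headers; the function name is also hypothetical.

```c
#include <sys/socket.h> /* AF_INET, AF_INET6 */

/* Values as in <linux/rtnetlink.h>, duplicated for a header-free sketch. */
#define SKETCH_RTN_MULTICAST     5
#define SKETCH_RTNL_FAMILY_IPMR  128
#define SKETCH_RTNL_FAMILY_IP6MR 129

/* Map the pseudo-families used by multicast routing tables back to real
 * address families so the addresses can be formatted instead of "???". */
static int real_family_sketch(int rtm_type, int rtm_family)
{
	if (rtm_type != SKETCH_RTN_MULTICAST)
		return rtm_family;
	if (rtm_family == SKETCH_RTNL_FAMILY_IPMR)
		return AF_INET;
	if (rtm_family == SKETCH_RTNL_FAMILY_IP6MR)
		return AF_INET6;
	return rtm_family;
}
```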

After the patch the output is:
multicast 239.10.10.10/32 from 0.0.0.0/32  table default  proto static  iif eth0

Also fix a minor whitespace error and switch to tabs.

Reported-by: Satish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-09-01 08:41:37 -07:00
Stephen Hemminger 98a2af1d40 Merge branch 'master' into net-next 2016-09-01 08:39:15 -07:00
Hadar Hen Zion 0e43ed9dea tc: m_vlan: Add priority option to push vlan action
The current vlan push action supports only the vid and protocol options.
Add a priority option.

Example script that adds vlan push action with vid and priority:

tc filter add dev veth0 protocol ip parent ffff: \
	flower \
	indev veth0 \
	action vlan push id 100 priority 5

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
2016-09-01 08:38:41 -07:00
Hadar Hen Zion 745d917260 tc: flower: Introduce vlan support
Classification according to vlan id and vlan priority.

Example script that adds vlan filter:

 # add ingress qdisc
 tc qdisc add dev ens4f0 ingress

 # add a flower filter with vlan id and priority classification
 tc filter add dev ens4f0 protocol 802.1Q parent ffff: \
	flower \
		indev ens4f0 \
		vlan_ethtype ipv4 \
		vlan_id 100 \
		vlan_prio 3 \
	action vlan pop

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
2016-09-01 08:38:41 -07:00
Yotam Gigi 0501294bca tc: man: Add man entry for the matchall classifier.
In addition to providing information about the matchall filter and its
configuration, the man entry contains examples of creating port
mirroring entries.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-09-01 08:37:01 -07:00
Yotam Gigi d5cbf3ff05 tc: Add support for the matchall traffic classifier.
The matchall classifier matches every packet and allows the user to apply
actions on it. In addition, it supports the skip_sw and skip_hw flags (as found
in the u32 and flower filters), which direct the kernel to skip the
software/hardware processing of the actions.

This filter is very useful in use cases where every packet should be
matched. For example, packet mirroring (SPAN) can be set up very easily
using this filter.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-09-01 08:37:01 -07:00
Richard Alpe ed81deabf2 tipc: add the ability to get UDP bearer options
In this patch we introduce the ability to get UDP specific bearer
options such as remoteip, remoteport, localip and localport.

After some discussion on tipc-discussion about how to handle media
specific options, we agreed to pass them after the media.

For media generic bearer options we already do:
$ tipc bearer get OPTION media MEDIA name|device NAME|DEVICE

For the UDP media specific bearer options, we introduce in this patch:
$ tipc bearer get media udp name NAME OPTION
such as
$ tipc bearer get media udp name NAME remoteip

This allows bash-completion to tab complete only appropriate options,
it makes more logical sense, and it scales better, even though it might
look a little different to the user.

In order to use the existing option parsing framework to do this we
add a flag (OPT_KEY) to the option parsing function.

If the UDP bearer has multiple remoteip addresses associated with it
(replicast) we handle the TIPC_NLA_UDP_MULTI_REMOTEIP flag and send
a TIPC_NL_UDP_GET_REMOTEIP query transparently to the user.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
2016-09-01 08:34:35 -07:00
Richard Alpe f1f40cf77d tipc: introduce bearer add for remoteip
Introduce the ability to add remote IP addresses to an existing UDP
bearer. On the kernel side, adding a "remoteip" to an existing bearer
puts the bearer in "replicast" mode, where TIPC multicast messages are
sent out to each configured remoteip using unicast. This is required
for TIPC UDP bearers to work in environments where IP multicast is
disabled.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
2016-09-01 08:34:35 -07:00
Stephen Hemminger 3cc0b954b0 Merge branch 'master' into net-next 2016-08-29 11:19:03 -07:00
Stephen Hemminger 7c55d7700f devlink: whitespace cleanup
Break long lines
2016-08-29 11:17:38 -07:00
Or Gerlitz f57856fab2 devlink: Add e-switch support
Implement the kernel devlink e-switch interface. Currently it allows
getting and setting the device e-switch mode.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
2016-08-29 11:15:54 -07:00
Stephen Hemminger 60ba41ad7f update TIPC headers 2016-08-29 11:06:02 -07:00
Nikolay Aleksandrov 7abf5de677 bridge: vlan: add support to display per-vlan statistics
This patch adds support for the stats argument to the bridge
vlan command, which displays the per-VLAN statistics and the device
each VLAN belongs to, with its flags. The supported filtering
options are dev and vid. The man page is also updated to explain the new
option.
The patch uses the new RTM_GETSTATS interface with a filter_mask to dump
all bridge and port VLANs. Later we can add support for using the
per-device dump and filtering in the kernel instead.

Example:
$ bridge -s vlan show
port             vlan id
br0               1 Egress Untagged
                    RX: 2536 bytes 20 packets
                    TX: 2536 bytes 20 packets
                  101
                    RX: 43158 bytes 50 packets
                    TX: 43158 bytes 50 packets
eth1              1 Egress Untagged
                    RX: 2536 bytes 20 packets
                    TX: 2536 bytes 20 packets
                  100
                    RX: 0 bytes 0 packets
                    TX: 0 bytes 0 packets
                  101
                    RX: 43158 bytes 50 packets
                    TX: 43158 bytes 50 packets
                  102
                    RX: 16897 bytes 93 packets
                    TX: 0 bytes 0 packets

The format is the same as bridge vlan show but with stats, even though
under the hood different calls are made to the kernel.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-08-29 10:58:40 -07:00
Stephen Hemminger f7708201f8 Merge branch 'master' into net-next 2016-08-29 10:57:02 -07:00
Roman Mashak 27d2b08e23 police: bug fix man page
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-08-29 10:54:40 -07:00
Roman Mashak 3de88c4b47 police: improve usage message
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-08-29 10:54:40 -07:00
Roman Mashak cef49e514a police: add extra space to improve police result printing
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-08-29 10:54:40 -07:00
Phil Sutter 7cc7cb8a88 ip-route: Prevent some double spaces in output
The code is a bit messy, as it starts with a space after text and at some
point switches to a space before text. Either way, printing a space
before *and* after text almost certainly leads to printing more
whitespace than necessary.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-08-29 10:35:33 -07:00
Richard Alpe 535194a172 tipc: add peer remove functionality
This enables a user to remove an offline peer from the kernel data
structures. This could for example be useful when deliberately scaling
in peer nodes in a cloud environment.

This functionality was first merged in:
f9dec657e4 (Richard Alpe tipc: add peer remove functionality)

And later backed out (as the kernel counterpart was held up) in:
385caeb13b (Stephen Hemminger Revert "tipc: add peer remove functionality")

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
2016-08-29 10:33:24 -07:00
Stephen Hemminger 380656f8c4 update headers to 4.8-rc2 net-next 2016-08-25 08:49:07 -07:00
Stephen Hemminger 9f9e2bb88e update BPF headers 2016-08-25 08:46:25 -07:00
Jamal Hadi Salim 06be01f75d tc classifiers: Modernize tcindex classifier
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-08-22 10:08:00 -07:00
Eric Dumazet 1acd208c0b ip: report IFLA_GSO_MAX_SIZE and IFLA_GSO_MAX_SEGS
Kernel support for these attributes was added in linux-4.6.

Signed-off-by: Eric Dumazet <edumazet@google.com>
2016-08-22 10:03:57 -07:00
Gustavo Zacarias 6b376ebd6e ss: fix build with musl libc
UINT_MAX usage requires limits.h, so include it.

Signed-off-by: Gustavo Zacarias <gustavo@zacarias.com.ar>
2016-08-22 10:02:12 -07:00
Xin Long c85703bb9f ip route: restore_handler should check tb[RTA_PREFSRC] for local networks
Prior to this patch, if a route entry's RTA_PREFSRC and RTA_GATEWAY
were both NULL, it was supposed to be restored ONLY as a local address.

But since tb[RTA_PREFSRC] was not checked when restoring local networks,
rtattr_cmp would return success when it was NULL, and the route entry would
be restored again as a local network.

This patch adds a tb[RTA_PREFSRC] check when restoring local networks.

Fixes: 74af8dd962 ("ip route: restore route entries in correct order")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Tested-by: Phil Sutter <phil@nwl.cc>
2016-08-18 14:54:08 -07:00
Sabrina Dubroca 9423a324bf ila: show usage even if the module is not available
Currently, the `ip ila` command tries to initialize a genl context
even when we just want to see the help for the command, which doesn't
require talking to the kernel at all.

Delay genl initialization, which can fail if the module isn't loaded,
until the point where we will actually need it.
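A minimal sketch of the lazy-initialization pattern shared by the ila, fou and macsec fixes: the help path returns before the genl setup is ever attempted. All names are hypothetical, and the placeholder initializer never fails, whereas the real step fails when the module isn't loaded.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static bool genl_ready; /* hypothetical stand-in for an opened genl handle */

/* Placeholder for opening the genl socket and resolving the family. */
static int ensure_genl(void)
{
	genl_ready = true;
	return 0;
}

static int do_cmd(const char *cmd)
{
	if (strcmp(cmd, "help") == 0) {
		puts("usage text"); /* no kernel round-trip needed */
		return 0;
	}
	if (ensure_genl() < 0)
		return -1;
	/* ... build and send the genl request here ... */
	return 0;
}
```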

Fixes: ec71cae0bb ("ila: Support for configuring ila to use netfilter hook")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
2016-08-17 14:00:28 -07:00
Sabrina Dubroca d240a0e174 fou: show usage even if the module is not available
Currently, the `ip fou` command tries to initialize a genl context even
when we just want to see the help for the command, which doesn't require
talking to the kernel at all.

Delay genl initialization, which can fail if the module isn't loaded,
until the point where we will actually need it.

Fixes: 6928747b6e ("ip fou: Support to configure foo-over-udp RX")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
2016-08-17 14:00:22 -07:00
Sabrina Dubroca 688f9aa4f2 macsec: show usage even if the module is not available
Currently, the `ip macsec` command tries to initialize a genl context
even when we just want to see the help for the command, which doesn't
require talking to the kernel at all.

Delay genl initialization, which can fail if the module isn't loaded,
until the point where we will actually need it.

Fixes: b26fc590ce ("ip: add MACsec support")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
2016-08-17 13:59:52 -07:00
Sabrina Dubroca 2b68cb77cd libgenl: introduce genl_init_handle
All users of genl have the same code to open a genl socket and resolve
the family for their specific protocol.  Introduce a helper to initialize
the handle, and use it in all the genl code.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
2016-08-17 13:59:21 -07:00
Phil Sutter 08c0466b11 ip-link: add missing {min,max}_tx_rate to help text
These VF options are already described in the man page; they're just missing
from the help output.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-08-17 13:54:31 -07:00
Richard Alpe 50afc4dbf8 tipc: refactor bearer identification
Introduce a generic function (nl_add_bearer_name()) that identifies a
bearer and adds it to an existing netlink message. This reduces code
complexity and makes the code a little bit easier to maintain.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
2016-08-17 13:52:07 -07:00
Richard Alpe ff77557957 tipc: fix UDP bearer synopsis
The local IP is not required to identify a UDP bearer and shouldn't be
passed to bearer disable, set or get. This patch removes the localip
entry from the synopsis of these functions.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
2016-08-17 13:52:07 -07:00
Tom Herbert 2d01b393f4 ipila: Fixed unitialized variables
Initialize locator and locator_match to zero and only do
addattr if they have been set.
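
The fix can be sketched as follows, assuming explicit "was set" flags
(the commit does not say how the set state is tracked). struct
ila_opts, add_attr64() and build_msg() are hypothetical; add_attr64()
only records values instead of building a real netlink message:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical message builder standing in for addattr64(). */
struct msg {
	int nattrs;
	uint64_t vals[4];
};

static void add_attr64(struct msg *m, uint64_t v)
{
	m->vals[m->nattrs++] = v;
}

struct ila_opts {
	uint64_t locator;	/* zero-initialized by the caller */
	uint64_t locator_match;
	bool locator_set;
	bool locator_match_set;
};

/* Only emit attributes the user actually provided. */
static void build_msg(const struct ila_opts *o, struct msg *m)
{
	if (o->locator_set)
		add_attr64(m, o->locator);
	if (o->locator_match_set)
		add_attr64(m, o->locator_match);
}
```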

Signed-off-by: Tom Herbert <tom@herbertland.com>
2016-08-17 13:51:18 -07:00
Phil Sutter 7e33b09331 man: ip-link.8: Document missing geneve options
This adds missing documentation of geneve type options:

- dstport
- external
- udpcsum
- udp6zerocsumtx
- udp6zerocsumrx

The text for the last three was simply copied and pasted from the vxlan
section.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-08-12 12:58:53 -07:00
Tom Herbert 8bd31d8db5 fou: Allowing configuring IPv6 listener
Signed-off-by: Tom Herbert <tom@herbertland.com>
2016-08-12 12:51:18 -07:00
Tom Herbert 0b2fbb7358 gre6: Support for fou encapsulation
Signed-off-by: Tom Herbert <tom@herbertland.com>
2016-08-12 12:51:18 -07:00
Tom Herbert 73516e128a ip6tnl: Support for fou encapsulation
Signed-off-by: Tom Herbert <tom@herbertland.com>
2016-08-12 12:51:18 -07:00
Tom Herbert ec71cae0bb ila: Support for configuring ila to use netfilter hook
Signed-off-by: Tom Herbert <tom@herbertland.com>
2016-08-12 12:50:15 -07:00
Tom Herbert ed67f83806 ila: Support for checksum neutral translation
Add configuration of ila LWT tunnels for checksum mode including
checksum neutral translation.

Signed-off-by: Tom Herbert <tom@herbertland.com>
2016-08-12 12:49:30 -07:00
WANG Cong 6fcf36c9c6 tc: fix a misleading failure
Before this patch:

 # ./tc/tc actions add action drop index 11
 RTNETLINK answers: File exists
 We have an error talking to the kernel
 Command "(null)" is unknown, try "tc actions help".

After this patch:

 # ./tc/tc actions add action drop index 11
 RTNETLINK answers: File exists
 We have an error talking to the kernel

Cc: Stephen Hemminger <shemming@brocade.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
2016-08-09 11:18:14 -07:00
Stephen Hemminger 592990ebf4 Merge branch 'net-next' 2016-08-09 11:14:47 -07:00
Roopa Prabhu e40d6b2b90 bridge: print_vlan: add missing check for json instance
Also initialize vlan_flags

Fixes: d82a49ce85 ("bridge: add json support for bridge vlan show")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2016-08-08 10:51:12 -07:00
Stephen Hemminger 80c4bc6a3e Merge branch 'master' into net-next 2016-08-08 09:27:28 -07:00
Stephen Hemminger 4ecc96f8b6 v4.7.0 2016-08-08 08:58:39 -07:00
Stephen Hemminger 1b2594935e Merge branch 'master' into net-next 2016-08-08 08:57:22 -07:00
Phil Sutter c15feb99a4 tc/m_gact: Fix action_a2n() return code check
The function returns zero on success.

Reported-by: Mark Bloch <markb@mellanox.com>
Fixes: 69f5aff63c ("tc: use action_a2n() everywhere")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-08-08 08:52:47 -07:00
Stephen Hemminger dc00db9e84 update kernel headers 2016-08-08 08:51:22 -07:00
Roopa Prabhu 39c64df1c7 bridge: print_vlan: add missing check for json instance
Also initialize vlan_flags

Fixes: d82a49ce85 ("bridge: add json support for bridge vlan show")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2016-08-08 08:44:16 -07:00
Stephen Hemminger 6d54c41580 Merge branch 'master' into net-next 2016-08-08 08:44:07 -07:00
Roopa Prabhu 1eeb6fdac8 bridge: vlan json: skip ports with empty vlans
The non-JSON output prints 'None' for such VLANs, which would garble
the JSON output.

Fixes: d82a49ce85 ("bridge: add json support for bridge vlan show")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2016-08-08 08:42:26 -07:00
Phil Sutter 9579afb24e tc: Fix for missing estimator initialization
When switching to C99 initializers, I forgot to add this one. This means
that when trying to set an estimator value, tc would complain about a
spurious duplicate estimator parameter. Much worse, the uninitialized
variable's random content is sent to the kernel regardless of whether an
estimator was given or not.

Fixes: d17b136f7d ("Use C99 style initializers everywhere")
Reported-by: Stas Nichiporovich <stasn77@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-08-06 10:14:06 -07:00
Jiri Pirko e3d0f0c0e3 devlink: add option to generate JSON output
For parsing by another app it is convenient to produce output in JSON
format.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-08-01 16:31:08 -07:00
Jiri Pirko 7a9466dbcb devlink: write usage help messages to stderr
To avoid confusing the reader, write help messages to stderr.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-08-01 16:31:08 -07:00
Davide Caratti 89bb6e673a macsec: cipher and icvlen can be set separately
Since the kernel driver has valid default values for 'cipher' and
'icvlen', there is no need to require users to specify both of them when
a new link is added. Also, print an error message and exit with an
appropriate exit status in case of an unsupported cipher suite.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2016-07-28 11:12:39 -07:00
Davide Caratti fd4df5b211 ip {link,address}: add 'macsec' item to TYPE list
Fix the output of "ip address help" and "ip link help". Update the TYPE
list in the man pages ip-address.8 and ip-link.8 as well.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2016-07-28 11:12:39 -07:00
Davide Caratti c0ab80a490 man: macsec: fix macsec related typos
- ip-macsec.8: fix wrong 'device' keyword in 'ip link add device eth0';
add missing description of 'validate' keyword; remove spurious bracket
near 'encrypt' keyword; add missing reference to configuration of 'port'
and 'sci'
- ip-link.8: fix wrong 'es' and 'encoding' keywords in MACsec section

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2016-07-28 11:12:39 -07:00
Stephen Hemminger 79f5bf17a5 Merge branch 'master' into net-next 2016-07-25 08:21:00 -07:00
Michal Soltys bdd6104f52 man/man8/tc-flow.8: minor corrections
- baseclass: major handle must match that of class's, Y defaults to 1
- flow map example: maps to 1-256, not 1-257

Signed-off-by: Michal Soltys <soltys@ziu.info>
2016-07-25 08:19:25 -07:00
Phil Sutter 7093200611 tc: util: No need for action_n2a() to be reentrant
This allows removing some buffers here and there. While at it, make it
return a const value.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-07-25 08:10:43 -07:00
Phil Sutter 69f5aff63c tc: use action_a2n() everywhere
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-07-25 08:10:43 -07:00
Phil Sutter 53aadc5286 tc: util: bore up action_a2n()
It's a pity this function isn't used anywhere, so let's polish it for use:

* Loop over branch names; this makes it clear that every former
  conditional was exactly identical.
* Support 'pipe' branch name, too.
* Make number parsing optional.
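
A hedged C sketch of the table-driven lookup described above.
sketch_action_a2n() and the ACT_* names are hypothetical; their values
are assumed to mirror TC_ACT_* in <linux/pkt_cls.h>, and mapping
"continue" to TC_ACT_UNSPEC is likewise an assumption. Note the return
convention: 0 means success, so callers must test for == 0, which is
the check the later m_gact fix (c15feb99a4) is about.

```c
#include <stdlib.h>
#include <string.h>

/* Values assumed to mirror TC_ACT_* in <linux/pkt_cls.h>. */
enum {
	ACT_UNSPEC = -1,
	ACT_OK = 0,
	ACT_RECLASSIFY = 1,
	ACT_SHOT = 2,
	ACT_PIPE = 3,
};

static const struct {
	const char *name;
	int action;
} branches[] = {
	{ "continue",   ACT_UNSPEC },
	{ "drop",       ACT_SHOT },
	{ "shot",       ACT_SHOT },
	{ "pass",       ACT_OK },
	{ "ok",         ACT_OK },
	{ "reclassify", ACT_RECLASSIFY },
	{ "pipe",       ACT_PIPE },
};

/* Returns 0 on success, -1 on error. Numeric strings are accepted
 * only when allow_num is non-zero. */
static int sketch_action_a2n(const char *arg, int *result, int allow_num)
{
	size_t i;

	for (i = 0; i < sizeof(branches) / sizeof(branches[0]); i++) {
		if (strcmp(arg, branches[i].name) == 0) {
			*result = branches[i].action;
			return 0;
		}
	}
	if (allow_num) {
		char *end;
		long n = strtol(arg, &end, 0);

		if (end != arg && *end == '\0') {
			*result = (int)n;
			return 0;
		}
	}
	return -1;
}
```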

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-07-25 08:10:43 -07:00
Phil Sutter 9ffc80b1e4 tc: Reformat tc_util.h
* Drop 'extern' keyword before function declarations.
* Add parameter names where they were missing, for consistency.
* Drop fancy indenting (e.g. tab between type and name).
* Break long lines to not exceed 80 columns.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-07-25 08:10:43 -07:00
Shanker Wang 9bf9d05b23 l2tp: add udp checksum control flags
Add three options that let the user control whether the checksum is
enabled.

Signed-off-by: Miao Wang <miao.wang@tuna.tsinghua.edu.cn>
2016-07-22 15:25:23 -07:00
Stephen Hemminger ba91cd9d86 include: update net-next XDP headers 2016-07-20 12:24:59 -07:00
Stephen Hemminger ac75d5cd36 Merge branch 'master' into net-next 2016-07-20 12:21:42 -07:00
Phil Sutter 6acf086c2b ip-address.8: Document autojoin flag
Description copied from related kernel support commit message with a
little tailoring to fit.

While at it, fix font of non-terminal CONFFLAG-LIST in synopsis.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-07-20 12:21:18 -07:00
Phil Sutter 247ace6115 tc: ematch: Ignore all-zero mask value when printing filters
The optional mask which may be added to int values is considered by the
kernel only if it is non-zero, therefore tc should only then also print
it.

Without this, not passing a mask value like so:

| # tc filter add dev d0 parent 8001: \
| 	basic match meta\(vlan eq 1\) \
| 	classid 8001:1

Would lead to tc printing an all-zero mask later:

| # tc filter show dev d0
| filter parent 8001: protocol all pref 49151 basic
| filter parent 8001: protocol all pref 49151 basic handle 0x1 flowid 8001:1
|   meta(vlan mask 0x00000000 eq 1)

This is obviously confusing as an all-zero mask strictly means to
eliminate all bits from the value, but the opposite is the case.
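
The printing rule can be sketched in C as below; print_meta_int() is a
hypothetical helper that writes into a buffer, not the actual em_meta.c
code. The mask is only emitted when it is non-zero, mirroring what the
kernel honors:

```c
#include <stdint.h>
#include <stdio.h>

/* Format a meta() int match; skip the mask when it is zero, since the
 * kernel ignores an all-zero mask. Returns snprintf()'s length. */
static int print_meta_int(char *buf, size_t len, const char *id,
			  uint32_t mask, uint32_t value)
{
	if (mask)
		return snprintf(buf, len, "meta(%s mask 0x%08x eq %u)",
				id, (unsigned int)mask, (unsigned int)value);
	return snprintf(buf, len, "meta(%s eq %u)", id, (unsigned int)value);
}
```

For the filter above this yields "meta(vlan eq 1)" instead of the
misleading "meta(vlan mask 0x00000000 eq 1)".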

Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-07-20 12:20:13 -07:00
Phil Sutter 7ffa2b557a Makefile: Allow to override CC
This makes it easier to build iproute2 with a custom compiler.

While at it, make HOSTCC default to the value of CC if not explicitly
set elsewhere.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
2016-07-20 12:05:24 -07:00
Phil Sutter 30a8842c49 No need to initialize rtattr fields before parsing
Since parse_rtattr_flags() calls memset already, there is no need for
callers to do so themselves.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
2016-07-20 12:05:24 -07:00
Phil Sutter f89bb0210f Replace malloc && memset by calloc
This only replaces occurrences where the newly allocated memory is
cleared completely afterwards; in other cases calloc() would be a
theoretical performance hit, although the code would be cleaner.
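
The transformation looks like this in a minimal sketch (struct entry is
hypothetical). Both functions return zeroed arrays; calloc() does it in
one call and, in common implementations, also guards the n * size
multiplication against overflow:

```c
#include <stdlib.h>
#include <string.h>

struct entry {
	int id;
	char name[16];
};

/* Before: allocate, then clear the whole object. */
static struct entry *alloc_old(size_t n)
{
	struct entry *e = malloc(n * sizeof(*e));

	if (e)
		memset(e, 0, n * sizeof(*e));
	return e;
}

/* After: calloc() returns zero-filled memory directly. */
static struct entry *alloc_new(size_t n)
{
	return calloc(n, sizeof(struct entry));
}
```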

Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
2016-07-20 12:05:24 -07:00
Phil Sutter d17b136f7d Use C99 style initializers everywhere
This big patch was compiled by vimgrepping for memset calls and changing
them to C99 initializers where applicable. One notable exception is the
initialization of union bpf_attr in tc/tc_bpf.c: changing it would break
older gcc versions (at least <=3.4.6).

Calls to memset for struct rtattr pointer fields for parse_rtattr*()
were just dropped since they are not needed.

The changes here allowed the compiler to discover some unused variables,
so get rid of them, too.
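
A minimal sketch of the before/after pattern, using a hypothetical
request struct with arbitrary field values. With a designated
initializer, every member not named is zero-initialized, so the memset()
becomes unnecessary. One caveat: unlike memset(), an initializer is not
guaranteed to zero padding bytes.

```c
#include <string.h>

struct req_sketch {
	unsigned short type;
	unsigned short flags;
	unsigned int seq;
	char buf[32];
};

/* Old style: zero with memset(), then assign fields. */
static struct req_sketch make_old(void)
{
	struct req_sketch r;

	memset(&r, 0, sizeof(r));
	r.type = 16;
	r.flags = 1;
	return r;
}

/* C99 style: unnamed members (seq, buf) are implicitly zeroed. */
static struct req_sketch make_new(void)
{
	struct req_sketch r = {
		.type = 16,
		.flags = 1,
	};
	return r;
}
```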

Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
2016-07-20 12:05:24 -07:00
Phil Sutter d892aaf740 tc: m_action: Improve conversion to C99 style initializers
This improves my initial change in the following points:

- Flatten embedded struct's initializers.
- No need to initialize variables to zero as the key feature of C99
  initializers is to do this implicitly.
- By relocating the declaration of struct rtattr *tail, it can be
  initialized at the same time.

Fixes: a0a73b298a ("tc: m_action: Use C99 style initializers for struct req")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
2016-07-20 12:05:24 -07:00
Phil Sutter 52a5986980 ip-link.8: Fix font choices
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-07-20 12:04:34 -07:00
Phil Sutter 3dd4b8936b ip-link.8: Add slave type option descriptions
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-07-20 12:04:34 -07:00
Phil Sutter f9e9f92881 ip-link.8: Place 'ip link set' warning more prominently
This moves the warning to the beginning of the section about 'ip link
set' so that it still stands out after more text is added to its end.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-07-20 12:04:34 -07:00
Phil Sutter 657426c506 ip-link.8: Extend type list in synopsis
'ip link set' supports passing a type to set type-specific parameters.
Add this missing piece of information to the synopsis section.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-07-20 12:04:34 -07:00
Phil Sutter 25c93faa58 iplink: bond_slave: Add missing help functions
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-07-20 12:04:34 -07:00
Phil Sutter 771a242890 iplink: List valid 'type' argument in ip link help text
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-07-20 12:04:34 -07:00
Stephen Hemminger 48703289da bridge: remove unused variable
Debris from JSON changes.
2016-07-20 12:03:33 -07:00
Roopa Prabhu db7263798a bridge: update man page
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2016-07-20 12:02:02 -07:00
Anuradha Karuppiah 15539fc6f9 bridge: add json schema for bridge fdb show
Storing the schema file for the JSON format will be useful for
documentation purposes, as optional parameters are typically omitted
from the JSON sample outputs.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2016-07-20 12:02:02 -07:00
Anuradha Karuppiah b239c56ebc bridge: add json support for bridge fdb show
Sample output:
$bridge -j fdb show
[{
        "mac": "44:38:39:00:69:88",
        "dev": "swp2s0",
        "vlan": 2,
        "master": "br0",
        "state": "permanent"
    },{
        "mac": "00:02:00:00:00:01",
        "dev": "swp2s0",
        "vlan": 2,
        "master": "br0"
    },{
        "mac": "00:02:00:00:00:02",
        "dev": "swp2s1",
        "vlan": 2,
        "master": "br0"
    },{
        "mac": "44:38:39:00:69:89",
        "dev": "swp2s1",
        "master": "br0",
        "state": "permanent"
    },{
        "mac": "44:38:39:00:69:89",
        "dev": "swp2s1",
        "vlan": 2,
        "master": "br0",
        "state": "permanent"
    },{
        "mac": "44:38:39:00:69:88",
        "dev": "br0",
        "master": "br0",
        "state": "permanent"
    }
]

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2016-07-20 12:02:02 -07:00
Roopa Prabhu d82a49ce85 bridge: add json support for bridge vlan show
$bridge -c vlan show
port	vlan ids
swp1	 1 PVID Egress Untagged
	 10-13

swp2	 1 PVID Egress Untagged
	 10-13

br0	 1 PVID Egress Untagged

$bridge  -json vlan show
{
    "swp1": [{
            "vlan": 1,
            "flags": ["PVID","Egress Untagged"
            ]
        },{
            "vlan": 10
        },{
            "vlan": 11
        },{
            "vlan": 12
        },{
            "vlan": 13
        }
    ],
    "swp2": [{
            "vlan": 1,
            "flags": ["PVID","Egress Untagged"
            ]
        },{
            "vlan": 10
        },{
            "vlan": 11
        },{
            "vlan": 12
        },{
            "vlan": 13
        }
    ],
    "br0": [{
            "vlan": 1,
            "flags": ["PVID","Egress Untagged"
            ]
        }
    ]
}

$bridge -c -json vlan show
{
    "swp1": [{
            "vlan": 1,
            "flags": ["PVID","Egress Untagged"
            ]
        },{
            "vlan": 10,
            "vlanEnd": 13
        }
    ],
    "swp2": [{
            "vlan": 1,
            "flags": ["PVID","Egress Untagged"
            ]
        },{
            "vlan": 10,
            "vlanEnd": 13
        }
    ],
    "br0": [{
            "vlan": 1,
            "flags": ["PVID","Egress Untagged"
            ]
        }
    ]
}

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2016-07-20 12:02:02 -07:00
Anuradha Karuppiah d721a14590 json_writer: Removed automatic json-object type from the constructor
Top level can be any json type and can be created using
jsonw_start_object/jsonw_end_object etc.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2016-07-20 12:02:02 -07:00
David Ahern 7a4559f67c ss: Add option to suppress header line
Add option to suppress header line. When used the following line
is not shown:
"State  Recv-Q Send-Q     Local Address:Port  Peer Address:Port"

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-07-20 11:55:43 -07:00
David Ahern 930d3f2819 ss: Fix support for device filter by index
Support was recently added for device filters. The intent was to allow
the device to be specified by name or index, and using the if%u format
(dev == if5) or the simpler and more intuitive index alone (dev == 5).
The latter case is broken since the index is not saved to the filter
after the strtoul conversion. Further, the tmp variable used for the
conversion shadows another variable used in the function. Fix both.

With this change all 3 variants work as expected:
$ ss -t 'dev == 62'
State   Recv-Q Send-Q         Local Address:Port    Peer Address:Port
ESTAB       0      224         10.0.1.3%mgmt:ssh   192.168.0.50:58442

$ ss -t 'dev == mgmt'
State   Recv-Q Send-Q         Local Address:Port    Peer Address:Port
ESTAB       0      224         10.0.1.3%mgmt:ssh   192.168.0.50:58442

$ ss -t 'dev == if62'
State   Recv-Q Send-Q         Local Address:Port    Peer Address:Port
ESTAB       0      36          10.0.1.3%mgmt:ssh   192.168.0.50:58442
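
The three accepted forms can be resolved as in this hedged C sketch.
lookup_ifindex() is a hypothetical stand-in for a real name lookup such
as if_nametoindex(), hard-wired to the "mgmt"/62 pair from the examples
above; the key point is that the parsed index is actually returned
rather than dropped:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical name lookup; 0 means "no such device". */
static unsigned int lookup_ifindex(const char *name)
{
	return strcmp(name, "mgmt") == 0 ? 62 : 0;
}

/* Resolve "mgmt", "if62" or "62" to an ifindex; 0 on failure. */
static unsigned int parse_dev(const char *s)
{
	const char *num = s;
	unsigned int idx;
	unsigned long n;
	char *end;

	idx = lookup_ifindex(s);
	if (idx)
		return idx;
	if (strncmp(s, "if", 2) == 0 && s[2] != '\0')
		num = s + 2;	/* "if%u" form */
	n = strtoul(num, &end, 10);
	if (end != num && *end == '\0')
		return (unsigned int)n;	/* save the converted index */
	return 0;
}
```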

Fixes: 2d29321256 ("ss: Add support to filter on device")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-07-20 11:55:43 -07:00
Daniel Borkmann e77fa41d4c bpf: also check elf for official e_machine value
Use the official BPF ELF e_machine value that was assigned recently [1]
and will be propagated to glibc, libelf et al. LLVM will switch to it
in the 3.9 release, therefore tc needs to check for EM_BPF as well;
older versions still emit EM_NONE.

  [1] 36b9c09330
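
The check amounts to accepting either machine value. A minimal sketch
using glibc's <elf.h>; bpf_elf_machine_ok() is hypothetical, and the
EM_BPF fallback define (value 247) covers older headers that lack it:

```c
#include <elf.h>

#ifndef EM_BPF
#define EM_BPF 247	/* official value; older <elf.h> may lack it */
#endif

/* Accept the official BPF machine type and the legacy EM_NONE that
 * older LLVM releases emit. */
static int bpf_elf_machine_ok(const Elf64_Ehdr *ehdr)
{
	return ehdr->e_machine == EM_BPF || ehdr->e_machine == EM_NONE;
}
```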

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2016-07-20 11:54:53 -07:00
Stephen Hemminger a951428058 update headers files to current net-next 2016-07-15 11:55:14 -07:00
Stephen Hemminger ba5783cbf3 Merge branch 'master' into net-next 2016-07-15 11:49:41 -07:00
Ido Schimmel 78c610e6ea man: Point to 'devlink-sb' from 'devlink' man page
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
2016-07-15 11:46:39 -07:00
Ido Schimmel e3da7a45ba man: Add devlink man pages to Makefile
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
2016-07-15 11:46:39 -07:00
Stephen Hemminger 79f4a39365 iproute: constify rtattr_cmp 2016-07-15 11:34:45 -07:00
Xin Long 74af8dd962 ip route: restore route entries in correct order
Sometimes we cannot restore route entries, because in the kernel
  [1] fib_check_nh()
  [2] fib_valid_prefsrc()
make some routes depend on the existence of others while adding.

For example, suppose we saved all the routes and flushed all tables:
  [a] default via 192.168.122.1 dev eth0
  [b] 192.168.122.0/24 dev eth0 src 192.168.122.21
  [c] broadcast 127.0.0.0 dev lo table local src 127.0.0.1
  [d] local 127.0.0.0/8 dev lo table local  src 127.0.0.1
  [e] local 127.0.0.1 dev lo table local src 127.0.0.1
  [f] broadcast 127.255.255.255 dev lo table local src 127.0.0.1
  [g] broadcast 192.168.122.0 dev eth0 table local src 192.168.122.21
  [h] local 192.168.122.21 dev eth0 table local src 192.168.122.21
  [i] broadcast 192.168.122.255 dev eth0 table local src 192.168.122.21

  Now start restoring them:
    To add [a], we have to add [b] first, because of [1] and
    'via 192.168.122.1' in [a].
    To add [b], we have to add [h] first, because of [2] and
    'src 192.168.122.21' in [b].

  So the correct restore order is:
    [e][h] -> [b][c][d][f][g][i] -> [a]

This patch fixes it by traversing the file 3 times, restoring only part
of the entries in each pass according to the following conditions, to
make sure every entry can be restored successfully:
  1. !gw && (!fib_prefsrc || fib_prefsrc == cfg->fc_dst)
  2. !gw && (fib_prefsrc != cfg->fc_dst)
  3. gw
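
The pass selection can be sketched in C as below. struct route_sketch
and restore_pass() are hypothetical simplifications (IPv4 only, host
byte order, prefsrc == 0 meaning "no preferred source"); a route that
satisfies condition 1 is taken in pass 1:

```c
#include <stdbool.h>
#include <stdint.h>

struct route_sketch {
	bool has_gw;
	uint32_t prefsrc;	/* 0 = no preferred source */
	uint32_t dst;
};

/* Return which restore pass (1, 2 or 3) a route belongs to. */
static int restore_pass(const struct route_sketch *r)
{
	if (r->has_gw)
		return 3;
	if (!r->prefsrc || r->prefsrc == r->dst)
		return 1;
	return 2;
}
```

Applied to the example, [e] and [h] (prefsrc equal to dst) land in
pass 1, [b] and the broadcast routes in pass 2, and [a] in pass 3.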

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Phil Sutter <phil@nwl.cc>
2016-07-15 11:34:10 -07:00
Stephen Hemminger ef0a738c8d ip: link style cleanup
Break long lines and other trivial changes.
2016-07-15 11:31:20 -07:00
Eli Cohen d91fb3f4c7 Add support for configuring Infiniband GUIDs
Add two NLAs that allow configuration of Infiniband node or port GUIDs
by referencing the IPoIB net device set over the physical function. The
format to be used is as follows:

ip link set dev ib0 vf 0 node_guid 00:02:c9:03:00:21:6e:70
ip link set dev ib0 vf 0 port_guid 00:02:c9:03:00:21:6e:78

Signed-off-by: Eli Cohen <eli@mellanox.com>
2016-07-15 11:25:36 -07:00
Stephen Hemminger d5b62e6439 Merge branch 'master' into net-next 2016-07-06 21:29:32 -07:00
David Ahern 0130f0120b ip route: Add support for vrf keyword
Add vrf keyword to 'ip route' commands. Allows:
1. Users can list routes by VRF name:
       $ ip route show vrf NAME

   VRF tables have all routes including local and broadcast routes.
   The VRF keyword filters LOCAL and BROADCAST routes; to see all
   routes the table option can be used. Or to see local routes only
   for a VRF:
       $ ip route show vrf NAME type local

2. Add or delete a route for a VRF:
       $ ip route {add|delete} vrf NAME <route spec>

3. Do a route lookup for a VRF:
       $ ip route get vrf NAME ADDRESS

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-07-06 21:28:31 -07:00
David Ahern 9b76577042 ip vrf: Add ipvrf_get_table
Add ipvrf_get_table to lookup table id for device name. Returns 0
on any error or if name is not a VRF device.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-07-06 21:28:31 -07:00
David Ahern d84b1878ea ip route: Change type mask to bitmask
Allow selecting multiple route types to show, or excluding specific
route types.
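
A hedged sketch of a one-bit-per-type mask. The T_* values are assumed
to mirror RTN_UNICAST, RTN_LOCAL and RTN_BROADCAST from
<linux/rtnetlink.h>; treating an empty mask as "no filter" is an
assumption of this sketch:

```c
#include <stdbool.h>

/* Assumed to mirror RTN_UNICAST/RTN_LOCAL/RTN_BROADCAST. */
enum { T_UNICAST = 1, T_LOCAL = 2, T_BROADCAST = 3 };

static unsigned int type_bit(int type)
{
	return 1u << type;
}

/* Each route type is one bit, so several can be selected at once;
 * an empty mask means "no type filter". */
static bool type_selected(unsigned int mask, int type)
{
	return mask == 0 || (mask & type_bit(type)) != 0;
}
```

Excluding types is then just the complement, e.g.
~(type_bit(T_LOCAL) | type_bit(T_BROADCAST)) hides local and broadcast
routes while keeping everything else.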

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-07-06 21:28:31 -07:00
David Ahern 5db1adae2a ip neigh: Add support for keyword
Add vrf keyword to 'ip neigh' commands. Allows listing neighbor
entries for all links associated with a given VRF.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-07-06 21:28:31 -07:00
David Ahern 104444c201 ip link/addr: Add support for vrf keyword
Add vrf keyword to 'ip link' and 'ip addr' commands (common list code).

Allows:
1. Adding a link to a VRF
       $ ip link set NAME vrf NAME

   Removing a link from a VRF still uses 'ip link set NAME nomaster'

2. Showing links associated with a VRF:
       $ ip link show vrf NAME

3. List addresses associated with links in a VRF
       $ ip -br addr show vrf red

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-07-06 21:28:31 -07:00
David Ahern 7dc0e974f1 ip vrf: Add name_is_vrf
Add name_is_vrf function to determine if given name corresponds to a
VRF device.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-07-06 21:28:31 -07:00
Amir Vadai cfcabf18d8 tc: flower: Add skip_{hw|sw} support
On devices that support TC flower offloads, these flags enable a filter to be
added only to HW or only to SW. skip_sw and skip_hw are mutually exclusive
flags. By default, without any flags, the filter is added to both HW and
SW, but no error checks are done in case of failure to add to HW.
With skip_sw, failure to add to HW is treated as an error.

Here is a sample script that adds 2 filters, one with skip_sw and the other
with skip_hw flag.

   # add ingress qdisc
   tc qdisc add dev enp0s9 ingress

   # enable hw tc offload.
   ethtool -K enp0s9 hw-tc-offload on

   # add a flower filter with skip_sw flag.
   tc filter add dev enp0s9 protocol ip parent ffff: flower \
	   ip_proto 1 indev enp0s9 skip_sw \
	   action drop

   # add a flower filter with skip_hw flag.
   tc filter add dev enp0s9 protocol ip parent ffff: flower \
	   ip_proto 3 indev enp0s9 skip_hw \
	   action drop
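
A minimal C sketch of folding the two keywords into a flags word and
rejecting the combination. parse_skip_flags() is hypothetical, and the
SKETCH_* values are assumed to mirror TCA_CLS_FLAGS_SKIP_HW and
TCA_CLS_FLAGS_SKIP_SW from <linux/pkt_cls.h>:

```c
#include <string.h>

/* Assumed to mirror TCA_CLS_FLAGS_SKIP_HW / _SKIP_SW. */
#define SKETCH_SKIP_HW (1u << 0)
#define SKETCH_SKIP_SW (1u << 1)

/* Collect skip_hw/skip_sw keywords; returns 0 on success, -1 if both
 * are given, since the flags are mutually exclusive. */
static int parse_skip_flags(int argc, const char **argv, unsigned int *flags)
{
	*flags = 0;
	for (int i = 0; i < argc; i++) {
		if (strcmp(argv[i], "skip_hw") == 0)
			*flags |= SKETCH_SKIP_HW;
		else if (strcmp(argv[i], "skip_sw") == 0)
			*flags |= SKETCH_SKIP_SW;
	}
	if ((*flags & SKETCH_SKIP_HW) && (*flags & SKETCH_SKIP_SW))
		return -1;
	return 0;
}
```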

Signed-off-by: Amir Vadai <amirva@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
2016-07-06 21:24:48 -07:00
Stephen Hemminger 2a5855706a Merge branch 'master' into net-next 2016-07-06 21:23:26 -07:00
Jamal Hadi Salim 1d1e0fd29b actions: skbedit add support for mod-ing skb pkt_type
I'll make a formal submission sans the header when the kernel patches
make it in. This version is for someone who wants to play around with
the net-next kernel patches I sent.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-07-06 21:15:44 -07:00
Stephen Hemminger 4824bb4151 update kernel header (4.7 net-next) 2016-07-06 21:14:57 -07:00
Phil Sutter 03ac85b708 ip-address: constify match_link_kind arg
Since the function won't ever change the data 'kind' is pointing at, it
can sanely be made const.

Fixes: e0513807f6 ("ip-address: Support filtering by slave type, too")
Suggested-by: Stephen Hemminger <shemming@brocade.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-07-06 21:08:54 -07:00
Michal Soltys 509dcd43c9 iproute2: unmangle netdev/my emails in man pages (hfsc, stab)
No other man pages do so; hiding netdev is kind of silly and I don't
mind having my own address normally visible.
2016-07-06 21:07:23 -07:00
Masatake YAMATO fab3e001fd man: rtacct: add missing TP marker
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
2016-07-06 21:06:33 -07:00
Stephen Hemminger f62f952fad Merge branch 'master' into net-next 2016-06-30 17:31:37 -07:00
Vivien Didelot 3aa8f8cb7a bridge: man: fix STP LISTENING description
Correct the unclear and poorly conjugated STP LISTENING documentation.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
2016-06-30 17:30:02 -07:00
Vivien Didelot 400b5404af bridge: man: fix BPUD typo
s/BPUD/BPDU/ in guard description.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
2016-06-30 17:30:02 -07:00
Andrew Vagin eecc006952 ip route: timeout for routes has to be set in seconds
Currently, the timeout is multiplied by HZ in user space and then
multiplied by HZ again in kernel space.

$ ./ip/ip r add 2002::0/64 dev veth1 expires 10
$ ./ip/ip -6 r
2002::/64 dev veth1  metric 1024 linkdown  expires 996sec pref medium
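
The arithmetic behind the 996sec above, as a hypothetical C model that
assumes a factor of 100 on both sides (consistent with roughly 1000
seconds, presumably minus a few elapsed seconds, being reported for
"expires 10"):

```c
/* Hypothetical model of the bug; HZ is assumed to be 100. */
#define SKETCH_HZ 100u

static unsigned int reported_expiry(unsigned int user_seconds,
				    int userspace_scales)
{
	/* The buggy userspace pre-multiplies by HZ before sending. */
	unsigned long sent = userspace_scales ?
		(unsigned long)user_seconds * SKETCH_HZ : user_seconds;
	/* The kernel converts the received value to ticks... */
	unsigned long ticks = sent * SKETCH_HZ;

	/* ...and reports the remaining time back in seconds. */
	return (unsigned int)(ticks / SKETCH_HZ);
}
```

Sending plain seconds, as this patch does, makes the reported value
match the requested one.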

Cc: Xin Long <lucien.xin@gmail.com>
Cc: Hangbin Liu <liuhangbin@gmail.com>
Cc: Stephen Hemminger <shemming@brocade.com>
Fixes: 68eede2505 ("route: allow routes to be configured with expire values")
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
2016-06-30 17:24:59 -07:00
Phil Sutter 577cfe0b67 ip-address: Align type list in help and man page
This adds missing entries on both sides until they are identical.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-29 09:20:02 -07:00
Phil Sutter e0513807f6 ip-address: Support filtering by slave type, too
This patch allows querying all interfaces enslaved to a bridge or bond
using the following syntax:

| ip addr show type bridge_slave

Filtering has to be done in userspace since the kernel does not support
filtering on IFLA_INFO_SLAVE_KIND.

Functionality introduced in this patch is not fully complete since it
does not allow to match on type and slave type at the same time, but it
doesn't prevent implementing a dedicated slave_type match, either.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-29 09:20:02 -07:00
Phil Sutter 62000e51e0 Use ARRAY_SIZE macro everywhere
This patch was generated by the following semantic patch (a trimmed down
version of what is shipped with Linux sources):

@@
type T;
T[] E;
@@
(
- (sizeof(E)/sizeof(*E))
+ ARRAY_SIZE(E)
|
- (sizeof(E)/sizeof(E[...]))
+ ARRAY_SIZE(E)
|
- (sizeof(E)/sizeof(T))
+ ARRAY_SIZE(E)
)

The only manual adjustment was to include utils.h in misc/nstat.c to make
the macro known there.
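
For reference, a common definition of the macro and a use of it; the
copy in utils.h may differ in detail, and the kinds[] table here is
hypothetical. It only works on real arrays: applied to a pointer it
silently yields the wrong value.

```c
#include <stddef.h>

/* Common definition; evaluated at compile time. */
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

static const char *const kinds[] = { "bridge", "bond", "vrf" };

static size_t n_kinds(void)
{
	return ARRAY_SIZE(kinds);
}
```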

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-29 09:18:18 -07:00
David Ahern 2d29321256 ss: Add support to filter on device
Add support for device names in the filter. Example:

    root@kenny:~# ss -t  'sport == :22 && dev == red'
    State      Recv-Q Send-Q     Local Address:Port      Peer Address:Port
    ESTAB      0      0          10.100.1.2%red:ssh      10.100.1.254:47814
    ESTAB      0      0           2100:1::2%red:ssh        2100:1::64:49406

Since the kernel does not support filtering on the interface, specifying
a device name means all filtering is done in userspace.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-06-28 12:09:01 -07:00
David Ahern 376fb86872 ss: Allow ssfilter_bytecompile to return 0
Allow ssfilter_bytecompile to return 0 for filter ops the kernel
does not support. If such an op is in the filter string then all
filtering is done in userspace.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-06-28 12:09:01 -07:00
David Ahern 82d73ea03a ss: Refactor inet_show_sock
Extract parsing of sockstat and filter from inet_show_sock.
While moving run_ssfilter into the callers of inet_show_sock, enable
userspace filtering before the kill.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-06-28 12:09:01 -07:00
Stephen Hemminger 131351086e Merge branch 'master' into net-next 2016-06-27 11:30:06 -07:00
Jakub Sitnicki 08401220a9 ip/tcp_metrics: Simplify process_msg a bit
On Tue, Jun 21, 2016 at 06:18 PM CEST, Phil Sutter <phil@nwl.cc> wrote:
> By combining the attribute extraction and check for existence, the
> additional indentation level in the 'else' clause can be avoided.
>
> In addition to that, common actions for 'daddr' are combined since the
> function returns if neither of the branches are taken.
>
> Signed-off-by: Phil Sutter <phil@nwl.cc>
> ---
>  ip/tcp_metrics.c | 45 ++++++++++++++++++---------------------------
>  1 file changed, 18 insertions(+), 27 deletions(-)
>
> diff --git a/ip/tcp_metrics.c b/ip/tcp_metrics.c
> index f82604f458ada..899830c127bcb 100644
> --- a/ip/tcp_metrics.c
> +++ b/ip/tcp_metrics.c
> @@ -112,47 +112,38 @@ static int process_msg(const struct sockaddr_nl *who, struct nlmsghdr *n,
>  	parse_rtattr(attrs, TCP_METRICS_ATTR_MAX, (void *) ghdr + GENL_HDRLEN,
>  		     len);
>
> -	a = attrs[TCP_METRICS_ATTR_ADDR_IPV4];
> -	if (a) {
> +	if ((a = attrs[TCP_METRICS_ATTR_ADDR_IPV4])) {

Copy the pointer inside the branch?

Same gain on indentation while keeping checkpatch happy.

I only compile-tested the patch below.

Thanks,
Jakub
2016-06-27 11:00:54 -07:00
David Ahern 2a6f9cfa8b man: ip-link: Add vrf type
Add description for vrf type to ip-link man page.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-06-27 10:53:28 -07:00
Vivien Didelot 7fab22abd1 bridge: man: fix "brige" typo
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
2016-06-27 10:50:37 -07:00
Vivien Didelot 296cee6fdf bridge: vlan: fix a few "fdb" typos in vlan doc
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
2016-06-27 10:49:50 -07:00
Phil Sutter 0aae23468a Fix MAC address length check
I forgot to change the variable in the conditional, too.

Fixes: 8fe58d5894 ("iplink: Check address length via netlink")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-27 10:48:35 -07:00
Phil Sutter 3462c116f8 man: ip-address, ip-link: Document 'type' quirk
This covers the fact that calling 'ip {link|addr} show type foobar' does
not return an error.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-27 10:48:35 -07:00
Stephen Hemminger 3b6d9ab2e2 update kernel headers (net-next) 2016-06-21 11:29:20 -07:00
Stephen Hemminger 7d057fc292 Merge branch 'master' into net-next 2016-06-21 11:28:32 -07:00
Stephen Hemminger 5b26063c25 if: add missing kernel headers
Add kernel headers for all headers that are included by the current source.
2016-06-21 11:24:52 -07:00
Phil Sutter 8fe58d5894 iplink: Check address length via netlink
This is a feature which was lost during the conversion to the netlink
interface: if the device exists and a user tries to change the link
layer address, query the kernel for the old address first and reject the
new one if sizes differ.

This patch adds the same check when setting VF address by assuming same
length as PF device.

Note that at least for VFs the check can't be done in kernel space since
struct ifla_vf_mac lacks a length field and due to netlink padding the
exact size can't be communicated to the kernel.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-21 08:50:14 -07:00
Phil Sutter a89193a7d6 iplink: Add missing variable initialization
Without this, we might feed garbage to the kernel when the address is
shorter than expected.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-21 08:50:14 -07:00
Martin KaFai Lau 414aeec90f ss: Add tcp_info fields data_segs_in/out
The tcp_info fields data_segs_in and data_segs_out were added to the
kernel in commit a44d6eacdaf5 ("tcp: Add RFC4898 tcpEStatsPerfDataSegsOut/In")
in kernel 4.6.

This patch adds support for those fields in ss:

ESTAB      801736 360                            face:face:face:face::1:22                                      face:face:face:face::face:46779
         cubic wscale:9,7 rto:223 rtt:22.195/8.202 ato:40 mss:1428 cwnd:11 ssthresh:7 bytes_acked:203649 bytes_received:334034603 segs_out:18513 segs_in:241825 data_segs_out:4192 data_segs_in:241672 send 5.7Mbps lastsnd:2 lastack:3 pacing_rate 6.8Mbps unacked:10 retrans:0/1 rcv_rtt:29.375 rcv_space:1241704 minrtt:0.013

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
2016-06-21 08:48:50 -07:00
Stephen Hemminger 76d0aa1f3b Merge branch 'master' into net-next 2016-06-16 09:43:07 -07:00
Phil Sutter 5f6a467f59 tc: m_action: Drop unused variable nladdr in tc_action_gd()
This has been there since the introduction of tc/m_action.c back in 2004
and was apparently never in use.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-16 09:41:55 -07:00
Phil Sutter a0a73b298a tc: m_action: Use C99 style initializers for struct req
Instead of initializing fields after (or sometimes even before) zeroing
the whole struct via memset(), initialize the whole thing at declaration
time.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-16 09:41:55 -07:00
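To illustrate the change above, a minimal sketch contrasting the two initialization styles. struct demo_req and both helpers are hypothetical stand-ins, not the real request struct in tc_action_gd(); the point is that a C99 designated initializer zeroes every member it does not name, so the separate memset() becomes unnecessary.

```c
#include <string.h>

/* Hypothetical request layout (header fields plus payload buffer),
 * loosely modelled on a netlink request; not the real tc struct. */
struct demo_req {
	unsigned int len;
	unsigned short type;
	unsigned short flags;
	char buf[64];
};

/* Old style: zero the whole struct, then assign fields one by one. */
struct demo_req make_req_memset(void)
{
	struct demo_req req;

	memset(&req, 0, sizeof(req));
	req.len = sizeof(struct demo_req);
	req.type = 1;
	req.flags = 0x5;
	return req;
}

/* C99 style: designated initializers set the named fields and
 * implicitly zero-initialize every member that is not mentioned. */
struct demo_req make_req_c99(void)
{
	struct demo_req req = {
		.len = sizeof(struct demo_req),
		.type = 1,
		.flags = 0x5,
	};
	return req;
}
```

Both helpers yield the same field values and an all-zero buf; the second form also keeps the initialization in one place at declaration time.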
Stephen Hemminger 74951b2d07 Merge branch 'master' into net-next 2016-06-14 18:05:49 -07:00
Alexander Aring 9b32f89693 tc: let m_ipt work with new iptables API headers
Since commit 5cd1adb ("Update to current iptables headers") the build
with m_ipt.o and the following config will fail:

TC_CONFIG_XT:=n
TC_CONFIG_XT_OLD:=n
TC_CONFIG_XT_OLD_H:=n

This patch renames "iptables_target" to "xtables_target", along with a few
other identifiers that I noticed had been renamed while reading the iptables
git log. Functions that are not exported by the header are removed if they
are unused in m_ipt.c, or made static if m_ipt.c still uses them.

Reported-by: Clemens Gruber <clemens.gruber@pqgruber.com>
Signed-off-by: Alexander Aring <aar@pengutronix.de>
2016-06-14 18:03:30 -07:00
Stephen Hemminger d831cc7c00 iprule: whitespace cleanup
Cleanup long lines, and indentation issues.
Use rta_getattru32 rather than cast to unsigned.
2016-06-14 17:20:02 -07:00
David Ahern 8c92e12277 ip rule: Add support for l3mdev rules
Kernel commit 96c63fa7393d ("net: Add l3mdev rule") added support for
the FRA_L3MDEV attribute. The attribute enables use of l3mdev rules
which means 'get table id from l3 master device'. This patch adds
support to iproute2 to show, add and delete rules with this attribute.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-06-14 16:53:20 -07:00
Stephen Hemminger ec09118749 fib_rules.h update header file
Add new L3MDEV (clone from net-next)
2016-06-14 16:33:48 -07:00
Stephen Hemminger 5c05c1d4a2 Merge branch 'master' into net-next 2016-06-14 16:33:24 -07:00
Stephen Hemminger 4b83a08c28 m_xt: whitespace cleanup
Make it 99% checkpatch clean.
2016-06-14 14:40:53 -07:00
Phil Sutter 2ef4008585 tc: m_xt: Introduce get_xtables_target_opts()
This pulls common code from parse_ipt() and print_ipt() functions
together.

While here, also fix an incorrect use of the global 'optarg' variable
in print_ipt().

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-14 14:35:56 -07:00
Phil Sutter f6ddd9c5da tc: m_xt: Simplify argc adjusting in parse_ipt()
And while at it, also improve the error message in case too few
parameters have been given.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-14 14:35:56 -07:00
Phil Sutter 28432f370e tc: m_xt: Get rid of iargc variable in parse_ipt()
After dropping the unused decrement of argc in the function's tail, it
can fully take over what iargc has been used for.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-14 14:35:56 -07:00
Phil Sutter ab8f52fc4a tc: m_xt: Get rid of rargc in parse_ipt()
No need to copy the passed parameter; it's changed only once, right
before the function returns.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-14 14:35:56 -07:00
Phil Sutter b0ba018576 tc: m_xt: Drop unused variable fw in parse_ipt()
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-14 14:35:56 -07:00
Phil Sutter b45f9141c2 tc: m_xt: Get rid of one indentation level in parse_ipt()
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-14 14:35:56 -07:00
Phil Sutter f1a7c7d830 tc: m_xt: Fix indenting
By exiting early if xtables_find_target() fails, one indenting level can
be dropped. Some of the wrongly indented code then happens to sit at the
right spot by accident which is why this patch is smaller than expected.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-14 14:35:56 -07:00
Phil Sutter 8eee75a835 tc: m_xt: Fix segfault when adding multiple actions at once
Without this, the following call to tc would segfault:

| tc filter add dev d0 parent ffff: u32 match u32 0 0 \
| 	action xt -j MARK --set-mark 0x1 \
| 	action xt -j MARK --set-mark 0x1

The reason is basically the same as for 6e2e5ec28b ("fix print_ipt:
segfault if more then one filter with action -j MARK.") but in
parse_ipt() instead of print_ipt().

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-14 14:35:56 -07:00
Phil Sutter 445745221a tc: m_xt: Prevent segfault with standard targets
Iptables standard targets like DROP or REJECT don't implement the print
callback in libxtables. Hence the following command would segfault:

| tc filter add dev d0 parent ffff: u32 match u32 0 0 action xt -j DROP

With this patch standard targets still can't be used (and are not really
useful anyway), but at least it doesn't crash anymore.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-14 14:35:56 -07:00
Stephen Hemminger 8b625177ba pedit: fix whitespace etc
Minor changes from checkpatch
2016-06-14 14:32:27 -07:00
Jamal Hadi Salim d8694a30a4 action pedit: stylistic changes
More modern layout.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-06-14 14:29:20 -07:00
Phil Sutter 8e45e44b79 man: ip-link: Document query_rss option
Doc text shamelessly stolen from the introducing commit's message
(6c55c8c461 ['ip link set vf: Added "query_rss" command']).

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-14 14:27:03 -07:00
Beniamino Galvani 9ba4126dc4 utils: fix hex digits parsing in hexstring_a2n()
strtoul() only modifies errno on overflow, so if errno is already non-zero
before the call, the stale value is preserved and makes the function fail
for valid inputs. Initialize errno to zero before calling strtoul().

Signed-off-by: Beniamino Galvani <bgalvani@redhat.com>
2016-06-14 14:25:05 -07:00
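The errno pitfall described in the commit above can be shown with a minimal sketch. parse_hex_byte() is a hypothetical helper, not iproute2's hexstring_a2n(); it only demonstrates why errno must be cleared before strtoul() when errno is used to detect failure.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse one hex byte such as "1f" into *out.
 * Returns 0 on success, -1 on error. */
int parse_hex_byte(const char *s, unsigned char *out)
{
	char *end;
	unsigned long val;

	/* strtoul() leaves errno untouched on success, so a stale
	 * non-zero value from an earlier call would otherwise be
	 * mistaken for a parse error below. */
	errno = 0;
	val = strtoul(s, &end, 16);
	if (errno != 0 || end == s || *end != '\0' || val > 0xff)
		return -1;

	*out = (unsigned char)val;
	return 0;
}
```

With errno reset first, "1f" parses to 0x1f even if errno still holds a leftover ERANGE from some unrelated earlier call.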
Phil Sutter 24604eb287 ipaddress: Allow listing addresses by type
Not sure why this was limited to ip-link before. It is semantically
equal to the 'master' keyword, which is not restricted at all.

The man page and help text adjustments include the 'master' keyword as
well since that is also supported but wasn't documented before.

Cc: Vadim Kochan <vadim4j@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-14 14:20:39 -07:00
Stephen Hemminger 622812052a tc: f_u32 cleanup indentation and long lines
Several long lines and overly long messages here.
2016-06-08 16:45:26 -07:00
Samudrala, Sridhar 5e5b3008d1 tc: f_u32: Add support for skip_hw and skip_sw flags
On devices that support TC U32 offloads, these flags enable a filter to be
added only to HW or only to SW. skip_sw and skip_hw are mutually exclusive
flags. By default, without any flags, the filter is added to both HW and SW,
but no error checks are done in case of failure to add to HW.
With skip_sw, failure to add to HW is treated as an error.

Here is a sample script that adds 2 filters, one with the skip_sw flag and
the other with the skip_hw flag.

   # add ingress qdisc
   tc qdisc add dev p4p1 ingress

   # enable hw tc offload.
   ethtool -K p4p1 hw-tc-offload on

   # add u32 filter with skip_sw flag.
   tc filter add dev p4p1 parent ffff: protocol ip prio 99 \
      handle 800:0:1 u32 ht 800: flowid 800:1 \
      skip_sw \
      match ip src 192.168.1.0/24 \
      action drop

   # add u32 filter with skip_hw flag.
   tc filter add dev p4p1 parent ffff: protocol ip prio 99 \
      handle 800:0:2 u32 ht 800: flowid 800:2 \
      skip_hw \
      match ip src 192.168.2.0/24 \
      action drop

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
2016-06-08 16:39:30 -07:00
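A minimal sketch of how the two mutually exclusive keywords from the commit above could be folded into a flag word. parse_skip_flags() and the CLS_FLAGS_* macros are hypothetical; the macro values mirror TCA_CLS_FLAGS_SKIP_HW and TCA_CLS_FLAGS_SKIP_SW from linux/pkt_cls.h but are defined locally so the sketch builds on its own. This is not the actual f_u32.c parser.

```c
#include <string.h>

/* Local copies of the linux/pkt_cls.h flag values (assumption:
 * SKIP_HW is bit 0, SKIP_SW is bit 1). */
#define CLS_FLAGS_SKIP_HW (1U << 0)
#define CLS_FLAGS_SKIP_SW (1U << 1)

/* Hypothetical parser: returns 0 and fills *flags on success, or -1
 * on an unknown keyword or when both skip flags are given. */
int parse_skip_flags(int argc, const char *argv[], unsigned int *flags)
{
	*flags = 0;
	for (int i = 0; i < argc; i++) {
		if (strcmp(argv[i], "skip_hw") == 0)
			*flags |= CLS_FLAGS_SKIP_HW;
		else if (strcmp(argv[i], "skip_sw") == 0)
			*flags |= CLS_FLAGS_SKIP_SW;
		else
			return -1;
	}
	/* skip_hw and skip_sw are mutually exclusive. */
	if ((*flags & CLS_FLAGS_SKIP_HW) && (*flags & CLS_FLAGS_SKIP_SW))
		return -1;
	return 0;
}
```

No flags at all leaves *flags as 0, which corresponds to the default "add to both HW and SW" behaviour described above.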
Simon Horman 6f1aded9d0 iproute2: correct port in FOU/GRE example
This resolves what appears to be a typo.

Cc: Tom Herbert <tom@herbertland.com>
Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2016-06-08 09:40:07 -07:00
Stephen Hemminger c68780826d minor header update from net-next 2016-06-08 09:39:03 -07:00
Sabrina Dubroca b26fc590ce ip: add MACsec support
Extend ip-link to create MACsec devices

  ip link add link <master> <macsec> type macsec [options]

Add `ip macsec` command to configure receive-side secure channels and
secure associations within a macsec netdevice.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Phil Sutter <phil@nwl.cc>
2016-06-08 09:35:29 -07:00
Sabrina Dubroca 609640f5f0 utils: provide get_hex to read a hex digit from a char
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Phil Sutter <phil@nwl.cc>
2016-06-08 09:30:41 -07:00
Sabrina Dubroca 9f7401fa49 utils: add get_be{16, 32, 64}, use them where possible
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Phil Sutter <phil@nwl.cc>
2016-06-08 09:30:37 -07:00
Sabrina Dubroca 89ae502056 utils: make hexstring_a2n provide the number of hex digits parsed
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Phil Sutter <phil@nwl.cc>
2016-06-08 09:30:31 -07:00
Stephen Hemminger 90353c3341 ip: minor checkpatch cleanup 2016-06-08 09:15:52 -07:00
Eric Dumazet 4de4b5ca14 fq_codel: add per queue memory limit
This patch adds support for TCA_FQ_CODEL_MEMORY_LIMIT attribute.

..
qdisc fq_codel 8008: root refcnt 257 limit 10240p flows 1024
 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn
 Sent 2083566791363 bytes 1376214889 pkt (dropped 4994406, overlimits 0 requeues 21705223)
 rate 9841Mbit 812549pps backlog 3906120b 376p requeues 21705223
  maxpacket 68130 drop_overlimit 4994406 new_flow_count 28855414
  ecn_mark 0 memory_used 4190048 drop_overmemory 4994406
  new_flows_len 1 old_flows_len 177

Signed-off-by: Eric Dumazet <edumazet@google.com>
2016-06-08 08:42:00 -07:00
Lucas Bates 06f9a59170 man: tc-ife.8: man page for ife action
Signed-off-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Phil Sutter <phil@nwl.cc>
2016-06-08 08:38:27 -07:00
Phil Sutter 3088787c4b man: rtpr: Fix minor typo
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-08 08:37:30 -07:00
Fabien Siron ebef3174b6 misc/ss: Add family list to -f option in _usage()
Signed-off-by: Fabien Siron <fabien.siron@epita.fr>
2016-06-08 08:36:41 -07:00
Peter Heise 9b3c971a49 man: ip-link: Added HSR part
Add an HSR section to the man page as a follow-up to feedback
on the previous commit.

Signed-off-by: Peter Heise <peter.heise@airbus.com>
2016-06-08 08:28:53 -07:00
Jamal Hadi Salim ead954cbd4 tc action policer: enable timestamp display
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-05-31 13:03:13 -07:00
Stephen Hemminger de70bd2f6b tc: update headers for TCA_POLICE
These are from linux-net and will be in the next rc.
2016-05-31 13:02:28 -07:00
Phil Sutter 134080cff3 man: ip, ip-link: Fix ip option location
This patch drops the redundant description of some of ip's options in
ip-link.8's description of the 'show' subcommand, preserving the
description of -iec (but appending it to the list in ip.8 with minor
fixes).

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-05-31 12:33:48 -07:00
Jamal Hadi Salim 82e6efe2e3 tc filter u32: Coding style fixes
"handle" was being used several times for different things.
Fix the 80 character limit abuse and other little issues while at it.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-05-31 12:33:48 -07:00
Stephen Hemminger e6263c8583 tc: action result is u32
In the kernel, the action result in netlink messages is a u32, not an int.
2016-05-31 12:22:45 -07:00
Jamal Hadi Salim 45c6837911 tc action policer: Avoid nonsensical input
The user must specify at least one of token bucket policing, EWMA
policing or a late-binding index. Token bucket policing requires at
minimum a rate and a burst.

In addition fix formatting issues (80 chars etc).

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-05-31 12:16:45 -07:00
David Ahern 57bdf8b764 Make builds default to quiet mode
Similar to the Linux kernel and perf, add infrastructure to reduce the
amount of output shown to the user during a build. Full build output
can be obtained with 'make V=1'.

Builds go from:

make[1]: Leaving directory `/home/dsa/iproute2.git/lib'
make[1]: Entering directory `/home/dsa/iproute2.git/ip'
gcc -Wall -Wstrict-prototypes  -Wmissing-prototypes -Wmissing-declarations -Wold-style-definition -Wformat=2 -O2 -I../include -DRESOLVE_HOSTNAMES -DLIBDIR=\"/usr/lib\" -DCONFDIR=\"/etc/iproute2\" -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE    -c -o ip.o ip.c
gcc -Wall -Wstrict-prototypes  -Wmissing-prototypes -Wmissing-declarations -Wold-style-definition -Wformat=2 -O2 -I../include -DRESOLVE_HOSTNAMES -DLIBDIR=\"/usr/lib\" -DCONFDIR=\"/etc/iproute2\" -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE    -c -o ipaddress.o ipaddress.c

to:

...
    AR       libutil.a

ip
    CC       ip.o
    CC       ipaddress.o
...

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-05-31 12:13:07 -07:00
Jamal Hadi Salim e70b9f16ea tc simple action: bug fix
The build failed with:
m_simple.c: In function ‘parse_simple’:
m_simple.c:154:6: warning: too many arguments for format [-Wformat-extra-args]
      *argv);
      ^
m_simple.c:103:14: warning: unused variable ‘maybe_bind’ [-Wunused-variable]

Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-05-31 12:11:52 -07:00
Daniel Borkmann f8daee42a5 ip, token: add del command
For convenience also add a del command for deleting a token and
update the man page accordingly.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-05-31 12:10:29 -07:00
Peter Heise 4273f100ed Added support for selection of new HSR version
A new HSR version was added in 4.7 that can be enabled
via iproute2. By default the old version is selected;
however, with "ip link add [..] type hsr [..] version 1"
the newer version can be enabled.

Signed-off-by: Peter Heise <peter.heise@airbus.com>
2016-05-31 12:09:29 -07:00
Jamal Hadi Salim a78a2dba27 tc fix ife late binding
The following late binding didn't work:

sudo tc actions add action ife encode \
type 0xDEAD allow mark dst 02:15:15:15:15:15 index 1

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-05-23 16:15:31 -07:00
Daniel Borkmann 1a0320727c f_bpf: fix filling of handle when no further arg is provided
We need to fill in the handle when the user provides one, even if no
further argument follows. Move the test for arg to the correct location
so that this works:

  # tc filter show dev foo egress
  filter protocol all pref 1 bpf
  filter protocol all pref 1 bpf handle 0x1 bpf.o:[classifier] direct-action
  filter protocol all pref 1 bpf handle 0x2 bpf.o:[classifier] direct-action
  # tc filter del dev foo egress prio 1 handle 2 bpf
  # tc filter show dev foo egress
  filter protocol all pref 1 bpf
  filter protocol all pref 1 bpf handle 0x1 bpf.o:[classifier] direct-action

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-05-23 16:14:18 -07:00
Kylie McClain 110e84a058 ipaddress: fix build with musl libc
MIN() is defined within sys/param.h.

Signed-off-by: Kylie McClain <somasis@exherbo.org>
2016-05-23 16:11:29 -07:00
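A minimal sketch of the portability point in the commit above: both glibc and musl define MIN() in <sys/param.h>, so code that uses it should include that header explicitly rather than relying on another header to pull it in. clamp_len() is a hypothetical helper, and the #ifndef fallback is an illustrative safety net, not part of the actual fix.

```c
#include <sys/param.h>	/* MIN() is defined here on glibc and musl */

/* Illustrative fallback for platforms whose <sys/param.h> lacks MIN(). */
#ifndef MIN
#define MIN(a, b) (((a) < (b)) ? (a) : (b))
#endif

/* Hypothetical helper: clamp a requested copy length to a buffer size. */
unsigned int clamp_len(unsigned int want, unsigned int bufsz)
{
	return MIN(want, bufsz);
}
```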
Stephen Hemminger 5c33c95924 add if_macsec header
Current version from 4.7-pre-rc1
2016-05-23 16:10:43 -07:00
Stephen Hemminger 0a99e7badf update kernel headers (from 4.7-rc1) 2016-05-23 09:06:11 -07:00
Stephen Hemminger 1d63d8c606 Merge branch 'master' into net-next 2016-05-18 11:57:28 -07:00
Stephen Hemminger bbe2abdf3d vv4.6.0 2016-05-18 11:56:02 -07:00
David Ahern b0a4ce620e ip link: Add support for kernel side filtering
Kernel gained support for filtering link dumps with commit dc599f76c22b
("net: Add support for filtering link dump by master device and kind").
Add support to ip link command. If a user passes master device or
kind to ip link command they are added to the link dump request message.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-05-18 11:52:14 -07:00
Jiri Pirko 25ec49be2c devlink: implement shared buffer occupancy control
Use kernel shared buffer occupancy control commands to take a snapshot and
clear occupancy watermarks. Also, show occupancy values in a readable
format.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-05-16 11:22:04 -07:00
Jiri Pirko e6d7367d79 devlink: implement shared buffer support
Implement the kernel devlink shared buffer interface. Introduce a new object
"sb" and allow browsing the shared buffer parameters as well as changing the
configuration.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-05-16 11:22:04 -07:00
Daniel Borkmann a2de651e64 ingress, clsact: don't add TCA_OPTIONS to nl msg
In the ingress and clsact qdiscs, TCA_OPTIONS are ignored since both
are parameterless. In tc, we nevertheless add an empty addattr_l(...
TCA_OPTIONS, NULL, 0) to the netlink message. This has the side
effect that when someone runs 'tc qdisc replace' while such a qdisc
is already present, tc fails with EINVAL.

The reason is that in the kernel, this invokes qdisc_change() when
the requested qdisc is already present. When TCA_OPTIONS are passed
to modify parameters, it checks whether the qdisc implements the
.change() callback, and if not (as in both cases here) it returns
an error. Rather than adding an empty stub to the kernel that
ignores TCA_OPTIONS again, just don't add TCA_OPTIONS to the
netlink message in the first place.

Before:

  # tc qdisc replace dev foo clsact    # first try
  # tc qdisc replace dev foo clsact    # second one
  RTNETLINK answers: Invalid argument

After:

  # tc qdisc replace dev foo clsact
  # tc qdisc replace dev foo clsact
  # tc qdisc replace dev foo clsact

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-05-16 11:20:50 -07:00
Stephen Hemminger 866f6d77bf Merge branch 'master' into net-next 2016-05-16 11:20:40 -07:00
Jamal Hadi Salim fdf1bdd0f1 tc simple action update and breakage
Brings it closer to more serious actions (adding branching
and allowing for late binding).

Unfortunately this breaks the old syntax of the simple action.
But because simple is a pedagogical example unlikely to be used
in production environments (i.e. its role is to serve as an example
of how to write actions), this is acceptable.

The new syntax for simple adds a new keyword, "sdata". Example usage is:

sudo tc actions add action simple sdata "foobar" index 1
or
tc filter add dev $DEV parent ffff: protocol ip prio 1 u32\
match ip dst 17.0.0.1/32 flowid 1:10 action simple sdata "foobar"

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-05-16 11:15:12 -07:00
Jamal Hadi Salim 43726b750a tc: don't ignore ok as an action branch
This is what used to happen before:

tc filter add dev tap1 parent ffff: protocol 0xfefe prio 10 \
     u32 match u32 0 0 flowid 1:16 \
     action ife decode allow mark ok

tc -s filter ls dev tap1 parent ffff:
filter protocol [65278] pref 10 u32
filter protocol [65278] pref 10 u32 fh 800: ht divisor 1
filter protocol [65278] pref 10 u32 fh 800::800 order 2048 key ht 800
bkt 0 flowid 1:16
  match 00000000/00000000 at 0
        action order 1: ife decode action pipe
         index 2 ref 1 bind 1 installed 4 sec used 4 sec
         type: 0x0
         Metadata: allow mark
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

        action order 2: gact action pass
         random type none pass val 0
         index 1 ref 1 bind 1 installed 4 sec used 4 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

Note the extra action added at the end.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-05-16 11:13:58 -07:00
Jamal Hadi Salim d3e511223f tc: introduce IFE action
This action allows for a sending side to encapsulate arbitrary metadata
which is decapsulated by the receiving end.
The sender runs in encoding mode and the receiver in decode mode.
Both sender and receiver must specify the same ethertype.
At some point we hope to have a registered ethertype and we'll
then provide a default so the user doesn't have to specify it.
For now, the user must specify it.

Described in netdev01 paper:
   "Distributing Linux Traffic Control Classifier-Action Subsystem"
    Authors: Jamal Hadi Salim and Damascene M. Joachimpillai

Also refer to IETF draft-ietf-forces-interfelfb-04.txt

Let's show an example where we encode ICMP from a sender towards
a receiver with an skb mark of 17; both sender and receiver use
an ethertype of 0xdead to interoperate.

YYYY: Let's start with the receiver-side policy config:
xxx: add an ingress qdisc
sudo tc qdisc add dev $ETH ingress

xxx: any packets with ethertype 0xdead will be subjected to ife decoding
xxx: we then restart the classification so we can match on icmp at prio 3
sudo $TC filter add dev $ETH parent ffff: prio 2 protocol 0xdead \
u32 match u32 0 0 flowid 1:1 \
action ife decode reclassify

xxx: on restarting the classification from above if it was an icmp
xxx: packet, then match it here and continue to the next rule at prio 4
xxx: which will match based on skb mark of 17
sudo tc filter add dev $ETH parent ffff: prio 3 protocol ip \
u32 match ip protocol 1 0xff flowid 1:1 \
action continue

xxx: match on skbmark of 0x11 (decimal 17) and accept
sudo tc filter add dev $ETH parent ffff: prio 4 protocol ip \
handle 0x11 fw flowid 1:1 \
action ok

xxx: Let's show the decoding policy
sudo tc -s filter ls dev $ETH parent ffff: protocol 0xdead
xxx:
filter pref 2 u32
filter pref 2 u32 fh 800: ht divisor 1
filter pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1  (rule hit 0 success 0)
  match 00000000/00000000 at 0 (success 0 )
	action order 1: ife decode action reclassify type 0x0
	 allow mark allow prio
	 index 11 ref 1 bind 1 installed 45 sec used 45 sec
	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

xxx:
Observe that the above lists all the metadata it can decode. Typically
these submodules will already be compiled into a monolithic kernel or
loaded as modules.

YYYY: Let's show the sender side now.
xxx: Add an egress qdisc on the sender netdev
sudo tc qdisc add dev $ETH root handle 1: prio
xxx:
xxx: Match all icmp packets to 192.168.122.237/24, then
xxx: tag the packet with skb mark of decimal 17, then
xxx: Encode it with:
xxx:    ethertype 0xdead
xxx:    add skb->mark to whitelist of metadatum to send
xxx:    rewrite target dst MAC address to 02:15:15:15:15:15
xxx:
sudo $TC filter add dev $ETH parent 1: protocol ip prio 10  u32 \
match ip dst 192.168.122.237/24 \
match ip protocol 1 0xff \
flowid 1:2 \
action skbedit mark 17 \
action ife encode \
type 0xDEAD \
allow mark \
dst 02:15:15:15:15:15

xxx: Let's show the encoding policy
filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:2  (rule hit 118 success 0)
  match c0a87a00/ffffff00 at 16 (success 0 )
  match 00010000/00ff0000 at 8 (success 0 )
	action order 1:  skbedit mark 17
	 index 11 ref 1 bind 1 installed 3 sec used 3 sec
	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

	action order 2: ife encode action pipe type 0xDEAD
	 allow mark dst 02:15:15:15:15:15
	 index 12 ref 1 bind 1 installed 3 sec used 3 sec
	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0
xxx:

Now test by sending a ping from the sender to the destination.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-05-16 11:13:26 -07:00
Stephen Hemminger 29b7968926 add tc_ife.h 2016-05-16 11:13:05 -07:00
Stephen Hemminger 31ce6e0101 update kernel headers from net-next
Take sanitized headers for davem net-next
2016-05-13 14:56:31 -07:00
Stephen Hemminger 0c9ffc0b0a devlink: update uapi header
Get sanitized version from net-next
2016-05-13 14:49:40 -07:00
Stephen Hemminger 18820bacdc Merge branch 'master' into net-next 2016-05-13 14:48:53 -07:00
Stephen Hemminger c13363e4cd devlink: remove more unused code 2016-05-13 14:48:32 -07:00
subashab@codeaurora.org b38e740903 ss: Remove unused argument from kill_inet_sock
addr is not used here.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
2016-05-13 14:47:28 -07:00
Stephen Hemminger 7fd86a9676 Merge branch 'master' into net-next 2016-05-13 14:44:48 -07:00
Stephen Hemminger 719c443bb8 devlink: remove unused code
Unused code causes warnings; remove it.
2016-05-13 14:42:06 -07:00
Stephen Hemminger 8a781d7e25 update kernel headers to 4.6-rc6
Close to final upstream headers
2016-05-13 14:41:45 -07:00
Stephen Hemminger 7aca60c0eb Revert "devlink: implement shared buffer support"
This reverts commit b56700bf8a.
2016-05-13 14:38:47 -07:00
Stephen Hemminger c3d25ec392 Revert "devlink: implement shared buffer occupancy control"
This reverts commit a60ebcb6f3.
2016-05-13 14:38:38 -07:00
Edward Cree 2642b6b03e geneve: fix IPv6 remote address reporting
Since we can only configure unicast, we probably want to be able to
display unicast, rather than multicast.

Fixes: 906ac5437a ("geneve: add support for IPv6 link partners")
Signed-off-by: Edward Cree <ecree@solarflare.com>
2016-05-13 14:31:55 -07:00
Jiri Benc 7c337e2c20 ip link gre: print only relevant info in external mode
Display only attributes that are relevant when a GRE interface is in
'external' mode instead of the default values (which are ignored by the
kernel even if passed back).

Fixes: 926b39e1fe ("gre: add support for collect metadata flag")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
2016-05-06 11:49:08 -07:00
Jiri Benc df217d5d5c ip link gre: create interfaces in external mode correctly
For GRE interfaces in 'external' mode, the kernel ignores all manual
settings like remote IP address or TTL. However, for some of those
attributes, kernel checks their value and does not allow them to be zero
(even though they're ignored later).

Currently, 'ip link' always includes all attributes in the netlink message.
This leads to problems when creating interfaces in 'external' mode. For
example, this command does not work:

ip link add gre1 type gretap external

and needs a bogus remote IP address to be specified, as the kernel requires
the remote IP address to be either absent or non-null.

Ignore the parameters that do not make sense in 'external' mode.
Unfortunately, we cannot error out, as there may be existing deployments
that worked around the bug by specifying bogus values.

Fixes: 926b39e1fe ("gre: add support for collect metadata flag")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
2016-05-06 11:49:08 -07:00
Quentin Monnet 27d44f3a8a tc: add bash-completion function
Add a bash command-completion function for tc, and update the Makefile
to install it under /usr/share/bash-completion/completions/.

Inside iproute2 repository, the completion code is in a new
`bash-completion` toplevel directory.

v2: Remove `if` statement in Makefile: do not try to install in
    /etc/bash_completion.d/ if /usr/share/bash-completion/completions/
    is not found; instead, the user can override the installation path
    with the specific environment variable.

Signed-off-by: Quentin Monnet <quentin.monnet@6wind.com>
2016-05-03 09:53:26 -07:00
Stephen Hemminger b76b93ddac update kernel headers from net-next 2016-04-24 22:30:46 -07:00
Eric Dumazet 6df9c7a06a ss: add SK_MEMINFO_DROPS display
SK_MEMINFO_DROPS was added in Linux 4.7 for TCP, UDP and SCTP.

skmem will display the socket drop count using a 'd' prefix, as in:

$ ss -tm src :22 | more
State      Recv-Q Send-Q Local Address:Port    Peer Address:Port
ESTAB      0      52     10.246.7.151:ssh      172.20.10.101:50759
	 skmem:(r0,rb8388608,t0,tb8388608,f1792,w2304,o0,bl0,d0)

Signed-off-by: Eric Dumazet <edumazet@google.com>
2016-04-22 10:20:32 -07:00
Stephen Hemminger 32c0b9b7a8 update kernel headers from net-next 2016-04-22 10:01:12 -07:00
Stephen Hemminger 8b5be9ecff update inet_diag.h header 2016-04-19 08:06:11 -07:00
Stephen Hemminger 6065805922 Merge branch 'master' into net-next 2016-04-19 08:01:55 -07:00
Jiri Pirko 4bf138d6d2 devlink: add manpage for shared buffer
Manpage for devlink "sb" object.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-04-19 08:01:05 -07:00
Jiri Pirko a60ebcb6f3 devlink: implement shared buffer occupancy control
Use kernel shared buffer occupancy control commands to take a snapshot and
clear occupancy watermarks. Also, show occupancy values in a readable
format.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-04-19 08:01:05 -07:00
Jiri Pirko b56700bf8a devlink: implement shared buffer support
Implement the kernel devlink shared buffer interface. Introduce a new object
"sb" and allow browsing the shared buffer parameters as well as changing the
configuration.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-04-19 08:01:05 -07:00
Jiri Pirko 2f85a9c535 devlink: allow to parse both devlink and port handle in the same time
For filtering purposes, it makes sense for the user to specify either a
devlink handle or a port handle.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-04-19 08:01:05 -07:00
Jiri Pirko 707a91c549 devlink: introduce dump filtering function
This function is to be used from dump callbacks to decide whether the
current output should be filtered out. Filtering is based on
previously parsed and stored command line options.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-04-19 08:01:05 -07:00
Jiri Pirko 6563a6eb53 devlink: split dl_argv_parse_put to parse and put parts
It is handy to have parsed command-line data stored so that it can be
used for dump filtering. So split the original dl_argv_parse_put into
parse and put parts.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-04-19 08:01:05 -07:00
Jiri Pirko 43f35be4eb devlink: introduce helper to print out nice names (ifnames)
By default, ifnames will be printed out. User can turn that off using
"-n" option on the command line.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-04-19 08:01:05 -07:00
Jiri Pirko 68cab0ba76 devlink: introduce pr_out_port_handle helper
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-04-19 08:01:05 -07:00
Jiri Pirko ebaf76b55e list: add list_add_tail helper
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-04-19 08:01:05 -07:00
Jiri Pirko f1239ca1f9 list: add list_for_each_entry_reverse macro
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-04-19 08:01:05 -07:00
Jiri Pirko ec7513faa9 devlink: fix "devlink port" help message
"dl" -> "devlink"

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-04-19 08:01:04 -07:00
Eric Dumazet d9ba887e9d ss: take care of unknown min_rtt
Kernel sets info->tcpi_min_rtt to ~0U when no RTT sample was ever
taken for the session, thus min_rtt is unknown.

Signed-off-by: Eric Dumazet <edumazet@google.com>
2016-04-19 07:56:54 -07:00
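A minimal sketch of how a consumer might honour the ~0U sentinel from the commit above when printing. format_min_rtt() and MIN_RTT_UNKNOWN are hypothetical names, not the actual ss code; the sketch assumes, as with struct tcp_info, that min_rtt is a 32-bit unsigned value in microseconds, printed in milliseconds like the "minrtt:0.013" field in the ss output earlier in this log.

```c
#include <stdint.h>
#include <stdio.h>

/* Kernel sentinel meaning "no RTT sample taken yet"; equal to ~0U
 * for a 32-bit field. */
#define MIN_RTT_UNKNOWN UINT32_MAX

/* Hypothetical formatter: writes "minrtt:<ms>" into buf, or an empty
 * string when min_rtt is unknown. Returns the snprintf() result, or 0
 * when nothing is printed. */
int format_min_rtt(char *buf, size_t len, uint32_t min_rtt)
{
	if (len > 0)
		buf[0] = '\0';
	if (min_rtt == MIN_RTT_UNKNOWN)
		return 0;	/* printing 4294967.295 ms would be misleading */
	return snprintf(buf, len, "minrtt:%g", (double)min_rtt / 1000);
}
```

For example, a value of 13 (microseconds) prints as "minrtt:0.013", while the sentinel suppresses the field entirely.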
Phil Sutter e56a959e55 ss: Fix accidental state filter override
Passing a filter expression and selecting an address family using the
'-f' flag would overwrite the state filter by accident. For example,
calling "ss -nl -f inet '(sport = :22)'" would print not only
listening sockets (as requested by the '-l' flag) but connected ones
as well.

Fix this by reusing the formerly ineffective call to filter_states_set()
to restore the state filter as it was before the call to
filter_af_set().

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-04-19 07:56:53 -07:00
Phil Sutter 9d320e1e92 ss: Drop silly assignment
An expression of the form '(a | b) & b' will evaluate to the value of b
for any value of a or b.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-04-19 07:56:53 -07:00
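The identity behind this cleanup is the absorption law: every bit set in b is also set in (a | b), and the AND then clears every bit not set in b. A tiny sketch to check it (the function name is illustrative):

```c
#include <stdint.h>

/* Always returns b, whatever the value of a. */
static uint32_t or_then_and(uint32_t a, uint32_t b)
{
	return (a | b) & b;
}
```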
Jeff Harris 1c346dccf8 ip: neigh: Fix leftover attributes message during flush
Use the same rtnl_dump_request_n call as the show command. The
rtnl_wilddump_request assumes the request type uses an ifinfomsg, which is
not the case for the neighbor table.

Signed-off-by: Jeff Harris <jefftharris@gmail.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
2016-04-19 07:52:33 -07:00
Jiri Benc 346410bdb4 vxlan: add support for VXLAN-GPE
Adds support to create a VXLAN-GPE interface.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2016-04-11 22:15:49 +00:00
Jiri Benc 42d17a617f ip-link.8: document "external" flag for vxlan
Signed-off-by: Jiri Benc <jbenc@redhat.com>
2016-04-11 22:15:49 +00:00
Jiri Benc 44df45973a vxlan: 'external' implies 'nolearning'
It doesn't make sense to use an external control plane and fill the internal
FDB at the same time. It's even an illegal combination for VXLAN-GPE.

Just switch off learning when 'external' is specified.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2016-04-11 22:15:49 +00:00
Stephen Hemminger 39afc4b08e Merge branch 'master' into net-next 2016-04-11 22:15:41 +00:00
Stephen Hemminger bbac6c6301 ip: whitespace cleanup
Fix whitespace
2016-04-11 22:13:55 +00:00
Phil Sutter fe9322781e ip-link: Support printing VF trust setting
This adds a new item to VF lines of a PF, stating whether the VF is
trusted or not.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-04-11 22:11:33 +00:00
Gustavo Zacarias 5c5a0f3df9 iproute2: tc_bpf.c: fix building with musl libc
We need limits.h for PATH_MAX, fixes:

tc_bpf.c: In function ‘bpf_map_selfcheck_pinned’:
tc_bpf.c:222:12: error: ‘PATH_MAX’ undeclared (first use in this
function)
  char file[PATH_MAX], buff[4096];

Signed-off-by: Gustavo Zacarias <gustavo@zacarias.com.ar>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2016-04-11 22:09:57 +00:00
Stephen Hemminger 11522e7d02 ip: only display phys attributes with details option
Since the output of ip commands is already cluttered, move the physical port
details under the show_details option.
2016-04-11 22:07:51 +00:00
Nicolas Dichtel df590401d6 iplink: display IFLA_PHYS_PORT_NAME
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2016-04-11 22:02:21 +00:00
Daniel Borkmann 4dd3f50af4 tc, bpf: add support for map pre/allocation
Follow-up to kernel commit 6c9059817432 ("bpf: pre-allocate hash map
elements"). Add flags support, so that we can pass in BPF_F_NO_PREALLOC
flag for disallowing preallocation. Update examples accordingly and also
remove the BPF_* map helper macros from them as they were not very useful.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-04-11 21:54:47 +00:00
Daniel Borkmann afc1a2000b tc, bpf: further improve error reporting
Make it easier to spot issues when loading the object file fails. This
includes reporting how pinned object specs differ and a better indication
when we've reached instruction limits. Don't retry loading a non relo
program once we failed with bpf(2), and report out-of-bounds tail call keys.

Also, truncate huge log outputs by default. Sometimes errors are
quite easy to spot by only looking at the tail of the verifier log, but
logs can get huge, e.g. up to a few MB (due to the verifier checking all
possible program paths). Thus, by default limit output to the last 4096
bytes and indicate that it's truncated. For the full log, the verbose option
can be used.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-04-11 21:53:58 +00:00
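The truncation policy described above (print only the tail of the verifier log unless verbose output was requested) can be sketched as follows. `print_log_tail` and its notice text are hypothetical; the actual tc code and message differ.

```c
#include <stdio.h>
#include <string.h>

/* Print the last 'max' bytes of a NUL-terminated log unless 'verbose' is
 * set, announcing the cut. Returns the number of log bytes printed. */
static size_t print_log_tail(FILE *out, const char *log, size_t max,
			     int verbose)
{
	size_t len = strlen(log);
	const char *start = log;

	if (!verbose && len > max) {
		start = log + (len - max);	/* keep only the tail */
		fprintf(out, "(truncated, showing last %zu bytes)\n", max);
	}
	fputs(start, out);
	return strlen(start);
}
```

Keeping the tail rather than the head follows the reasoning in the commit: the verifier usually reports the failing instruction at the end of its log.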
Daniel Borkmann 0395711c52 tc, bpf: add new csum and tunnel signatures
Add new signatures for BPF_FUNC_csum_diff, BPF_FUNC_skb_get_tunnel_opt
and BPF_FUNC_skb_set_tunnel_opt.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-04-11 21:53:58 +00:00
Nikolay Aleksandrov 5a2d0201cc bridge: vlan: add support to filter by vlan id
Add the optional keyword "vid" to bridge vlan show so the user can
request filtering by a specific vlan id. Currently the filtering is
implemented only in user-space. The argument name has been chosen to
match the add/del one - "vid". This filtering can also be used with the
"-compressvlans" option to see which range a vlan belongs to (if any).
Also this will be used to show only specific per-vlan statistics later
when support is added to the kernel for it.

Examples:
$ bridge vlan show vid 450
port	vlan ids
eth2	 450

$ bridge -c vlan show vid 450
port	vlan ids
eth2	 400-500

$ bridge vlan show vid 1
port	vlan ids
eth1	 1 PVID Egress Untagged
eth2	 1 PVID
br0	 1 PVID Egress Untagged

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-04-11 21:52:47 +00:00
Nikolay Aleksandrov 24687d678f bridge: mdb: add support to filter by vlan id
Add the optional keyword "vid" to bridge mdb show so the user can
request filtering by a specific vlan id. Currently the filtering is
implemented only in user-space. The argument name has been chosen to match
the add/del one - "vid".

Example:
$ bridge mdb show vid 200
dev br0 port eth2 grp 239.0.0.1 permanent vid 200

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-04-11 21:52:47 +00:00
Nikolay Aleksandrov ae6eb9075f bridge: fdb: add support to filter by vlan id
Add the optional keyword "vlan" to bridge fdb show so the user can request
filtering by a specific vlan id. Currently the filtering is implemented
only in user-space. The argument name has been chosen to match the
add/del one - "vlan".

Example:
$ bridge fdb show vlan 400
52:54:00:bf:57:16 dev eth2 vlan 400 master br0 permanent

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-04-11 21:52:47 +00:00
Eric Dumazet f1c656e5c0 iplink: display number of rx/tx queues
We can set these attributes, so it would be nice to display them when
they are provided by the kernel.

Signed-off-by: Eric Dumazet <edumazet@google.com>
2016-04-11 21:51:28 +00:00
Stephen Hemminger bc9fb25788 update kernel headers
Headers up to date with 4.6-net-next
2016-04-11 13:44:50 -07:00
Stephen Hemminger 6268b08c13 update kernel headers
Update from 4.6-rc3
2016-04-11 13:40:40 -07:00
Stephen Hemminger 3273e3c132 devlink: ignore build result
devlink binary is built
2016-04-11 13:35:12 -07:00
Daniel Borkmann 29bb2373a8 geneve: add support to set flow label
Follow-up for kernel commit 8eb3b99554b8 ("geneve: support setting
IPv6 flow label") to allow setting the label for the device config.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-03-27 10:58:48 -07:00
Daniel Borkmann f8eb79a624 vxlan: add support to set flow label
Follow-up for kernel commit e7f70af111f0 ("vxlan: support setting
IPv6 flow label") to allow setting the label for the device config.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-03-27 10:58:48 -07:00
Jiri Pirko a3c4b484a1 add devlink tool
Add a new tool called devlink, which is the userspace counterpart of the
devlink Netlink socket.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-03-27 10:57:15 -07:00
Jiri Pirko 4952b45946 include: add linked list implementation from kernel
Rename hlist.h to list.h while adding it, to stay aligned with the kernel.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-03-27 10:56:11 -07:00
Jesse Gross 325d02b44c geneve: Add support for configuring UDP checksums.
Enable support for configuring outer UDP checksums on Geneve tunnels:

ip link add type geneve id 10 remote 10.0.0.2 udpcsum

Signed-off-by: Jesse Gross <jesse@kernel.org>
2016-03-27 10:53:25 -07:00
Jesse Gross af3253984d vxlan: Follow kernel defaults for outer UDP checksum.
On recent kernels, UDP checksum computation has become more efficient and
the default behavior was changed; however, the ip command overrides this
by always specifying a particular behavior.

If the user does not specify whether UDP checksums should be computed
or not, then we don't need to send an explicit netlink message - the kernel
can just use its default behavior.

Signed-off-by: Jesse Gross <jesse@kernel.org>
2016-03-27 10:53:25 -07:00
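The "only send it if the user asked" rule boils down to a tri-state option. A minimal sketch, assuming a hypothetical `struct csum_opt` where -1 means "not given"; the real iproute2 parser and the netlink attribute encoding are not shown.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical parsed state: -1 = not given, 0 = "noudpcsum", 1 = "udpcsum". */
struct csum_opt {
	int udpcsum;
};

static void csum_opt_parse(struct csum_opt *o, const char *arg)
{
	if (strcmp(arg, "udpcsum") == 0)
		o->udpcsum = 1;
	else if (strcmp(arg, "noudpcsum") == 0)
		o->udpcsum = 0;
}

/* Emit the u8 attribute (e.g. IFLA_VXLAN_UDP_CSUM) only if the user chose;
 * otherwise leave it out so the kernel applies its own default. */
static bool csum_attr_needed(const struct csum_opt *o, unsigned char *val)
{
	if (o->udpcsum < 0)
		return false;
	*val = (unsigned char)o->udpcsum;
	return true;
}
```

Initializing the field to -1 rather than 0 is what lets "not specified" be told apart from an explicit "noudpcsum".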
Stephen Hemminger e9e9365b56 scrub out whitespace issues
Run script that removes trailing whitespace everywhere.
2016-03-27 10:50:14 -07:00
Marco Varlese 334af76143 fix get_addr() and get_prefix() error messages
An attempt to add an invalid address to an interface would print the "???"
string instead of the address family name.

For example:
$ ip address add 256.10.166.1/24 dev ens8
Error: ??? prefix is expected rather than "256.10.166.1/24".

$ ip neighbor add proxy 2001:db8::g dev ens8
Error: ??? address is expected rather than "2001:db8::g".

With this patch the output will look like:
$ ip address add 256.10.166.1/24 dev ens8
Error: inet prefix is expected rather than "256.10.166.1/24".

$ ip neighbor add proxy 2001:db8::g dev ens8
Error: inet6 address is expected rather than "2001:db8::g".

Signed-off-by: Przemyslaw Szczerbik <przemyslawx.szczerbik@intel.com>
Signed-off-by: Marco Varlese <marco.varlese@intel.com>
2016-03-27 10:47:02 -07:00
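The improved messages amount to mapping the expected address family to a name, falling back to "???" only for families the tool does not know. A minimal sketch with hypothetical helpers `af_name` and `invalid_prefix_msg`, covering only inet and inet6; iproute2's own helper handles more families.

```c
#include <stdio.h>
#include <sys/socket.h>

/* Name used in error messages for a given address family. */
static const char *af_name(int family)
{
	switch (family) {
	case AF_INET:
		return "inet";
	case AF_INET6:
		return "inet6";
	default:
		return "???";
	}
}

/* Build the "prefix is expected" error for the family the caller wanted. */
static int invalid_prefix_msg(char *buf, size_t len, int family,
			      const char *arg)
{
	return snprintf(buf, len,
			"Error: %s prefix is expected rather than \"%s\".",
			af_name(family), arg);
}
```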
Phil Sutter f63ed3e629 lib/ll_addr: improve ll_addr_n2a() a bit
Apart from making the code a bit more compact and efficient, this also
prevents a potential buffer overflow if the passed buffer is really too
small: although the code correctly decrements the size parameter passed to
snprintf(), that value could become negative, which would then wrap around
since snprintf() takes an (unsigned) size_t for the parameter.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-27 10:37:35 -07:00
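The overflow described above can be avoided by tracking the remaining space as an unsigned quantity and checking it before each write. A minimal sketch of colon-separated hex formatting; `lladdr_fmt` is hypothetical and, unlike the real ll_addr_n2a(), ignores special cases such as tunnel addresses.

```c
#include <stddef.h>
#include <stdio.h>

/* Format a link-layer address as "aa:bb:cc:...". The free space is checked
 * before every snprintf(), so it can never underflow and wrap. On a short
 * buffer the output stops at a byte boundary. */
static const char *lladdr_fmt(const unsigned char *addr, size_t alen,
			      char *buf, size_t blen)
{
	size_t pos = 0;

	if (blen == 0)
		return buf;
	buf[0] = '\0';
	for (size_t i = 0; i < alen; i++) {
		size_t need = i ? 3 : 2;	/* ":xx" or "xx" */

		if (blen - pos <= need)		/* keep room for the NUL */
			break;
		snprintf(buf + pos, blen - pos, i ? ":%02x" : "%02x", addr[i]);
		pos += need;
	}
	return buf;
}
```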
Phil Sutter 7faf1588a7 lib/utils: introduce rt_addr_n2a_rta()
This simple macro eases calling rt_addr_n2a() with data from an rt_attr
pointer.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-27 10:37:35 -07:00
Phil Sutter d49f934c10 lib/utils: introduce format_host_rta()
This simple macro eases calling format_host() with data from an rt_attr
pointer.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-27 10:37:35 -07:00
Phil Sutter 2e96d2ccd0 utils: make rt_addr_n2a() non-reentrant by default
Only a single user needs it to be reentrant (not really, but it's safer
this way), so add rt_addr_n2a_r() for it to use.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-27 10:37:34 -07:00
Phil Sutter a418e45164 make format_host non-reentrant by default
Only three users require it to be reentrant; the rest are fine without
it. Instead, provide a reentrant format_host_r() for the users which
need it.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-27 10:37:34 -07:00
Phil Sutter ff9d8f3728 ipaddress: colorize peer, broadcast and anycast addresses as well
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-27 10:37:34 -07:00
Phil Sutter a1121aa1f5 color: introduce color helpers and COLOR_CLEAR
This adds two helper functions which map a given data field to a color,
so color_fprintf() statements don't have to be duplicated with only a
different color value depending on that data field's value. In order for
this to work in a generic way, COLOR_CLEAR has been added to serve as a
fallback default of uncolored output.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-27 10:37:34 -07:00
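The idea of a helper that picks the color, with COLOR_CLEAR as the uncolored fallback, can be sketched like this. The enum values, ANSI codes and `color_fputs` are hypothetical and do not reproduce iproute2's color table.

```c
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical color set; COLOR_CLEAR means "print without color". */
enum color { COLOR_CLEAR = 0, COLOR_INET, COLOR_INET6 };

static const char *const ansi[] = {
	[COLOR_CLEAR] = "",
	[COLOR_INET]  = "\033[35m",	/* arbitrary choice: magenta */
	[COLOR_INET6] = "\033[34m",	/* arbitrary choice: blue */
};

/* Map a data field (here: address family) to a color. */
static enum color family_color(int family)
{
	switch (family) {
	case AF_INET:
		return COLOR_INET;
	case AF_INET6:
		return COLOR_INET6;
	default:
		return COLOR_CLEAR;
	}
}

/* One print call for all cases; no escape codes for COLOR_CLEAR. */
static int color_fputs(FILE *fp, enum color c, const char *s)
{
	if (c == COLOR_CLEAR)
		return fprintf(fp, "%s", s);
	return fprintf(fp, "%s%s\033[0m", ansi[c], s);
}
```

A caller can then write `color_fputs(fp, family_color(af), addr)` once instead of duplicating the print statement per family.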
Phil Sutter 16418561b7 man: tc-vlan.8: Describe CONTROL option
This should be made generic and part of a common tc-actions man page.
For now, though, leave it here so as not to confuse readers of the
example which uses it.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-03-27 10:34:49 -07:00
Phil Sutter 51011dac36 tc/m_vlan.c: mention CONTROL option in help text
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-03-27 10:34:48 -07:00
Phil Sutter c73b621cfa man: tc-skbedit.8: Elaborate a bit on TX queues
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-03-27 10:34:47 -07:00
Phil Sutter 8409abd59b man: tc-police.8: Emphasize on the two rate control mechanisms
As Jamal pointed out, there are two different approaches to bandwidth
measurement. Try to make this clear by separating them in synopsis and
also documenting the way to fine-tune avrate.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-03-27 10:34:45 -07:00
Phil Sutter 26df2953a5 man: tc-mirred.8: Reword man page a bit, add generic mirror example
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-03-27 10:34:44 -07:00
Phil Sutter dbfb17a67f man: tc-csum.8: Add an example
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-03-27 10:34:43 -07:00
Phil Sutter 1672f42195 tc: connmark, pedit: Rename BRANCH to CONTROL
As Jamal suggested, BRANCH is the wrong name, as these keywords go
beyond simple branch control - e.g. loops are possible, too. Therefore
rename the non-terminal to CONTROL instead which should be more
appropriate.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-03-27 10:34:42 -07:00
Phil Sutter edf35b8824 doc/tc-filters.tex: Drop overly subjective paragraphs
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-03-27 10:34:40 -07:00
Phil Sutter 3c48c714a3 testsuite: add a test for tc pedit action
This is not a full test, since kernel functionality is not actually
tested. It only checks that the values the kernel returns when listing
the action are what one expects them to be.

Since this test succeeded on both a little-endian and a big-endian
system, it shows that any endianness issues have been resolved in
tc/p_ip.c at least.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-03-27 10:34:38 -07:00
Phil Sutter a33786b582 tc: pedit: Fix raw op
The retain value was wrong for u16 and u8 types.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-03-27 10:34:36 -07:00
Phil Sutter 77bed404d0 tc: pedit: Fix for big-endian systems
This was tricky to get right:
- The 'stride' value used for 8 and 16 bit values must behave inversely to
  the value's intra-word offset to work correctly with the big-endian data
  act_pedit is editing.
- The 'm' array's values are in host byte order, so they have to be
  converted as well (and the ordering was just inverse, for some
  reason).
- The only sane way of getting this right is to manipulate value/mask in
  host byte order and convert the output.
- TIPV4 (i.e. 'munge ip src/dst') had its own pitfall: the address
  parser converts to network byte order automatically. This patch fixes
  this by converting it back before calling pack_key32, which is a hack
  but at least does not require implementing a completely separate code
  flow.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-03-27 10:34:33 -07:00
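The "work in host order, convert once" approach can be illustrated for a single byte. This is a sketch assuming the kernel applies each key to a packet word as `(word & mask) ^ val`; `pack_u8_key` and `apply_key` are hypothetical and are not the tc/m_pedit.c functions.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Build a key that overwrites the byte at offset 'off' (0..3) of a word kept
 * in network (big-endian) order. In host order byte 0 is the most
 * significant, so the shift shrinks as the offset grows. */
static void pack_u8_key(uint8_t v, unsigned int off,
			uint32_t *val_be, uint32_t *mask_be)
{
	unsigned int shift = (3 - off) * 8;
	uint32_t val = (uint32_t)v << shift;
	uint32_t mask = ~((uint32_t)0xff << shift);	/* bits to keep */

	*val_be = htonl(val);	/* convert once, at the end */
	*mask_be = htonl(mask);
}

/* Apply a key to a word read straight from the packet. */
static uint32_t apply_key(uint32_t word_be, uint32_t val_be, uint32_t mask_be)
{
	return (word_be & mask_be) ^ val_be;
}
```

Because the conversion happens only at the end, the same code produces the right bytes on both little-endian and big-endian hosts.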
Phil Sutter 952f89deba tc/p_ip.c: Minor coding style cleanup
Break overlong function definitions and remove one extraneous
whitespace.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-03-27 10:34:22 -07:00
Zhang Shengju 95c9d0d301 netconf: add support for ignore route attribute
Add support for the ignore route attribute, and refine the code to use
the rta_getattr_* functions to get attribute values.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
2016-03-21 12:16:25 -07:00
Zhang Shengju c256dcd47c man: update netconf manual for new attributes
Update this manual to add attributes proxy_neigh and
ignore_routes_with_linkdown.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
2016-03-21 12:15:58 -07:00
Stephen Hemminger 52a4474d90 netconf: replace macro with a function
The number of casts in the macro was excessive.
2016-03-21 12:13:57 -07:00
Stephen Hemminger b7e0091a92 update kernel headers to 4.6 (pre rc1) 2016-03-21 12:02:32 -07:00
Stephen Hemminger acd1e437be misc: fix style issues
More checkpatch spring cleaning
2016-03-21 11:56:36 -07:00
Stephen Hemminger df4b043f08 bridge: code cleanup
Use checkpatch auto fix to cleanup lingering style issues
2016-03-21 11:56:01 -07:00
Stephen Hemminger 56f5daac98 ip: code cleanup
Run all the ip code through checkpatch and have it fix the obvious stuff.
2016-03-21 11:52:19 -07:00
Stephen Hemminger 32a121cba2 tc: code cleanup
Use checkpatch to fix whitespace and other style issues.
2016-03-21 11:48:36 -07:00
Luca Lemmo 4733b18a5e tc: q_{codel,fq_codel}: add missing space in help text
Signed-off-by: Luca Lemmo <luca@linux.com>
2016-03-21 11:42:13 -07:00
Luca Lemmo 725f2a872d tc: f_u32: trivial coding style cleanups
Signed-off-by: Luca Lemmo <luca@linux.com>
2016-03-21 11:42:12 -07:00
Luca Lemmo dd0c8d193f tc: f_u32: add missing spaces around operators
Signed-off-by: Luca Lemmo <luca@linux.com>
2016-03-21 11:42:12 -07:00
Nikolay Aleksandrov ba0372670d bridge: mdb: add support for extended router port information
Recently a new temp router port mode was added and with it the dumped
information was extended similar to how mdb entries were done. This
patch adds support to dump the new information by using the "-s" switch.
Example:
$ bridge -d -s mdb show
dev br0 port eth1 grp ff02::1:ffbf:5716 temp 234.39
dev br0 port eth1 grp 239.0.0.2 temp  97.17
dev br0 port eth1 grp 239.0.0.3 temp 105.36
router ports on br0: eth1    0.00 permanent
router ports on br0: eth2  254.87 temp

It also updates the bridge man page.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-03-14 16:05:09 -07:00
Stephen Hemminger 165303e57f Merge branch 'master' into net-next 2016-03-14 16:05:00 -07:00
Stephen Hemminger 162b3ce92e v4.5.0 2016-03-14 16:02:31 -07:00
Stephen Hemminger 0cfb9f6acd Merge branch 'master' into net-next 2016-03-06 12:56:23 -08:00
Phil Sutter 338b003bcc tc: pedit: Fix retain value for ihl adjustments
Since the IP Header Length field is just half a byte, adjust retain to
only match these bits so the Version field is not overwritten by
accident.

The whole concept is actually broken due to dependency on endianness
which pedit ignores.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-06 12:53:11 -08:00
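Why the retain value matters for IHL: the first IPv4 header byte holds Version in the high nibble and IHL in the low nibble. A sketch of a pedit-style byte edit (hypothetical `edit_byte`) shows that retain 0x0f changes only IHL, while 0xff also wipes out the version.

```c
#include <stdint.h>

/* Bits inside 'retain' come from 'val'; all other bits keep the packet's
 * original value. */
static uint8_t edit_byte(uint8_t old, uint8_t val, uint8_t retain)
{
	uint8_t mask = (uint8_t)~retain;	/* bits to keep */

	return (uint8_t)((old & mask) ^ (val & retain));
}
```

For example, `edit_byte(0x45, 0x06, 0x0f)` yields 0x46 (version 4, IHL 6), whereas a retain of 0xff yields 0x06 and loses the version.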
Phil Sutter f440e9d8c2 tc: pedit: Fix parse_cmd()
This was horribly broken:
* pack_key8() and pack_key16() ...
  * failed to invert the retain value when applying it to the mask,
  * did not sanitize val by ANDing it with retain,
  * and ignored the mask which is necessary for 'invert' command.
* pack_key16() did not convert mask to network byte order.
* Changing the retain value for 'invert' or 'retain' operation seems
  just plain wrong.
* While here, also got rid of unnecessary offset sanitization in
  pack_key32().
* Simplify code a bit by always assigning the local mask variable to
  tkey->mask before calling any of the pack_key*() variants.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-06 12:53:11 -08:00
Phil Sutter ec0ceeec49 tc: pedit: Fix layered op parsing
After lookup of the layered op submodule, pedit would pass argv and argc
including the layered op identifier at first position which confused the
submodule parser. Fix this by calling NEXT_ARG() before calling the
parse_peopt() callback.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-06 12:53:11 -08:00
Phil Sutter 72b365e8e0 libnetlink: Double the dump buffer size
There have been reports about 'ip addr' printing "Message truncated" on
systems with large numbers of VFs. Although I haven't been able to get
my hands on hardware suitable to reproduce this, increasing the dump
buffer has been reported to resolve the issue. For want of a better
idea, just double the buffer size to 32k.

This opportunistic buffer size selection feels more like a workaround for
a design flaw in libnetlink, or maybe even in the netlink protocol
itself.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-06 12:51:18 -08:00
Phil Sutter dd81ee04ed ifstat, nstat: fix daemon mode
Since the relevant code (and its bugs) is identical in both files, fix
them in one go. This patch fixes multiple issues:

* Using 'int' for the 'tdiff' variable does not suffice on 64bit
  systems, the assigned initial time difference makes it wrap and
  contain a negative value afterwards. Instead use the more appropriate
  'time_t' type.

* As far as I understand the code, poll() is supposed to time out just
  at the right time to trigger update_db() in the configured interval.
  Therefore its timeout must be set to the desired interval *minus* the
  time that has already passed since the last update.

* With the last change to the algorithm in place, it does not make sense
  to call update_db() before returning data to the connected client.
  Actually, it never did; otherwise we could skip the periodic updates
  in the first place.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-06 12:49:05 -08:00
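The timeout rule from the second bullet, "interval minus time already passed", can be sketched as follows. It assumes second-resolution timestamps held in `time_t`; `next_poll_timeout_ms` is a hypothetical helper, not the ifstat/nstat code.

```c
#include <time.h>

/* Milliseconds poll() should wait so the next update fires 'interval_s'
 * seconds after 'last_update'. time_t avoids the int wrap-around; an
 * overdue update returns 0 so poll() comes back immediately. */
static int next_poll_timeout_ms(time_t now, time_t last_update,
				time_t interval_s)
{
	time_t tdiff = now - last_update;

	if (tdiff >= interval_s)
		return 0;
	return (int)((interval_s - tdiff) * 1000);
}
```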
Elad Raz 29d61fb385 bridge: mdb: add support for offloaded mdb entries
Mark MDB entries which are offloaded to HW with the "offload" flag.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-03-06 12:46:56 -08:00
Stephen Hemminger 6b3e03881c Merge branch 'master' into net-next 2016-03-04 15:47:18 -08:00
Phil Sutter 5f4d27d533 doc: Add my article about tc, filters and actions
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-04 15:29:00 -08:00
Phil Sutter 03a0cf20b4 ipneigh: List all nud states in help output
To not make the output overly confusing, list them in a definition of
the STATE placeholder which is already used in the show/flush syntax but
wasn't explained before.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-04 15:27:52 -08:00
Phil Sutter 948acfed23 man: ip-neighbour.8: Document all known nud states
Not sure how useful they are in practice, but as 'ip neigh' supports
setting them all, they deserve to be described as well.

While at it, also add a missing layer of indentation to the subordinate
nud state list.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-04 15:27:52 -08:00
Phil Sutter 0ce05841d5 doc, man: ip-rule: Remove incorrect statement about rule 0
The documentation is wrong here: it is indeed possible to remove policy
rule 0 and recreate it afterwards. Therefore remove these statements.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-04 15:27:52 -08:00
Phil Sutter 2452c57a52 man: ip-route: Make synopsis consistent with description
While the synopsis section contains 'ip route list', it is later
described as 'ip route show'. Make this consistent by replacing 'list'
with 'show' in synopsis.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-04 15:27:52 -08:00
Phil Sutter c024acc641 tc: pedit: document branch control in help output
This seems to have been a hidden feature, though it's very useful and
necessary at least when combining multiple pedit actions.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-04 15:27:52 -08:00
Phil Sutter 4853ee5281 man: ip-link: Beef up VXLAN csum options a bit
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-04 15:27:52 -08:00
Phil Sutter b487954d5b man: tc-u32: Minor syntax fix
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-04 15:27:52 -08:00
Phil Sutter bcdd39c588 man: ship action man pages
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-04 15:27:52 -08:00
Phil Sutter fa2c34eff1 man: Add a man page for the xt action
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-04 15:27:51 -08:00
Phil Sutter 8a1c6d4894 man: Add a man page for the vlan action
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-04 15:27:51 -08:00
Phil Sutter ae6cf29be0 man: Add a man page for the skbedit action
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-04 15:27:51 -08:00
Phil Sutter ebf9933bb3 man: Add a man page for the simple action
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-04 15:27:51 -08:00
Phil Sutter d477eea5a6 man: Add a man page for the police action
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-04 15:27:51 -08:00
Phil Sutter 448800026f man: Add a man page for the pedit action
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-04 15:27:51 -08:00
Phil Sutter ec0bab1e02 man: Add a man page for the nat action
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-04 15:27:51 -08:00
Phil Sutter 61d74eed70 man: Add a man page for the mirred action
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-04 15:27:51 -08:00
Phil Sutter 438dd1d49d man: Add a man page for the csum action.
Cc: Gregoire Baron <baronchon@n7mm.org>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-04 15:27:51 -08:00
Phil Sutter 1b5440e94f man: Add a man page for the connmark action
Cc: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-04 15:27:51 -08:00
Phil Sutter e895ae0b31 man: ip-*.8: drop any reference to generic ip options
Listing generic 'ip' options in subcommand man pages is redundant and
error-prone, as they won't be kept in sync anyway. Since many other man
pages don't list them either, drop references to them in the remaining
ones.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-02 11:23:53 -08:00
Phil Sutter 2227f2a5a2 man: ip-l2tp.8: Fix BNF syntax
The 'ADDR' part of 'local' and 'remote' parameters is not optional, but
may also consist of the word 'any'. While at it, add missing whitespace
and fix fonts.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-02 11:23:53 -08:00
Phil Sutter ac0eff58fd man: ip.8: Add missing flags and token subcommand description
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-02 11:23:52 -08:00
Phil Sutter a7eef7aa70 man: ip-xfrm.8: Document missing parameters
Namely, 'extra-flag' of 'ip xfrm state' and 'flag' of 'ip xfrm policy'.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-02 11:23:52 -08:00
Phil Sutter 5d8cb0900e man: ip-tunnel.8: Document missing 6rd action
Also drop the non-terminal 'TIME' description as it is not referenced
anywhere.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-02 11:23:52 -08:00
Phil Sutter 16a124ea2d man: ip-token.8: Review synopsis section
Drop unnecessary curly braces around single action keywords, point out
that 'dev' parameter to 'ip token get' is optional and clarify that 'ip
token' defaults to 'list' action.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-02 11:23:52 -08:00
Phil Sutter 582b0fc6cb man: ip-rule.8: Review synopsis section
Clarify that 'ip rule' defaults to action 'list', that 'flush' and
'save' actions don't accept additional parameters, add missing 'not' and
'goto' keywords and finally fix fonts used in 'fwmark' and 'realms'
parameters.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-02 11:23:52 -08:00
Phil Sutter 54beacc334 man: ip-ntable.8: Review synopsis section
The first line contained a c'n'p error, incorrectly listing 'ip address'
syntax. Since PARAMS is used just once and there are not many other
parameters to 'ip ntable change', state them inline and in addition to
that clarify the possibility to pass multiple parameters at once.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-02 11:23:52 -08:00
Phil Sutter 57e1ace02a man: ip-netns.8: Clarify synopsis a bit
Use brackets to show that 'ip netns' defaults to action 'list', drop
superfluous curly braces around 'set' action keyword.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-02 11:23:52 -08:00
Phil Sutter 03cb9d58bc man: ip-neighbour: Fix for missing NUD_STATE description
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-02 11:23:52 -08:00
Phil Sutter ca611d6408 man: ip-link.8: Fix and improve synopsis
Reflect that it is possible to pass multiple parameters at the same
time, also use the same trick the help text uses to emphasize vf
specific parameters.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-02 11:23:52 -08:00
Phil Sutter d890144ecf man: ip-link.8: minor font fix
We commonly use bold font for terminals and italic for non-terminals.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-02 11:23:52 -08:00
Phil Sutter 37fdeb585d man: ip-address.8: Minor syntax fixes
Clarify that the optional '-' prefix of the 'tentative', 'deprecated'
and 'dadfailed' keywords has to be put right in front of them, no
whitespace is allowed in between.

In addition to that, clarify that it is valid to pass both 'valid_lft'
and 'preferred_lft' at the same time to 'ip address'.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-02 11:23:52 -08:00
Phil Sutter 20f2af78fb iprule: add missing nat keyword to help text
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-02 11:23:52 -08:00
Phil Sutter 070ebbdf75 iproute: TYPE keyword is not optional, fix help text accordingly
This is a bit pedantic, but brackets ([]) show optional values and since
TYPE must not become empty, they're not suited to surround the type
keyword choices. Use curly braces instead.

Also add some missing whitespace to the parameter list above.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-02 11:23:51 -08:00
Phil Sutter f1fdcfe66a ipntable: Fix typo in help text
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-02 11:23:51 -08:00
Phil Sutter c339b4cc53 ipneigh: add missing proxy keyword to help text
And while we're at it, add whitespace around braces and pipe symbol.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-02 11:23:51 -08:00
Phil Sutter 5c2ea5b8c0 iplink: fix help text syntax
Get rid of extraneous closing brackets and while here, merge the double
netns parameter.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-02 11:23:51 -08:00
Phil Sutter 27ff1a564b ipaddrlabel: Improve help text precision
Neither the 'list' nor the 'flush' action accepts parameters, and when a
prefix is given, the action keyword is no longer optional.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-02 11:23:51 -08:00
Phil Sutter 7a53aa592f ip: align help text with manpage
Although the ip command accepts both "neighbor" and "neighbour" as
subcommand, I assume it's sufficient to list it in help text as just
"neigh" like ip.8 does.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-02 11:23:51 -08:00
Stephen Hemminger da60d03136 Merge branch 'master' into net-next 2016-03-02 09:41:28 -08:00
Phil Sutter e897776690 ipl2tp: Print help even on systems without l2tp support
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-02 09:33:20 -08:00
Nikolay Aleksandrov 05d4f64d4a bridge: mdb: add user-space support for extended attributes
Recently support was added to the kernel to be able to add more per-mdb
entry attributes via standard netlink attributes of type MDBA_MDB_EATTR_.
This patch adds support to iproute2 to parse and output these
attributes. The first exported attribute is the mdb "timer" value which
is shown only when the "-s" iproute2 arg is used.

Example:
$ bridge -s mdb show
dev br0 port eth1 grp 239.0.0.11 permanent   0.00
dev br0 port eth1 grp 239.0.0.10 temp 244.15
dev br0 port eth1 grp 239.0.0.1 temp 245.21
dev br0 port eth1 grp 239.0.0.5 temp 246.43
dev br0 port eth2 grp 239.0.0.5 temp 248.44
dev br0 port eth1 grp 239.0.0.2 temp 245.32

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-03-02 09:31:46 -08:00
Stephen Hemminger 2421ab750a update to current 4.5-rc net-next headers 2016-03-02 09:30:56 -08:00
Stephen Hemminger 240b3573f7 Merge branch 'master' into net-next 2016-03-02 09:27:09 -08:00
Phil Sutter 67eedcd9a1 iprule: Align help text with man page synopsis
The help text was misleading: One could think it is possible to list
rules by selector, which would be nice but isn't. This change also
clarifies that 'ip rule' defaults to 'list' if no further arguments are
given.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-02 09:26:32 -08:00
Hiroshi Shimamoto b6d77d9ee3 iplink: Support VF Trust
Add IFLA_VF_TRUST message to trust the VF.
The PF can accept some privileged operations from a trusted VF.
For example, the ixgbe PF doesn't allow enabling VF promiscuous mode until
the VF is trusted, because it may hurt performance.

To trust a VF:
 # ip link set dev eth0 vf 1 trust on

To untrust a VF:
 # ip link set dev eth0 vf 1 trust off

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
2016-03-02 09:26:24 -08:00
Stephen Hemminger 9ecf3da984 Merge branch 'master' into net-next 2016-02-21 12:02:10 -08:00
Nikolay Aleksandrov 8a4cd3943f iplink: bridge: remove unnecessary returns
invarg() exits, so there is no need to return; remove this copy-and-paste
error from my recent patches.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-02-21 12:00:47 -08:00
Roopa Prabhu c6d0cfb54b bridge: add support for dynamic fdb entries
This patch is a follow up to the recently added
'static' fdb option.

It introduces a new option 'dynamic' which adds
dynamic fdb entries with NUD_REACHABLE.

$bridge fdb add 00:01:02:03:04:06 dev eth0 master dynamic

$bridge fdb show
00:01:02:03:04:06 dev eth0

This patch also documents all fdb types and removes 'temp'
from the usage message, since it is now replaced by 'static'.
'temp' still works and is synonymous with 'static'.

Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2016-02-21 12:00:41 -08:00
Stephen Hemminger 1a6b1aa602 Merge branch 'master' into net-next 2016-02-17 17:53:28 -08:00
Dmitrii Shcherbakov 467f9fce60 htb: rename b4 buffer to b3 to make its name more consistent
The b3 buffer was removed previously, so b2 is followed by b4,
which is inconsistent.

Signed-off-by: Dmitrii Shcherbakov <fw.dmitrii@yandex.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
2016-02-17 17:50:14 -08:00
Dmitrii Shcherbakov 1aea7fea26 htb: remove printing of a deprecated overhead value
Remove printing according to the previously used encoding of mpu and
overhead values within the tc_ratespec's mpu field. This encoding is
no longer used, since a separate 'overhead' field has been introduced
in the ratespec structure.

Signed-off-by: Dmitrii Shcherbakov <fw.dmitrii@yandex.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
2016-02-17 17:49:47 -08:00
Nikolay Aleksandrov 478a8e5920 iplink: bridge_slave: add support for IFLA_BRPORT_FAST_LEAVE
Add support to be able to view and change IFLA_BRPORT_FAST_LEAVE
port attribute.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-02-17 17:47:01 -08:00
Nikolay Aleksandrov 10759a90ab iplink: bridge_slave: add support for IFLA_BRPORT_MULTICAST_ROUTER
Add support to be able to view and change IFLA_BRPORT_MULTICAST_ROUTER port
attribute.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-02-17 17:47:01 -08:00
Nikolay Aleksandrov 38b31a78da iplink: bridge_slave: add support for IFLA_BRPORT_PROXYARP_WIFI
Add support to be able to view and change IFLA_BRPORT_PROXYARP_WIFI port
attribute.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-02-17 17:47:01 -08:00
Nikolay Aleksandrov f6e615dec9 iplink: bridge_slave: add support for IFLA_BRPORT_PROXYARP
Add support to be able to view and change IFLA_BRPORT_PROXYARP port
attribute.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-02-17 17:47:00 -08:00
Nikolay Aleksandrov 3069539fb8 iplink: bridge_slave: export read-only values
Export all the read-only values that get returned about a bridge port
such as the timers, the ids, designated_port and cost,
topology_change_ack and config_pending. For the bridge ids the
br_dump_bridge_id function is exported from iplink_bridge.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-02-17 17:47:00 -08:00
David Ahern 33e41670d7 vrf: Add support for slave_info
Print VRF slave_info attributes if present.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-02-17 17:45:04 -08:00
Stephen Hemminger 9e99e49528 ss: display not_sent and min_rtt info
Display new info from net-next kernel.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-02-17 17:44:39 -08:00
Stephen Hemminger 0aefb9fa41 Merge branch 'master' into net-next 2016-02-17 17:36:17 -08:00
Nicolas Cavallari a1b4a274d4 netns: Fix an off-by-one strcpy() in netns_map_add().
netns_map_add() does a malloc of (sizeof (struct nsid_cache) +
strlen(name)) and then proceeds with a strcpy() of name into the
zero-length member at the end of the nsid_cache structure.  The
nul-terminator is written outside of the allocated memory and may
overwrite the allocator's internal structure.

This can trigger a segmentation fault on i386 uclibc with names of size 8:
after the corruption occurs, the call to closedir() in netns_map_init()
crashes while freeing the DIR structure.

Here is the relevant valgrind output:

==1251== Memcheck, a memory error detector
==1251== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==1251== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==1251== Command: ./ip netns
==1251==
==1251== Invalid write of size 1
==1251==    at 0x4011975: strcpy (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==1251==    by 0x8058B00: netns_map_add (ipnetns.c:181)
==1251==    by 0x8058E2A: netns_map_init (ipnetns.c:226)
==1251==    by 0x8058E79: do_netns (ipnetns.c:776)
==1251==    by 0x804D9FF: do_cmd (ip.c:110)
==1251==    by 0x804D814: main (ip.c:300)
2016-02-17 17:35:31 -08:00
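The allocation rule behind this fix can be sketched as follows: reserve room for the structure, the name, and the terminating NUL. struct nsid_cache_sketch and nsid_cache_new() are hypothetical stand-ins, not the ipnetns.c code, and the sketch uses a C99 flexible array member in place of the zero-length member described in the commit.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct nsid_cache: only the trailing
 * variable-length name member matters for this bug. */
struct nsid_cache_sketch {
	int nsid;
	char name[];	/* flexible array member holding a NUL-terminated copy */
};

/* Allocate sizeof(*c) + strlen(name) + 1 bytes. Without the "+ 1", the
 * terminating '\0' copied by strcpy() lands one byte past the allocation. */
static struct nsid_cache_sketch *nsid_cache_new(int nsid, const char *name)
{
	size_t len = strlen(name);
	struct nsid_cache_sketch *c = malloc(sizeof(*c) + len + 1);

	if (!c)
		return NULL;
	c->nsid = nsid;
	memcpy(c->name, name, len + 1);	/* copies the terminator as well */
	return c;
}
```

Running a name of 8 characters through the buggy size calculation under valgrind is what produces the "Invalid write of size 1" report quoted above.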
Stephen Hemminger 07ec183418 iplink: display rx nohandler stats
Support for the new rx_nohandler statistic.
This code is designed to handle the case where the kernel reported statistic
structure is smaller than the larger structure in later releases (and vice versa).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-02-09 11:16:57 -08:00
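One way to tolerate a kernel statistics structure that is shorter or longer than the one userspace was built with is to copy at most the smaller of the two sizes into a zeroed local copy. This is a minimal sketch under that assumption: struct link_stats_sketch is a hypothetical, trimmed-down layout (the real rtnl_link_stats64 has many more counters), and the actual iproute2 change may handle this differently.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down stats layout: older kernels report only the
 * first two counters, newer ones append rx_nohandler. */
struct link_stats_sketch {
	uint64_t rx_packets;
	uint64_t tx_packets;
	uint64_t rx_nohandler;	/* newer field, absent from older kernels */
};

/* Copy what the kernel sent, but never more than we know about; fields the
 * kernel did not report stay zero. */
static void copy_link_stats(struct link_stats_sketch *dst,
			    const void *payload, size_t payload_len)
{
	size_t n = payload_len < sizeof(*dst) ? payload_len : sizeof(*dst);

	memset(dst, 0, sizeof(*dst));
	memcpy(dst, payload, n);
}
```

A shorter payload leaves rx_nohandler at zero, while extra trailing counters from a newer kernel are simply ignored.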
Stephen Hemminger 655235754a Merge branch 'master' into net-next 2016-02-09 10:52:26 -08:00
Stephen Hemminger 385caeb13b Revert "tipc: add peer remove functionality"
This reverts commit f9dec657e4.

Since this code is not in the upstream kernel, it shouldn't be in iproute2.
2016-02-09 10:51:32 -08:00
Stephen Hemminger 8593b2cac0 Update header files from net-next 2016-02-09 10:49:03 -08:00
Roopa Prabhu ecc509f9a3 ip route: add mpls multipath support
This patch adds support for adding mpls multipath
routes.

example:
ip -f mpls route add 100 \
	nexthop as 200 via inet 10.1.1.2 dev swp1 \
	nexthop as 700 via inet 10.1.1.6 dev swp2

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2016-02-09 10:43:16 -08:00
Nikolay Aleksandrov e6c38e2c59 iplink: bond_slave: fix ad_actor/partner_oper_port_state output
It seems that I made a mistake when I exported these: instead of a
space at the end, I put a newline character, which is wrong and breaks
the single-line output.

Fixes: 7d6bc3b87a ("bonding: export 3ad actor and partner port state")
Reported-by: Sam Tannous <stannous@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-02-09 10:43:06 -08:00
Nikolay Aleksandrov 861c5dae5c iplink: bridge: add support for netfilter call attributes
This patch implements support for the IFLA_BR_NF_CALL_(IP|IP6|ARP)TABLES
attributes in iproute2 so it can change their values.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-02-09 10:42:03 -08:00
Nikolay Aleksandrov 178b18066a iplink: bridge: add support for IFLA_BR_MCAST_STARTUP_QUERY_INTVL
This patch implements support for the IFLA_BR_MCAST_STARTUP_QUERY_INTVL
attribute in iproute2 so it can change the startup query interval.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-02-09 10:42:03 -08:00
Nikolay Aleksandrov 483df11cf1 iplink: bridge: add support for IFLA_BR_MCAST_QUERY_RESPONSE_INTVL
This patch implements support for the IFLA_BR_MCAST_QUERY_RESPONSE_INTVL
attribute in iproute2 so it can change the query response interval.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-02-09 10:42:03 -08:00
Nikolay Aleksandrov 5a32388f5c iplink: bridge: add support for IFLA_BR_MCAST_QUERY_INTVL
This patch implements support for the IFLA_BR_MCAST_QUERY_INTVL attribute
in iproute2 so it can change the query interval.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-02-09 10:42:03 -08:00
Nikolay Aleksandrov 1f2244b851 iplink: bridge: add support for IFLA_BR_MCAST_QUERIER_INTVL
This patch implements support for the IFLA_BR_MCAST_QUERIER_INTVL
attribute in iproute2 so it can change the querier interval.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-02-09 10:42:03 -08:00
Nikolay Aleksandrov 7f3d559226 iplink: bridge: add support for IFLA_BR_MCAST_MEMBERSHIP_INTVL
This patch implements support for the IFLA_BR_MCAST_MEMBERSHIP_INTVL
attribute in iproute2 so it can change the membership interval.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-02-09 10:42:03 -08:00
Nikolay Aleksandrov 10082a253f iplink: bridge: add support for IFLA_BR_MCAST_LAST_MEMBER_INTVL
This patch implements support for the IFLA_BR_MCAST_LAST_MEMBER_INTVL
attribute in iproute2 so it can change the last member interval.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-02-09 10:42:03 -08:00
Nikolay Aleksandrov ceb6486655 iplink: bridge: add support for IFLA_BR_MCAST_STARTUP_QUERY_CNT
This patch implements support for the IFLA_BR_MCAST_STARTUP_QUERY_CNT
attribute in iproute2 so it can change the startup query count.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-02-09 10:42:03 -08:00
Nikolay Aleksandrov fb44cadb92 iplink: bridge: add support for IFLA_BR_MCAST_LAST_MEMBER_CNT
This patch implements support for the IFLA_BR_MCAST_LAST_MEMBER_CNT
attribute in iproute2 so it can change the last member count value.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-02-09 10:42:03 -08:00
Nikolay Aleksandrov 8b9eb7cd25 iplink: bridge: add support for IFLA_BR_MCAST_HASH_MAX
This patch implements support for the IFLA_BR_MCAST_HASH_MAX attribute
in iproute2 so it can change the maximum hashed entries.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-02-09 10:42:03 -08:00
Nikolay Aleksandrov 92c0ef7071 iplink: bridge: add support for IFLA_BR_MCAST_HASH_ELASTICITY
This patch implements support for the IFLA_BR_MCAST_HASH_ELASTICITY
attribute in iproute2 so it can change the hash elasticity value.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-02-09 10:42:03 -08:00
Nikolay Aleksandrov 0778b74122 iplink: bridge: add support for IFLA_BR_MCAST_QUERIER
This patch implements support for the IFLA_BR_MCAST_QUERIER attribute
in iproute2 so it can toggle the mcast querier value.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-02-09 10:42:03 -08:00
Nikolay Aleksandrov 449843d1d6 iplink: bridge: add support for IFLA_BR_MCAST_QUERY_USE_IFADDR
This patch implements support for the IFLA_BR_MCAST_QUERY_USE_IFADDR
attribute in iproute2 so it can toggle the multicast_query_use_ifaddr value.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-02-09 10:42:03 -08:00
Nikolay Aleksandrov 7ddd2d946c iplink: bridge: add support for IFLA_BR_MCAST_SNOOPING
This patch implements support for the IFLA_BR_MCAST_SNOOPING attribute
in iproute2 so it can change the multicast snooping value.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-02-09 10:42:03 -08:00
Nikolay Aleksandrov 963d137cf9 iplink: bridge: add support for IFLA_BR_MCAST_ROUTER
This patch implements support for the IFLA_BR_MCAST_ROUTER attribute
in iproute2 so it can change the multicast router value.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-02-09 10:42:03 -08:00
Nikolay Aleksandrov 719832af6c iplink: bridge: add support for IFLA_BR_VLAN_DEFAULT_PVID
This patch implements support for the IFLA_BR_VLAN_DEFAULT_PVID
attribute in iproute2 so it can change the default pvid.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-02-09 10:42:03 -08:00
Nikolay Aleksandrov 0a61aa3963 iplink: bridge: add support for IFLA_BR_GROUP_ADDR
This patch implements support for the IFLA_BR_GROUP_ADDR attribute
in iproute2 so it can change the group address.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-02-09 10:42:03 -08:00
Nikolay Aleksandrov 8caaf33bdb iplink: bridge: add support for IFLA_BR_GROUP_FWD_MASK
This patch implements support for the IFLA_BR_GROUP_FWD_MASK attribute
in iproute2 so it can change the group forwarding mask.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-02-09 10:42:03 -08:00
Nikolay Aleksandrov 8c0f7a1630 iplink: bridge: export read-only timers
Netlink already provides hello_timer, tcn_timer, topology_change_timer
and gc_timer, so let's make them visible.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-02-09 10:42:03 -08:00
Nikolay Aleksandrov 4e3bbc6658 iplink: bridge: export root_(port|path_cost), topology_change and change_detected
Netlink already exports these values; we just need to make them visible.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-02-09 10:42:03 -08:00
Nikolay Aleksandrov 70dfb0b883 iplink: bridge: export bridge_id and designated_root
Netlink returns the bridge_id and designated_root, we just need to
make them visible.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-02-09 10:42:03 -08:00
Roopa Prabhu a1987cd17f bridge: support for static fdb entries
There is no intuitive option to add static fdb entries today.
'temp' seems to have a side effect of adding
'static' fdb entries. But the name and intent
of 'temp' does not say anything about it being static.

example:
bridge fdb add operates as follows:

$bridge fdb add 00:01:02:03:04:05 dev eth0 master
$bridge fdb add 00:01:02:03:04:06 dev eth0 master temp
$bridge fdb add 00:01:02:03:04:07 dev eth0 master local

$bridge fdb show
00:01:02:03:04:05 dev eth0 permanent
00:01:02:03:04:06 dev eth0 static
00:01:02:03:04:07 dev eth0 permanent
00:01:02:03:04:08 dev eth0 <<== dynamic, ageable learned mac

This patch adds a new bridge fdb type 'static' which
makes sure NUD_NOARP and NUD_REACHABLE are set for static
entries. This effectively is nothing but what 'temp'
does today. But the name 'temp' is misleading.

After the patch:
$bridge fdb add 00:01:02:03:04:06 dev eth0 master static

$bridge fdb show
00:01:02:03:04:06 dev eth0 static

'temp' could ideally be a dynamic mac that can age (i.e. just
NUD_REACHABLE). But 'temp' sets 'NUD_NOARP' and 'NUD_REACHABLE'.
It is too late to change 'temp' now, but we are thinking of introducing a
'dynamic' keyword after this patch that only sets NUD_REACHABLE.

Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2016-02-07 11:41:09 -08:00
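The keyword-to-state mapping described in this commit, together with the 'dynamic' keyword introduced by the follow-up "bridge: add support for dynamic fdb entries" commit, can be summarized as a small lookup. This is a sketch, not the bridge/fdb.c code: fdb_state_from_keyword() is a hypothetical name, and only the keywords whose state bits the commit messages spell out are covered. The NUD_* values are those defined in <linux/neighbour.h>.

```c
#include <string.h>

/* Neighbour state bits, values as defined in <linux/neighbour.h>. */
#ifndef NUD_REACHABLE
#define NUD_REACHABLE 0x02
#endif
#ifndef NUD_NOARP
#define NUD_NOARP     0x40
#endif

/* Hypothetical helper: map an fdb type keyword to ndm_state bits as the
 * commit messages describe them. Returns -1 for an unknown keyword. */
static int fdb_state_from_keyword(const char *kw)
{
	if (strcmp(kw, "static") == 0 || strcmp(kw, "temp") == 0)
		return NUD_NOARP | NUD_REACHABLE;	/* 'temp' is a synonym */
	if (strcmp(kw, "dynamic") == 0)
		return NUD_REACHABLE;	/* ageable, like a learned entry */
	return -1;
}
```

The only difference between 'static' and 'dynamic' is NUD_NOARP, which is what keeps a static entry from aging out.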
Daniel Borkmann 5230a2ede0 tc, bpf: use bind/type macros from gelf
Don't reimplement them and rather use the macros from the gelf header,
that is, GELF_ST_BIND()/GELF_ST_TYPE().

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-02-07 11:27:38 -08:00
Daniel Borkmann a576c6b977 tc, bpf: give some more hints wrt false relos
Provide some more hints to the user/developer when relos have been found
that don't point to an ld64 imm instruction. I ran a couple of times into
relos generated by clang [1], where the compiler tried to uninline inlined
functions with eBPF and emitted BPF_JMP | BPF_CALL opcodes. If this seems
to be the case, give a hint that the user should work around it by using
the always_inline annotation.

  [1] https://llvm.org/bugs/show_bug.cgi?id=26243#c3

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-02-07 11:27:38 -08:00
Daniel Borkmann f31645d138 tc, bpf: improve verifier logging
With somewhat larger, branchy eBPF programs (e.g. already around
BPF_MAXINSNS/7 in size), bpf(2) quickly starts rejecting even valid
programs simply because the verifier log buffer we have in tc is too small.

Change that, so that by default we don't do any logging, and only in the
error case retry with logging enabled. If we then fail to provide a
reasonable dump of the verifier analysis, retry a few times with a larger
log buffer so that we can at least give the user a chance to debug the
program.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
2016-02-07 11:27:38 -08:00
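The retry strategy can be sketched as a loop around a loader callback. Everything here is hypothetical: the loader_fn type, the three-valued load_status, the starting size and the retry count are illustrative stand-ins for the real bpf(2) call and tc's constants, and fake_verifier() exists only to drive the example.

```c
#include <stdlib.h>
#include <string.h>

enum load_status { LOAD_OK, LOAD_REJECTED, LOAD_LOG_TRUNCATED };

/* Hypothetical loader callback; log == NULL means "load without logging". */
typedef enum load_status (*loader_fn)(char *log, size_t log_size);

#define LOG_SIZE_START 65536u	/* illustrative sizes, not tc's values */
#define LOG_MAX_TRIES  4

/* Try once without logging; only on failure retry with a log buffer,
 * doubling it while the log does not fit. *log_out receives a malloc'd
 * log (or NULL) that the caller must free. */
static enum load_status load_with_log_retry(loader_fn load, char **log_out)
{
	size_t size = LOG_SIZE_START;
	enum load_status st = load(NULL, 0);

	*log_out = NULL;
	if (st == LOAD_OK)
		return st;

	for (int i = 0; i < LOG_MAX_TRIES; i++, size *= 2) {
		char *log = realloc(*log_out, size);

		if (!log)
			break;
		*log_out = log;
		log[0] = '\0';
		st = load(log, size);
		if (st != LOAD_LOG_TRUNCATED)
			break;	/* loaded, or the log is complete */
	}
	return st;
}

/* Demo loader: always rejects the program and needs about 200000 bytes
 * to hold its log. */
static enum load_status fake_verifier(char *log, size_t log_size)
{
	if (!log)
		return LOAD_REJECTED;
	if (log_size < 200000)
		return LOAD_LOG_TRUNCATED;
	strcpy(log, "rejected: see log");
	return LOAD_REJECTED;
}
```

With fake_verifier, the 64 KiB and 128 KiB attempts overflow and the 256 KiB attempt returns the complete log. Accepted programs never pay for logging at all.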
Daniel Borkmann 92a36995b3 tc, bpf, examples: further bpf_api improvements
Add a couple of improvements to tc's BPF api, that facilitate program
development.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-02-07 11:27:38 -08:00
Paolo Abeni 9450c5ec63 geneve: add support for lwt tunnel creation and dst port selection
This change adds the ability to create lwt/flow based/externally
controlled geneve devices and to select the UDP destination port used
by a full geneve tunnel.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2016-02-05 11:55:06 +11:00
Nicolas Dichtel 67584e3ab2 tc: fix compilation with old gcc (< 4.6) (bis)
Commit 8f80d450c3 ("tc: fix compilation with old gcc (< 4.6)") was reverted
to ease the merge of the net-next branch.

Here is the new version.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-02-05 11:46:18 +11:00
Roopa Prabhu a9390c921a ipmonitor: match user option 'all' before 'all-nsid'
'ip monitor all' is broken on older kernels.
This patch fixes 'ip monitor all' to match
'all' and not 'all-nsid'.

It moves parsing arg 'all-nsid' to after parsing
'all'.

Before:
$ip monitor all
NETLINK_LISTEN_ALL_NSID: Protocol not available

After:
$ip monitor all
[NEIGH]Deleted 10.0.0.1 dev eth1 lladdr c4:54:44:4f:b2:dd STALE

Fixes: 449b824ad1 ("ipmonitor: allows to monitor in several netns")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2016-02-05 11:45:02 +11:00
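The ordering matters because iproute2 accepts abbreviated keywords through prefix matching. The sketch below assumes a matches()-style helper that returns 0 when the typed word is a prefix of the keyword; matches_sketch(), enum mon_opt and parse_monitor_opt() are hypothetical names, not the ipmonitor.c code. Under that rule "all" is also a valid abbreviation of "all-nsid", so "all" has to be tested first.

```c
#include <string.h>

/* Prefix matching: returns 0 when cmd is a prefix of pattern, so "mon"
 * matches "monitor" and "all" matches "all-nsid". */
static int matches_sketch(const char *cmd, const char *pattern)
{
	size_t len = strlen(cmd);

	if (len > strlen(pattern))
		return -1;
	return memcmp(pattern, cmd, len);
}

enum mon_opt { OPT_UNKNOWN, OPT_ALL, OPT_ALL_NSID };

/* Check "all" before "all-nsid"; in the reverse order, "all" would be
 * swallowed as an abbreviation of "all-nsid". */
static enum mon_opt parse_monitor_opt(const char *arg)
{
	if (matches_sketch(arg, "all") == 0)
		return OPT_ALL;
	if (matches_sketch(arg, "all-nsid") == 0)
		return OPT_ALL_NSID;
	return OPT_UNKNOWN;
}
```

"all-nsid" itself still reaches its own branch, because a word longer than "all" can never be a prefix of it.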
Daniel Borkmann 2486337aac tc, bpf: make sure relo is in relation with map section
Add a test that symbol from relocation entry is actually related
to map section and bail out with an error message if it's not the
case; in relation to [1].

  [1] https://llvm.org/bugs/show_bug.cgi?id=26243

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2016-02-02 16:04:11 +11:00
Gustavo Zacarias 4a36b4c2ec iproute2: fix building with musl
We need limits.h for PATH_MAX, fixes:

rt_names.c:364:13: error: ‘PATH_MAX’ undeclared (first use in this function)

Signed-off-by: Gustavo Zacarias <gustavo@zacarias.com.ar>
2016-02-02 15:58:33 +11:00
Zhang Shengju eb85526923 ip-link: remove warning message
The warning was:
iproute.c:301:12: warning: 'val' may be used uninitialized in this function [-Wmaybe-uninitialized]
   features &= ~RTAX_FEATURE_ECN;
            ^
iproute.c:575:10: note: 'val' was declared here
   __u32 val;
	  ^

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
2016-02-02 15:57:43 +11:00
Stephen Hemminger 62392ecbbb Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2 2016-02-02 15:57:23 +11:00
Lorenzo Colitti fb2594c183 ss: support closing inet sockets via SOCK_DESTROY.
This patch adds a -K / --kill option to ss that attempts to
forcibly close matching sockets using SOCK_DESTROY.

Because ss typically prints sockets instead of acting on them,
and because the kernel only supports forcibly closing some types
of sockets, the output of -K is as follows:

- If closing the socket succeeds, the socket is printed.
- If the kernel does not support forcibly closing this type of
  socket (e.g., if it's a UDP socket, or a TIME_WAIT socket),
  the socket is silently skipped.
- If an error occurs (e.g., permission denied), the error is
  reported and ss exits.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
2016-01-18 11:47:03 -08:00
Lorenzo Colitti 57fdf2d4d9 libnetlink: don't print NETLINK_SOCK_DIAG errors in rtnl_talk
This change is a no-op, as currently no code uses rtnl_talk on
NETLINK_SOCK_DIAG_BY_FAMILY sockets. It is needed to suppress
spurious errors when using SOCK_DESTROY via rtnl_talk.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
2016-01-18 11:47:03 -08:00
Thomas Faivre 1ab0f02f46 ip-link: fix man page warnings
The groff wrapper returns warnings when parsing the ip-link.8.in file.

How to reproduce:
$ man --warnings ip-link > /dev/null
`R' is a string (producing the registered sign), not a macro.
[...]

Signed-off-by: Thomas Faivre <thomas.faivre@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2016-01-18 11:45:02 -08:00
Thomas Faivre 5cd64c979f vxlan: fix help and man text
Options 'group' and 'remote' cannot take 'any' as value but 'local' can.

Signed-off-by: Thomas Faivre <thomas.faivre@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2016-01-18 11:44:22 -08:00
Daniel Borkmann 8187b01273 tc, bpf: more header checks on loading elf
The eBPF llvm backend can support different BPF formats; make sure the object
we're trying to load matches with regard to endianness and, while at it, also
check other attributes related to BPF ELFs.

  # llc --version
  LLVM (http://llvm.org/):
    LLVM version 3.8.0svn
    Optimized build.
    Built Jan  9 2016 (02:08:10).
    Default target: x86_64-unknown-linux-gnu
    Host CPU: ivybridge

    Registered Targets:
      bpf    - BPF (host endian)
      bpfeb  - BPF (big endian)
      bpfel  - BPF (little endian)
      [...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2016-01-18 11:41:27 -08:00
Daniel Borkmann cce3d4664c tc, bpf: check section names and type everywhere
When extracting sections, we had better check both name and type. I
noticed that some llvm versions emit .strtab and .shstrtab (e.g. on
pre-3.7), while more recent ones only seem to emit .strtab. Thus, make sure
we get the right sections.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2016-01-18 11:41:27 -08:00
Daniel Borkmann 8f9afdd531 tc, clsact: add clsact frontend
Add the tc part for the kernel commit 1f211a1b929c ("net, sched: add
clsact qdisc"). Quoting example usage from that commit description:

  Example, adding qdisc:

  # tc qdisc add dev foo clsact
  # tc qdisc show dev foo
  qdisc mq 0: root
  qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
  qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
  qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
  qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
  qdisc clsact ffff: parent ffff:fff1

  Adding filters (deleting, etc works analogous by specifying ingress/egress):

  # tc filter add dev foo ingress bpf da obj bar.o sec ingress
  # tc filter add dev foo egress  bpf da obj bar.o sec egress
  # tc filter show dev foo ingress
  filter protocol all pref 49152 bpf
  filter protocol all pref 49152 bpf handle 0x1 bar.o:[ingress] direct-action
  # tc filter show dev foo egress
  filter protocol all pref 49152 bpf
  filter protocol all pref 49152 bpf handle 0x1 bar.o:[egress] direct-action

The ingress parent alias can also be used with ingress qdisc.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-01-18 11:41:27 -08:00
Daniel Borkmann 0d45c4b420 tc, ingress: clean up ingress handling a bit
Clean it up a bit. We can also get rid of some ugly ifdefs, as in our case
TC_H_INGRESS is always defined.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-01-18 11:41:27 -08:00
Stephen Hemminger 7321b7db6f update headers (post 4.4 merge window) 2016-01-18 09:40:13 -08:00
Stephen Hemminger 2505780c20 Merge branch 'net-next' 2016-01-18 09:37:45 -08:00
Stephen Hemminger bc223ab861 Revert "tc: fix compilation with old gcc (< 4.6)"
This reverts commit 8f80d450c3.
2016-01-18 09:37:38 -08:00
Richard Alpe f9dec657e4 tipc: add peer remove functionality
This enables a user to remove an offline peer from the kernel data
structures. This could for example be useful when deliberately scaling
in peer nodes in a cloud environment.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
2016-01-11 08:39:15 -08:00
Stephen Hemminger 92a0236a3c v4.4.0 2016-01-11 08:33:03 -08:00
Stephen Hemminger 19ec5f8393 Revert "tipc: add peer remove functionality"
This reverts commit d4585a4bb1.
This commit is meant for later kernel.
2016-01-11 08:31:46 -08:00
Jamal Hadi Salim 488b41d020 tc: flower no need to specify the ethertype
Since all tc classifiers are required to specify the ethertype as part of
the grammar, there is no need to specify it again in flower. Allowing
eth_type creates a contradiction, for example when a user specifies:
tc filter add ... priority xxx protocol ip flower eth_type ipv6
This patch removes that contradiction by no longer allowing eth_type to
be specified.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-01-11 08:24:01 -08:00
Julien Floret 8f80d450c3 tc: fix compilation with old gcc (< 4.6)
gcc < 4.6 does not handle C11 syntax for the static initialization of
anonymous struct/union, hence the following error:

tc_bpf.c:260: error: unknown field map_type specified in initializer

Signed-off-by: Julien Floret <julien.floret@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2016-01-11 08:23:36 -08:00
Roopa Prabhu f921f567d1 iplink: replace exit with return
This patch replaces exits with returns in iplink
command. Helps to continue on errors when
invoked with ip -force -batch.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2016-01-11 08:23:27 -08:00
Phil Sutter de7db5d857 tc: m_connmark: Fix help text
When specifying a conntrack zone, the 'zone' keyword has to be used
before the actual zone index.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-01-07 10:35:08 -08:00
Stephen Hemminger e947d8947d man: fix whatis for fq
The FQ man page was not following whatis formatting rules.
2016-01-06 10:29:06 -08:00
Richard Alpe d4585a4bb1 tipc: add peer remove functionality
This enables a user to remove an offline peer from the kernel data
structures. This could for example be useful when deliberately scaling
in peer nodes in a cloud environment.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
2016-01-06 09:24:25 -08:00
Richard Alpe 0257369837 tipc: fix help text spelling error in node.c 2016-01-06 09:24:25 -08:00
Bjørn Mork 8f0777a857 man: iplink: document new addrgenmodes
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
2016-01-06 09:20:59 -08:00
Bjørn Mork 8e12bc0a9d iplink: support show and set of "addrgenmode random"
"random" is a new IPv6 addrgenmode, enabling "stable_secret" type
addresses with an auto-generated secret.

$ ip link set eth0 addrgenmode random

$ ip -d link show dev eth0
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:21:86:a3:25:7d brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode random

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
2016-01-06 09:20:59 -08:00
Bjørn Mork 8e098dd81a iplink: support setting addrgenmode stable_secret
It is possible to switch to another addrgenmode after setting a
valid secret.  Allow switching back without reconfiguring the
secret for completeness.

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
2016-01-06 09:20:59 -08:00
Stephen Hemminger a4c89d8087 update most kernel headers
still have issues with xtables
2016-01-06 09:14:29 -08:00
Stephen Hemminger 5cd1adba79 Update to current iptables headers
Keep in sync with current iptables upstream
2016-01-03 15:14:27 -08:00
Stephen Hemminger c13b6b097a add coverity model file
Track any coverity overrides for this project.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2015-12-30 18:06:12 -08:00
Stephen Hemminger b90b773ca6 lnstat: fix error handling
Error handling was silent and had leaks.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2015-12-30 17:28:11 -08:00
Stephen Hemminger e49b51d663 monitor: fix file handle leak
In some cases passing file to monitor left file open.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2015-12-30 17:26:38 -08:00
Stephen Hemminger b27f005b27 genl: make string const
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2015-12-30 17:17:45 -08:00
Hangbin Liu 3fbe7ca847 iproute2: ip-route.8.in: Add expires option for ip route
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2015-12-30 12:35:04 -08:00
Hangbin Liu 966fe23a7c iproute2: ip-route.8.in: Add missing '[' before 'pref'
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2015-12-30 12:35:04 -08:00
Hangbin Liu 68eede2505 route: allow routes to be configured with expire values
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2015-12-21 21:38:29 -08:00
Stephen Hemminger 5d3ec43849 Merge branch 'master' into net-next 2015-12-21 21:37:21 -08:00
Phil Sutter f8fc1d101e iptunnel: Fix compile error in ip/tunnel.c
I repeatedly failed to get this right, so now I have to clean up my mess
afterwards.

Fixes: 7d6aadcd0a ("ip{,6}tunnel: have a shared stats parser/printer")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-12-21 21:33:51 -08:00
Phil Sutter 7d6aadcd0a ip{,6}tunnel: have a shared stats parser/printer
This has a slight side-effect of not aborting when /proc/net/dev is
malformed, but OTOH stats are not parsed for uninteresting interfaces.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-12-18 11:46:21 -08:00
Paolo Abeni d95cdcf52b lwtunnel: implement support for ip6 encap
Currently ip6 encap support for lwtunnel is missing.
This patch implements it, mostly duplicating the ipv4 parts.

Also be sure to insert a space after the encap type, when
showing lwtunnel, to avoid the tunnel type and the following
argument being merged into a single word.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2015-12-18 11:40:32 -08:00
Paolo Abeni 926b39e1fe gre: add support for collect metadata flag
This patch adds support for IFLA_GRE_COLLECT_METADATA via the
'external' keyword to the gre link.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2015-12-18 11:40:32 -08:00
Paolo Abeni e79c327edd vxlan: add support for collect metadata flag
This patch adds support for IFLA_VXLAN_COLLECT_METADATA via the
'external' keyword to the vxlan link.

Also enforce mutual exclusion between 'vni' and 'external'.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2015-12-18 11:40:32 -08:00
Hannes Frederic Sowa 5c5176ce4b iproute: print addrgenmode stable_secret and fallback otherwise
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
2015-12-17 17:25:04 -08:00
Daniel Borkmann fd7f9c7fd1 bpf: minor fix in api and bpf_dump_error() usage
Fix a whitespace in bpf_dump_error() usage, and also a missing closing
bracket in ntohl() macro for eBPF programs.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-12-17 17:22:25 -08:00
Stephen Hemminger 741c20b024 include: update kernel headers
Current headers for net-next
2015-12-17 17:21:53 -08:00
Stephen Hemminger 00a2a1748b Merge branch 'master' into net-next 2015-12-17 17:21:15 -08:00
Paolo Abeni f0df40810f lwtunnel: fix argument parsing
Currently parse_encap_ip() does not correctly update argv/argc;
if multiple lwtunnel arguments are provided, the parsing fails after
the first one, i.e.

 ip route add 172.16.101.0/24 dev vxlan1 encap ip id 42 dst 192.168.255.1

fails with:

 Error: either "to" is duplicate, or "dst" is a garbage.

This commit addresses the issue, stepping to next argument at each iteration
of the parsing loop.

Fixes: 1e5293056a ("lwtunnel: Add encapsulation support to ip route")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2015-12-17 17:16:02 -08:00
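A minimal sketch of the loop shape this fix describes, assuming a hypothetical `parse_encap_sketch()` and a simplified `struct encap_sketch` (the real parse_encap_ip() uses iproute2's own argument helpers and netlink attributes). Every keyword branch steps past its value, each iteration steps to the next keyword, and parsing stops at the first unknown word so the caller can handle it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified view of the "encap ip" attributes. */
struct encap_sketch {
	unsigned long id;
	const char *dst;
	int ttl;
};

/*
 * Consume "id N", "dst ADDR" and "ttl N" pairs from argv.
 * Returns the number of argv entries consumed, or -1 if a keyword
 * has no value. Unknown words are left for the caller.
 */
static int parse_encap_sketch(int argc, char **argv, struct encap_sketch *e)
{
	int consumed = 0;

	while (argc > 0) {
		const char *key = *argv;

		if (strcmp(key, "id") != 0 && strcmp(key, "dst") != 0 &&
		    strcmp(key, "ttl") != 0)
			break;		/* not ours: stop, caller parses it */

		if (argc < 2)
			return -1;	/* keyword without a value */
		argv++; argc--; consumed++;	/* step to the value */

		if (strcmp(key, "id") == 0)
			e->id = strtoul(*argv, NULL, 0);
		else if (strcmp(key, "dst") == 0)
			e->dst = *argv;
		else
			e->ttl = atoi(*argv);

		argv++; argc--; consumed++;	/* step to the next keyword */
	}
	return consumed;
}
```

With the arguments from the example above (`id 42 dst 192.168.255.1` followed by other route options), the sketch consumes four words and leaves the rest untouched.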
Phil Sutter ed6b8652f7 route: Fix printing of locked entries
Commit 0f7543322c ("route: ignore RTAX_HOPLIMIT of value -1")
accidentally reordered fprintf statements. This patch restores the
original ordering.

Fixes: 0f7543322c ("route: ignore RTAX_HOPLIMIT of value -1")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-12-17 17:07:07 -08:00
Konstantin Khlebnikov e834eb8eba ip neigh: device is optional for proxy entries
However, dumping such entries crashes current kernels.

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
2015-12-17 17:07:07 -08:00
Tom Herbert 5866bddd9a ila: Add support for ILA lwtunnels
This patch:
 - Adds a utility function for parsing a 64 bit address
 - Adds a utility function for converting a 64 bit address to ASCII
 - Adds an ILA encap type to lwt tunnels

Signed-off-by: Tom Herbert <tom@herbertland.com>
2015-12-17 17:07:07 -08:00
Daniel Borkmann 41d6e33fc9 examples, bpf: further improve examples
Improve the example files further and add a more generic set of
helpers that they can use.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2015-12-10 08:56:45 -08:00
Stephen Hemminger 6ad355ca9e Merge branch 'master' into net-next 2015-12-10 08:56:18 -08:00
Stephen Hemminger 654ae881de ip: fix format string when reading statistics
The tunnel code was doing sscanf(buf, "%ld", &x) where x was unsigned
long.
2015-12-10 08:52:10 -08:00
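A small sketch of the corrected conversion, assuming a hypothetical helper `read_counter()`: an `unsigned long` target needs the `%lu` conversion, while `%ld` expects a pointer to a signed `long`.

```c
#include <stdio.h>

/* Parse one unsigned statistics counter from a text buffer.
 * %lu matches unsigned long; %ld would expect a signed long. */
static int read_counter(const char *buf, unsigned long *x)
{
	return sscanf(buf, "%lu", x) == 1 ? 0 : -1;
}
```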
Phil Sutter b08b5ff128 tc.8: Fix reference to tc-tcindex.8
Just a typo; it's spelled correctly in the SEE ALSO section.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-12-10 08:48:07 -08:00
David Ahern 8a23f82045 vrf: Add support for table names
Currently, the table id for VRF devices requires an integer. Convert
it to use rtnl_rttable_a2n which handles table names from the iproute2
directory.

This also fixes a bug in the original commit where table names are not
properly handled.

Fixes: 15faa0a30b ("add support for VRF device")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2015-12-10 08:45:30 -08:00
Nicolas Dichtel ed108cfc02 libnetlink: don't confuse variables in rtnl_talk()
There are two variables named 'len' in rtnl_talk(). In fact, commit
c079e121a7 didn't work. For example, it was possible to trigger
a seg fault with this command:
$ ip link set gre2 type ip6gre hoplimit 32

Let's rename the argument len to maxlen.

Fixes: c079e121a7 ("libnetlink: add size argument to rtnl_talk")
Reported-by: Thomas Faivre <thomas.faivre@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2015-12-10 08:45:21 -08:00
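The body of rtnl_talk() is not shown here, so the following is only a hypothetical reduction (`copy_answer()` is an illustrative name, not an iproute2 function). It shows why the caller's buffer size and the received message length should live in distinctly named variables: when an inner `len` shadows the argument, a bound like the one below silently uses the wrong value.

```c
#include <stddef.h>
#include <string.h>

/*
 * maxlen: size of the caller's answer buffer (the renamed argument).
 * len:    length of the received message (the local in the receive path).
 * Keeping both names distinct makes the bound below unambiguous.
 */
static size_t copy_answer(void *answer, size_t maxlen,
			  const void *msg, size_t len)
{
	size_t n = len < maxlen ? len : maxlen;	/* never overrun the answer buffer */

	memcpy(answer, msg, n);
	return n;
}
```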
Phil Sutter 0f7543322c route: ignore RTAX_HOPLIMIT of value -1
Older kernels use -1 internally as indicator to use the sysctl default,
but they still export the setting. Newer kernels use 0 to indicate that
(which is why the conversion from -1 to 0 was done here), but they also
stopped exporting the value. Since the meaning of -1 is clear, treat it
the same as the default on newer kernels (which is to not print anything).

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-12-10 08:45:11 -08:00
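A minimal sketch of the printing decision described above, assuming a hypothetical helper `hoplimit_is_explicit()` that receives the RTAX_HOPLIMIT metric as a 32-bit value. Both -1 (older kernels) and 0 (the value it used to be converted to) mean "use the sysctl default", so neither is printed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true only for an explicitly configured hoplimit.
 * When read as a u32, -1 arrives as 0xffffffff. */
static bool hoplimit_is_explicit(uint32_t hoplimit)
{
	return hoplimit != 0 && hoplimit != (uint32_t)-1;
}
```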
Stephen Hemminger a96a5d94c6 iptunnel: cleanup code
Make iptunnel pass checkpatch (mostly).
2015-11-29 12:05:39 -08:00
Konstantin Shemyak cc9c1dfaee ip_tunnel: determine tunnel address family from the tunnel type
On 24.11.2015 02:26, Stephen Hemminger wrote:
> On Thu, 12 Nov 2015 21:10:08 +0000
> Konstantin Shemyak <konstantin@shemyak.com> wrote:
>
>> When creating an IP tunnel over IPv6, the address family must be passed in
>> the option, e.g.
>>
>> ip -6 tunnel add mode ip6gre local 1::1 remote 2::2
>>
>> This makes it impossible to create both IPv4 and IPv6 tunnels in one batch.
>>
>> In fact the address family option is redundant here, as each tunnel mode is
>> relevant for only one address family.
>> The patch determines whether the applicable address family is AF_INET6
>> instead of the default AF_INET and makes the "-6" option unnecessary for
>> "ip tunnel add".
>>
>> Signed-off-by: Konstantin Shemyak <konstantin@shemyak.com>
>> ---
>>   ip/iptunnel.c                          | 26 ++++++++++++++++++++++++++
>>   testsuite/tests/ip/tunnel/add_tunnel.t | 14 ++++++++++++++
>>   2 files changed, 40 insertions(+)
>>   create mode 100755 testsuite/tests/ip/tunnel/add_tunnel.t
>>
>> diff --git a/ip/iptunnel.c b/ip/iptunnel.c
>> index 78fa988..7826a37 100644
>> --- a/ip/iptunnel.c
>> +++ b/ip/iptunnel.c
>> @@ -629,8 +629,34 @@ static int do_6rd(int argc, char **argv)
>>          return tnl_6rd_ioctl(cmd, medium, &ip6rd);
>>   }
>>
>> +static int tunnel_mode_is_ipv6(char *tunnel_mode) {
>> +       char *ipv6_modes[] = {
>> +               "ipv6/ipv6", "ip6ip6",
>> +               "vti6",
>> +               "ip/ipv6", "ipv4/ipv6", "ipip6", "ip4ip6",
>> +               "ip6gre", "gre/ipv6",
>> +               "any/ipv6", "any"
>> +       };
>> +       int i;
>> +
>> +       for (i = 0; i < sizeof(ipv6_modes) / sizeof(char *); i++) {
>> +               if (strcmp(ipv6_modes[i], tunnel_mode) == 0)
>> +                       return 1;
>> +       }
>> +       return 0;
>> +}
>> +
>
> The ipv6_modes table should be static const.

Thank you for the note! Attached is the corrected patch.

> Also is it possible to use strstr for ipv6 and ip6 or even strchr(tunnel_mode, '6')
> to simplify this?

There is the IPv6 tunnel mode 'any', and the IPv4 tunnel mode 'ipv6/ip'
(aka 'sit'). It seems to me that attempts at substring matching would
not make the code much shorter, but would definitely make it less readable.

Konstantin Shemyak.

From 42d27db0055c3a114fe6eb86d680bef9ec098ad4 Mon Sep 17 00:00:00 2001
From: Konstantin Shemyak <konstantin@shemyak.com>
Date: Thu, 12 Nov 2015 20:52:02 +0200
Subject: [PATCH] Tunnel address family is determined from the tunnel mode

When the tunnel mode already tells the IP address family, "ip tunnel"
command determines it and does not require option "-4"/"-6" to be passed.

This makes it possible to create both IPv4 and IPv6 tunnels in one batch.

Signed-off-by: Konstantin Shemyak <konstantin@shemyak.com>
2015-11-29 11:57:21 -08:00
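A sketch of the helper from the quoted patch with the review comment applied: the mode table becomes `static const` (so it is not rebuilt on every call and cannot be modified) and the element count uses an `ARRAY_SIZE`-style macro. The table contents are copied from the quoted patch; the macro definition is an assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

#ifndef ARRAY_SIZE
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#endif

static int tunnel_mode_is_ipv6(const char *tunnel_mode)
{
	/* Modes that only make sense with an IPv6 outer header. */
	static const char * const ipv6_modes[] = {
		"ipv6/ipv6", "ip6ip6",
		"vti6",
		"ip/ipv6", "ipv4/ipv6", "ipip6", "ip4ip6",
		"ip6gre", "gre/ipv6",
		"any/ipv6", "any"
	};
	size_t i;

	for (i = 0; i < ARRAY_SIZE(ipv6_modes); i++) {
		if (strcmp(ipv6_modes[i], tunnel_mode) == 0)
			return 1;
	}
	return 0;
}
```

An exact-match table also handles the cases raised in the thread: 'any' is IPv6 even though it contains no '6', and 'sit' is IPv4.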
Daniel Borkmann 0b7e3fc8f1 {f,m}_bpf: add more example code
I've added three examples to examples/bpf/ that demonstrate how one can
implement eBPF tail calls in tc with, e.g., multiple levels of nesting.
That should act as a good starting point, but also as test cases for the
ELF loader and kernel. A real test suite for {f,m,e}_bpf is still to be
developed in future work.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2015-11-29 11:55:16 -08:00
Daniel Borkmann 91d88eeb10 {f,m}_bpf: allow updates on program arrays
Since we have all infrastructure in place now, allow atomic live updates
on program arrays. This can be very useful when programs that are
being tail-called need to be replaced, e.g. when classifier functionality
needs to be changed or new protocols are added/removed at runtime.

This provides a way for in-place code updates. A minimal example: given
an object file cls.o that contains the entry point in section 'classifier',
has a globally pinned program array 'jmp' with 2 slots and id 0, and
two tail-called programs under sections '0/0' (prog array key 0) and '0/1'
(prog array key 1), where the loader's section encoding is <id/key>,
adding the filter loads everything into cls_bpf:

  tc filter add dev foo parent ffff: bpf da obj cls.o

Now, the program under section '0/1' needs to be replaced with an updated
version that resides in the same section (the full path to tc's subfolder
of the mount point can also be passed, e.g. /sys/fs/bpf/tc/globals/jmp):

  tc exec bpf graft m:globals/jmp obj cls.o sec 0/1

In case the program resides under a different section 'foo', it can also
be injected into the program array like:

  tc exec bpf graft m:globals/jmp key 1 obj cls.o sec foo

If the new tail called classifier program is already available as a pinned
object somewhere (here: /sys/fs/bpf/tc/progs/parser), it can be injected
into the prog array like:

  tc exec bpf graft m:globals/jmp key 1 fd m:progs/parser

In the kernel, the program on key 1 is atomically replaced and the
old one's refcount is dropped.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2015-11-29 11:55:16 -08:00
Daniel Borkmann f6793eec46 {f, m}_bpf: allow for user-defined object pinnings
The recently introduced object pinning can be further extended in order
to allow sharing maps beyond the tc namespace. For example, maps that are
pinned from the tracing side can be accessed through this facility as well.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2015-11-29 11:55:16 -08:00
Daniel Borkmann 9e607f2e72 {f, m}_bpf: check map attributes when fetching as pinned
Make use of the new show_fdinfo() facility and verify that, when a
pinned map is fetched, its basic attributes are the same as those of
the map we declared in the ELF file. Collisions could occur when maps
are placed into the global namespace; in such a case, warn the user and
bail out.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2015-11-29 11:55:16 -08:00
Daniel Borkmann 910b543dcc {f,m}_bpf: make tail calls working
Now that we have the possibility of sharing maps, it's time we get the
ELF loader fully working with regard to tail calls. Since program array
maps are pinned, we can finally keep them alive. I've noticed two bugs
that are fixed in bpf_fill_prog_arrays() with this patch. Example
code comes as a follow-up.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2015-11-29 11:55:16 -08:00
Stephen Hemminger fece33c195 Merge branch 'master' into net-next 2015-11-29 11:53:43 -08:00
Tom Herbert 35f59d862f vxlan: Add support for remote checksum offload
This patch adds support for remote checksum offload
to VXLAN. This patch adds remcsumtx and remcsumrx to ip vxlan
configuration to enable remote checksum offload for transmit
and receive on the VXLAN tunnel.

https://tools.ietf.org/html/draft-herbert-vxlan-rco-00

Example:

ip link add name vxlan0 type vxlan id 42 group 239.1.1.1 dev eth0 \
    udpcsum remcsumtx remcsumrx

Testing:

Ran a single netperf over mlnx4 to illustrate the effect:

- Without RCO (UDP csum set to zero)
  4335.99 Mbps
- With RCO enabled
  7661.81 Mbps

Signed-off-by: Tom Herbert <tom@herbertland.com>
2015-11-29 11:53:02 -08:00
Phil Sutter 61170fd88d get rid of unnecessary fgets() buffer size limitation
fgets() will read at most size-1 bytes into the buffer and add a
terminating null-char at the end. Therefore it is not necessary to pass
a reduced buffer size when calling it.

This change was generated using the following semantic patch:

@@
identifier buf, fp;
@@
- fgets(buf, sizeof(buf) - 1, fp)
+ fgets(buf, sizeof(buf), fp)

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-29 11:48:24 -08:00
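A short demonstration of the fgets() guarantee the semantic patch relies on, using a hypothetical helper `read_first_line()`: with the full `sizeof(buf)`, fgets() stores at most `size - 1` characters and always appends the terminating null byte, so subtracting one more only wastes a byte.

```c
#include <stdio.h>
#include <string.h>

/* Read the first line (or as much as fits) into buf.
 * fgets() writes at most size - 1 chars plus '\0'. */
static size_t read_first_line(FILE *fp, char *buf, int size)
{
	if (fgets(buf, size, fp) == NULL)
		return 0;
	return strlen(buf);
}
```

With an 8-byte buffer and a longer input line, the sketch returns 7 characters and `buf[7]` holds the terminator.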
Phil Sutter d572ed4d0a get rid of remaining -Wunused-result warnings
Although checking return codes in these spots is not strictly
necessary, silencing the warnings will bring any new ones into focus.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-29 11:48:24 -08:00
Phil Sutter c29d37925a ss: review is_ephemeral()
No need to keep static port boundaries global, as they are not used
directly. Keeping them local also allows safely reducing their names to
the minimum. Assign hardcoded fallback values also if fscanf() fails.
Get rid of unnecessary parentheses around the return value.

Instead of more or less duplicating is_ephemeral() in run_ssfilter(),
simply call the function.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-29 11:48:24 -08:00
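A sketch of the reviewed function's shape, not the ss source: the bounds are static locals read once, and the hardcoded fallback is also used when fscanf() fails. The name `is_ephemeral_sketch()`, passing the already-opened FILE (in ss the range comes from /proc/sys/net/ipv4/ip_local_port_range), and the fallback range 1024-4999 are assumptions of this sketch.

```c
#include <stdio.h>

static int is_ephemeral_sketch(int port, FILE *range)
{
	/* Read the bounds once; keep them local to this function. */
	static int min = 0, max;

	if (!min) {
		if (!range || fscanf(range, "%d %d", &min, &max) != 2) {
			min = 1024;	/* assumed fallback bounds */
			max = 4999;
		}
	}
	return port >= min && port <= max;
}
```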
Phil Sutter 596307ea3d ss: reduce max indentation level in init_service_resolver()
Exit early or continue on error instead of nesting conditionals, to
make the code a bit easier to read.

Also, the call to memcpy() can be skipped by initialising prog with the
desired prefix.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-29 11:48:24 -08:00
Phil Sutter db3ef44c54 lnstat: review lnstat_update()
Instead of calling rewind() and fgets() before every call to
scan_lines(), move them into scan_lines() itself.

This should also fix compat mode, as before the second call to
scan_lines() the first line was skipped unconditionally.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-29 11:48:24 -08:00
Phil Sutter fc31817d1f bridge.8: minor formatting cleanup
- Replace commas at end of subsection with dots.
- Replace double whitespace by single one.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-29 11:47:29 -08:00
Phil Sutter ea6cbab792 iproute: restrict hoplimit values to be in range [0; 255]
Technically, the range of possible hoplimit values is defined by the IPv4
and IPv6 header formats. Both define the field to be eight bits in size,
which leads to a value range of [0;255]. Setting a packet's hoplimit
field to 0, though, does not make much sense, as the next hop would
immediately drop the packet. Therefore Linux uses 0 as a special value
indicating to use the system's default hoplimit (configurable via
sysctl). In iproute, setting the hoplimit of a route to 0 is equivalent
to omitting the hoplimit parameter altogether, so it is actually not
necessary to allow that value to be specified, but keep it anyway for
backwards compatibility.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-29 11:47:29 -08:00
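A minimal sketch of the range check, assuming a hypothetical `parse_hoplimit()` built on strtoul() rather than iproute2's own number helpers. Values 0 through 255 are accepted, with 0 kept for backwards compatibility; anything larger, negative, or non-numeric is rejected.

```c
#include <errno.h>
#include <stdlib.h>

static int parse_hoplimit(const char *arg, unsigned int *out)
{
	char *end;
	unsigned long v;

	errno = 0;
	v = strtoul(arg, &end, 0);
	/* Reject empty input, trailing junk, overflow and values > 255.
	 * "-1" wraps to ULONG_MAX in strtoul() and fails the range check. */
	if (errno || end == arg || *end != '\0' || v > 255)
		return -1;
	*out = (unsigned int)v;
	return 0;
}
```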
Phil Sutter d81f54d599 iptoken: simplify iptoken_list a bit
Since it uses only a single filter, rtnl_dump_filter() can be used.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-29 11:47:29 -08:00
Phil Sutter 906dfe4887 ipaddress: drop unnecessary check in ipaddr_list_flush_or_save()
Right after ipaddr_reset_filter(), filter.family is always AF_UNSPEC.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-29 11:47:29 -08:00
Phil Sutter d25ec03e1d ipaddress: fix ipaddr_flush for Linux >= 3.1
Linux version 3.1 introduced a consistency check for netlink dumps in
commit 670dc28 ("netlink: advertise incomplete dumps"). This bites
iproute2 when flushing more addresses than can fit into a single
RTM_GETADDR response. To silence the spurious error message "Dump was
interrupted and may be inconsistent.", tell rtnl_dump_filter_l() to
ignore NLM_F_DUMP_INTR.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-29 11:47:29 -08:00
Phil Sutter 8e72880f6b libnetlink: introduce nc_flags
Allow for a filter to ignore certain nlmsg_flags.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-29 11:47:29 -08:00
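A sketch of how a per-filter "ignore these flags" mask can suppress the dump-interrupted check. `struct dump_filter_sketch` and `dump_was_interrupted()` are hypothetical stand-ins for the libnetlink filter and its check; only NLM_F_DUMP_INTR (0x10 in linux/netlink.h) is taken from the kernel API.

```c
#include <stdint.h>
#ifdef __linux__
#include <linux/netlink.h>
#endif
#ifndef NLM_F_DUMP_INTR
#define NLM_F_DUMP_INTR 0x10	/* value from linux/netlink.h */
#endif

/* nc_flags: nlmsg_flags this filter wants to ignore. */
struct dump_filter_sketch {
	uint16_t nc_flags;
};

/* Report an interrupted dump only if the filter does not ignore it. */
static int dump_was_interrupted(uint16_t nlmsg_flags,
				const struct dump_filter_sketch *f)
{
	return (nlmsg_flags & ~f->nc_flags & NLM_F_DUMP_INTR) != 0;
}
```

A flush can then pass a filter with NLM_F_DUMP_INTR in nc_flags, while other dumps keep the strict default of 0.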
Phil Sutter c6995c4802 ipaddress: simplify ipaddr_flush()
Since it's no longer relevant whether an IP address is primary or
secondary when flushing, ipaddr_flush() can be simplified a bit.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-29 11:47:29 -08:00
Stephen Hemminger 68ef507249 rt_names: style cleanup
Cleanup all checkpatch complaints about whitespace in rt_names.
2015-11-29 11:41:23 -08:00
David Ahern 13ada95da4 Add support for rt_tables.d
Add support for reading table id/name mappings from rt_tables.d
directory.

Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2015-11-29 11:29:31 -08:00
John W. Linville 906ac5437a geneve: add support for IPv6 link partners
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2015-11-23 16:23:11 -08:00
John W. Linville 6581df5ef3 geneve: add support for IPv6 link partners
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2015-11-23 16:21:55 -08:00
Daniel Borkmann 32e93fb7f6 {f,m}_bpf: allow for sharing maps
This larger work addresses one of the bigger remaining issues in
tc's eBPF frontend, namely, allowing for persistent file descriptors.
Whenever tc parses the ELF object, extracts and loads maps into the
kernel, these file descriptors will be out of reach after the tc
instance exits.

That is, for simple (unnested) programs which contain one or
multiple maps, the kernel holds a reference, and they will live
on inside the kernel until the program holding them is unloaded,
but they will be out of reach for user space, even worse with
(also multiple nested) tail calls.

For this issue, we introduced the concept of an agent that can
receive the set of file descriptors from the tc instance creating
them, in order to be able to further inspect/update map data for
a specific use case. However, while that is more tied towards
specific applications, it still doesn't easily allow for sharing
maps across multiple tc instances and would require a daemon to
be running in the background. For example, when a map should be shared by
two eBPF programs, one attached to ingress, one to egress, this
currently doesn't work with the tc frontend.

This work solves exactly that, i.e. if requested, maps can now be
_arbitrarily_ shared between object files (PIN_GLOBAL_NS) or within
a single object (but various program sections, PIN_OBJECT_NS) without
"losing" the file descriptor set. To make that happen, we use eBPF
object pinning introduced in kernel commit b2197755b263 ("bpf: add
support for persistent maps/progs") for exactly this purpose.

The shipped examples/bpf/bpf_shared.c code from this patch can be
easily applied, for instance, as:

 - classifier-classifier shared:

  tc filter add dev foo parent 1: bpf obj shared.o sec egress
  tc filter add dev foo parent ffff: bpf obj shared.o sec ingress

 - classifier-action shared (here: late binding to a dummy classifier):

  tc actions add action bpf obj shared.o sec egress pass index 42
  tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
  tc filter add dev foo parent 1: bpf bytecode '1,6 0 0 4294967295,' \
     action bpf index 42

The toy example increments a shared counter on egress and dumps its
value on ingress (if no sharing (PIN_NONE) would have been chosen,
map value is 0, of course, due to the two map instances being created):

  [...]
          <idle>-0     [002] ..s. 38264.788234: : map val: 4
          <idle>-0     [002] ..s. 38264.788919: : map val: 4
          <idle>-0     [002] ..s. 38264.789599: : map val: 5
  [...]

... thus if both sections reference the pinned map(s) in question,
tc will take care of fetching the appropriate file descriptor.

The patch has been tested extensively on both the classifier and
action sides.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-11-23 16:10:44 -08:00
Neil Horman e149d4e843 iproute2: Ignore EADDRNOTAVAIL errors during address flush operation
I recently found that, if I disable address promotion in the kernel,
ip addr flush dev <dev>

fails with an EADDRNOTAVAIL errno (though the flush operation does in fact
flush all addresses from the interface properly).

What's happening is that, if I add a primary and multiple secondary addresses to
an interface, the flush operation first enumerates them all with a GETADDR |
DUMP operation, then sends a delete request for each address. But the kernel,
having promotion disabled, deletes all secondary addresses when the primary is
removed. That means that several delete requests may still be pending in the
netlink request for addresses that have already been removed on our behalf,
resulting in EADDRNOTAVAIL return codes.

It seems the simplest thing to do is to recognize that EADDRNOTAVAIL isn't a
fatal outcome for a flush operation, as it just indicates that an address which
you want to remove has already been removed, so it can safely be ignored.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Stephen Hemminger <stephen@networkplumber.org>
CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
2015-11-23 15:59:08 -08:00
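A minimal sketch of the error classification described above, assuming a hypothetical `flush_delete_error_is_fatal()` that receives the positive errno value from a delete reply (netlink error messages carry it negated).

```c
#include <errno.h>

/* During a flush, EADDRNOTAVAIL only means the kernel already removed
 * the address (e.g. a secondary dropped with its primary). */
static int flush_delete_error_is_fatal(int err)
{
	return err != 0 && err != EADDRNOTAVAIL;
}
```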
Phil Sutter 6e2e2cf03a bridge.8: document fdb replace command
Despite commit 45a82e5 ("iproute vxlan add support for fdb replace
command"), the 'fdb replace' command was not mentioned in bridge.8.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:58:07 -08:00
Phil Sutter fdb347f7fd lnstat: fix header displaying mechanism
The algorithm relies on the loop counter ('i') incrementing by one in
each iteration. However, when running endlessly (count==0), the counter was
not incremented at all.

Also change formatting of the header printing conditional a bit so it's
hopefully easier to read.

Fixes: e7e2913 ("lnstat: run indefinitely by default")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:54:05 -08:00
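A reduced sketch of the counter logic, not the lnstat source: `count_headers()`, `hdr_every` and `max_iter` are hypothetical. The header decision uses `i % hdr_every`, so `i` must advance on every iteration, including when `count == 0` ("run forever"). `max_iter` exists only so the sketch terminates.

```c
/* Count how many headers would be printed, one every hdr_every lines. */
static int count_headers(int count, int hdr_every, int max_iter)
{
	int i, headers = 0;

	/* i advances unconditionally, even when count == 0. */
	for (i = 0; (count == 0 || i < count) && i < max_iter; i++) {
		if (hdr_every > 0 && i % hdr_every == 0)
			headers++;	/* a header would be printed here */
	}
	return headers;
}
```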
Phil Sutter 869fcabecc lnstat: describe -s option in help output
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:54:05 -08:00
Stephen Hemminger 0198930b55 update kernel headers to 4.4-rc1
Post merge window changes
2015-11-23 15:53:04 -08:00
Phil Sutter f7b49a3fc7 ip_common.h header cleanup
- Drop 'extern' keyword from all function prototypes.
- Make line breaking of print_* functions consistent.
- Make print_ntable() and ipntable_reset_filter() static and remove
  their declaration.
- Drop declaration of non-existent ipaddr_list() and iproute_monitor().

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:44:03 -08:00
Stephen Hemminger 23d6c997d9 misc: remove extra blank line 2015-11-23 15:42:34 -08:00
Stephen Hemminger 5699275b42 man8: scrub trailing whitespace
Remove extraneous whitespace
2015-11-23 15:41:37 -08:00
Ville Skyttä ac0817ef66 man: Spelling fixes
Signed-off-by: Ville Skyttä <ville.skytta@iki.fi>
2015-11-23 15:39:25 -08:00
Ville Skyttä 85e3c87c82 man: Syntax and warning fixes
Fix syntax issues and warnings highlighted by `man --warnings=w' from
man-db 2.7.1.

Signed-off-by: Ville Skyttä <ville.skytta@iki.fi>
2015-11-23 15:39:25 -08:00
Phil Sutter 04ce8d3eda ip{,6}tunnel: put spaces around non-unary operators
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:26:37 -08:00
Phil Sutter f53ecee818 iptunnel: sanitize copying tunnel name
Since p->name is only IFNAMSIZ bytes, do not copy more than IFNAMSIZ - 1
bytes into it so there remains at least a single null byte in the end.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:26:37 -08:00
Phil Sutter c957821b18 iptunnel: share common code when determining the default interface name
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:26:37 -08:00
Phil Sutter 0dd4d2b37f iptunnel: simplify parsing TTL, allow 'hlim' as identifier
Instead of parsing an unsigned integer and checking boundaries, simply
parse u8. This and the added ttl alias 'hlim' provide consistency with
ip6tunnel.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:26:37 -08:00
Phil Sutter 2520598a1a iptunnel: share common code when setting tunnel mode
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:26:37 -08:00
Phil Sutter 7894ce7722 ip6tunnel: fix coding style: no newline between brace and else
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:26:37 -08:00
Phil Sutter 9af72f819e ip6tunnel: print local/remote addresses like iptunnel does
This makes output consistent with iptunnel, also supporting reverse DNS
lookup for remote address if requested.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:26:37 -08:00
Phil Sutter c4527d7ba3 ip{,6}tunnel: align do_tunnels_list() a bit
In iptunnel, declare loop variables inside the loop as done in
ip6tunnel.

Fix and simplify goto logic in ip6tunnel:
- Failure to read over header lines would have left fp opened.
- By returning directly upon fopen() failure, fp can be closed
  unconditionally in the end.

Use the same goto logic in iptunnel, as well.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:26:37 -08:00
Phil Sutter 4b3cb96281 iptunnel: use ll_name_to_index() for physical interface lookup
Although the cache is only initialized in do_show(), this way it is at
least consistent with ip6tunnel.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:26:37 -08:00
Phil Sutter 6ddb1e8c90 ip{, 6}tunnel: unify behaviour if physical device is not found
Make ip6tunnel print an error message as well. While there, get rid of
unnecessary line breaking.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:26:37 -08:00
Phil Sutter a7ed1520ee ip/tunnel: introduce tnl_parse_key()
Instead of duplicating the same code six times (key, ikey and okey in
iptunnel and ip6tunnel), have a common parsing routine. This has the
added benefit of having the same verbose error message in ip6tunnel as
well as iptunnel.

I'm not sure if parsing an IPv4 address as key makes sense for
ip6tunnel, but the code was there before so this patch at least doesn't
make it worse.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:26:37 -08:00
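A hypothetical sketch of a shared key parser in the spirit of tnl_parse_key(); the name `parse_key_sketch()` and its exact rules are assumptions, not the iproute2 code. A dotted-quad key is taken as an IPv4 address, already in network byte order, while a plain number is converted with htonl(), so "1.2.3.4" and 16909060 (0x01020304) yield the same key.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Parse a GRE-style key into network byte order. Returns 0 or -1. */
static int parse_key_sketch(const char *arg, uint32_t *key)
{
	if (strchr(arg, '.')) {
		struct in_addr a;

		if (inet_pton(AF_INET, arg, &a) != 1)
			return -1;
		*key = a.s_addr;	/* already network byte order */
	} else {
		char *end;
		unsigned long v = strtoul(arg, &end, 0);

		if (end == arg || *end != '\0' || v > 0xffffffffUL)
			return -1;
		*key = htonl((uint32_t)v);
	}
	return 0;
}
```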
Phil Sutter 8de592d05c ip{, 6}tunnel: get rid of extraneous whitespace when printing
Put whitespace in the beginning of optional parts, not as suffix
anywhere. Also drop double whitespaces in between words.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-23 15:26:37 -08:00
Aaro Koskinen caf8875b3c misc/Makefile: use PKG_CONFIG
Use PKG_CONFIG from Config - it works better when cross-compiling.

Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
2015-11-23 15:25:50 -08:00
Stephen Hemminger 115b4d8873 Merge branch 'master' into net-next 2015-11-03 16:38:15 -08:00
Stephen Hemminger 6720eceff7 v4.3.0 2015-11-03 16:34:46 -08:00
Stephen Hemminger 1e5aa99024 Merge branch 'master' into net-next 2015-11-03 16:31:57 -08:00
Phil Sutter b5bb1820e8 lib/utils: improve error messages of get_addr() and get_prefix()
Instead of statically complaining about illegal inet address, use
get_family() to get the address family right.

Based on a patch by Hangbin Liu to print "inet6" for AF_INET6 made more
generic by me.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-11-03 16:28:36 -08:00
Phil Sutter bd5bbad450 bridge: fdb: minor syntax fix in help text 2015-11-03 16:27:39 -08:00
Phil Sutter 17c53fcd2c ifstat: add manpage 2015-11-03 16:27:39 -08:00
Phil Sutter 7124942942 genl: add manpage 2015-11-03 16:27:39 -08:00
Phil Sutter 958cd21094 ifcfg: add manpage 2015-11-03 16:27:39 -08:00
Stephen Hemminger 037660b351 qfq: fix parse_opt dead code
Fix Coverity warning from dead code.
2015-10-27 15:46:20 +09:00
Stephen Hemminger dddf1b4412 add new IFLA_VF_TRUST netlink attribute 2015-10-23 15:47:07 -07:00
Stephen Hemminger 86c392f958 Merge branch 'master' into net-next 2015-10-23 15:46:08 -07:00
Stephen Hemminger 1473bda921 misc: cleanup extra whitespace
No blank lines at end of file
2015-10-23 15:44:30 -07:00
Stephen Hemminger 753ef5bbd6 tc: remove extra whitespace
No blank lines at EOF, or trailing whitespace.
2015-10-23 15:43:28 -07:00
Stephen Hemminger f7520a1998 ip: remove extra newlines at end-of-file
Shouldn't have extra blank lines.
2015-10-23 15:41:58 -07:00
Phil Sutter a257bc7b4c tc: ship filter man pages and refer to them in tc.8
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Werner Almesberger <werner@almesberger.net>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-10-23 15:39:28 -07:00
Phil Sutter f15a23966f tc: add a man page for u32 filter
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-10-23 15:39:28 -07:00
Phil Sutter fc7a72f1eb tc: add a man page for tcindex filter
Cc: Werner Almesberger <werner@almesberger.net>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-10-23 15:37:26 -07:00
Phil Sutter 02dddd6110 tc: add a man page for route filter
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-10-23 15:37:26 -07:00
Phil Sutter 49891ba177 tc: add a man page for fw filter
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-10-23 15:37:26 -07:00
Phil Sutter b3aa12a401 tc: add a man page for flower filter
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-10-23 15:37:26 -07:00
Phil Sutter 334ddc9b4d tc: add a man page for flow filter
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-10-23 15:37:26 -07:00
Phil Sutter 5774f09ee8 tc: add a man page for cgroup filter
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-10-23 15:37:26 -07:00
Phil Sutter 55b35567ad tc: add a man page for basic filter
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-10-23 15:37:26 -07:00
Phil Sutter 40eb737ebb tc: u32 filter coding style cleanup
Add missing spaces around operators to increase readability. Aside from
that, make "preference" match a real synonym for "tos" and "dsfield" as
its effect was identical to theirs.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-10-23 15:37:26 -07:00
Phil Sutter 0a83e1eaf7 tc: improve filter help texts a bit
This fixes a few syntax errors and changes route filter help text to use
classid instead of flowid to be consistent with other filters' help
texts.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-10-23 15:37:26 -07:00
Stephen Hemminger c518d3a7f7 update bpf kernel header 2015-10-22 23:43:35 -07:00
Stephen Hemminger 651dccbee7 Merge branch 'master' into net-next 2015-10-22 23:42:37 -07:00
Daniel Borkmann d583e88ebc ip, realms: also allow to pass in raw realms value
If get_rt_realms() fails, try to get a possible raw u32 realms
value for the u32 RTA_FLOW/FRA_FLOW attribute, as it might be
useful to directly configure the hex value itself. And only if
that fails, then bail out.

The source realm is provided in the upper u16 (mask: 0xffff0000)
and the destination realm through the lower u16 part (mask:
0x0000ffff). This can be useful for tc's bpf realm matcher, but
also a full hex/mask param can be provided already for matching
through iptables' --realm cmdline option, for example.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-10-22 23:40:51 -07:00
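A small sketch of the raw realms layout described above: source realm in the upper 16 bits, destination realm in the lower 16 bits. The helper names are hypothetical; they only show how a raw u32 value maps to the two realms.

```c
#include <stdint.h>

/* Pack source (mask 0xffff0000) and destination (mask 0x0000ffff). */
static uint32_t realms_pack(uint16_t src, uint16_t dst)
{
	return ((uint32_t)src << 16) | dst;
}

static uint16_t realms_src(uint32_t realms)
{
	return (uint16_t)(realms >> 16);
}

static uint16_t realms_dst(uint32_t realms)
{
	return (uint16_t)(realms & 0xffff);
}
```

For example, source realm 1 and destination realm 2 correspond to the raw value 0x00010002.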
Stephen Hemminger 89bb4c6aca update kernel headers
Track upstream
2015-10-22 23:36:49 -07:00
Kirill Tkhai 2f4e171f7d Add ip rule save/restore
This patch adds save and restore commands to "ip rule",
similar to what commit f4ff11e3e2 did for "ip route".

The feature is useful in checkpoint/restore for container
migration, also it may be helpful in some normal situations.

Signed-off-by: Kirill Tkhai <ktkhai@odin.com>
2015-10-22 23:35:57 -07:00
Stephen Hemminger b89c359c15 Merge branch 'master' into net-next 2015-10-18 21:58:29 -07:00
Roopa Prabhu 8b21cef129 ip route get: change exit to return to support batch commands
replace exit with return -2 on rtnl_talk failure

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2015-10-18 21:57:46 -07:00
Wilson Kok 4d45bf3baf bridge: add calls to fflush in fdb and mdb print functions
This patch adds fflush in fdb and mdb print functions

Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2015-10-18 21:57:06 -07:00
Phil Sutter ccaf6eb5cc ip-rule: neither prohibit nor reject or unreachable flags exist
This has been inconsistent since the beginning of Git and seems to be
merely a documentation leftover, therefore just remove it from help
output and man page.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-10-18 21:57:01 -07:00
Phil Sutter f73105ab42 ss: return -1 if an unrecognized option was given
When getopt_long encounters an option which has not been registered, it
returns '?'. React upon that and call usage() instead of help() so ss
returns with a non-zero exit status.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-10-18 21:56:55 -07:00
Roopa Prabhu 70e4663472 ip-route man: add usage and description for lwtunnel encap attributes
This patch updates ip-route man page with lwtunnel encap
usage and description, covering MPLS and IP encapsulation.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Jiri Benc <jbenc@redhat.com>
2015-10-16 16:13:32 -07:00
Roopa Prabhu 1e5293056a lwtunnel: Add encapsulation support to ip route
This patch adds support to parse and print lwtunnel
encapsulation attributes attached to routes for MPLS
and IP tunnels.

Examples:

  MPLS:
  Add ipv4 route with mpls encap attributes:
  $ ip route add 40.1.2.0/30 encap mpls 200 via inet 40.1.1.1 dev eth3
  $ ip route show
  40.1.2.0/30  encap mpls 200 via 40.1.1.1 dev eth3

  Add ipv4 multipath route with mpls encap attributes:
  $ ip route add 10.1.1.0/30 nexthop encap mpls 200 via 10.1.1.1 dev eth0 \
		    nexthop encap mpls 700 via  40.1.1.2 dev eth3
  $ ip route show
  10.1.1.0/30
    nexthop encap mpls 200  via 10.1.1.1  dev eth0 weight 1
    nexthop encap mpls 700  via 40.1.1.2  dev eth3 weight 1

  IP:
  $ ip route add 10.1.1.1/24 encap ip id 200 dst 20.1.1.1 dev vxlan0

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Jiri Benc <jbenc@redhat.com>
2015-10-16 16:13:22 -07:00
Stephen Hemminger e569c5c0fd add tunnel header files from net-next uapi
Files needed for new lwtunnel code.
2015-10-16 16:13:05 -07:00
Stephen Hemminger c6646c1ea5 Merge branch 'master' into net-next 2015-10-16 16:03:32 -07:00
Phil Sutter 6f07f3dc41 ip-address: fix oneline mode for interfaces with VF
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-10-16 16:02:38 -07:00
Roopa Prabhu 39ca4879a0 ip monitor neigh: Change 'delete' to 'Deleted' to be consistent with ip route
It helps to grep for one string "Deleted" when monitoring all events.

Fixes: 6ea3ebafe0 ("iproute2: inform user when a neighbor is removed")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2015-10-16 16:01:34 -07:00
Roopa Prabhu 303cc9cbee libnetlink: introduce rta_nest and u8, u16, u64 helpers for nesting within rtattr
This patch introduces two new APIs, rta_nest and rta_nest_end, to
nest attributes inside an rta attribute represented by 'struct rtattr',
as required to construct a nexthop. It also adds rta_addattr* variants
for u8, u16 and u64 as needed to support encapsulation.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Jiri Benc <jbenc@redhat.com>
2015-10-16 16:00:47 -07:00
Stephen Hemminger d2ccb70a91 Merge branch 'master' into net-next 2015-10-12 09:50:46 -07:00
willy tarreau 0ee9052f1b fix "ss -p" segfaults
I've updated Jose's patch to make it slightly simpler (eg: calloc instead
of malloc+memset), and ported it to 4.2.0 which requires it as well, and
attached it to this e-mail.

I can confirm that with this patch 4.1.1 doesn't segfault on me anymore.
The commit message should be reworked I guess though everything's in it
and I didn't want to modify his description.

Can it be merged as-is or should I reword the commit message and reference
Jose as the fix reporter ? We should not let this bug live forever.

From: "j.ps@openmailbox.org" <j.ps@openmailbox.org>

Essentially all that is needed to get rid of this issue is the
addition of:

    memset(u, 0, sizeof(*u));

after:

    if (!(u = malloc(sizeof(*u))))
            break;

Also patched some other situations (strcpy and sprintf uses) that
potentially produce the same results.

Signed-off-by: Jose P Santos <j.ps@openmailbox.org>

[ wt: made Jose's patch slightly simpler, all credits to him for the diag ]
Signed-off-by: Willy Tarreau <w@1wt.eu>
2015-10-12 09:49:06 -07:00
Phil Sutter a60223bc1c man: ip-link: document MACVLAN/MACVTAP interface types
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-10-12 09:46:55 -07:00
Phil Sutter 3cf8ba5960 ip: macvlan: support MACVLAN_FLAG_NOPROMISC flag
This flag is allowed for devices in passthru mode to prevent forcing the
underlying interface into promiscuous mode.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-10-12 09:46:55 -07:00
Phil Sutter 541f1b3e1d ip: link: consolidate macvlan and macvtap
After eliminating the minor differences in both files which existed
solely because features/fixes were applied to only one of them and not
the other, the remaining differences were in function naming and error
messages. The latter is addressed by using the 'id' field of struct
link_util.

Fold both files into one in order to share common code and eliminate the
chance of having fixes/enhancements applied to only one of them.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-10-12 09:46:55 -07:00
Daniel Borkmann 343dc90854 m_bpf: don't require default opcode on ebpf actions
After the patch, the most minimal command to load an eBPF action
for late binding with auto index selection through tc is:

  tc actions add action bpf obj prog.o

We already set TC_ACT_PIPE in tc as default opcode, so if nothing
further has been specified, just use it. Also, allow "ok" next to
"pass" for matching cmdline on TC_ACT_OK.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-10-12 09:44:52 -07:00
David Ahern b8c753245b ip neigh: Add ifindex to request when filtering dumps by device
Add ifindex to dump request when filtering by device. If the kernel
supports it, adding the index to the request limits the amount of data
the kernel pushes to userspace.

The feature exists in userspace already, so no need to warn the user
if kernel side support does not exist. Using the kernel side filter
makes the request more efficient.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2015-10-12 09:43:28 -07:00
Daniel Borkmann faa8a46300 f_bpf: allow for optional classid and add flags
With an optional classid, the most minimal command can be something
like:

  tc filter add dev foo parent X: bpf obj prog.o

Therefore, adapt the code so that a further argument is no longer
required, as it currently is.

Also, minor cleanup on the classid, where we should rather
have used addattr32(), and add flags for exec configuration,
for example (using short notation):

  tc filter add dev foo parent X: bpf da obj prog.o

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
2015-10-12 09:41:05 -07:00
David Ahern 0d238ca2b8 ip neigh: Add support for filtering dumps by master device
Add support for filtering neighbor dumps by master device. Kernel side
support provided by commit 21fdd092acc7. Since the feature is not
available in older kernels the user is given a warning message if the
kernel does not support the request.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2015-10-12 09:39:37 -07:00
Stephen Hemminger 23e905096c update kernel headers for net-next 2015-10-12 09:34:18 -07:00
Stephen Hemminger cf5b002f20 Merge branch 'master' into net-next 2015-10-12 09:32:14 -07:00
Satish Ashok 25bc3d3d4a ip, bridge: document -timestamp option
This patch documents the -timestamp option for bridge and ip.

Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com>
2015-10-12 09:28:55 -07:00
Wilson Kok 9de8c6d976 bridge: add batch command support
This patch adds support for batching bridge commands, following the
ip batch code.

Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Christophe Gouault <christophe.gouault@6wind.com>
2015-10-12 09:24:15 -07:00
Stephen Hemminger 6b53cb66e8 update kernel headers 2015-10-12 09:22:29 -07:00
Christophe Gouault 39e3d3836c batch: support quoted strings
Support quoting strings with " or ' in an iproute2 batch file.

This makes it possible to configure empty crypto keys (for ESP-null) or
keys containing spaces:

    xfrm state add src 1.1.1.1 dst 2.2.2.2 proto ah spi 0x1 \
        mode tunnel auth hmac(sha1) "r4ezR/@kd6'749f2 6zf$"

    xfrm state add src 5.5.5.5 dst 2.2.2.2 proto esp spi 0x2 \
        mode tunnel enc cipher_null ""

Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>
2015-10-07 10:35:25 +01:00
Christoph Schulz 8aacb9bbbd ip: allow using a device "help" (or a prefix thereof)
Device names that match "help" or a prefix thereof should be allowed anywhere
a device name can be used. Note that a suitable keyword ("dev" or "name", the
latter for "ip tunnel") has to be used in these cases to resolve ambiguities.

Signed-off-by: Christoph Schulz <develop@kristov.de>
Reported-by: Leonhard Preis <leonhard@pre.is>
Reported-by: Wilhelm Wijkander <lists@0x5e.se>
2015-10-07 10:35:17 +01:00
Stephen Hemminger 09a50f420b add tipc manpages to Makefile 2015-10-07 10:33:39 +01:00
Richard Alpe dcd8d142d2 tipc: add man pages
This patch adds man pages for the TIPC tool. There is one main page
and one page for each top-level sub-command. These pages mainly aim
to help users of the tipc tool. In addition, they describe
a bit about what TIPC is and some of its features as a protocol.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
2015-10-07 10:31:34 +01:00
Stephen Hemminger 8fe9839857 fq: fix whitespace 2015-09-25 12:40:00 -07:00
Eric Dumazet 8d5bd8c302 tc: fq: allow setting and retrieving orphan_mask
The linux-3.19 fq packet scheduler gained a new attribute controlling
the number of 'flows' holding packets not attached to a socket
(forwarding usage).

kernel commit is 06eb395fa9856b5a87cf7d80baee2a0ed3cdb9d7
("pkt_sched: fq: better control of DDOS traffic")

This patch adds corresponding code to tc command.

tc qd replace dev eth0 root fq orphan_mask 511

Signed-off-by: Eric Dumazet <edumazet@google.com>
2015-09-25 12:37:09 -07:00
Dan Webster a8e35427fb ss: fix file-based filtering segfault
Commit 1527a17 introduced a change where the second of two ssfilter_parse()
calls in ss.c was moved outside of a conditional block (ss.c: ~3575). This
commit enabled the parsing of services, such as 'sport = :ssh', but
inadvertently broke the '-F' file-based filtering:
2015-09-25 12:36:43 -07:00
Florian Westphal 484b3f922c man: tc: add man page for fq pacer
Partially based on kernel Kconfig help text, code comments and
git commit messages from Eric Dumazet.

Joint work with Phil Sutter.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
2015-09-25 12:36:16 -07:00
Eric Dumazet 32a6fbe563 tc : add timestamps to tc monitor
Support -timestamp and -tshort options for tc monitor like ip monitor.

# tc -tshort monitor
[2015-09-23T16:39:11.260555] qdisc fq 8003: dev eth0 root refcnt 2 limit
10000p flow_limit 100p buckets 1024 quantum 3028 initial_quantum 15140
refill_delay 40.0ms

Signed-off-by: Eric Dumazet <edumazet@google.com>
2015-09-25 12:35:46 -07:00
David Ahern 84d30afd8a ip: Add type and master filters to brief output
The brief format does not honor the master and type filters:

$ ip link show master vrf-mgmt
7: dummy0: <BROADCAST,NOARP,SLAVE> mtu 1500 qdisc noop master vrf-mgmt state DOWN mode DEFAULT group default qlen 1000
    link/ether 66:39:cc:2b:e9:bd brd ff:ff:ff:ff:ff:ff

$ ip -br link show master vrf-mgmt
lo               UNKNOWN        00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
eth0             UP             08:00:27:de:14:c8 <BROADCAST,MULTICAST,UP,LOWER_UP>
eth1             UP             08:00:27:87:02:f1 <BROADCAST,MULTICAST,UP,LOWER_UP>
eth2             UP             08:00:27:61:1e:fd <BROADCAST,MULTICAST,UP,LOWER_UP>
vrf-blue         UNKNOWN        a6:3f:09:34:7e:74 <NOARP,MASTER,UP,LOWER_UP>
vrf-red          DOWN           fe:a2:2d:e1:bc:ac <NOARP,MASTER>
dummy0           DOWN           66:39:cc:2b:e9:bd <BROADCAST,NOARP,SLAVE>
dummy1           DOWN           4a:4f:13:91:64:b1 <BROADCAST,NOARP,SLAVE>
dummy2           DOWN           b2:4f:b6:cd:bd:a6 <BROADCAST,NOARP>
dummy3           DOWN           1e:06:3d:40:b8:c2 <BROADCAST,NOARP,SLAVE>
vrf-mgmt         DOWN           ce:b2:74:41:21:df <NOARP,MASTER>

With this patch the expected output is shown:

$ ip -br link show master vrf-mgmt
dummy0           DOWN           66:39:cc:2b:e9:bd <BROADCAST,NOARP,SLAVE>

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2015-09-23 16:27:52 -07:00
David Ahern bc234301af ip route: Add RTM_F_LOOKUP_TABLE flag and show table id
Currently 'ip route get' does not show the table the lookup result comes
from and prior to kernel commit c36ba6603a11 the response from the kernel
was hardcoded to the main table. From the discussion this appears to be
a leftover from the route cache where the cached entry lost the table id
and so the result was hardcoded to main table.

c36ba6603a11 added the RTM_F_LOOKUP_TABLE flag to maintain that behavior
but to allow new tools to ask for the actual table id for the lookup.
This patch adds that flag to ip route get request and if the result is
not the main table shows the table id.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2015-09-23 16:18:56 -07:00
Stephen Hemminger 4e39bfb93a update kernel headers to 4.3 net-next 2015-09-23 16:18:34 -07:00
Andrew Vagin 5b9ac19029 route: filter routes by family if it's specified
Currently, when AF_INET6 is specified but IPv6 is disabled, all
routes are returned.

For example, we can boot kernel with ipv6.disable=1 and try to get ipv6
routes:
$ ip -6 route show
default via 192.168.122.1 dev eth0  proto static  metric 100
192.168.122.0/24 dev eth0  proto kernel  scope link  src 192.168.122.141  metric 100

These are IPv4 routes, which is unexpected behaviour.

Signed-off-by: Andrew Vagin <avagin@openvz.org>
2015-09-23 16:16:19 -07:00
Vadim Kochan 6c19ff10b5 man tc-htb: Fix HRB -> HTB typo
Changed HRB -> HTB.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-09-23 16:16:14 -07:00
Vadim Kochan 79c7078e3b man ip-link: Fix wording in VLAN reorder_hdr explanation
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
Acked-by: Jeremy Harris <jgh@redhat.com>
2015-09-23 16:08:43 -07:00
Phil Sutter 565af7b816 tc: fq: allow setting and retrieving flow refill delay
Code to parse and export this tuneable via netlink is already present in
sched_fq.c of the kernel, so not making it accessible for users would be
a waste of resources.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-09-23 16:02:13 -07:00
Phil Sutter f171b85888 man: tc.8: mention available qdiscs
Some qdiscs still lack a manpage, so listing them here is the only way
for a user to get to know them. For the others, this serves as an
overview of what is there.

Content was taken over from the dedicated manpage if available and
suitable, so there is definitely room for improvement at least by
adjusting it more to the context in which it is now. In case there
wasn't appropriate wording available, I tried to identify key aspects of
the given qdisc.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-09-23 16:01:39 -07:00
Phil Sutter 940a96e6ca ip-link: do not support 'ip link add dev help'
Commit 0532555 ('Support "ip link add help" for rtnl_link API') added a
check for the help parameter. However, because of where it was added,
it is no longer possible to force a given parameter to be interpreted
as an interface name by prefixing it with 'dev '. Fix this by treating
whatever follows 'dev' as an interface name.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-09-23 16:00:48 -07:00
Phil Sutter e4ef49a465 man: rtpr: add minimal manpage
While there is not much to explain about this rather trivial shell
script, having a manpage for it serves as a good point of reference for
users wondering what it might be for.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-09-23 15:58:54 -07:00
Phil Sutter f3737abf8c man: lnstat: rewrite manpage
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-09-23 15:58:54 -07:00
Phil Sutter f7afa99952 man: ip-address: document mngtmpaddr and noprefixroute flags
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-09-23 15:58:54 -07:00
Phil Sutter 5c32fa1d69 comment: Fix remaining listings of wrong FSF address
This patch follows the changes of commit 4d98ab0 ("Fix FSF address in
file headers"), fixing file headers added after it.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-09-23 15:58:54 -07:00
Phil Sutter 0b17394087 man: ip-address: align synopsis with help output
When fixing the BNF syntax error, I overlooked that 'ip address help'
prints a more correct synopsis. This patch aligns them.

Fixes: 715296b ("ip-address.8.in: fix BNF syntax error")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-09-23 15:58:54 -07:00
Phil Sutter 9b2e9f4a8c man: ip: add -h[uman-readable] option
Since 'ip help' lists it, it should be described in ip.8 as well.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-09-23 15:57:50 -07:00
Vadim Kochan a25df4887d configure: Check for Berkeley DB for arpd compilation
Add a check for the Berkeley DB header and library before compiling the arpd utility.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-09-21 14:38:38 -07:00
Arthur Gautier a197883432 ip link: missing options in bond usage
Signed-off-by: Arthur Gautier <baloo@gandi.net>
2015-09-21 14:36:02 -07:00
Stephen Hemminger 3649d01895 l2tp: add missing newline on show output
After cookie there was no newline.
2015-09-11 15:26:58 -07:00
Mike Saal 4fcfb6bc71 ss format bug
Hi:

I found a formatting bug in the 4.1.1 ss command. The following line was
incorrectly output due to passing a negative length to printf() when
displaying the local address. In this instance hostapd does a "bind to
device" on cdreth0 and then does a udp "in address any" port 67 bind.
Please note the whitespace between the '*' and ' %cdreth0:67'

    'udp UNCONN 0 0 ** %cdreth0:67* *:* users:(("hostapd",pid=19241,fd=5))'

Attached is my patch for the bug fix, it might be prudent to add more
guard code looking for negative length format codes.

Sincerely, Mike
2015-09-09 08:17:42 -07:00
Denis Kirjanov 9827fa57da iproute: print more verbose error on route cache flush
Before:
kda@vfirst ~/devel/iproute2 $ ./ip/ip route flush cache
Cannot open "/proc/sys/net/ipv4/route/flush"

After:
kda@vfirst ~/devel/iproute2/ip $ ./ip route flush cache
Cannot open "/proc/sys/net/ipv4/route/flush": Permission denied

Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
2015-09-07 11:10:22 -07:00
Toshiaki Makita 1eea5c46ec iplink: Add support for IFLA_BR_VLAN_PROTOCOL attribute
This patch adds support for bridge vlan_protocol.

Example:
$ ip link set br0 type bridge vlan_protocol 802.1ad
$ ip -d link show br0
4: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
UP mode DEFAULT group default qlen 1000
    link/ether 44:37:e6:ab:cd:ef brd ff:ff:ff:ff:ff:ff promiscuity 0
    bridge forward_delay 0 hello_time 200 max_age 2000 ageing_time 30000
stp_state 0 priority 32768 vlan_filtering 0 vlan_protocol 802.1ad
addrgenmode eui64

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
2015-08-31 16:35:25 -07:00
Stephen Hemminger 8d62f3e294 update kernel headers to 4.2-net-next 2015-08-31 16:35:00 -07:00
Stephen Hemminger f1e225beef Merge branch 'master' into net-next 2015-08-31 16:32:10 -07:00
Stephen Hemminger ec4ef6aebd v4.2.0 2015-08-31 16:31:15 -07:00
Konstantin Shemyak 6eca32ec6f add 'vti'/'vti6' tunnel modes to ip-tunnel manual page
* "vti" and "vti6" tunnel modes added to ip-tunnel.8 manual page
* Added "hoplimit" terminology for IPv6
* Corrected usage line
* Minor language fix
2015-08-31 16:28:37 -07:00
Andy Gospodarek 5d295bb8e1 add support for brief output for link and addresses
This adds support for slightly less output than is normally provided by
'ip link show' and 'ip addr show'.  This is a bit better when you have a
host with lots of interfaces.  Sample output:

$ ip -br link show
lo               UNKNOWN        00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
p2p1             UP             08:00:27:ee:0b:3b <BROADCAST,MULTICAST,UP,LOWER_UP>
p7p1             UP             08:00:27:9d:62:9f <BROADCAST,MULTICAST,UP,LOWER_UP>
p8p1             DOWN           08:00:27:dc:d8:ca <NO-CARRIER,BROADCAST,MULTICAST,UP>
p9p1             UP             08:00:27:76:d9:75 <BROADCAST,MULTICAST,UP,LOWER_UP>
p7p1.100@p7p1    UP             08:00:27:9d:62:9f <BROADCAST,MULTICAST,UP,LOWER_UP>

$ ip -br -4 addr show
lo               UNKNOWN        127.0.0.1/8
p2p1             UP             192.168.56.2/24
p7p1             UP             70.0.0.1/24
p8p1             DOWN           80.0.0.1/24
p9p1             UP             10.0.5.15/24
p7p1.100@p7p1    UP             200.0.0.1/24

$ ip -br -6 addr show
lo               UNKNOWN        ::1/128
p2p1             UP             fe80::a00:27ff:feee:b3b/64
p7p1             UP             7000::1/8 fe80::a00:27ff:fe9d:629f/64
p8p1             DOWN           8000::1/8
p9p1             UP             fe80::a00:27ff:fe76:d975/64
p7p1.100@p7p1    UP             fe80::a00:27ff:fe9d:629f/64

$ ip -br addr show p7p1
p7p1             UP             70.0.0.1/24 7000::1/8 fe80::a00:27ff:fe9d:629f/64

v2: Now with color support!
v3: Better field width estimation (except netdev names to keep output at a
decent width) and whitespace fixup.

Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
2015-08-31 16:24:10 -07:00
Stephen Hemminger 6c5ffb9a2c iplink: cleanup whitespace and checkpatch issues
Mostly just use of {} and whitespace.
2015-08-25 15:57:04 -07:00
Vadim Kochan ab872442bd man ip-link: Add little explanations about VLAN qos map
Add a little more info about how to manually set the priority with iptables,
and some small clarifications about ingress/egress QoS mapping.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-08-25 13:50:52 -07:00
David Ahern 15faa0a30b add support for VRF device
Allow user to create a vrf device and specify its table binding.
Based on the iplink_vlan implementation.

Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2015-08-23 10:14:16 -07:00
Stephen Hemminger 75d67d356e update kernel headers to 4.2-net-next 2015-08-23 10:10:44 -07:00
Stephen Hemminger dfc3d015f6 Merge branch 'master' into net-next 2015-08-23 10:09:46 -07:00
Stephen Hemminger fcc16c2287 provide common json output formatter
Formatting JSON is moderately painful.
Provide a simple API to do the syntax formatting.
2015-08-23 10:05:29 -07:00
Vadim Kochan e612883c45 man ip-link: Add more explanation about vlan reordering
Add more explanation about VLAN reordering and what it affects.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-08-19 16:37:08 -07:00
Phil Sutter 715296b85a ip-address.8.in: fix BNF syntax error
The previous man page fixup introduced a syntax error due to missing
opening bracket, which might crash some humanoid BNF parsers.

Fixes: 4e972d5 ("ip-address: fix and extend documentation")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-08-19 16:32:56 -07:00
Phil Sutter 9e5ba07f49 lib/namespace: fix fd leakage in non-error case
My previous patch 5950ba9 ("lib/namespace: don't leak fd in error case")
was a step in the wrong direction. Instead of closing the opened file
descriptor in error case only, follow a better approach here and close
the fd as soon as it is not used anymore. This way the inelegant goto
statements can be dropped, and the fd leak in non-error case is fixed as
well.

Fixes: 5950ba9 ("lib/namespace: don't leak fd in error case")
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-08-19 16:32:56 -07:00
Zhang Shengju 6843d36e3d ip-link: cut one level indentation
Cut one level indentation to make things easier to read.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
2015-08-19 16:09:09 -07:00
Stephen Hemminger 9a6422c243 Merge branch 'master' into net-next 2015-08-13 19:42:41 -07:00
Zhang Shengju e3c27c2db6 utils: add missing return value
Add missing return value to fix warnings

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
2015-08-13 19:41:48 -07:00
Stephen Hemminger e0f229fb75 bond: fix return after invarg 2015-08-13 14:20:54 -07:00
Stephen Hemminger bcb4a7aa5b tc: fix return after invarg 2015-08-13 14:20:40 -07:00
Zhang Shengju 6a9ce30e78 ip-link: remove unnecessary return
Remove unnecessary return, because invarg() exits.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
2015-08-13 14:12:33 -07:00
Pavel Šimerda 4e972d5ef4 ip-address: fix and extend documentation
* Improve manual page synopsis and built-in help
 * Use full subcommand names (e.g. 'address' and 'maddress')
 * Specify when IPv4, IPv6 or both are affected
 * Add lifetimes, home and nodad
 * Remove any remaining excess spaces

Commit 43d29f7 substantially improved the generated ip-address.8 instead
of ip-address.8.in, and commit e419f2d removed the generated file, losing
the improvements entirely. This commit recovers the lost changes, adapts
them to the current manual page and adds further man page and help
improvements.

Original commit by: Kenyon Ralph <kenyon@kenyonralph.com>
2015-08-13 14:11:09 -07:00
Pavel Šimerda 503aa4e20d ip-link: fix and extend documentation
* Add `can` to list of supported link types
 * Document `addrgenmode`
 * Document `link-netnsid`
 * Document VLAN link type
 * Improve VXLAN link type documentation
    - Fix VXLAN srcport/dstport docs
    - Document `udpcsum`, `udp6zerocsumtx` and `udp6zerocsumrx`
2015-08-13 14:11:09 -07:00
Pavel Šimerda 142434dc51 ip: fix and extend documentation
* Use unabbreviated `address` and `maddress`
 * Keep only `-n` and `-netns` for network namespace
2015-08-13 14:11:09 -07:00
Zhang Shengju ff1e35edf5 ip-link: enhance prompt message
Enhance the prompt message for the 'spoofchk' and 'query_rss' flags, and fix a
typo.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
2015-08-13 14:10:22 -07:00
Stephen Hemminger 892e21248c remove unnecessary extern
No need for extern on function prototypes.
2015-08-13 14:09:58 -07:00
Phil Sutter a02371fb38 misc/ss: fix memory leak in user_ent_hash_build()
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-08-12 09:23:47 -07:00
Phil Sutter 5950ba914e lib/namespace: don't leak fd in error case
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-08-12 09:23:47 -07:00
Phil Sutter b95d28c380 misc/ss: add missing fclose() calls
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-08-12 09:23:47 -07:00
Phil Sutter 532ca40a52 misc/ss: simplify buffer realloc, fix checking realloc failure
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-08-12 09:23:47 -07:00
Phil Sutter e0dce0e5dc misc/ss: avoid NULL pointer dereference
This was working before, but only if realloc a) succeeded and b) did not
move the buffer to a different location. '**buf = **new_buf' then
writes the value of *new_buf's first field into that of *buf.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-08-12 09:23:47 -07:00
Nikolay Aleksandrov e4d456f0bb iplink: add support for IFLA_BR_VLAN_FILTERING attribute
This patch implements support for the IFLA_BR_VLAN_FILTERING attribute
in iproute2 so it can enable/disable vlan_filtering.

Example:
$ ip link set br0 type bridge vlan_filtering 1
$ ip -d link show br0
6: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
UP mode DEFAULT group default
    link/ether 08:00:27:ea:07:38 brd ff:ff:ff:ff:ff:ff promiscuity 0
    bridge forward_delay 1500 hello_time 200 max_age 2000 vlan_filtering 1
    addrgenmode eui64

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2015-08-12 09:19:06 -07:00
Stephen Hemminger 8fcba79ed5 update header files from 4.2 net-next 2015-08-12 09:18:04 -07:00
Stephen Hemminger 941c509906 Merge branch 'master' into net-next 2015-08-12 09:14:48 -07:00
Nikolay Aleksandrov fdba05155c iplink: add ageing_time, stp_state and priority for bridge
When showing bridge attributes, show also ageing_time, stp_state and
priority if available.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2015-08-12 09:11:30 -07:00
Stephen Hemminger 4b942cb1df Merge branch 'master' into net-next 2015-08-12 09:09:43 -07:00
Zhang Shengju e543a6a8a0 ip-link: fix a typo in help message
fix a typo: "noarp" -> "arp"

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
2015-08-12 09:05:57 -07:00
Zhang Shengju a560d850d9 iplink: shortify printing the usage of link type
Allow printing link type usage with: ip link help bridge_slave

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
2015-08-12 09:05:57 -07:00
Zhang Shengju 43367ef7eb iplink: use the short format to print help info
Allow printing link type usage with: ip link help bridge

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
2015-08-12 09:05:57 -07:00
Zhang Shengju d8cf93de04 iplink: add missing link type
Add missing link type "bridge_slave".

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
2015-08-12 09:05:57 -07:00
Phil Sutter e52f3ef711 ip-link: fix minor typo in manpage
Change '-human-readble' to '-human-readable'.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-08-12 09:01:46 -07:00
Stephen Hemminger 2f29d6bb50 ipnetns: make net namespace cache variable size
Save some space by using variable size for nsid cache elements.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2015-08-12 08:53:31 -07:00
Stephen Hemminger e4a852f8c8 Merge branch 'master' into net-next 2015-08-10 11:27:35 -07:00
Phil Sutter 7f9dddbe7d misc/ss: don't imply -a when -A was specified
Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-08-10 11:23:39 -07:00
Nikolay Aleksandrov d02e46627f iplink: bonding: add support for IFLA_BOND_TLB_DYNAMIC_LB
Add support for setting and showing the value of tlb_dynamic_lb
(IFLA_BOND_TLB_DYNAMIC_LB).
Example:
$ ip -d link show dev bond0 type bond
7: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN
mode DEFAULT group default
    link/ether ce:2f:e1:6e:d7:e0 brd ff:ff:ff:ff:ff:ff promiscuity 0
    bond mode balance-tlb miimon 100 updelay 0 downdelay 0 use_carrier 1
arp_interval 0 arp_validate none arp_all_targets any primary_reselect
always fail_over_mac none xmit_hash_policy layer2 resend_igmp 1
num_grat_arp 1 all_slaves_active 0 min_links 0 lp_interval 1
packets_per_slave 1 lacp_rate slow ad_select stable tlb_dynamic_lb 1
addrgenmode eui64

$ ip -d l set dev bond0 type bond tlb_dynamic_lb 0
$ ip -d link show dev bond0 type bond
7: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN
mode DEFAULT group default
    link/ether ce:2f:e1:6e:d7:e0 brd ff:ff:ff:ff:ff:ff promiscuity 0
    bond mode balance-tlb miimon 100 updelay 0 downdelay 0 use_carrier 1
arp_interval 0 arp_validate none arp_all_targets any primary_reselect
always fail_over_mac none xmit_hash_policy layer2 resend_igmp 1
num_grat_arp 1 all_slaves_active 0 min_links 0 lp_interval 1
packets_per_slave 1 lacp_rate slow ad_select stable tlb_dynamic_lb 0
addrgenmode eui64

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2015-08-10 11:22:09 -07:00
Stephen Hemminger 4f3489cd58 update to net-next (4.3) headers 2015-08-10 11:21:20 -07:00
Daniel Borkmann baed90842a m_bpf: add frontend support for late binding
Frontend support for kernel commit a5c90b29e5cc ("act_bpf: properly
support late binding of bpf action to a classifier").

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-08-10 11:19:11 -07:00
Richard Alpe ee262ed2ad tipc: fix bearer get/set help synopsis
One option is required for bearer set and bearer get.
2015-08-10 11:18:01 -07:00
Stephen Hemminger 089d93d6f2 update kernel headers from net-next
Align with upstream kernel.
2015-07-31 18:13:56 -07:00
Nikolay Aleksandrov 90d73159d9 bridge: mdb: add deleted when monitoring delmdb event
Before this patch both addmdb and delmdb events were printed the same,
now we'll get a "Deleted" string in front when delmdb is received.
Before:
$ bridge mdb add dev br0 port eth3 grp 239.0.0.1
(monitor) dev br0 port eth3 grp 239.0.0.1 temp
$ bridge mdb del dev br0 port eth3 grp 239.0.0.1
(monitor) dev br0 port eth3 grp 239.0.0.1 temp
^^ No way to differentiate between both events.

After:
$ bridge mdb add dev br0 port eth3 grp 239.0.0.1
(monitor) dev br0 port eth3 grp 239.0.0.1 temp
$ bridge mdb del dev br0 port eth3 grp 239.0.0.1
(monitor) Deleted dev br0 port eth3 grp 239.0.0.1 temp

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2015-07-31 18:13:05 -07:00
Stephen Hemminger 68831d6b45 Merge branch 'master' into net-next 2015-07-31 18:12:57 -07:00
Antti Paila 531d5da413 ip: Preserve original protocol family in batch mode
Reset the 'preferred_family' global variable
to its initially set value before each batch
file command is processed.

Signed-off-by: Antti Paila <antti.paila@gmail.com>
2015-07-31 18:10:14 -07:00
Roopa Prabhu cd8df30b7c bridge fdb: add 'use' option to set NTF_USE flag in fdb add requests
This is similar to command options corresponding to other NTF_* flags
already exposed to the user space (examples self/master).

Also updates bridge man page (The man page patch also includes
a fix to the 'self' entry and documents 'master' for fdb entries)

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2015-07-31 18:09:41 -07:00
Stephen Hemminger efb169717e bridge: drop man page fragment
Left over copy/paste from ip monitor man page.
2015-07-28 16:50:19 -07:00
Nikolay Aleksandrov 6aac861713 bridge: mdb: add support for vlans
This patch allows the user to specify the vlan of the mdb group being
added or deleted and adds support for displaying the vlan when
dumping mdb information or monitoring it. It also updates the man page
to reflect the new "vid" argument for mdb.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2015-07-28 16:45:50 -07:00
Anuradha Karuppiah 188648270a ip link: proto_down config and display.
This patch adds support to set and display protodown on a switch port. The
switch driver can handle this error state by doing a phys down on the port.

One example user space application setting this flag is a multi-chassis
LAG application to handle split-brain situation on peer-link failure.

Example:
root@net-next:~# ip link set eth1 protodown on
root@net-next:~/iproute2# ip link show eth1
4: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:12:35:01 brd ff:ff:ff:ff:ff:ff protodown on
root@net-next:~/iproute2# ip link set eth1 protodown off
root@net-next:~/iproute2# ip link show eth1
4: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:12:35:01 brd ff:ff:ff:ff:ff:ff
root@net-next:~/iproute2#

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
2015-07-28 16:43:20 -07:00
Stephen Hemminger a3563ede2d update to 4.2-net-next headers 2015-07-28 16:42:12 -07:00
Felix Janda ea343669fa Replace BSD MAXPATHLEN by POSIX PATH_MAX
Prefer using the POSIX constant PATH_MAX instead of the legacy BSD
derived MAXPATHLEN. The necessary includes for MAXPATHLEN and PATH_MAX
are <sys/param.h> and <limits.h>, respectively.
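
The sketch below shows the POSIX form, building a path in a PATH_MAX-sized buffer. join_path() is a hypothetical helper, not an iproute2 function. The _POSIX_C_SOURCE define only makes sure <limits.h> exposes PATH_MAX under strict compiler modes. POSIX also allows PATH_MAX to be absent on systems that have no fixed limit, although Linux/glibc defines it.

```c
#define _POSIX_C_SOURCE 200809L  /* expose PATH_MAX even under -std=c99 */

#include <limits.h>  /* PATH_MAX (POSIX) */
#include <stdio.h>   /* snprintf */

/* Hypothetical helper: join dir and name into buf. Returns 0 on success,
 * -1 if the result plus its terminating NUL would not fit in PATH_MAX bytes. */
static int join_path(char buf[PATH_MAX], const char *dir, const char *name)
{
	int n = snprintf(buf, PATH_MAX, "%s/%s", dir, name);

	return (n < 0 || n >= PATH_MAX) ? -1 : 0;
}
```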

Signed-off-by: Felix Janda <felix.janda@posteo.de>
Tested-by: Yegor Yefremov <yegorslists@googlemail.com>
2015-07-28 16:39:29 -07:00
Nikolay Aleksandrov 6b4867e621 bridge: mdb: add support for router add/del notifications monitoring
This patch adds support for ADDMDB/DELMDB notifications about router ports
which have been added or deleted/expired respectively.

Example output:
$ bridge -s monitor mdb
Deleted router port dev eth3 master br0
router port dev eth3 master br0

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2015-07-27 14:39:18 -07:00
Zhang Shengju cb89c7c70a ip/ip6tunnel: fix missing return value check
Make sure that return value of each socket() call is properly checked
and do not continue processing if the call failed.
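
The check follows this general pattern. It is a minimal sketch, and open_ioctl_socket() is a hypothetical helper, not the function the patch touches.

```c
#include <stdio.h>       /* perror */
#include <sys/socket.h>  /* socket, SOCK_DGRAM, AF_INET6 */
#include <unistd.h>      /* close, for callers releasing the descriptor */

/* Hypothetical helper: open a datagram socket for ioctl-style requests.
 * On failure, report the error and return -1 rather than carrying on
 * with an invalid descriptor. */
static int open_ioctl_socket(int family)
{
	int fd = socket(family, SOCK_DGRAM, 0);

	if (fd < 0) {
		perror("socket");
		return -1;
	}
	return fd;
}
```

Callers test the result once and bail out, for example `if (fd < 0) return -1;`, then `close(fd)` when they are done.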

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
2015-07-27 14:37:47 -07:00
Zhang Shengju 0dc2e22978 xfrm: remove duplicated include
Remove duplicated include of <linux/xfrm.h>, since it's already
included by 'xfrm.h'.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
2015-07-27 14:36:53 -07:00
Nicolas Dichtel 611f70b287 tc: fix bpf compilation with old glibc
Error was:
f_bpf.o: In function `bpf_parse_opt':
f_bpf.c:(.text+0x88f): undefined reference to `secure_getenv'
m_bpf.o: In function `parse_bpf':
m_bpf.c:(.text+0x587): undefined reference to `secure_getenv'
collect2: error: ld returned 1 exit status

There is no special reason to use the secure version of getenv, thus let's
simply use getenv().
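
The plain lookup looks roughly like this. It is a minimal sketch: the variable name EXAMPLE_UDS_PATH and the helper uds_path_from_env() are hypothetical, not tc's actual names. secure_getenv() is a GNU extension that first appeared in glibc 2.17, which is why linking fails against older C libraries.

```c
#include <stdlib.h>  /* getenv: available in every C library */

/* Hypothetical variable name, for illustration only. */
#define UDS_ENV "EXAMPLE_UDS_PATH"

/* Return the UDS path from the environment, or NULL if unset or empty. */
static const char *uds_path_from_env(void)
{
	const char *path = getenv(UDS_ENV);

	return (path && *path) ? path : NULL;
}
```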

CC: Daniel Borkmann <daniel@iogearbox.net>
Fixes: 88eea53954 ("tc: {f,m}_bpf: allow to retrieve uds path from env")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Tested-by: Yegor Yefremov <yegorslists@googlemail.com>
2015-07-27 14:35:42 -07:00
Vadim Kochan 814f9b9919 man ss: Fix explanation when no options specified
By default, ss actually dumps not only TCP sockets but any kind of socket
that is in the ESTABLISHED state (TCP/UDP/UNIX).

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
Reported-by: Miha Marolt <miham@beyondsemi.com>
2015-07-27 14:35:02 -07:00
Stephen Hemminger ec7aff5c4f ip: fix all the checkpatch warnings
Zhang Shengju pointed out some places where tabs were not being used.
Go ahead and fix all the trivial checkpatch warnings in ip/ip.c.
Also fix bridge.c
2015-07-26 21:50:22 -07:00
Vadim Kochan 99bb68ff66 ss: fix crash when dump stats from /proc with '-p'
This partially reverts:

    ec4d0d8a9d (ss: Replace unixstat struct by new sockstat struct)

but adds a few fields (name & peer_name) from the removed unixstat struct to the
sockstat struct to make restoring the original code easier.

Fixes: ec4d0d8a9d (ss: Replace unixstat struct by new sockstat struct)
Reported-by: Marc Dietrich <marvin24@gmx.de>
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-07-21 10:53:19 -07:00
Stephen Hemminger 92de1c2c82 remove unnecessary checks for NULL before free
Since free(NULL) is a no-op, it is safe to remove unnecessary
if checks.
2015-07-21 10:49:54 -07:00
Jiri Pirko 122f2fc573 iproute2: ipa: show switch id
We forgot to include this patch somehow. So do it now.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
2015-07-20 14:59:50 -07:00
Nikolay Aleksandrov 235c445347 ss: fix display of raw sockets
After commit 8250bc9ff4 ("ss: Unify inet sockets output") raw sockets
are displayed as udp because dgram_show_line() is used for both and
thus IPPROTO_UDP is used for both so proto_name() returns "udp".
Fix this by checking dg_proto which is set according to the caller of
dgram_show_line().

Reported-by: Miha Marolt <miham@beyondsemi.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2015-07-20 14:57:47 -07:00
Stephen Hemminger b0085d0ee9 update kernel headers for 4.2-rc1 2015-07-20 14:57:18 -07:00
Roopa Prabhu 56d8ff0ac8 support batching of ip route get commands
This patch replaces exits with returns in
ip route get command handling. This allows batching
of ip route get commands.

$cat route_get_batch.txt
route get 10.0.14.2
route get 12.0.14.2
route get 10.0.14.4

$ip -batch route_get_batch.txt
local 10.0.14.2 dev lo  src 10.0.14.2
    cache <local>
12.0.14.2 via 192.168.0.2 dev eth0  src 192.168.0.15
    cache
10.0.14.4 dev dummy0  src 10.0.14.2
    cache

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2015-07-20 14:55:19 -07:00
Jan Engelhardt a6ea668c91 build: must honor pkg-config flags for libmnl
The build otherwise fails if libmnl does not directly live in a
standard search path.
2015-07-06 14:50:58 -07:00
Gustavo Zacarias acfeb55a86 tipc: make build conditional on having libmnl
Signed-off-by: Gustavo Zacarias <gustavo@zacarias.com.ar>
2015-07-06 14:48:40 -07:00
Stephen Hemminger f5386e1150 headers update
if_tun: new ioctl value
libc-compat.h: add definitions for kernel build
2015-07-06 14:47:26 -07:00
Michal Kubeček 38db20ff2d include: add copy of tipc.h
Copy of kernel include/uapi/linux/tipc.h is needed to build on systems
with pre-3.16 kernel headers.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
2015-07-06 14:47:05 -07:00
Stephen Hemminger 0c4a90c446 Merge branch 'master' into net-next 2015-06-26 14:08:49 -07:00
Stephen Hemminger e3006d5210 v4.1.0 2015-06-26 12:28:25 -07:00
Daniel Borkmann cbdd1e6921 tc: bpf: add initial man page
Add a start of a man-page to the misc section as a reference and
guide on (e)BPF classifier and actions. Given that tc is only tersely
documented, this is provided in the hope that users will have an
easier time getting started with tc and (e)BPF, and that there is now more
incentive for others to also start documenting their classifier and
actions as well. ;)

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
2015-06-26 11:25:57 -07:00
Phil Sutter f32dc7467f ss: print value of IPV6_V6ONLY socket option if set
If available and set, print 'v6only:1' for AF_INET6 sockets upon request
of extended information. For IPv6 sockets bound to in6addr_any, this is
the only way to determine if they will also accept IPv4 requests or not.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-06-26 00:13:47 -04:00
Andy Gospodarek 528c2551cd iproute2: add support to print 'linkdown' nexthop flag
Signed-off-by: Andy Gospodaerk <gospo@cumulusnetworks.com>
Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
2015-06-26 00:13:47 -04:00
Craig Gallek 6885e3bf8e ss: Include -E option for socket destroy events
Use the IPv4/IPv6/TCP/UDP multicast groups of NETLINK_SOCK_DIAG
to filter and display socket statistics as they are destroyed.

Kernel support patch series: 24029a3603cfa633e8bc2b3fb3e48e76c497831d

Signed-off-by: Craig Gallek <kraig@google.com>
2015-06-26 00:13:47 -04:00
Nikolay Aleksandrov b0197a047e iplink_bridge: add support for priority
This patch adds support to set bridge stp priority via IFLA_BR_PRIORITY.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
2015-06-26 00:06:45 -04:00
Nikolay Aleksandrov dab049628a iplink_bridge: add support for stp_state
This patch adds support to set stp_state via IFLA_BR_STP_STATE.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
2015-06-26 00:06:45 -04:00
Nikolay Aleksandrov 6c99fb6076 iplink_bridge: add support for ageing_time
This patch adds support to set ageing_time via IFLA_BR_AGEING_TIME.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
2015-06-26 00:06:45 -04:00
Nikolay Aleksandrov 7d6bc3b87a bonding: export 3ad actor and partner port state
This patch adds support to retrieve the new bond slave attributes:
IFLA_BOND_SLAVE_AD_ACTOR_OPER_PORT_STATE
IFLA_BOND_SLAVE_AD_PARTNER_OPER_PORT_STATE
which are read-only.

(Removed if_link.h changes already updated in net-next)

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
2015-06-26 00:06:45 -04:00
Stephen Hemminger 69be46c562 Merge branch 'master' into net-next 2015-06-26 00:04:04 -04:00
Eran Ben Elisha a1b99717c7 Add displaying VF traffic statistics
Enable reading and displaying SRIOV VFs traffic statistics through
the host PF netdevice using the nested IFLA_VF_STATS attribute.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
2015-06-25 23:58:06 -04:00
Roopa Prabhu f638e9f7c8 mpls: always set type RTN_UNICAST and scope RT_SCOPE_UNIVERSE for
This patch fixes incorrect -EINVAL errors due to invalid
scope and type during mpls route deletes.

$ip -f mpls route add 100 as 200 via inet 10.1.1.2 dev swp1

$ip -f mpls route show
100 as to 200 via inet 10.1.1.2 dev swp1

$ip -f mpls route del 100 as 200 via inet 10.1.1.2 dev swp1
RTNETLINK answers: Invalid argument

$ip -f mpls route del 100
RTNETLINK answers: Invalid argument

After patch:

$ip -f mpls route show
100 as to 200 via inet 10.1.1.2 dev swp1

$ip -f mpls route del 100 as 200 via inet 10.1.1.2 dev swp1

$ip -f mpls route show

Always set type to RTN_UNICAST for mpls route add/deletes.
Also, to keep things consistent with the kernel, set scope to
RT_SCOPE_UNIVERSE for both mpls and ipv6 routes. Both mpls and ipv6 route
deletes ignore scope.

Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Vivek Venkataraman <vivek@cumulusnetworks.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-06-25 23:54:27 -04:00
Craig Gallek ecb435eacd ss: add support for segs_in and segs_out
Two new tcp_info fields: tcpi_segs_in and tcpi_segs_out.
(2efd055c53c06b7e89c167c98069bab9afce7e59)

~: ss -ti src :22
	 cubic wscale:7,6 rto:201 rtt:0.244/0.012 ato:40 mss:1418 cwnd:21 bytes_acked:80605 bytes_received:20491 segs_out:414 segs_in:600 send 976.3Mbps lastsnd:23 lastrcv:23 lastack:22 pacing_rate 1952.7Mbps rcv_rtt:98 rcv_space:28960

Signed-off-by: Craig Gallek <kraig@google.com>
Reviewed-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
2015-06-25 23:50:15 -04:00
Stephen Hemminger ff631c3a10 update to 4.2-pre-rc headers
This updates to the sanitized kernel headers from net-next,
with one change fixing the in.h header incompatibility
(already sent upstream).
2015-06-25 22:34:26 -04:00
John W. Linville f4739b2ee7 iplink_geneve: add tos configuration at link creation
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2015-06-25 15:16:31 -04:00
John W. Linville f4c05c2e99 iplink_geneve: add ttl configuration at link creation
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2015-06-25 15:16:31 -04:00
John W. Linville c1a1d8bc4c iproute2: update ip-link.8 for geneve tunnels
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2015-06-25 15:16:31 -04:00
Daniel Borkmann 88eea53954 tc: {f,m}_bpf: allow to retrieve uds path from env
Allow the uds path to be retrieved from the environment; this also
makes dealing with export a bit easier.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-06-25 15:13:16 -04:00
Daniel Borkmann 473d7840c3 tc: {f,m}_bpf: add tail call support for parser
Kernel commit 04fd61ab36ec ("bpf: allow bpf programs to tail-call other
bpf programs") added support for tail calls, this patch here adds tc
front end parts for the object parser to prepopulate a given eBPF prog
array before the root prog is pushed down for classifier creation. The
prepopulation works with any number of prog arrays in any dependencies,
e.g. prog or normal maps could also be used from progs that are
tail-called themselves, etc.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-06-25 15:13:16 -04:00
Stephen Hemminger aaf7045802 configure: cleanup
Don't echo "-e" when using builtin echo in bash.
2015-06-25 15:10:22 -04:00
Maciej Żenczykowski bbd303d183 iproute2: misc/ss.c - fix run_ssfilter af_packet when protocol == 0
s->local.data is a pointer to a field of a non-NULL struct, and hence
cannot be NULL, thus comparing it to 0 is always false, and thus the
return is always false.

Presumably this was meant to be a check whether s->local.data[0] (which
I believe stores af_packet protocol) is 0, ie. ANY.
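
The difference is shown below. struct addr_sketch is a hypothetical stand-in for ss's address holder, and matches_any() mirrors only the shape of the corrected check.

```c
/* Hypothetical stand-in: only the array member matters here. */
struct addr_sketch {
	unsigned short family;
	unsigned int data[4];  /* an array: its address is never NULL */
};

/* "a->data == 0" would compare the array's address and is always false.
 * Testing the first element expresses "protocol is ANY (0)". */
static int matches_any(int lport, const struct addr_sketch *a)
{
	return lport == 0 && a->data[0] == 0;
}
```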

Change-Id: Ia232f5b06ce081e3b2fb6338f1a709cd94e03ae5
Fixes:
  ss.c:1018:37: error: comparison of array 's->local.data' equal to a null pointer is always false [-Werror,-Wtautological-pointer-compare]
    return s->lport == 0 && s->local.data == 0;
                            ~~~~~~~~~^~~~    ~
  1 error generated.
2015-06-25 08:52:06 -04:00
Maciej Żenczykowski 0bbca0422f iproute2: tc/m_pedit.c - remove dead code
The initializers are simply not needed.

These if-blocks are outright dead code, because '0 > unsigned' is always
false, so only else clause triggers and regardless of which clause triggers
it only updates 'ind' which is later unconditionally written to before
being used anyway.

Otherwise we get errors from clang:

  m_pedit.c:166:8: error: comparison of 0 > unsigned expression is always false [-Werror,-Wtautological-compare]
    if (0 > tkey->off) {
        ~ ^ ~~~~~~~~~
  m_pedit.c:209:8: error: comparison of 0 > unsigned expression is always false [-Werror,-Wtautological-compare]
    if (0 > tkey->off) {
        ~ ^ ~~~~~~~~~
  2 errors generated.

Change-Id: I3c9e9092915088fc56f992e5df736851541a4458
2015-06-25 08:52:06 -04:00
Mazhar Rana 45b01c46d4 mroute: "ip mroute show" not working when "to" and/or "from" is given
The command "ip mroute show" is not showing routes when "to" and/or "from"
filter is applied.

root@mazhar:~# ip mroute show
(10.202.30.101, 235.1.2.3)       Iif: eth0       Oifs: eth1

But when I apply a filter, it does not show anything.

root@mazhar:~# ip mroute show 235.1.2.3 from 10.202.30.101
root@mazhar:~#

Signed-off-by: Mazhar Rana <ranamazharp@gmail.com>
2015-06-25 08:47:07 -04:00
Thadeu Lima de Souza Cascardo 4e4b78324f Fix changing tunnel remote and local address to any
If a tunnel is created with a local address, you can't change it to any.

 # ip tunnel add tunl1 mode ipip remote 10.16.42.37 local 10.16.42.214 ttl 64
 # ip tunnel show tunl1
 tunl1: ip/ip  remote 10.16.42.37  local 10.16.42.214  ttl 64
 # ip tunnel change tunl1 local any
 # echo $?
 0
 # ip tunnel show tunl1
 tunl1: ip/ip  remote 10.16.42.37  local 10.16.42.214  ttl 64

It happens that parse_args zeroes ip_tunnel_parm, and when creating the
tunnel, it is OK to leave it as is if the address is any. However, when
changing the tunnel, the current parameters will be read from
ip_tunnel_parm, and local and remote address won't be zeroes anymore, so
it needs to be explicitly set to any.
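
The idea behind the fix is sketched below. struct tun_parm_sketch and set_local_any() are hypothetical stand-ins, not iproute2's tunnel code.

```c
#include <arpa/inet.h>   /* htonl */
#include <netinet/in.h>  /* INADDR_ANY */
#include <stdint.h>      /* uint32_t */

/* Hypothetical stand-in for the tunnel parameters. On "change" they are
 * first filled in from the existing tunnel, not left zeroed. */
struct tun_parm_sketch {
	uint32_t saddr;  /* local, network byte order */
	uint32_t daddr;  /* remote, network byte order */
};

/* "local any": write the wildcard explicitly, because the field may
 * still hold the old address read back from the kernel. */
static void set_local_any(struct tun_parm_sketch *p)
{
	p->saddr = htonl(INADDR_ANY);
}
```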

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2015-06-25 08:45:24 -04:00
Stephen Hemminger f975059a51 Merge branch 'master' into net-next 2015-06-25 08:01:51 -04:00
Stephen Hemminger 586b397851 Merge branch 'net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2 into net-next 2015-06-25 08:01:41 -04:00
Vadim Kochan 30383b074d tests: Add output testing
Add the ability to check command output with grep from the test
script.

TMP_OUT & TMP_ERR are now passed from the Makefile and renamed to
STD_OUT & STD_ERR.

Also change some existing tests to perform output testing.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-06-24 23:37:26 -04:00
Daniel Borkmann ad1fe0d8e9 tc: util: fix print_rate for ludicrous speeds
The for loop should only probe up to G[i]bit rates, so that we
end up with T[i]bit as the last max units[] slot for snprintf(3),
and not possibly an invalid pointer in case rate is multiple of
kilo.
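
The bounded loop can be sketched as follows. format_rate_sketch() is hypothetical and simpler than tc's actual print_rate(): it always scales by 1000 and truncates fractions.

```c
#include <stdint.h>  /* uint64_t */
#include <stdio.h>   /* snprintf, size_t */

static const char *const rate_units[] = { "bit", "Kbit", "Mbit", "Gbit", "Tbit" };
#define N_UNITS (sizeof(rate_units) / sizeof(rate_units[0]))

/* Scale rate (bits/s) by 1000 while it stays large, but probe at most up
 * to the Gbit slot, so i can reach "Tbit" and never run past the array. */
static void format_rate_sketch(char *buf, size_t len, uint64_t rate)
{
	size_t i;

	for (i = 0; i < N_UNITS - 1; i++) {
		if (rate < 1000)
			break;
		rate /= 1000;
	}
	snprintf(buf, len, "%llu%s", (unsigned long long)rate, rate_units[i]);
}
```

With the bound `i < N_UNITS`, a rate of 5 Pbit/s would leave i equal to N_UNITS and index past the array. With `N_UNITS - 1`, it prints as "5000Tbit".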

Fixes: 8cecdc2837 ("tc: more user friendly rates")
Reported-by: Jose R. Guzman Mosqueda <jose.r.guzman.mosqueda@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-06-24 23:34:20 -04:00
Eric Dumazet 518af1e0b1 ss: do not blindly dump two families
ss currently dumps IPv4 sockets, then IPv6 sockets from the kernel,
even if -4 or -6 option was given. Filtering in user space then has to
drop all sockets of wrong family. Such a waste of time...

Before :

$ time ss -tn -4 | wc -l
251659

real	0m1.241s
user	0m0.423s
sys	0m0.806s

After:

$ time ss -tn -4 | wc -l
251672

real	0m0.779s
user	0m0.412s
sys	0m0.386s

Signed-off-by: Eric Dumazet <edumazet@google.com>
2015-06-24 23:11:33 -04:00
Eric Dumazet 22588a0e65 ss: speedup resolve_service()
Let's implement a full cache with a proper hash table; memory got cheaper
these days.
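
A cache of this kind can be sketched as a fixed array of buckets, each holding a singly linked chain keyed by port. This is not the code from the commit. slow_resolve() is a hypothetical placeholder for the real service lookup, and the bucket count is arbitrary.

```c
#include <stdio.h>   /* snprintf */
#include <stdlib.h>  /* malloc */

#define CACHE_BUCKETS 1024  /* hypothetical size; port % CACHE_BUCKETS picks a chain */

struct svc_entry {
	struct svc_entry *next;
	unsigned int port;
	char name[32];
};

static struct svc_entry *svc_cache[CACHE_BUCKETS];

/* Hypothetical slow lookup, counted so that cache hits are observable. */
static int slow_lookups;
static void slow_resolve(unsigned int port, char *out, size_t len)
{
	slow_lookups++;
	snprintf(out, len, "%u", port);  /* placeholder: numeric fallback */
}

static const char *resolve_service_cached(unsigned int port)
{
	struct svc_entry **bucket = &svc_cache[port % CACHE_BUCKETS];
	struct svc_entry *e;

	for (e = *bucket; e; e = e->next)
		if (e->port == port)
			return e->name;  /* hit: no slow lookup */

	e = malloc(sizeof(*e));
	if (!e)
		return NULL;
	e->port = port;
	slow_resolve(port, e->name, sizeof(e->name));
	e->next = *bucket;  /* push onto this bucket's chain */
	*bucket = e;
	return e->name;
}
```

Each distinct port is resolved once. Later lookups cost one hash plus a short chain walk.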

Before :

$ time ss -t | wc -l
529678

real	0m22.708s
user	0m19.591s
sys	0m2.969s

After :

$ time ss -t | wc -l
528291

real	0m5.078s
user	0m4.099s
sys	0m0.985s

Signed-off-by: Eric Dumazet <edumazet@google.com>
2015-06-24 23:11:33 -04:00
Eric Dumazet d2055ea597 ss: Fix allocation of cong control alg name
On Fri, 2015-05-29 at 13:30 +0300, Vadim Kochan wrote:
> From: Vadim Kochan <vadim4j@gmail.com>
>
> Use strdup instead of malloc, and get rid of bad strcpy.
>
> Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
> ---
>  misc/ss.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/misc/ss.c b/misc/ss.c
> index 347e3a1..a719466 100644
> --- a/misc/ss.c
> +++ b/misc/ss.c
> @@ -1908,8 +1908,7 @@ static void tcp_show_info(const struct nlmsghdr *nlh, struct inet_diag_msg *r,
>
>  		if (tb[INET_DIAG_CONG]) {
>  			const char *cong_attr = rta_getattr_str(tb[INET_DIAG_CONG]);
> -			s.cong_alg = malloc(strlen(cong_attr + 1));
> -			strcpy(s.cong_alg, cong_attr);
> +			s.cong_alg = strdup(cong_attr);
>  		}
>
>  		if (TCPI_HAS_OPT(info, TCPI_OPT_WSCALE)) {

I doubt TCP_CA_NAME_MAX will ever change in the kernel: 16 bytes.

It's typically "cubic" and less than 8 bytes.

Using 8 bytes to point to a malloc(8) is a waste.

Please remove the memory allocation, or store the pointer, since
tcp_show_info() does the malloc()/free() before return.
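
The suggestion can be sketched as storing the name inline in a fixed-size array. For reference, the quoted malloc(strlen(cong_attr + 1)) allocates strlen(cong_attr) - 1 bytes for a non-empty name, which is two fewer than strcpy() writes. struct sockstat_sketch and set_cong_alg() are hypothetical, and 16 is the TCP_CA_NAME_MAX value cited above.

```c
#include <stdio.h>  /* snprintf */

#ifndef TCP_CA_NAME_MAX
#define TCP_CA_NAME_MAX 16  /* value cited above */
#endif

/* Hypothetical slice of ss's per-socket state: the name lives inline,
 * so there is nothing to malloc() or free(). */
struct sockstat_sketch {
	char cong_alg[TCP_CA_NAME_MAX];
};

/* Copy the congestion-control name, truncating safely if it is too long. */
static void set_cong_alg(struct sockstat_sketch *s, const char *cong_attr)
{
	snprintf(s->cong_alg, sizeof(s->cong_alg), "%s", cong_attr);
}
```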
2015-06-24 23:11:33 -04:00
Vadim Kochan b6907403ef configure: Check for libmnl
Indicate existence of libmnl which is required by tipc.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-06-24 23:09:25 -04:00
Mike Frysinger 232aaf4f4b enable transparent LFS
Make sure we use 64-bit filesystem functions everywhere.  This applies not
only to being able to read large files (which generally doesn't apply to
us), but also being able to simply stat them (as they might be using large
inodes).
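
On glibc, one common way to get transparent LFS is to define _FILE_OFFSET_BITS as 64 before any system header is included, or to pass -D_FILE_OFFSET_BITS=64 in CFLAGS. This sketch does not claim to show how the commit itself did it. off_t_is_64bit() is a hypothetical check.

```c
/* Must precede every system header so that off_t, stat(), lseek() etc.
 * resolve to their 64-bit variants on 32-bit glibc targets. */
#define _FILE_OFFSET_BITS 64

#include <sys/types.h>  /* off_t */

/* Hypothetical check: with the define above, off_t is 64 bits on glibc. */
static int off_t_is_64bit(void)
{
	return sizeof(off_t) == 8;
}
```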

Signed-off-by: Mike Frysinger <vapier@chromium.org>
2015-06-24 23:07:34 -04:00
Stephen Hemminger 439951f8bf pkt_cls: update header
Upstream changes removed some kernel only stuff from header file.
2015-05-28 09:18:28 -07:00
Stephen Hemminger 03371c7d98 Merge branch 'master' into net-next
Conflicts:
	include/linux/tcp.h
	lib/libnetlink.c
2015-05-28 09:18:01 -07:00
Stephen Hemminger c52827e907 change of rtnetlink to use RTN_F_OFFLOAD
The definition of offload flag changed during 4.1 rc process.
2015-05-27 18:29:02 -07:00
Stephen Hemminger ebfe49224b update to 4.1-rc5 headers
Pull in some changes like RTN_F_EXTERNAL
2015-05-27 18:27:42 -07:00
Stephen Hemminger c079e121a7 libnetlink: add size argument to rtnl_talk
There have been several instances where response from kernel
has overrun the stack buffer from the caller. Avoid future problems
by passing a size argument.
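
The sketch below shows the kind of size check a size argument enables. copy_answer() is a hypothetical helper, not rtnl_talk() itself.

```c
#include <string.h>  /* memcpy, size_t */

/* Hypothetical helper: copy a reply of reply_len bytes into the caller's
 * buffer only if it fits in answer_size; otherwise fail instead of
 * overrunning the caller's stack buffer. */
static int copy_answer(void *answer, size_t answer_size,
		       const void *reply, size_t reply_len)
{
	if (reply_len > answer_size)
		return -1;
	memcpy(answer, reply, reply_len);
	return 0;
}
```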

Also drop the unused peer and group arguments to rtnl_talk.
2015-05-27 13:00:21 -07:00
Jetchko Jekov bde5baa547 gre: raising the size of the buffer holding nl messages.
Now it matches the size for the answer defined in rtnl_talk()
and prevents stack corruption with answer > 1024 bytes.
2015-05-27 12:27:31 -07:00
David Ward aacee2695a tc: gred: Add support for TCA_GRED_LIMIT attribute
Allow the qdisc limit to be set, which is particularly useful when
the default VQ is not configured with RED parameters.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
2015-05-21 15:30:39 -07:00
Nicolas Dichtel b6ec53e300 xfrmmonitor: allows to monitor in several netns
With this patch, it's now possible to listen in all netns that have an nsid
assigned into the netns where the socket is opened.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2015-05-21 15:28:56 -07:00
Nicolas Dichtel 449b824ad1 ipmonitor: allows to monitor in several netns
With this patch, it's now possible to listen in all netns that have an nsid
assigned into the netns where the socket is opened.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2015-05-21 15:28:56 -07:00
Nicolas Dichtel 3b0006f818 ipmonitor: introduce print_headers
The goal of this patch is to avoid code duplication.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2015-05-21 15:28:56 -07:00
Nicolas Dichtel 0628cddd9d libnetlink: introduce rtnl_listen_filter_t
There is no functional change with this commit. It only prepares the next one.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2015-05-21 15:28:56 -07:00
Nicolas Dichtel 2503247d58 man: update ip monitor page
Add label option.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2015-05-21 15:28:56 -07:00
Jonathan Toppins 6fc1f8add3 iplink_bond: add support for ad_actor and port_key options
This adds support for setting and displaying the following bonding
options:
* ad_user_port_key
* ad_actor_sys_prio
* ad_actor_system

Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com>
2015-05-21 15:26:48 -07:00
Eric Dumazet df1c7d9138 codel: add ce_threshold support to codel & fq_codel
codel & fq_codel packet schedulers are now able to have a threshold
for CE marking packets, regardless of the drop/nodrop decision taken by
CoDel.

This is particularly useful for dctcp and variants, that do not use
traditional ECN.

Note that fq_codel users would have to specify noecn if ce_threshold is
used; otherwise the results would not be very interesting, as ECN is on
by default for fq_codel.

$ tc -s qdisc show dev eth1
qdisc codel 8002: root refcnt 45 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms
 Sent 4908469888317 bytes 3351813967 pkt (dropped 0, overlimits 0 requeues 21624365)
 rate 37671Mbit 3231836pps backlog 4904740b 250p requeues 21624365
  count 0 lastcount 0 ldelay 1.1ms drop_next 0us
  maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 410861803

Signed-off-by: Eric Dumazet <edumazet@google.com>
2015-05-21 15:25:05 -07:00
Jiri Pirko 30eb304ecd tc: add support for Flower classifier
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2015-05-21 15:22:49 -07:00
Eric Dumazet 1a4dda7103 ss: add support for bytes_acked & bytes_received
tcp_info has 2 new fields : bytes_acked & bytes_received

$ ss -ti src :22
...
	 cubic wscale:7,6 rto:234 rtt:33.199/17.225 ato:17.225 mss:1418 cwnd:9
ssthresh:9 send 3.1Mbps lastsnd:3 lastrcv:4 lastack:193
bytes_acked:188396 bytes_received:13639 pacing_rate 6.2Mbps unacked:1
retrans:0/4 reordering:4 rcv_rtt:47.25 rcv_space:28960

Signed-off-by: Eric Dumazet <edumazet@google.com>
2015-05-21 15:21:04 -07:00
John W. Linville 908755dc49 iproute2: GENEVE support
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2015-05-21 15:17:53 -07:00
Stephen Hemminger f9b004020a Merge branch 'master' into net-next 2015-05-21 14:52:42 -07:00
Stephen Hemminger 8f42ceaf24 Update kernels for net-next
Get latest files
2015-05-21 14:52:08 -07:00
Vadim Kochan 2631b85666 ss: Show more info (ring,fanout) for packet socks
Print such info like version, tx/rx ring, fanout for
packet sockets when '-e' option was specified.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-05-21 14:47:44 -07:00
Vadim Kochan fede6dd9b3 tests: Add test for 'ip route add default'
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-05-21 14:45:21 -07:00
Vadim Kochan 64dedc4739 tests: Run each test in network namespace
Each test is now forced to run in its own network
namespace so that it does not affect the current network setup.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-05-21 14:45:17 -07:00
Richard Alpe f043759dd4 tipc: add new TIPC configuration tool
tipc is a user-space configuration tool for TIPC (Transparent
Inter-process Communication). It utilizes the TIPC netlink API in the
kernel to fetch data or perform actions.

The tipc tool has somewhat similar syntax to the ip tool meaning that
users of the ip tool should not feel that unfamiliar with this tool.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
2015-05-21 14:41:41 -07:00
Stephen Hemminger cbb99f7fbe Update to latest kernel headers
Also add tipc_netlink.h for later TIPC support
2015-05-21 14:41:11 -07:00
David Ward 357c45ad3a tc: gred: Adopt the term VQ in the command syntax and output
In the GRED kernel source code, both of the terms "drop parameters"
(DP) and "virtual queue" (VQ) are used to refer to the same thing.
Each "DP" is better understood as a "set of drop parameters", since
it has values for limit, min, max, avpkt, etc. This terminology can
result in confusion when creating a GRED qdisc having multiple DPs.
Netlink attributes and struct members with the DP name seem to have
been left intact for compatibility, while the term VQ was otherwise
adopted in the code, which is more intuitive.

Use the VQ term in the tc command syntax and output (but maintain
compatibility with the old syntax).

Rewrite the usage text to be concise and similar to other qdiscs.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
2015-05-21 14:16:03 -07:00
David Ward eb6d7d6af1 tc: gred: Handle unsigned values properly in option parsing/printing
DPs, def_DP, and DP are unsigned values that are sent and received
in TCA_GRED_* netlink attributes; handle them properly when they
are parsed or printed. Use MAX_DPs as the initial value for def_DP
and DP, and fix the operator used for bounds checking them.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
2015-05-21 14:16:03 -07:00
David Ward 1693a4d392 tc: gred: Improve parameter/statistics output
Make the output more consistent with the RED qdisc, and only show
details/statistics if the appropriate flag is set when calling tc.

Show the parameters used with "gred setup". Add missing statistics
"pdrop" and "other". Fix format specifiers for unsigned values.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
2015-05-21 14:16:03 -07:00
David Ward a77905ef6a tc: gred: Print usage text if no arguments appear after "gred"
This is more helpful to the user, since the command takes two forms,
and the message that would otherwise appear about missing parameters
assumes one of those forms.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
2015-05-21 14:16:03 -07:00
David Ward d73e0408e2 tc: gred: Fix whitespace issues in code
Signed-off-by: David Ward <david.ward@ll.mit.edu>
2015-05-21 14:16:03 -07:00
David Ward 7bf17a2264 tc: red: Mark "bandwidth" parameter as optional in usage text
Signed-off-by: David Ward <david.ward@ll.mit.edu>
2015-05-21 14:16:03 -07:00
David Ward d93c909a4c tc: red, gred: Notify when using the default value for "bandwidth"
The "bandwidth" parameter is optional, but ensure the user is aware
of its default value, to proactively avoid configuration problems.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
2015-05-21 14:16:03 -07:00
David Ward 6c99695da2 tc: red, gred: Fix format specifier in burst size warning
burst is an unsigned value.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
2015-05-21 14:16:03 -07:00
David Ward 9d9a67c756 tc: red, gred: Rename overloaded variable wlog
It is used when parsing three different parameters, only one of
which is Wlog. Change the name to make the code less confusing.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
2015-05-21 14:16:03 -07:00
Vadim Kochan 699589f6df man ip-link: Remove extra GROUP explanation
Remove double explanation of GROUP option from 'ip link set' section.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-05-14 15:38:10 -07:00
Lennert Buytenhek 2c0feda8be man ip-link: Add missing lowpan link type
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
2015-05-11 09:19:22 -07:00
Daniel Borkmann ec6f5abcea tc: minor cleanup on ingress
Fix whitespacing and remove the unnecessary condition.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-05-11 09:18:10 -07:00
Eric Dumazet 3bf5445c5e ss: dctcp changes
Missing space before dctcp: markers.

With dctcp, cwnd=2 is pretty common, just display cwnd value even
if cwnd has this value, it makes parsing easier.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2015-05-11 09:16:43 -07:00
Eric Dumazet 656e8fdd2d ss: small optim in tcp_show_info()
The kernel can give us a smaller tcp_info than ours.

We copy the kernel provided structure and fill with 0
the remaining part.

Let's clear only the missing part to save some cycles, as we intend to
slightly increase tcp_info size in the future.
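
The copy-then-clear-the-tail step can be sketched as follows. struct info_sketch is a hypothetical stand-in for struct tcp_info, and copy_partial() is not the actual ss code.

```c
#include <string.h>  /* memcpy, memset, size_t */

/* Hypothetical stand-in for the userspace copy of tcp_info. */
struct info_sketch {
	unsigned int fields[8];
};

/* Copy what the kernel provided (possibly fewer bytes than our struct)
 * and zero only the tail that the kernel did not fill in. */
static void copy_partial(struct info_sketch *dst, const void *src, size_t len)
{
	if (len >= sizeof(*dst)) {
		memcpy(dst, src, sizeof(*dst));
	} else {
		memcpy(dst, src, len);
		memset((char *)dst + len, 0, sizeof(*dst) - len);
	}
}
```

Clearing the whole struct first and then copying over it would write the kernel-provided bytes twice. Zeroing only the tail avoids that.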

Signed-off-by: Eric Dumazet <edumazet@google.com>
2015-05-11 09:15:08 -07:00
Thomas Graf 38a7f26828 route: Add missing newline in helptext
Signed-off-by: Thomas Graf <tgraf@suug.ch>
2015-05-11 09:14:44 -07:00
WANG Cong 285e7768e8 tc: fill in handle before checking argc
When deleting a specific basic filter with handle,
tc command always ignores the 'handle' option, so
tcm_handle is always 0 and kernel deletes all filters
in the selected group. This is wrong, we should respect
'handle' in cmdline.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
2015-05-11 09:13:20 -07:00
Thomas Graf f7dd7e5e71 iproute2: Fix typo in get_prefix_1()
Fixes a typo in get_prefix_1() which broke the prefix default
names { default | any | all }.

The most obvious fallout from this bug was:

	$ ip route add default via 1.1.1.1
	Error: an inet prefix is expected rather than "default".

Fixes: dacc5d4197 ("add basic mpls support to iproute")
Signed-off-by: Thomas Graf <tgraf@suug.ch>
2015-05-11 09:11:53 -07:00
Stephen Hemminger 906cafe3ff ip: fix exit code for addrlabel
The exit code for ip addrlabel was not correct.
The return from the command function is negated and turned into
the exit code on failure.
2015-05-07 08:11:30 -07:00
Stephen Hemminger 076ae7089a ip: fix exit code for rule failures
If ip rule command fails talking to kernel, exit code should be 2.
The sub-command is called by cmd loop and the exit code is negative
of return value from the command callback.
2015-05-07 08:11:30 -07:00
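A minimal sketch of the exit-code convention described in the two commits above, assuming a sub-command callback that returns -1 for a syntax error and -2 when the kernel reports an error, which the command loop negates into exit status 1 or 2. do_rule_sketch() and exit_code_for() are hypothetical names, not functions from ip.

```c
#include <stdio.h>

/* Hypothetical sub-command handler: negative return values encode the
 * exit status (-1 syntax error, -2 error reported by the kernel). */
static int do_rule_sketch(int kernel_failed, int bad_syntax)
{
	if (bad_syntax) {
		fprintf(stderr, "Error: bad syntax\n");
		return -1;
	}
	if (kernel_failed) {
		fprintf(stderr, "RTNETLINK answers: error\n");
		return -2;
	}
	return 0;
}

/* The command loop turns the callback's return value into the exit code. */
static int exit_code_for(int cmd_ret)
{
	return -cmd_ret;
}
```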
Stephen Hemminger d58ba4ba2a ip: return correct exit code on route failure
If kernel complains about ip route request, exit status should be
2 not 1.

This fixes regression introduced by:
commit 42ecedd4ba
Author: Roopa Prabhu <roopa@cumulusnetworks.com>
Date:   Tue Mar 17 19:26:32 2015 -0700

    fix ip -force -batch to continue on errors
2015-05-07 08:11:30 -07:00
Stephen Hemminger ce743da171 ip: document exit code
The ip command has always had a consistent exit status;
document it so that developers see it.
2015-05-07 08:11:28 -07:00
Vlad Zolotarov 6c55c8c461 ip link set vf: Added "query_rss" command
Add a new option to toggle the ability of querying the RSS configuration of a specific VF.

VF RSS information like RSS hash key may be considered sensitive on some devices where
this information is shared between VF and PF and thus its querying may be prohibited by default.

This new option allows a system administrator with privileges to modify a PF state
to control if the above VF querying is allowed or not.

For example:
 To enable RSS querying of VF[0] of ethX:
 >> ip link set dev ethX vf 0 query_rss on

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-05-04 09:08:26 -07:00
Stephen Hemminger 270763546a update headers to 4.1-rc1 net-next 2015-05-04 09:04:59 -07:00
Vadim Kochan 8916ccf66c ip link: Add group in usage() for 'ip link delete'
Show deleting by group in 'ip link help' output:

...
ip link delete { DEVICE | dev DEVICE | group DEVGROUP } type TYPE [ ARGS ]
...

Also show the DEVICE option separately in the { } list.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-05-04 09:00:59 -07:00
Vadim Kochan 7f74cf6de0 man ip-link: Add deleting links by group
Document the possibility of deleting virtual links by group.

Also change the alignment of the 'ip link delete' argument
descriptions to look similar to 'ip link set'.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-05-04 09:00:52 -07:00
Vadim Kochan 57ff5a1096 ss: Fix wrong filter behaviour
Fixed applying family & socket type filters.
It was not possible to select UDP & UNIX sockets together.

Now selected families are ORed.

The problem was that filters were combined by AND.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
Reported-By: Mihai Moldovan <ionic@ionic.de>
2015-05-04 08:58:47 -07:00
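To illustrate the fix, here is a minimal sketch of ORing the selected families into a single mask, so that selecting UDP and UNIX together matches sockets of either kind. The FAM_* bits and both helpers are hypothetical; ss's real filter structures differ.

```c
#include <stdbool.h>

/* Hypothetical family bits, assumed for illustration only. */
enum {
	FAM_UDP  = 1u << 0,
	FAM_TCP  = 1u << 1,
	FAM_UNIX = 1u << 2,
};

/* Selecting several families ORs their bits into one mask... */
static unsigned int select_families(const unsigned int *fams, int n)
{
	unsigned int mask = 0;

	for (int i = 0; i < n; i++)
		mask |= fams[i];
	return mask;
}

/* ...and a socket matches if its family is any of the selected ones. */
static bool socket_matches(unsigned int mask, unsigned int sock_family)
{
	return (mask & sock_family) != 0;
}
```

Combining the same selections with AND would require one socket to belong to every family at once, which is why UDP and UNIX could not be selected together before the fix.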
Daniel Borkmann d937a74b6d tc: {m, f}_ebpf: add option for dumping verifier log
Currently, only on error we get a log dump, but I found it useful when
working with eBPF to have an option to also dump the log on success.
Also spotted a typo in a header comment, which is fixed here as well.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
2015-05-04 08:43:08 -07:00
Mathias Nyman d7bd2db52c ip: Add color output option
It is hard to quickly find what you are looking for in the output of the
ip command. Color helps.

This patch adds a '-c' flag to highlight these with individual colors:
  - interface name
  - ip address
  - mac address
  - up/down state

Signed-off-by: Mathias Nyman <m.nyman@iki.fi>
Tested-by: Yegor Yefremov <yegorslists@googlemail.com>
2015-05-04 08:39:17 -07:00
Stephen Hemminger aeedd8e1e7 update headers to reflect BPF changes
Reclone sanitized headers from 4.1-rc
2015-04-29 12:33:24 -07:00
Daniel Borkmann 279d6a8ba7 examples: bpf: fix ld offs to have same prog loaded on ingress/egress
Fix up the eBPF example program to match our kernel fix in a166151cbe33 ("bpf:
fix bpf helpers to use skb->mac_header relative offsets"). Tested on ingress
and egress paths.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
2015-04-27 16:42:32 -07:00
Daniel Borkmann 4bd624467b tc: built-in eBPF exec proxy
This work follows upon commit 6256f8c9e4 ("tc, bpf: finalize eBPF
support for cls and act front-end") and takes up the idea proposed by
Hannes Frederic Sowa to spawn a shell (or any other command) that holds
generated eBPF map file descriptors.

File descriptors, based on their id, are being fetched from the same
unix domain socket as demonstrated in the bpf_agent, the shell spawned
via execvpe(2) and the map fds passed over the environment, and thus
are made available to applications in the fashion of std{in,out,err}
for read/write access, for example in case of iproute2's examples/bpf/:

  # env | grep BPF
  BPF_NUM_MAPS=3
  BPF_MAP1=6        <- BPF_MAP_ID_QUEUE (id 1)
  BPF_MAP0=5        <- BPF_MAP_ID_PROTO (id 0)
  BPF_MAP2=7        <- BPF_MAP_ID_DROPS (id 2)

  # ls -la /proc/self/fd
  [...]
  lrwx------. 1 root root 64 Apr 14 16:46 0 -> /dev/pts/4
  lrwx------. 1 root root 64 Apr 14 16:46 1 -> /dev/pts/4
  lrwx------. 1 root root 64 Apr 14 16:46 2 -> /dev/pts/4
  [...]
  lrwx------. 1 root root 64 Apr 14 16:46 5 -> anon_inode:bpf-map
  lrwx------. 1 root root 64 Apr 14 16:46 6 -> anon_inode:bpf-map
  lrwx------. 1 root root 64 Apr 14 16:46 7 -> anon_inode:bpf-map

The advantage (as opposed to the direct/native usage) is that now the
shell is map fd owner and applications can terminate and easily reattach
to descriptors w/o any kernel changes. Moreover, multiple applications
can easily read/write eBPF maps simultaneously.

To further allow users to experiment with that, the next step is to add
a small helper that can get along with simple data types, so that shell
scripts can also make use of the bpf syscall, f.e. to read/write into maps.

Generally, this allows for prepopulating maps, or any runtime altering
which could influence eBPF program behaviour (f.e. different run-time
classifications, skb modifications, ...), dumping of statistics, etc.

Reference: http://thread.gmane.org/gmane.linux.network/357471/focus=357860
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
2015-04-27 16:39:23 -07:00
Nicolas Dichtel 505f91869f mroute: remove invalid check against NLM_F_MULTI
This flag is only for the netlink protocol (multi-part messages), no reason
to reject messages without it.

Note that this flag was removed by the following kernel patches (v3.14)
65886f439ab0 ipmr: fix mfc notification flags
f518338b1603 ip6mr: fix mfc notification flags

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2015-04-27 11:41:46 -07:00
Nicolas Dichtel b765eda924 libnamespaces: fix warning about syscall()
The warning was:
In file included from namespace.c:14:0:
../include/namespace.h: In function ‘setns’:
../include/namespace.h:37:2: warning: implicit declaration of function ‘syscall’ [-Wimplicit-function-declaration]

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2015-04-27 11:41:46 -07:00
Nicolas Dichtel afa5158f02 tc: fix compilation warning on 32bits arch
The warning was:
m_simple.c: In function ‘parse_simple’:
m_simple.c:142:4: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘size_t’ [-Wformat]

Useful to be able to compile with -Werror.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2015-04-27 11:41:46 -07:00
Vadim Kochan 46679bbbe8 tc util: Fix possible buffer overflow when print class id
Use correct handle buffer length.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-04-20 10:06:02 -07:00
Nicolas Dichtel 782cf01dc0 ipxfrm: wrong nl msg sent on deleteall cmd
XFRM netlink family is independent from the route netlink family. It's wrong
to call rtnl_wilddump_request(), because it will add a 'struct ifinfomsg' into
the header and the kernel will complain (at least for xfrm state):

netlink: 24 bytes leftover after parsing attributes in process `ip'.

Reported-by: Gregory Hoggarth <Gregory.Hoggarth@alliedtelesis.co.nz>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2015-04-20 10:04:20 -07:00
Nicolas Dichtel d652ccbf81 netns: allow to dump and monitor nsid
Two commands are added:
 - ip netns list-id
 - ip monitor nsid

A cache is also added to remember the association between the iproute2 netns
name (from /var/run/netns/) and the nsid.
To avoid interfering with the rth socket, a new rtnl socket (rtnsh) is used to
get nsid (we may send rtnl request during listing on rth).

Example:
$ ip netns list-id
nsid 0 (iproute2 netns name: foo)
$ ip monitor nsid
Deleted nsid 0 (iproute2 netns name: foo)
nsid 16 (iproute2 netns name: bar)

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2015-04-20 10:02:38 -07:00
Pavel Šimerda b1410e0ab1 lnstat: dump to stdout, not stderr
See also:

 * https://bugzilla.redhat.com/show_bug.cgi?id=736332

Signed-off-by: Pavel Šimerda <psimerda@redhat.com>
2015-04-20 09:58:46 -07:00
Pavel Šimerda e7e2913fe4 lnstat: run indefinitely by default
See also:

 * https://bugzilla.redhat.com/show_bug.cgi?id=977845

Signed-off-by: Pavel Šimerda <psimerda@redhat.com>
2015-04-20 09:58:11 -07:00
Pavel Šimerda a51842dcd7 cbq: fix find syntax in example
Without modification, using the example resulted in the following error:

[root@localhost sbin]# cbq restart
find: warning: you have specified the -maxdepth option after a
non-option argument (, but options are not positional (-maxdepth affects
tests specified before it as well as those specified after it).  Please
specify options before other arguments.

find: warning: you have specified the -maxdepth option after a
non-option argument (, but options are not positional (-maxdepth affects
tests specified before it as well as those specified after it).  Please
specify options before other arguments.

**CBQ: failed to compile CBQ configuration!

See also:

 * https://bugzilla.redhat.com/show_bug.cgi?id=539232

Reported-by: Mads Kiilerich <mads@kiilerich.com>
Signed-off-by: Pavel Šimerda <psimerda@redhat.com>
2015-04-20 09:57:14 -07:00
Pavel Šimerda 11a3e5c4b3 ip-xfrm: support 'proto any' with 'sport' and 'dport'
When creating an IPsec SA that sets 'proto any' (IPPROTO_IP) and
specifies 'sport' and 'dport' at the same time in selector, the
following error is issued:

"sport" and "dport" are invalid with proto=ip

However using IPPROTO_IP with ports is completely legal and necessary
when one wants to share the SA on both TCP and UDP. One of the
applications requiring sharing SAs is 3GPP IMS AKA authentication.

See also:

 * https://bugzilla.redhat.com/show_bug.cgi?id=497355

Reported-by: Jiří Klimeš <jklimes@redhat.com>
Signed-off-by: Pavel Šimerda <psimerda@redhat.com>
2015-04-20 09:56:44 -07:00
Pavel Šimerda 06ec9039c3 turn Makefile more distribution friendly
Changes:

 * Accept directory settings from environment.
 * Remove redundant ROOTDIR variable.
 * Set KERNEL_INCLUDE default to '/usr/include'.
 * Use CFLAGS from environment.

Note: In the long term it might be better to improve the configure
script to generate those parts of the Makefile in a manner similar
to autoconf. It might be even practical to autotoolize the package.

Signed-off-by: Pavel Šimerda <psimerda@redhat.com>
2015-04-20 09:53:24 -07:00
Felix Fietkau b8d5c9a71b tc: add support for connmark action
Add support for the netfilter connmark action.

Typical usage:
...let's tag outgoing icmp with mark 0x10..
iptables -tmangle -A PREROUTING -p icmp -j CONNMARK --set-mark 0x10
..add on ingress of $ETH an extractor for connmark...
tc filter add dev $ETH parent ffff: prio 4 protocol ip \
u32 match ip protocol 1 0xff \
flowid 1:1 \
action connmark continue
...if the connmark was 0x11, we police to a ridic rate of 10Kbps
tc filter add dev $ETH parent ffff: prio 5 protocol ip \
handle 0x11 fw flowid 1:1 \
action police rate 10kbit burst 10k

Other ways to use connmark are to supply the zone, index and
branching choice. Refer to help.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2015-04-13 10:49:45 -07:00
Stephen Hemminger 94f665387e update kernel headers and add tc_connmark.h
Needed for later tc action patches
2015-04-13 10:49:33 -07:00
Andy Gospodarek aa05b988f5 iproute2: unify naming for entries offloaded to hardware
The kernel now has the capability to offload FDB and FIB entries to hardware.
It is important to let users know if table entries are also offloaded to
hardware.  Currently offloaded FDB entries are indicated by the existence of
the flag 'external' on the entry as of the following commit:

commit 28467b7f3f
Author: Scott Feldman <sfeldma@gmail.com>
Date:   Thu Dec 4 09:57:15 2014 +0100

    bridge/fdb: add flag/indication for FDB entry synced from offload device

When the patch to add support for indicating that FIB entries were also
offloaded was posted to netdev by Scott Feldman, it became clear that 'external'
would not be an ideal name for routes.  There could definitely be confusion
about what this might mean since many routes are to external networks -- a
collision/confusion that did not happen with FDB.

Scott Feldman asked me to check with others and build consensus around a name.
After speaking with several people about this I am proposing we refer to both
FDB and FIB entries that are currently backed by hardware (based on the work
done in rocker) with the flag 'offload' appended to the end of the entry.

Some people liked the string 'external,' others liked 'hardware,' but the point
is to communicate that these routes are available to something that will
offload the forwarding normally done by the kernel.  Since the term 'offload'
is used so frequently it seems appropriate to use the same language in
ip/bridge output.

The term 'offload' also seems to resonate with many of the people who have
responded on Scott's original thread or to those who I reached out to directly
and did respond to my query, so it seems we have reached consensus that it
should be the term used going forward.

v2: rebased against net-next branch

Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: Jamal Hadi Salim <jhs@mojatatu.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: John W. Linville <linville@tuxdriver.com>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
CC: Scott Feldman <sfeldma@gmail.com>
CC: Stephen Hemminger <stephen@networkplumber.org>
2015-04-13 09:40:46 -07:00
Stephen Hemminger 93531fac41 Merge branch 'master' into net-next 2015-04-13 09:39:46 -07:00
Stephen Hemminger 672acc7238 fix whitespace 2015-04-13 09:39:34 -07:00
Stephen Hemminger aed6d85d15 v4.0.0 2015-04-13 08:55:11 -07:00
Nicolas Dichtel 4c7d9a5888 ipnetns: add a runtime check for RTM_GETNSID support
The goal of this patch is to test at runtime whether the command RTM_GETNSID
is supported by the kernel.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2015-04-13 08:50:10 -07:00
Nicolas Dichtel 5a2ce86823 Revert "ip netns: Fix rtnl error while print netns list"
This reverts commit d116ff3414.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2015-04-13 08:50:10 -07:00
Nicolas Dichtel 694ed195a0 Revert "configure: add missing INCLUDE to netnsid detection"
This reverts commit d059de70ca.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2015-04-13 08:50:10 -07:00
Daniel Borkmann 6256f8c9e4 tc, bpf: finalize eBPF support for cls and act front-end
This work finalizes both eBPF front-ends for the classifier and action
part in tc, it allows for custom ELF section selection, a simplified tc
command frontend (while keeping compat), reusing of common maps between
classifier and actions residing in the same object file, and exporting
of all map fds to an eBPF agent for handing off further control in user
space.

It also adds an extensive example of how eBPF can be used, and a minimal
self-contained example agent that dumps map data. The example is well
documented and hopefully provides a good starting point into programming
cls_bpf and act_bpf.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
2015-04-10 13:31:19 -07:00
Stephen Hemminger f0eb8da59a Merge branch 'master' into net-next 2015-04-10 13:27:37 -07:00
Vadim Kochan 943c30de80 man tc: Add description about class name option
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-04-10 13:26:28 -07:00
Jiri Benc d059de70ca configure: add missing INCLUDE to netnsid detection
Fixes: d116ff3414 ("ip netns: Fix rtnl error while print netns list")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
2015-04-10 13:23:35 -07:00
Christophe Gouault 811aca0448 xfrm: revise man page and document ip xfrm policy set
- document ip xfrm policy set
- update ip xfrm monitor documentation
- in DESCRIPTION section, reorganize grouping of commands

Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>
2015-04-10 13:21:30 -07:00
Christophe Gouault 025fa9dc7a xfrm: add command for configuring SPD hash table
add a new command to configure the SPD hash table:
   ip xfrm policy set [ hthresh4 LBITS RBITS ] [ hthresh6 LBITS RBITS ]

and code to display the SPD hash configuration:
  ip -s -s xfrm policy count

hthresh4: defines the minimum local and remote IPv4 prefix lengths of
selectors needed to hash a policy. If the prefix lengths are greater than
or equal to the thresholds, the policy is hashed; otherwise it falls back
to the policy_inexact chained list.

hthresh6: defines the minimum local and remote IPv6 prefix lengths of
selectors needed to hash a policy, with the same fallback to the
policy_inexact chained list.

Example:

% ip -s -s xfrm policy count
         SPD IN  0 OUT 0 FWD 0 (Sock: IN 0 OUT 0 FWD 0)
         SPD buckets: count 7 Max 1048576
         SPD IPv4 thresholds: local 32 remote 32
         SPD IPv6 thresholds: local 128 remote 128

% ip xfrm pol set hthresh4 24 16 hthresh6 64 56

% ip -s -s xfrm policy count
         SPD IN  0 OUT 0 FWD 0 (Sock: IN 0 OUT 0 FWD 0)
         SPD buckets: count 7 Max 1048576
         SPD IPv4 thresholds: local 24 remote 16
         SPD IPv6 thresholds: local 64 remote 56

Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>
2015-04-10 13:21:27 -07:00
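The hashing rule from the hthresh4/hthresh6 description reduces to a simple comparison. This sketch assumes a policy is hashed only when both its local and remote prefix lengths meet their thresholds; spd_policy_is_hashed() is a hypothetical helper, not kernel or iproute2 code.

```c
#include <stdbool.h>

/* Returns true when a policy's selector is specific enough to be hashed;
 * otherwise it would go to the policy_inexact list. */
static bool spd_policy_is_hashed(unsigned int local_plen,
				 unsigned int remote_plen,
				 unsigned int lbits, unsigned int rbits)
{
	return local_plen >= lbits && remote_plen >= rbits;
}
```

With "hthresh4 24 16" from the example, a /24 local and /16 remote selector is hashed, while a /20 local selector is not.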
Stephen Hemminger e46efaed0f update kernel headers for net-next
Current sanitized kernel headers from net-next
2015-04-10 13:18:38 -07:00
Stephen Hemminger 9339077928 xfrm: fix build with later kernel headers
Need to include netinet/in.h to get the correct glibc headers
instead of getting definitions in linux/in6.h
2015-04-10 13:17:54 -07:00
Stephen Hemminger bd733e4088 Merge branch 'master' into net-next
Conflicts:
	man/man8/ip-route.8.in
2015-04-07 08:56:14 -07:00
Pavel Šimerda a89d5329d4 docs: make spacing consistent
Result of the following command:

    sed -ri 's/\.  /. /g' man/*/*

Signed-Off-By: Pavel Šimerda <psimerda@redhat.com>
2015-04-07 08:41:36 -07:00
Vadim Kochan b6d6b5a1cd man ip-link: Add missing link types - vti,ipvlan,nlmon
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-04-07 08:36:20 -07:00
Vadim Kochan 21107f52b0 ip-link: Align usage at [link-netns ID] line
The usage output was shifted because of a missing TAB.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-04-07 08:36:20 -07:00
Vadim Kochan bbf2f7c66d man ip-netns: Fix shifted layout at bottom of 'ip netns del'
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-04-07 08:35:46 -07:00
Vadim Kochan 8b90a9907e tc class: Ignore if default class name file does not exist
If '-nm' is specified, do not fail if there is no
default class names file in /etc/iproute2.

Change the default class names file from cls_names to tc_cls.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-04-07 08:31:56 -07:00
Lubomir Rintel 194e9b855d ip: support RFC4191 router preference
This allows querying and setting the route preference. It's usually set from
the IPv6 Neighbor Discovery Router Advertisement messages.

Introduced in "ipv6: expose RFC4191 route preference via rtnetlink", enqueued
for Linux 4.1.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
2015-03-24 15:45:23 -07:00
Eric W. Biederman dacc5d4197 add basic mpls support to iproute
- Pull in the uapi mpls.h
- Update rtnetlink.h to include the mpls rtnetlink notification multicast group.
- Define AF_MPLS in utils.h if it is not defined from elsewhere
  as is done with AF_DECnet

The address syntax for multiple mpls labels is a complete invention.
When I looked there seemed to be no widespread convention for talking
about an mpls label stack in text form.  Sometimes people did:
"{ Label1, Label2, Label3 }", sometimes people would do:
"[ label3, label2, label1 ]", and most of the time label
stacks were not explicitly shown at all.

The syntax I wound up using, so that it would have no spaces and would
be visually distinct from other kinds of addresses, is:

label1/label2/label3, where label1 is the label at the top of the label
stack and label3 is the label at the bottom of the label stack.

When there is a single label this matches what seems to be the convention
with other tools: just print out the numeric value of the mpls label.

The netlink protocol for labels uses the on-the-wire format for a
label stack. The ttl and traffic class are expected to be 0.  Using
the on-the-wire format is common and is what happens with other address
types. BGP also uses this technique when passing label stacks, with the
exception that the ttl byte is not included, making each label in a BGP
label stack 3 bytes instead of 4.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-03-24 15:45:23 -07:00
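A sketch of turning the label1/label2/label3 syntax into on-the-wire label stack entries, with TTL and traffic class left at 0 as the commit describes. The field layout (20-bit label starting at bit 12, bottom-of-stack bit at bit 8) follows RFC 3032; setting the bottom-of-stack bit on the last label is an assumption of this sketch, and parse_label_stack() is a hypothetical name, not the iproute2 parser.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdlib.h>

#define LABEL_SHIFT 12   /* 20-bit label occupies the top bits (RFC 3032) */
#define BOS_SHIFT    8   /* bottom-of-stack bit */

/* Parse "label1/label2/..." (top of stack first) into network-order
 * entries with TTL and TC at 0; returns the label count, or -1 on error. */
static int parse_label_stack(const char *s, uint32_t *out, int max)
{
	int n = 0;
	char *end;

	for (;;) {
		unsigned long label = strtoul(s, &end, 10);

		if (end == s || label > 0xFFFFF || n == max)
			return -1;
		out[n++] = (uint32_t)label << LABEL_SHIFT;
		if (*end == '\0')
			break;
		if (*end != '/')
			return -1;
		s = end + 1;
	}
	out[n - 1] |= 1u << BOS_SHIFT;   /* mark the bottom label (assumed) */
	for (int i = 0; i < n; i++)
		out[i] = htonl(out[i]);
	return n;
}
```

Each entry is 4 bytes, which is the size difference from BGP's 3-byte encoding mentioned above: BGP drops the TTL byte.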
Eric W. Biederman 6f7a9f4dc5 add support for the RTA_NEWDST attribute.
This attribute is like RTA_DST except it specifies the destination
address to place on a packet when it leaves the host.  For ip based
protocols this is destination NAT and not a common part of forwarding.
For protocols like MPLS label swapping is something that typically
happens on every hop.

There is likely to be a RTA_NEWSRC at some point so RTA_NEWDST
is printed as "as to"  and can be specified either as "as to"
or just "as"

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-03-24 15:45:23 -07:00
Eric W. Biederman 93ae283594 add support for the RTA_VIA attribute
Add support for the RTA_VIA attribute that specifies an address family
as well as an address for the next hop gateway.

To make this easy to pass, reorder inet_prefix so that its tail
is a proper RTA_VIA attribute.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-03-24 15:45:23 -07:00
Eric W. Biederman 8e8f8de42f misc whitespace cleanup 2015-03-24 15:45:23 -07:00
Eric W. Biederman 45c90d1990 add address family to/from string helper functions.
Add the functions family_name and read_family to convert an address
family to a string and to convert a string to an address family.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-03-24 15:45:23 -07:00
Eric W. Biederman 0b218ab18d add support for printing AF_PACKET addresses
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-03-24 15:45:23 -07:00
Eric W. Biederman 71b4d59b30 make the addr argument of ll_addr_n2a const
This avoids build warnings when AF_PACKET support is added
to rt_addr_n2a.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-03-24 15:45:23 -07:00
Eric W. Biederman 26dcdf3a91 add a source addres length parameter to rt_addr_n2a
For some address families (like AF_PACKET) it is helpful to have the
length when printing the address.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-03-24 15:45:23 -07:00
Daniel Borkmann 11c39b5e98 tc: add eBPF support to f_bpf
This work adds the tc frontend for kernel commit e2e9b6541dd4 ("cls_bpf:
add initial eBPF support for programmable classifiers").

A C-like classifier program (f.e. see e2e9b6541dd4) is being compiled via
LLVM's eBPF backend into an ELF file, that is then being passed to tc. tc
then loads, if any, eBPF maps and eBPF opcodes (with fixed-up eBPF map file
descriptors) out of its dedicated sections, and via bpf(2) into the kernel
and then the resulting fd via netlink down to cls_bpf. cls_bpf allows for
annotations, currently, I've used the file name for that, so that the user
can easily identify his filter when dumping configurations back.

Example usage:

  clang -O2 -emit-llvm -c cls.c -o - | llc -march=bpf -filetype=obj -o cls.o
  tc filter add dev em1 parent 1: bpf run object-file cls.o classid x:y

  tc filter show dev em1 [...]
  filter parent 1: protocol all pref 49152 bpf handle 0x1 flowid x:y cls.o

I placed the parser bits derived from Alexei's kernel sample, into tc_bpf.c
as my next step is to also add the same support for BPF action, so we can
have a fully fledged eBPF classifier and action in tc.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
2015-03-24 15:45:23 -07:00
Stephen Hemminger cbdc3ed88a update kernel headers to net-next 4.0-rc5
Latest features
2015-03-24 15:45:23 -07:00
Daniel Borkmann b54ac87ef8 misc: header rebase, add bpf.h
Include the bpf.h uapi header file.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-03-24 15:45:23 -07:00
Madhu Challa e31867ac30 ip: enable configuring multicast group autojoin
Joining a multicast group at the Ethernet level via the "ip maddr" command would
not work if we have an Ethernet switch that does igmp snooping since
the switch would not replicate multicast packets on ports that did not
have IGMP reports for the multicast addresses.

Linux vxlan interfaces created via "ip link add vxlan" have the group option
that enables them to do the required join.

By extending ip address command with option "autojoin" we can get similar
functionality for openvswitch vxlan interfaces as well as other tunneling
mechanisms that need to receive multicast traffic.

example:
ip address add 224.1.1.10/24 dev eth5 autojoin
ip address del 224.1.1.10/24 dev eth5
2015-03-24 15:45:23 -07:00
Scott Feldman 655444bdad route: label externally offloaded routes
On ip route print dump, label externally offloaded routes with "external".
Offloaded routes are flagged with RTNH_F_EXTERNAL, a recent addition to
net-next.  For example:

$ ip route
default via 192.168.0.2 dev eth0
11.0.0.0/30 dev swp1  proto kernel  scope link  src 11.0.0.2 external
11.0.0.4/30 via 11.0.0.1 dev swp1  proto zebra  metric 20 external
11.0.0.8/30 dev swp2  proto kernel  scope link  src 11.0.0.10 external
11.0.0.12/30 via 11.0.0.9 dev swp2  proto zebra  metric 20 external
12.0.0.2  proto zebra  metric 30 external
        nexthop via 11.0.0.1  dev swp1 weight 1
        nexthop via 11.0.0.9  dev swp2 weight 1
12.0.0.3 via 11.0.0.1 dev swp1  proto zebra  metric 20 external
12.0.0.4 via 11.0.0.9 dev swp2  proto zebra  metric 20 external
192.168.0.0/24 dev eth0  proto kernel  scope link  src 192.168.0.15

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
2015-03-24 15:45:23 -07:00
Stephen Hemminger 61333d2442 update headers files for net-next
Use sanitized headers from 4.0.0-rc3
2015-03-24 15:45:23 -07:00
Daniel Borkmann 51cf36756c tc: m_bpf: fix next arg selection after tc opcode
The next argument after the tc opcode/verdict is optional, but using NEXT_ARG()
requires another argument to follow it, otherwise tc will bail
out. Therefore, we need to advance to the next argument manually, as is done
elsewhere.

Fixes: 86ab59a666 ("tc: add support for BPF based actions")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Pirko <jiri@resnulli.us>
2015-03-24 15:39:53 -07:00
Vadim Kochan 599fc319eb man ip-netns: Fix syntax in default ns process, indent's
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-03-24 15:14:53 -07:00
Vadim Kochan d59102975e man ip-link: Add ip-netns(8) in 'SEE ALSO'
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-03-24 15:13:45 -07:00
Roopa Prabhu 106ca2779e lib utils: fix family during af_bit_len calculation
commit f3a2ddc124 ("lib utils: Use helpers to get AF bit/byte len")
used a wrong family or family of zero in the default case
during af_bit_len calculation causing ip route commands to
fail with below error

Error: an inet prefix is expected rather than "10.0.2.14/24".

Reported-by: Sven-Haegar Koch <haegar@sdinet.de>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2015-03-24 15:03:35 -07:00
philipp@redfish-solutions.com 6f4cad9120 xfrm: Fix -o (oneline) being broken in xfrm and correct mark radix
Don't insert newline in -o (oneline) mode; print mark as hex.

Oneline mode is supposed to force all output to be on oneline and
machine-parsable, but this isn't the case for "ip xfrm" as shown:

% ip -o xfrm monitor
...
src 0.0.0.0/0 dst 0.0.0.0/0 \   dir out priority 2051 ptype main \  mark -1879048191/0xffffffff
    tmpl src 203.0.130.10 dst 198.51.130.30\        proto esp reqid 16384 mode tunnel\
...

as that's 2 lines, not one. Also, the "mark" is shown in signed
decimal, but the mask is in hex. This is confusing: let's use
hex for both.

Signed-off-by: Philip Prindeville <philipp@redfish-solutions.com>
2015-03-24 15:01:20 -07:00
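A sketch of printing the mark and mask in the same radix; format_xfrm_mark() is a hypothetical helper, not the ipxfrm code. The mark from the example above, shown as -1879048191 when read as signed, is 0x90000001 as an unsigned 32-bit value.

```c
#include <inttypes.h>
#include <stdio.h>

/* Format an xfrm mark/mask pair with both halves in unsigned hex. */
static int format_xfrm_mark(char *buf, size_t len, uint32_t mark, uint32_t mask)
{
	return snprintf(buf, len, "mark 0x%" PRIx32 "/0x%" PRIx32, mark, mask);
}
```

For the example above this produces "mark 0x90000001/0xffffffff".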
Roopa Prabhu 42ecedd4ba fix ip -force -batch to continue on errors
This patch replaces exits with returns in several
iproute2 commands. This fixes `ip -batch -force`
to not exit but continue on errors.

$cat c.txt
route del 1.2.3.0/24 dev eth0
route del 1.2.4.0/24 dev eth0
route del 1.2.5.0/24 dev eth0
route add 1.2.3.0/24 dev eth0

$ip -force -batch c.txt
RTNETLINK answers: No such process
Command failed c.txt:2
RTNETLINK answers: No such process
Command failed c.txt:3

Reported-by: Sven-Haegar Koch <haegar@sdinet.de>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2015-03-24 14:59:40 -07:00
Andy Gospodarek 822e9609e7 bridge: drop reference to unused option embedded from manpage
While looking at the manpage, I noticed a reference to 'embedded' that was
added by this commit:

	commit d611682a8c
	Author: John Fastabend <john.r.fastabend@intel.com>
	Date:   Thu Sep 13 23:50:36 2012 -0700

	    iproute2: bridge: finish removing replace option in man pages

I no longer see any reference to the 'embedded' option in any c- or h-files, so
it seems worthwhile to remove.

Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: John Fastabend <john.r.fastabend@intel.com>
2015-03-24 14:54:53 -07:00
Mark Einon 473544d96d ip: Make uniform the use of synonyms list, show and lst
Where used in the ip tool, the 'show' option always has the synonyms
'list' and 'lst', except for ip-token and ip-addrlabel, which are missing
'lst'. Add this as a synonym for these commands.

Signed-off-by: Mark Einon <mark.einon@gmail.com>
2015-03-24 14:49:21 -07:00
Vadim Kochan 4612d04d6b tc class: Show class names from file
It is possible to use class names from file /etc/iproute2/cls_names
which tc will use when showing class info:

    # tc/tc -nm class show dev lo
	class htb 1:10 parent 1:1 leaf 10: prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
	class htb 1:1 root rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
	class htb web#1:20 parent 1:1 leaf 20: prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
	class htb 1:2 root rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
	class htb 1:30 parent 1:1 leaf 30: prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
	class htb voip#1:40 parent 1:2 leaf 40: prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
	class htb 1:50 parent 1:2 leaf 50: prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
	class htb 1:60 parent 1:2 leaf 60: prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b

or, to read the names from a different file:

    # tc/tc -nm -cf /tmp/cls_names class show dev lo

The class names file uses a simple "maj:min  name" format:

1:20    web
1:40    voip

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-03-15 12:27:40 -07:00
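A sketch of parsing one line of the class names file shown above. It assumes, as noted elsewhere in this log, that the major and minor parts of a tc ID are hexadecimal, and it packs them into one 32-bit handle with the major in the upper 16 bits. `parse_cls_name()` is a hypothetical helper, not tc's own parser.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse a line such as "1:20    web". On success, store the handle
 * (major << 16 | minor) and copy the one-word name, returning 0.
 * Return -1 for malformed lines or a name that does not fit.
 */
static int parse_cls_name(const char *line, unsigned int *handle,
			  char *name, size_t name_len)
{
	char *end;
	unsigned long maj, min;
	size_t len;

	assert(line && handle && name && name_len > 0);

	maj = strtoul(line, &end, 16);		/* major, in hex */
	if (end == line || *end != ':' || maj > 0xffff)
		return -1;
	line = end + 1;
	min = strtoul(line, &end, 16);		/* minor, in hex */
	if (end == line || min > 0xffff)
		return -1;

	end += strspn(end, " \t");		/* skip the separator */
	len = strcspn(end, " \t\r\n");		/* the name is one word */
	if (len == 0 || len >= name_len)
		return -1;
	memcpy(name, end, len);
	name[len] = '\0';

	*handle = (unsigned int)((maj << 16) | min);
	return 0;
}
```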
Vadim Kochan d116ff3414 ip netns: Fix rtnl error while print netns list
Observed on Linux 3.18:

    # ip netns
    RTNETLINK answers: Operation not supported
    net0

CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Fixes: d182ee1307 ("ipnetns: allow to get and set netns ids")
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-03-15 12:17:34 -07:00
Vadim Kochan f3a2ddc124 lib utils: Use helpers to get AF bit/byte len
Added helpers that return the address length of an AF_XXX family in bits
or bytes, and used them where switch(AF_XXX) was open-coded for this.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-03-15 12:15:19 -07:00
Eric Dumazet 2e7e805d0a ss: better 32bit support
Socket cookies are 64-bit, even when ss is built as
a 32-bit binary running on a 64-bit host.

Signed-off-by: Eric Dumazet <edumazet@google.com>
2015-03-15 12:11:43 -07:00
Vadim Kochan 7871f7dbf0 ss: Allow to specify sport/dport without ':'
An ugly change, but it allows sport/dport to be specified without the leading ':', e.g.:

    # ss dport = 80 and sport = 44862

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-03-15 12:11:42 -07:00
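A sketch of accepting both ":80" and "80" as a port operand. `parse_port()` is hypothetical and handles numeric ports only; the real ss filter also resolves service names such as ':ssh', which this sketch does not attempt.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Parse a port filter operand with an optional leading ':'.
 * Returns the port (0..65535) or -1 if the operand is not numeric.
 */
static int parse_port(const char *arg)
{
	char *end;
	long port;

	assert(arg != NULL);
	if (*arg == ':')
		arg++;			/* the colon is now optional */
	if (*arg < '0' || *arg > '9')
		return -1;
	port = strtol(arg, &end, 10);
	if (*end != '\0' || port > 65535)
		return -1;
	return (int)port;
}
```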
Vadim Kochan ee9b34778c man ip-netns: Notice about loose device when do 'del'
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-03-15 12:10:21 -07:00
Vadim Kochan 8ce21c6b93 man tc: Highlight minor & major, notice they are hex
Also show the ID in its plain "major:minor" form,
to illustrate the terms being explained.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-03-15 12:10:21 -07:00
Vadim Kochan 032b4f4d19 man ip-link: Add short description about 'group'
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-03-15 12:10:21 -07:00
Vadim Kochan 36324eba37 man ip-link: Add notice about local netns devices
Added a clarification of why 'ip link set netns' cannot
change the network namespace of some kinds of devices.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-02-27 19:03:26 -08:00
Daniel Borkmann 32caee9fc7 m_bpf: remove unrelevant help lines
These are leftovers from copying the code over from cls_bpf. ;) Let's remove them.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
2015-02-27 19:00:51 -08:00
Ebben Aries 409998c5a4 iproute: ip-gue/ip-fou manpages
Add missing GUE/FOU manpages to Makefile

Signed-off-by: Ebben Aries <exa@fb.com>
2015-02-27 18:59:27 -08:00
Roopa Prabhu 22a98f5140 bridge link: add support to specify master
This patch adds support for the 'master' keyword,
which directs a bridge link command explicitly to the software
bridge driver.

Also adds the self/master keywords to the usage text and man page.

v2:
	fix usage to say (self and master) and not (self or master)

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2015-02-27 18:58:04 -08:00
Vadim Kochan 34c8a95cd7 man ip-link: Add short info about 'dynamic' flag
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-02-27 18:54:44 -08:00
Hagen Paul Pfeifer b5024ee1ed ss: group DCTCP socket statistics
Keep ss output consistent by formatting DCTCP socket statistics like skmem
and timer, where a group of logically related values is enclosed in brackets.
This makes parser scripts *and* humans happier.

Current output of 'ss -inetm dst :80':
ESTAB       0      0 192.168.11.14:55511 173.194.66.189:443
        timer:(keepalive,14sec,0) uid:1000 ino:428768
        sk:ffff88020ceb5b00 <-> skmem:(r0,rb372480,t0,tb87040,f0,w0,o0,bl0)
        ts sack wscale:7,7 rto:250 rtt:49.225/20.837 ato:40 mss:1408 cwnd:10
        ce_state 23 alpha 23 ab_ecn 23 ab_tot 23 send 2.3Mbps
        lastsnd:121026 lastrcv:121026 lastack:30850 pacing_rate 4.6Mbps
        retrans:0/2 rcv_rtt:40.416 rcv_space:2920

New grouped output:
ESTAB       0      0 192.168.11.14:55511 173.194.66.189:443
        timer:(keepalive,14sec,0) uid:1000 ino:428768
        sk:ffff88020ceb5b00 <-> skmem:(r0,rb372480,t0,tb87040,f0,w0,o0,bl0)
        ts sack wscale:7,7 rto:250 rtt:49.225/20.837 ato:40 mss:1408 cwnd:10
        dctcp(ce_state:23,alpha:23,ab_ecn:23,ab_tot:23) send 2.3Mbps
        lastsnd:121026 lastrcv:121026 lastack:30850 pacing_rate 4.6Mbps
        retrans:0/2 rcv_rtt:40.416 rcv_space:2920

Cc: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
2015-02-24 15:59:44 -08:00
Lennart Sorensen c9ae9bae6e Fix misspelling of defrag in ip-l2tp.8 2015-02-24 15:59:44 -08:00
Nicolas Dichtel 2dd5909d9d ip-monitor: allow to monitor ip rules
Now done by default or with 'ip monitor rule'.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2015-02-24 15:59:44 -08:00
Vadim Kochan 5f24ec0e06 ss: Skip filtered netlink sockets before detailed info
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-02-24 15:59:44 -08:00
Vadim Kochan 29999b0ff2 ss: Add filter before printing unix stats from Netlink
Otherwise detailed info could be printed even for sockets
that the filter should reject.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-02-24 15:59:44 -08:00
Alex Pilon 6b8c871dc1 Allow specifying bridge port STP state by name rather than number.
The existing behaviour forces one to memorize the integer constants for
STP port states.

    # bridge link set dev dummy0 state 3

This patch makes it possible to use the lowercased port state name.

    # bridge link set dev dummy0 state forwarding

Invalid non-integer inputs now cause exit with status -1.

Signed-off-by: Alex Pilon <alp@alexpilon.ca>
2015-02-24 15:59:44 -08:00
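A sketch of the name-or-number lookup described above, using the kernel's bridge port state values (BR_STATE_DISABLED = 0 through BR_STATE_BLOCKING = 4). `parse_stp_state()` is a hypothetical helper; rejecting out-of-range numbers is a choice made for this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* STP port state names, indexed by the kernel's BR_STATE_* values. */
static const char *const stp_states[] = {
	"disabled",	/* BR_STATE_DISABLED   = 0 */
	"listening",	/* BR_STATE_LISTENING  = 1 */
	"learning",	/* BR_STATE_LEARNING   = 2 */
	"forwarding",	/* BR_STATE_FORWARDING = 3 */
	"blocking",	/* BR_STATE_BLOCKING   = 4 */
};

#define N_STP_STATES (sizeof(stp_states) / sizeof(stp_states[0]))

/* Accept a lowercase state name or its number; return -1 otherwise. */
static int parse_stp_state(const char *arg)
{
	size_t i;
	char *end;
	long val;

	assert(arg != NULL);
	for (i = 0; i < N_STP_STATES; i++)
		if (strcmp(arg, stp_states[i]) == 0)
			return (int)i;

	val = strtol(arg, &end, 10);
	if (end == arg || *end != '\0' || val < 0 || val >= (long)N_STP_STATES)
		return -1;
	return (int)val;
}
```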
Nicolas Dichtel a4797670d3 bridge/fdb: display link netns id
When this attribute is set, it means that the i/o part of the related netdevice
is in another netns.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2015-02-21 16:54:53 -08:00
Nicolas Dichtel ccdcbf35f1 iplink: add support of IFLA_LINK_NETNSID attribute
This new attribute is now advertised by the kernel for x-netns interfaces.
It's also possible to set it when an interface is created (thus creating an
x-netns interface with a single message).

Example:
 $ ip netns add foo
 $ ip netns add bar
 $ ip -n foo netns set bar 15
 $ ip -n foo link add ipip1 link-netnsid 15 type ipip remote 10.16.0.121 local 10.16.0.249
 $ ip -n foo link ls ipip1
 3: ipip1@NONE: <POINTOPOINT,NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default
     link/ipip 10.16.0.249 peer 10.16.0.121 link-netnsid 15

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2015-02-21 16:54:53 -08:00
Nicolas Dichtel d182ee1307 ipnetns: allow to get and set netns ids
The kernel now provides ids for peer netns. This patch implements a new command
'set' to assign an id.
When netns are listed, if an id is assigned, it is now displayed.

Example:
 $ ip netns add foo
 $ ip netns set foo 1
 $ ip netns
 foo (id: 1)
 init_net

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2015-02-21 16:54:53 -08:00
Vadim Kochan c16298bea0 ip xfrm mon: Add objects list to the usage output
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-02-21 16:43:23 -08:00
Vadim Kochan 5bf9f5c5a0 ip xfrm: Allow to specify "all" option for monitor
Just to be aligned with the usage output.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-02-21 16:43:23 -08:00
Bryton Lee a221d621bb prevent the read ahead of /proc/slabinfo in ss
Signed-off-by: Bryton Lee <brytonlee01@gmail.com>
2015-02-21 16:41:41 -08:00
Vadim Kochan 11ba90fcbd ss: Fixed wrong tcp ato value from netlink
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-02-21 16:40:26 -08:00
Vadim Kochan b217df108c ss: Unify socket address output by one generic func
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-02-21 16:40:26 -08:00
Vadim Kochan f1b39e1bd6 ss: Unify details info output:ino,uid,sk
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-02-21 16:40:26 -08:00
Vadim Kochan 2d791bc87c ss: Unify state socket output:netid, state, rq, wq
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-02-21 16:40:26 -08:00
Vadim Kochan ec4d0d8a9d ss: Replace unixstat struct by new sockstat struct
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-02-21 16:40:26 -08:00
Vadim Kochan 89f634f917 ss: Replace pktstat struct by new sockstat struct
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-02-21 16:40:26 -08:00
Vadim Kochan 055840f27f ss: Split tcpstap struct to sockstat & tcpstat
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-02-21 16:40:26 -08:00
Vadim Kochan 1527a17ed8 ss: Fix filter expression parser
The expression parser seems to have been broken for a
long time; even simple expressions like this did not work:

    # ss -a '( sport = :ssh )'

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-02-21 16:40:26 -08:00
Stephen Hemminger 1f01dd89f5 update headers to 3.20-rc1
Add net_namespace.h and update other headers
2015-02-20 16:58:45 -08:00
Stephen Hemminger 823d587e79 Merge branch 'net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2 into net-next 2015-02-12 06:08:17 -08:00
Stephen Hemminger 3a641f531e Merge branch 'net-next' 2015-02-10 15:20:57 -08:00
Stephen Hemminger 46d364fe8f v3.19.0 2015-02-10 15:14:32 -08:00
Vadim Kochan 95ce04bc86 ss: Show stats from specified network namespace
Added new '-N NSNAME, --net=NSNAME' option to show socket stats
from the specified network namespace name.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-02-10 15:11:59 -08:00
Stephen Hemminger ebd58d19f0 Merge branch 'master' into net-next 2015-02-05 10:56:06 -08:00
Tom Herbert 90f1df715e iproute: Descriptions of fou and gue options in ip-link man pages
Add section for additional arguments to GRE, IPIP, and SIT types
that are related to Foo-over-UDP and Generic UDP Encapsulation.
Also, added an example GUE configuration in the examples section.

Signed-off-by: Tom Herbert <therbert@google.com>
2015-02-05 10:55:43 -08:00
Stephen Hemminger 41d46674cb Merge branch 'master' into net-next 2015-02-05 10:51:36 -08:00
Tom Herbert 858dbb208e ip link: Add support for remote checksum offload to IP tunnels
This patch adds support for remote checksum offload
configuration for IPIP, SIT, and GRE tunnels. It adds
a [no]encap-remcsum option to the ip link command, which is applicable
when configuring tunnels that use GUE.

http://tools.ietf.org/html/draft-herbert-remotecsumoffload-00

Example:

ip link add name tun1 type gre remote 192.168.1.1 local 192.168.1.2 \
   ttl 225 encap fou encap-sport auto encap-dport 7777 encap-csum \
   encap-remcsum

This would create a GRE tunnel in GUE encapsulation where the source
port is automatically selected (based on a hash of the inner packet),
checksums in the encapsulating UDP header are enabled (needed for
remote checksum offload), and remote checksum offload is configured to
be used on the tunnel (affects the TX side).

Signed-off-by: Tom Herbert <therbert@google.com>
2015-02-05 10:50:02 -08:00
Stephen Hemminger 9ca23a5995 Merge branch 'master' into net-next 2015-02-05 10:48:19 -08:00
Roopa Prabhu a2f7934dd0 iproute2: bridge vlan show new option to print ranges
Introduce new option -c[ompressvlans] to request
vlan ranges from kernel

(pls suggest better option names if this does not look ok)

$bridge vlan show
port	vlan ids
dummy0	 1 PVID Egress Untagged

dummy1	 1 PVID Egress Untagged
	 2
	 3
	 4
	 5
	 6
	 7
	 9
	 10
	 12

br0	 1 PVID Egress Untagged

$bridge help
Usage: bridge [ OPTIONS ] OBJECT { COMMAND | help }
where  OBJECT := { link | fdb | mdb | vlan | monitor }
       OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] |
                    -o[neline] | -t[imestamp] | -n[etns] name |
                    -c[ompressvlans] }
$bridge -c vlan show
port	vlan ids
dummy0	 1 PVID Egress Untagged

dummy1	 1 PVID Egress Untagged
	 2-7
	 9-10
	 12

br0	 1 PVID Egress Untagged

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2015-02-05 10:46:31 -08:00
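The grouping visible in the `-c` output above can be illustrated with a small sketch that merges consecutive IDs of a sorted list into "start-end" ranges. Note that in this patch the ranges are requested from the kernel; the user-space `format_vlan_ranges()` below is a hypothetical illustration of the result, not how iproute2 or the kernel computes it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a sorted VLAN ID list, merging consecutive IDs into ranges,
 * e.g. {2,3,4,5,6,7,9,10,12} -> "2-7 9-10 12". Output is always
 * NUL-terminated and never exceeds `size` bytes.
 */
static void format_vlan_ranges(const unsigned short *vids, size_t n,
			       char *out, size_t size)
{
	size_t i = 0, used = 0;

	assert(out != NULL && size > 0);
	out[0] = '\0';
	while (i < n && used < size) {
		size_t j = i;
		int w;

		/* Extend the run while the IDs stay consecutive. */
		while (j + 1 < n && vids[j + 1] == vids[j] + 1)
			j++;
		if (j == i)
			w = snprintf(out + used, size - used, "%s%u",
				     used ? " " : "", (unsigned)vids[i]);
		else
			w = snprintf(out + used, size - used, "%s%u-%u",
				     used ? " " : "", (unsigned)vids[i],
				     (unsigned)vids[j]);
		if (w < 0)
			break;
		used += (size_t)w;
		i = j + 1;
	}
}
```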
Roopa Prabhu 3ac0d36ddd iproute2: bridge: support vlan range adds
This patch adds vlan range support to bridge add command
using the newly added vinfo flags BRIDGE_VLAN_INFO_RANGE_BEGIN and
BRIDGE_VLAN_INFO_RANGE_END.

$bridge vlan show
port    vlan ids
br0      1 PVID Egress Untagged

dummy0   1 PVID Egress Untagged

$bridge vlan add vid 10-15 dev dummy0
port    vlan ids
br0      1 PVID Egress Untagged

dummy0   1 PVID Egress Untagged
         10
         11
         12
         13
         14
         15

$bridge vlan del vid 14 dev dummy0

$bridge vlan show
port    vlan ids
br0      1 PVID Egress Untagged

dummy0   1 PVID Egress Untagged
         10
         11
         12
         13
         15

$bridge vlan del vid 10-15 dev dummy0

$bridge vlan show
port    vlan ids
br0      1 PVID Egress Untagged

dummy0   1 PVID Egress Untagged

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
2015-02-05 10:46:31 -08:00
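A sketch of the argument-parsing step for `vid 10-15`: a single ID or an inclusive range, limited to the valid 802.1Q range 1..4094. Per the commit text, a range is then sent to the kernel with BRIDGE_VLAN_INFO_RANGE_BEGIN on its first entry and BRIDGE_VLAN_INFO_RANGE_END on its last; building that netlink message is not shown, and `parse_vid_range()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Parse "14" or "10-15" into an inclusive [start, end] range.
 * Returns 0 on success, -1 for malformed or out-of-range input.
 */
static int parse_vid_range(const char *arg, unsigned int *start,
			   unsigned int *end)
{
	char *p;
	long lo, hi;

	assert(arg && start && end);
	lo = strtol(arg, &p, 10);
	if (p == arg)
		return -1;
	if (*p == '-') {			/* range form "lo-hi" */
		const char *q = p + 1;

		hi = strtol(q, &p, 10);
		if (p == q)
			return -1;
	} else {
		hi = lo;			/* single VLAN ID */
	}
	if (*p != '\0' || lo < 1 || hi > 4094 || lo > hi)
		return -1;
	*start = (unsigned int)lo;
	*end = (unsigned int)hi;
	return 0;
}
```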
Jiri Pirko 86ab59a666 tc: add support for BPF based actions
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2015-02-05 10:38:13 -08:00
Jiri Pirko 1d129d191a tc: push bpf common code into separate file
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2015-02-05 10:38:13 -08:00
Oliver Hartkopp 82a307e835 can: Add support for CAN FD non-ISO feature
This patch makes the CAN_CTRLMODE_FD_NON_ISO netlink feature configurable.

During the CAN FD standardization process within the ISO it turned out that
the failure detection capability has to be improved.

The CAN in Automation organization (CiA) defined the already implemented CAN
FD controllers as 'non-ISO' and the upcoming improved CAN FD controllers as
'ISO' compliant. See http://www.can-cia.com/index.php?id=1937

Starting with the - currently non-ISO - driver for M_CAN v3.0.1 introduced in
Linux 3.18 this bit needs to be propagated to userspace. In future drivers this
bit will become configurable depending on the CAN FD controller's capabilities.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
2015-02-05 10:35:24 -08:00
Stephen Hemminger c5ecc59f10 Merge branch 'master' into net-next 2015-02-05 10:33:13 -08:00
Thomas Graf 2eb90dc762 vxlan: Group policy extension
Signed-off-by: Thomas Graf <tgraf@suug.ch>
2015-02-05 10:31:43 -08:00
Andreas Henriksson 5e5055bc26 iproute2/ip: fix up filter when printing addresses
"ip addr show up" would exclude the interface (link), but include the
addresses of down interfaces (which looked like they were indented
under a different interface). This fixes the filtering.

For a full example see the original bug report at:
http://bugs.debian.org/776040

Reported-by: Paul Slootman <paul@debian.org>
CC: 776040@bugs.debian.org
Signed-off-by: Andreas Henriksson <andreas@fatal.se>
2015-02-05 10:30:29 -08:00
Vadim Kochan 3372493909 ip netns: Delete all netns
Allow deleting all named network namespaces with:

    $ ip -all netns del

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-02-05 10:28:19 -08:00
Vadim Kochan b13ba03f54 ip netns: Allow exec on each netns
This change allows executing a command in each
named netns (except the default one) by specifying the '-all' option:

    # ip -all netns exec ip link

Each command executes synchronously.

The exit status is not checked, so a command may
fail in some netns but succeed in others.

EXAMPLES:

1) Show link info on all netns:

$ ip -all netns exec ip link

netns: test_net
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
4: tap0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 500
    link/ether 1a:19:6f:25:eb:85 brd ff:ff:ff:ff:ff:ff

netns: home0
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
4: tap0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 500
    link/ether ea:1a:59:40:d3:29 brd ff:ff:ff:ff:ff:ff

netns: lan0
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
4: tap0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 500
    link/ether ce:49:d5:46:81:ea brd ff:ff:ff:ff:ff:ff

2) Bring up the tap0 device in all netns:

$ ip -all netns exec ip link set dev tap0 up

netns: test_net

netns: home0

netns: lan0

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-02-05 10:28:19 -08:00
Vadim Kochan e998e118dd lib: Exec func on each netns
Added the ability to run a function in each netns.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-02-05 10:28:19 -08:00
Stephen Hemminger 8c58d4036b update kernel headers based on net-next 3.21
Pull in headers from later tree
2015-02-05 10:20:58 -08:00
Stephen Hemminger 668dfab274 Merge branch 'master' into net-next 2015-02-05 10:20:10 -08:00
Stephen Hemminger 4c7d75de95 can: update kernel header
Sanitized header from upstream 3.20-rc kernel
2015-02-05 10:17:50 -08:00
Vadim Kochan 8250bc9ff4 ss: Unify inet sockets output
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-02-05 10:16:25 -08:00
Vadim Kochan db08bdb816 ss: Unify meminfo output
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-02-05 10:16:25 -08:00
Vadim Kochan 4cec9db0b4 tests: Add few 'ip link' related tests
Added two tests which check the following fixed issues:

    1) Bug where it was not possible to add a new virtual interface via:

        $ ip link add dev XXX type

       It was fixed a few releases ago.

    2) Crash on older kernels when VF rate info does not exist:

        $ ip link show

       Used dump file from William Dauchy <william@gandi.net>:
           testsuite/tests/ip/link/dev_wo_vf_rate.nl

       So 'ip link show' is replaced by 'ip -d monitor file ...', which does
       the same thing.

Also added a new function in testsuite/lib/generic.sh to generate a random device name.

Made running all tests depend on 'clean'.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-02-05 10:16:25 -08:00
Vadim Kochan f42a457470 ss: Filter inet dgram sockets with established state by default
Because inet dgram sockets (UDP, raw) can call connect(), they
may be in the ESTABLISHED state. So keep the original behaviour of
'ss', which filtered them by ESTABLISHED state by default. So:

    $ ss -u

    or

    $ ss -w

will show only ESTABLISHED UDP or raw sockets, respectively, by default.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-02-05 10:15:24 -08:00
Nicolas Dichtel 1ff6b16e2d lib: fix setns() function when !HAVE_SETNS
When HAVE_SETNS is not set, iproute2 provides a local implementation of this
function based on __NR_setns.
This macro is defined in sys/syscall.h, which was not included, thus the local
implementation always returned -1.

CC: Vadim Kochan <vadim4j@gmail.com>
Fixes: eb67e4498a ("lib: Add netns_switch func for change network namespace")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2015-02-05 10:11:51 -08:00
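A sketch of the kind of fallback described above. `__NR_setns` comes from `<sys/syscall.h>`; if that header is not included, the `#ifdef` fails silently and only the ENOSYS branch is compiled, so every call returns -1. Declaring the helper `static inline` also avoids the -Wunused-function warning fixed in the next commit. `compat_setns()` is a hypothetical name chosen to avoid clashing with the C library's own setns().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>	/* defines __NR_setns */
#include <unistd.h>

/* Fallback for C libraries that lack setns(). */
static inline int compat_setns(int fd, int nstype)
{
#ifdef __NR_setns
	return (int)syscall(__NR_setns, fd, nstype);
#else
	/* Reached when __NR_setns is unknown, e.g. header not included. */
	(void)fd;
	(void)nstype;
	errno = ENOSYS;
	return -1;
#endif
}
```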
Nicolas Dichtel ffff693130 lib: fix warning in namespace.h
Warning was:
In file included from bridge.c:16:0:
../include/namespace.h:33:12: warning: ‘setns’ defined but not used [-Wunused-function]

CC: Vadim Kochan <vadim4j@gmail.com>
Fixes: eb67e4498a ("lib: Add netns_switch func for change network namespace")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2015-02-05 10:11:35 -08:00
Reese Moore d36d9d41d6 iproute2: ip-link.8.in: Spelling fixes
In the ip-link(8) man page, for the gretap, ip6gre, and ip6gretap types, the
word tunnel was incorrectly spelled 'tuunel'.

Signed-off-by: Reese Moore <ram@vt.edu>
2015-02-05 10:10:15 -08:00
Stephen Hemminger 4dacfdcf4d update to lateset net-next headers 2015-01-28 14:30:45 +00:00
Stephen Hemminger be515305a3 Merge branch 'master' into net-next 2015-01-28 14:30:37 +00:00
Stephen Hemminger 0575fa22e5 update kernel kernel headers from 3.19-rc 2015-01-28 14:28:33 +00:00
Stephen Hemminger 542b0cc759 neighbor: check return values
Need to check for an invalid address and buffer overrun in the ip neigh
command with invalid parameters.
2015-01-13 18:07:23 -08:00
Stephen Hemminger 242a9f73b6 Merge branch 'master' into net-next 2015-01-13 17:43:45 -08:00
Daniel Borkmann 6ef87f9cce ip: route: add congestion control metric
This patch adds configuration and dumping of congestion control metric
for ip route, for example:

  ip route add <dst> dev foo congctl [lock] dctcp

Reference: http://thread.gmane.org/gmane.linux.network/344733
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
2015-01-13 17:40:49 -08:00
Stephen Hemminger f233410d20 update kernel headers to 3.19 net-next 2015-01-13 17:39:32 -08:00
Vadim Kochan c3087c10f1 netns: Rename & move get_netns_fd to lib
Renamed get_netns_fd -> netns_get_fd and moved to
lib/namespace.c

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-01-13 17:34:47 -08:00
Vadim Kochan ddb1129b75 Use one func to print timestamp from nlmsg
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-01-13 17:34:47 -08:00
Vadim Kochan 27b14f2e87 Add define for nlmsg_types with timestamp
Add #define for nlmsg_type = 15

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-01-13 17:34:47 -08:00
Vadim Kochan ff041f1619 ss: Usage filter state names, options alignment
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-01-13 17:29:17 -08:00
Vadim Kochan ace5cb31b1 ss: Fix case when UDP is printed as ipproto-xxx
When 'ss' prints UDP sockets info together with RAW sockets
e.g.:

    $ ss -a

then UDP sockets are resolved as "ipproto-xxx".

This happened because dg_proto was set only after the UDP
socket info from netlink had been printed. Fix the issue by
setting dg_proto before printing the info from netlink.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-01-13 17:29:17 -08:00
Vadim Kochan 8c29ae7cc2 ip link: Fix crash on older kernels when show VF dev
The crash happened because ifla_vf_rate does not exist on
older kernels, so it must be checked for as a nested attribute before use.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
Reported-by: William Dauchy <william@gandi.net>
Tested-by: William Dauchy <william@gandi.net>
2015-01-13 17:22:44 -08:00
Jamal Hadi Salim 564663b4ca actions: Get vlan action to work in pipeline
When specified in a graph such as:
action vlan ... action foobar
the vlan action chewed more than it could swallow (it consumed arguments
belonging to the following action).

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2015-01-13 17:22:44 -08:00
Jiri Pirko ee0067a918 iplink: print out addrgenmode attribute
addrgenmode is currently write-only in ip. Also display this information
when the kernel provides it.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2015-01-13 17:22:44 -08:00
Vadim Kochan 9db7bf15e2 ss: Filtering logic changing, with fixes
This patch fixes some filter combinations that do not
work in the 'master' version:

    $ ss -4
    shows inet & unix sockets, instead of only inet sockets

    $ ss -u
    needs to specify 'state closed'

    $ ss src unix:*X11*
    needs to specify '-x' shortcut for UNIX family

    $ ss -A all
    shows only sockets with established states

There may be other issues that were not observed.

Also changed the logic for calculating family, socket type and
state filters. I think that this version is a little simpler.
Now there are 2 predefined default tables which describe
the following mapping:

    family  -> (states, dbs)
    db      -> (states, families)

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-01-07 15:14:19 -08:00
Vadim Kochan 4a0053b606 ss: Unify packet stats output from netlink and proc
Refactored to use one function to output packet stats
from both /proc and netlink.

Also made it possible to get packet stats from /proc
by setting the environment variable PROC_ROOT or PROC_NET_PACKET.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-01-07 15:13:29 -08:00
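A sketch of resolving the /proc path from the environment variables mentioned above. It assumes an explicit PROC_NET_PACKET takes precedence and that PROC_ROOT defaults to "/proc"; the lookup order is our reading of the commit, and `proc_net_packet_path()` is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build the path to read packet socket info from: PROC_NET_PACKET if
 * set, else "$PROC_ROOT/net/packet" with PROC_ROOT defaulting to /proc.
 * Returns 0, or -1 if the result does not fit in `buf`.
 */
static int proc_net_packet_path(char *buf, size_t size)
{
	const char *p = getenv("PROC_NET_PACKET");
	int n;

	assert(buf != NULL && size > 0);
	if (p) {
		n = snprintf(buf, size, "%s", p);
	} else {
		const char *root = getenv("PROC_ROOT");

		n = snprintf(buf, size, "%s/net/packet", root ? root : "/proc");
	}
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```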
Vadim Kochan bf4ceee6ae ss: Unify unix stats output from netlink and proc
Refactored to use one function to output unix stats
from both /proc and netlink.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-01-07 15:13:29 -08:00
Jiri Pirko decbb4378c libnetlink: add parse_rtattr_one_nested helper
Sometimes, it is more convenient to get only one specific nested attribute by
type. For example for IFLA_AF_SPEC where type is address family (AF_INET6).
So add this helper for this purpose.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2015-01-07 15:11:35 -08:00
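A sketch of looking up one attribute by type, using the standard RTA_OK/RTA_NEXT macros from `<linux/rtnetlink.h>`. To search inside a nested attribute such as IFLA_AF_SPEC, pass `RTA_DATA(nest)` and `RTA_PAYLOAD(nest)`. A NULL result also gives a safe way to treat an optional attribute as absent, as with ifla_vf_rate on older kernels. `find_attr()` is a hypothetical name, not the helper added by this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/rtnetlink.h>

/*
 * Walk `len` bytes of netlink attributes starting at `rta` and return
 * the first attribute of the given type, or NULL if there is none.
 */
static struct rtattr *find_attr(unsigned short type, struct rtattr *rta,
				int len)
{
	assert(len >= 0);
	for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len))
		if (rta->rta_type == type)
			return rta;
	return NULL;
}
```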
Stephen Hemminger dd8fac8cee fix spelling of Kuznetsov
Suggested by Vadim Kochan
2015-01-03 09:58:41 -08:00
Scott Feldman 674bb438bc bridge/link: add learning_sync policy flag
v2:

Resending now that the dust has cleared in 3.18 on the "self" vs. hwmode debate for
brport settings.  learning_sync is now set/cleared using "self" qualifier on
brport.

v1:

Add a 'learned_sync' flag to turn on/off syncing of learned MAC addresses from
the offload device to the bridge's FDB. The flag is set/cleared on the offload device port
using the "self" qualifier:

  $ sudo bridge link set dev swp1 learning_sync on self

  $ bridge -d link show dev swp1
  2: swp1 state UNKNOWN : <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 2
      hairpin off guard off root_block off fastleave off learning off flood off
  2: swp1 state UNKNOWN : <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0
      learning on learning_sync on

Adds new IFLA_BRPORT_LEARNED_SYNCED attribute for IFLA_PROTINFO on the SELF
brport.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
2015-01-01 10:02:53 -08:00
Vadim Kochan b93fe57840 man ss: Add state filter description
Stolen from generated doc/ss.html
Also added reference to RFC 793 for TCP states.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-01-01 10:01:06 -08:00
Vadim Kochan d9d1d1fae1 man tc: Add description for -graph option
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-01-01 09:57:09 -08:00
Vadim Kochan a925535c5d ip: Small corrections of '-tshort' option in usage
Fixed -t[short] to -ts[hort] as '-t' is related to
-timestamp option.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-01-01 09:56:43 -08:00
Vadim Kochan 67e1d73be1 tc: Allow to easy change network namespace
Added a new '-netns' option to simplify executing the following command:

    ip netns exec NETNS tc OPTIONS COMMAND OBJECT

    to

    tc -n[etns] NETNS OPTIONS COMMAND OBJECT

e.g.:

    tc -net vnet0 qdisc

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2014-12-27 10:22:34 -08:00
Vadim Kochan 527910c801 bridge: Allow to easy change network namespace
Added a new '-netns' option to simplify executing the following command:

    ip netns exec NETNS bridge OPTIONS COMMAND OBJECT

    to

    bridge -n[etns] NETNS OPTIONS COMMAND OBJECT

e.g.:

    bridge -net vnet0 fdb

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2014-12-27 10:22:32 -08:00
Vadim Kochan 52700d40a2 ip: Allow to easy change network namespace
Added a new '-netns' option to simplify executing the following command:

    ip netns exec NETNS ip OPTIONS COMMAND OBJECT

    to

    ip -n[etns] NETNS OPTIONS COMMAND OBJECT

e.g.:

    ip -net vnet0 link add br0 type bridge
    ip -n vnet0 link

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2014-12-27 10:22:29 -08:00
Vadim Kochan eb67e4498a lib: Add netns_switch func for change network namespace
The new netns_switch func is moved from ip/ipnetns.c to lib/namespace.c
so that other tools can use it to switch network
namespaces quickly.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2014-12-27 10:22:27 -08:00
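A sketch of the core step behind the `-n[etns]` options above: open the namespace that `ip netns add NAME` bind-mounts under /var/run/netns and join it with setns(). The real helper may also adjust mounts (for example /sys) for the new namespace, which this sketch omits. `join_netns()` is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/* Join the named network namespace. Returns 0 on success, -1 on failure. */
static int join_netns(const char *name)
{
	char path[PATH_MAX];
	int fd, ret, n;

	assert(name != NULL);
	n = snprintf(path, sizeof(path), "/var/run/netns/%s", name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;		/* no such named netns */
	ret = setns(fd, CLONE_NEWNET);	/* needs CAP_SYS_ADMIN */
	close(fd);
	return ret;
}
```

Calling `join_netns("vnet0")` before running the command is roughly what `tc -n vnet0 qdisc` saves you from doing via `ip netns exec`.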
Vadim Kochan 486ccd99a0 ss: Use rtnl_dump_filter for inet_show_netlink
Just another refactoring for ss to use rtnl API from lib

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-12-27 10:21:10 -08:00
Vadim Kochan 417b2180a5 man ip-link: Small example of 'ip link show master'
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-12-27 10:19:59 -08:00
Vadim Kochan d954b34a1f tc class: Show classes as ASCII graph
Added new '-g[raph]' option which shows classes in the graph view.

For now, only generic stats info output is supported.

e.g.:

$ tc/tc -g class show dev tap0
+---(1:2) htb rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
|    +---(1:40) htb prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
|    +---(1:50) htb rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
|    |    +---(1:51) htb prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
|    |
|    +---(1:60) htb prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
|
+---(1:1) htb rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
     +---(1:10) htb prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
     +---(1:20) htb prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
     +---(1:30) htb prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b

$ tc/tc -g -s class show dev tap0
+---(1:2) htb rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
|    |    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
|    |    rate 0bit 0pps backlog 0b 0p requeues 0
|    |
|    +---(1:40) htb prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
|    |          Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
|    |          rate 0bit 0pps backlog 0b 0p requeues 0
|    |
|    +---(1:50) htb rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
|    |    |     Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
|    |    |     rate 0bit 0pps backlog 0b 0p requeues 0
|    |    |
|    |    +---(1:51) htb prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
|    |               Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
|    |               rate 0bit 0pps backlog 0b 0p requeues 0
|    |
|    +---(1:60) htb prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
|               Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
|               rate 0bit 0pps backlog 0b 0p requeues 0
|
+---(1:1) htb rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
     |    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
     |    rate 0bit 0pps backlog 0b 0p requeues 0
     |
     +---(1:10) htb prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
     |          Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
     |          rate 0bit 0pps backlog 0b 0p requeues 0
     |
     +---(1:20) htb prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
     |          Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
     |          rate 0bit 0pps backlog 0b 0p requeues 0
     |
     +---(1:30) htb prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
                Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
                rate 0bit 0pps backlog 0b 0p requeues 0

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-12-27 10:16:51 -08:00
Heiner Kallweit 18c8bbe3db ip: extend "ip-address" man page to reflect the recent flag extensions
Extend "ip-address" man page to reflect the recent extension of
allowing to list addresses with flags tentative, deprecated, dadfailed
not being set.

Signed-off-by: Heiner Kallweit <heiner.kallweit@web.de>
2014-12-27 10:15:57 -08:00
Roopa Prabhu 6fdb465869 bridge link: add option 'self'
Currently self is set internally only if hwmode is set.
This makes it necessary for the hw to have a mode.
But a hwmode is not really required to target hardware. So, introduce
self for anybody who wants to target hardware.

v1 -> v2
    - fix a few bugs. Initialize flags to zero: this was required to
    keep the current behaviour unchanged.

v2 -> v3
    - fix comment

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
2014-12-24 12:29:46 -08:00
Duan Jiong a1e2e5fcee ip link: use addattr_nest()/addattr_nest_end()
Use addattr_nest() and addattr_nest_end() to simplify the code.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
2014-12-24 12:26:05 -08:00
Stephen Hemminger 5c2c10b17e Merge branch 'net-next' 2014-12-24 12:23:00 -08:00
Stephen Hemminger bfbccea783 v3.18.0 2014-12-24 12:20:49 -08:00
Vadim Kochan 712249d8fa ip link: Show devices by type
Add a new 'type' option to the 'ip link show'
command, which allows filtering devices by type:

    ip link show type bridge
    ip link show type vlan

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-12-24 12:19:14 -08:00
Heiner Kallweit b5f39b2588 ip: allow ip address show to list addresses with certain flags not being set
Sometimes, e.g. in network scripts, "ip address show" needs to list only
addresses that do not have certain flags set. As an example, one might
want to exclude addresses in the "tentative" or "deprecated" state.

Support listing addresses without the tentative, deprecated or dadfailed
flags by prefixing the respective flag with a minus.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
2014-12-24 12:16:31 -08:00
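The minus-prefix semantics described above can be modelled as a flags/mask pair: a bare flag name requires the bit to be set, a "-name" requires it to be clear, and an address matches when (ifa_flags & mask) == flags. This is a minimal sketch under stated assumptions, not the code from the commit: the helper names add_flag_filter() and flag_filter_match() are hypothetical, and the local F_* constants are assumed to mirror IFA_F_DADFAILED, IFA_F_DEPRECATED and IFA_F_TENTATIVE in <linux/if_addr.h>.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Local copies of three address flags; values assumed to mirror
 * IFA_F_DADFAILED, IFA_F_DEPRECATED and IFA_F_TENTATIVE in <linux/if_addr.h>. */
#define F_DADFAILED  0x08u
#define F_DEPRECATED 0x20u
#define F_TENTATIVE  0x40u

/* An address matches when (ifa_flags & mask) == flags. */
struct flag_filter {
	uint32_t flags; /* bits that must be set */
	uint32_t mask;  /* bits that are examined at all */
};

/* Hypothetical parser: "tentative" requires the bit, "-tentative" requires
 * it to be clear. Returns false for an unknown flag name. */
static bool add_flag_filter(struct flag_filter *f, const char *arg)
{
	bool negate = (arg[0] == '-');
	const char *name = negate ? arg + 1 : arg;
	uint32_t bit;

	if (strcmp(name, "tentative") == 0)
		bit = F_TENTATIVE;
	else if (strcmp(name, "deprecated") == 0)
		bit = F_DEPRECATED;
	else if (strcmp(name, "dadfailed") == 0)
		bit = F_DADFAILED;
	else
		return false;

	f->mask |= bit;
	if (negate)
		f->flags &= ~bit;
	else
		f->flags |= bit;
	return true;
}

static bool flag_filter_match(const struct flag_filter *f, uint32_t ifa_flags)
{
	return (ifa_flags & f->mask) == f->flags;
}
```

With the arguments "-tentative -deprecated", the mask covers both bits and the required value is zero, so an address carrying only the dadfailed flag still matches.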
Vadim Kochan 79aa79d058 ip lib: Added shorter timestamp option
Add another timestamp format that looks more like logging output:

[2014-12-22T22:36:50.489 ] 2: enp0s25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default
    link/ether 3c:97:0e:a3:86:2e brd ff:ff:ff:ff:ff:ff

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-12-24 12:07:36 -08:00
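The short timestamp above is an ISO-8601-style date and time with millisecond precision. A minimal sketch of producing that text, assuming a hypothetical helper format_short_ts(); it uses UTC via gmtime() so the result is reproducible, whereas ip's own output is presumably in local time (an assumption), and it formats only the timestamp text, not the surrounding brackets or the space before ']' seen in the sample.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: format secs/usec as "YYYY-MM-DDTHH:MM:SS.mmm" (UTC).
 * Returns 0 on success, -1 if conversion fails or buf is too small. */
static int format_short_ts(time_t secs, long usec, char *buf, size_t len)
{
	const struct tm *tm = gmtime(&secs);
	char base[32];
	int n;

	if (!tm)
		return -1;
	if (strftime(base, sizeof(base), "%Y-%m-%dT%H:%M:%S", tm) == 0)
		return -1;

	/* Milliseconds come from the microsecond part, zero-padded to 3 digits. */
	n = snprintf(buf, len, "%s.%03ld", base, usec / 1000);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```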
Stephen Hemminger 3d0b7439df whitespace cleanup
Remove all trailing whitespace and space before tabs.
2014-12-20 15:47:17 -08:00
Vadim Kochan b9ea445d52 ss: Dont show netlink and packet sockets by default
Checking by SS_CLOSE state was removed in:

    (45a4770bc0) ss: Remove checking SS_CLOSE state for packet and netlink

which is not really correct, because now all sockets are shown by default
when running 'ss'.

Here is a more correct fix, which considers the specified family.

To see netlink sockets:
    ss -A netlink

To see packet sockets:
    ss -A packet

And ss by default will show only connected/established sockets as it
was before all the time.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-12-20 15:43:56 -08:00
Stephen Hemminger 093f18fd7a update kernel headers to 3.19-rc1 2014-12-20 12:22:01 -08:00
Stephen Hemminger effdfc9e87 Merge branch 'master' into net-next 2014-12-20 12:18:14 -08:00
vadimk 8a4025f6a4 ss: Use rtnl_dump_filter in handle_netlink_request
Replace the custom handling of netlink messages with rtnl_dump_filter
from lib/libnetlink.c. Also:

    - remove the unused dump_fp arg;
    - add a MAGIC_SEQ #define for the 123456 seq id;
    - silently exit on an ENOENT errno for the NETLINK_SOCK_DIAG proto
        in the rtnl_dump_filter_l(...) function in lib/libnetlink.c. This fix
        was originally added in a3fd8e58c1 by Eric
        for misc/ss.c

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-12-20 12:17:02 -08:00
Stephen Hemminger 8a504fc356 resolve header file conflict between linux/in6.h and netinet/in.h
Go back to kernel version of if_bridge.h and use patched
version of linux/in6.h and libc-compat.h
2014-12-20 12:14:30 -08:00
Stephen Hemminger b0d30f7f3f rt_names can't be const
Needs to be built at runtime.
2014-12-20 11:36:54 -08:00
vadimk b00daf6a83 ss: Use nl_proto_a2n for filtering by netlink proto
Now it is possible to filter by existing Netlink protos:

    ss -A netlink src uevent
    ss -A netlink src nft
    ss -A netlink src genl

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-12-09 20:39:33 -08:00
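Filtering by name needs a name-to-number lookup that also accepts a raw protocol number. A minimal sketch, assuming a hypothetical nl_proto_lookup() and a small illustrative table; the numbers are assumed to mirror the NETLINK_* constants in <linux/netlink.h>, and this is not the nl_proto_a2n() implementation from the commit.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative subset of netlink protocol names; numbers assumed to mirror
 * the NETLINK_* constants in <linux/netlink.h>. */
static const struct {
	const char *name;
	int id;
} nl_protos[] = {
	{ "rtnl",     0 },	/* NETLINK_ROUTE */
	{ "fw",       3 },	/* NETLINK_FIREWALL */
	{ "tcpdiag",  4 },	/* NETLINK_INET_DIAG */
	{ "nft",     12 },	/* NETLINK_NETFILTER */
	{ "uevent",  15 },	/* NETLINK_KOBJECT_UEVENT */
	{ "genl",    16 },	/* NETLINK_GENERIC */
};

/* Hypothetical name-to-number lookup: accepts a known name or a plain
 * decimal number; returns -1 if the argument is neither. */
static int nl_proto_lookup(const char *arg)
{
	size_t i;
	char *end;
	long v;

	for (i = 0; i < sizeof(nl_protos) / sizeof(nl_protos[0]); i++)
		if (strcmp(arg, nl_protos[i].name) == 0)
			return nl_protos[i].id;

	/* Fall back to a numeric protocol id (0..31 assumed valid here). */
	if (!isdigit((unsigned char)arg[0]))
		return -1;
	v = strtol(arg, &end, 10);
	if (*end != '\0' || v > 31)
		return -1;
	return (int)v;
}
```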
vadimk f00073e8b9 lib names: Add helper func for parse id and name from file
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-12-09 20:38:02 -08:00
vadimk 4e5615b34c lib names: Use CONFDIR for specify 'group' file path
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-12-09 20:36:03 -08:00
Nikita Edward Baruzdin d26caee7e9 iproute2: Add support for CAN presume-ack feature
This patch makes the CAN_CTRLMODE_PRESUME_ACK netlink feature configurable.
When enabled, the feature puts the CAN controller in a mode in which
the absence of acknowledgements is ignored.

Signed-off-by: Nikita Edward Baruzdin <nebaruzdin@gmail.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
2014-12-09 20:34:43 -08:00
Eric Dumazet d471791427 iproute2/nstat: Bug in displaying icmp stats
On Fri, 2014-12-05 at 17:13 -0800, Eric Dumazet wrote:

> I guess we could count number of spaces/fields in both lines,
> and disable the iproute2 trick if counts match.

Something like that maybe ?

 misc/nstat.c |   18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)
Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
2014-12-09 20:33:32 -08:00
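The idea quoted above (count the fields in the name line and the value line, and only pair them directly when the counts agree) can be sketched as follows. The helper names count_fields() and fields_match() are hypothetical; the sketch illustrates only the counting step, not the rest of nstat's parsing.

```c
#include <ctype.h>
#include <stdbool.h>

/* Count whitespace-separated fields in a line. */
static int count_fields(const char *line)
{
	int n = 0;
	bool in_field = false;

	for (; *line; line++) {
		if (isspace((unsigned char)*line)) {
			in_field = false;
		} else if (!in_field) {
			in_field = true;
			n++;
		}
	}
	return n;
}

/* True when a header line and its value line have the same number of
 * fields, i.e. names and values can be paired one-to-one. */
static bool fields_match(const char *names, const char *values)
{
	return count_fields(names) == count_fields(values);
}
```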
vadimk d68e00f704 ss: Fix layout/output issues introduced by regression
This patch fixes the following issues, which were introduced by me in these commits:

    #1 (2dc854854b) ss: Fixed broken output for Netlink 'Peer Address:Port' column
    ISSUE: Broken layout when all sockets are printed out

    #2 (eef43b5052) ss: Identify more netlink protocol names
    ISSUE: Protocol id is not printed if 'numbers only' output was specified (-n)

Also widened the local/peer port columns.

I tested with a lot of option combinations (I may have missed some test cases),
but the layout looks better to me than in the previously released version of iproute2/ss.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-12-09 20:31:12 -08:00
Stephen Hemminger 9d2c16438c if_bridge: remove in6.h
Adding in6.h breaks build with redefined values.
2014-12-09 20:19:26 -08:00
vadimk 6fcabac5e0 ip monitor: Fix issue when timestamp is printed w/o msg
The issue was observed when an IPv6 router broadcast NDUSEROPT
messages: monitor does not handle them, but it still printed the
timestamp, leaving a timestamp without a message.

As 'ip monitor' by default subscribes to all rtnl multicast groups except
RTGRP_TC, any message from these groups that monitor does not handle
may cause the same issue.

Fixed by subscribing by default only to the rtnl multicast groups that
'ip monitor' supports.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-12-09 20:17:29 -08:00
Nicolas Dichtel 2ec28933b6 ipaddress: enable -details option
This option was used only for 'ip link', but it can be useful to have it for
'ip address'. Thus it is possible to display link details and addresses with one
command.

Example:
$ ip -d a ls dev gre1
9: gre1@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1468 qdisc noqueue state UNKNOWN group default
    link/gre 10.16.0.249 peer 10.16.0.121 promiscuity 0
    gre remote 10.16.0.121 local 10.16.0.249 ttl inherit ikey 0.0.0.10 okey 0.0.0.10 icsum ocsum
    inet 192.168.0.249 peer 192.168.0.121/32 scope global gre1
       valid_lft forever preferred_lft forever
    inet6 fe80::5efe:a10:f9/64 scope link
       valid_lft forever preferred_lft forever

Suggested-by: Christophe Gouault <christophe.gouault@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2014-12-09 20:17:01 -08:00
Nicolas Dichtel 2ea49a3804 ipaddress: enable -details option
This option was used only for 'ip link', but it can be useful to have it for
'ip address'. Thus it is possible to display link details and addresses with one
command.

Example:
$ ip -d a ls dev gre1
9: gre1@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1468 qdisc noqueue state UNKNOWN group default
    link/gre 10.16.0.249 peer 10.16.0.121 promiscuity 0
    gre remote 10.16.0.121 local 10.16.0.249 ttl inherit ikey 0.0.0.10 okey 0.0.0.10 icsum ocsum
    inet 192.168.0.249 peer 192.168.0.121/32 scope global gre1
       valid_lft forever preferred_lft forever
    inet6 fe80::5efe:a10:f9/64 scope link
       valid_lft forever preferred_lft forever

Suggested-by: Christophe Gouault <christophe.gouault@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2014-12-09 20:13:21 -08:00
Scott Feldman 28467b7f3f bridge/fdb: add flag/indication for FDB entry synced from offload device
Add the NTF_EXT_LEARNED flag to the neigh flags to indicate that an FDB
entry learned by the device was learned externally to the bridge FDB. For
these entries, add an "external" annotation to the bridge fdb show output:

  00:02:00:00:03:00 dev swp2 used 2/2 master br0 external
  00:02:00:00:03:00 dev swp2 self permanent

In the example above, 00:02:00:00:03:00 is shown twice on dev swp2.  The
first entry is from the bridge (master) and is marked as "external" by
the offload device.  The second entry is from the brport offload device (self)
and was learned by the device.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
2014-12-09 16:34:44 -08:00
Stephen Hemminger c9b8aef6ae Merge branch 'master' into net-next 2014-12-09 16:33:59 -08:00
Scott Feldman 85c1807f16 bridge/fdb: fix statistics output spacing
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2014-12-09 16:29:27 -08:00
Florian Westphal 29d1f730b8 ip route: enable per-route ecn settings via 'features' option
This makes it possible to selectively enable explicit congestion
notification (ECN) via the routing table.

If the ecn feature is not set, the kernel uses the tcp_ecn sysctl
to decide whether to use ECN when establishing a TCP connection.

At the time of this writing, the kernel supports ecn and allfrags, but
allfrags is of dubious value and is not implemented here.

Example:

ip route change 192.168.2.0/24 dev eth0 features ecn

Signed-off-by: Florian Westphal <fw@strlen.de>
2014-12-09 16:26:39 -08:00
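The words following "features" map to bits in a feature bitmask carried with the route's metrics. A minimal sketch, assuming a hypothetical parse_route_features() and a local FEATURE_ECN constant assumed to mirror RTAX_FEATURE_ECN in <linux/rtnetlink.h>; only "ecn" is accepted, matching the commit's decision not to implement allfrags.

```c
#include <stdint.h>
#include <string.h>

/* Feature bit; value assumed to mirror RTAX_FEATURE_ECN in <linux/rtnetlink.h>. */
#define FEATURE_ECN (1u << 0)

/* Hypothetical parser for the words after "features": builds the bitmask
 * that would be sent as the RTAX_FEATURES route metric. Returns -1 on an
 * unknown word, leaving *out untouched. */
static int parse_route_features(int argc, const char *const argv[], uint32_t *out)
{
	uint32_t mask = 0;
	int i;

	for (i = 0; i < argc; i++) {
		if (strcmp(argv[i], "ecn") == 0)
			mask |= FEATURE_ECN;
		else
			return -1;
	}
	*out = mask;
	return 0;
}
```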
Stephen Hemminger 5a311b0bec need libc-compat.h for new in6.h
The header wars continue...
2014-12-05 12:47:34 -08:00
Stephen Hemminger 69fdff1fdb add local version of linux/in6.h
Need this header file to avoid build issues on older systems
like Debian 7
2014-12-05 12:16:36 -08:00
Stephen Hemminger f66611d823 ip-link: fix unterminated string in manpage
Missing "
2014-12-03 19:35:36 -08:00
Stephen Hemminger b2e116d6c3 tc: minor spelling fixes 2014-12-03 19:28:34 -08:00
Stephen Hemminger 14e9767330 tunnel: decode ESP tunnel type
Add ESP to decode switch.
2014-12-03 19:08:41 -08:00
Stephen Hemminger 9de4c6e9e9 rt_dsfield: fix Expedited Forwarding PHB
RFC 2598 defines Expedited Forwarding in section 2.3:
   "Codepoint 101110 is recommended for the EF PHB."
which translates to B8 as encoded in rt_dsfield.
2014-12-03 18:50:59 -08:00
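The arithmetic behind the fix: the DSCP occupies the upper six bits of the DS byte (the former IPv4 TOS byte), so the byte value is the codepoint shifted left by two. Binary 101110 is 0x2e (46), and 46 << 2 = 184 = 0xB8. The helper name dscp_to_tos() is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: place a 6-bit DSCP into the upper six bits of the
 * DS byte; the two low bits (ECN) are left as zero. */
static uint8_t dscp_to_tos(uint8_t dscp)
{
	return (uint8_t)((dscp & 0x3f) << 2);
}
```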
Mahesh Bandewar 81eaf677f9 ip link: Add ipvlan support to the iproute2/ip util
Add basic support for creating ipvlan virtual devices using the 'ip'
utility. The syntax is:

	ip link add link <master> <virtual> type ipvlan mode [ l2 | l3 ]
	e.g. ip link add link eth0 ipvl0 type ipvlan mode l3

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Laurent Chavey <chavey@google.com>
Cc: Tim Hockin <thockin@google.com>
Cc: Brandon Philips <brandon.philips@coreos.com>
Cc: Pavel Emelianov <xemul@parallels.com>
2014-12-03 09:37:37 -08:00
Jiri Pirko 8b1c0216d8 tc: add support for vlan tc action
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Cong Wang <cwang@twopensource.com>
2014-12-03 09:29:21 -08:00
Stephen Hemminger 1c0986873e update kernel headers to net-next (3.18-rc6)
Early merge of upstream headers
2014-12-03 09:27:43 -08:00
vadimk 8322d28dca man ip-link: Fix indentation for 'ip link show' options
BEFORE:
              The show command has additional formatting options:

       -s, -stats, -statistics
              output more statistics about packet usage.

       -d, -details
              output more detailed information.

       -h, -human, -human-readble
              output statistics with human readable values number followed by suffix

       -iec   print human readable rates in IEC units (ie. 1K = 1024).
AFTER:
       The show command has additional formatting options:

              -s, -stats, -statistics
                     output more statistics about packet usage.

              -d, -details
                     output more detailed information.

              -h, -human, -human-readble
                     output statistics with human readable values number followed by suffix

              -iec   print human readable rates in IEC units (ie. 1K = 1024).

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-12-03 09:17:36 -08:00
Pavel Simerda 922b482204 ip route: don't assume default route
Just print the help when "ip route del" is called without any other
arguments.

Resolves:

 * https://bugzilla.redhat.com/show_bug.cgi?id=997965

Signed-off-by: Pavel Šimerda <psimerda@redhat.com>
2014-12-03 09:16:07 -08:00
vadimk 10ed8b7f67 configure: Add check for the doc tools
Add a check for the existence of the doc file converters.
If tool XXX exists, HAVE_XXX:=y is written to the
Config file. Example of the configure script output:

TC schedulers
 ATM	no
 IPT	using xtables
 IPSET  yes

iptables modules directory: /usr/lib/iptables
libc has setns: yes
SELinux support: no

Docs
 latex: no
 WARNING: no docs can be built from LaTeX files
 sgml2html: yes

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-11-30 09:50:25 -08:00
Stephen Hemminger e9c4b7c38f update if_bridge
Use current upstream header.
2014-11-30 09:48:14 -08:00
vadimk 3b28be6e14 ss: Use generic handle_netlink_request for packet
Stop creating and handling a separate Netlink socket just to show
packet socket stats.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-11-30 09:43:43 -08:00
vadimk 1f299e9249 man ip-link: Add description for 'help' command
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-11-30 09:40:08 -08:00
vadimk 5fb421d434 ss: Refactor to use macro for define diag nl request
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-11-29 11:29:36 -08:00
Vadim Kochan 1b94414854 ip link: Allow to filter devices by master dev
Added 'master' option to 'ip link show' command
to filter devices by master dev.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-11-29 11:27:41 -08:00
Dave Taht 1fa804e0d3 iproute2: Add support for babel protocol table entry 2014-11-29 11:24:25 -08:00
vadimk 2dc854854b ss: Fixed broken output for Netlink 'Peer Address:Port' column
When printing the netlink sockets:

    ss -A netlink state close

the layout is somewhat broken: the 'Peer Address:Port' stars are
shifted and there are empty new lines. Fixed by making the port field
of the 'Local Address:Port' column wider.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-11-29 11:21:42 -08:00
vadimk 45a4770bc0 ss: Remove checking SS_CLOSE state for packet and netlink
I don't see a reason why packet and netlink states should be
printed only if the SS_CLOSE state is set in the filter. Currently,
to print the states of netlink or packet sockets one has to run:

    ss -A netlink state close

instead of:

    ss -A netlink

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-11-29 11:20:51 -08:00
vadimk 0948adc01a ip netns: Identify netns for the current process
Since the 'ip' utility shares the netns of the calling
process, we can simply look at /proc/self/.. to show
the netns of the current process with:

    ip netns id

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-11-29 11:19:11 -08:00
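One plausible way to match the current namespace against named namespaces is to compare the device and inode numbers that stat() reports for /proc/self/ns/net and for each file under /var/run/netns; two paths referring to the same namespace report the same pair. That matching strategy is an assumption here (the commit only says it looks at /proc/self), and same_inode() is a hypothetical helper that works on any two paths.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Two paths refer to the same object (e.g. the same network namespace)
 * when stat() reports identical device and inode numbers. */
static bool same_inode(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return false;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

For example, same_inode("/proc/self/ns/net", "/var/run/netns/NAME") would be true when the caller runs in the namespace bound at NAME (NAME being a placeholder).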
vadimk 18f39a3a02 tests: Move tc related tests to testsuite/tests/tc folder
With this change the results of tc tests will be recorded under:

    testsuite/results/tc/

The ip related tests can be added under:

    testsuite/tests/ip

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-11-29 11:17:11 -08:00
vadimk 093b76466e ip monitor: Allow to filter events by dev
Added 'dev' option to allow filtering events by device.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-11-29 11:15:40 -08:00
vadimk eef43b5052 ss: Identify more netlink protocol names
Only a few Netlink protocol names
were printed on the screen:

    rtnl, fw, tcpdiag

So add the ability to identify the Netlink protocol name
from /etc/iproute/nl_protos or from a static table.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-11-29 11:13:38 -08:00
Tom Herbert 666cdc506f vxlan: Add support for enabling UDP checksums
Add udpcsum option to enable transmitting UDP checksums when doing
VXLAN/IPv4. Add udp6zerocsumtx, and udp6zerocsumrx options to enable
sending zero checksums and receiving zero checksums in VXLAN/IPv6.

Signed-off-by: Tom Herbert <therbert@google.com>
2014-11-29 11:07:00 -08:00
Or Gerlitz 8ca8fac7aa ip-link: Document IPoIB link type in the man page
Add documentation on how to create devices of type IP-over-Infiniband
in the man page.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
2014-11-29 11:05:51 -08:00
Florian Westphal 4d9a264f09 utils: relax strtoX checking in get_time_rtt
ip route change dev tap0 192.168.7.0/24 rto_min 1ms
Error: argument "1ms" is wrong: "rto_min" value is invalid

get_time_rtt() checks for 's' or 'msec' and converts to milliseconds
if needed.

Fixes: 697ac63905 (utils: fix range checking for get_u32/get_u64 et all)
Signed-off-by: Florian Westphal <fw@strlen.de>
2014-11-22 13:52:44 -08:00
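The point of the fix is that characters after the number are a unit suffix rather than an automatic parse error. A minimal sketch of suffix-tolerant parsing, assuming a hypothetical parse_rtt_ms(); it is not get_time_rtt() itself, and both the accepted suffixes ("ms", "msec", "s", "sec") and treating a bare number as milliseconds are assumptions made for this illustration.

```c
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: parse an RTT-style value into milliseconds.
 * Returns 0 and stores the result in *ms, or -1 on invalid input. */
static int parse_rtt_ms(const char *arg, unsigned long *ms)
{
	char *end;
	unsigned long v;

	/* Reject signs and empty strings up front; strtoul would accept "-1". */
	if (!isdigit((unsigned char)arg[0]))
		return -1;
	v = strtoul(arg, &end, 10);
	if (v == ULONG_MAX)
		return -1;	/* treat overflow as invalid */

	/* Trailing characters are a unit suffix, not automatically an error. */
	if (*end == '\0' || strcmp(end, "ms") == 0 || strcmp(end, "msec") == 0) {
		*ms = v;
	} else if (strcmp(end, "s") == 0 || strcmp(end, "sec") == 0) {
		if (v > ULONG_MAX / 1000)
			return -1;
		*ms = v * 1000;
	} else {
		return -1;
	}
	return 0;
}
```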
Tom Herbert e4fc7e2625 iproute2: Man pages for fou and gue
Man pages for Foo-over-UDP and Generic UDP Encapsulation receive
port configuration. gue man page links to fou one.

Signed-off-by: Tom Herbert <therbert@google.com>
2014-11-06 16:17:34 -08:00
Tom Herbert 80c24b097e ip link gre: Add support to configure FOU and GUE
This patch adds support to configure foo-over-udp (FOU) and Generic
UDP Encapsulation for GRE tunnels. This configuration allows selection
of FOU or GUE for the tunnel, specification of the source and
destination ports for UDP tunnel, and enabling TX checksum. This
configuration only affects the transmit side of a tunnel.

Example:

ip link add name tun1 type gre remote 192.168.1.1 local 192.168.1.2 \
   ttl 225 encap fou encap-sport auto encap-dport 7777 encap-csum

This would create a GRE tunnel with FOU encapsulation where the source
port is automatically selected (based on hash of inner packet) and
checksums in the encapsulating UDP header are enabled.

Signed-off-by: Tom Herbert <therbert@google.com>
2014-11-06 16:17:34 -08:00
Tom Herbert c1159152e1 ip link ipip: Add support to configure FOU and GUE
This patch adds support to configure foo-over-udp (FOU) and Generic
UDP Encapsulation for IPIP and sit tunnels. This configuration allows
selection of FOU or GUE for the tunnel, specification of the source and
destination ports for UDP tunnel, and enabling TX checksum. This
configuration only affects the transmit side of a tunnel.

Example:

ip link add name tun1 type ipip remote 192.168.1.1 local 192.168.1.2 \
   ttl 225 encap gue encap-sport auto encap-dport 9999 encap-csum

This would create an IPIP tunnel in GUE encapsulation where the source
port is automatically selected (based on hash of inner packet) and
checksums in the encapsulating UDP header are enabled.

Signed-off-by: Tom Herbert <therbert@google.com>
2014-11-06 16:17:34 -08:00
Tom Herbert 6928747b6e ip fou: Support to configure foo-over-udp RX
Add 'ip fou...' commands to enable/disable UDP ports for doing
foo-over-udp and its Generic UDP Encapsulation variant. The arguments
are the port number to bind to and the IP protocol to map to the port
(for direct FOU).

Examples:

ip fou add port 7777 gue
ip fou add port 8888 ipproto 4

The first command creates a GUE port, the second creates a direct FOU
port for IPIP (the received payload is assumed to be an IPv4 packet).

Signed-off-by: Tom Herbert <therbert@google.com>
2014-11-06 16:17:34 -08:00
Masatake YAMATO e37a9c73d2 man: ip-link: fix a typo
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
2014-11-06 16:02:38 -08:00
Christian Hesse 50ec66507b ip-link: in human readable output use dynamic precision length 2014-11-06 16:02:33 -08:00
vadimk 5cb6aa0348 doc ip-cref: Added missing ip options
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-11-06 16:02:30 -08:00
Christian Hesse f4fe81d26c ip-link: fix column alignment
Width is the maximum number of characters used for the value, excluding a
field separator. So append a single whitespace.
2014-11-06 16:02:29 -08:00
Stephen Hemminger 1e264abc3a ip: add iec formatted option and cleanup code
Add a new -iec option in addition to -human.
Cleanup code so the formatting of numbers is done in one function,
not 2 ways and 2 sizes.
2014-11-02 12:49:19 -08:00
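The difference between -human and -iec comes down to the divisor used when scaling: powers of 1000 versus powers of 1024 (1K = 1024). A minimal sketch with a hypothetical format_count() helper and one decimal place; it rounds, so 3969051 packets print as 4.0M, whereas the sample output in the human-readable commit below shows 3.9M, so ip's actual precision and rounding evidently differ from this sketch.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: scale a counter for display using powers of 1000
 * (SI, as with -human) or 1024 (IEC, as with -iec), with one decimal place.
 * Values below one unit are printed as plain integers. */
static void format_count(uint64_t v, int iec, char *buf, size_t len)
{
	static const char suffix[] = "KMGTPE";
	const double base = iec ? 1024.0 : 1000.0;
	double x = (double)v;
	int i = -1;

	/* Divide until the value drops below one unit of the next prefix. */
	while (x >= base && i < (int)sizeof(suffix) - 2) {
		x /= base;
		i++;
	}
	if (i < 0)
		snprintf(buf, len, "%llu", (unsigned long long)v);
	else
		snprintf(buf, len, "%.1f%c", x, suffix[i]);
}
```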
Christian Hesse b68d983754 ip-link: add switch to show human readable output
Byte and packet counts can grow to very large numbers. This adds a
switch to show human readable output (default output first, then
output with the new switch):

4: wl: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DORMANT group default qlen 1000
    link/ether 00:de:ad:be:ee:ef brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    1523846973 3969051  0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    8710088361 6077735  0       0       0       0
4: wl: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DORMANT group default qlen 1000
    link/ether 00:de:ad:be:ee:ef brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    1.5G       3.9M     0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    8.7G       6.0M     0       0       0       0
2014-11-02 11:53:29 -08:00
Alexey Andriyanov a0638e18b2 iproute2: ip6_tunnel mode bugfixes: any,vti6
- the "any" ipv6 tunnel mode (proto == 0) could not be set
due to an incomplete set of cases in do_add() and do_del().
- vti6 logic was inverted: it was using the "ip6_vti0" basedev
UNLESS the mode was set to vti6.

We don't need a switch on p.proto in do_add()/do_del(): it
already exists in parse_args(). So if the parse_args() call
was successful, there is no need to check the tunnel mode again.

Signed-off-by: Alexey Andriyanov <alan@al-an.info>
2014-11-02 11:48:43 -08:00
Nicolas Dichtel eeb669a740 man: update doc after support of ESN and anti-replay window
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2014-10-30 09:39:25 -07:00
Stephen Hemminger 0bf4c355ee Merge branch 'net-next'
Conflicts:
	include/linux/if_tunnel.h
2014-10-30 09:38:56 -07:00
dingzhi 0151b56d10 xfrm: add support of ESN and anti-replay window
This patch allows to configure ESN and anti-replay window.

Signed-off-by: dingzhi <zhi.ding@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2014-10-29 22:50:48 -07:00
Eric Dumazet e557212eda netlink: extend buffers to 16K
Starting from linux-3.15 (commit 9063e21fb026, "netlink: autosize skb
lengths"), kernel is able to send up to 16K in netlink replies.

This change enables iproute2 commands to get bigger chunks,
without breaking compatibility with old kernels.

Signed-off-by: Eric Dumazet <edumazet@google.com>
2014-10-29 22:43:04 -07:00
Daniel Borkmann 907e1aca5f ss: output dctcp diag information
Dump useful DCTCP state/debug information gathered from diag.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2014-10-29 22:37:45 -07:00
Stephen Hemminger dddfc7f67e Update kernel headers to 3.18-rc2 2014-10-29 22:32:02 -07:00
vadimk 14f8854fa3 tests: Allow to run tests recursively
This approach allows running *.t scripts from any
subdirectory of tests/.

One caveat is that the tests in tests/cls/*.t (which are needed
by tests/cls-testbed.t but do not exist yet) would also
be run alongside tests/cls-testbed.t, so they would be run
twice. To avoid this, the path of these tests was renamed
to tests/cls/*.c in tests/cls-testbed.t.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-10-29 22:26:59 -07:00
vadimk 8d391512b7 tests: Skip cls-testbed.t if tests/cls dir does not exist
Currently tests/cls-testbed.t tries to run any *.t in the
tests/cls/ folder, but that folder does not exist.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-10-29 22:26:57 -07:00
vadimk a5eafa9a5e man ip: Add missing '-details' option
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-10-29 22:26:16 -07:00
vadimk 08adf73a0a gitignore: Ignore 'doc' files generated at runtime
The list is based on doc/Makefile 'clean' target

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-10-29 22:26:15 -07:00
vadimk 2dc6731dce doc make: Add *.pdf files to the 'clean' target
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-10-29 22:26:14 -07:00
vadimk 338f07c699 man ip-link: Fixed missing 'up' option in 'ip link show' synopsis
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-10-29 22:26:11 -07:00
Stephen Hemminger 50231ad2a5 v3.17.0 2014-10-09 08:40:14 -07:00
Stephen Hemminger edd3979272 emp: fix warning on deprecated bison directive
emp_ematch.y:12.1-13: warning: deprecated directive, use ‘%name-prefix’ [-Wdeprecated]
 %name-prefix="ematch_"
 ^^^^^^^^^^^^^
2014-10-09 08:31:10 -07:00
vadimk 561e650eff ip link: Shortify printing the usage of link type
Allow printing the usage of a particular link type with:

    ip link help [TYPE]

Currently, printing the usage of a link type requires
the following form:

    ip link { add | del | set } type TYPE help

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-10-09 08:29:47 -07:00
vadimk f29543125f tests: Check existing of /proc/config.gz before use it
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-10-09 08:29:39 -07:00
Jamal Hadi Salim 863ecb04b4 discourage use of direct policer interface
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2014-10-09 08:26:57 -07:00
Jamal Hadi Salim 287bf3a990 route classifier support for multiple actions
route can now use the action syntax

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2014-10-09 08:26:57 -07:00
Jamal Hadi Salim 08139c2ffb tcindex classifier support for multiple actions
tcindex can now use the action syntax

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2014-10-09 08:26:56 -07:00
Dmitry Popov 4cb8d03078 ip tunnel: fix 'ip -oneline tunnel show' for some GRE tunnels
'ip -oneline tunnel show' was not "oneline" for GRE tunnels with iseq:
# ip tun add gre_test remote 1.1.1.1 local 2.2.2.2 mode gre iseq oseq
# ip -oneline tun show gre_test | wc -l
2

The problem existed because of a typo: '\n' was printed when it shouldn't be.
Fixed.

Signed-off-by: Dmitry Popov <ixaphire@qrator.net>
2014-10-09 08:24:01 -07:00
Jiri Benc 5d5cf1b437 ip address: print stats with -s
Make ip address show accept the -s option similarly to ip link. This creates
a one-command replacement for "ifconfig -a", useful for people who still
stick with ifconfig because of this feature.

Print the stats as the last thing for the interface. This requires some code
shuffling.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2014-10-09 08:24:01 -07:00
Eric Dumazet 624a06e63f ss: add more tcp socket diagnostics
Display 4 additional tcp socket info fields:

backoff : exponential backoff
lastsnd : time in milliseconds since last send
lastrcv : time in milliseconds since last receive
lastack : time in milliseconds since last acknowledgement

$ ss -ti dst :22
State       Recv-Q Send-Q                  Local Address:Port          Peer Address:Port
ESTAB       0      0                        172.16.5.1:58470           172.17.131.143:ssh
	 cubic wscale:7,7 rto:228 rtt:30/20 ato:40 mss:1256 cwnd:6 ssthresh:4 send 2.0Mbps lastsnd:3480 lastrcv:3464 lastack:3464 rcv_rtt:81.5 rcv_space:87812

Signed-off-by: Eric Dumazet <edumazet@google.com>
2014-10-09 08:24:01 -07:00
Atzm Watanabe 68ac9ab339 iplink: do not require assigning negative ifindex at link creation
Since commit 3c682146ae, iplink requires assigning a negative
ifindex (-1) to the kernel when creating an interface without
specifying an index.

v2: checking whether index is -1, suggested by Cong Wang.

Cc: Cong Wang <cwang@twopensource.com>
Signed-off-by: Atzm Watanabe <atzm@stratosphere.co.jp>
Acked-by:  Cong Wang <cwang@twopensource.com>
2014-10-09 08:24:01 -07:00
vadimk ae0d90737c tests: Allow policer test to be ran
Renamed testsuite/tests/policer to testsuite/tests/policer.t

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-10-09 08:24:01 -07:00
vadimk d14cc6be00 tests: Add runtime generated files to .gitignore
When running the tests, 2 folders are generated:

    testsuite/results
    testsuite/iproute2/iproute2-this

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-10-09 08:24:00 -07:00
vadimk 40aadf8b09 ip monitor: Changed 'Unknown message' format to be more informative
If an unknown message is handled, it is now displayed as:

    Unknown message: type=0x00000044(68) flags=0x00000000(0) len=0x0000004c(76)

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-10-09 08:24:00 -07:00
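The new "Unknown message" line prints each header field twice: as zero-padded 8-digit hex and as decimal in parentheses. A minimal sketch reproducing that format, assuming a hypothetical format_unknown_msg() helper; in ip the inputs would be nlmsg_type, nlmsg_flags and nlmsg_len from the message's struct nlmsghdr.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper reproducing the "Unknown message" line: each field is
 * printed as zero-padded hex followed by its decimal value. */
static int format_unknown_msg(char *buf, size_t len,
			      uint16_t type, uint16_t flags, uint32_t msg_len)
{
	return snprintf(buf, len,
			"Unknown message: type=0x%08x(%u) flags=0x%08x(%u) len=0x%08x(%u)",
			(unsigned)type, (unsigned)type,
			(unsigned)flags, (unsigned)flags,
			(unsigned)msg_len, (unsigned)msg_len);
}
```

Called with type 68, flags 0 and length 76, it yields the line shown in the commit message (without the leading indentation).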
Andy Furniss a07c6d6135 add missing underscore to man page and example nf_mark ematch
The man page and the "fail" example are missing an underscore in the
nf_mark ematch.

eg.

tc filter add dev eth0 parent ffff:  basic match 'meta(nfmark gt 24)' classid 2:4

meta: unknown meta id

... >>meta(nfmark gt 24)<< ...
... meta(>>nfmark<< gt 24)...
Usage: meta(OBJECT { eq | lt | gt } OBJECT)
where: OBJECT  := { META_ID | VALUE }
        META_ID := id [ shift SHIFT ] [ mask MASK ]

Example: meta(nfmark gt 24)
          meta(indev shift 1 eq "ppp")
          meta(tcindex mask 0xf0 eq 0xf0)

For a list of meta identifiers, use meta(list).
Illegal "ematch"

meta(list) does correctly show nf_mark and the above test works with
nf_mark.

Signed-off-by: Andy Furniss adf.lists@gmail.com
2014-10-09 08:24:00 -07:00
vadimk c1cbb18adb ip netns: Create /var/run/netns dir when do 'ip netns monitor'
'ip netns monitor' fails when the /var/run/netns dir does not exist
yet, although it might be created later while monitoring.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-09-29 08:53:31 -07:00
vadimk 9ecff68d11 tests: Fix problem with test running
Tests could not be run. The following
issues were fixed:
    - create the results folder before running the tests
    - move sudo $PREFIX before the variable definitions, which
        allows passing them through sudo to the test script.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-09-29 08:51:51 -07:00
Jamal Hadi Salim 10f5a375ea rsvp classifier support for multiple actions
Example setup:

sudo tc qdisc del dev eth0 root handle 1:0 prio
sudo tc qdisc add dev eth0 root handle 1:0 prio

sudo tc filter add dev eth0 pref 10 proto ip parent 1:0 \
rsvp session 10.0.0.1 ipproto icmp \
classid 1:1  \
action police rate 1kbit burst 90k pipe \
action ok

tc -s filter show dev eth0 parent 1:0

filter protocol ip pref 10 rsvp
filter protocol ip pref 10 rsvp fh 0x0001100a flowid 1:1 session 10.0.0.1 ipproto icmp
        action order 1:  police 0x5 rate 1Kbit burst 23440b mtu 2Kb action pipe overhead 0b
ref 1 bind 1
        Action statistics:
        Sent 98000 bytes 1000 pkt (dropped 0, overlimits 761 requeues 0)
        backlog 0b 0p requeues 0

        action order 2: gact action pass
         random type none pass val 0
         index 2 ref 1 bind 1 installed 60 sec used 3 sec
        Action statistics:
        Sent 74578 bytes 761 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Tested-by: John Fastabend <john.r.fastabend@intel.com>
2014-09-29 08:47:33 -07:00
Jamal Hadi Salim 954de6c72b actions: BugFix action stats to display with -s
Was broken by commit 288abf513f.
Let's not be too clever and have a separate call to print flushed
actions info.

The broken output looks like:
root@moja-1:~# tc actions add  action drop index 4
root@moja-1:~# tc -s actions ls action gact

    action order 0: gact action drop
     random type none pass val 0
     index 4 ref 1 bind 0 installed 9 sec used 4 sec

The fixed version looks like:
    action order 0: gact action drop
     random type none pass val 0
     index 4 ref 1 bind 0 installed 9 sec used 4 sec
         Sent 108948 bytes 1297 pkts (dropped 1297, overlimits 0)

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2014-09-29 08:47:19 -07:00
Jiri Pirko 28d84b429e add bridge master device support
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2014-09-28 16:33:29 -07:00
Jiri Pirko 8c39db391d add bridge_slave device support
Note this depends on "iproute2: allow to change slave options via
type_slave"
2014-09-28 16:31:04 -07:00
Stephen Hemminger f55fa86dc7 update headers to 3.17.0 net-next 2014-09-28 16:28:00 -07:00
Steffen Klassert 2f7fbec2eb iproute2: VTI6 support for ip -6 link command.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-09-28 16:23:12 -07:00
Steffen Klassert f36d1140f2 iproute2: Add support for IPv6 VTI tunnels to ip6tunnel
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-09-28 16:23:11 -07:00
vadimk 08ce8ae95d ip tuntap: Added missing commands in usage
The show, list, lst and help commands were not printed in the usage text.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-09-28 16:19:31 -07:00
vadimk f1a505aca8 ip tuntap: Add checking if tun/tap mode was set by default
This check was performed only when adding an interface, but
it is also needed when deleting; otherwise the error is:

    ioctl(TUNSETIFF): Invalid argument

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-09-28 16:19:31 -07:00
Nicolas Dichtel 6ad5399c3a ip/vxlan: fix display of maxaddress option
Parentheses are required; otherwise the maxaddr value is a bool and the output is always
1 when the option is set.
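A minimal C sketch of the precedence pitfall, assuming the display code assigned and compared in a single expression. `maxaddr_buggy`, `maxaddr_fixed` and `attr` are hypothetical names standing in for the attribute read in the vxlan printing code, not the actual iproute2 source.

```c
#include <stdint.h>

/* Hypothetical helper: attr stands in for the netlink attribute value. */
static uint32_t maxaddr_buggy(uint32_t attr)
{
	uint32_t maxaddr = 0;

	/* != binds tighter than =, so maxaddr receives (attr != 0), i.e. 0 or 1 */
	if ((maxaddr = attr != 0))
		return maxaddr;
	return 0;
}

static uint32_t maxaddr_fixed(uint32_t attr)
{
	uint32_t maxaddr = 0;

	/* Parenthesize the assignment so the comparison sees the real value */
	if ((maxaddr = attr) != 0)
		return maxaddr;
	return 0;
}
```

With an attribute value of 42, the buggy form yields 1 while the fixed form yields 42.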

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2014-09-28 16:19:31 -07:00
Nicolas Dichtel c2fbc57ee7 ip/vxlan: add a help for ageing and maxaddress options
These options were missing from the usage text and man pages.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2014-09-28 16:19:31 -07:00
Jiri Pirko 7feb76ce98 add help command to bonding master
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2014-09-28 16:07:07 -07:00
Nikolay Aleksandrov 620ddedada iproute2: allow to change slave options via type_slave
This patch adds the necessary changes to allow altering a slave device's
options via ip link set <device> type <master type>_slave specific-option.
It also adds support to set the bonding slaves' queue_id.

Example:
 ip link set eth0 type bond_slave queue_id 10

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
2014-09-28 16:05:24 -07:00
WANG Cong 3c682146ae iplink: forbid negative ifindex and modifying ifindex
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
2014-09-28 16:03:38 -07:00
Eric Dumazet 9464a5f26c ip: support of usec rtt in tcp_metrics
Starting with Linux 3.15, the kernel supports new TCP metric attributes:
TCP_METRIC_RTT_US and TCP_METRIC_RTTVAR_US.

Update the ip command to detect when they are used.
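A sketch of the fallback this implies, assuming the older TCP_METRIC_RTT value is reported in milliseconds and the new `_US` attribute in microseconds. `rtt_in_usec` is a hypothetical helper, not the code in ip/tcp_metrics.c.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: prefer the microsecond attribute when the kernel
 * sent it (Linux >= 3.15), otherwise scale the millisecond value.
 */
static uint64_t rtt_in_usec(bool have_us, uint32_t rtt_us, uint32_t rtt_ms)
{
	if (have_us)
		return rtt_us;
	return (uint64_t)rtt_ms * 1000;
}
```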

Signed-off-by: Eric Dumazet <edumazet@google.com>
2014-09-28 15:58:36 -07:00
vadimk c56361f4b5 ip monitor: Skip IPv6 ND user option messages
IPv6 routers send ND messages with the RDNSS option,
which causes 'ip monitor' to print an unknown message:

    Unknown message: 0000004c 00000044 00000000

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-09-28 15:58:27 -07:00
vadimk 2271779d80 ip monitor: Dont print timestamp or banner-label for cloned routes
This is an ugly fix, but it solves the case where the timestamp
or banner label is printed before the cloned route is skipped
by the iproute filter, which filters out all cached routes by default.
In that case the timestamp is printed twice:

    Timestamp: Thu Sep  4 19:46:59 2014 457933 usec
    Timestamp: Thu Sep  4 19:47:07 2014 977970 usec
    10.3.5.1 dev wlp3s0 lladdr XX:XX:XX:XX:XX:XX STALE

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-09-28 15:57:52 -07:00
Stephen Hemminger eb5d01ff38 update dsfield file values
Update the rt_dsfield file to contain the values defined in current RFCs.
The days of TOS precedence are gone; even Cisco no longer refers
to them in its documentation.
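In the DiffServ scheme (RFC 2474), the 6-bit DSCP occupies the upper bits of the former TOS byte, with the low two bits left for ECN. A minimal sketch, assuming the rt_dsfield entries are DS-field byte values (for example 0xb8 for EF); `dscp_to_dsfield` is a hypothetical helper.

```c
#include <stdint.h>

/* Shift the 6-bit DSCP past the two ECN bits to get the DS-field byte. */
static uint8_t dscp_to_dsfield(uint8_t dscp)
{
	return (uint8_t)((dscp & 0x3f) << 2);
}
```

For example, EF (DSCP 46) maps to 0xb8, AF11 (DSCP 10) to 0x28, and CS1 (DSCP 8) to 0x20.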
2014-09-14 20:40:37 -07:00
vadimk f1b66ff83a ip link: Remove unnecessary device checking
The real check is performed later in iplink_modify(), which
checks for device existence if the NLM_F_CREATE flag is set.

This also fixes the case where it was impossible to add a veth link,
caused by 9a02651a87 (ip: check for missing dev arg when doing VF rate),
because such devices do not exist yet.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2014-09-03 18:37:42 -07:00
vadimk 2f937359dd ip man: Added short description for hsr link type
There was no short description of the hsr link type in the ip-link man page.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2014-09-03 18:37:16 -07:00
vadimk bcf1aae8a8 ip netns: Show error message if mkdir failed to create /var/run/netns
Currently, if mkdir fails with a "Permission denied" error, the "mount --make-shared ..."
error message is shown instead, because /var/run/netns does not exist.
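A minimal sketch of reporting the mkdir failure directly, treating an already existing directory as success. `ensure_netns_dir` is a hypothetical name, and the 0755 mode is an assumption for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Returns 0 if the directory exists or was created, -1 after reporting why not. */
static int ensure_netns_dir(const char *path)
{
	if (mkdir(path, 0755) == 0 || errno == EEXIST)
		return 0;
	/* Report the real cause (e.g. EACCES) instead of a later mount error */
	fprintf(stderr, "mkdir %s failed: %s\n", path, strerror(errno));
	return -1;
}
```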

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2014-08-29 00:13:52 -07:00
Eric Dumazet cdb2227e9c nstat: 64bit support on 32bit arches
SNMP counters can be provided as 64-bit numbers.
nstat needs to cope with this even when running in 32-bit mode.
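A sketch of the general technique, assuming the counters are parsed from text: use `unsigned long long`, which is at least 64 bits wide even where `unsigned long` is only 32. `parse_counter` is a hypothetical helper, not nstat's actual code.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal counter into a 64-bit-capable type. */
static int parse_counter(const char *s, unsigned long long *out)
{
	char *end;
	unsigned long long v;

	errno = 0;
	v = strtoull(s, &end, 10);
	if (errno != 0 || end == s)
		return -1;
	*out = v;
	return 0;
}
```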

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2014-08-29 00:13:36 -07:00
Daniel Borkmann 1910618074 ll_types: add netlink ARPHRD
This adds ARPHRD_NETLINK to ll_types so that it can be properly
shown e.g. in `ip a`:

 8: nlmon: <NOARP,UP,LOWER_UP> mtu 3776 qdisc noqueue state UNKNOWN group default
    link/netlink

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2014-08-29 00:13:22 -07:00
Stephen Hemminger cd63507430 Merge branch 'net-next' 2014-08-04 12:58:36 -07:00
Stephen Hemminger a9ae422486 v3.16.0 2014-08-04 12:43:46 -07:00
Jiri Pirko ff7c208440 iproute2: allow to ipv6 set address generation mode
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2014-08-04 12:41:14 -07:00
Stephen Hemminger 945eaebdf7 Update kernel headers to net-next 2014-08-04 12:39:49 -07:00
Stephen Hemminger 656111b2f9 cleanup warnings
ll_index can return -1 but was declared unsigned.
rt_addr_n2a had an unused length parameter.
2014-08-04 10:30:35 -07:00
Jay Vosburgh 3757185b29 tc/netem: loss gemodel options fixes
First, the default value for 1-k is documented as 0, but is
currently set to 1 (100%).  This causes all packets to be dropped
in the good state if 1-k is not explicitly specified.  Fix this by setting
the default to 0.

	Second, the 1-h option is parsed correctly; however, the kernel
expects "h", not 1-h.  Fix this by inverting the "1-h" percentage before
sending it to and after receiving it from the kernel.  This does change the
behavior, but makes it consistent with the netem documentation and the
literature on the Gilbert-Elliot model, which refer to "1-h" and "1-k,"
not "h" or "k" directly.

	Last, fix a minor formatting issue in the options reporting.
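A sketch of the inversion, assuming netem passes probabilities to the kernel as 32-bit fractions where 0xffffffff means 100%. Under that assumption, converting between "1-h" and "h" is a subtraction from the maximum, and the same mapping works in both directions. `percent_to_u32` and `invert_prob` are hypothetical names.

```c
#include <stdint.h>

/* Scale a percentage (0..100) to a u32 fraction where UINT32_MAX means 100%. */
static uint32_t percent_to_u32(double percent)
{
	if (percent <= 0.0)
		return 0;
	if (percent >= 100.0)
		return UINT32_MAX;
	return (uint32_t)(percent / 100.0 * UINT32_MAX);
}

/* Convert "1-h" to "h" before sending, and back after receiving. */
static uint32_t invert_prob(uint32_t p)
{
	return UINT32_MAX - p;
}
```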

Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
2014-08-04 10:15:10 -07:00
Jamal Hadi Salim aeb14eb0b2 iproute2 bridge: bring to above par with brctl show macs
root@moja-mojo:bridge# ./bridge fdb help
Usage: bridge fdb { add | append | del | replace } ADDR dev DEV {self|master} [ temp ]
              [router] [ dst IPADDR] [ vlan VID ]
              [ port PORT] [ vni VNI ] [via DEV]
       bridge fdb {show} [ br BRDEV ] [ brport DEV ]

 Let's start with two bridges, each with a port...

root@moja-mojo:bridge# ./bridge link
10: sw1-p1 state DOWN : <BROADCAST,NOARP> mtu 1500 master sw1 state disabled priority 32 cost 100
11: eth1 state DOWN : <BROADCAST,NOARP> mtu 1500 master br0 state disabled priority 32 cost 100

show all...
root@moja-mojo:bridge# ./bridge fdb show
33:33:00:00:00:01 dev ifb0 self permanent
33:33:00:00:00:01 dev ifb1 self permanent
33:33:00:00:00:01 dev eth0 self permanent
01:00:5e:00:00:01 dev eth0 self permanent
33:33:ff:92:c0:60 dev eth0 self permanent
33:33:00:00:00:fb dev eth0 self permanent
01:00:5e:00:00:fb dev eth0 self permanent
01:00:5e:7f:ff:fd dev eth0 self permanent
01:00:5e:00:00:01 dev wlan0 self permanent
33:33:00:00:00:01 dev wlan0 self permanent
33:33:ff:c2:84:3b dev wlan0 self permanent
33:33:00:00:00:fb dev wlan0 self permanent
01:00:5e:00:00:01 dev virbr0 self permanent
01:00:5e:00:00:fb dev virbr0 self permanent
33:33:00:00:00:01 dev br0 self permanent
33:33:00:00:00:01 dev sw1 self permanent
33:33:00:00:00:01 dev dummy0 self permanent
5e:f4:03:44:da:9a dev sw1-p1 vlan 0 master sw1 permanent
33:33:00:00:00:01 dev sw1-p1 self permanent
b6:5e:dd:ce:d7:5e dev eth1 vlan 0 master br0 permanent
33:33:00:00:00:01 dev eth1 self permanent

Let's see a netdev that is *not* attached to a bridge

root@moja-mojo:bridge# ./bridge fdb show brport eth0
33:33:00:00:00:01 self permanent
01:00:5e:00:00:01 self permanent
33:33:ff:92:c0:60 self permanent
33:33:00:00:00:fb self permanent
01:00:5e:00:00:fb self permanent
01:00:5e:7f:ff:fd self permanent

Let's see a netdev that is a bridge port
root@moja-mojo:bridge# ./bridge fdb show brport eth1
b6:5e:dd:ce:d7:5e vlan 0 master br0 permanent
33:33:00:00:00:01 self permanent

Specify the correct bridge and you get good stuff
root@moja-mojo:bridge# ./bridge fdb show brport eth1 br br0
b6:5e:dd:ce:d7:5e vlan 0 master br0 permanent
33:33:00:00:00:01 self permanent

Specify the wrong bridge and you get good nada
root@moja-mojo:bridge# ./bridge fdb show brport eth1 br sw1

dump br0
root@moja-mojo:bridge# ./bridge fdb show br br0
33:33:00:00:00:01 dev br0 self permanent
b6:5e:dd:ce:d7:5e dev eth1 vlan 0 master br0 permanent
33:33:00:00:00:01 dev eth1 self permanent

dump sw1
root@moja-mojo:bridge# ./bridge fdb show br sw1
33:33:00:00:00:01 dev sw1 self permanent
5e:f4:03:44:da:9a dev sw1-p1 vlan 0 master sw1 permanent
33:33:00:00:00:01 dev sw1-p1 self permanent

Let's move a port from one bridge to another for shits-and-giggles
(as the New Brunswickians like to say)
root@moja-mojo:bridge# ip link set sw1-p1 master br0

Now dump br0 again
root@moja-mojo:bridge# ./bridge fdb show br br0
33:33:00:00:00:01 dev br0 self permanent
5e:f4:03:44:da:9a dev sw1-p1 vlan 0 master br0 permanent
33:33:00:00:00:01 dev sw1-p1 self permanent
b6:5e:dd:ce:d7:5e dev eth1 vlan 0 master br0 permanent
33:33:00:00:00:01 dev eth1 self permanent

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2014-08-04 09:34:16 -07:00
Roopa Prabhu 50b9950dd9 link dump filter
This patch avoids a full wildcard link dump request when the user has specified
a single link. It uses RTM_GETLINK without the NLM_F_DUMP flag.

This helps on systems with a large number of interfaces.

This patch currently only uses the link ifindex in the filter.
I hope to provide a subsequent kernel patch to do link dump filtering on
other attributes in the kernel.

In iplink_get, to be safe, this patch currently sets the answer buffer
size to the max size that libnetlink rtnl_talk can copy. The current API
does not seem to provide a way to indicate the answer buffer size.
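A minimal sketch of the request shape using the Linux uapi netlink headers. `struct link_req` and `build_getlink_req` are hypothetical names, and opening a NETLINK_ROUTE socket and sending the message are omitted.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical request layout: netlink header followed by ifinfomsg. */
struct link_req {
	struct nlmsghdr n;
	struct ifinfomsg i;
};

static void build_getlink_req(struct link_req *req, int ifindex, unsigned int seq)
{
	memset(req, 0, sizeof(*req));
	req->n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
	req->n.nlmsg_type = RTM_GETLINK;
	/* NLM_F_REQUEST only: without NLM_F_DUMP the kernel answers for one link */
	req->n.nlmsg_flags = NLM_F_REQUEST;
	req->n.nlmsg_seq = seq;
	req->i.ifi_family = AF_UNSPEC;
	req->i.ifi_index = ifindex;
}
```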

changelog from RFC to v1:
    - incorporated comments from Stephen (fixed comment and fixed if/else block)

changelog from v1 to v2:
    - fix whitespace errors

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2014-08-04 09:32:13 -07:00
Rami Rosen e4c356827a iplink: macvtap: fix man page
This patch adds description about macvtap to ip-link.8 man page.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
2014-08-04 09:31:02 -07:00
Dmitry Popov 23d526c426 fix ip tunnel for vti tunnels with ikey
Consider the following command:

ip tunnel add mode vti remote 12.0.0.1 local 12.0.0.3 ikey 15

i_flags will be GRE_KEY|VTI_ISVTI. So, to distinguish between ipip and
vti, we have to check just the VTI_ISVTI bit, not whether i_flags equals
VTI_ISVTI.
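The difference between an equality test and a bit test, as a sketch. The flag values below are illustrative stand-ins defined locally (the real GRE_KEY and VTI_ISVTI come from `<linux/if_tunnel.h>`, with byte-order details omitted here), and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values only, not the uapi definitions. */
#define DEMO_GRE_KEY   0x2000u
#define DEMO_VTI_ISVTI 0x0001u

/* Buggy: fails as soon as any other flag (e.g. a key) is also set. */
static bool is_vti_buggy(uint16_t i_flags)
{
	return i_flags == DEMO_VTI_ISVTI;
}

/* Fixed: test only the VTI_ISVTI bit. */
static bool is_vti_fixed(uint16_t i_flags)
{
	return (i_flags & DEMO_VTI_ISVTI) != 0;
}
```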

* Note that there was also a bug in ip_tunnel/ip_vti; see
commit 7c8e6b9c281 (ip_vti: Fix 'ip tunnel add' with 'key' parameters),
https://lkml.org/lkml/2014/6/7/125.
Even a patched iproute2 could be unable to create vti tunnels with non-zero keys.

1) Unpatched iproute2:
[root@vm ~]# ip tunnel show
[root@vm ~]# lsmod | egrep '(ipip|vti)'
[root@vm ~]# ip tunnel add mode vti ikey 1
[root@vm ~]# lsmod | egrep '(ipip|vti)'
ipip                    4197  0 
tunnel4                 1659  1 ipip
ip_tunnel               9295  1 ipip
[root@vm ~]# ip tunnel show
tunl0: ip/ip  remote any  local any  ttl inherit
[root@vm ~]# ip tunnel add mode vti remote 1.2.3.4 ikey 2
[root@vm ~]# ip tunnel show
ipip0: ip/ip  remote 1.2.3.4  local any  ttl inherit 
tunl0: ip/ip  remote any  local any  ttl inherit 
[root@vm ~]# lsmod | egrep '(ipip|vti)'
ipip                    4197  0 
tunnel4                 1659  1 ipip
ip_tunnel               9295  1 ipip

# ipip tunnels are created instead of vti

2) Patched iproute2:
[root@vm ~]# ip tunnel show
[root@vm ~]# lsmod | egrep '(ipip|vti)'
[root@vm ~]# ip tunnel add mode vti ikey 1
[root@vm ~]# lsmod | egrep '(ipip|vti)'
ip_vti                  5258  0 
ip_tunnel               9295  1 ip_vti
[root@vm ~]# ip tunnel show
vti0: ip/ip  remote any  local any  ttl inherit  ikey 1  okey 0 
ip_vti0: ip/ip  remote any  local any  ttl inherit  nopmtudisc key 0
[root@vm ~]# ip tunnel add mode vti remote 1.2.3.4 ikey 2
[root@vm ~]# ip tunnel show
vti0: ip/ip  remote any  local any  ttl inherit  ikey 1  okey 0
vti1: ip/ip  remote 1.2.3.4  local any  ttl inherit  ikey 2  okey 0 
ip_vti0: ip/ip  remote any  local any  ttl inherit  nopmtudisc key 0

# vti tunnels are created as expected
# * If you have an unpatched kernel, your vti tunnels will have ikey == okey == 0

The same problem exists for ip tunnel show/del with a non-zero [io]key: requests are
routed to tunl0 instead of ip_vti0.


Signed-off-by: Dmitry Popov <ixaphire@qrator.net>
2014-07-15 09:49:17 -07:00
Vasily Averin 319624499f ipnetns: fixed typo "seting" -> "setTing"
Signed-off-by: Vasily Averin <vvs@openvz.org>
2014-07-15 09:45:37 -07:00
Daniel Borkmann cd509528ed man: token: fix couple of typos
Not sure how these typos slipped in back then; I suspect
too much coffee. ;) So let's fix them up properly.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
2014-07-15 09:45:00 -07:00
vadimk cfea8b3509 ip: Added missing usage for netconf object 2014-07-15 09:43:53 -07:00
Masatake YAMATO 7968262df6 ip: add nlmon as a device type to help message
Though nlmon device can be added, it was not listed
in the output of "ip link help".

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
2014-07-15 09:41:44 -07:00
Stephen Hemminger 9a02651a87 ip: check for missing dev arg when doing VF rate
The new VF rate code was not handling the case where no device was specified.
This was caught by a GCC warning about an uninitialized variable.
2014-07-14 12:08:05 -07:00
Stephen Hemminger 1199c4f569 ip: add paren to silence warning
GCC warns when || and && are mixed in the same conditional without parentheses.
2014-07-14 12:06:52 -07:00
Stephen Hemminger 76723fd1c6 Update kernel headers to 3.16-rc5 2014-07-14 11:56:33 -07:00
Stephen Hemminger 0f6d24032f Merge branch 'net-next' 2014-06-10 10:38:00 -07:00
Stephen Hemminger ce7aff3b8d v3.15.0 2014-06-10 09:39:14 -07:00
Roopa Prabhu cc273a51d0 bridge: Add master device name to bridge fdb show
This patch adds the master device name from the NDA_MASTER netlink attribute
to the bridge fdb show output.

Current iproute2 tries to print 'master' in the output if NTF_MASTER
is present. But the kernel today does not set NTF_MASTER during dump
requests, which means I have never seen the iproute2 bridge command print 'master' at all.
This patch overrides the NTF_MASTER flag if the NDA_MASTER attribute is present.
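The override rule, as a sketch: a master name resolved from NDA_MASTER wins, and the bare NTF_MASTER flag is only a fallback. `fdb_master_str` is a hypothetical helper; resolving the ifindex carried by NDA_MASTER into a name is assumed to happen elsewhere.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: master_name is the device named by NDA_MASTER, or NULL
 * when the attribute is absent; ntf_master reflects the NTF_MASTER flag.
 * Assumes len > 0.
 */
static void fdb_master_str(char *buf, size_t len, const char *master_name,
			   int ntf_master)
{
	if (master_name)
		snprintf(buf, len, "master %s", master_name);
	else if (ntf_master)
		snprintf(buf, len, "master");
	else
		buf[0] = '\0';
}
```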

Example output:

before this patch:
# bridge fdb show
44:38:39:00:27:ba dev bond2.2003 permanent
44:38:39:00:27:bb dev bond4.2003 permanent
44:38:39:00:27:bc dev bond2.2004 permanent

After this patch:
# bridge fdb show
44:38:39:00:27:ba dev bond2.2003 master br-2003 permanent
44:38:39:00:27:bb dev bond4.2003 master br-2003 permanent
44:38:39:00:27:bc dev bond2.2004 master br-2004 permanent

For comparison with the above, below is the output for NTF_SELF today:
# bridge fdb show
33:33:00:00:00:01 dev eth0 self permanent
01:00:5e:00:00:01 dev eth0 self permanent
33:33:ff:00:01:cc dev eth0 self permanent

If change in output is a concern, 'master' can be put at the end of the fdb
output line or made optional with -d[etails] option.

change from v1 to v2:
    use 'bridge' instead of 'master' in fdb show output

change from v2 to v3:
    use 'master' instead of 'bridge' in fdb show output
    (master could also be a vxlan device)

Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2014-06-09 12:56:23 -07:00
Sucheta Chakraborty f89a2a05ff Add support to configure SR-IOV VF minimum and maximum Tx rate through ip tool
o "min_tx_rate" option has been added for minimum Tx rate. Hence, for
  consistent naming, "max_tx_rate" option has been introduced for maximum
  Tx rate.

o Change in v2: "rate" can be used along with "max_tx_rate".
  When both are specified, "max_tx_rate" should override.

o Change in v3:
  * IFLA_VF_RATE: When IFLA_VF_RATE is used, and user has given only one of
    min_tx_rate or max_tx_rate, reading of previous rate limits is done in
    userspace instead of in kernel space before ndo_set_vf_rate.

  * IFLA_VF_TX_RATE: When IFLA_VF_TX_RATE is used, min_tx_rate is always read
    in kernel space. This takes care of below scenarios:
    (1) when old tool sends "rate" but kernel is new (expects min and max)
    (2) when new tool sends only "rate" but kernel is old (expects only "rate")

o Change in v4 as suggested by Stephen Hemminger:
  * As per iproute policy, input and output formats should match. Change the display
    of the max_tx_rate and min_tx_rate options accordingly.
	./ip/ip link show p3p1
	8: p3p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
        link/ether 00:0e:1e:16:ce:40 brd ff:ff:ff:ff:ff:ff
        vf 0 MAC 2a:18:8f:4d:3d:d4, tx rate 700 (Mbps), max_tx_rate 700Mbps, min_tx_rate 200Mbps
        vf 1 MAC 72:dc:ba:f9:df:fd
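A sketch of the v2/v3 rules described above: "max_tx_rate" overrides the legacy "rate", and a limit the user did not give is taken from the VF's current settings read in userspace. All names here are hypothetical, and -1 is an assumed "not given" marker.

```c
/* Hypothetical parsed options; -1 means "not given on the command line". */
struct vf_rate_req {
	int rate;
	int min_tx_rate;
	int max_tx_rate;
};

/* cur_min/cur_max are the VF's current limits, assumed already read from the kernel. */
static void resolve_vf_rate(const struct vf_rate_req *req,
			    unsigned int cur_min, unsigned int cur_max,
			    unsigned int *min, unsigned int *max)
{
	/* max_tx_rate overrides the legacy "rate" when both are given */
	if (req->max_tx_rate >= 0)
		*max = (unsigned int)req->max_tx_rate;
	else if (req->rate >= 0)
		*max = (unsigned int)req->rate;
	else
		*max = cur_max;

	/* keep the current minimum when only a maximum is supplied */
	*min = req->min_tx_rate >= 0 ? (unsigned int)req->min_tx_rate : cur_min;
}
```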

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
2014-06-09 12:51:57 -07:00
Stephen Hemminger fd5c1d4391 Update to current net-next kernel headers
Update sanitized headers
2014-06-09 12:50:30 -07:00
Jiri Pirko 316c2346f7 iproute2: utils: change hexstring_n2a and hexstring_a2n to do not work with ":"
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2014-06-09 12:45:55 -07:00
Jiri Pirko dd50247dba iproute2: arpd: use ll_addr_a2n and ll_addr_n2a
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2014-06-09 12:45:54 -07:00
Yang Yingliang aeb199d5ce fq: allow options of fair queue set to ~0U
Some fair queue options could not be set to ~0U. As a result, maxrate
could not be reset to unlimited, because unlimited is represented as ~0U.
Allow these options to be ~0U.
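A sketch of a u32 option parser whose range check is inclusive of UINT32_MAX (~0U), so the "unlimited" value can be passed through. `parse_u32_inclusive` is a hypothetical helper, not the parser tc actually uses.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parser: accept the full u32 range, including ~0U. */
static int parse_u32_inclusive(const char *s, uint32_t *out)
{
	char *end;
	unsigned long long v;

	errno = 0;
	v = strtoull(s, &end, 0);
	if (errno != 0 || end == s || *end != '\0' || v > UINT32_MAX)
		return -1;
	*out = (uint32_t)v;
	return 0;
}
```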

Tested by the following command:
 # tc qdisc add dev eth4 root handle 1: fq limit 2000 flow_limit 200 maxrate 100mbit quantum 2000 initial_quantum 1600
 # tc -s -d qdisc show
qdisc fq 1: dev eth4 root refcnt 2 limit 2000p flow_limit 200p buckets 1024 quantum 2000 initial_quantum 1600 maxrate 100Mbit
 Sent 1492 bytes 10 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  1 flows (0 inactive, 0 throttled)
  0 gc, 0 highprio, 0 throttled

 # tc qdisc change dev eth4 root handle 1: fq limit 4294967295 flow_limit 4294967295 maxrate 34359738360 quantum 4294967295 initial_quantum 4294967295
 # tc -s -d qdisc show
qdisc fq 1: dev eth4 root refcnt 2 limit 4294967295p flow_limit 4294967295p buckets 1024 quantum 4294967295 initial_quantum 4294967295
 Sent 38372 bytes 216 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  2 flows (1 inactive, 0 throttled)
  0 gc, 2 highprio, 7 throttled

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
2014-06-09 12:42:36 -07:00
Andreas Henriksson 9dca899b2d bridge: Make filter_index match in signedness
Michael Tautschnig wrote:

During a rebuild [...]. Please note that we use our research
compiler tool-chain (using tools from the cbmc package), which permits extended
reporting on type inconsistencies at link time.

[...]
gcc   bridge.o fdb.o monitor.o link.o mdb.o vlan.o ../lib/libnetlink.a ../lib/libutil.a  ../lib/libnetlink.a ../lib/libutil.a -o bridge
file link.c line 18: error: conflicting types for variable "filter_index"
old definition in module fdb file fdb.c line 29
signed int
new definition in module link file link.c line 18
unsigned int
<builtin>: recipe for target 'bridge' failed
make[3]: *** [bridge] Error 64
make[3]: Leaving directory '/srv/jenkins-slave/workspace/sid-goto-cc-iproute2/iproute2-3.14.0/bridge'
Makefile:45: recipe for target 'all' failed

While practical constraints may limit the value of filter_index to remain within
the bounds of a positive signed int, there is certainly no such guarantee here.
Also, a plain majority vote suggests that this is really just a wrong declaration
in link.c as several declarations of filter_index as signed int exist.

[...]

My followup on this was:

I think the majority is wrong.

filter_index is assigned exclusively from if_nametoindex or ll_name_to_index
which both return unsigned int.

Changing it to unsigned everywhere seems better.
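A sketch of why `unsigned int` fits: `if_nametoindex()` returns `unsigned int` and uses 0 to signal that no such device exists. In iproute2 the variable would be one shared declaration; it is `static` here only to keep the sketch self-contained, and `set_filter_dev` is a hypothetical helper.

```c
#include <net/if.h>

/* Same type as if_nametoindex()'s return value; 0 means "no device". */
static unsigned int filter_index;

/* Hypothetical helper: set the filter from a device name. */
static int set_filter_dev(const char *name)
{
	filter_index = if_nametoindex(name);
	return filter_index != 0 ? 0 : -1;
}
```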

This has been minimally tested by using the bridge tool
to add vids and showing available vids on different devices.

Reported-by: Michael Tautschnig <mt@debian.org>
Signed-off-by: Andreas Henriksson <andreas@fatal.se>
2014-06-09 12:40:45 -07:00
Cong Wang 0cb6bb51b4 do not exit silently when link is not found
When we create a tunnel on top of a link and the link specified
on the command line doesn't exist, an error message should be shown.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
2014-06-09 12:38:32 -07:00
Eric Dumazet eb6028b263 ss: display pacing_rate/max_pacing_rate
Since linux-3.15, the kernel exports tcpi_pacing_rate and
tcpi_max_pacing_rate in tcp_info.

Add TCP pacing_rate information to the ss -i output:

lpaa23:~# ./ss -ti dst 10.246.7.151
State      Recv-Q Send-Q   Local Address:Port       Peer Address:Port
ESTAB      0      325800    10.246.7.151:57614      10.246.7.152:46811
	 cubic wscale:7,7 rto:201 rtt:0.081/0.006 mss:1448 cwnd:90 ssthresh:63 send 12871.1Mbps pacing_rate 15397.8Mbps unacked:90 retrans:0/305 rcv_space:29200

If SO_MAX_PACING_RATE is set on the socket, we add /max_pacing_rate as
in :

... pacing_rate 1570.5Mbps/2.0Gbps ...

Signed-off-by: Eric Dumazet <edumazet@google.com>
2014-06-09 12:36:49 -07:00
Stephen Hemminger 468dec75a5 Fix non-literal string format warnings
The lnstat program was building a format string and then using it.
This was safe, but it is simpler to just use the '*' field width
to supply the width.
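The `*` field width takes the width from an `int` argument, so the format string can stay a literal (which also keeps -Wformat-nonliteral quiet). A sketch with a hypothetical helper; per the C standard, a negative width argument means left-justify.

```c
#include <stdio.h>
#include <string.h>

/* Pad name to a runtime width without constructing a format string. */
static int format_field(char *buf, size_t len, int width, const char *name)
{
	return snprintf(buf, len, "%*s", width, name);
}
```

For example, `format_field(buf, sizeof buf, 6, "abc")` produces `"   abc"`, and a width of -6 produces `"abc   "`.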
2014-05-29 10:49:55 -07:00
Stephen Hemminger 4ec0ffde42 fix format warnings
Enable format security, and fix the warning caused by passing
a string as the format argument.
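The usual fix for -Wformat-security is to pass a literal `"%s"` format and the string as its argument, so any `%` inside the data is never interpreted. A minimal sketch with a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

/* Copy an arbitrary message; the literal "%s" keeps '%' in msg inert. */
static void copy_msg(char *buf, size_t len, const char *msg)
{
	/* snprintf(buf, len, msg) would warn and misbehave on input like "%d" */
	snprintf(buf, len, "%s", msg);
}
```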
2014-05-29 10:31:30 -07:00
Vlad Yasevich f0f4ab600b bridge: Add learning and flood support
Add ability to control learning and flood flags on bridge
ports.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
2014-05-28 17:10:45 -07:00
Sergey V. Lobanov 3ff10e82c1 Fixed 'tc qdisc show' for tbf when latency<0
When limit<burst, latency becomes <0, for example:
 # tc qdisc add dev eth0 root handle 1: tbf limit 100K burst 256K rate 256kbit
 # tc qdisc show
 qdisc tbf 1: dev eth0 root refcnt 2 rate 256Kbit burst 256Kb lat 4290.0s

If latency<0, there is no reason to show it. The limit is printed instead of
the latency when latency<0:
 # tc qdisc show
 qdisc tbf 1: dev eth0 root refcnt 2 rate 256Kbit burst 256Kb limit 100Kb
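A sketch of why the value goes negative, assuming the displayed latency is computed as limit/rate minus the time needed to send the burst at that rate, i.e. (limit - burst) / rate. `tbf_latency_sec` is a hypothetical helper; the display rule is to print "lat" only when the result is non-negative and "limit" otherwise.

```c
/* Seconds needed to drain the part of the queue limit beyond the burst. */
static double tbf_latency_sec(double limit_bytes, double burst_bytes,
			      double rate_bytes_per_sec)
{
	return (limit_bytes - burst_bytes) / rate_bytes_per_sec;
}
```

With the example above (256kbit is 32000 bytes/s, limit 100K < burst 256K), the result is negative, so the limit is shown instead.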

Signed-off-by: Sergey V. Lobanov <sergey@lobanov.in>
2014-05-28 17:08:16 -07:00
Oliver Hartkopp 2b70fe156b iplink: can: fix help text and man page
Controller Area Network (CAN) interfaces are physical network interfaces.
They can't be 'created' like software devices by 'ip link add type can'.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
2014-05-28 16:58:13 -07:00
Jiri Pirko c897067480 iproute2: ipa: show port id
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2014-05-28 16:57:32 -07:00
Jamal Hadi Salim 288abf513f actions: correctly report the number of actions flushed
This also fixes a long-standing bug of not sanely reporting the
action chain ordering.

Sample scenario test

on window 1 (event window):
run "tc monitor" and observe events

on window 2:
sudo tc actions add action drop index 10
sudo tc actions add action ok index 12
sudo tc actions ls action gact
sudo tc actions flush action gact

See the event window reporting two entries
(doing another listing should show empty generic actions)

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2014-05-28 16:54:31 -07:00
Jamal Hadi Salim 9282d08d93 actions: keyword flowid or classid terminates action pipeline
scenario testcase:

TC="sudo ./tc/tc"
DEV="dev eth0"
$TC qdisc del $DEV ingress
$TC qdisc add $DEV ingress
$TC filter add $DEV parent ffff: protocol ip u32 match ip src 10.0.0.0/24 action police rate 6Mbit burst 6Mbit drop flowid :1
$TC filter add $DEV parent ffff: protocol ip u32 match ip dst 10.0.0.0/24 action police rate 1Gbit burst 1Gbit pass flowid :1
$TC -s filter ls $DEV parent ffff: protocol ip
$TC qdisc del $DEV ingress
$TC qdisc add $DEV ingress
$TC filter add $DEV parent ffff: protocol ip u32 match ip src 10.0.0.0/24 flowid 1:1 action police rate 6Mbit burst 6Mbit drop
$TC filter add $DEV parent ffff: protocol ip u32 match ip dst 10.0.0.0/24 flowid 1:2 action police rate 1Gbit burst 1Gbit pass

$TC -s filter ls $DEV parent ffff: protocol ip
$TC qdisc del $DEV ingress
$TC qdisc add $DEV ingress
$TC filter add $DEV parent ffff: protocol ip pref 10 \
u32 match ip protocol 1 0xff \
flowid 1:10 \
action skbedit mark 11 \
action police rate 10kbit burst 10k pipe index 1 \
action skbedit mark 12 \
action police rate 20kbit burst 20k pipe index 2 \
action mirred egress mirror dev dummy0

$TC -s filter ls $DEV parent ffff: protocol ip
$TC qdisc del $DEV ingress
$TC qdisc add $DEV ingress
$TC filter add $DEV parent ffff: protocol ip pref 10 \
u32 match ip protocol 1 0xff \
action skbedit mark 11 \
action police rate 10kbit burst 10k pipe index 1 \
action skbedit mark 12 \
action police rate 20kbit burst 20k pipe index 2 \
action mirred egress mirror dev dummy0 \
flowid 1:10

$TC -s filter ls $DEV parent ffff: protocol ip

Reported-by: Seann Herdejurgen <seann@herdejurgen.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2014-05-28 16:54:28 -07:00
Jamal Hadi Salim cacba03b10 Remove unnecessary debug statement
Reported-by: Seann Herdejurgen <seann@herdejurgen.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2014-05-28 16:54:26 -07:00
Natanael Copa dd9cc0ee81 iproute2: various header include fixes for compiling with musl libc
We need limits.h for LONG_MIN and LONG_MAX, sys/param.h for MIN, and
sys/select.h for struct timeval.
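A sketch that compiles on both glibc and musl once each symbol's header is included explicitly rather than relied on transitively. The struct and helper names are hypothetical; `parse_long` only mirrors the style of the `LONG_MIN`/`LONG_MAX` check in the f_bpf.c error above.

```c
#include <errno.h>
#include <limits.h>     /* LONG_MIN, LONG_MAX */
#include <stdlib.h>
#include <sys/param.h>  /* MIN() */
#include <sys/select.h> /* struct timeval */

/* Without <sys/select.h>, musl reports "field 'last_read' has incomplete type". */
struct demo_entry {
	struct timeval last_read; /* last time of read */
};

/* Hypothetical helper: reject strtol() overflow, signalled by LONG_MIN/LONG_MAX + ERANGE. */
static int parse_long(const char *s, long *out)
{
	char *end;
	long h;

	errno = 0;
	h = strtol(s, &end, 0);
	if (end == s || ((h == LONG_MIN || h == LONG_MAX) && errno == ERANGE))
		return -1;
	*out = h;
	return 0;
}

/* Hypothetical helper: MIN() is a BSD macro from <sys/param.h>, not ISO C. */
static size_t copy_len(size_t have, size_t want)
{
	return MIN(have, want);
}
```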

This fixes the following compile errors with musl libc:

f_bpf.c: In function 'bpf_parse_opt':
f_bpf.c:181:12: error: 'LONG_MIN' undeclared (first use in this function)
   if (h == LONG_MIN || h == LONG_MAX) {
            ^
...

tc_util.o: In function `print_tcstats2_attr':
tc_util.c:(.text+0x13fe): undefined reference to `MIN'
tc_util.c:(.text+0x1465): undefined reference to `MIN'
tc_util.c:(.text+0x14ce): undefined reference to `MIN'
tc_util.c:(.text+0x154c): undefined reference to `MIN'
tc_util.c:(.text+0x160a): undefined reference to `MIN'
tc_util.o:tc_util.c:(.text+0x174e): more undefined references to `MIN' follow
...

tc_stab.o: In function `print_size_table':
tc_stab.c:(.text+0x40f): undefined reference to `MIN'
...

fdb.c:247:30: error: 'ULONG_MAX' undeclared (first use in this function)
        (vni >> 24) || vni == ULONG_MAX)
                              ^

lnstat.h:28:17: error: field 'last_read' has incomplete type
  struct timeval last_read;  /* last time of read */
                 ^

Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
2014-05-28 16:51:39 -07:00
Andreas Greve 6e2e5ec28b fix print_ipt: segfault if more then one filter with action -j MARK.
BUG: tc filter show ... produces a segmentation fault if more than one
filter rule with action -j MARK exists.

Reason: In print_ipt(...), xtables is initialized with a
pointer to the static struct tcipt_globals in xtables_init_all().
Later on, the fields .opts and .options_offset of tcipt_globals are
modified. The call of xtables_free_opts(1) at the end of print(...)
does not restore the original values of tcipt_globals for the
modified fields. It only frees some allocated memory and sets
.opts to NULL. This leads to a segmentation fault when print_ipt()
is called for the next filter rule with action -j MARK.

Fix: Clone tcipt_globals on the stack as tmp_tcipt_globals and
use it instead of tcipt_globals, so tcipt_globals is not
modified.
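The fix pattern, as a sketch: copy the shared static struct onto the stack with struct assignment and mutate only the copy, so the next call starts from pristine values. The struct and function names are hypothetical stand-ins, not the real xtables types.

```c
#include <stddef.h>

/* Hypothetical stand-in for the xtables globals structure. */
struct demo_globals {
	const char *program_name;
	void *opts;
	unsigned int option_offset;
};

static int demo_opts_storage;
static struct demo_globals demo_tcipt_globals = { "tc-ipt", &demo_opts_storage, 0 };

static unsigned int demo_print_ipt(unsigned int extra_offset)
{
	/* Struct assignment copies every field onto the stack */
	struct demo_globals tmp_globals = demo_tcipt_globals;

	/* Per-call changes touch only the copy */
	tmp_globals.option_offset += extra_offset;
	/* Simulates xtables_free_opts() clearing .opts, again on the copy only */
	tmp_globals.opts = NULL;
	return tmp_globals.option_offset;
}
```

Calling `demo_print_ipt()` repeatedly returns the same result each time, because the static struct is never modified.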

Signed-off-by: Andreas Greve <andreas.greve@a-greve.de>
2014-05-13 13:10:31 -07:00
Or Gerlitz 63f60e3ab3 Document VF link state control in the ip-link man page
Document the support added by commit 07fa9c1 "Add VF link state
control" in the ip-link man page.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
2014-05-13 12:54:35 -07:00
Mike Rapoport 55713c8c72 ipnetns: fix misprint in an error message
Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com>
2014-05-13 12:53:18 -07:00
Sergey V. Lobanov 7bc7fcaadb TBF man page fix (tbf is not classless)
TBF is not a classless qdisc. Correct the man page and add an example
describing the use of an inner qdisc.

Signed-off-by: Sergey V. Lobanov <sergey@lobanov.in>
2014-05-09 13:35:18 -07:00
Sergey V. Lobanov 96e8ab7c58 Fix Linux priority and band for TOS==0x2 (man 8 tc-prio)
Due to commit 4a2b9c3 (in the Linux kernel), the Linux priority (skb->priority)
changed for TOS==0x2.

Signed-off-by: Sergey V. Lobanov <sergey@lobanov.in>
2014-05-09 13:30:30 -07:00
Stephen Hemminger 4b726cb176 Whitespace and indentation cleanup
Need to go over the whole source and scrub.
2014-05-09 12:36:46 -07:00
david decotigny 30b557929f iproute2: show counter of carrier on<->off transitions
This patch allows to display the current counter of carrier on<->off
transitions (IFLA_CARRIER_CHANGES, see kernel commit "expose number of
carrier on/off changes"):

  ip -s -s link show dev eth0
  32: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ...
    link/ether ................. brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    125552461  258881   0       0       0       10150
    RX errors: length  crc     frame   fifo    missed
               0        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    40426119   224444   0       0       0       0
    TX errors: aborted fifo    window  heartbeat transns
               0        0       0       0        3

Tested:
  - kernel with patch "net-sysfs: expose number of carrier on/off
    changes": see "transns" column above
  - kernel without the patch: "transns" not displayed (as expected)

Signed-off-by: David Decotigny <decot@googlers.com>
2014-05-09 12:13:12 -07:00
Terry Lam ac74bd2a71 support for Heavy Hitter Filter (HHF) qdisc
$tc qdisc add dev eth0 hhf help
Usage: ... hhf [ limit PACKETS ] [ quantum BYTES]
               [ hh_limit NUMBER ]
               [ reset_timeout TIME ]
               [ admit_bytes BYTES ]
               [ evict_timeout TIME ]
               [ non_hh_weight NUMBER ]

$tc -s -d qdisc show dev eth0
qdisc hhf 8005: root refcnt 32 limit 1000p quantum 1514 hh_limit 2048
reset_timeout 40.0ms admit_bytes 131072 evict_timeout 1.0s non_hh_weight 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
    drop_overlimit 0 hh_overlimit 0 tot_hh 0 cur_hh 0

HHF qdisc parameters:
- limit: max number of packets in qdisc (default 1000)
- quantum: max deficit per RR round (default 1 MTU)
- hh_limit: max number of HHs to keep state for (default 2048)
- reset_timeout: time to reset HHF counters (default 40ms)
- admit_bytes: counter threshold to classify a flow as HH (default 128KB)
- evict_timeout: threshold to evict idle HHs (default 1s)
- non_hh_weight: DRR weight for mice (default 2)

Signed-off-by: Terry Lam <vtlam@google.com>
2014-05-09 12:10:47 -07:00
Jay Vosburgh 8f9672af7a tc/netem: fix loss state display and p14 parsing
The display of the entire netem loss state is shown as if it
were gemodel state, as the loss state information is assigned to the
wrong pointer.  Correct this by assigning the loss state to the correct
pointer.

Additionally, attempting to set netem loss state will result in
random values in the p14 state probability because the option value
passed to the kernel by tc netem is not parsed or initialized.  Fix this
by supplying a default value of 0 for p14 and parsing the p14 value if
one is supplied.
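A minimal sketch of the p14 fix, assuming a hypothetical struct loss_state_sketch and parse_loss_state_sketch() (not tc's real names): zeroing the structure before parsing means an omitted p14 reaches the kernel as 0 rather than stack garbage. For simplicity the sketch defaults every omitted probability to 0 and assumes argv holds only the numeric values; tc's real parser may compute other defaults and stops at the next keyword.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the 4-state loss model probabilities
 * (percentages). Names are illustrative, not tc's actual types. */
struct loss_state_sketch {
	double p13, p31, p32, p23, p14;
};

/* Parse "p13 [p31 [p32 [p23 [p14]]]]" from argv. Zeroing the struct
 * first guarantees that any optional value the user omits, p14
 * included, is 0 instead of uninitialized stack data.
 * Returns the number of values parsed, or -1 on a parse error. */
static int parse_loss_state_sketch(int argc, char **argv,
				   struct loss_state_sketch *st)
{
	double *slots[] = { &st->p13, &st->p31, &st->p32, &st->p23, &st->p14 };
	int i;

	memset(st, 0, sizeof(*st));
	for (i = 0; i < argc && i < 5; i++) {
		char *end;
		double v = strtod(argv[i], &end);

		if (end == argv[i] || *end != '\0')
			return -1;	/* not a plain number */
		*slots[i] = v;
	}
	return i;
}
```

With only "10 70" supplied, p13 and p31 are set and p14 stays 0.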

Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
2014-05-09 12:06:58 -07:00
Oliver Hartkopp 2bfe047017 iproute2: can: support CAN FD control interface
For CAN FD, the CAN driver infrastructure provides a new set of bit-timing
configuration and enabling functions for the data section.

This patch allows configuring the newly introduced CAN FD properties.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
2014-05-09 12:04:55 -07:00
Oliver Hartkopp 3bbff7df0c iproute2: can: fix indention white spaces
When preparing a patch for CAN FD support, these whitespace issues showed up.
Fix them in the current code so that a proper follow-up patch can be provided.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
2014-05-09 12:04:55 -07:00
Stephen Hemminger ce3436ca05 Update to 3.15-rc2 headers 2014-04-21 08:26:44 -07:00
Stephen Hemminger e4d5edba68 Merge branch 'net-next' 2014-04-11 18:06:13 -07:00
Stephen Hemminger af6e4234d6 v3.14.0 2014-04-11 17:48:41 -07:00
Heiner Kallweit a424c39360 ip: officially support flag mngtmpaddr also for "ip addr del"
The kernel is being extended to support flag IFA_F_MANAGETEMPADDR also for
deletion of addresses. This will allow a userspace application to indicate
that, for a global address, the kernel should also delete all related
temporary addresses.

"ip addr del" internally calls ipaddr_modify, which already silently
accepts any flag provided on the command line, independent of the
actual command.
Therefore only the usage documentation needs to be extended.

Signed-off-by: Heiner Kallweit <heiner.kallweit@web.de>
2014-04-11 17:47:04 -07:00
WANG Cong 8b21f88dd0 ipaddress: do not add IFA_FLAGS when not necessary
commit 37c9b94ed2 (add support for extended ifa_flags)
introduced a regression:

        # ./ip/ip addr add 192.168.0.1/24 dev eth0
        RTNETLINK answers: Invalid argument

This is because old kernels don't support the IFA_FLAGS attribute, so we
should not use it unless we need flags beyond the old 8-bit .ifa_flags field.
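A sketch of the check the fix implies, assuming a hypothetical helper split_ifa_flags_sketch(): struct ifaddrmsg carries only an 8-bit ifa_flags, so the 32-bit IFA_FLAGS attribute is needed only when a flag above 0xff (such as IFA_F_MANAGETEMPADDR, 0x100) is requested. It illustrates the decision, not the actual ipaddress.c code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fill the legacy 8-bit field and report whether the separate 32-bit
 * IFA_FLAGS attribute is actually required. Requests that only use
 * legacy flags then stay acceptable to older kernels. */
static bool split_ifa_flags_sketch(uint32_t flags, uint8_t *legacy)
{
	*legacy = (uint8_t)(flags & 0xff);
	return flags > 0xff;
}
```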

Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
2014-04-11 17:44:57 -07:00
Kusanagi Kouichi 1891754487 veth: Handle flags correctry
Flags given for the peer overrode the flags for the other device and
were not applied to the peer itself.

before:
# ip link add up type veth peer down multicast off
# ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: veth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 2e:5c:cd:f5:63:d2 brd ff:ff:ff:ff:ff:ff
3: veth1: <BROADCAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 72:b0:fa:1e:76:7a brd ff:ff:ff:ff:ff:ff

after:
# ip link add up type veth peer down multicast off
# ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: veth0: <BROADCAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 6e:db:03:b3:bd:ff brd ff:ff:ff:ff:ff:ff
3: veth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000
    link/ether a6:62:d9:84:f0:73 brd ff:ff:ff:ff:ff:ff

Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp>
2014-04-11 17:44:48 -07:00
Stephen Hemminger 882e754cd4 fix indentation of ip neighbour man page
Formatting of the ip neighbour man page was awful and unclear.
2014-03-31 20:23:40 -07:00
Nicolas Dichtel f687d73c96 ipxfrm: allow to setup filter when dumping SA
It's now possible to filter SAs directly in the kernel by specifying
XFRMA_PROTO and/or XFRMA_ADDRESS_FILTER.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2014-03-21 14:24:41 -07:00
Stephen Hemminger 53e16e395b Merge branch 'master' into net-next 2014-03-21 14:24:22 -07:00
Mike Rapoport 9e763fa5d3 bridge: fix reporting of IPv6 addresses
Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com>
2014-03-21 14:23:05 -07:00
Masatake YAMATO 577e5a53fc iproute: Show default type, table, proto and scope of route
In "ip route show" output, the unicast type, main table, boot protocol and
universe scope are hidden as default labels.

Showing these hidden labels can help people who are not yet familiar with
the routing subsystem map the output of "ip route show" to the kernel
source code.

With this patch, "ip route show" with the -d option shows the default labels.

Example of the output difference with the -d option:

    $ ./ip/ip -4   route show table all dev virbr1
    ...
    192.168.121.0/28  proto kernel  scope link  src 192.168.121.1
    ...
    $ ./ip/ip -4 -d  route show table all dev virbr1
    ...
    unicast 192.168.121.0/28  table main  proto kernel  scope link  src 192.168.121.1
    ...

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
2014-03-21 14:21:26 -07:00
Hiroaki SHIMODA 4d4da09e00 htb: Move direct_qlen code part to htb_parse_opt().
The direct_qlen option applies to qdisc operations, but it was
implemented in htb_parse_class_opt(), which is called for class
operations.

Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
2014-03-21 14:20:06 -07:00
Stephen Hemminger bf9f122de3 Update headers to net-next 2014-03-21 14:16:17 -07:00
Richard Haines 116ac9270b ss: Add support for retrieving SELinux contexts
The process SELinux contexts can be added to the output using the -Z
option. The -z option shows both the process and socket contexts (see
the man page for details).
For netlink sockets: if the process is valid, show its context; if pid = 0,
show the kernel initial context; if unknown, show "unavailable".

Signed-off-by: Richard Haines <richard_c_haines@btinternet.com>
2014-03-10 13:20:49 -07:00
Masatake YAMATO 81ebcb2ae9 iproute2: use named constants instead of number literals to fill rtnl_rttable_hash
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
2014-03-10 13:16:08 -07:00
Masatake YAMATO 58ed50ee25 iproute2: use named constants instead of number literals to fill rtnl_rtscope_tab
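A sketch of what filling a scope table with named constants looks like, using C99 designated initializers. The enum repeats the values of enum rt_scope_t from <linux/rtnetlink.h> so the example builds without kernel headers; the table itself and its printed names are assumptions modelled on ip's output rather than the exact code in this commit.

```c
#include <stddef.h>

/* Scope values as defined in <linux/rtnetlink.h> (enum rt_scope_t),
 * repeated here only so this sketch compiles without kernel headers. */
enum rt_scope_sketch {
	RT_SCOPE_UNIVERSE = 0,
	RT_SCOPE_SITE     = 200,
	RT_SCOPE_LINK     = 253,
	RT_SCOPE_HOST     = 254,
	RT_SCOPE_NOWHERE  = 255,
};

/* Designated initializers keyed by the named constants instead of
 * bare numbers such as [253]; unnamed slots stay NULL. */
static const char *rtscope_tab_sketch[256] = {
	[RT_SCOPE_UNIVERSE] = "global",
	[RT_SCOPE_SITE]     = "site",
	[RT_SCOPE_LINK]     = "link",
	[RT_SCOPE_HOST]     = "host",
	[RT_SCOPE_NOWHERE]  = "nowhere",
};
```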
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
2014-03-10 13:16:07 -07:00
John Fastabend 39935c9374 iproute2: add man page for mqprio
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-03-04 16:59:51 -08:00
Michal Kubeček 574e748806 iplink_bond_slave: show mii_status only once
With "ip -d link show", bonding slave mii status is displayed
twice, once as a number and once as a name.

Fixes: 730d3f61 ("iplink: add support for bonding slave")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
2014-02-28 10:13:46 -08:00
Michal Kubeček f7a45e0955 iplink_bond: fix parameter value matching
The lookup function get_index() compares the argument with table entries
only up to the length of each table entry, so if an entry with a lower
index is a prefix of a later one, the earlier entry is used even if the
argument exactly equals the later one. For example,

  ip link set bond0 type bond xmit_hash_policy layer2+3

sets xmit_hash_policy to 0 (layer2) as this is found before
"layer2+3" can be checked.

Use strcmp() to compare whole strings instead.
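The difference between the two comparisons can be sketched as below; both helpers and the table order are illustrative assumptions, not the iplink_bond.c code.

```c
#include <string.h>

/* Old behaviour, for contrast: compare only strlen(tbl[i]) characters,
 * so "layer2" matches the argument "layer2+3". */
static int get_index_prefix(const char * const *tbl, const char *name)
{
	int i;

	for (i = 0; tbl[i]; i++)
		if (strncmp(tbl[i], name, strlen(tbl[i])) == 0)
			return i;
	return -1;
}

/* Fixed behaviour: return the index of the entry in the
 * NULL-terminated table that equals name exactly, or -1. */
static int get_index_exact(const char * const *tbl, const char *name)
{
	int i;

	for (i = 0; tbl[i]; i++)
		if (strcmp(tbl[i], name) == 0)
			return i;
	return -1;
}
```

With a table of { "layer2", "layer3+4", "layer2+3" }, the prefix version returns 0 for "layer2+3" while the exact version returns 2.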

v2: look for an exact match only

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
2014-02-17 10:58:56 -08:00
Stephen Hemminger 4806867a6c kill spaces before tabs 2014-02-17 10:56:31 -08:00
Stephen Hemminger 0612519e01 Remove trailing whitespace 2014-02-17 10:55:31 -08:00
Jiri Pirko 730d3f61d9 iplink: add support for bonding slave
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2014-02-17 10:53:34 -08:00
Jiri Pirko fbea611564 introduce support for slave info data
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2014-02-17 10:53:33 -08:00
Michal Kubeček 32ad31fba1 iplink_bond: fix arp_all_targets parameter name in output
The name of the arp_all_targets parameter in the output of "ip -d link show"
is missing the trailing "s".

Fixes: 63d127b0 ("iproute2: finish support for bonding attributes")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
2014-02-17 10:48:25 -08:00
FX Le Bail 7c8a3cfba0 ss: display interface name as zone index when needed
This change enables the ss command to display the interface name as the zone
index for local addresses when needed.

This enhanced display requires *_diag support.

It is based on a first version by Bernd Eckenfels.

example:
Netid  State   Recv-Q Send-Q                 Local Address:Port    Peer Address:Port
udp    UNCONN  0      0      fe80::20c:29ff:fe1f:7406%eth1:9999              :::*
udp    UNCONN  0      0                                 :::domain            :::*
tcp    LISTEN  0      3                                 :::domain            :::*
tcp    LISTEN  0      5      fe80::20c:29ff:fe1f:7410%eth2:99                :::*
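A minimal sketch of the formatting, assuming the interface name has already been looked up (for example with if_indextoname()) and assuming the suffix is added only for link-local addresses. fmt_scoped_in6() is a hypothetical helper, and ss's exact condition for "when needed" may differ.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Format an IPv6 address; for link-local addresses append "%ifname"
 * so the zone is unambiguous. Returns buf, or NULL if inet_ntop()
 * fails or the result does not fit. */
static const char *fmt_scoped_in6(const struct in6_addr *a, const char *ifname,
				  char *buf, size_t len)
{
	char tmp[INET6_ADDRSTRLEN];
	int n;

	if (!inet_ntop(AF_INET6, a, tmp, sizeof(tmp)))
		return NULL;
	if (IN6_IS_ADDR_LINKLOCAL(a) && ifname && ifname[0] != '\0')
		n = snprintf(buf, len, "%s%%%s", tmp, ifname);
	else
		n = snprintf(buf, len, "%s", tmp);
	if (n < 0 || (size_t)n >= len)
		return NULL;
	return buf;
}
```

For fe80::20c:29ff:fe1f:7406 on eth1 this yields "fe80::20c:29ff:fe1f:7406%eth1", while a global address is printed unchanged.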

Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
2014-02-17 10:44:16 -08:00
Pavel Emelyanov 77a8ca8118 iproute: Fix Netid value for multi-families output
When requesting simultaneous output of TCP and UDP sockets
the netid field shows "tcp" always.

[root@xemvm1 iproute2]# ./misc/ss -a -tu
Netid State      Recv-Q Send-Q                            Local Address:Port                                Peer Address:Port
tcp   UNCONN     0      0                                             *:32713                                          *:*
tcp   UNCONN     0      0                                             *:bootpc                                         *:*
tcp   UNCONN     0      0                                            :::57879                                         :::*
tcp   LISTEN     0      128                                           *:ssh                                            *:*
tcp   ESTAB      0      48                                      1.2.3.5:ssh                                      1.2.3.4:45826
tcp   ESTAB      0      0                                       1.2.3.5:ssh                                      1.2.3.4:45814
tcp   LISTEN     0      128                                          :::ssh                                           :::*

However, the first three sockets are UDP ones:

[root@xemvm1 iproute2]# ./misc/ss -a -u
State       Recv-Q Send-Q                              Local Address:Port                                  Peer Address:Port
UNCONN      0      0                                               *:32713                                            *:*
UNCONN      0      0                                               *:bootpc                                           *:*
UNCONN      0      0                                              :::57879                                           :::*

Reported-by: François-Xavier Le Bail <fx.lebail@yahoo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Tested-by: François-Xavier Le Bail <fx.lebail@yahoo.com>
2014-02-10 14:47:54 -08:00
Christoph Paasch c33049044e tcp_metrics: Allow removal based on the source-IP
This patch allows adding the source-IP attribute to the netlink-command.

Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
2014-02-10 14:46:11 -08:00
Christoph Paasch 114aa720fa tcp_metrics: Display source-address
This patch displays the source IP.
stype will be used in the next patch, which allows removal based on the
source IP.

Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
2014-02-10 14:46:11 -08:00
Christoph Paasch 54b237a058 tcp_metrics: Rename addr to daddr and add local variable
Renaming addr to daddr, because we will introduce saddr later.

The local variable is necessary to store RTA_PAYLOAD(a) temporarily.

Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
2014-02-10 14:46:11 -08:00
WANG Cong 1c9af05071 pedit: do not print debugging information by default
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
2014-02-10 14:43:52 -08:00
Masatake YAMATO 3669d218b7 genl: fix a typo in help message of ctrl
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
2014-02-10 14:41:25 -08:00
Stephen Hemminger 8cfeddabce Update kernel headers to 3.13-rc2 2014-02-10 14:40:33 -08:00
Stephen Hemminger a37c74724a Merge branch 'net-next-for-3.13' 2014-02-10 14:39:20 -08:00
Mythili Prabhu 8c45275594 PIE: Add man page
This adds the man page for PIE: the Proportional Integral controller Enhanced
AQM scheme.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: Vijay Subramanian <vijaynsu@cisco.com>
CC: Dave Taht <dave.taht@bufferbloat.net>
2014-01-20 12:32:17 -08:00
Yang Yingliang dad2f72bef netem: add 64bit rates support
netem supports 64-bit rates starting from linux-3.13.
Add 64-bit rate support to the tc tools.

tc qdisc show dev eth0
qdisc netem 1: dev eth4 root refcnt 2 limit 1000 rate 35Gbit

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Eric Dumazet <edumazet@google.com>
2014-01-20 12:32:15 -08:00
Yang Yingliang a01de0a336 tbf: support sending burst/mtu to kernel directly
To avoid loss when converting burst to buffer in userspace, send
burst/mtu to the kernel directly.

Kernel commit 2e04ad424b ("sch_tbf: add TBF_BURST/TBF_PBURST attribute")
allows the kernel to handle burst/mtu itself.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
2014-01-20 12:32:14 -08:00
Thomas Haller 58c69b226f add support for IFA_F_NOPREFIXROUTE
Signed-off-by: Thomas Haller <thaller@redhat.com>
2014-01-20 12:30:45 -08:00
Jiri Pirko 5b7e21c417 add support for IFA_F_MANAGETEMPADDR
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2014-01-20 12:30:44 -08:00
Stephen Hemminger 3ba9ccda87 Update headers files from net-next 2014-01-20 12:28:42 -08:00
Stephen Hemminger 514cdfb443 Revert "vxlan: remove dstport option"
This reverts commit 92deabcf29.

Conflicts:
	ip/iplink_vxlan.c

Allow setting dst_port in 3.12
2014-01-10 15:17:06 -08:00
sfeldma@cumulusnetworks.com 63d127b05d iproute2: finish support for bonding attributes
Add support for bonding attributes just added to net-next.
On set, allow a string or numeric value for enumerated attributes.
On show, always use the string value for the attribute.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
2014-01-09 23:09:01 -08:00
Stephen Hemminger 6a5295a414 Merge branch 'master' into net-next-for-3.13 2014-01-09 23:08:50 -08:00
Masatake YAMATO 56dee73ea1 ss: add unix_seqpacket to the help message and the man page
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
2014-01-09 23:05:26 -08:00
Masatake YAMATO 0d2e01c5ee ss: enable query by type in unix domain related socket
This patch enables the -A unix_stream, -A unix_dgram and
-A unix_seqpacket options even when ss gets socket information
via netlink.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
2014-01-09 23:05:26 -08:00
Masatake YAMATO 30b669d7ac ss: handle seqpacket type of unix domain socket
ss didn't distinguish the seqpacket type from the dgram type.
With this patch, ss can distinguish it.

 $ misc/ss -x -a | grep seq
 u_seq  LISTEN     0      128    /run/udev/control 10966                 * 0
 u_seq  ESTAB      0      0                    * 115103                * 115104
 u_seq  ESTAB      0      0                    * 115104                * 115103
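A sketch of the type-to-label mapping, with a hypothetical unix_netid_sketch(); "u_seq" comes from the output above, while "u_str" and "u_dgr" are assumed to be ss's labels for the other two types.

```c
#include <sys/socket.h>

/* Map a unix socket type to the short Netid label ss prints.
 * SOCK_SEQPACKET gets its own label instead of being reported
 * like the datagram type. */
static const char *unix_netid_sketch(int type)
{
	switch (type) {
	case SOCK_STREAM:
		return "u_str";
	case SOCK_SEQPACKET:
		return "u_seq";
	case SOCK_DGRAM:
		return "u_dgr";
	default:
		return "???";
	}
}
```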

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
2014-01-09 23:05:26 -08:00
Vijay Subramanian 80dd880dd0 PIE: Proportional Integral controller Enhanced
Proportional Integral controller Enhanced (PIE) is a scheduler to address the
bufferbloat problem.

We present here a lightweight design, PIE (Proportional Integral controller
Enhanced), that can effectively control the average queueing latency to a target
value. Simulation results, theoretical analysis and Linux testbed results have
shown that PIE can ensure low latency and achieve high link utilization under
various congestion situations. The design does not require per-packet
timestamps, so it incurs very small overhead and is simple enough to implement
in both hardware and software.

For more information, please see the technical paper about PIE in the IEEE
Conference on High Performance Switching and Routing 2013. A copy of the paper
can be found at ftp://ftpeng.cisco.com/pie/.

Please also refer to the IETF draft submission at
http://tools.ietf.org/html/draft-pan-tsvwg-pie-00

All relevant code, documents and test scripts and results can be found at
ftp://ftpeng.cisco.com/pie/.

For problems with the iproute2/tc or Linux kernel code, please contact Vijay
Subramanian (vijaynsu@cisco.com or subramanian.vijay@gmail.com) Mythili Prabhu
(mysuryan@cisco.com)

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: Mythili Prabhu <mysuryan@cisco.com>
CC: Dave Taht <dave.taht@bufferbloat.net>
2014-01-09 22:50:47 -08:00
Jiri Pirko 37c9b94ed2 add support for extended ifa_flags
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2014-01-09 22:49:29 -08:00
Stephen Hemminger af9cd91228 Update to 3.13-rc6 + net-next headers 2014-01-09 22:45:49 -08:00
Stephen Hemminger ef056b2190 Merge branch 'master' into net-next-for-3.13 2014-01-09 22:44:17 -08:00
Pavel Emelyanov 4de8d8851d iproute: Document the "ip link add index IDX" possibility
Signed-off-by: Pavel Emelyanov <xemul@paralles.com>
2014-01-09 22:42:01 -08:00
Hangbin Liu 1c28bd597b iptunnel: Allow GRE_KEY for vti interface
The vti interface uses GRE_KEY to match the right policy in the kernel, so we
must not fail when the tunnel is vti.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2013-12-28 11:29:53 -08:00
Pavel Emelyanov 5e25cf77b9 iproute: Make it possible to specify index on link creation
The RTM_NEWLINK message accepts a non-zero ifi_index value, which allows
creating links with a given index (if it's free, of course). This
functionality is available since linux-v3.5.

This patch makes this API available via ip tool.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-12-28 11:24:11 -08:00
Stephen Hemminger a4c51eb348 update to latest net-next headers 2013-12-28 11:15:10 -08:00
Jamal Hadi Salim f24a7e7205 dont skip action order
attached.

cheers,
jamal
commit 58d78f9f6447df324cdeb99262442c5e3f1f924b
Author: Jamal Hadi Salim <jhs@mojatatu.com>
Date:   Sun Dec 22 10:34:18 2013 -0500

    dont skip displaying of action chains or lists by TCA_ACT_MAX_PRIO

    Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-12-28 10:57:34 -08:00
Jamal Hadi Salim b159a7f1ae allow batch gets of actions
Attached.

cheers,
jamal
commit c5f30cabef14c951596210b96bc9b423b0d39592
Author: Jamal Hadi Salim <hadi@mojatatu.com>
Date:   Sun Dec 22 10:24:17 2013 -0500

    Allow batching of action gets
    Example:
    ----
    tc actions get \
    action gact index 100 \
    action gact index 4
    ----

    Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-12-28 10:57:34 -08:00
Jamal Hadi Salim 352f6f97be simple print newline
attached.

cheers,
jamal
commit d7869e6167c3553e93e254940b0647032b40fed8
Author: Jamal Hadi Salim <jhs@mojatatu.com>
Date:   Sun Dec 22 07:46:28 2013 -0500

    print new line at the end for aesthetics

    Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-12-28 10:57:34 -08:00
Jamal Hadi Salim 4bfb21ca20 policer - retire old syntax
attached.

cheers,
jamal
commit b82057d9ec851a8aba8a295b959190ef5098f330
Author: Jamal Hadi Salim <jhs@mojatatu.com>
Date:   Sat Dec 21 17:00:11 2013 -0500

    After a decade of trying to deprecate the old policer syntax,
    I believe it is time to kill it. The kernel build option for the old
    policer has been gone for at least 5 years now (although backward
    compatibility is still there). Being backward compatible meant
    hijacking the keyword "action", which obstructed policies like:

    tc filter add dev eth0 parent ffff: protocol ip pref 10 \
    u32 match ip protocol 1 0xff flowid 1:10 \
    action skbedit mark 1 \
    action police rate 10kbit burst 10k pipe \
    action skbedit mark 2 \
    action police rate 20kbit burst 20k pipe \
    action mirred egress mirror dev dummy0

    Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-12-28 10:57:34 -08:00
Jamal Hadi Salim 02b1d345b7 skbedit print missing metadata
skbedit should print the index and other generic metadata info

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-12-28 10:57:34 -08:00
Jamal Hadi Salim 64b7db4db7 skbedit to default to pipe
Allow skbedit to be used as is in an action chain by default
without needing to specify pipe

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-12-28 10:57:34 -08:00
Arvid Brodin 5c0aec93a5 ip: Add HSR support
Add basic support for High-Availability Seamless Redundancy (HSR) network
devices.

Signed-off-by: Arvid Brodin <arvid.brodin@alten.se>
2013-12-20 08:33:19 -08:00
Sergey Popovich e0d47aa303 Handle netdev group for veth peer too
Currently ip-link(8) parses, but ignores, the "group" argument for the
peer interface on veth creation.

Insert IFLA_GROUP attribute for peer interface when present.

Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru>
2013-12-20 08:27:51 -08:00
Stephen Hemminger be2c3142f9 veth: fix uninitialized arguments
Based on a patch by Sergey Popovich <popovich_sergei@mail.ru>
This fixes a crash when ip-link(8) is invoked with the command:

  ip link add dev veth1a type veth peer
2013-12-20 08:25:13 -08:00
Stephen Hemminger d2468da0a3 check return value of rtnl_send and related functions
Use warn_unused_result to enforce checking the return value of rtnl_send,
and fix the places where it was not checked.
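A minimal illustration of how warn_unused_result flags unchecked calls in GCC and Clang; send_request_sketch() and do_request_sketch() are hypothetical stand-ins for rtnl_send() and its callers.

```c
/* GCC/Clang attribute: the compiler warns whenever a caller
 * discards the return value of a function declared with it. */
#define WARN_UNUSED __attribute__((warn_unused_result))

static int send_request_sketch(int fd) WARN_UNUSED;

static int send_request_sketch(int fd)
{
	/* Pretend a negative descriptor means the send failed. */
	return fd < 0 ? -1 : 0;
}

static int do_request_sketch(int fd)
{
	/* A bare "send_request_sketch(fd);" here would now warn. */
	if (send_request_sketch(fd) < 0)
		return -1;
	return 0;
}
```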

Suggested by initial patch from Petr Písař <ppisar@redhat.com>
2013-12-20 08:24:44 -08:00
Stephen Hemminger 29cc864089 netconf: add support for neighbor proxy attribute
Report changes to proxy_arp/proxy_ndp attribute.
2013-12-17 22:32:58 -08:00
Stephen Hemminger ec69a50cc8 Update header files to 3.13-rc2 net-next 2013-12-17 22:32:19 -08:00
Stephen Hemminger 4d98ab00de Fix FSF address in file headers 2013-12-06 15:05:07 -08:00
Eric Dumazet 8cecdc2837 tc: more user friendly rates
Display more user friendly rates.

10Mbit is more readable than 10000Kbit

Before :
class htb 1:2 root prio 0 rate 10000Kbit ceil 10000Kbit ...

After:
class htb 1:2 root prio 0 rate 10Mbit ceil 10Mbit ...
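One way to get this behaviour, sketched under the assumption that the value is already in bits per second and that SI (1000-based) units are used: keep dividing by 1000 while the value stays exact. sprint_rate_sketch() is hypothetical and is not tc's actual formatting routine.

```c
#include <stdint.h>
#include <stdio.h>

/* Print a rate in bits/s using the largest SI unit that represents
 * it exactly, so 10000000 becomes "10Mbit" rather than "10000Kbit". */
static void sprint_rate_sketch(char *buf, size_t len, uint64_t bits)
{
	static const char *units[] = { "", "K", "M", "G", "T" };
	size_t i = 0;

	while (i < 4 && bits >= 1000 && bits % 1000 == 0) {
		bits /= 1000;
		i++;
	}
	snprintf(buf, len, "%llu%sbit", (unsigned long long)bits, units[i]);
}
```

A rate such as 1500 bit/s is left as "1500bit" because it is not an exact multiple of 1000.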

Signed-off-by: Eric Dumazet <edumazet@google.com>
2013-12-02 23:48:11 -08:00
Yang Yingliang ddc6243e9a tbf: add 64bit rates support
tbf supports 64-bit rates starting from linux-3.13.
Add 64-bit rate support to the tc tools.

tc qdisc show dev eth0
qdisc tbf 1: root refcnt 2 rate 40000Mbit burst 230000b peakrate 50000Mbit minburst 87500b lat 50.0ms

This is a followup to ("htb: support 64bit rates").

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Cc: Eric Dumazet <edumazet@google.com>
2013-12-02 23:46:56 -08:00
Stephen Hemminger 4a79c7a2dc Update headers to 3.13-rc2 2013-12-02 23:42:58 -08:00
Eric Dumazet 8334bb325d htb: support 64bit rates
Starting from linux-3.13, we can break the 32-bit limitation of
rates on HTB qdiscs/classes.

The prior limit was 34,359,738,360 bits per second.
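A sketch of how a 64-bit rate might be split between the legacy 32-bit bytes-per-second field and the TCA_HTB_RATE64 attribute. htb_rate_needs_64_sketch() is hypothetical, and saturating the legacy field to UINT32_MAX is an assumption about how the two are combined.

```c
#include <stdbool.h>
#include <stdint.h>

/* The legacy HTB rate field holds 32 bits of bytes/s. For rates that
 * do not fit, saturate it and report that a separate 64-bit attribute
 * must also be sent. */
static bool htb_rate_needs_64_sketch(uint64_t rate_bytes, uint32_t *legacy)
{
	if (rate_bytes >= (1ULL << 32)) {
		*legacy = UINT32_MAX;
		return true;
	}
	*legacy = (uint32_t)rate_bytes;
	return false;
}
```

The 50000Mbit class above is 6,250,000,000 bytes/s, which needs the 64-bit attribute; 10Gbit (1,250,000,000 bytes/s) still fits the legacy field.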

lpq83:~# tc -s qdisc show dev lo ; tc -s class show dev lo
qdisc htb 1: root refcnt 2 r2q 2000 default 1 direct_packets_stat 0 direct_qlen 6000
 Sent 6591936144493 bytes 149549182 pkt (dropped 0, overlimits 213757419 requeues 0)
 rate 39464Mbit 114938pps backlog 0b 15p requeues 0
class htb 1:1 root prio 0 rate 50000Mbit ceil 50000Mbit burst 200000b cburst 0b
 Sent 6591942184547 bytes 149549310 pkt (dropped 0, overlimits 0 requeues 0)
 rate 39464Mbit 114938pps backlog 0b 15p requeues 0
 lended: 149549310 borrowed: 0 giants: 0
 tokens: 336 ctokens: -164

Signed-off-by: Eric Dumazet <edumazet@google.com>
2013-11-22 17:36:18 -08:00
Stephen Hemminger dc0e9c7f22 update to net-next headers 2013-11-22 17:29:02 -08:00
Stephen Hemminger fb876d8996 update kernel headers to 3.13-rc1 2013-11-22 17:22:35 -08:00
Stephen Hemminger a067644497 Merge branch 'net-next-3.11' 2013-11-22 17:20:57 -08:00
Stephen Hemminger 23f7bd8b2e v3.12.0 2013-11-22 17:10:33 -08:00
Sami Kerola ffa35d930b ip: make -resolve addr to print names rather than addresses
As a system admin I occasionally want to be able to check that all
interfaces have a name in DNS or the /etc/hosts file.

Signed-off-by: Sami Kerola <kerolasa@iki.fi>
2013-11-22 17:09:25 -08:00
Andreas Henriksson 2a4fa1c305 ss: avoid passing negative numbers to malloc
Example:

$ ss state established \( sport = :4060  or sport = :4061 or sport = :4062  or sport = :4063 or sport = :4064  or sport = :4065 or sport = :4066  or sport = :4067 \)  > /dev/null
Aborted

In the example above, ssfilter_bytecompile(...) returns (int)136.
Storing that in char l1 yields -120 where char is signed, which results
in a negative number being passed to malloc at misc/ss.c:913.

Simply declare l1 and l2 as integers to avoid the char overflow.
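A reduced sketch of the bug and the fix; bytecompile_len_sketch() is a hypothetical stand-in that returns the 136 from the example. Whether plain char is signed is implementation-defined (it is on x86 with GCC), which is why the narrowing shows up as -120 only on some platforms.

```c
#include <stdlib.h>

/* Hypothetical stand-in for ssfilter_bytecompile(): returns the
 * length of the compiled filter, 136 bytes in the failing example. */
static int bytecompile_len_sketch(void)
{
	return 136;
}

/* Buggy pattern: storing the length in a plain char. Where char is
 * signed, 136 wraps to -120, and the value later passed to malloc()
 * converts to a huge size_t. */
static int stored_in_char(void)
{
	char l1 = (char)bytecompile_len_sketch();
	return l1;
}

/* Fixed pattern from the commit: keep the length in an int. */
static void *alloc_filter_sketch(void)
{
	int l1 = bytecompile_len_sketch();
	return malloc(l1);
}
```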

This is one of the issues originally reported in http://bugs.debian.org/511720

Fix the same problem in other code paths as well (thanks to Eric Dumazet).

Reported-by: Andreas Schuldei <andreas@debian.org>
Signed-off-by: Andreas Henriksson <andreas@fatal.se>
Reviewed-by: Eric Dumazet <edumazet@google.com>
2013-11-22 17:09:10 -08:00
Hangbin Liu 9787033481 ipaddrlabel: use uint32_t instead of int32_t
Both the Linux kernel and ipaddrlabel_modify() use an unsigned int for the
label, so we should also print the addrlabel as an unsigned value to avoid
misunderstanding.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2013-11-22 17:08:28 -08:00
Sami Kerola fa10855a7e ip: make -resolve addr to print names rather than addresses
As a system admin I occasionally want to be able to check that all
interfaces have a name in DNS or the /etc/hosts file.

Signed-off-by: Sami Kerola <kerolasa@iki.fi>
2013-11-22 17:04:06 -08:00
Andreas Henriksson f26ef6ec09 ss: avoid passing negative numbers to malloc
Example:

$ ss state established \( sport = :4060  or sport = :4061 or sport = :4062  or sport = :4063 or sport = :4064  or sport = :4065 or sport = :4066  or sport = :4067 \)  > /dev/null
Aborted

In the example above, ssfilter_bytecompile(...) returns (int)136.
Storing that in char l1 yields -120 where char is signed, which results
in a negative number being passed to malloc at misc/ss.c:913.

Simply declare l1 and l2 as integers to avoid the char overflow.

This is one of the issues originally reported in http://bugs.debian.org/511720

Fix the same problem in other code paths as well (thanks to Eric Dumazet).

Reported-by: Andreas Schuldei <andreas@debian.org>
Signed-off-by: Andreas Henriksson <andreas@fatal.se>
Reviewed-by: Eric Dumazet <edumazet@google.com>
2013-11-22 17:03:23 -08:00
Hangbin Liu bc7635a8b3 ipaddrlabel: use uint32_t instead of int32_t
Both the Linux kernel and ipaddrlabel_modify() use an unsigned int for the
label, so we should also print the addrlabel as an unsigned value to avoid
misunderstanding.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2013-11-22 17:03:15 -08:00
Daniel Borkmann d05df6861f tc: add cls_bpf frontend
This is the iproute2 part of the kernel patch "net: sched:
add BPF-based traffic classifier".

[Will re-submit later again for iproute2 when window for
 -next submissions opens.]

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
2013-10-30 16:45:05 -07:00
Jiri Pirko cc26a8909f iplink: add support for bonding netlink
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2013-10-30 16:45:04 -07:00
Stephen Hemminger 793da0e702 Update kernel headers
Latest from net-next
2013-10-30 16:42:03 -07:00
Stephen Hemminger f1f1aeb2ad Merge branch 'master' into net-next-3.11
Conflicts:
	tc/q_fq.c
2013-10-30 16:41:07 -07:00
Nigel Kukard 9bea14ff6b Fix tc stats when using -batch mode
There are two global variables in tc/tc_class.c:
__u32 filter_qdisc;
__u32 filter_classid;

These are not re-initialized for each line received in -batch mode:
class show dev eth0 parent 1: classid 1:1
class show dev eth0 parent 1: classid 1:1
Error: duplicate "classid": "1:1" is the second value.

This patch fixes the issue by initializing the two globals when we
enter print_class().
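A reduced sketch of the failure mode, with hypothetical names standing in for the tc globals and functions: a parser that rejects a second classid works for one command line, but in batch mode the global has to be reset at the start of each line.

```c
#include <stdint.h>

/* Hypothetical global standing in for filter_classid. In -batch mode
 * the process handles many lines, so a value left over from the
 * previous line looks like a duplicate option. */
static uint32_t filter_classid_sketch;

/* Parse one "classid" value; fail if it was already set, mirroring
 * the duplicate "classid" error in the commit message. */
static int parse_classid_sketch(uint32_t id)
{
	if (filter_classid_sketch)
		return -1;
	filter_classid_sketch = id;
	return 0;
}

/* Entry point for one command line: resetting the global first makes
 * repeated batch lines independent of each other. */
static int handle_line_sketch(uint32_t id)
{
	filter_classid_sketch = 0;
	return parse_classid_sketch(id);
}
```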

Signed-off-by: Nigel Kukard <nkukard@lbsd.net>
2013-10-30 16:37:07 -07:00
WANG Cong aa574cd60e vxlan: add ipv6 support
The kernel already supports it, so add the support
to iproute2 as well.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
2013-10-30 16:37:05 -07:00
Stephen Hemminger 03ddbbd5ad update kernel headers 2013-10-30 16:36:47 -07:00
Stephen Hemminger 734c0ca2ca htb: remove old unused duplicate qdisc name
Alexey used htb2 as the name for a version in ancient code.
2013-10-27 12:28:38 -07:00
Stephen Hemminger 0a502b21e3 Fix handling of qdis without options
Some qdiscs, like htb, want parse_qopt to be called even if no options are
present. Fixes regression caused by:

e9e78b0db0 is the first bad commit
commit e9e78b0db0
Author: Stephen Hemminger <stephen@networkplumber.org>
Date:   Mon Aug 26 08:41:19 2013 -0700

    tc: allow qdisc without options
2013-10-27 12:26:47 -07:00
Nicolas Dichtel 1253a10a63 iplink: update available type list
macvtap and vti were missing.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2013-10-09 15:29:09 -07:00
Christophe Gouault b557416532 xfrm: enable to set non-wildcard mark 0 on SAs and SPs
ip xfrm considers the user-defined mark to be "any" as soon as
(mark.v & mark.m) == 0, which prevents specifying non-wildcard
marks that include the value 0 (typically 0/0xffffffff).

Yet matching exactly mark 0 is useful, for instance, to separate
vti policies from global policies.

Always configure the user mark if mark.m != 0.
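The old and new predicates can be compared in a small sketch; struct mark_sketch and the two helpers are hypothetical. Note that the shorthand "mark.v & mark.m == 0" needs parentheses in real C, because == binds tighter than &.

```c
#include <stdbool.h>
#include <stdint.h>

struct mark_sketch {
	uint32_t v;	/* mark value */
	uint32_t m;	/* mask */
};

/* Old rule: the mark is only configured when (v & m) != 0, so an
 * exact match on 0 (0/0xffffffff) is wrongly treated as "any". */
static bool old_mark_is_set(const struct mark_sketch *mk)
{
	return (mk->v & mk->m) != 0;
}

/* New rule: configure the mark whenever the mask is non-zero. */
static bool new_mark_is_set(const struct mark_sketch *mk)
{
	return mk->m != 0;
}
```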

Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>
2013-10-09 15:29:05 -07:00
xeb@mail.ru 9abde37cde iproute2: ip6gre: update man pages
Update man pages with ip6gre info.

Signed-off-by: Dmitry Kozlov <xeb@mail.ru>
2013-10-04 11:26:09 -07:00
Stephen Hemminger 4e20cc55e9 ipv6 gre: add entry to ether types 2013-09-30 21:40:05 -07:00
xeb@mail.ru af89576d7a iproute2: GRE over IPv6 tunnel support.
GRE over IPv6 tunnel support.

Signed-off-by: Dmitry Kozlov <xeb@mail.ru>
2013-09-30 21:33:55 -07:00
Jamal Hadi Salim e26520e5c1 action: typo nat fix
If you taketh you giveth.
I Went the LinuxWay and copied this for m_simple.c and noticed
this one typo (I wonder where it came from?;->).

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-09-30 21:31:40 -07:00
Jamal Hadi Salim 087f46ee4e tc: introduce simple action
The simple action has been in the kernel for years now as an
example. This complements it with user space control.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-09-30 21:29:34 -07:00
Stephen Hemminger 29ff4d2e11 tc: add default action to kernel headers 2013-09-30 21:28:09 -07:00
Fan Du 99500b56d9 xfrm: use memcpy to suppress gcc phony buffer overflow warning.
This bug was reported at the link below:
https://bugzilla.redhat.com/show_bug.cgi?id=982761

A simplified version of the original reproducer from Bugzilla:
ip xfrm state add src 10.0.0.2 dst 10.0.0.1 proto ah spi 0x12345678 auth md5 12
will cause the following spew from gcc.

Reported-by: Sohny Thomas <sthomas@linux.vnet.ibm.com>
2013-09-30 21:09:05 -07:00
Petr Písař 101847446e iproute2: bridge: Close file with bridge monitor file
The `bridge monitor file FILENAME' command reads dumped netlink messages
from a file, but it never closed the file after using it. This patch
fixes that.

Signed-off-by: Petr Písař <ppisar@redhat.com>
2013-09-30 21:00:06 -07:00
Stephen Hemminger f3abcfed46 lnstat, nstat, ifstat: update man pages
New json option
2013-09-24 12:00:41 -07:00
Stephen Hemminger a4f9e8df37 lnstat: add json output format 2013-09-24 11:55:27 -07:00
Stephen Hemminger ec3e625c41 ifstat: add json output format 2013-09-24 11:55:13 -07:00
Stephen Hemminger 404582c8eb nstat: revise json output
Also add long options
2013-09-24 11:54:45 -07:00
Stephen Hemminger f4e75c6f40 Merge /tmp/iproute2 2013-09-24 10:17:18 -07:00
Stephen Hemminger af60cf40c9 Merge branch 'net-next-3.11' 2013-09-23 13:16:48 -07:00
Stephen Hemminger 6b2ed935ae Update to 3.12-rc1 headers 2013-09-23 13:15:49 -07:00
Eric Dumazet b43f331828 htb: add support for direct_qlen attribute
TCA_HTB_DIRECT_QLEN attribute is supported since linux-3.10

HTB uses an internal pfifo queue whose limit was not reported by tc;
its value was inherited from the device's tx_queue_len at setup time.

With this patch, tc displays the value and can change it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
2013-09-20 09:48:13 -07:00
Eric Dumazet 8f7574edd8 tc: support TCA_STATS_RATE_EST64
Since linux-3.11, the rate estimator can provide TCA_STATS_RATE_EST64
when the rate (bytes per second) is above 2^32 (~34 Gbit/s).

Change tc to use this attribute for high rates.

Signed-off-by: Eric Dumazet <edumazet@google.com>
2013-09-20 09:46:33 -07:00
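For scale: the 32-bit estimator field tops out at 2^32 - 1 bytes per second, i.e. 4,294,967,295 x 8 = 34,359,738,360 bit/s. The sketch below shows one way a reader could prefer the 64-bit estimate when the kernel supplies it. The struct names are hypothetical local mirrors of the 32-bit and 64-bit payloads, not tc's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local mirrors of the 32-bit and 64-bit rate estimator
 * payloads: bytes per second and packets per second. */
struct est32 { uint32_t bps; uint32_t pps; };
struct est64 { uint64_t bps; uint64_t pps; };

/* Largest byte rate the 32-bit attribute can carry. */
#define EST32_MAX_BPS UINT32_MAX

/* Prefer the 64-bit estimate when it was supplied; fall back to the
 * 32-bit one (older kernels), or 0 if neither is present. */
static uint64_t est_rate_bps(const struct est32 *r32, const struct est64 *r64)
{
	if (r64)
		return r64->bps;
	if (r32)
		return r32->bps;
	return 0;
}

/* Convert a byte rate to bits per second in 64-bit arithmetic. */
static uint64_t bytes_to_bits(uint64_t bps)
{
	return bps * 8;
}
```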
Petr Písař 54e9c3a34d iproute2: bridge: document mdb
This augments bridge(8) manual page with `bridge mdb' and `bridge
monitor mdb' commands which have been added recently.

Signed-off-by: Petr Písař <ppisar@redhat.com>
2013-09-20 09:45:35 -07:00
Eric Dumazet bc113e46a3 pkt_sched: fq: Fair Queue packet scheduler
Support for FQ packet scheduler

$ tc qd add dev eth0 root fq help
Usage: ... fq [ limit PACKETS ] [ flow_limit PACKETS ]
              [ quantum BYTES ] [ initial_quantum BYTES ]
              [ maxrate RATE  ] [ buckets NUMBER ]
              [ [no]pacing ]

$ tc -s -d qd
qdisc fq 8002: dev eth0 root refcnt 32 limit 10000p flow_limit 100p
buckets 256 quantum 3028 initial_quantum 15140
 Sent 216532416 bytes 148395 pkt (dropped 0, overlimits 0 requeues 14)
 backlog 0b 0p requeues 14
  511 flows (511 inactive, 0 throttled)
  110 gc, 0 highprio, 0 retrans, 1143 throttled, 0 flows_plimit

limit	: max number of packets on whole Qdisc (default 10000)

flow_limit : max number of packets per flow (default 100)

quantum : the max deficit per RR round (default is 2 MTU)

initial_quantum : initial credit for new flows (default is 10 MTU)

maxrate : max per flow rate (default : unlimited)

buckets : number of RB trees (default : 1024) in hash table.
               (consumes 8 bytes per bucket)

[no]pacing : disable/enable pacing (default is enabled)

Usage :

tc qdisc add dev $ETH root fq

tc qdisc del dev $ETH root 2>/dev/null
tc qdisc add dev $ETH root handle 1: mq
for i in `seq 1 4`
do
  tc qdisc add dev $ETH parent 1:$i est 1sec 4sec fq
done

Signed-off-by: Eric Dumazet <edumazet@google.com>
2013-09-20 09:43:40 -07:00
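A simplified deficit round-robin model of how quantum and initial_quantum interact, assuming a single flow whose packets are all 1514 bytes (so 2 MTU = 3028 and 10 MTU = 15140, matching the output above). It is a sketch in the spirit of the description, not the sch_fq source; the struct and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical flow state for a simplified DRR model. */
struct flow {
	int credit;        /* bytes this flow may still send */
	const int *pkts;   /* queued packet lengths */
	size_t npkts;
	size_t next;       /* index of the next packet to send */
};

/* Serve one flow for one round. While credit is positive the flow
 * sends, and credit may go negative on the last packet. Once credit
 * is exhausted the flow gets "quantum" bytes added and yields until
 * the next round. Returns the number of packets sent this round. */
static size_t drr_round(struct flow *f, int quantum)
{
	size_t sent = 0;

	while (f->next < f->npkts) {
		if (f->credit <= 0) {
			f->credit += quantum;  /* refill, wait for next round */
			break;
		}
		f->credit -= f->pkts[f->next++];
		sent++;
	}
	return sent;
}
```

With these numbers a new flow, starting at initial_quantum, sends 10 full-size packets in its first round and then 2 per round.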
Stephen Hemminger d48ed3f451 nstat: add json output format
New command line flag to output statistics in JSON format.
In our environment, we have scripts that parse the output of commands.
It is better to use a format supported by existing parsers.
2013-09-13 10:31:41 -07:00
Eric Dumazet 6d64ec0237 pkt_sched: fq: Fair Queue packet scheduler
Support for FQ packet scheduler

$ tc qd add dev eth0 root fq help
Usage: ... fq [ limit PACKETS ] [ flow_limit PACKETS ]
              [ quantum BYTES ] [ initial_quantum BYTES ]
              [ maxrate RATE  ] [ buckets NUMBER ]
              [ [no]pacing ]

$ tc -s -d qd
qdisc fq 8002: dev eth0 root refcnt 32 limit 10000p flow_limit 100p
buckets 256 quantum 3028 initial_quantum 15140
 Sent 216532416 bytes 148395 pkt (dropped 0, overlimits 0 requeues 14)
 backlog 0b 0p requeues 14
  511 flows (511 inactive, 0 throttled)
  110 gc, 0 highprio, 0 retrans, 1143 throttled, 0 flows_plimit

limit	: max number of packets on whole Qdisc (default 10000)

flow_limit : max number of packets per flow (default 100)

quantum : the max deficit per RR round (default is 2 MTU)

initial_quantum : initial credit for new flows (default is 10 MTU)

maxrate : max per flow rate (default : unlimited)

buckets : number of RB trees (default : 1024) in hash table.
               (consumes 8 bytes per bucket)

[no]pacing : disable/enable pacing (default is enabled)

Usage :

tc qdisc add dev $ETH root fq

tc qdisc del dev $ETH root 2>/dev/null
tc qdisc add dev $ETH root handle 1: mq
for i in `seq 1 4`
do
  tc qdisc add dev $ETH parent 1:$i est 1sec 4sec fq
done

Signed-off-by: Eric Dumazet <edumazet@google.com>
2013-09-03 08:48:35 -07:00
Stephen Hemminger c92716d1ed Update to 3.11 net-next kernel headers 2013-09-03 08:46:40 -07:00
Stephen Hemminger c5e3ee2c1f Merge branch 'master' into net-next-3.11 2013-09-03 08:45:27 -07:00
Stephen Hemminger d3c77c46cd v3.11.0 2013-09-03 08:23:03 -07:00
Jesper Dangaard Brouer 3e92ff522a linklayer interface between kernel and tc/userspace
This iproute2 tc patch is connected to the kernel
 - commit 8a8e3d84b17 (net_sched: restore "linklayer atm" handling)

The rate table calculated by tc has been replaced in the kernel
and is no longer used for lookups.

This happened in kernel release v3.8, caused by kernel
 - commit 56b765b79 ("htb: improved accuracy at high rates").
This change unfortunately broke the tc overhead and
linklayer parameters.

 Kernel overhead handling got fixed in kernel v3.10 by
 - commit 01cb71d2d47 (net_sched: restore "overhead xxx" handling)

 Kernel linklayer handling got fixed in kernel v3.11 by
 - commit 8a8e3d84b17 (net_sched: restore "linklayer atm" handling)

The linklayer fix introduced a struct change that allows the linklayer
attribute to be transferred between tc and the kernel. This patch makes
use of this linklayer attribute.

The linklayer setting is transferred to the kernel, and the linklayer
setting received from the kernel is printed with a "linklayer" prefix
when listing the current configuration. The default
TC_LINKLAYER_ETHERNET is only printed in detailed output mode.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
2013-09-03 08:21:24 -07:00
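With "linklayer atm", a packet's cost on the wire is counted in whole ATM cells. A minimal sketch of that rounding, assuming the standard 53-byte ATM cell carrying 48 bytes of payload and ignoring any configured overhead. The function name is hypothetical and this is not the tc_core.c code.

```c
/* An ATM cell is 53 bytes on the wire and carries 48 bytes of payload. */
#define ATM_CELL_SIZE    53
#define ATM_CELL_PAYLOAD 48

/* Bytes used on an ATM link to carry "size" bytes: round up to whole
 * cells, then charge the full cell size for each one. */
static unsigned int atm_link_size(unsigned int size)
{
	unsigned int cells = size / ATM_CELL_PAYLOAD;

	if (size % ATM_CELL_PAYLOAD)
		cells++;   /* a partly filled cell still costs a whole cell */
	return cells * ATM_CELL_SIZE;
}
```

For example, a 1500-byte packet needs 32 cells and therefore occupies 1696 bytes on the link.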
Nicolas Dichtel 3c61c01a66 ipnetns: fix ip batch mode when using 'netns exec'
Since commit a05f6511f5, ip batch mode is broken when using 'netns exec' cmd.

When WIFEXITED() returns true, the child exited normally, hence
we must not call exit() but just return the status. If we call exit(), the
next commands in the file are not executed.
If WIFEXITED() returns false, we can call exit() because it means that the
child did not exit normally.

This patch partially reverts commit a05f6511f5.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2013-09-03 08:20:16 -07:00
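The rule above can be sketched as a small helper that turns a wait status into a batch decision. This is an illustrative sketch with a hypothetical name (child_result), not the ipnetns.c code.

```c
#include <stdlib.h>     /* abort(), used when exercising the helper */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>     /* fork(), _exit() */

/* Turn a wait status into a batch decision.
 * Returns the child's exit code when it exited normally, so the batch
 * can go on with the next line; returns -1 when the child did not exit
 * normally (e.g. it was killed by a signal), which the caller may treat
 * as fatal and exit(). */
static int child_result(int status)
{
	if (WIFEXITED(status))
		return WEXITSTATUS(status);
	return -1;
}
```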
Stephen Hemminger efa4dde4c7 Update kernel headers to 3.11
Last minute addition to pkt_sched.h
2013-09-03 08:18:25 -07:00
Thomas Egerer 1ed509bb52 ip/xfrm: Fix potential SIGSEGV when printing extra flags
The git-commit dc8867d0, that added support for displaying the
extra-flags of a state, introduced a potential segfault.
Trying to show a state without the extra-flag attribute while show_stats
is enabled would dereference the NULL pointer in
tb[XFRMA_SA_EXTRA_FLAGS].

Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
2013-08-31 10:33:21 -07:00
Lutz Jaenicke 7dc0481aa1 macvlan: fix typo in macvlan_print_opt()
The mode information is contained in IFLA_MACVLAN_MODE instead
of IFLA_VLAN_ID (both evaluating to "1" in their enums).

Signed-off-by: Lutz Jaenicke <ljaenicke@innominate.com>
2013-08-31 10:30:11 -07:00
Richard Godbee 4b8000f37a iproute2: ip-route.8.in: minor fixes
In SYNOPSIS section:

 - Add 'reordering'
 - Add missing '[' before 'quickack'

Signed-off-by: Richard Godbee <richard@godbee.net>
2013-08-31 10:30:06 -07:00
Richard Godbee 30d07e9e36 iproute2: spelling: noptmudisc -> nopmtudisc
Signed-off-by: Richard Godbee <richard@godbee.net>
2013-08-31 10:30:03 -07:00
Richard Godbee 8f48063721 iproute2: iproute.c: fix usage() spacing problems
Fix two spacing problems around square brackets in usage text.

Signed-off-by: Richard Godbee <richard@godbee.net>
2013-08-31 10:30:01 -07:00
Stephen Hemminger 001856532f add ability to filter neighbour discovery by protocol
Useful to be able to monitor ARP and IPv6 ND separately.
Default is both.
2013-08-29 12:18:52 -07:00
Stephen Hemminger e9e78b0db0 tc: allow qdisc without options
pfifo_fast needs no options, so don't force it to have parsing code.
2013-08-26 08:41:19 -07:00
Martin Schwenke 488c41d216 ip: Add label option to ip monitor
Prefix labelling is currently only activated when monitoring "all"
objects.  However, the output can still be confusing when monitoring
more than one object, so add an option to always print prefix labels.

Signed-off-by: Martin Schwenke <martin@meltin.net>
2013-08-19 08:57:24 -07:00
Stephen Hemminger b8a45897b9 More minor spelling fixes 2013-08-04 15:10:05 -07:00
Stephen Hemminger d259f0302f Fix spelling errors
Minor errors found by codespell
2013-08-04 15:00:56 -07:00
Stephen Hemminger ac3ff72032 cleanup help message
Split it naturally
2013-08-04 15:00:42 -07:00
Thomas Richter 5464049b47 iproute vxlan add support for fdb replace command
Add support for the bridge fdb replace command to replace an
existing entry in the vxlan device driver forwarding data base.
The entry is identified with its unicast mac address and its
corresponding remote destination information is updated.

This is useful for virtual machine migration and replaces the
bridge fdb del and bridge fdb add commands.

It follows the same interface as ip neigh replace commands.

Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
2013-08-04 11:56:54 -07:00
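Under the usual rtnetlink convention (also followed by ip neigh), "add" asks the kernel to create an entry and fail if it already exists, while "replace" creates it or overwrites it in place. A sketch of that mapping using the NLM_F_* constants from <linux/netlink.h>; the helper name is hypothetical and this is not the bridge fdb source.

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <string.h>

/* Netlink request flags for an fdb command verb.
 * "add" must not overwrite an existing entry (NLM_F_EXCL);
 * "replace" creates the entry or overwrites it (NLM_F_REPLACE).
 * Returns -1 for an unknown verb. */
static int fdb_cmd_flags(const char *verb)
{
	if (strcmp(verb, "add") == 0)
		return NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
	if (strcmp(verb, "replace") == 0)
		return NLM_F_REQUEST | NLM_F_CREATE | NLM_F_REPLACE;
	if (strcmp(verb, "del") == 0)
		return NLM_F_REQUEST;
	return -1;
}
```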
Stefan Tomanek b1d0525f9c ip rule: add route suppression options
When configuring a system with multiple network uplinks and default routes, it
is often convenient to reference a routing table multiple times - but reject
its routing decision if certain constraints are not met by it.

Consider this setup:

$ ip route add table secuplink default via 10.42.23.1

$ ip rule add pref 100            table main suppress_prefixlength 0
$ ip rule add pref 150 fwmark 0xA table secuplink

With this setup, packets marked 0xA will be processed by the additional routing
table "secuplink", but only if no suitable route in the main routing table can
be found. By suppressing entries with a prefixlength of 0 (or less), the
default route (/0) of the table "main" is hidden from packets processed by rule
100; packets traveling to destinations via more specific routes are processed
as usual.

It is also possible to suppress a routing entry if a device belonging to
a specific interface group is to be used:

$ ip rule add pref 150 table main suppress_group 1

Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de>
2013-08-04 11:54:15 -07:00
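The suppression test can be modelled as a check applied after the rule's table lookup has found a route. A sketch assuming hypothetical structs in which -1 marks an unset option; the prefixlen <= suppress_prefixlength comparison follows the description above and is not quoted from the kernel's fib rules code.

```c
#include <stdbool.h>

/* Hypothetical summary of a successful route lookup. */
struct lookup_result {
	int prefixlen;  /* prefix length of the matched route */
	int group;      /* interface group of the output device */
};

/* Hypothetical rule options; -1 means "option not set". */
struct rule_opts {
	int suppress_prefixlength;
	int suppress_group;
};

/* Return true if the rule's result should be rejected so that the
 * lookup continues with the next rule. With suppress_prefixlength 0,
 * only a default route (/0) is rejected. */
static bool rule_suppresses(const struct rule_opts *r,
			    const struct lookup_result *res)
{
	if (r->suppress_prefixlength >= 0 &&
	    res->prefixlen <= r->suppress_prefixlength)
		return true;
	if (r->suppress_group >= 0 && res->group == r->suppress_group)
		return true;
	return false;
}
```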
Stephen Hemminger 7bfc49cf47 Merge branch 'master' into net-next-3.11 2013-08-04 11:53:10 -07:00
Stefan Tomanek c4fdf75d3d ip link: fix display of interface groups
This change adds the interface group to the output of "ip link show".

It also makes "ip link" print _all_ devices if no group filter is specified;
previously, only interfaces of the default group (0) were shown.

Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de>
2013-08-04 11:50:38 -07:00
Stephen Hemminger 5318b2c667 Update kernel headers to net-next for 3.12 2013-08-04 11:43:02 -07:00
Stephen Hemminger 3140e9a3a3 Remove -Werror
-Werror just doesn't work because the set of warnings it turns into
errors changes too much between compiler versions.
2013-07-31 17:42:39 -07:00
Nicolas Dichtel 77620be89a ip: allow to specify mode for sit tunnels
It's now possible to have IPv4 and IPv6 over IPv4 tunnels with the module sit.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2013-07-26 14:30:36 -07:00
Nicolas Dichtel 973eb50b18 ipaddress: fix display of IPv6 peer address
Because only IPv4 supported peer addresses, the address size was static.
Now IPv6 also supports peer addresses, so the size must depend on the family.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2013-07-26 14:27:19 -07:00
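The fix amounts to deriving the address length from the family instead of hard-coding 4 bytes. A sketch of that idea with inet_ntop(), assuming a hypothetical format_addr() helper that receives the raw attribute payload and its length.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Format a raw address whose size depends on the family, instead of
 * assuming a 4-byte IPv4 peer. Returns NULL for an unknown family or
 * a payload shorter than the family's address size. */
static const char *format_addr(int family, const void *data, size_t len,
			       char *buf, socklen_t buflen)
{
	size_t need = family == AF_INET6 ? sizeof(struct in6_addr) :
		      family == AF_INET  ? sizeof(struct in_addr)  : 0;

	if (need == 0 || len < need)
		return NULL;
	return inet_ntop(family, data, buf, buflen);
}
```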
Atzm Watanabe 7cfa3802ca vxlan: Allow setting destination to unicast address.
This patch allows setting VXLAN destination to unicast address.
It allows that VXLAN can be used as peer-to-peer tunnel without
multicast.

v6: change back to v3, except for using a new attribute, because
    replacing command-line parameters breaks existing scripts,
    based on Cong Wang's comments.

v5: rebase on the latest.

v4: replace "group" with "remote" based on David Stevens's comments.

v3: move the new REMOTE attribute to the end of the enum list,
    based on Stephen Hemminger's comments.
    fix the usage to show explicitly that "remote" and "group"
    cannot both be specified, based on Ben Hutchings's comments.

v2: use a new argument "remote" instead of "group" based on
    Stephen Hemminger's comments.

Signed-off-by: Atzm Watanabe <atzm@stratosphere.co.jp>
2013-07-26 14:25:42 -07:00
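Since the destination can now be either a multicast group or a unicast peer, a frontend could sanity-check which kind of address it was given. A sketch for IPv4 only, using the IN_MULTICAST() macro from <netinet/in.h>; the helper name is hypothetical, and the ip link parser itself relies on the separate group and remote keywords rather than guessing.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Return true if addr is an IPv4 multicast address (usable as a VXLAN
 * "group"); false for unicast peers ("remote") or unparsable input.
 * IN_MULTICAST() expects the address in host byte order. */
static bool vxlan_dst_is_group(const char *addr)
{
	struct in_addr a;

	if (inet_pton(AF_INET, addr, &a) != 1)
		return false;
	return IN_MULTICAST(ntohl(a.s_addr));
}
```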
Stephen Hemminger cc71ad3ddd Merge branch 'net-next-3.10' 2013-07-16 10:20:31 -07:00
Stephen Hemminger ecefa08c10 Update to 3.11-rc1 kernel headers
Sanitized headers from upstream
2013-07-16 10:19:56 -07:00
Stephen Hemminger 63c9e8555d v3.10.0 2013-07-16 10:06:36 -07:00
Stephen Hemminger a3aa47a559 Make tc and ip batch mode consistent
Change the code for tc and ip so that batch mode is handled
the same.
2013-07-16 10:04:05 -07:00
Stephen Hemminger 09154ec15f ip: add batch mode to man page 2013-07-13 10:02:03 -07:00
Stephen Hemminger a05f6511f5 netns: follow return value conventions of the rest of the code
The netns code was using EXIT_SUCCESS/EXIT_FAILURE, but the rest of the ip
code used -1 explicitly, so change it to follow that convention. Also, certain
types of errors like fork failure should abort a batch operation, rather than
just returning an error.
2013-07-12 08:43:23 -07:00
esr@thyrsus.com 1284fd3a81 ip-rule.8: Fix presentational use of .SS. 2013-07-12 08:33:09 -07:00
esr@thyrsus.com ee0b0a9ec9 In tc-ematch.8, remove no-op .ti requests to prevent translation warnings
These do nothing on an 80-column display. They were clearly somebody's
boilerplate way of setting up hanging indents, but the syntax lines
are way too short to require them. And since most were argumentless
they would have been no-ops on any sized display.
2013-07-12 08:33:08 -07:00
Thomas Richter 2816a56879 iproute2 vxlan documentation update for ip command
The ip link command line help and the ip-link.8.in
man page are outdated with regard to vxlan support.
The patch updates both the command line help for the
ip command and its man page.

Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
2013-07-09 09:38:41 -07:00
Thomas Richter 7578ae8807 iproute2 vxlan documentation update for bridge command
The bridge fdb command line help and the bridge.8
man page are outdated with regard to vxlan support.
The patch updates both the command line help for the
bridge command and its man page.

Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
2013-07-09 09:38:36 -07:00
JunweiZhang 95592b47be ipbatch: fix use of 'ip netns exec'
execvp() does not return when the command succeeds, hence all commands in the
batch file after the 'ip netns exec' line are not executed.

Let's fork before calling execvp() if batch mode is used.

Example:
$ cat test.batch
netns add netns1
netns exec netns1 ip l
netns
$ ip -b test.batch
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: sit0: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT
    link/sit 0.0.0.0 brd 0.0.0.0

None of the commands after 'netns exec' are executed.

With the patch:
$ ip -b test.batch
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: sit0: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT
    link/sit 0.0.0.0 brd 0.0.0.0
netns1

Now, existing netns are displayed.

Signed-off-by: JunweiZhang <junwei.zhang@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2013-07-09 09:14:10 -07:00
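A minimal sketch of the fork-before-exec approach, assuming a hypothetical run_command() helper and a batch_mode flag. Error handling is simplified and it is not the ipnetns.c implementation. In batch mode the parent waits for the child and returns its exit status, so the batch loop can read the next line.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with execvp(). Outside batch mode the current process is
 * simply replaced. In batch mode, fork first so the batch process
 * survives; the child is replaced by the command and the parent waits.
 * Returns the command's exit code, or -1 on failure. */
static int run_command(char *const argv[], int batch_mode)
{
	pid_t pid;
	int status;

	if (!batch_mode) {
		execvp(argv[0], argv);   /* only returns on failure */
		perror(argv[0]);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		perror("fork");
		return -1;
	}
	if (pid == 0) {
		execvp(argv[0], argv);
		perror(argv[0]);
		_exit(127);              /* exec failed in the child */
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```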
Amerigo Wang 86c00faae2 iptunnel: check SIT_ISATAP flag only for SIT tunnel
Without patch, I got:

	# ./ip/ip tunnel show
	ip_vti0: ioctl 89f4 failed: Invalid argument
	ip_vti0: ip/ip  remote any  local any  ttl inherit  nopmtudisc key 0

This is because VTI_ISVTI has the same numeric value as SIT_ISATAP,
but only sit tunnels support SIOCGETPRL, so the tunnel type must be
checked first.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cong Wang <amwang@redhat.com>
2013-07-09 09:08:14 -07:00
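The ambiguity can be shown with two flags that share one bit. In this sketch the enum and flag names are hypothetical stand-ins; the point is that the tunnel kind is checked before the shared bit is interpreted, as the fix describes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tunnel kinds and flag values. Two unrelated flags share
 * the same bit, so the bit alone does not identify an ISATAP tunnel. */
enum tun_kind { TUN_SIT, TUN_VTI, TUN_GRE };
#define FLAG_SIT_ISATAP 0x0001
#define FLAG_VTI_ISVTI  0x0001

/* Only ISATAP sit tunnels have a potential router list (PRL) to query,
 * so check the tunnel kind before trusting the shared flag bit. */
static bool should_query_prl(enum tun_kind kind, uint16_t iflags)
{
	return kind == TUN_SIT && (iflags & FLAG_SIT_ISATAP);
}
```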
esr@thyrsus.com 11eb939653 tc-stab.8: Fix synopsis errors, an invalid escape, and incorrect English usage.
The command synopsis is regularized and part of it split off into an
OPTIONS section.  This allows the page to lift to XML-DocBook.

An invalid \p escape was removed.

This page was written by someone who didn't understand the use of
definite and indefinite articles in English, nor its punctuation rules.
I've fixed these mistakes, and some glitches in punctuation and
capitalization.
2013-07-09 09:07:41 -07:00
Adam Borowski 5d8a75293c ip: fix build failure if time_t is not long int
This includes x32, and, per Linus' decree, any future arch with longs
shorter than 64 bits.

Signed-off-by: Adam Borowski <kilobyte@angband.pl>
2013-06-25 13:36:56 -07:00
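Printing a time_t with a fixed %ld assumes time_t is long, which does not hold on x32 (64-bit time_t, 32-bit long). A portable pattern is to cast to long long and print with %lld. The helper name below is hypothetical, and the commit does not show which form the fix used.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* time_t's underlying type varies by ABI, so never pass it to %ld
 * directly: widen it to long long and print it with %lld.
 * Returns the number of characters snprintf() wanted to write. */
static int format_seconds(char *buf, size_t len, time_t t)
{
	return snprintf(buf, len, "%lld", (long long)t);
}
```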
Eric Dumazet 260804f422 ss: add more TCP_INFO components
Allow ss -i to display more TCP information:

unacked:N   Number of un-acked packets
retrans:X/Y   X: number of outstanding retransmit packets
              Y: total number of retransmits for the session
lost:N       Number of lost packets (tcpi_lost)
sacked:N     Number of sacked packets (tcpi_sacked)
facked:N     Number of facked packets (tcpi_fackets)
reordering:N Reordering level (if different from 3)

Example :

$ ss -emoi dst 10.7.7.83
tcp   ESTAB      0      1154056   10.7.7.84:54127    10.7.7.83:34342
timer:(on,200ms,0) ino:57003 sk:ffff88063c51d0c0 <->
	 skmem:(r0,rb89280,t0,tb2097152,f726504,w1436184,o0,bl0) ts sack cubic
wscale:7,6 rto:310 rtt:107.375/1 mss:1448 cwnd:568 ssthresh:108 send
61.3Mbps unacked:568 retrans:0/21 reordering:127 rcv_space:29200

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
2013-06-25 13:33:07 -07:00
esr@thyrsus.com 61f541fe12 First set of manpage markup fixes
Enclosed patch fixes inappropriate uses of the .SS macro.  Fuller explanation
in the change comment.

There are other problems in these pages that block lifting to
XML-DocBook, most notably in the command synopses.  They will take
some creativity to fix.  I'm working on it.

From 75745adba4b45b87577b61a2daa886dd444f44da Mon Sep 17 00:00:00 2001
From: "Eric S. Raymond" <esr@thyrsus.com>
Date: Fri, 21 Jun 2013 15:27:38 -0400
Subject: [PATCH] Abolish presentation-level misuse of the .SS macro.

This change fixes most (but not all) fatal errors in attempts to lift
the iproute2 manual pages to XML-DocBook.  Where .SS is still used it
is a real subsection header, not just a way to outdent and bold text.
Presentation-level instances are turned into .TP calls and tables.
2013-06-24 17:00:54 -07:00
Patric McHardy 8fd8f6ed71 ip: iplink_vlan: add 802.1ad support
The following patch adds support to ip_vlan for configuring 802.1ad
VLANs.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2013-06-21 08:59:24 -07:00
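802.1ad (QinQ) service tags use a different tag protocol identifier (TPID) than plain 802.1Q tags. A sketch of mapping a protocol name to its TPID, using ETH_P_8021Q and ETH_P_8021AD from <linux/if_ether.h> with a fallback define for older headers. The function and the accepted names are illustrative assumptions.

```c
#include <linux/if_ether.h>
#include <string.h>

/* Fallback for headers that predate 802.1ad support. */
#ifndef ETH_P_8021AD
#define ETH_P_8021AD 0x88A8
#endif

/* Map a VLAN protocol name to its tag protocol identifier (TPID).
 * Returns -1 for unknown names. */
static int vlan_proto_from_name(const char *name)
{
	if (strcmp(name, "802.1Q") == 0)
		return ETH_P_8021Q;   /* 0x8100: customer tag */
	if (strcmp(name, "802.1ad") == 0)
		return ETH_P_8021AD;  /* 0x88a8: service tag */
	return -1;
}
```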
Cong Wang b37f2c895d add quickack option to ip route
This patch adds a quickack option to enable/disable TCP quick ack
mode on a per-route basis.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Cong Wang <amwang@redhat.com>
2013-06-20 08:35:21 -07:00
Rony Efraim 07fa9c1529 Add VF link state control
Add link state per VF command

Signed-off-by: Rony Efraim <ronye@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
2013-06-19 18:14:39 -07:00
Stephen Hemminger 318ad9d745 update to 3.10-net-next headers 2013-06-19 18:14:17 -07:00
Eric Dumazet a303853e84 get_rate: detect 32bit overflows
On Mon, 2013-06-03 at 16:36 +0100, Ben Hutchings wrote:

> Oops, I read this as being strtol() currently, not strtod().  Currently
> '1.5gbit' will work, but this change will break that.  So I think you
> need to keep bps as a double.

Arg

> Then here I think the check should be *rate != floor(bps), i.e. accept
> rounding down of a non-integer number of bytes but any other change is
> assumed to be overflow.

Thanks Ben, here is v4 then ;)

[PATCH v4] get_rate: detect 32bit overflows

Current rate limit is 34,359,738,360 bits per second, and
unfortunately 40Gbps links are above it.

Overflows in get_rate() are currently not detected, and some
users are confused. Let's detect this and complain.

Note that some qdiscs are ready to get extended range, but this will
need additional attributes and a new iproute2.

With help from Ben Hutchings

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
2013-06-07 09:24:56 -07:00
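A sketch of the overflow check that keeps the parsed value as a double, so that "1.5gbit" still works as Ben points out. It tests the range explicitly before converting, instead of the *rate != floor(bps) comparison quoted above, because converting an out-of-range double to an unsigned integer is undefined behavior in C. The unit table is a simplified subset of what tc accepts, and parse_rate32() is a hypothetical name, not tc's get_rate().

```c
#include <stdint.h>
#include <stdlib.h>
#include <strings.h>

/* Simplified subset of tc's rate units, scaled to bits per second. */
static const struct { const char *name; double scale; } units[] = {
	{ "bit", 1e0 }, { "kbit", 1e3 }, { "mbit", 1e6 }, { "gbit", 1e9 },
	{ "", 1e0 },    /* bare number: bits per second */
};

/* Parse "<number><unit>" into bytes per second. Anything above
 * UINT32_MAX bytes/s is reported as an overflow instead of being
 * silently truncated. Returns 0 on success, -1 on error. */
static int parse_rate32(const char *str, uint32_t *rate)
{
	char *end;
	double bps = strtod(str, &end);
	size_t i;

	if (end == str || !(bps >= 0))   /* rejects negatives and NaN */
		return -1;
	for (i = 0; i < sizeof(units) / sizeof(units[0]); i++) {
		if (strcasecmp(end, units[i].name) == 0) {
			double bytes = bps * units[i].scale / 8;

			if (bytes > UINT32_MAX)
				return -1;           /* does not fit in 32 bits */
			*rate = (uint32_t)bytes; /* drops fractional bytes */
			return 0;
		}
	}
	return -1;                               /* unknown unit */
}
```

With this sketch, "1.5gbit" parses to 187500000 bytes/s, while "40gbit" (5,000,000,000 bytes/s) is rejected.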
Stephen Hemminger 22fa92e367 htb: fix indentation
iproute2 uses kernel style indenting
2013-06-07 08:54:45 -07:00
Eric Dumazet 44f1ff0afc htb: report overhead attribute
"tc class show dev ..." omits the overhead attribute for HTB.

After this patch I have:

tc class add dev $DEV parent 1: classid 1:1 est 1sec 4sec htb \
    rate 12Mbit mtu 1500 quantum 1514 overhead 20

tc class show dev $DEV
class htb 1:1 root prio 0 rate 12000Kbit overhead 20 ceil 12000Kbit
burst 1500b cburst 1500b

Signed-off-by: Eric Dumazet <edumazet@google.com>
2013-06-07 08:53:53 -07:00
Andrey Vagin ecb928c876 ss: Get netlink sockets info via sock-diag (v2)
v2: update netlink_diag.h

Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
2013-06-05 08:54:35 -07:00
Andrey Vagin f271fe011a ss: show destination address for netlink sockets
A netlink socket may be connected to a specific group.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
2013-06-05 08:54:35 -07:00
Andrey Vagin 129709aea1 ss: create a function to print info about netlink sockets
It will be reused for printing info about netlink sockets, when
socket diag is used for retrieving information.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
2013-06-05 08:54:35 -07:00
Andrey Vagin d8402b9641 ss: handle socket diag request in a separate function
It will be reused to show netlink sockets

Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
2013-06-05 08:54:35 -07:00
Andrey Vagin bcb9d40319 ip: set the close-on-exec flag for descriptors
Otherwise a program executed by "ip netns exec" has two extra
descriptors.

$ ip netns exec test /bin/bash
$ lsof -p $$
...
bash    817 root    0u   CHR  136,0       0t0          3 /dev/pts/0
bash    817 root    1u   CHR  136,0       0t0          3 /dev/pts/0
bash    817 root    2u   CHR  136,0       0t0          3 /dev/pts/0
bash    817 root    3u  sock    0,6       0t0      13386 protocol: NETLINK
bash    817 root    4r   REG    0,3         0 4026532155 net
bash    817 root  255u   CHR  136,0       0t0          3 /dev/pts/0

Cc: Stephen Hemminger <stephen@networkplumber.org>
Reported-by: Dilip Daya <dilip.daya@hp.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
2013-06-04 09:11:06 -07:00
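One way to get this effect is to request close-on-exec when the descriptor is created, with SOCK_CLOEXEC and O_CLOEXEC, which avoids a window between creation and a later fcntl() call. The sketch below uses those flags; the helper names are hypothetical, and the commit text does not say which mechanism the patch itself uses.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <unistd.h>

/* Create an rtnetlink socket that is closed automatically on exec*(). */
static int open_rtnl_cloexec(void)
{
	return socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_ROUTE);
}

/* Open a file (e.g. a namespace file) with close-on-exec set atomically. */
static int open_file_cloexec(const char *path)
{
	return open(path, O_RDONLY | O_CLOEXEC);
}

/* Report whether FD_CLOEXEC is set on fd: 1 if set, 0 if clear,
 * -1 if fcntl() fails. */
static int has_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	return flags < 0 ? -1 : !!(flags & FD_CLOEXEC);
}
```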
Andreas Henriksson c083d99dd3 iproute2: fix build failure on sparc due to -Wformat and tv_usec
tv_usec is "suseconds_t", which is usually a signed long,
but not always.
Change the printf modifier to signed and
cast tv_usec to long in case it's not already long.

gcc -Wall -Wstrict-prototypes -Werror -Wmissing-prototypes -Wmissing-declarations -Wold-style-definition -O2 -I../include -DRESOLVE_HOSTNAMES -DLIBDIR=\"/usr/lib\" -DCONFDIR=\"/etc/iproute2\" -D_GNU_SOURCE -fPIC   -c -o utils.o utils.c
utils.c: In function 'print_timestamp':
utils.c:802:2: error: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type '__suseconds_t' [-Werror=format]
cc1: all warnings being treated as errors

Signed-off-by: Andreas Henriksson <andreas@fatal.se>
2013-06-03 19:56:25 -07:00
John Fastabend a40d0827a5 iproute2: bridge: fix 'bridge link' setlink/getlink parsing
Use IFLA_AF_SPEC nested attributes to lookup bridge mode and when
doing strcmp() check for equality.

These appear to be typos from the original commit,

commit 64108901b7
Author: Vlad Yasevich <vyasevic@redhat.com>
Date:   Fri Mar 15 10:01:28 2013 -0700

    bridge: Add support for setting bridge port attributes

Also set flags to BRIDGE_FLAGS_SELF instead of using OR operation.
This allows setting the bridge mode when not being used with a
master device.

To allow setting both master and self devices simultaneously we
will need to add a {self|master} field similar to fdb commands.
For now the command sets are mutually exclusive as noted in the
original commit.

With this patch, 'bridge link set' now works:

# ./bridge/bridge link set dev veth1 cost 3
# ./bridge/bridge link show
10: veth1 state UP : <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master bridge0 state forwarding priority 3 cost 3

CC: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
2013-06-03 19:55:32 -07:00
Stephen Hemminger 45a3b3fcd6 man: get rid of useless reference to GNU style options
No need to state the obvious here.
2013-05-28 08:47:56 -07:00
Sriram Narasimhan c41e038f48 iptuntap: allow creation of multi-queue tun/tap device
This patch adds multi_queue option to ip tuntap.
This allows IFF_MULTI_QUEUE flag to be specified during
tun/tap device creation enabling multi-queue support in tun/tap
device.

Example: ip tuntap add dev tap0 mode tap multi_queue

Signed-off-by: Sriram Narasimhan <sriram.narasimhan@hp.com>
2013-05-24 08:12:52 -07:00
Nicolas Dichtel f3c2f91e22 man: describe --bpf option of ss
This option has been recently added to ss utility.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2013-05-24 08:11:59 -07:00
Rami Rosen b0f01cf60e ss: replace bfp with bpf in usage().
This patch fixes usage() of misc/ss.c to use bpf instead of bfp.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
2013-05-24 08:11:01 -07:00
Stephen Hemminger 92deabcf29 vxlan: remove dstport option
The dstport option does not work as expected in 3.10:
it only allows setting the port for sending and does not enable
receiving on it.
2013-05-23 10:21:15 -07:00
Pavel Emelyanov 5b81604753 ss: Show inet and unix sockets' shutdown state
When extended info is requested (-e option) one will be able to observe
arrows in the output, like this:

ESTAB 0 0  127.0.0.1:41705  127.0.0.1:12345  ino:143321 sk:ffff88003a8cea00 -->
ESTAB 0 0  127.0.0.1:46925  127.0.0.1:12346  ino:143322 sk:ffff88003a8ce4c0 <--
ESTAB 0 0  127.0.0.1:51678  127.0.0.1:12347  ino:143323 sk:ffff88003a8cdf80 ---
ESTAB 0 0  127.0.0.1:46911  127.0.0.1:12348  ino:143324 sk:ffff88003b7f05c0 <->

for SHUT_RD, SHUT_WR, SHUT_RDWR and non-shutdown sockets respectively.

The respective nlattrs in *_diag messages appeared in Linux v3.7 and
are already present in ss's headers.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-05-17 08:46:51 -07:00
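The arrow encoding can be reproduced from the shutdown mask, where bit 0 means the receive side is shut down and bit 1 means the send side is (the kernel's RCV_SHUTDOWN and SEND_SHUTDOWN values). A sketch with hypothetical macro and function names; its output matches the four example lines above.

```c
#include <string.h>

/* Shutdown mask bits: 1 = receive side shut down, 2 = send side. */
#define SHUT_RCV_BIT  1
#define SHUT_SEND_BIT 2

/* Build the three-character arrow: "<" means receiving is still
 * possible, ">" means sending is; a shut-down side becomes "-". */
static void shutdown_arrow(unsigned int mask, char out[4])
{
	out[0] = (mask & SHUT_RCV_BIT) ? '-' : '<';
	out[1] = '-';
	out[2] = (mask & SHUT_SEND_BIT) ? '-' : '>';
	out[3] = '\0';
}
```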
Nicolas Dichtel 372c30d2aa ss: allow to retrieve AF_PACKET info via netlink
This patch adds support for netlink messages for AF_PACKET, and thus allows
retrieving the filter information of this kind of socket.
To dump this filter information, the option --bpf must be specified and the
user must have admin rights.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2013-05-17 08:42:34 -07:00
Nicolas Dichtel f7431e2913 ipnetconf: by default dump all entries
This is now possible because the dump function has been added to the kernel.
Note that both IPv4 and IPv6 entries are displayed.

Before this patch, only the "all" entries were displayed.

Example:
$ ip netconf
ipv4 dev lo forwarding on rp_filter off mc_forwarding 0
ipv4 dev eth0 forwarding on rp_filter off mc_forwarding 1
ipv4 all forwarding on rp_filter off mc_forwarding 1
ipv4 default forwarding on rp_filter off mc_forwarding 0
ipv6 dev lo forwarding on mc_forwarding 0
ipv6 dev eth0 forwarding on mc_forwarding 0
ipv6 all forwarding on mc_forwarding 0
ipv6 default forwarding on mc_forwarding 0

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2013-05-17 08:38:36 -07:00
Nicolas Dichtel dc8867d0ff ip/xfrm: allow to set flag XFRM_SA_XFLAG_DONT_ENCAP_DSCP
For the display part, we print extra-flags only if show_stats is set, like for
standard flags.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2013-05-17 08:38:26 -07:00
Stephen Hemminger 5cf35d6ad7 add BPF header files
For later ss change.
2013-05-17 08:36:52 -07:00
Stephen Hemminger 2a126a85fe vxlan: nag user to set port value
This change shifts the burden onto users to choose the UDP port value.
The kernel default value is the incorrect UDP port 5287, but now there is
an officially assigned port for VXLAN.

The kernel can't change because of legacy compatibility,
but new deployments should not use the legacy port value.
2013-05-15 15:09:57 -07:00
David L Stevens 5b8a1d4a03 iproute2: support NTF_ROUTER flag in VXLAN fdb entries
This patch allows setting the "NTF_ROUTER" flag in VXLAN forwarding table
entries to enable L3 switching for router destinations while still allowing
L2 redirection appliances for non-router MAC destinations.

Signed-Off-By: David L Stevens <dlstevens@us.ibm.com>
2013-05-06 07:54:44 -07:00
Eric Dumazet 9cb1eccf69 ss: add fastopen support
ss -i can output "fastopen" attribute if socket used Fast Open

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2013-05-03 20:48:30 -07:00
David Stevens 5f409678eb iproute2: generalize VXLAN forwarding tables
iproute2 patch to generalize VXLAN forwarding tables

This is the iproute2 support allowing an administrator to specify alternate
ports, vnis and outgoing interfaces for VXLAN device forwarding tables.

Changes since v3: changed NDA_PORT to be 16-bit network byte order to match
	changed byte-order/size in the VXLAN driver.

Signed-Off-By: David L Stevens <dlstevens@us.ibm.com>
2013-05-03 13:20:34 -07:00
Stephen Hemminger d85e0a59d4 Add vxlan destination port option
Add ability to set UDP destination port on a per device basis.
If no port is assigned, the default IANA assigned port will be used.
If you want the kernel default value, then use port 0.

The source port range option is now called 'srcport', to avoid
confusion. The old option syntax is accepted for compatibility.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2013-05-03 13:18:45 -07:00
Daniel Borkmann 191b60bd73 ip: ipv6: add tokenized interface identifier support
This patch adds support for tokenized IIDs, that enable
administrators to assign well-known host-part addresses
to nodes whilst still obtaining global network prefix
from Router Advertisements. This is the iproute2 part for
the kernel patch f53adae4eae5 (``net: ipv6: add tokenized
interface identifier support'').

Example commands with iproute2:

Setting a device token:
  # ip token set ::1a:2b:3c:4d/64 dev eth1

Getting a device token:
  # ip token get dev eth1
  token ::1a:2b:3c:4d dev eth1

Listing all tokens:
  # ip token list  (or: ip token)
  token :: dev eth0
  token ::1a:2b:3c:4d dev eth1

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
2013-05-03 13:17:21 -07:00
Stephen Hemminger 79e9a1db11 Update headers to 3.10
Merge in kernel sanitized headers from upstream
2013-05-03 13:15:36 -07:00
Nicolas Dichtel b0a9dbb816 ip: add missing help about mode argument
There are three possibilities: IPv6 only, IPv4 only, or both.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2013-05-03 12:29:22 -07:00
Kamil Rytarowski 824c843556 iproute2 patch against GCC 4.8.0
Hello!

I'm attaching a patch [1] "Feed GCC 4.8.0 against new warning that is
shipped with -Wall: -Wsizeof-pointer-memaccess.".

More details: http://gcc.gnu.org/gcc-4.8/porting_to.html
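The warning fires when the `sizeof` argument of a memory function such as `memset()` or `memcpy()` is the pointer itself rather than the object it points to. Here is a minimal illustration; the struct and function are made up for the example and do not come from iproute2.

```c
#include <string.h>

/* Made-up struct for illustration. */
struct counters {
	unsigned long rx_packets;
	unsigned long tx_packets;
	unsigned long errors;
};

void reset_counters(struct counters *c)
{
	/* Wrong: sizeof(c) is the size of the pointer, so only part of the
	 * struct would be cleared; GCC 4.8 flags this with
	 * -Wsizeof-pointer-memaccess:
	 *
	 *     memset(c, 0, sizeof(c));
	 */
	memset(c, 0, sizeof(*c));	/* right: size of the pointed-to object */
}
```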

Regards,

[1] 0001-Feed-GCC-4.8.0-against-new-warning-that-is-shipped-w.patch

>From 1f3ea01fe2ff61cbbca6474f7d9903a0756a4f44 Mon Sep 17 00:00:00 2001
From: Kamil Rytarowski <n54@gmx.com>
Date: Fri, 3 May 2013 18:43:38 +0200
Subject: [PATCH] Feed GCC 4.8.0 against new warning that is shipped with
 -Wall: -Wsizeof-pointer-memaccess.
2013-05-03 12:10:09 -07:00
Alexander Duyck cfa292defa iproute2: act_ipt fix xtables breakage on older versions.
In trying to build on a RHEL6.3 I ran into several build issues that are
addressed in this patch.

The first is that xtables_merge_options only has 3 parameters.  It appears
this is how this code was originally.  As such for the case where the version
is less than 6 I am assuming it would be correct to maintain the original
setup that only had 3 parameters being passed instead of 4.

I also ran into an issue with the define for __ALIGN_KERNEL not being present.
I believe this may be due to the fact that __ALIGN_KERNEL was moved into a
separate header from ALIGN after the UAPI changes.  In order to just cover all
of the bases I have moved the main definition for the macros into
__ALIGN_KERNEL_MASK and __ALIGN_KERNEL and if ALIGN is also needed then it is
just a direct redefine to __ALIGN_KERNEL.
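The arrangement described above can be sketched as follows. The macro bodies mirror the definitions in the kernel's uapi `linux/const.h`, and the `#ifndef` guards are an assumption about how to avoid clashing with headers that already provide them. The alignment `a` must be a power of two.

```c
/* Define the kernel helpers only when no header has provided them. */
#ifndef __ALIGN_KERNEL
#define __ALIGN_KERNEL(x, a)		__ALIGN_KERNEL_MASK(x, (__typeof__(x))(a) - 1)
#define __ALIGN_KERNEL_MASK(x, mask)	(((x) + (mask)) & ~(mask))
#endif

/* ALIGN is just a direct alias for __ALIGN_KERNEL. */
#ifndef ALIGN
#define ALIGN(x, a)			__ALIGN_KERNEL((x), (a))
#endif
```

For example, `ALIGN(13, 8)` computes `(13 + 7) & ~7`, which rounds 13 up to the next multiple of 8, i.e. 16.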

Cc: Hasan Chowdhury <shemonc@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-05-01 08:01:47 -07:00
Stephen Hemminger 74c2f602f6 v3.9.0 2013-04-30 07:47:54 -07:00
Alexander Duyck 63338dca45 libnetlink: Use ifinfomsg instead of rtgenmsg in rtnl_wilddump_req_filter
This change corrects a kernel incompatibility that was resulting in the
ext_filter_mask not being correctly discovered by the kernel as it is buried
somewhere in the ifinfomsg.

Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
2013-04-26 16:40:30 -07:00
Stephen Hemminger 03fdb011dd ipnetns: fix build on older systems
Debian Squeeze has out of date <sys/mount.h> without the required flags.
2013-04-17 13:35:48 -07:00
Stephen Hemminger 2f9e88f3c9 Revert "add linux/fs.h"
This reverts commit 5abe4685b6.
2013-04-17 13:30:17 -07:00
Stephen Hemminger 5abe4685b6 add linux/fs.h
The ipnetns code needs the MS_SLAVE, MS_SHARED, etc. definitions, which
are in include/linux/fs.h.
2013-04-17 13:26:47 -07:00
Stephen Hemminger 697ac63905 utils: fix range checking for get_u32/get_u64 et al.
Be more careful about overflow in strtoXX routines.
Checks are based on documented interface on man pages.
Based on suggestion from "Mr Dash Four".
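According to the strtoul(3) man page, the function returns ULONG_MAX and sets errno to ERANGE on overflow, and `endptr` shows where parsing stopped. A careful range check therefore examines all three. The following is a minimal sketch in the spirit of get_u32(); `parse_u32()` is a hypothetical name, not the iproute2 implementation.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parser: returns 0 and stores the value on success, -1 on
 * empty input, trailing garbage, or a value that does not fit in 32 bits. */
int parse_u32(uint32_t *val, const char *arg, int base)
{
	unsigned long res;
	char *end;

	if (!arg || !*arg)
		return -1;		/* nothing to parse */

	errno = 0;
	res = strtoul(arg, &end, base);

	if (end == arg || *end)
		return -1;		/* no digits, or trailing characters */
	if (res == ULONG_MAX && errno == ERANGE)
		return -1;		/* out of range for unsigned long */
	if (res > UINT32_MAX)
		return -1;		/* fits a 64-bit unsigned long, not 32 bits */

	*val = (uint32_t)res;
	return 0;
}
```

The separate `res > UINT32_MAX` test matters on 64-bit systems, where strtoul() happily accepts values that a `__u32` cannot hold.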

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2013-04-12 11:40:57 -07:00
Hubert Kario ce93fffe82 add short description of batch mode in tc man page
The tc command is missing documentation of the -batch and -force switches
that are listed by "tc -help".
Add a short description of their syntax and usage.
2013-04-12 09:07:09 -07:00
Petr Sabata 6274b0b759 iproute2: Fix some manpage typos
This patch fixes some of the typos found in iproute2
documentation.

Signed-off-by: Petr Šabata <contyk@redhat.com>
2013-04-05 09:30:05 -07:00
Stephen Hemminger a6d55bf0a0 Update kernel headers to 3.9-rc5 2013-04-01 11:56:36 -07:00
Stephen Hemminger f0124b0f0a ip: remove unnecessary ll_init_map
Don't call ll_init_map on modify operations.
This saves significant overhead with thousands of devices.
2013-03-28 15:17:47 -07:00
Stephen Hemminger 0025e5d63d ll_map: add name and index hash
Make ll_ functions faster by having a name hash, and allow
for deletion. Also, allow them to work without calling ll_init_map.
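A minimal sketch of the idea follows. It is not the actual ll_map.c code. Each entry is linked into two hash chains, one keyed by interface index and one by name, so both lookups avoid a linear scan and an entry can be unlinked from both. The bucket count, the djb2 string hash, and all names are assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAP_BUCKETS 64		/* power of two; arbitrary for this sketch */
#define MAP_NAMSIZ 16		/* same size as IFNAMSIZ */

struct map_entry {
	struct map_entry *idx_next;	/* chain in the index hash */
	struct map_entry *name_next;	/* chain in the name hash */
	unsigned int index;
	char name[MAP_NAMSIZ];
};

static struct map_entry *idx_hash[MAP_BUCKETS];
static struct map_entry *name_hash[MAP_BUCKETS];

static unsigned int hash_name(const char *s)
{
	unsigned int h = 5381;			/* djb2 string hash */

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h & (MAP_BUCKETS - 1);
}

static unsigned int hash_index(unsigned int index)
{
	return index & (MAP_BUCKETS - 1);
}

int map_add(unsigned int index, const char *name)
{
	struct map_entry *e = calloc(1, sizeof(*e));
	unsigned int ih, nh;

	if (!e)
		return -1;
	e->index = index;
	snprintf(e->name, sizeof(e->name), "%s", name);

	ih = hash_index(index);
	nh = hash_name(e->name);
	e->idx_next = idx_hash[ih];		/* push onto both chains */
	idx_hash[ih] = e;
	e->name_next = name_hash[nh];
	name_hash[nh] = e;
	return 0;
}

unsigned int map_name_to_index(const char *name)
{
	struct map_entry *e;

	for (e = name_hash[hash_name(name)]; e; e = e->name_next)
		if (strcmp(e->name, name) == 0)
			return e->index;
	return 0;				/* 0 is never a valid ifindex */
}

const char *map_index_to_name(unsigned int index)
{
	struct map_entry *e;

	for (e = idx_hash[hash_index(index)]; e; e = e->idx_next)
		if (e->index == index)
			return e->name;
	return NULL;
}

void map_del(unsigned int index)
{
	struct map_entry **pp, *e;

	/* Unlink from the index chain, remembering the entry. */
	for (pp = &idx_hash[hash_index(index)]; (e = *pp) != NULL; pp = &e->idx_next)
		if (e->index == index)
			break;
	if (!e)
		return;
	*pp = e->idx_next;

	/* The entry is also on exactly one name chain; unlink it there. */
	for (pp = &name_hash[hash_name(e->name)]; *pp != e; pp = &(*pp)->name_next)
		;
	*pp = e->name_next;
	free(e);
}
```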
2013-03-28 14:57:28 -07:00
Nicolas Dichtel 16f02e145e libnetlink: check flag NLM_F_DUMP_INTR during dumps
When this flag is set, it means that the dump was interrupted and the result
may be inconsistent.
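The check amounts to testing one bit in each message header of the dump. The following is a minimal sketch that walks one received buffer with the standard `NLMSG_OK`/`NLMSG_NEXT` macros. `buffer_has_dump_intr()` is a hypothetical name, and what the caller does next (warn or retry) is left open.

```c
#include <linux/netlink.h>

/* Hypothetical helper: scan one recv() buffer of a netlink dump and
 * return 1 if any message carries NLM_F_DUMP_INTR, 0 otherwise. */
int buffer_has_dump_intr(void *buf, int len)
{
	struct nlmsghdr *h;

	for (h = buf; NLMSG_OK(h, len); h = NLMSG_NEXT(h, len))
		if (h->nlmsg_flags & NLM_F_DUMP_INTR)
			return 1;	/* kernel state changed mid-dump */
	return 0;
}
```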

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2013-03-28 14:44:41 -07:00
David Ward e8740e42ec ip/xfrm: Improve error strings
Quotation marks are now used only to indicate literal text on the
command line.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
2013-03-28 14:42:32 -07:00
David Ward 29665f92c7 ip/xfrm: Improve usage text and documentation
Change ALGO-KEY to ALGO-KEYMAT to make it more obvious that the
keying material might need to contain more than just the key (such
as a salt or nonce value).

List the algorithm names that currently exist in the kernel.

Indicate that for IPComp, the Compression Parameter Index (CPI) is
used as the SPI.

Group the list of mode values by transform protocol.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
2013-03-28 14:40:45 -07:00
David Ward f3b9aa3df8 ip/xfrm: Command syntax should not expect a key for compression
Compression algorithms do not use a key.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
2013-03-28 14:40:45 -07:00
David Ward 8dbe67d2fe ip/xfrm: Do not print a zero-length algorithm key
Signed-off-by: David Ward <david.ward@ll.mit.edu>
2013-03-28 14:40:45 -07:00
David Ward 6128fdfd5c ip/xfrm: Improve transform protocol-specific parameter checking
Ensure that only algorithms and modes supported by the transform
protocol are specified (so that errors are more obvious).

Signed-off-by: David Ward <david.ward@ll.mit.edu>
2013-03-28 14:40:45 -07:00
David Ward ec839527f2 ip/xfrm: Do not allow redundant algorithm combinations to be specified
AEAD algorithms perform both encryption and authentication; they are
not combined with separate encryption or authentication algorithms.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
2013-03-28 14:40:45 -07:00
David Ward 1d26e1fefd ip/xfrm: Extend SPI validity checking
A Security Parameter Index (SPI) is not used with Mobile IPv6. IPComp
uses a smaller 16-bit Compression Parameter Index (CPI), which is
passed as the SPI value. Perform these checks whenever an ID is specified.
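The per-protocol rule can be sketched as a small predicate. `xfrm_spi_valid()` is hypothetical, and treating any nonzero value as an error for Mobile IPv6 is an assumption drawn from the description above. The fallback value 135 for `IPPROTO_MH` is the IANA protocol number of the IPv6 mobility header.

```c
#include <netinet/in.h>
#include <stdint.h>

#ifndef IPPROTO_MH
#define IPPROTO_MH 135		/* IPv6 mobility header */
#endif

/* Hypothetical check: returns 1 if spi is acceptable for the given
 * transform protocol, 0 otherwise. */
int xfrm_spi_valid(int proto, uint32_t spi)
{
	switch (proto) {
	case IPPROTO_MH:
		return spi == 0;	/* Mobile IPv6 does not use an SPI */
	case IPPROTO_COMP:
		return spi <= 0xffff;	/* the CPI is only 16 bits wide */
	default:
		return 1;		/* ESP, AH: full 32-bit SPI */
	}
}
```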

Signed-off-by: David Ward <david.ward@ll.mit.edu>
2013-03-28 14:40:45 -07:00
James Chapman 9c064b5332 iproute2: update ip-l2tp man page
Add documentation about the new l2spec_type parameter for "ip l2tp add
session".

Signed-off-by: James Chapman <jchapman@katalix.com>
2013-03-27 13:20:59 -07:00
James Chapman dd10baa50d iproute2: add l2spec_type param to l2tp add session
When unmanaged L2TP sessions are created using "ip l2tp add session",
there is no option to allow the session's Layer2SpecificHeader type to
be selected - the kernel's default setting is always used. For
interoperability with some vendor equipment, it might be necessary to
use a different setting. So add a new l2spec_type parameter to the "ip
l2tp add session" parameter list, allowing operators to set a specific
Layer2SpecificHeader type. The kernel already exposes the setting as a
netlink attribute so it is straightforward to add support for it in
iproute2.

This change allows unmanaged L2TP sessions to be configured between
Linux and some Cisco equipment by specifying "l2spec_type none" in "ip
l2tp add session" command parameters.

Signed-off-by: James Chapman <jchapman@katalix.com>
2013-03-27 13:20:58 -07:00
Stephen Hemminger 5f21823922 ll_map: use net/if.h to get prototype
Better to get prototype from system headers
2013-03-27 09:28:58 -07:00
Stephen Hemminger 3e26112a02 ll_map: remove unused address fields
The address was being stored but not used by current code.
2013-03-27 09:26:25 -07:00
Stephen Hemminger 1b95cb8d6b tc-tbf: remove ancient references to Alpha
In older versions of traffic shaping the Alpha kernel was special
and had higher HZ. This no longer matters, TC is based on high
resolution timers in the kernel.
2013-03-22 11:18:25 -07:00
Thomas Egerer 0c5982fd7f ip xfrm state: Allow different selector family
My previous commit introduced a patch to allow for states with different
ip address families for selector and id. There must have somehow been a
mix-up of the patch I tested and the one I sent, so the patch sent breaks
the iproute2 build. This patch fixes this. My apologies.

Signed-off-by: Thomas Egerer <hakke_007@gmx.de>
2013-03-20 08:11:54 -07:00
Thomas Egerer 23d5b0d551 ip xfrm state: Allow different selector family
Do not enforce the selector of a state to have the same address family
as the id. This makes it possible to configure inter family states.

Signed-off-by: Thomas Egerer <hakke_007@gmx.de>
2013-03-18 10:23:00 -07:00
Stephen Hemminger 4cd20da16f bridge: add oneline option
Split output of 'bridge link' across multiple lines,
only show the flags if -d is set, and add --oneline option
like ip command.
2013-03-16 10:18:50 -07:00
Vlad Yasevich aa2f1335e9 man: Add documentation for the bridge link operation.
Bridge tool now supports setting and retrieving bridge port specific
link attributes.  Document what attributes are supported and what
they mean.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
2013-03-16 10:02:51 -07:00
Vlad Yasevich b1b7ce0f0d bridge: Add support for printing bridge port attributes
Output new nested bridge port attributes.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
2013-03-16 10:02:18 -07:00
Vlad Yasevich 64108901b7 bridge: Add support for setting bridge port attributes
Add netlink support for bridge port attributes such as cost, priority, state
and flags.  This also adds support for HW mode such as VEPA or VEB.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
2013-03-16 10:01:53 -07:00
Stephen Hemminger 1124ffb721 ipaddress: minor white space cleanup
Convert leading spaces to tabs, and put alias in one printf
2013-03-14 13:47:49 -07:00
Stephen Hemminger d947b2384e ipmaddr: add whitespace around =
fix warning from parser
2013-03-14 13:44:25 -07:00
Petr Šabata 4405123433 iproute2: Mention the 'up' argument in documentation
Both ip-link and ip-address support the 'up' argument; however, this
isn't documented in either their help output or the ip-address manpage.
This patch fixes that.

Signed-off-by: Petr Šabata <contyk@redhat.com>
Reported-by: Jiří Popelka <jpopelka@redhat.com>
2013-03-14 13:26:33 -07:00
Stephen Hemminger e7b24b67db Fix build when shared libraries are disabled
On some platforms, shared libraries are not used. The stub code
needs some updating so that it does not generate errors.
2013-03-13 08:29:59 -07:00
roopa 263c894fd1 Fix -oneline output when alias present
This patch removes the '\n' in -oneline output when an alias is
present on the interface.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
2013-03-12 15:50:13 -07:00
Eric W. Biederman f480917486 iproute2: Document the -D and -I options
While looking into a sysctl regression in decnet on old kernels I
discovered this omission in the iproute2 documentation.

I can't imagine anyone's muscle memory remembering the longer forms.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-11 16:53:37 -07:00
Stephen Hemminger 8ae660941f bridge: cleanup usage message
Fdb usage message got too long.
2013-03-06 11:04:29 -08:00
Vlad Yasevich ab9387104c bridge: Update bridge man pages to include vlan command
Add the vlan command documentation to bridge man page.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
2013-03-06 11:03:10 -08:00
Vlad Yasevich 9eff0e5cc4 bridge: Add vlan configuration support
Recent kernel patches added support for VLAN filtering on the bridge.
This functionality allows one to turn a basic bridge into a VLAN bridge,
where VLANs dictate packet forwarding and header transformation.

To configure the VLANs on the bridge and its ports a new command is
added to the 'bridge' utility.

   # bridge vlan add dev eth0 vid 10 pvid untagged brdev
   # bridge vlan add
   # bridge vlan delete dev eth0 vid 10
   # bridge vlan show

This command supports the following flags:
   master - perform the operation on the software bridge device.  This is
	    the default behavior.
   self  -  perform the operation on the hardware associated with the port.
            This flag is required when the device is the bridge device and
	    the configuration is desired on the bridge device itself (not
	    one of the ports).
   pvid  -  Set the PVID (port vlan id) for a given port.  Any untagged
            frames arriving on the port will be assigned to this vlan.
   untagged - Sets the egress policy for a given vlan.  Default port
            egress policy is tagged.  Set this flag if you wish traffic
            associated with this VLAN to exit the port untagged.
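The `pvid` and `untagged` flags end up as bits in the `flags` field of a `struct bridge_vlan_info` from `linux/if_bridge.h`. The following is a minimal sketch of how such a value could be built. `make_vlan_info()` is a hypothetical helper, and the `master`/`self` selection, which travels in a separate bridge-flags attribute, is deliberately left out.

```c
#include <linux/if_bridge.h>

/* Hypothetical helper: build the per-VLAN info that a command such as
 * "bridge vlan add dev eth0 vid 10 pvid untagged" would describe. */
struct bridge_vlan_info make_vlan_info(__u16 vid, int pvid, int untagged)
{
	struct bridge_vlan_info vinfo = { .vid = vid };	/* flags start at 0 */

	if (pvid)
		vinfo.flags |= BRIDGE_VLAN_INFO_PVID;	  /* untagged ingress -> this VLAN */
	if (untagged)
		vinfo.flags |= BRIDGE_VLAN_INFO_UNTAGGED; /* egress without a tag */
	return vinfo;
}
```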

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
2013-03-06 11:03:08 -08:00
Vlad Yasevich fd08839c73 bridge: Add vlan support to fdb entries
Provide the ability to set and show vlans on FDB entries.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
2013-03-06 11:03:05 -08:00
Stephen Hemminger c6ff4b8344 Revert "generalize VXLAN forwarding tables"
This reverts commit 90ad5ae77e.
2013-03-06 11:02:46 -08:00
David Stevens 90ad5ae77e generalize VXLAN forwarding tables
This is the iproute2 support allowing an administrator to specify alternate
ports, vnis and outgoing interfaces for VXLAN device forwarding tables.

Signed-Off-By: David L Stevens <dlstevens@us.ibm.com>
2013-03-06 10:48:38 -08:00
David Ward 4e9a686020 iplink_vlan: Add flag for Multiple VLAN Registration Protocol (MVRP)
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Acked-by: Patrick McHardy <kaber@trash.net>
2013-03-06 10:46:37 -08:00
David Ward daa45caddb ip/iptunnel: Fix incorrect syntax in man page
Reported-by: Andreas Henriksson <andreas@fatal.se>
Signed-off-by: David Ward <david.ward@ll.mit.edu>
2013-03-06 10:45:26 -08:00
Stephen Hemminger ae70d96656 ipntable: more fixes for ppc64
Not all arch have sizeof(unsigned long long) == sizeof(__u64)
2013-03-04 13:59:39 -08:00
Stephen Hemminger a55a8fd83b fix dependency on sizeof(__u64) == sizeof(unsigned long long)
Some platforms, like ppc64, define __u64 as unsigned long rather than
unsigned long long, so the printf format string would cause errors. Resolve this by using
unsigned long long where necessary (or unsigned long).
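The portable printing pattern can be sketched as follows. `uint64_t` stands in for `__u64` to keep the example self-contained; like `__u64`, it may be `unsigned long` on one platform and `unsigned long long` on another. The function names are hypothetical.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Option 1: cast to the type that "%llu" expects on every platform. */
int format_counter(char *buf, size_t len, uint64_t value)
{
	return snprintf(buf, len, "%llu", (unsigned long long)value);
}

/* Option 2: let <inttypes.h> supply the matching conversion. */
int format_counter_pri(char *buf, size_t len, uint64_t value)
{
	return snprintf(buf, len, "%" PRIu64, value);
}
```

Both produce the same digits. The cast keeps format strings readable, while `PRIu64` avoids the conversion entirely.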
2013-02-28 08:51:46 -08:00
Stephen Hemminger 609106d3af Update kernel headers to 3.9.0-rc1 2013-02-28 08:43:46 -08:00
Stephen Hemminger a7c2882461 ip: fix ipv6 ntable on ppc64
Add casts to handle printf format when
 sizeof(unsigned long long) != sizeof(__u64)
2013-02-27 07:26:17 -08:00
Vijay Subramanian 9235195666 Fix compilation error of m_ipt.c with -Werror enabled
Commit 5a650703d4 ("Makefile: make warnings into
errors") causes the following build error.

gcc -Wall -Wstrict-prototypes -Werror -Wmissing-prototypes
-Wmissing-declarations -Wold-style-definition -O2 -I../include
-DRESOLVE_HOSTNAMES -DLIBDIR=\"/usr/lib\" -DCONFDIR=\"/etc/iproute2\"
-D_GNU_SOURCE -DCONFIG_GACT -DCONFIG_GACT_PROB -DIPT_LIB_DIR=\"/lib/xtables\"
-DYY_NO_INPUT   -c -o m_ipt.o m_ipt.c
cc1: warnings being treated as errors
m_ipt.c:72: error: no previous prototype for 'xtables_register_target'
m_ipt.c:361: error: no previous prototype for 'build_st'
make[1]: *** [m_ipt.o] Error 1

This is fixed by adding the prototype in the header include/iptables.h

I am not sure if this is due to something wrong on my build system but I am
using current glibc 2.17.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
2013-02-26 17:35:26 -08:00
Hannes Frederic Sowa 51ff9f2453 ss: show socket memory stats for unix sockets if requested
The output format is the same as for tcp sockets but only the following
fields are currently non-zero: sk_rcvbuf, sk_wmem_alloc and sk_sndbuf.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
2013-02-26 17:33:49 -08:00
Stephen Hemminger 5048f9a0c5 ss: use rta_getattr_u32 2013-02-26 17:32:58 -08:00
Hannes Frederic Sowa defd61ca91 ss: show send queue length on unix domain sockets
On sockets in listen state Send-Q reports the maximum backlog,
otherwise it reports allocated socket write memory.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
2013-02-26 17:29:24 -08:00
Stephen Hemminger 05e983ea82 v3.8.0 2013-02-21 08:41:20 -08:00
Kees van Reeuwijk 3bed7bb7e7 iproute2: clearer error messages for fifo and tbf qdiscs
Clearer error messages for fifo and tbf qdiscs:
- Say who is complaining
- Don't just say a parameter is bad, show the offending parameter
- Be clearer about duplicate parameters vs illegal pairs of parameters
- Try to give multiple error messages rather than let the user discover the errors one by one
- When there are parameter aliases, try to use the variant that was used, or at least mention them all

Note that in the old version an empty parameter list to tbf would just cause an explain() message
without a specific error message. By simply removing the relevant error check, the code now
handles this error more gracefully by printing an error message for all mandatory parameters.
It still prints the explain() message.

Signed-off-by: Kees van Reeuwijk <reeuwijk@few.vu.nl>
2013-02-21 08:34:34 -08:00
Lutz Jaenicke 257422f77f rtnl_wilddump_request: fix alignment issue for embedded platforms
Platforms have different alignment requirements, which need to be
fulfilled by the compiler. If the structure elements are already
4-byte (NLMSG_ALIGNTO) aligned by the compiler, adding an explicit
padding element (align_rta) is not allowed.
Use __attribute__ ((aligned (NLMSG_ALIGNTO))) in order to achieve
the required alignment.
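The approach can be sketched as a request layout in which the attribute itself carries the alignment, rather than an explicit `__u16 align_rta` member being placed before it. The compiler then inserts exactly as much padding as the target ABI needs. The struct and function names are hypothetical, and the layout is modeled on the dump request described here, not copied from libnetlink.c.

```c
#include <stddef.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical dump request: the rtattr starts on an NLMSG_ALIGNTO
 * boundary whatever size the compiler gives struct rtgenmsg. */
struct dump_req {
	struct nlmsghdr nlh;
	struct rtgenmsg g;
	struct rtattr ext_req __attribute__ ((aligned(NLMSG_ALIGNTO)));
	__u32 ext_filter_mask;
};

size_t ext_req_offset(void)
{
	return offsetof(struct dump_req, ext_req);
}
```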
Experienced on ARM (xscale) with symptom
  netlink: 12 bytes leftover after parsing attributes

Tested on:
  ARM      (32bit Big Endian)
  PowerPC  (32bit Big Endian)
  x86_64   (64bit Little Endian)
Each with different alignment requirements.

Signed-off-by: Lutz Jaenicke <ljaenicke@innominate.com>
2013-02-19 07:45:59 -08:00
Stephen Hemminger f21963fdff Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2 2013-02-19 07:19:12 -08:00
Stephen Hemminger caae16b3b8 ip: handle flush with table > 2^31
Fixes Debian bug #700434
The table id in the filter needs to be unsigned to avoid conversion to -1.

The documentation for "ip" suggests that, when using multiple routing tables, the table ID can be an arbitrary 32 bit number. I've been writing a script that calculates a table Id based on an IP address and sets up tables accordingly based on it. This seems to work for everything I've tried except "ip route flush". If you specify a table to flush with an ID over 2^31, it flushes all IPv4 routing tables. For example:

Will delete all routing tables, including the default one. Needless to say, this is quite annoying. I think this is an upstream bug, but your opinions will be greatly appreciated.
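The failure mode can be illustrated with a filter in which 0 means "all tables". This is a hypothetical sketch of the mechanism, not the actual ip/iproute.c code. On common compilers such as GCC, converting an id of 2^31 or more to `int` wraps to a negative value, so a `> 0` guard stops restricting the match.

```c
#include <stdint.h>

/* Hypothetical filter check: filter_tb == 0 means "all tables", any
 * other value restricts the operation to that one table. */

/* Buggy variant: an id >= 2^31 stored in an int becomes negative, so the
 * "> 0" guard is false and every table matches. */
int table_matches_signed(int filter_tb, uint32_t table)
{
	return !(filter_tb > 0 && (uint32_t)filter_tb != table);
}

/* Fixed variant: keep the table id unsigned end to end. */
int table_matches_unsigned(uint32_t filter_tb, uint32_t table)
{
	return !(filter_tb > 0 && filter_tb != table);
}
```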
2013-02-12 11:42:57 -08:00
Stephen Hemminger 6398d3a652 Makefile: turn on warnings about missing prototypes
Catches missing, dead code and also places where function should be static.
2013-02-12 11:39:32 -08:00
Stephen Hemminger 46ac8a5550 lib: make string arguments const
For lookup routines, make arguments const where possible.
2013-02-12 11:39:07 -08:00
Stephen Hemminger d1f28cf181 ip: make local functions static 2013-02-12 11:38:35 -08:00
Kees van Reeuwijk 14645ec231 iproute2: improved error messages
This patch improves many error messages as follows:
- For incorrect parameters, show the value of the offending parameter, rather than just say that it is incorrect
- Rephrased messages for clarity
- Rephrased to more `mainstream' English

Signed-off-by: Kees van Reeuwijk <reeuwijk@few.vu.nl>
2013-02-11 09:22:22 -08:00
Kees van Reeuwijk b8a05839f3 iproute2: clarifications in the libnetlink.3 man page
Rephrasing for clarity.

Signed-off-by: Kees van Reeuwijk <reeuwijk@few.vu.nl>
2013-02-11 09:22:19 -08:00
Kees van Reeuwijk ecf52428da iproute2: add a missing return statement
Since do_help() has to return an int to fit in the table of commands,
it should actually return an int. This patch lets it do so.

Signed-off-by: Kees van Reeuwijk <reeuwijk@few.vu.nl>
2013-02-11 09:22:17 -08:00
Kees van Reeuwijk 089d8f36dd iproute2: clarifications in the tc-hfsc.7 man page
Improved man page as follows:
- Use more `mainstream' English
- Rephrased for clarity
- Use standard notation for units

Signed-off-by: Kees van Reeuwijk <reeuwijk@few.vu.nl>
2013-02-11 09:22:14 -08:00
Kees van Reeuwijk 4957250166 iproute2: clarification of various man8 pages
Rephrasing for clarity.

Note that in ip-rule.8 I rephrased a sentence to "The RPDB is scanned
in order of decreasing priority." The original version talked about
*in*creasing priority, but from the context that didn't make sense.

Signed-off-by: Kees van Reeuwijk <reeuwijk@few.vu.nl>
2013-02-11 09:22:06 -08:00
Benjamin Poirier 5ab3a4de5e Use pkg-config to obtain xtables.h path
On openSUSE 12.2 (at least) xtables.h is not installed in the system-wide
include dir but in /usr/include/iptables-1.4.16.3/. This results in the
following build failure:
em_ipset.c:26:21: fatal error: xtables.h: No such file or directory

Other includers of xtables.h already call out to pkg-config
2013-02-11 09:19:54 -08:00
Stephen Hemminger 1cb6a110d6 ip: change format of promiscuity display
Don't put it on separate line, keep it on line with link address.
2013-02-05 08:16:28 -08:00
Nicolas Dichtel ede6a3eaf5 iplink: display the value of IFLA_PROMISCUITY
This is useful to know the 'real' status of an interface (the flag IFF_PROMISC
is exported by the kernel only when the user set it explicitly, for example it
will not be exported when a tcpdump is running).

This information will be displayed when '-details' is provided by the user.

Example:
$ ip -d l l tun10
6: tun10: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN mode DEFAULT
    link/sit 10.16.0.249 peer 10.16.0.121
    sit remote 10.16.0.121 local 10.16.0.249 ttl inherit pmtudisc 6rd-prefix 2002::/16
    promiscuity 2

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2013-02-05 08:06:59 -08:00
Johannes Naab e72ca3fbb0 iproute2: tc netem rate: allow negative packet/cell overhead
by fixing the parsing of command-line tokens

Signed-off-by: Johannes Naab <jn@stusta.de>
2013-02-04 09:06:50 -08:00
Nicolas Dichtel d36035185c ipaddr: fix a typo in error msg about SIOCGIFTXQLEN
The optname was wrong.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2013-02-04 09:05:31 -08:00
David Ward e59fd3db2e ip/iptunnel: Extend TOS syntax
The 'inherit/STRING' or 'inherit/00..ff' syntax indicates that the
TOS field of tunneled packets should be copied from the original IP
header, but for non-IP packets the value STRING or 00..ff should be
used instead. (This syntax is already used by 'ip tunnel show'.)

Also clarify the man page and the command usage text (particularly
that the TOS is not specified as a decimal number).

Signed-off-by: David Ward <david.ward@ll.mit.edu>
2013-02-04 08:56:45 -08:00
Stephen Hemminger 53403c53af libnetlink: add caveat
There are much better APIs for netlink now. Encourage users to look
elsewhere.
2013-02-04 08:54:17 -08:00
Stephen Hemminger 5a650703d4 Makefile: make warnings into errors
Don't let contributions cause warnings.
2013-02-04 08:51:44 -08:00
Eric W. Biederman 9a7b3d91b6 iproute2: Add "ip netns pids" and "ip netns identify"
Add commands that map between network namespace names and process
identifiers.  The code builds and runs against older kernels but
only works on Linux 3.8+ kernels, where I have fixed stat to work
properly.
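Mapping between names and processes comes down to deciding whether two paths, such as `/proc/PID/ns/net` and a bind mount under `/var/run/netns/NAME`, refer to the same namespace. This works because stat() reports the same device and inode number for both. The following is a minimal sketch of that comparison; `same_inode()` is a hypothetical helper.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if both paths resolve to the same inode
 * on the same device, 0 if they differ, -1 if either stat() fails. */
int same_inode(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) < 0 || stat(b, &sb) < 0)
		return -1;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

As an example, `same_inode("/proc/1234/ns/net", "/var/run/netns/blue")` would return 1 if process 1234 lives in the namespace named "blue", assuming the kernel provides the stat behavior described above.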

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-04 08:35:07 -08:00
Eric W. Biederman 1e9014a7a6 iproute2: Fill in the ip-netns.8 manpage
Document ip netns monitor.

Add a few sentences describing each command.  The manpage was looking
very scrawny.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-04 08:34:00 -08:00
Eric W. Biederman 58a3e8270f iproute2: Make "ip netns delete" more likely to succeed
Sometimes "ip netns delete" fails because it can not delete the file a
network namespace was mounted on.  If this only happened when a
network namespace was really in use this would be fine, but today it
is possible to pin all network namespaces by simply having a long
running process started with "ip netns exec".

Every mount is copied when a network namespace is created so it is
impossible to prevent the mounts from getting into other mount
namespaces.  Modify all mounts in the files and subdirectories of
/var/run/netns to be shared mount points so that unmount events can
propagate, making it unlikely that "ip netns delete" will fail because
a directory is mounted in another mount namespace.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-04 08:33:58 -08:00
Eric W. Biederman 4395d48c78 iproute2: Improve "ip netns add" failure error message
Report the name of the network namespace that could not be
added.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-04 08:33:55 -08:00
Eric W. Biederman 8e2d47dce2 iproute2: Normalize return codes in "ip netns"
Ben Hutchings pointed out that the return value of do_netns is passed
to exit and the current convention of returning -1 for failure is
inconsistent with that reality.

Return EXIT_FAILURE instead of -1 and EXIT_SUCCESS instead of 0, to make
it clear that the return codes are expected to be passed to exit.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-04 08:33:53 -08:00
Eric W. Biederman 144e6ce167 iproute2: Don't propagate mounts out of ip
Some systems are now following the advice in
linux/Documentation/sharedsubtrees.txt and running with all mount
points shared between all mount namespaces by default.

After creating the mount namespace call mount on / with
MS_SLAVE|MS_REC to modify all mounts in the new mount namespace to
slave mounts if they are shared or private mounts otherwise.
This guarantees that changes to the mount namespace created with
"ip netns exec" don't propagate to other namespaces.

Reported-by: Petr Šabata <contyk@redhat.com>
Tested-by: Petr Šabata <contyk@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-04 08:33:50 -08:00
Stephen Hemminger 003f76f026 README: update mail address and download location 2013-01-18 09:54:58 -08:00
Mike Frysinger 048bff6e02 ipxfrm: use alloca to allocate stack space
Clang doesn't support the gcc extension for embedding flexible arrays
inside of structures.  Use the slightly more portable alloca().
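The alternative can be sketched as sizing the stack buffer explicitly and pointing the flexible-array struct at it. The earlier approach wrapped the struct and a trailing buffer inside another struct, which relies on the gcc extension. `struct algo_req` and `build_req()` are made-up names standing in for the xfrm structures that ipxfrm builds.

```c
#include <alloca.h>
#include <stddef.h>
#include <string.h>

/* Made-up request type with a trailing flexible array member. */
struct algo_req {
	unsigned int key_len;
	char key[];			/* flexible array member */
};

/* Builds the request on the stack and returns its total size; the
 * alloca() memory is released automatically when this function returns.
 * first_byte receives key[0], or 0 for an empty key. */
size_t build_req(const char *key, size_t key_len, unsigned char *first_byte)
{
	size_t total = sizeof(struct algo_req) + key_len;
	struct algo_req *req = alloca(total);

	req->key_len = (unsigned int)key_len;
	memcpy(req->key, key, key_len);
	*first_byte = key_len ? (unsigned char)req->key[0] : 0;
	return total;
}
```

Because alloca() memory disappears when the function returns, the request must be fully used (for example, copied into a netlink message) before returning.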

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2013-01-18 08:17:12 -08:00
Jamal Hadi Salim 852d51222d iproute2: act_ipt fix xtables breakage
Fixes breakage with xtables API starting with version 1.4.10

Signed-off-by: Hasan Chowdhury <shemonc@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-01-16 08:14:48 -08:00
Mike Frysinger 55eaaeb57a do not ignore errors in man subdirs
If an error occurs in a man subdir, make sure we propagate it back up.

While we're here, merge the duplicate rules into one.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2013-01-15 09:41:37 -08:00
Mike Frysinger 5746307300 add man7 to subdirs list
The man directory is missing man7 from its subdirs list, which means none
of the man7 pages get installed.

URL: https://bugs.gentoo.org/451166
Reported-by: Marcin Mirosław <bug@mejor.pl>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2013-01-15 09:41:32 -08:00
Strake 5bd9dd49ae include needed files
Needed to build iproute2 with musl
2012-12-23 11:49:06 -08:00
Cong Wang e29d8cc616 bridge: update help
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-12-20 10:56:06 -08:00
Cong Wang 0ff8f578ed bridge: make `bridge mdb` output consistent with input
bridge -> dev
group -> grp

Signed-off-by: Cong Wang <amwang@redhat.com>
2012-12-20 10:55:55 -08:00
Cong Wang d8b75d1ad2 bridge: distinguish permanent and temporary mdb entries
This patch adds a flag to mdb entries so that we can distinguish
permanent entries from temporary ones.

Signed-off-by: Cong Wang <amwang@redhat.com>
2012-12-20 10:54:19 -08:00
Stephen Hemminger 75e003c23e bridge: update kernel headers 2012-12-20 08:24:05 -08:00
Stephen Hemminger ae7b9a0d5c configure: restore old behaviour
Previous change wasn't needed, since merge of
	configure: move toolchain init to a function
2012-12-19 16:01:39 -08:00
Stephen Hemminger 07a6f5eca2 build: indent shell functions in configure
The script has lots of shell functions, but they were never indented properly.
2012-12-18 09:20:13 -08:00
Jan Engelhardt d29feaaa35 build: unbreak linkage of m_xt.so
Commit v3.7.0~10 caused the new PKG_CONFIG variable never
to be present at the time of calling make, leading to tc/m_xt.so
not being linked with -lxtables (the result of pkg-config xtables --libs),
which in turn leads to

tc: symbol lookup error: /usr/lib64/tc//m_xt.so: undefined symbol:
xtables_init_all

Fixing that.

Signed-off-by: Jan Engelhardt <jengelh@inai.de>
2012-12-18 09:18:45 -08:00
Mike Frysinger 95d9d665d9 configure: pull AR from the env too
This matches the existing CC behavior.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2012-12-17 09:13:46 -08:00
Mike Frysinger 691c8a6567 lib: include the Config file too
The lib makefile doesn't include Config which means it misses
setting up toolchain vars that it includes.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2012-12-17 09:13:46 -08:00
Mike Frysinger 601f60e552 configure: move toolchain init to a function
The layout of this file uses functions to update Config.  Move the
toolchain logic to the same style to fix setting the vars in Config.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2012-12-17 09:13:46 -08:00
Nicolas Dichtel cbe195dc6b ip: update man pages and usage() for 'ip monitor'
Sync with the current code.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2012-12-17 08:47:51 -08:00
Nicolas Dichtel 743a00a72b ip: add man pages for netconf
This patch adds the documentation for the 'ip netconf' command.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2012-12-17 08:47:51 -08:00
Zhi Yong Wu 602e9d36ba ip: add the type 'vxlan' in the output of "ip link help"
The new type 'vxlan' is added in the output of "ip link help"

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
2012-12-17 08:15:57 -08:00
Nicolas Dichtel e34d3dcce2 ip: use rtnetlink to manage mroute
mroute was using /proc/net/ip_mr_[vif|cache] to display mroute entries. Hence,
only RT_TABLE_DEFAULT and only IPv4 entries were displayed.
With rtnetlink, it is possible to display all tables for IPv4 and IPv6. The output
format is kept. Also, as before the patch, statistics are displayed when the user specifies
the '-s' argument.

The patch also adds the support of 'ip monitor mroute', which is now possible.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2012-12-14 10:08:17 -08:00
Nicolas Dichtel e509fb1b68 ip: term OPTIONS was used twice in 'ip route' man pages
INFO_SPEC already uses the term 'OPTIONS' and describes it.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2012-12-14 10:00:31 -08:00
Nicolas Dichtel 77987911e5 ip: update man pages for 'ip link'
Now 'ip link' supports ipip, sit and ip6tnl.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2012-12-14 10:00:31 -08:00
Nicolas Dichtel 2a898320be ip: update man pages and usage() for 'ip mroute'
Sync with the current code.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2012-12-14 09:56:47 -08:00
Nicolas Dichtel 195f0f62d7 ip/link_iptnl: fix indentation
Use tabs instead of space when possible.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2012-12-14 09:50:33 -08:00
Cong Wang 176659e38e iproute2: update usage info of bridge monitor
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-12-14 09:11:15 -08:00
Cong Wang 4a4ee61699 iproute2: add support to monitor mdb entries too
This patch implements `bridge monitor mdb`.

Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-12-12 10:27:46 -08:00
Cong Wang 9dca676721 iproute2: implement add/del mdb entry
This patch implements:

	bridge mdb { add | del } dev DEV port PORT grp GROUP

Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-12-12 10:27:46 -08:00
David L Stevens 1556e29d3c add DOVE extensions for iproute2
This patch adds a new flag to iproute2 for vxlan devices to enable
DOVE features. It also adds support for L2 and L3 switch lookup miss
netlink messages to "ip monitor".

Changes since v2: fix merge conflict
Changes since v1:
	- split "dove" flag into separate feature flags:
		- "proxy" for ARP reduction
		- "rsc" for route short circuiting
		- "l2miss" for L2 switch miss notifications
		- "l3miss" for L3 switch miss notifications

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
2012-12-12 10:02:19 -08:00
Nicolas Dichtel 1ce2de9738 ip: add support of 'ip link type [ipip|sit]'
This patch allows IP tunnels to be managed via the 'ip link' interface.
The parameter syntax is the same as for 'ip tunnel'.

It also allows tunnel parameters to be displayed with 'ip -details link' or
'ip -details monitor link'.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2012-12-12 09:10:22 -08:00
Nicolas Dichtel 9d0efc1048 ip: add support of 'ip link type ip6tnl'
This patch allows IPv6 tunnels to be managed via the 'ip link' interface.
The parameter syntax is the same as for 'ip -6 tunnel'.

It also allows tunnel parameters to be displayed with 'ip -details link' or
'ip -details monitor link'.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2012-12-12 09:09:23 -08:00
Nicolas Dichtel 4852ba750a ip: add support of netconf messages
Example of the output:
$ ip monitor netconf&
[1] 24901
$ echo 0 > /proc/sys/net/ipv6/conf/all/forwarding
ipv6 dev lo forwarding off
ipv6 dev eth0 forwarding off
ipv6 all forwarding off
$ echo 1 > /proc/sys/net/ipv4/conf/eth0/forwarding
ipv4 dev eth0 forwarding on

$ ip -6 netconf
ipv6 all forwarding on mc_forwarding 0
$ ip netconf show dev eth0
ipv4 dev eth0 forwarding on rp_filter off mc_forwarding 1

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>

Minor cleanup of the original patch; made sure netconf.h matched the
result of sanitized kernel headers.
2012-12-12 09:05:51 -08:00
Andreas Henriksson caadda9308 iproute2: fix tc ematch manpage section
The debian package checking tool, lintian, spotted that the
tc ematch manpage seems to have an error in the specified section.

Signed-off-by: Andreas Henriksson <andreas@fatal.se>
2012-12-12 08:15:56 -08:00
Cong Wang e06c7f7e2e iproute2: add mdb sub-command to bridge
Sample output:

	# ./bridge/bridge mdb show dev br0
	bridge br0 port eth1 group 224.8.8.9
	bridge br0 port eth0 group 224.8.8.8

	# ./bridge/bridge -d mdb show dev br0
	bridge br0 port eth1 group 224.8.8.9
	bridge br0 port eth0 group 224.8.8.8
	router ports on br0: eth0

Signed-off-by: Cong Wang <amwang@redhat.com>
2012-12-11 16:46:22 -08:00
Stephen Hemminger 08342500ee bridge: add if_bridge.h header
Since the system may not have up-to-date kernel headers, keep if_bridge.h
in the set of exported headers used to build iproute.
2012-12-11 16:43:36 -08:00
Stephen Hemminger 910773dc0d Update kernel headers to 3.8-pre
Sanitized headers from net-next
2012-12-11 11:16:36 -08:00
Stephen Hemminger 6abef21b3e v3.7.0 2012-12-11 09:52:39 -08:00
Petr Sabata 7de7e5915a iproute2: ss - change default filter to include all socket types
Currently the default filter lists TCP sockets only which is
rather confusing especially when the '-a/--all' flag is used.
This patch changes the default to include all sockets, imitating
netstat(8) behavior.

Signed-off-by: Petr Šabata <contyk@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
2012-12-11 09:50:39 -08:00
Stephen Hemminger efa344f35c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2 2012-12-07 09:04:38 -08:00
Rostislav Lisovy 8f2550ab5d tc: add canid ematch to ematch_map
The canid ematch has been added in commit:

7b5f30e Ematch used to classify CAN frames according to their identifiers

But the corresponding entry in etc/iproute2/ematch_map was lost. This patch
adds the missing entry in ematch_map; otherwise tc would complain:

Error: Unable to find ematch "canid" in /etc/iproute2/ematch_map
Please assign a unique ID to the ematch kind the suggested entry is:
        7       canid

Signed-off-by: Rostislav Lisovy <lisovy@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2012-11-30 08:27:58 -08:00
Nicolas Dichtel df5574d066 ip/ip6tunnel: fix update of tclass and flowlabel
When the tclass or flowlabel field was updated, we only ORed in the new
value. For example, it was not possible to reset tclass:
  ip -6 tunnel change ip6tnl2 tclass 0

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2012-11-16 08:15:39 -08:00
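The OR-only update above can only set bits, never clear them. Below is a minimal C sketch of the clear-then-set fix. It assumes the usual IPv6 flowinfo layout: 4-bit version, 8-bit traffic class, 20-bit flow label. For readability it works in host byte order. The mask and function names are hypothetical, not iproute2's own.

```c
#include <stdint.h>

/* Host-byte-order masks for the IPv6 flowinfo word (assumed layout:
 * 4-bit version, 8-bit traffic class, 20-bit flow label). */
#define FLOWINFO_TCLASS    0x0FF00000u
#define FLOWINFO_FLOWLABEL 0x000FFFFFu

/* Buggy update: OR-ing can only set bits, so "tclass 0" changes nothing. */
static uint32_t set_tclass_or(uint32_t flowinfo, uint8_t tclass)
{
	return flowinfo | ((uint32_t)tclass << 20);
}

/* Fixed update: clear the traffic-class field, then insert the new value. */
static uint32_t set_tclass(uint32_t flowinfo, uint8_t tclass)
{
	flowinfo &= ~FLOWINFO_TCLASS;
	return flowinfo | ((uint32_t)tclass << 20);
}

/* Same pattern for the 20-bit flow label. */
static uint32_t set_flowlabel(uint32_t flowinfo, uint32_t label)
{
	flowinfo &= ~FLOWINFO_FLOWLABEL;
	return flowinfo | (label & FLOWINFO_FLOWLABEL);
}
```

With the OR-only version, resetting a traffic class of 0xAB to 0 leaves 0xAB in place. The masked version clears it and leaves the flow label untouched.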
Nicolas Dichtel 3f83dce573 ip/ip6tunnel: reset encap limit flag on change
Flag IP6_TNL_F_IGN_ENCAP_LIMIT is set when encaplimit is none, but it was not
removed if encaplimit was set on update (ip tunnel change).

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2012-11-16 08:15:39 -08:00
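A minimal C sketch of the flag handling described above. The flag value mirrors IP6_TNL_F_IGN_ENCAP_LIMIT and is assumed to be 0x1 here. The struct is a hypothetical subset of the tunnel parameters, not the real `struct ip6_tnl_parm`.

```c
#include <stdint.h>

/* Assumed to mirror IP6_TNL_F_IGN_ENCAP_LIMIT from <linux/ip6_tunnel.h>. */
#define TNL_F_IGN_ENCAP_LIMIT 0x1u

/* Hypothetical subset of the IPv6 tunnel parameters. */
struct tnl_parm {
	uint8_t  encap_limit;
	uint32_t flags;
};

/* A negative limit stands for "encaplimit none". */
static void apply_encaplimit(struct tnl_parm *p, int limit)
{
	if (limit < 0) {
		p->flags |= TNL_F_IGN_ENCAP_LIMIT;
	} else {
		p->encap_limit = (uint8_t)limit;
		/* The missing step: a numeric limit must drop the "ignore" flag,
		 * otherwise a tunnel once set to "none" stays that way on change. */
		p->flags &= ~TNL_F_IGN_ENCAP_LIMIT;
	}
}
```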
Nicolas Dichtel d0c8420c09 ip/ip6tunnel: fix help for TCLASS
Help is "[tclass TCLASS]", but only TOS was described.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2012-11-16 08:15:39 -08:00
Wookey 35122a7500 configure: respect $CC environment var override
Enables e.g. cross-compiling by setting the $CC env var.  This patch was
extracted from the Ubuntu package (thanks, Wookey and Colin Watson).

BugLink: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=670660
BugLink: https://bugs.launchpad.net/bugs/870197

Signed-off-by: Kamal Mostafa <kamal@debian.org>
2012-11-16 08:06:19 -08:00
Nicolas Dichtel 8b2f2d777c ip/ip6tunnel: fix update of tclass and flowlabel
When the tclass or flowlabel field was updated, we only ORed in the new
value. For example, it was not possible to reset tclass:
  ip -6 tunnel change ip6tnl2 tclass 0

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2012-11-14 19:42:46 +01:00
Nicolas Dichtel 1da845409e ip/ip6tunnel: reset encap limit flag on change
Flag IP6_TNL_F_IGN_ENCAP_LIMIT is set when encaplimit is none, but it was not
removed if encaplimit was set on update (ip tunnel change).

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2012-11-14 19:42:46 +01:00
Nicolas Dichtel 2a930d24bc ip/ip6tunnel: fix help for TCLASS
Help is "[tclass TCLASS]", but only TOS was described.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2012-11-14 19:42:46 +01:00
Andreas Henriksson 2c389b0f31 iproute2: drop libresolv
Hello!

While building the iproute package in Debian I get warnings from
package helpers like this:

dpkg-shlibdeps: warning: package could avoid a useless dependency if debian/iproute/sbin/tc debian/iproute/usr/bin/lnstat debian/iproute/bin/ip debian/iproute/bin/ss debian/iproute/sbin/bridge debian/iproute/sbin/rtmon were not linked against libresolv.so.2 (they use none of the library's symbols)

The -lresolv in ./Makefile seems to come from pre-historic times (before
iproute2 git history, possibly from libc5/pre-glibc days).
I couldn't find out if/why there was any reason for linking to libresolv.
Does anyone know if there are any valid reasons for keeping it still?

If not, I'd be happy to see it go.... while at it I also removed includes
of <resolv.h> which I also couldn't find any reason for, but this is
just an added bonus of the patch (and there are probably more unneeded
includes that could be dropped in the same sources).

Signed-off-by: Andreas Henriksson <andreas@fatal.se>
2012-11-12 08:50:15 -08:00
Stephen Hemminger cc57430206 man: fix incorrect use of "it's"
A couple of places were using "it's" where the possessive form "its"
should be used instead.
2012-11-12 08:05:45 -08:00
Andreas Henriksson 987e9d710a iproute2: avoid errors from double-installing manpages
Three manpages in man8 are listed twice in MAN8PAGES (both directly and
in TARGETS), which causes the install command to spit out a couple of
warnings, as shown below, and exit with a non-zero exit code:

make[3]: Entering directory `/tmp/buildd/iproute-20121001/man/man8'
install -m 0755 -d /tmp/buildd/iproute-20121001/debian/tmp/usr/share/man/man8
install -m 0644 ip-address.8 ip-link.8 ip-route.8 ip.8 arpd.8 lnstat.8 routel.8 rtacct.8 rtmon.8 ss.8 tc.8 tc-bfifo.8 tc-cbq.8 tc-cbq-details.8 tc-choke.8 tc-codel.8 tc-drr.8 tc-ematch.8 tc-fq_codel.8 tc-hfsc.8 tc-htb.8 tc-netem.8 tc-pfifo.8 tc-pfifo_fast.8 tc-prio.8 tc-red.8 tc-sfb.8 tc-sfq.8 tc-stab.8 tc-tbf.8 bridge.8 rtstat.8 ctstat.8 nstat.8 routef.8 ip-address.8 ip-addrlabel.8 ip-l2tp.8 ip-link.8 ip-maddress.8 ip-monitor.8 ip-mroute.8 ip-neighbour.8 ip-netns.8 ip-ntable.8 ip-route.8 ip-rule.8 ip-tunnel.8 ip-xfrm.8 /tmp/buildd/iproute-20121001/debian/tmp/usr/share/man/man8
install: will not overwrite just-created `/tmp/buildd/iproute-20121001/debian/tmp/usr/share/man/man8/ip-address.8' with `ip-address.8'
install: will not overwrite just-created `/tmp/buildd/iproute-20121001/debian/tmp/usr/share/man/man8/ip-link.8' with `ip-link.8'
install: will not overwrite just-created `/tmp/buildd/iproute-20121001/debian/tmp/usr/share/man/man8/ip-route.8' with `ip-route.8'
make[3]: *** [install] Error 1
make[3]: Leaving directory `/tmp/buildd/iproute-20121001/man/man8'
make[2]: *** [install] Error 2
make[2]: Leaving directory `/tmp/buildd/iproute-20121001/man'

Signed-off-by: Andreas Henriksson <andreas@fatal.se>
2012-11-11 16:22:31 -08:00
Mike Frysinger e4fc4ada33 allow pkg-config to be customized
Rather than hard coding `pkg-config`, use ${PKG_CONFIG} so people can
override it to their specific version (like when cross-compiling).

This is the same way the upstream pkg-config code works.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2012-11-11 16:21:34 -08:00
Stephen Hemminger 1465db1a14 bridge: use rta_getattr_xxx wrappers
Don't peek at RTA_DATA() directly.
2012-10-29 17:54:09 -07:00
Stephen Hemminger 38df7ac95d bridge: remove trailing whitespace 2012-10-29 17:48:55 -07:00
Pavel Emelyanov 346f8ca814 ss: Get udp sockets info via sock-diag
Now everything is prepared for it, so the patch is straightforward.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-10-26 17:48:49 -07:00
Pavel Emelyanov 886d19d6c9 ss: Support sock-diag
That is, write the code that sends the diag request in the new format. It's
mostly copied from the tcp-diag code. Also, sock-diag differentiates sockets
by family, so we have to send two requests sequentially.

If we fail to submit the new sock-diag request, fall back to submitting
the legacy tcp-diag one.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-10-26 17:48:49 -07:00
Pavel Emelyanov 746a695f86 ss: Split inet_show_netlink into parts
The existing function inet_show_netlink sends tcp-diag request and
then receives back the response and prints it on the screen.

The sock-diag and legacy tcp-diag have different request types, but
report sockets in the same format. In order to support both it's
convenient to split the code into sending and receiving parts.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-10-26 17:48:49 -07:00
Pavel Emelyanov 3fe5b534fe ss: Rename some tcp- names into inet-
sock-diag can diagnose UDP sockets as well. Prepare the ss code for
this by first renaming the soon-to-be-generic tcp-* names to inet-*.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-10-26 17:48:48 -07:00
Vincent Bernat 4d6c3796a5 ip: fix "ip -6 route add ... nexthop"
IPv6 multipath routes were not accepted by "ip route" because an IPv4
address was expected for each gateway. Use `get_addr()` instead of
`get_addr32()`.

Signed-off-by: Vincent Bernat <bernat@luffy.cx>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2012-10-25 09:07:01 -07:00
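The fix above replaces an IPv4-only parser with a family-aware one. The sketch below shows the idea using only POSIX `inet_pton()`. `parse_gateway()` and `struct addr` are hypothetical stand-ins, not iproute2's `get_addr()` or its `inet_prefix` type.

```c
#include <arpa/inet.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical address holder, large enough for IPv4 or IPv6. */
struct addr {
	int family;
	unsigned char data[16];
};

/* Parse a gateway for the route's family; AF_UNSPEC accepts either. */
static int parse_gateway(struct addr *a, const char *s, int family)
{
	memset(a, 0, sizeof(*a));
	if ((family == AF_INET || family == AF_UNSPEC) &&
	    inet_pton(AF_INET, s, a->data) == 1) {
		a->family = AF_INET;
		return 0;
	}
	if ((family == AF_INET6 || family == AF_UNSPEC) &&
	    inet_pton(AF_INET6, s, a->data) == 1) {
		a->family = AF_INET6;
		return 0;
	}
	return -1;  /* a 32-bit-only parser fails here for every IPv6 nexthop */
}
```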
Stephen Hemminger f1a6f4e985 iplink: add vxlan to man page
Also sort link types for clarity
2012-10-25 09:01:29 -07:00
Or Gerlitz de0389935f iplink: Added support for the kernel IPoIB RTNL ops
Added support for the IPoIB rtnl ops, through which one can create, configure,
query and delete IPoIB devices, for example

 $ ip link add link ib0 name ib0.8001 type ipoib pkey 0x8001
 $ ip link add link ib0 name ib0.1 type ipoib mode connected
 $ ip --details link show dev ib0.1

Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
2012-10-25 08:53:12 -07:00
Stephen Hemminger e95c8fc3b1 Update kernel headers to 3.7-rc1
Get new sanitized headers
2012-10-19 13:31:05 -07:00
Stephen Hemminger b64da5a5e0 vxlan: only send group address if defined
Don't send 0 as group address.
2012-10-19 13:25:17 -07:00
Stephen Hemminger 2d596120cf vxlan: add support for port range 2012-10-09 23:39:17 -07:00
Julian Anastasov ea63a69b6d iproute2: add support for tcp_metrics
ip tcp_metrics/tcpmetrics

	We support get/del for single entry and dump for
show/flush.

v3:
 - fix rtt/rttvar shifts as suggested by Eric Dumazet
 - show rtt/rttvar usecs as suggested by David Laight

Signed-off-by: Julian Anastasov <ja@ssi.bg>
2012-10-08 10:23:07 -07:00
Nicolas Dichtel 6ea3ebafe0 iproute2: inform user when a neighbor is removed
When running 'ip monitor neigh', there is no hint to tell if a neighbor is
updated or deleted.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2012-10-08 09:48:23 -07:00
Stephen Hemminger 253eb98b77 Merge branch 'vxlan'
Conflicts:
	include/linux/if_link.h
2012-10-03 08:52:59 -07:00
Matt Burgess 92905c6e0d iproute2-3.6.0 assumes presence of iptables
Hi,

When compiling iproute2-3.6.0 on a host that doesn't have iptables available, I get the following error:

gcc -Wall -Wstrict-prototypes -O2 -I../include -DRESOLVE_HOSTNAMES
-DLIBDIR=\"/usr/lib\" -DCONFDIR=\"/etc/iproute2\" -D_GNU_SOURCE
-DCONFIG_GACT -DCONFIG_GACT_PROB -DYY_NO_INPUT   -c -o em_ipset.o
em_ipset.c
em_ipset.c:26:21: fatal error: xtables.h: No such file or directory

Fixed by the following patch, which guards the building of em_ipset.o on
the presence of suitable headers.

Thanks,

Matt.
2012-10-03 08:51:29 -07:00
Stephen Hemminger f22640712f Update headers to 3.7-pre-rc
Get latest headers from merge
2012-10-03 08:48:37 -07:00
Petr Písař 7f747fd937 iproute2: List interfaces without net address by default
This fixes a regression in iproute2-3.5.1 where `ip addr show' skipped
interfaces without a network layer address.

Wrong output:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:50:54:00:0f:03 brd ff:ff:ff:ff:ff:ff
    inet 10.34.25.198/23 brd 10.34.25.255 scope global eth0
    inet6 2620:52:0:2219:250:54ff:fe00:f03/64 scope global dynamic
       valid_lft 2591919sec preferred_lft 604719sec
    inet6 fe80::250:54ff:fe00:f03/64 scope link
       valid_lft forever preferred_lft forever

Expected output:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:50:54:00:0f:03 brd ff:ff:ff:ff:ff:ff
    inet 10.34.25.198/23 brd 10.34.25.255 scope global eth0
    inet6 2620:52:0:2219:250:54ff:fe00:f03/64 scope global dynamic
       valid_lft 2591896sec preferred_lft 604696sec
    inet6 fe80::250:54ff:fe00:f03/64 scope link
       valid_lft forever preferred_lft forever
5: veth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 8a:ec:35:34:1f:a8 brd ff:ff:ff:ff:ff:ff
6: veth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 2e:97:ef:77:40:82 brd ff:ff:ff:ff:ff:ff

Signed-off-by: Petr Písař <ppisar@redhat.com>
2012-10-03 08:47:14 -07:00
Stephen Hemminger ec72fd7380 l2tp: remove references to old bridge utils
Can now manage interfaces with ip; no need to invoke the old brctl.
2012-10-03 08:45:32 -07:00
Stephen Hemminger 0849e60a10 bridge: manage VXLAN based forwarding tables
Allow extending bridge forwarding table to handle VXLAN as well.
Change format of output to be close to 'ip neighbour'
2012-10-01 13:58:01 -07:00
Stephen Hemminger d0d9fcb3ce Merge branch 'master' into vxlan 2012-10-01 09:05:29 -07:00
Stephen Hemminger 808ed6e10a v3.6.0 2012-10-01 08:39:21 -07:00
Stephen Hemminger ab12370657 update header files to 3.6 2012-10-01 08:38:03 -07:00
Stephen Hemminger a5494df2c1 vxlan support 2012-10-01 08:36:50 -07:00
Werner Fink 0ecf26fc7d Change how pdf doc's are created
Currently the pdf docs are done with
    sgml -> sgmltool -> tex -> latex -> dvi -> dvips -> ps -> ps2pdf -> pdf
 or
    tex -> latex -> dvi -> dvips -> ps -> ps2pdf -> pdf
 with this patch we do
    sgml -> sgmltool -> tex -> pdflatex -> pdf
 or
    tex -> pdflatex -> pdf
2012-09-24 12:50:37 -07:00
Stephen Hemminger 27bca61531 Add support for AF_BRIDGE
This can be useful when displaying neighbour table
2012-09-17 15:50:27 -07:00
Julian Anastasov 328d482c48 iproute2: GENL: merge GENL_REQUEST and GENL_INITIALIZER
Both macros are used together, so better to have
single define. Update all requests in ipl2tp.c to use the
new macro.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
2012-09-17 15:46:45 -07:00
Stephen Hemminger bd899db39c man: make sure tc man page gets installed 2012-09-17 15:46:00 -07:00
John Fastabend d611682a8c iproute2: bridge: finish removing replace option in man pages
This patch finishes removing the replace option from the bridge
man page which I missed in this commit

commit 57b9785de3
Author: John Fastabend <john.r.fastabend@intel.com>
Date:   Mon Aug 27 10:52:31 2012 -0700

    iproute2: bridge: remove replace and change options

Also add documentation for "{ self | embedded }" already shown on
the cmd line help msg.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
2012-09-14 10:05:00 -07:00
Pavel Emelyanov 81824ac228 iproute: Add ability to save, restore and show the interfaces' addresses (resend)
This functionality is required by the checkpoint-restore project. Since
dump and restore for routes are already done in the ip tool, it is natural
to dump and restore addresses in the ip tool as well.

The implementation logic is the same as the one for routes.
The magic number digits are taken from the Seattle coordinates.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-09-11 09:18:21 -07:00
Julian Anastasov 4ef9ff2a8f iproute2: use libgenl in ipl2tp
Use the common code from libgenl.c to parse family, and initialize
structures.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
2012-09-11 09:05:42 -07:00
Julian Anastasov 8afcc28879 iproute2: add libgenl files
Create libgenl.h and libgenl.c. They will contain
common code for GENL users such as ipl2tp, tcp_metrics, etc.

Somewhat simplified by Stephen Hemminger

Signed-off-by: Julian Anastasov <ja@ssi.bg>
2012-09-11 08:59:09 -07:00
Li Wei 8325daf7de iproute2: tc.8: update UNITS section.
- rename section UNITS to PARAMETERS.
- break section PARAMETERS down into four subsections to cover the
  commonly used parameter types (RATES, TIMES, SIZES, VALUES).
- add some explanation for IEC units in RATES.
- point out the max value we can set for RATES, TIMES and SIZES.

Signed-off-by: Li Wei <lw@cn.fujitsu.com>
2012-09-10 09:34:27 -07:00
Pavel Emelyanov 93b7986345 iproute: Add route showdump command (v2)
Some time ago the save+restore commands were added to ip route (git
id f4ff11e3, Add ip route save/restore). These two save the raw rtnl
stream into a file and restore one (reading it from stdin).

The problem is that there's no way to get the contents of the dump
file in a human-readable form. The proposal is to add a command that
reads the rtnl stream from stdin and prints the data the way the
usual "ip route list" does.

changes since v1:

* Take the magic at the beginning of the dump file into account
* Check for stdin (the dump is taken from) is not a tty

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-09-07 09:13:32 -07:00
Pavel Emelyanov 76c61b34a6 iproute: Add magic cookie to route dump file
In order to somehow verify that a blob contains route dump a
4-bytes magic is put at the head of the data and is checked
on restore.

Magic digits are taken from Portland (OR) coordinates :) Is
there any more reliable way of generating such?

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-09-07 09:10:51 -07:00
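A minimal C sketch of the magic-cookie check described above. `DUMP_MAGIC` is an arbitrary placeholder, not the constant iproute2 uses, and the helper names are hypothetical. The word is written in native byte order, so the sketch assumes the dump is restored on the same kind of host.

```c
#include <stdint.h>
#include <stdio.h>

/* Placeholder value for illustration only. */
#define DUMP_MAGIC 0x1A2B3C4Du

/* Write the 4-byte cookie at the head of the dump stream. */
static int dump_write_magic(FILE *f)
{
	uint32_t magic = DUMP_MAGIC;

	return fwrite(&magic, sizeof(magic), 1, f) == 1 ? 0 : -1;
}

/* On restore, refuse any stream that does not start with the cookie. */
static int dump_check_magic(FILE *f)
{
	uint32_t magic;

	if (fread(&magic, sizeof(magic), 1, f) != 1)
		return -1;      /* too short to be a dump */
	return magic == DUMP_MAGIC ? 0 : -1;
}
```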
Andreas Schwab 1b3c149b41 iproute2: Fix various manpage formatting nits
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
2012-09-07 09:01:51 -07:00
Mathias Krause c2f7d6c7c4 configure: remove TMPDIR on exit
Commit e557d1a ("Don't put configure files in /tmp") introduced a typo
that prevented automated cleanup of the temporary directory created for
feature testing. Fix this typo.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
2012-09-04 09:42:16 -07:00
John Fastabend 57b9785de3 iproute2: bridge: remove replace and change options
Replace and change are not supported by bridge netlink, so remove them
from the bridge tool options.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
2012-08-27 11:24:03 -07:00
John Fastabend 059547c597 iproute2: build failure due to missing '\' in Makefile
After latest commit 'Install all tc and ip sub pages' this error
occurs on make.

make[2]: Nothing to be done for `all'.
make[2]: Leaving directory `/home/git/kernel.org/iproute2/man/man3'
make[2]: Entering directory `/home/git/kernel.org/iproute2/man/man8'
Makefile:8: *** commands commence before first target.  Stop.
make[2]: Leaving directory `/home/git/kernel.org/iproute2/man/man8'

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
2012-08-27 11:24:03 -07:00
Stephen Hemminger 5147af5acb Install all tc and ip sub pages
Add missing entries in Makefile
2012-08-25 08:39:30 -07:00
John Fastabend dc6a6a2553 iproute2: Add FDB print and update cmds for self and master
Add command to update and print FDB entries with NTF_SELF and
NTF_MASTER set.

Example usages illustrating use of 'self' to program embedded
forwarding table and 'master' to configure the forwarding table
of the bridge. Also shows 'master self' used to update both in
the same command.

#./br/br fdb add 00:1b:21:55:23:60 dev eth3 self
#./br/br fdb add 00:1b:21:55:23:60 dev eth3 master
#./br/br fdb add 00:1b:21:55:23:61 dev eth3 self master
#./br/br fdb add 00:1b:21:55:23:62 dev eth3
#./br/br fdb show
eth3    00:1b:21:55:23:60       local self
eth3    00:1b:21:55:23:61       local self
eth3    33:33:00:00:00:01       local self
eth3    01:00:5e:00:00:01       local self
eth3    33:33:ff:55:23:59       local self
eth3    01:00:5e:00:00:fb       local self
eth33   33:33:00:00:00:01       local self
eth34   33:33:00:00:00:01       local self
eth3    00:1b:21:55:23:59       local master
eth3    00:1b:21:55:23:60       static master
eth3    00:1b:21:55:23:62       static master
eth3    00:1b:21:55:23:61       static master

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
2012-08-24 17:11:01 -07:00
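The self/master keywords above map onto neighbour flags. Below is a minimal C sketch of that mapping. The flag values are as in `<linux/neighbour.h>`, and `fdb_parse_flags()` is a hypothetical helper, not the bridge tool's parser. Defaulting to `master` when neither keyword is given is an assumption. It is based on the sample output, where the entry added without a keyword (`...:62`) shows as `static master`.

```c
#include <string.h>

/* Values as defined in <linux/neighbour.h>. */
#define NTF_SELF   0x02
#define NTF_MASTER 0x04

/* Map trailing "self"/"master" keywords to flags; -1 on an unknown word. */
static int fdb_parse_flags(int argc, char **argv)
{
	int flags = 0;

	for (int i = 0; i < argc; i++) {
		if (strcmp(argv[i], "self") == 0)
			flags |= NTF_SELF;      /* embedded (device) table */
		else if (strcmp(argv[i], "master") == 0)
			flags |= NTF_MASTER;    /* bridge's table */
		else
			return -1;
	}
	if (flags == 0)
		flags = NTF_MASTER;             /* assumed default, see lead-in */
	return flags;
}
```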
Oliver Hartkopp 6790dc84dd iproute2: Add missing tc-ematch.8 for man page installation
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
2012-08-21 07:48:53 -07:00
Rostislav Lisovy 7b5f30e14f Ematch used to classify CAN frames according to their identifiers
This ematch enables effective filtering of CAN frames (AF_CAN) based
on CAN identifiers with masking of compared bits. Implementation
utilizes bitmap based classification for standard frame format (SFF)
which is optimized for minimal overhead.

Signed-off-by: Rostislav Lisovy <lisovy@gmail.com>
2012-08-20 13:11:55 -07:00
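The bitmap idea above works because standard CAN identifiers are only 11 bits wide, which gives 2048 possible values. One bit per identifier makes classifying a frame a single lookup. This C sketch is a hypothetical illustration, not the kernel's em_canid code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SFF_IDS (1u << 11)   /* 2048 possible 11-bit identifiers */

/* One bit per standard-frame identifier. */
struct sff_bitmap {
	uint8_t bits[SFF_IDS / 8];
};

static void sff_init(struct sff_bitmap *bm)
{
	memset(bm, 0, sizeof(*bm));
}

/* Precompute: mark every identifier equal to `id` on the bits in `mask`. */
static void sff_add_rule(struct sff_bitmap *bm, uint32_t id, uint32_t mask)
{
	mask &= SFF_IDS - 1;
	for (uint32_t i = 0; i < SFF_IDS; i++)
		if ((i & mask) == (id & mask))
			bm->bits[i / 8] |= (uint8_t)(1u << (i % 8));
}

/* Per-frame cost is a single bit test, regardless of the number of rules. */
static bool sff_match(const struct sff_bitmap *bm, uint32_t can_id)
{
	can_id &= SFF_IDS - 1;
	return (bm->bits[can_id / 8] >> (can_id % 8)) & 1u;
}
```

Rule setup walks all 2048 identifiers once, while matching stays constant time. That is the trade-off that makes the bitmap cheap at classification time.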
Stephen Hemminger ac4e8384e0 Update can.h to 3.6-rc2 2012-08-20 13:02:42 -07:00
Pavel Emelyanov b8cf1e9ae3 iproute: Fix errno propagation from rtnl_talk
Callers of rtnl_talk check the errno value for their needs. In particular, the
address and route restore code validly reports success if errno is EEXIST.

However, the errno value can sometimes be clobbered by the perror call. Thus
we should only set it _after_ the message has been emitted.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-08-20 12:54:48 -07:00
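A minimal C sketch of the ordering the commit describes. `report_failure()` is a hypothetical helper, not `rtnl_talk()` itself. `perror()` is allowed to change `errno`, so the value callers inspect is set again after the message is printed.

```c
#include <errno.h>
#include <stdio.h>

/* Print a diagnostic for `err`, then leave errno == err for the caller
 * (e.g. a restore path that treats EEXIST as success). */
static int report_failure(int err, const char *what)
{
	errno = err;    /* so perror() prints the matching message */
	perror(what);   /* may itself modify errno */
	errno = err;    /* set the caller-visible value only afterwards */
	return -1;
}
```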
Stephen Hemminger d89fbf3223 Explain TC class id limits 2012-08-20 10:58:58 -07:00
Li Wei da7fbb24c7 iproute2: configure: Add search path for 64bit library.
Use pkg-config to tell us the library path, and fall back to searching
the old paths if xtables.pc does not exist.

Signed-off-by: Li Wei <lw@cn.fujitsu.com>
2012-08-20 09:01:16 -07:00
Li Wei 083b46bbe9 iproute2: fix typo in help message.
Signed-off-by: Li Wei <lw@cn.fujitsu.com>
2012-08-20 09:00:16 -07:00
Stephen Hemminger c7f04f021c Fix formatting of ip.8 family man page
fix bad formatting in description of -f option
2012-08-17 15:28:59 -07:00
Dan Kenigsberg f1675d615b utils: invarg: msg precedes the faulty arg
Fix all calls that reversed the argument order.

Signed-off-by: Dan Kenigsberg <danken@redhat.com>
2012-08-17 13:35:36 -07:00
Chris Webb 9069817033 Correct the bridge command name in help messages
The bridge command used to be called br but was renamed bridge. Correct
the outdated references to the br name in the help messages, together with a
typo of '-help' for 'help'.

Signed-off-by: Chris Webb <chris@arachsys.com>
2012-08-16 14:02:46 -07:00
Florian Westphal c487348a9c add ematch man page 2012-08-13 08:34:13 -07:00
Florian Westphal 8194411a42 tc: add ipset ematch
example usage:
tc filter add dev $dev parent $id: basic match not ipset'(foobar src)' ..

also updates iproute2/ematch_map, else tc complains:
Error: Unable to find ematch "ipset" in /etc/iproute2/ematch_map
Please assign a unique ID to the ematch kind the suggested entry is:
        8       ipset

when trying to use this ematch.

(text ematch (5) only exists in kernel, a vlan ematch (6) exists neither in
 kernel nor userspace, but kernel headers define TCF_EM_VLAN == 6).
2012-08-13 08:33:50 -07:00
Mike Frysinger af9d406f99 Fix regression with 'ip address show'
`ip a s` no longer shows addresses: 3.4.0 works, but 3.5.0 does not.

the simple test case:
make clean && make -j -s && ./ip/ip a s lo

before that change, i would get:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever

but after, i now get:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

seems like the bug was introduced in the middle of that patch:

-	if (filter.family != AF_PACKET) {
+	if (filter.family && filter.family != AF_PACKET) {
+		if (filter.oneline)
+			no_link = 1;
+
 		if (rtnl_wilddump_request(&rth, filter.family, RTM_GETADDR) < 0) {
 			perror("Cannot send dump request");
 			exit(1);

if i revert the change to the if statement there, `ip a s` works for me again.
2012-08-13 08:09:52 -07:00
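The diff quoted above shows why a plain `ip a s` lost its addresses. With no `-f` option the filter family is AF_UNSPEC (0), and the added `filter.family &&` test short-circuits to false. The helpers below are hypothetical and only isolate that condition.

```c
#include <stdbool.h>
#include <sys/socket.h>   /* AF_UNSPEC, AF_INET, AF_PACKET on Linux */

#ifndef AF_PACKET
#define AF_PACKET 17      /* Linux value, used only if the headers lack it */
#endif

/* Condition after the regressing change: AF_UNSPEC (0) skips the dump. */
static bool dumps_addrs_buggy(int family)
{
	return family && family != AF_PACKET;
}

/* Condition the reporter reverted to: only link-only output skips it. */
static bool dumps_addrs_fixed(int family)
{
	return family != AF_PACKET;
}
```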
Xose Vazquez Perez 6d10827c79 Fix Makefile's
Missing space in man8 Makefile and install bridge command with
correct name
2012-08-13 08:06:21 -07:00
Jiri Pirko d992f3e611 iplink: add support for num[tr]xqueues 2012-08-01 16:19:55 -07:00
Eric Dumazet c6d6c92c2c ss: report SK_MEMINFO_BACKLOG
linux-3.6-rc1 supports SK_MEMINFO_BACKLOG with commit d594e987c6f54
(sock_diag: add SK_MEMINFO_BACKLOG)

ss command can display it if provided by the kernel.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
2012-08-01 16:16:43 -07:00
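"If provided by the kernel" amounts to a length check on the meminfo array: older kernels send a shorter array. The sketch below assumes the backlog sits at index 7, following the order of `enum sk_meminfo` in `<linux/sock_diag.h>`. The helper name and the printed format are illustrative, not copied from ss.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed index of SK_MEMINFO_BACKLOG in the kernel's meminfo array. */
#define MEMINFO_BACKLOG_IDX 7

/* Print the backlog only if the array is long enough; returns 1 if printed. */
static int print_backlog(const uint32_t *meminfo, size_t len_bytes)
{
	if (len_bytes < (MEMINFO_BACKLOG_IDX + 1) * sizeof(uint32_t))
		return 0;   /* older kernel: field not present */
	printf(",bl%" PRIu32, meminfo[MEMINFO_BACKLOG_IDX]);
	return 1;
}
```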
Rostislav Lisovy 13eea5a600 add can.h 2012-08-01 16:14:55 -07:00
Saurabh 7357933907 iproute2: VTI support for ip link command.
Support for VTI via rtnetlink.

Signed-off-by: Saurabh Mohan <saurabh.mohan@vyatta.com>
2012-08-01 16:13:32 -07:00
Saurabh Mohan eec476088a VTI support for ip tunnel
Configure VTI using 'ip tunnel'
2012-08-01 16:11:25 -07:00
Stephen Hemminger a564b70942 Update to 3.6.0-pre headers
These are a pre-rc1 version of sanitised kernel headers.
2012-08-01 16:08:53 -07:00
Stephen Hemminger a27875b0f8 v3.5.0 2012-08-01 15:25:51 -07:00
Stephen Hemminger d04bc300c3 Add bridge command
New tool to allow manipulating forwarding entries and monitoring
bridge events.
2012-08-01 15:23:49 -07:00
Stephen Hemminger bc84585e47 man8: build cleanup
Rearrange Makefile, and ignore derived files
2012-08-01 14:58:15 -07:00
Ben Hutchings 4d35434771 ss: Report MSS from internal TCP information
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-07-31 14:06:51 -07:00
Stephen Hemminger fa1f7441a9 Remove reference to multipath algorithms in usage
IP multipath algorithms support was removed several revisions ago.
Remove from usage as well
2012-07-26 16:12:20 -07:00
Stephen Hemminger 8d07e5f7d9 Refactor ipaddr_list_or_flush
Alternative solution to problem reported by Pravin B Shelar <pshelar@nicira.com>
Split large function ipaddr_list_or_flush into components.
Fix memory leak of address and link nlmsg info.
Avoid fetching address info if only flushing.
2012-07-13 13:37:50 -07:00
Li Wei 524de02728 tc-bfifo: man: Add parameter value range.
Add value range for 'limit' parameter.
2012-07-13 10:01:20 -07:00
Li Wei 6cef544b96 tc: man: change man page and comment to conform to code's behavior.
Since the get_rate() code incorrectly interprets a bare number, its
behavior differs from what the man page and comment describe.

We change the man page and comment instead, to stay compatible with
existing usage by scripts.
2012-07-12 09:05:28 -07:00
Li Wei 3cde191f60 tc: man: add 'delete' command.
Add the missing 'delete' command for qdisc, class and filter, and
correct 'remove' to 'delete'.
2012-07-11 07:52:29 -07:00
Li Wei 424adc19bf tc: filter: validate filter priority in userspace.
Because we use the high 16 bits of tcm_info to pass the prio value to the
kernel, its range is [0, 0xffff]. Without validation in tc, when the user
passes a larger (>65535) priority, the priority actually set in the kernel
would confuse the user.

So, add a validation to ensure prio is in range.
2012-07-10 15:39:30 -07:00
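A minimal C sketch of the check. `make_tcm_info()` is a hypothetical helper. It packs the priority into the high 16 bits of `tcm_info`, as described above, and the protocol into the low 16 bits, which is how tc builds this field.

```c
#include <stdint.h>

/* Pack prio (high 16 bits) and protocol (low 16 bits) into tcm_info,
 * rejecting priorities that would not survive the shift. */
static int make_tcm_info(uint32_t prio, uint16_t protocol, uint32_t *info)
{
	if (prio > 0xFFFF)
		return -1;      /* otherwise silently truncated */
	*info = (prio << 16) | protocol;
	return 0;
}
```

Without the check, a priority of 70000 (0x11170) shifted left by 16 keeps only 0x1170 in the high half. The kernel would then see priority 4464.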
Hiroaki SHIMODA 690b11f4a6 tc: u32: Fix firstfrag filter.
The current firstfrag filter matches all non-fragmented packets.
firstfrag should also check the MF bit.

Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
2012-07-10 15:39:02 -07:00
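A first fragment has fragment offset 0 and the MF ("more fragments") bit set. Testing the offset alone also matches every unfragmented packet. This C sketch applies the same test to the IPv4 flags/fragment-offset field in host byte order. The names are hypothetical; the masks correspond to the standard MF and offset bits.

```c
#include <stdbool.h>
#include <stdint.h>

#define FRAG_MF      0x2000   /* "more fragments" flag */
#define FRAG_OFFMASK 0x1FFF   /* fragment offset, in 8-byte units */

/* frag_off: IPv4 flags + fragment offset field, host byte order. */
static bool is_first_fragment(uint16_t frag_off)
{
	/* Offset 0 alone also matches unfragmented packets; requiring MF
	 * limits the match to the first piece of a fragmented datagram. */
	return (frag_off & FRAG_OFFMASK) == 0 && (frag_off & FRAG_MF) != 0;
}
```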
Hiroaki SHIMODA 1d62f99fe2 tc: u32: Fix icmp_code off.
The offset of icmp_code is 21, not 20. Also, offmask should be 0 unless
nexthdr+ is specified.

Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
2012-07-10 15:39:02 -07:00
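Where the offsets 20 and 21 come from: with an option-less IPv4 header (IHL 5, so 20 bytes), the ICMP header starts at byte 20. The type is byte 20 and the code is byte 21. This hypothetical C helper reads them from a raw packet and rejects headers with options, which a fixed offset cannot handle.

```c
#include <stddef.h>
#include <stdint.h>

#define ICMP_TYPE_OFF 20   /* first byte after a 20-byte IPv4 header */
#define ICMP_CODE_OFF 21

static int icmp_type_code(const uint8_t *pkt, size_t len,
			  uint8_t *type, uint8_t *code)
{
	if (len < ICMP_CODE_OFF + 1)
		return -1;
	/* IHL is the low nibble of byte 0, in 32-bit words. */
	if ((size_t)(pkt[0] & 0x0F) * 4 != 20)
		return -1;   /* this sketch only handles option-less headers */
	*type = pkt[ICMP_TYPE_OFF];
	*code = pkt[ICMP_CODE_OFF];
	return 0;
}
```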
Li Wei 3c4f545633 tc: prio: Perform more strict check on priomap.
Since band numbers count from zero, each band must be less than
opt.bands.
2012-06-18 12:25:08 -07:00
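A minimal C sketch of the stricter check. The map has TC_PRIO_MAX + 1 = 16 entries. Because bands are numbered from 0, a valid entry must be strictly less than the band count. `priomap_valid()` is a hypothetical helper.

```c
/* TC_PRIO_MAX is 15 in <linux/pkt_sched.h>, so the map has 16 slots. */
#define PRIOMAP_LEN 16

/* Return 1 if every priomap entry names an existing band. */
static int priomap_valid(const unsigned char map[PRIOMAP_LEN], int bands)
{
	for (int i = 0; i < PRIOMAP_LEN; i++)
		if (map[i] >= bands)    /* band == bands is already out of range */
			return 0;
	return 1;
}
```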
Li Wei 8c8a9089ba tc: man: Fix incorrect parameter format in prio.
The priomap parameter uses blanks, not commas, to separate bands;
update the manpage to conform to this.
2012-06-18 12:24:20 -07:00
Vijay Subramanian 05f1801c79 tc: Update manpage
This makes 2 changes:
1: Add fq_codel to SEE ALSO section in tc manpage.
2: Reorder the SEE ALSO section to make the order alphabetical
(suggested by Jan Ceuleers ).

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
2012-06-11 15:02:37 -07:00
Vijay Subramanian 65e472d967 tc-fq_codel: Add manpage
This patch adds the manpage for the FQ_CoDel (Fair Queuing Controlled-Delay)
AQM.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
2012-06-11 15:02:13 -07:00
Eric Dumazet 62e2e54091 ip: speedup ip link
ip link has quadratic behavior because store_nlmsg() only has a
list head pointer and must search for the end of the list on every
insert.

Provide a head/tail pair to cut the time.

Time to do "ip link show dev xxx" with 128000 net devices:

Before: 2m3.594s
After: 0m2.830s
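A minimal sketch of the head/tail idea, using a hypothetical msg_node type in place of the stored netlink messages: keeping a tail pointer makes each append O(1), so storing N messages costs O(N) instead of O(N^2). Names are illustrative, not those in ipaddress.c.

```c
#include <stdlib.h>

/* Hypothetical node standing in for one stored netlink message. */
struct msg_node {
	struct msg_node *next;
	int ifindex;
};

struct msg_list {
	struct msg_node *head;
	struct msg_node *tail;	/* avoids walking the list on every append */
};

static int msg_list_append(struct msg_list *l, int ifindex)
{
	struct msg_node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->next = NULL;
	n->ifindex = ifindex;
	if (l->tail)
		l->tail->next = n;
	else
		l->head = n;
	l->tail = n;
	return 0;
}

static void msg_list_free(struct msg_list *l)
{
	struct msg_node *n = l->head;

	while (n) {
		struct msg_node *next = n->next;

		free(n);
		n = next;
	}
	l->head = l->tail = NULL;
}
```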

Signed-off-by: Eric Dumazet <edumazet@google.com>
2012-06-11 14:55:23 -07:00
Jan Ceuleers e1b59459da Add reference to tc-codel(8) to the SEE ALSO section
Reported-by: Andy Furniss <andyqos@ukfsn.org>
Signed-off-by: Jan Ceuleers <jan.ceuleers@computer.org>
2012-06-04 12:02:30 -07:00
Bjarni Ingi Gislason d18086ccde tc-drr(8): tab unquoted in an argument to a macro
<groff: tc-drr.8>:67: warning: tab character in unquoted macro argument
<groff: tc-drr.8>:69: warning: tab character in unquoted macro argument

*********************

Originally filed at: http://bugs.debian.org/674706

Signed-off-by: Andreas Henriksson <andreas@fatal.se>
2012-05-29 08:17:46 -07:00
Bjarni Ingi Gislason 7c34520bf5 tc(8): Negative indent and missing "-" after an escape
<groff: tc.8>:51: warning: total indent cannot be negative
<groff: tc.8>:57: warning: escape character ignored before `i'

*********************

Space at end of line removed

  General considerations

a) Manuals should usually only be left justified.  Use ".ad l"
as the first regular command.

b) Each sentence should begin on a new line.  The conventions
about the amount of space between sentences are different.  This
also makes a check on the number of space characters between
words easier.

c) Separate numbers from units with a (no-break) space.  A
no-break space can be code 0xA0, "\ " (\<space>), or "\~"
(groff).

d) Use macros "TS/TE" for tables with more than two columns.
Then use

'\" t

as the first line in the source to tell "man" to use the "tbl"
preprocessor.

e) Protect last period (full stop) in abbreviations with "\&",
if it is or might be (through new formatting of source) at the
end of line, if it is also not an end of sentence.

*********************

Originally filed at: http://bugs.debian.org/674704

Signed-off-by: Andreas Henriksson <andreas@fatal.se>
2012-05-29 08:17:12 -07:00
Jan Ceuleers 6fdd09d6a5 tc-codel: Fix typos in manpage
Signed-off-by: Jan Ceuleers <jan.ceuleers@computer.org>
2012-05-25 08:50:32 -07:00
Vijay Subramanian 50a3ec3c46 tc-codel: Update usage text
codel can take 'noecn' as an option. This also makes it consistent with the
manpage.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
2012-05-24 15:02:05 -07:00
Vijay Subramanian 28c5805322 tc-codel: Add manpage
This patch adds the manpage for the CoDel (Controlled-Delay) AQM.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
2012-05-24 09:03:53 -07:00
Chris Elston 6618e334ba iproute2: allow IPv6 addresses for l2tp local and remote parameters
Adds support for parsing IPv6 addresses to the parameters local and
remote in the l2tp commands. Requires netlink attributes L2TP_ATTR_IP6_SADDR
and L2TP_ATTR_IP6_DADDR, added in a required kernel patch already submitted
to netdev.

Also enables printing of IPv6 addresses returned by the L2TP_CMD_TUNNEL_GET
request.

Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
2012-05-22 14:24:46 -07:00
Eric Dumazet c3524efc14 fq_codel: Fair Queue Codel AQM
Fair Queue Codel packet scheduler

Principles :

- Packets are classified (internal classifier or external) on flows.
- This is a stochastic model (as we use a hash, several flows might
  be hashed to the same slot).
- Each flow has a CoDel-managed queue.
- Flows are linked onto two (Round Robin) lists,
  so that new flows have priority over old ones.

- For a given flow, packets are not reordered (CoDel uses a FIFO)
- head drops only.
- ECN capability is on by default.
- Very low memory footprint (64 bytes per flow)

tc qdisc ... fq_codel [ limit PACKETS ] [ flows number ]
                      [ target TIME ] [ interval TIME ] [ noecn ]
                      [ quantum BYTES ]

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Kathleen Nichols <nichols@pollere.com>
Cc: Van Jacobson <van@pollere.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Changli Gao <xiaosuo@gmail.com>
2012-05-22 14:17:49 -07:00
Eric Dumazet 185d88f99b tc_codel: Controlled Delay AQM
An implementation of CoDel AQM, from Kathleen Nichols and Van Jacobson.

http://queue.acm.org/detail.cfm?id=2209336

The main input of this AQM is no longer the queue size in bytes or
packets, but the delay packets spend in the (FIFO) queue.

As we don't have infinite memory, we can still drop packets in enqueue()
in case of massive load, but the main means of CoDel is to drop packets in
dequeue(), using a control law based on two simple parameters:

target : target sojourn time (default 5ms)
interval : width of moving time window (default 100ms)

Selected packets are dropped, unless ECN is enabled and packets can get
ECN mark instead.
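A minimal sketch of the control law, assuming times in microseconds and count being the number of drops since entering the dropping state: the next drop is scheduled interval / sqrt(count) after the current one, so drops come closer together while the sojourn time stays above target. The square root here is a plain integer routine; the kernel uses its own fixed-point reciprocal square root, so this is not its code, and the function names are hypothetical.

```c
#include <stdint.h>

/* Integer square root (floor), bit-by-bit method. */
static uint64_t isqrt_u64(uint64_t x)
{
	uint64_t r = 0;
	uint64_t bit = (uint64_t)1 << 62;

	while (bit > x)
		bit >>= 2;
	while (bit) {
		if (x >= r + bit) {
			x -= r + bit;
			r = (r >> 1) + bit;
		} else {
			r >>= 1;
		}
		bit >>= 2;
	}
	return r;
}

/* Next drop time = now + interval / sqrt(count).
 * isqrt(count << 20) is sqrt(count) scaled by 1024, so the interval is
 * scaled by 1024 (<< 10) before dividing to keep the result in usec. */
static uint64_t codel_next_drop_us(uint64_t now_us, uint64_t interval_us,
				   uint32_t count)
{
	uint64_t root = isqrt_u64((uint64_t)(count ? count : 1) << 20);

	return now_us + (interval_us << 10) / root;
}
```

With the default interval of 100ms, the first drop is scheduled 100ms out, the fourth 50ms out, and the hundredth 10ms out.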

Usage: tc qdisc ... codel [ limit PACKETS ] [ target TIME ]
                          [ interval TIME ] [ ecn ]

qdisc codel 10: parent 1:1 limit 2000p target 3.0ms interval 60.0ms ecn
 Sent 13347099587 bytes 8815805 pkt (dropped 0, overlimits 0 requeues 0)
 rate 202365Kbit 16708pps backlog 113550b 75p requeues 0
  count 116 lastcount 98 ldelay 4.3ms dropping drop_next 816us
  maxpacket 1514 ecn_mark 84399 drop_overlimit 0

CoDel must be seen as a base module, and should be used keeping in mind
there is still a FIFO queue. So a typical setup will probably need a
hierarchy of several qdiscs and packet classifiers to be able to meet
whatever constraints a user might have.

One possible example would be to use fq_codel, which combines Fair
Queueing and CoDel, as a replacement for sfq / sfq_red.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Dave Taht <dave.taht@bufferbloat.net>
2012-05-22 14:13:52 -07:00
Vijay Subramanian 1070205dc0 tc-netem: Add support for ECN packet marking
This patch provides support for marking packets with ECN instead of
dropping them with netem. This makes it possible to make use of the
netem ECN marking feature that was added recently to the kernel.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
2012-05-22 14:10:21 -07:00
Vijay Subramanian 82613f9252 Update tc-netem manpage to add ecn capability
This patch updates the netem manpage to describe how to use
netem to mark packets with ECN instead of dropping them.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
2012-05-22 14:09:09 -07:00
Stephen Hemminger e419f2d6f5 Remove derived man pages
These man pages are now built from templates
2012-05-22 14:03:37 -07:00
Stephen Hemminger 5e4dc84ff7 Update headers to 3.5 merge window
Use sanitized version of kernel headers from 3.5 pre-rc1 merge
2012-05-22 14:02:49 -07:00
Andreas Henriksson 6e30461e73 iproute2: man page and /bin/ip disagree on del vs delete
Reported by Robert Henney:
> the 'ip' man page does not mention the command "del" at all but does
> claim, "As a rule, it is possible to add, delete and show (or list ) objects".
> however, 'ip' does not always recognize "delete" as a commend.
>
> robh@debian:~$ ip tunnel delete
> Command "delete" is unknown, try "ip tunnel help".

Let's use "delete" in all calls to matches() for consistency. This will
make both "del" and "delete" work everywhere.
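A sketch of the prefix rule this relies on, written as a standalone re-implementation that follows the behavior of iproute2's matches() (it returns 0 when the typed word is a prefix of the keyword); cmd_matches() is a hypothetical name. Because "del" is a prefix of "delete", comparing against "delete" accepts both spellings, while comparing against "del" rejects "delete".

```c
#include <string.h>

/* Returns 0 if 'cmd' (what the user typed) is a prefix of 'pattern'
 * (the full keyword), nonzero otherwise. */
static int cmd_matches(const char *cmd, const char *pattern)
{
	size_t len = strlen(cmd);

	if (len > strlen(pattern))
		return -1;
	return memcmp(pattern, cmd, len);
}
```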

Signed-off-by: Andreas Henriksson <andreas@fatal.se>
2012-05-21 15:17:28 -07:00
Stephen Hemminger db70d91c78 v3.4.0 2012-05-21 14:12:19 -07:00
Andreas Henriksson 9fc56974ac iproute2: trivial fix of ip link syntax in manpage
Ivan Vilata i Balaguer <ivan@selidor.net> reported
that the description of the `ip link add` command in the manpage
is outdated regarding the compulsory `link DEVICE` option.
For instance, `ip link help` says:
    Usage: ip link add [link DEV] [ name ] NAME
     ...
But the manpage still says:
     ip link add link DEVICE [ name ] NAME

(Trying to provide a `link` option e.g. under an LXC container can frustrate
 the creation of dummy devices which don't need an actual device.)

The syntax of the "ip link help" output was fixed in commit
"iproute2: Fix usage and man page for 'ip link'" (a22e92951d).
This updates the manpage to mark "link DEVICE" as an optional
argument there as well.

  http://bugs.debian.org/673171

Signed-off-by: Andreas Henriksson <andreas@fatal.se>
2012-05-21 14:11:58 -07:00
Vijay Subramanian e6232cf647 Update man8 Makefile
Commit 761a1e60 ("iproute2 - Split up manual page installation")
introduced man/man8/Makefile but did not add all the man pages.
This patch adds the missing man pages for installation.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
2012-05-03 14:16:09 -07:00
James Chapman 6121e1fef8 iproute2: add ip-l2tp man page
Add a man page to cover the "ip l2tp" commands. Add a reference to it
in the main ip page.

This version removes the unnecessary setting of promiscuous mode
in the examples.

Signed-off-by: James Chapman <jchapman@katalix.com>
2012-05-03 08:31:06 -07:00
Shan Wei 910b039771 ss: use new INET_DIAG_SKMEMINFO option to get more memory information for tcp socket
INET_DIAG_SKMEMINFO is used to monitor socket memory information
which contains more information than INET_DIAG_MEMINFO.

The -m option is retained for old kernels that don't support INET_DIAG_SKMEMINFO.

Signed-off-by: Shan Wei <davidshan@tencent.com>
2012-05-03 08:27:28 -07:00
Stephen Hemminger e278088076 Revert "iproute2: allow IPv6 addresses for l2tp local and remote parameters"
This reverts commit 16eba34485.
Hold off until next release.
2012-04-26 08:06:38 -07:00
Chris Elston 16eba34485 iproute2: allow IPv6 addresses for l2tp local and remote parameters
Adds support for parsing IPv6 addresses to the parameters local and
remote in the l2tp commands. Requires netlink attributes L2TP_ATTR_IP6_SADDR
and L2TP_ATTR_IP6_DADDR, added in a required kernel patch already submitted
to netdev.

Also enables printing of IPv6 addresses returned by the L2TP_CMD_TUNNEL_GET
request.

Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
2012-04-25 13:12:37 -07:00
Christoph J. Thompson 5c434a9e5a iproute2 - Fix up and simplify variables pointing to install directories
Define where the iproute2 config files are located.
Get rid of trailing slashes in paths in several files.

Signed-off-by: Christoph J. Thompson <cjsthompson@gmail.com>
2012-04-12 09:49:10 -07:00
Christoph J. Thompson 761a1e6028 iproute2 - Split up manual page installation
Generate manual pages based on where the config files are installed.
Add missing manual pages for utilities which are links to other binaries.
Make tc-pfifo.8 a real file that points to tc-bfifo.8 instead of a
symlink, which causes problems when compressing manual pages.

Signed-off-by: Christoph J. Thompson <cjsthompson@gmail.com>
2012-04-12 09:47:19 -07:00
Christoph J. Thompson c8610020b8 iproute2 - Split up cflags
Allows setting optimisation flags at compile time without patching the Makefile.

modified:   Makefile

Signed-off-by: Christoph J. Thompson <cjsthompson@gmail.com>
2012-04-12 09:39:05 -07:00
Christoph J. Thompson fb72129b78 iproute2 - Don't hardcode the path to config files
Allows using an alternate path for config files.

Signed-off-by: Christoph J. Thompson <cjsthompson@gmail.com>
2012-04-12 09:37:18 -07:00
Rose, Gregory V bd886ebb1f iproute2: Add netlink attribute to filter dump requests
Add a new netlink attribute type to the dump request to allow
filtering of the information returned for the respective matching
interfaces.  At this time the only filter defined is to request
virtual function (VF) device info for interfaces that have attached VFs.

It will also be possible to extend the request with other yet to be
defined netlink attributes in the future.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
2012-04-12 09:36:30 -07:00
Stephen Hemminger 29cea29df0 Merge in 3.3-rc2 kernel headers 2012-04-10 09:11:21 -07:00
Florian Westphal 9fb6dc2bef tc: man: choke counts packets, not bytes 2012-04-10 09:02:13 -07:00
Eric Dumazet 930a75f925 Fix ss if INET_DIAG not enabled in kernel
If the kernel doesn't have INET_DIAG and a newish version of iproute
is used, nothing would be displayed.
2012-04-10 09:00:16 -07:00
Stephen Hemminger ff24746cca Convert to use rta_getattr_ functions
Use new functions (inspired by libmnl) to do type-safe access
of routing attributes.
2012-04-10 08:47:55 -07:00
Jorge Boncompte [DTI2] 49b730d7b2 iproute: show metrics as an unsigned value
Avoids showing negative metrics.

Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
2012-04-10 08:23:59 -07:00
Stephen Hemminger 4ccfb44dfb Make link mode case independent
The link mode is printed in upper case, and following the general
rule that ip command output should work on input, allow either case.
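A hypothetical sketch of case-independent parsing with POSIX strcasecmp(); the return values follow IF_LINK_MODE_DEFAULT (0) and IF_LINK_MODE_DORMANT (1) from <linux/if.h>, but the function itself is illustrative, not the ip code.

```c
#include <strings.h>

/* Returns 0 for "default", 1 for "dormant" (in any case), -1 otherwise. */
static int parse_link_mode(const char *arg)
{
	if (strcasecmp(arg, "default") == 0)
		return 0;	/* IF_LINK_MODE_DEFAULT */
	if (strcasecmp(arg, "dormant") == 0)
		return 1;	/* IF_LINK_MODE_DORMANT */
	return -1;
}
```

This way the upper-case DORMANT printed by "ip link show" can be fed straight back to "ip link set".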
2012-04-05 15:10:19 -07:00
Stephen Hemminger 4f2fdd44b6 Add ability to set link state with ip
Exposes existing netlink operations to modify link state of devices.
2012-04-05 15:08:57 -07:00
João Valverde ae5555d334 ipl2tp: allow setting session interface name
Hi,

I've attached a trivial patch for iproute2 to allow naming interfaces
created with "ip l2tp add session".

I believe patches should go through the netdev mailing list but this
patch is so small I figured that would just add noise. Hope that's OK.

Originally I thought I would need a bigger patch and was going to take a
stab at implementing something like

ip l2tp add tunnel L2TP_TUNNEL_ARGS
ip link add name NAME  [ LINK_OPTS ] type l2tp L2TP_SESSION_ARGS

(a better interface IMHO) but all the code was there already, all that I
needed to add was option parsing.

Thanks,

João Valverde

From fd8c3b712527d2e959aeabc6f6b71a9910e7be7e Mon Sep 17 00:00:00 2001
From: João Valverde <joao.valverde@ist.utl.pt>
Date: Mon, 26 Mar 2012 18:30:56 +0100
Subject: [PATCH] ipl2tp: allow setting session interface name
2012-04-03 11:38:51 -07:00
Stephen Hemminger 4bb00cd2b7 v3.3.0 2012-03-19 17:27:12 -07:00
Stephen Hemminger 82499282b2 ip: allow set and display of link mode parameter
The kernel supports a link mode attribute (which can be dormant or default).
This attribute is used to control how the link watch engine
handles operstate transitions.

This adds a new parameter to ip link command to allow setting and
displaying the value.
---
2012-03-19 17:24:43 -07:00
Stephen Hemminger 718165534d gre: allow 0 as a legal key value
There is nothing in the standard that says 0 can't be used as a key.
It makes sense to allow it. Also fix a typo where ikey was printed
instead of okey.
2012-03-19 17:18:49 -07:00
Stephen Hemminger 7dd0371222 Fix rta_getattr_u32 wrapper and add getattr_u8 2012-03-15 17:47:51 -07:00
Florian Westphal 598a42c091 ip: xfrm: report nat-t/encapsulation portmapping updates
Signed-off-by: Florian Westphal <fw@strlen.de>
2012-03-15 14:49:03 -07:00
Stephen Hemminger c23abafbdc update to 3.3-rc7 kernel headers 2012-03-15 14:44:13 -07:00
Kenyon Ralph 43d29f782f Update ip address manual page
* update synopsis to match "ip address help" output
* specify IPv4, since "IP" is ambiguous
* remove deprecated site scope
* document lifetimes, home, and nodad
* update wording to make sense since page was split from the ip(8) page
* get rid of extra spaces
2012-03-15 14:39:12 -07:00
Anton Danilov 90d98edf39 csum action, fix typo 2012-03-15 14:24:59 -07:00
Stephen Hemminger b8d59e1ec1 Fix ip-monitor manual page what-is entry
Debian warns that the NAME section wasn't parsable.
2012-03-14 10:38:53 -07:00
Stephen Hemminger 09fa327941 iproute: allow changing gretap parameters
Change the order of evaluation of ip link type arguments to allow
changing parameters of gre tunnels.

The following wouldn't work:
 # ip li add mytunnel type gretap remote 1.1.1.1 key 3
 # ip li set mytunnel type gretap key 9
2012-03-14 10:28:33 -07:00
Stephen Hemminger a6ddc20617 Keep cscope around after make clean
Follow convention of kernel and keep cscope file around
after make clean.
2012-03-14 10:22:25 -07:00
Andreas Henriksson f526af995e iproute: fix tc -iec display of Mibit rates
As reported by Thomas Mühlgrabner <muehltom@cable.vol.at>
in http://bugs.debian.org/662979 :

 When showing htb class configuration with "tc -iec class show",
 the output for Mibit is actually the value for bit.
 Example: configure a class with a ceil of 1000Mibit.
 Output states 1048576000 Mibit.

The cause is missing parentheses in the display code of tc.

(Please also note that a lower value of 100Mibit will be displayed
as 102400 Kibit, which I think is kind of ugly.)
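A sketch of the precedence problem: division and multiplication associate left to right, so rate / 1024 * 1024 evaluates as (rate / 1024) * 1024, which is just the rate in bits printed with a "Mibit" suffix. The printer below is hypothetical and its thresholds are illustrative assumptions, not taken from tc.

```c
#include <stdio.h>

/* Hypothetical IEC rate printer; rate is in bits per second. */
static int print_rate_iec(char *buf, size_t len, double rate)
{
	if (rate >= 1024.0 * 1024.0)
		/* The parentheses matter: without them this would print
		 * the value in bits with a "Mibit" suffix. */
		return snprintf(buf, len, "%.0fMibit", rate / (1024.0 * 1024.0));
	if (rate >= 1024.0)
		return snprintf(buf, len, "%.0fKibit", rate / 1024.0);
	return snprintf(buf, len, "%.0fbit", rate);
}
```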

Reported-by: Thomas Mühlgrabner <muehltom@cable.vol.at>
Signed-off-by: Andreas Henriksson <andreas@fatal.se>
2012-03-10 09:13:58 -08:00
Yegor Yefremov 8ced4fcd50 iproute2: cleanup dependencies
LIBNETLINK is defined in the main Makefile, so
both ../lib/libnetlink.a and ../lib/libutil.a are
already appended automatically during linking. Listing them here
as well makes ../lib/libnetlink.a and ../lib/libutil.a appear
twice during linking.

Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
2012-02-27 08:27:54 -08:00
Petr Sabata e2a4536a43 iproute2: tc - mqprio formatted print fix
Just a minor correction of mqprio printf()'s.

Reported-by: Petr Písař <ppisar@redhat.com>
Signed-off-by: Petr Šabata <contyk@redhat.com>
2012-02-22 15:23:12 -08:00
Stephen Hemminger e6e6fb5c6a ipaddress: cleanup code for link stats64
On 64 bit platform, casting to unsigned long long is unnecessary.
Use inttypes.h and stdtypes.h to resolve it.
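A minimal illustration of the <inttypes.h> approach for 64-bit counters; format_u64() is a hypothetical helper. PRIu64 expands to the right conversion for uint64_t on the target (typically "lu" on 64-bit Linux and "llu" on 32-bit), so no cast to unsigned long long is needed.

```c
#include <inttypes.h>
#include <stdio.h>

/* Format a 64-bit counter without casting. */
static int format_u64(char *buf, size_t len, uint64_t value)
{
	return snprintf(buf, len, "%" PRIu64, value);
}
```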
2012-02-21 17:18:59 -08:00
Stephen Hemminger 0df7db3cf4 arpd: allow configuring polling interval
A new option -p is added to the arpd command that accepts
a time indicating the number of seconds
to wait between kernel arp table polling attempts.
The minimum value is .1 (100ms).

If not specified, polling defaults to 30 seconds.

Patch by Erik Hugne <erik.hugne@ericsson.com> with
modifications
2012-02-17 08:17:09 -08:00
Stephen Hemminger 2728f598bb ss: simplify code
Rather than copy-pasting code using sendmsg/recvmsg, use the simpler
send() and recv() system calls.
2012-02-16 16:42:42 -08:00
Matt Tierney c51577cd13 ss: Close file descriptors in tcp_show_netlink.
Signed-off-by: Matt Tierney <tierney@cs.nyu.edu>
2012-02-16 16:31:35 -08:00
Stephen Hemminger 20ed7b24df dhcp-client-script: don't use /tmp
/tmp is a dangerous place; it is better to put log files in /var/log.
Based on patch by Vasiliy Kulikov <segoon@openwall.com>
2012-02-15 10:05:45 -08:00
Stephen Hemminger e557d1ac3a Don't put configure files in /tmp
Based on patch by Vasiliy Kulikov <segoon@openwall.com>
Don't use /tmp since it is dangerous; instead, put temporary files
from the configure script in the build directory. This is what an
autoconf-generated configure script does.
2012-02-15 10:03:39 -08:00
Tony Zelenoff cdae8bcc0f Adjust man page for new functionality
Signed-off-by: Tony Zelenoff <antonz@parallels.com>
2012-02-09 15:06:53 -08:00
Tony Zelenoff 1dac7817b4 Modify neighbour proxy show
The new "ip neigh show proxy" command can now show proxies which
were added with "ip neigh add proxy" command. Kernel code to
support this feature sent a bit earlier to netdev.

Signed-off-by: Tony Zelenoff <antonz@parallels.com>
2012-02-09 15:06:52 -08:00
Greg Rose 0e56c6b69b iproute2: Add VF spoofchk command description to ip-link.8 man page
Add documentation for the ip link set spoof checking option.  The
expanded text section explaining the VF commands was missing this
text.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-02-09 15:02:43 -08:00
Stephen Hemminger d798a0483e red: add missing include math.h
red now uses pow() function.
2012-02-06 09:45:50 -08:00
Stephen Hemminger cfd2cbd15f Add cast to rta_getattr_str
Warning from C++
2012-02-06 09:35:27 -08:00
Eric Dumazet a3fd8e58c1 ss: should support CONFIG_INET_UDP_DIAG=n kernels
ss -x currently fails on CONFIG_INET_UDP_DIAG=n or old kernels.

Also close file descriptors while we are at it.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
2012-01-30 08:12:50 -08:00
Eric Dumazet 50c6f3ee5b tc-sfq: update man page
Add documentation about RED mode, and new parameters (flows, depth)
added in linux 3.3

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2012-01-24 08:04:06 -08:00
Eric Dumazet e61df2105c tc-red: update man page
Include documentation for harddrop and adaptive parameters.

All parameters but limit and avpkt are optional.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2012-01-24 08:04:06 -08:00
Vijay Subramanian e330c1853d netem: Fix 'reorder' section of man page
The syntax used in the example on reordering in the manpage is inconsistent with
the usage syntax.  Moreover, the text does not describe the reordering process
in the kernel correctly. This patch fixes these two issues.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
2012-01-24 08:04:06 -08:00
Vijay Subramanian 0bdb83cd70 netem: Fix up grammatical errors in man page
Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
2012-01-24 08:04:06 -08:00
Stephen Hemminger f606236010 Fix unix socket diagnostic build
Get updated headers incorporated into build environment
and include required sock_diag.h.
2012-01-20 12:48:00 -08:00
Pavel Emelyanov dfbaa90dec iproute: Dump unix sockets via netlink
Get the same info as from /proc file plus the peer inode.

Applies on top of new sock diag patch and udp diag patch.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2012-01-20 12:43:21 -08:00
nick black db4a7f198b Update ip manpage
Fix synopsis and other references to match current code.
2012-01-20 12:32:44 -08:00
Vijay Subramanian 14a1c164d1 netem: Fail cleanly if user input is wrong
(Resending patch since it looks like my earlier mail did not make it to
netdev).

netem reordering requires that the delay parameter be given. Currently, if no
delay is given, tc prints the error message but still installs the qdisc. Fix
this by printing the usage and failing cleanly.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
2012-01-20 11:21:58 -08:00
Stephen Hemminger fdeae171e5 Merge branch 'master' of nehalam:src/iproute2 2012-01-20 08:17:59 -08:00
Stephen Hemminger 5aa08f6bf4 ip: make 'ip l' be 'ip link'
Restore compatibility for those lazy typists.
2012-01-20 08:16:02 -08:00
Eric Dumazet 1b6f0bb5be gred: support TCA_GRED_MAX_P attribute
TCA_GRED_MAX_P makes it possible to express high-resolution probabilities.

New output (on 3.3+ kernel) :

disc gred 9442: root refcnt 17
 DP:0 (prio 1) Average Queue 0b Measured Queue 0b
	 Packet drops: 0 (forced 0 early 0)
	 Packet totals: 20 (bytes 2584)
 limit 31460b min 3000b max 9000b ewma 5 probability 0.05 Scell_log 15

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2012-01-20 08:12:24 -08:00
Eric Dumazet 650252d8c3 choke: support TCA_CHOKE_MAX_P
TCA_CHOKE_MAX_P makes it possible to express a high-resolution RED probability.

tc qdisc add dev $DEV parent 1:1 handle 10: est 1sec 8sec choke \
	limit 90 ecn min 10 max 30 probability 0.05 bandwidth 10Mbit

Before patch :

tc -s -d qdisc show dev eth3
qdisc ... limit 90p min 10p max 30p ecn ewma 3 Plog 19 Scell_log 13

After :

qdisc ... limit 90p min 10p max 30p ecn ewma 3 probability 0.05 Scell_log 13
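A sketch of the fixed-point encoding, assuming (as in the kernel RED code) that max_P carries the probability scaled by 2^32 in a u32; the helper names are hypothetical. This is what lets a value such as 0.05 be passed down and printed back instead of being approximated by the Plog exponent.

```c
#include <stdint.h>

/* probability in [0, 1] -> Q0.32 fixed point, saturating at both ends */
static uint32_t prob_to_max_p(double probability)
{
	double scaled = probability * 4294967296.0;	/* 2^32 */

	if (scaled <= 0.0)
		return 0;
	if (scaled >= 4294967295.0)
		return UINT32_MAX;
	return (uint32_t)scaled;
}

/* Q0.32 fixed point -> probability, for display */
static double max_p_to_prob(uint32_t max_p)
{
	return max_p / 4294967296.0;
}
```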

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2012-01-20 08:12:23 -08:00
Eric Dumazet 6987ecf083 sfq: add optional RED on top of SFQ
Adds an optional Random Early Detection on each SFQ flow queue.

Traditional SFQ limits count of packets, while RED permits to also
control number of bytes per flow, and adds ECN capability as well.

1) We don't handle idle time management in this RED implementation,
since each 'new flow' begins with a null qavg. We really want to address
backlogged flows.

2) If headdrop is selected, we try to ECN-mark the first packet instead of
the currently enqueued packet. This gives faster feedback for TCP flows
compared to traditional RED [marking the last packet in the queue].

Example of use :

tc qdisc add dev $DEV parent 1:1 handle 10: est 1sec 4sec sfq \
	limit 3000 headdrop flows 512 divisor 16384 \
	redflowlimit 100000 min 8000 max 60000 probability 0.20 ecn

qdisc sfq 10: parent 1:1 limit 3000p quantum 1514b depth 127 headdrop flows 512/16384 divisor 16384
 ewma 6 min 8000b max 60000b probability 0.2 ecn
 prob_mark 0 prob_mark_head 4876 prob_drop 6131
 forced_mark 0 forced_mark_head 0 forced_drop 0
 Sent 1175211782 bytes 777537 pkt (dropped 6131, overlimits 11007 requeues 0)
 rate 99483Kbit 8219pps backlog 689392b 456p requeues 0

In this test, with 64 netperf TCP_STREAM sessions, 50% of them using
ECN-enabled flows, we can see that the number of CE-marked packets is
smaller than the number of drops (for non-ECN flows).

If the same test is run without RED, we can see that the backlog is much bigger.

qdisc sfq 10: parent 1:1 limit 3000p quantum 1514b depth 127 headdrop flows 512/16384 divisor 16384
 Sent 1148683617 bytes 795006 pkt (dropped 0, overlimits 0 requeues 0)
 rate 98429Kbit 8521pps backlog 1221290b 841p requeues 0

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2012-01-20 08:12:22 -08:00
Eric Dumazet 54a2fce832 red: fix adaptive spelling
Reported-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2012-01-20 08:12:21 -08:00
Eric Dumazet e7e4abea3e red: Add adaptative algo
Enable the Adaptive RED algorithm, using:

tc qdisc  ... red limit BYTES ... adaptative ...

Support high-precision probability/max_p setting and reporting, while
still supporting old kernels.

With a new kernel, "Plog ..." is replaced in tc output by "probability
value" :

qdisc red 10: dev eth3 parent 1:1 limit 360Kb min 30Kb max 90Kb ecn ewma 5 probability 0.09 Scell_log 15
2012-01-19 14:45:20 -08:00
Stephen Hemminger 59858866e3 netem: add rate extension to man page
Fixed up version of patch from Hagen Paul Pfeifer <hagen@jauu.net>
Also run spell check.
2012-01-19 14:38:36 -08:00
Hagen Paul Pfeifer 6b8dc4deea tc: netem rate shaping and cell extension
This patch adds rate shaping as well as cell support. The link-rate can be
specified via rate options. Three optional arguments control the cell
knobs: packet-overhead, cell-size, cell-overhead. To ratelimit eth0 root
queue to 5kbit/s, with a 20 byte packet overhead, 100 byte cell size and
a 5 byte per cell overhead:

	tc qdisc add dev eth0 root netem rate 5kbit 20 100 5
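A rough sketch of how the three cell knobs could combine into a per-packet transmission time, assuming the rate is in bits per second; the function name and rounding details are assumptions for illustration, not the netem kernel code. For the example above (5kbit, 20-byte packet overhead, 100-byte cells, 5-byte cell overhead), a 1000-byte packet grows to 1020 bytes, rounds up to 11 cells of 105 bytes (1155 bytes), and takes 1.848 s to send.

```c
#include <stdint.h>

/* Hypothetical: transmit time in ns for one packet on a link of rate_bps
 * bits/s, with a per-packet overhead, rounding up to whole cells and
 * adding a per-cell overhead. Overheads may be negative. */
static uint64_t netem_tx_time_ns(uint32_t pkt_len, uint64_t rate_bps,
				 int32_t packet_overhead,
				 uint32_t cell_size, int32_t cell_overhead)
{
	int64_t len = (int64_t)pkt_len + packet_overhead;

	if (len < 0)
		len = 0;
	if (cell_size) {
		int64_t cells = (len + cell_size - 1) / cell_size;	/* round up */

		len = cells * ((int64_t)cell_size + cell_overhead);
		if (len < 0)
			len = 0;
	}
	return (uint64_t)len * 8 * 1000000000ULL / rate_bps;
}
```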

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
2012-01-19 14:28:27 -08:00
Hagen Paul Pfeifer 30d10db566 utils: add s32 parser
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
2012-01-19 14:24:52 -08:00
Masatake YAMATO aa38c3eefa using NLM_F_DUMP flag constant in libnetlink.c
This is trivial patch for libnetlink.c in iproute2.

In iproute2/include/linux/netlink.h, NLM_F_DUMP is defined as:

   #define NLM_F_DUMP	(NLM_F_ROOT|NLM_F_MATCH)

It is not used in libnetlink.c. If used, the code becomes a bit easier
to read.
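A minimal sketch of a dump request header using the combined flag; init_dump_request() is a hypothetical helper and, unlike the real dump requests in libnetlink.c, it carries no payload after the header.

```c
#include <linux/netlink.h>
#include <string.h>

/* Fill a payload-less netlink dump request header. */
static void init_dump_request(struct nlmsghdr *nlh, unsigned short type,
			      unsigned int seq)
{
	memset(nlh, 0, sizeof(*nlh));
	nlh->nlmsg_len = NLMSG_LENGTH(0);
	nlh->nlmsg_type = type;
	/* Same bits as NLM_F_ROOT | NLM_F_MATCH, but states the intent. */
	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
	nlh->nlmsg_seq = seq;
}
```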

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
2012-01-19 14:16:12 -08:00
Vijay Subramanian 1c4cbdbc6a netem: Add missing '}' in man page
Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
2012-01-19 14:11:07 -08:00
Stephen Hemminger 7878c0ba40 Update to 3.3 headers (with inet_diag fix)
Incorporate change to fix inet_diag build failure.
2012-01-19 14:09:42 -08:00
Stephen Hemminger a08d2590a0 Update to kernel v3.3 headers
Initial merge window version of headers
2012-01-10 10:50:02 -08:00
Stephen Hemminger aab2702d33 Fix man page whatis entry errors
lintian says:
W: iproute: manpage-has-bad-whatis-entry usr/share/man/man7/tc-hfsc.7.gz
W: iproute: manpage-has-bad-whatis-entry usr/share/man/man8/ip-address.8.gz
W: iproute: manpage-has-bad-whatis-entry usr/share/man/man8/ip-addrlabel.8.gz
W: iproute: manpage-has-bad-whatis-entry usr/share/man/man8/ip-link.8.gz
W: iproute: manpage-has-bad-whatis-entry usr/share/man/man8/ip-maddress.8.gz
W: iproute: manpage-has-bad-whatis-entry usr/share/man/man8/ip-monitor.8.gz
W: iproute: manpage-has-bad-whatis-entry usr/share/man/man8/ip-mroute.8.gz
W: iproute: manpage-has-bad-whatis-entry usr/share/man/man8/ip-neighbour.8.gz
W: iproute: manpage-has-bad-whatis-entry usr/share/man/man8/ip-netns.8.gz
W: iproute: manpage-has-bad-whatis-entry usr/share/man/man8/ip-ntable.8.gz
W: iproute: manpage-has-bad-whatis-entry usr/share/man/man8/ip-route.8.gz
W: iproute: manpage-has-bad-whatis-entry usr/share/man/man8/ip-rule.8.gz
W: iproute: manpage-has-bad-whatis-entry usr/share/man/man8/ip-tunnel.8.gz
W: iproute: manpage-has-bad-whatis-entry usr/share/man/man8/ip-xfrm.8.gz
2012-01-10 10:47:28 -08:00
Stephen Hemminger 13603f6a9e ipl2tp: remove unused libnl headers
Leftover from change to original code.
2012-01-10 08:50:49 -08:00
Stephen Hemminger 447c118f13 v3.2.0 2012-01-05 08:34:31 -08:00
Jan Engelhardt 8e91a80d97 iproute2: fix calling up the xt action
Upstream: has not been sent yet

Requesting the xt action never succeeded because it registered
using the wrong name.
2012-01-03 15:07:38 -08:00
Jan Engelhardt d7aa57d450 iproute2: proper detection of libxtables position and flags
Upstream: not sent yet

Any tests involving iptables _MUST_ utilize pkg-config to find the
proper locations of the installation.
2012-01-03 15:05:25 -08:00
Stephen Hemminger 6c513a0061 README cleanups
Spell check the README files and remove out of date release notes.
2012-01-03 15:04:55 -08:00
Stephen Hemminger 155ad8023b ematch: fix warning about unused input()
Use existing compile flag to indicate that input() is not used
by tc ematch, fixes compiler warning.
2012-01-03 13:55:59 -08:00
Stephen Hemminger 5761f04fb8 ematch: fix warning about yyerror and const
yyerror() should take const char * on current bison.
2012-01-03 13:55:00 -08:00
Jan Engelhardt f5b830dc5d iproute2: avoid use of implicit declarations
gcc -DLIBDIR=\"/usr/lib64\" -D_GNU_SOURCE -fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -Wstrict-prototypes -fPIC -DXT_LIB_DIR=\"/usr/lib64/xtables\" -I../include -DRESOLVE_HOSTNAMES -DLIBDIR=\"/usr/lib64\" -fPIC   -c -o ipx_pton.o ipx_pton.c
In file included from ../include/utils.h:8:0,
                 from ipx_ntop.c:5:
../include/libnetlink.h: In function 'rta_getattr_u64':
../include/libnetlink.h:84:2: warning: implicit declaration of function 'memcpy'
../include/libnetlink.h:84:2: warning: incompatible implicit declaration of built-in function 'memcpy'
2012-01-03 13:48:04 -08:00
Stephen Hemminger 38cd311ade l2tp: Add l2tp support
Based on earlier implementation by James Chapman. But instead of
dragging in all of libnl, use existing libnetlink infrastructure.
2011-12-29 09:35:37 -08:00
Stephen Hemminger 46c5d64d69 libnetlink: add attribute access inline functions
Based on an idea from libmnl, add attribute access functions instead
of explicitly exposing casts. Also handle possible alignment issues
of u64.
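A sketch of the u64 alignment issue, using a hypothetical attr_hdr stand-in for struct rtattr: the payload follows a 4-byte header, so a u64 value is only guaranteed 4-byte alignment, and copying it out with memcpy() avoids an unaligned 8-byte load through a cast pointer. Names are illustrative, not the libnetlink.h functions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct rtattr: 4-byte header, payload follows. */
struct attr_hdr {
	uint16_t len;
	uint16_t type;
};

static const void *attr_data(const struct attr_hdr *a)
{
	return (const char *)a + sizeof(*a);
}

/* Copy instead of dereferencing (const uint64_t *), which may be
 * misaligned and fault on strict-alignment architectures. */
static uint64_t attr_get_u64(const struct attr_hdr *a)
{
	uint64_t v;

	memcpy(&v, attr_data(a), sizeof(v));
	return v;
}
```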
2011-12-29 09:29:33 -08:00
Stephen Hemminger cd70f3f522 libnetlink: remove unused junk callback
Both rtnl_talk and rtnl_dump had a callback for handling portions
of netlink message that do not match the correct pid or seq.
But this callback was never used by any part of iproute2 so remove
it.
2011-12-28 10:37:12 -08:00
Eric Dumazet d060de7f8d netem: fix a typo in explain()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2011-12-24 11:21:33 -08:00
Stephen Hemminger 2aa3dd29a7 libnetlink: add more attribute functions
New functions to handle u8, u16, u32, u64 and string attribute types.
Use common code for all attribute wrappers.
2011-12-23 10:43:54 -08:00
Stephen Hemminger 6cf8398f5f libnetlink: change rtnl_send() to take void *
Avoid having to cast buffer being sent.
2011-12-23 10:41:50 -08:00
Stephen Hemminger 3c7950af59 netem: add support for 4 state and GE loss model
Incorporate support for new loss models.
2011-12-22 17:08:11 -08:00
Stephen Hemminger fd6fac34e9 netem: fix man page
Format man page with conventional style (BNF and italics) to make
it match other pages.

Fix loss model options to match what will be implemented!
2011-12-22 16:50:54 -08:00
Stephen Hemminger 1b1177ed5f Update to latest 3.2 kernel headers
Keep in sync
2011-12-22 10:40:39 -08:00
Stephen Hemminger 2a9721f1c4 Split up ip man page
The man page for ip command had grown too large to be readable.
Break it up into separate pages.
2011-12-22 10:34:03 -08:00
Florian Westphal 2587c01a0e tc: man: add man page for stochastic fair blue
With help from Eric Dumazet.
Man page is derived in part from the README file contained in
Juliusz Chroboczek's original sfb kernel patch.
2011-12-22 08:54:26 -08:00
Eric Dumazet 841fc7bc98 red: harddrop support and cleanups
Add harddrop support (kernel support added a long time ago), and various
cleanups.

min BYTES, max BYTES are now optional and follow Sally Floyd's
recommendations.

By the way, our default 2% probability is a bit low; Sally recommends 10%.
Not a big deal if the upcoming adaptive algorithm is deployed.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2011-12-08 16:43:18 -08:00
Hagen Paul Pfeifer cd72dcf13c netem: add man-page
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
2011-12-01 09:26:35 -08:00
Eric Dumazet ab15aeacf5 red: make burst optional
Documentation advises setting burst to (min+min+max)/(3*avpkt).

Let tc do this automatically if the user doesn't provide burst.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2011-12-01 09:23:49 -08:00
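The default-burst rule quoted in this commit can be written as a small helper. This is a sketch, not the q_red.c code; red_default_burst() is a hypothetical name. min, max and avpkt are in bytes, so the result is a count of avpkt-sized packets, and integer division truncates.

```c
#include <stdint.h>

/* Hypothetical helper: burst = (min + min + max) / (3 * avpkt).
 * qmin, qmax and avpkt are in bytes; returns 0 if avpkt is 0. */
static uint32_t red_default_burst(uint32_t qmin, uint32_t qmax, uint32_t avpkt)
{
	if (avpkt == 0)
		return 0;
	/* widen to 64 bits so 2 * qmin + qmax cannot overflow */
	return (uint32_t)((2ULL * qmin + qmax) / (3ULL * avpkt));
}
```

For example, min 30000, max 90000 and avpkt 1000 give (60000 + 90000) / 3000 = 50.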
Eric Dumazet 0cf67ead7b red: give a hint about burst value
Check for burst values that are too small.

Reported-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2011-12-01 09:23:43 -08:00
Greg Rose 7b8179c780 iproute2: Add new command to ip link to enable/disable VF spoof check
Add ip link command parsing for VF spoof checking enable/disable

V2 - Fixed problem with parsing of dump info on kernels that don't
     support the spoof checking option and also wrapped the ifla_vf_info
     structure in #ifdef __KERNEL__ to prevent user space from directly
     accessing the structure
V3 - Improved parsing of vfinfo
V4 - Put Makefile back to proper list of subdirs
V5 - Remove struct ifla_vf_info, it is only used by the kernel
V6 - Make sure spoof check is reported by the driver - rtnl will set
     it to -1 to indicate driver didn't report a value.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-11-23 14:53:12 -08:00
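The V6 note says rtnl reports -1 when the driver gives no spoof-check value. Here is a minimal sketch of honoring that when printing. It assumes the value arrives in an unsigned 32-bit field, so -1 shows up as 0xffffffff. spoofchk_str() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical formatter: NULL means "driver did not report a value",
 * so the caller should print nothing rather than a misleading "on". */
static const char *spoofchk_str(uint32_t setting)
{
	if (setting == (uint32_t)-1)
		return NULL;
	return setting ? "on" : "off";
}
```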
Eric Dumazet 719b958bbd ss: report ecnseen
Support ECNSEEN reporting in ss command.

ESTAB      0      0           10.170.73.123:4900
10.170.73.125:51001    uid:501 ino:385994 sk:f31e5f00
         mem:(r0,w0,f0,t0) ts sack ecn ecnseen bic wscale:8,8 rto:210
rtt:18.75/15 ato:40 cwnd:10 send 69.9Mbps rcv_space:32768

"ecn" means the TCP session negotiated ECN capability (TCP layer) at setup time.

"ecnseen" means at least one frame with ECT(0), ECT(1) or ECN (IP layer) was
received from the peer.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2011-11-23 14:51:54 -08:00
Thomas Jarosch fcbd0165fc tc: Use correct variable type for get_distribution() result
get_distribution() returns an int.

cppcheck reported:
[tc/q_netem.c:243]: (style) Checking if unsigned variable 'dist_size' is less than zero.

The mismatch actually rendered the error checking
after get_distribution() ineffective.

Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
2011-11-23 14:46:24 -08:00
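Why the type mismatch disabled the error check: an unsigned variable can never be below zero. This sketch contrasts the two patterns. load_distribution() is a hypothetical stand-in for get_distribution() that returns a count or -1 on error.

```c
#include <stddef.h>

/* Hypothetical stand-in: entry count on success, -1 on error. */
static int load_distribution(const char *name)
{
	return name ? 256 : -1;
}

/* Buggy pattern: -1 becomes UINT_MAX, and "< 0" is always false. */
static int check_unsigned(const char *name)
{
	unsigned int dist_size = load_distribution(name);

	return dist_size < 0 ? -1 : 0;   /* always returns 0 */
}

/* Fixed pattern: keep the int result so the error survives. */
static int check_signed(const char *name)
{
	int dist_size = load_distribution(name);

	return dist_size < 0 ? -1 : 0;
}
```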
Thomas Jarosch a3da01c519 tc: Remove unused variable 'res'.
Detected by cppcheck.

Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
2011-11-23 14:46:21 -08:00
Stephen Hemminger 9044a4547d Update to 3.2.0-rc2 headers 2011-11-23 14:34:49 -08:00
Stephen Hemminger 9cbe6bc337 v3.1.0 2011-11-17 16:53:50 -08:00
Petr Šabata 16963ce6f0 Display closed UDP sockets on 'ss -ul'
This patch emulates 'netstat -ul' behavior, showing 'closed'
(state 07) UDP sockets when ss is called with '-ul' options.
Although dirty, this seems like the least invasive way to fix
it and shouldn't really break anything.

Signed-off-by: Petr Šabata <contyk@redhat.com>
2011-11-16 09:32:20 -08:00
Stephen Hemminger 93ba481acb cleanup ematch yacc files
make clean needs to remove all the yacc output files for ematch.
2011-11-02 16:39:36 -07:00
Michal Soltys 9bac173fa6 HFS manpage changes
Few minor changes and small additions.
2011-11-02 16:35:32 -07:00
Michal Soltys 41f6004139 HFSC (7) & (8) documentation + assorted changes
This patch adds detailed documentation for HFSC scheduler. It roughly
follows HFSC paper, but tries to not rely too much on math side of things.
Post-paper/Linux specific subjects (timer resolution, ul service curve, etc.)
are also discussed.

I've read it many times over, but it's a lengthy chunk of text - so try
to be understanding in case I made some mistakes.

tc-hfsc(7): explains algorithm in detail (very long)
tc-hfsc(8): explains command line options briefly
tc(8): adds references to new man pages
Makefile: adds man7 directory to install target
q_hfsc.c: minimal help text changes, consistency with tc-hfsc(8)

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2011-11-02 16:33:50 -07:00
Mike Frysinger aa48b5931a tc: fix parallel build file with lex/yacc
Building iproute2 in parallel might hit the race failure:
	emp_ematch.l:2:30: fatal error: emp_ematch.yacc.h:
		No such file or directory
	make[1]: *** [emp_ematch.lex.o] Error 1

This is because we currently allow the yacc/lex files to generate and
compile in parallel.  So add a simple dependency to make sure yacc has
finished before we attempt to compile the lex output.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2011-10-18 15:02:21 -07:00
Stephen Hemminger 7397944de6 ip: fix exit codes
Alternative fix to the problem reported by Bin Li.
The issue came from https://bugzilla.novell.com/show_bug.cgi?id=681952.

In any previous version (since suse ... 10.0?), ip addr add always returned
the error code 2 in case the ip address is already set on the interface:

    inet 172.16.2.3/24 brd 172.16.2.255 scope global bond0
RTNETLINK answers: File exists
2

On 11.4, it returns the exit code 254:

    inet 172.16.1.1/24 brd 172.16.1.255 scope global eth0
RTNETLINK answers: File exists
254

This, of course, causes ifup to return an error in this quite common case.
2011-10-13 08:38:33 -07:00
Thomas Jarosch 788731b320 Fix unterminated readlink() buffer usage
Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
2011-10-13 08:16:56 -07:00
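readlink() returns the number of bytes placed in the buffer and does not append a terminating NUL. A minimal sketch of safe usage follows. read_link_str() is a hypothetical helper, not the code touched by this commit.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reserve one byte for the terminator and add it
 * explicitly. Returns 0 on success, -1 on error or possible truncation. */
static int read_link_str(const char *path, char *buf, size_t size)
{
	ssize_t n;

	if (size == 0)
		return -1;
	n = readlink(path, buf, size - 1);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	/* n == size - 1 means the target may not have fit */
	return (size_t)n == size - 1 ? -1 : 0;
}
```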
Stephen Hemminger 707f612c00 Update to 3.1-rc9 kernel headers
Align header files with sanitized version of headers in kernel.
2011-10-10 11:02:42 -07:00
Stephen Hemminger ce691fb5ce v3.0.0 2011-10-10 08:59:54 -07:00
Stephen Hemminger b702f9cc37 ip: fix man page warnings
Fix usage of ".R" which is not man macro.
2011-10-10 08:55:56 -07:00
Thomas Jarosch 19bcc05bea Fix file descriptor leak on error in read_igmp()
Detected by cppcheck.

Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
2011-10-07 11:20:23 -07:00
Thomas Jarosch 297452a1c2 Fix file descriptor leak in do_tunnels_list()
Detected by cppcheck.

Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
2011-10-07 11:20:22 -07:00
Thomas Jarosch e588a7db16 Fix file descriptor leak on error in read_mroute_list()
Detected by cppcheck.

Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
2011-10-07 11:20:21 -07:00
Thomas Jarosch 67ef60a293 Fix file descriptor leak on error in read_viftable()
Detected by cppcheck.

Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
2011-10-07 11:18:41 -07:00
Thomas Jarosch 25352af7c2 Fix file descriptor leak on error in iproute_flush_cache()
Detected by cppcheck.

Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
2011-10-07 11:18:09 -07:00
Thomas Jarosch e9a927dc08 Add missing closedir() call in do_show()
Detected by cppcheck.

Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
2011-10-07 11:17:41 -07:00
Thomas Jarosch 1a6543c56b Fix memory leak of lname variable in get_target_name()
Detected by cppcheck.

Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
2011-10-07 11:17:10 -07:00
Thomas Jarosch 9f1ba57016 Fix wrong sanity check in choke_parse_opt()
Detected by cppcheck.

Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
2011-10-07 11:17:03 -07:00
Thomas Jarosch 6d5ee98a7c Fix wrong comparison in cmp_print_eopt()
Detected by cppcheck.

Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
2011-10-07 11:16:15 -07:00
Thomas Jarosch 97c13582f9 Fix file descriptor leak on error in rtnl_hash_initialize()
Detected by cppcheck.

Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
2011-10-07 11:15:28 -07:00
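The descriptor-leak fixes in this series share one pattern: once fopen() succeeds, every exit path must reach fclose(). A sketch of that pattern using a single cleanup label follows. read_first_number() is a hypothetical reader, not one of the functions named above. It skips a header line, as the /proc readers do, and checks fgets() as well.

```c
#include <stdio.h>

/* Hypothetical reader: skip one header line, then parse a number.
 * Returns 0 on success, -1 on error; never leaks the FILE. */
static int read_first_number(const char *path, long *out)
{
	char line[256];
	FILE *fp = fopen(path, "r");
	int ret = -1;

	if (!fp)
		return -1;
	if (!fgets(line, sizeof(line), fp))       /* header missing */
		goto out;
	if (!fgets(line, sizeof(line), fp) ||
	    sscanf(line, "%ld", out) != 1)        /* body missing or bad */
		goto out;
	ret = 0;
out:
	fclose(fp);   /* reached on every path after a successful fopen() */
	return ret;
}
```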
Jiri Benc 21a5a6b378 iproute2: fix changing of ip6ip6 tunnel parameters
When changing ip6ip6 parameters (ip -6 tun change), ip passes zeroed
struct ip6_tnl_parm to the kernel. The kernel then tries to change all of
the tunnel parameters to the passed values, including zeroing of local and
remote address. This fails (-EEXIST in net/ipv6/ip6_tunnel.c:ip6_tnl_ioctl).

For other tunnel types, ip fetches the current parameters first and applies
the required changes on top of them. This patch applies the same code as in
ip/iptunnel.c to ip/ip6tunnel.c.

See http://bugzilla.redhat.com/730627 for the original bug report.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2011-10-07 11:14:47 -07:00
Thomas Jarosch 2bcc3c1629 Fix pipe I/O stream descriptor leak in init_service_resolver()
Detected by cppcheck.

Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
2011-10-07 11:10:03 -07:00
Sridhar Samudrala a22e92951d iproute2: Fix usage and man page for 'ip link'
Add bridge as a supported type with 'ip link' in usage and all the missing
types in 'ip' man page. Also fixed some typos.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
2011-10-07 11:04:46 -07:00
Petr Sabata 281d740691 iproute2: arpd - fix usage and manpage options
Signed-off-by: Petr Sabata <contyk@redhat.com>
2011-10-06 08:25:06 -07:00
Petr Sabata 7e8bd80e38 iproute2: lnstat - fix typos
Signed-off-by: Petr Sabata <contyk@redhat.com>
2011-10-06 08:25:05 -07:00
Petr Sabata 583de1498e iproute2: ss - fix missing parameters
Signed-off-by: Petr Sabata <contyk@redhat.com>
2011-10-06 08:25:04 -07:00
Stephen Hemminger 8555902504 Add ntable to ip man page
Add some documentation about ip neighbour table parameter command.
2011-08-31 13:23:04 -07:00
Dan McGee 4f3626f920 xt: only unset fields if m is non NULL 2011-08-31 12:18:49 -07:00
Dan McGee 9a230771c0 ensure uptime is initialized if /proc/uptime cannot be opened 2011-08-31 12:16:36 -07:00
Dan McGee 1b129bf2fe genl: remove unused code
remove unused basename logic, avoid dereference of possibly NULL variable
2011-08-31 12:15:22 -07:00
Dan McGee 1313ceb4d6 iptuntap: avoid double open
would leak a file handle
2011-08-31 12:14:51 -07:00
Eric W. Biederman 223f4d8ea6 iproute2: Fail "ip netns add" on existing network namespaces.
Use O_EXCL so that we only create and mount a new network namespace
if there is no chance an existing network namespace is present.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2011-08-31 11:02:26 -07:00
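With O_CREAT | O_EXCL, open() fails with EEXIST instead of reusing an existing file. A minimal sketch of that check follows. create_ns_file() and the path used in the test are hypothetical, and nothing is mounted here.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: create the file a new namespace would be
 * bind-mounted on. Returns 0, or -errno (e.g. -EEXIST) on failure. */
static int create_ns_file(const char *path)
{
	int fd = open(path, O_RDONLY | O_CREAT | O_EXCL, 0);

	if (fd < 0)
		return -errno;
	close(fd);
	return 0;
}
```

A second "add" for the same name then fails cleanly rather than silently mounting over an existing namespace.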
Eric W. Biederman 2e8a07f543 iproute2: Auto-detect the presence of setns in libc
If libc has setns present use that version instead of
rolling the syscall wrapper by hand.

Dan McGee found the following compile error:

    gcc -D_GNU_SOURCE -O2 -Wstrict-prototypes -Wall -I../include
    -DRESOLVE_HOSTNAMES -DLIBDIR=\"/usr/lib/\"   -c -o ipnetns.o ipnetns.c
    ipnetns.c:31:12: error: static declaration of ‘setns’ follows non-static
    declaration
    /usr/include/bits/sched.h:93:12: note: previous declaration of ‘setns’
    was here
    make[1]: *** [ipnetns.o] Error 1

Reported-by:  Dan McGee <dan@archlinux.org>
Tested-by:  Dan McGee <dan@archlinux.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2011-08-31 11:02:02 -07:00
Stephen Hemminger 0f28c38b34 Update headers to 3.0.4
Update the automatically generated sanitized headers
2011-08-31 11:00:26 -07:00
Dan McGee 44e743e588 Make iproute2 configure script more flexible
On Arch Linux, we still install the iptables shared libraries in
/usr/lib/iptables/, even though the main library is installed to
/usr/lib/libxtables.so. This change checks all available locations to
correctly find the iptables library directory.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-08-31 10:56:12 -07:00
Florian Westphal 05fb9184f2 tc: filter: fix default 'protocol all' on little-endian platforms
When specifying filters without the 'protocol' keyword, tc will
default to 'protocol all'.

Unfortunately, this missed a byte-ordering conversion.
2011-08-31 10:55:13 -07:00
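The missing conversion is the host-to-network swap of ETH_P_ALL (0x0003). A sketch follows. It assumes, as with an explicit 'protocol' keyword, that the value is stored in network byte order. default_filter_protocol() is a hypothetical name.

```c
#include <stdint.h>
#include <arpa/inet.h>       /* htons(), ntohs() */
#include <linux/if_ether.h>  /* ETH_P_ALL */

/* Hypothetical helper: default filter protocol in network byte order.
 * Without htons(), little-endian hosts would send 0x0300 instead. */
static uint16_t default_filter_protocol(void)
{
	return htons(ETH_P_ALL);
}
```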
Dan McGee f9eab60d17 iproute2: Remove ChangeLog
This hasn't been updated since 2006.
2011-08-31 10:51:42 -07:00
Stephen Hemminger 75dbf13791 Add LLDP to ethernet type table
and make type table const.
2011-08-31 10:45:04 -07:00
Florian Westphal 610b22a30f tc: man: update sfq man page
Document 'divisor' option and mention that external classifiers can be used.
2011-08-31 10:43:21 -07:00
Florian Westphal b1978178fa tc: man: add man page for choke scheduler 2011-08-31 10:43:10 -07:00
Bin Li ed0f006b86 ip: fix typo in ip link manual page
Extra bracket
2011-08-31 10:42:00 -07:00
Andreas Henriksson c0c44bfedd iproute2: Remove "monitor" from "ip route help" output
$ ip route help 2>&1 | grep monitor
ip route { add | del | change | append | replace | monitor } ROUTE
$ ip route monitor
Command "monitor" is unknown, try "ip route help".

(I guess what was really intended is "ip monitor route", so just remove
the argument from the help output.)

Originally reported by martin f krafft at http://bugs.debian.org/537681

While at it, also drop all non-existent (route, link, netns) monitor
arguments from the ip(8) man page.

Signed-off-by: Andreas Henriksson <andreas@fatal.se>
2011-07-20 16:04:04 -07:00
Christoph Biedl c13f598242 ip: fix display of prefix cache info
The "ip monitor" command does properly decode the "preferred" and
"valid" lifetime records in router advertisements from netlink
messages.
2011-07-20 16:02:50 -07:00
Stephen Hemminger c441bd4c1b Add QFQ scheduler
Basic configuration support for QFQ.
Still need to add manual page.
2011-07-13 13:46:34 -07:00
Stephen Hemminger be181323c1 Remove redundant limits.h
redo.
2011-07-13 09:49:17 -07:00
Eric W. Biederman 0dc34c7713 iproute2: Add processless network namespace support
The goal of this code change is to implement a mechanism such that it is
simple to work with a kernel that is using multiple network namespaces
at once.

This comes in handy for interacting with VPNs where there may be RFC 1918
address overlaps and different policies, default routes, name servers
and the like.

Configuration specific to a network namespace that would ordinarily be
stored under /etc/ is stored under /etc/netns/<name>.  For example if
the dns server configuration is different for your vpn you would create
a file /etc/netns/myvpn/resolv.conf.

File descriptors that can be used to manipulate a network namespace can
be created by opening /var/run/netns/<NAME>.

This adds the following commands to iproute.
ip netns add NAME
ip netns delete NAME
ip netns monitor
ip netns list
ip netns exec NAME cmd ....
ip link set DEV netns NAME

ip netns exec exists to cater to the vast majority of programs that only
know how to operate in a single network namespace.  ip netns exec
changes the default network namespace, creates a new mount namespace,
remounts /sys and bind mounts netns specific configuration files to
their standard locations.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2011-07-13 09:48:26 -07:00
Stephen Hemminger 21a85d3bec Fix test for EOF on continuation line
getline() returns -1 on EOF; that value must not be lost by forcing the
result into size_t (unsigned).

Reported-by: Petr Sabata
2011-07-11 10:38:10 -07:00
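A minimal sketch of a getline() loop that keeps the return value signed, so EOF (-1) is still detected. count_nonempty_lines() is hypothetical and far simpler than the continuation-line reader this commit fixes.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical loop: getline() returns ssize_t, -1 on EOF or error.
 * Storing it in size_t would turn -1 into SIZE_MAX and hide EOF. */
static int count_nonempty_lines(FILE *fp)
{
	char *line = NULL;
	size_t cap = 0;
	ssize_t n;           /* must stay signed */
	int count = 0;

	while ((n = getline(&line, &cap, fp)) != -1)
		if (!(n == 1 && line[0] == '\n'))
			count++;
	free(line);
	return count;
}
```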
Stephen Hemminger e760a19a43 Update kernel headers to 3.0 2011-07-11 10:31:07 -07:00
Stephen Hemminger b5383aaac8 Update .gitignore 2011-07-11 10:29:12 -07:00
Andreas Henriksson 73de5d9680 iproute2: Fix building xt module against xtables version 6
iptables/xtables apparently changed API again... Now you need to pass
an extra parameter (orig_opts) which was not needed before.

Sprinkle some lovely pre-processor magic to be compatible with both older
and new versions. In the beginning of time XTABLES_VERSION_CODE didn't
exist. Then it was (0x10000 * major + 0x100 * minor + patch) when it was
first introduced (according to git), but now it's at 6...
I don't know what official iptables releases have defined it to over time.
Let's just hope none of the older versions that define it higher
than 6 are still around, so only the "current" versioning
scheme is supported. Let's see how long this lasts now.

For the API change in xtables, see:
http://git.netfilter.org/cgi-bin/gitweb.cgi?p=iptables.git;a=commitdiff;h=600f38db82548a683775fd89b6e136673e924097

Signed-off-by: Andreas Henriksson <andreas@fatal.se>
2011-07-11 10:18:14 -07:00
Petr Sabata 5582c0cffd iproute2: Remove unreachable code
This patch removes unreachable, useless code.

Signed-off-by: Petr Sabata <contyk@redhat.com>
2011-07-11 10:13:51 -07:00
David Ward cbec021913 xfrm: Update documentation
The ip(8) man page and the "ip xfrm [ XFRM-OBJECT ] help" command output
are updated to include missing options, fix errors, and improve grammar.
There are no functional changes made.

The documentation for the ip command has many different meanings for the
same formatting symbols (which really needs to be fixed). This patch makes
consistent use of brackets [ ] to indicate optional parameters, pipes | to
mean "OR", braces { } to group things together, and dashes - instead of
underscores _ inside of parameter names. The parameters are listed in the
order in which they are parsed in the source code.

There are several parameters and options that are still not mentioned or
need to be described more thoroughly in the "COMMAND SYNTAX" section of
the ip(8) man page. I would appreciate help from the developers with this.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
2011-07-11 10:12:06 -07:00
Gilles Espinasse 4f69c63a4b iproute2: fix minor typo in comments
Signed-off-by: Gilles Espinasse <g.esp@free.fr>
2011-07-11 10:11:09 -07:00
Stephen Hemminger 8acd148fab v2.6.39 2011-06-29 16:01:48 -07:00
Stephen Hemminger 49dff8c88c xt match: fix set-never-used warning 2011-06-29 15:59:41 -07:00
Stephen Hemminger 02ee3dbc78 skbedit: fix set-never-used warning 2011-06-29 15:59:02 -07:00
Stephen Hemminger 18445b3e92 ss: check result of readlink
Don't ignore readlink failure.
2011-06-29 15:58:37 -07:00
Stephen Hemminger dc484542a9 Fix set-never-used warning in ifstat 2011-06-29 15:58:12 -07:00
Stephen Hemminger 2dd9f8e073 libnetlink: fix set never used warning 2011-06-20 14:34:46 -07:00
Stephen Hemminger bf808cbf84 tc: fix set never used warning in red 2011-06-20 14:34:30 -07:00
Stephen Hemminger d93b6b51e6 ip: iproute fix set never used warning 2011-06-20 14:34:11 -07:00
Stephen Hemminger cdf3585224 ip: addrlabel fix set never used warning 2011-06-20 14:33:55 -07:00
Eric Dumazet df39de8d24 ss: fix autobound filter
Fixes the following error. We currently provide garbage data to the kernel, which
can abort the validation process or produce unexpected results.

$ ss -a autobound
State      Recv-Q Send-Q      Local Address:Port          Peer Address:Port
TCPDIAG answers: Invalid argument

After patch:

$ misc/ss -a autobound
State      Recv-Q Send-Q      Local Address:Port          Peer Address:Port
LISTEN     0      128                     *:44624                    *:*
ESTAB      0      0            192.168.1.21:47141        74.125.79.109:imaps

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2011-06-20 14:31:51 -07:00
Stephen Hemminger bcd7abddd4 tc filter: fix dport/sport in pretty print output
Problem reported by Peter Lebbing on Debian.
The decode of source and destination port filters in pretty print
mode was backwards.
2011-05-19 09:19:17 -07:00
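Why the order matters: in a TCP/UDP header the source port comes first. After ntohl(), the first 32-bit word therefore holds sport in its upper 16 bits and dport in its lower 16 bits. The sketch below assumes the u32 key is kept in network byte order. decode_ports() is a hypothetical name, not the f_u32.c code.

```c
#include <stdint.h>
#include <arpa/inet.h>   /* ntohl(), htonl() */

/* Hypothetical decoder for a key matching the first word of an
 * L4 header: upper half = source port, lower half = destination. */
static void decode_ports(uint32_t key_be, uint16_t *sport, uint16_t *dport)
{
	uint32_t key = ntohl(key_be);

	*sport = (uint16_t)(key >> 16);
	*dport = (uint16_t)(key & 0xffff);
}
```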
Eric Dumazet f78e316f25 ip: Support IFLA_TXQLEN in ip link command
Eric Dumazet wrote:
> We currently use an expensive ioctl() to get device txqueuelen, while
> rtnetlink gave it to us for free. This patch speeds up ip link operation
> when many devices are registered.
>

Here is a 2nd version of this patch, which does not display the useless "qlen 0" info.

[PATCH iproute2] ip: Support IFLA_TXQLEN in ip link show command

We currently use an expensive ioctl() to get device txqueuelen, while
rtnetlink gave it to us for free. This patch speeds up ip link operation
when many devices are registered.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2011-05-12 08:55:49 -07:00
John Fastabend 892eba309f iproute2: improve mqprio inputs for queue offsets and counts
This changes mqprio input format to be more user friendly.

Old usage,

 # ./tc/tc qdisc add dev eth3 root mqprio help
Usage: ... mqprio [num_tc NUMBER] [map P0 P1...]
                  [offset txq0 txq1 ...] [count cnt0 cnt1 ...] [hw 1|0]

New usage,

 # ./tc/tc qdisc add dev eth3 root mqprio help
Usage: ... mqprio [num_tc NUMBER] [map P0 P1 ...]
                  [queues count1@offset1 count2@offset2 ...] [hw 1|0]

Suggested-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
2011-04-26 14:59:32 -07:00
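One way to parse a single "count@offset" token of the new "queues" syntax is sketched below. parse_queue_range() is hypothetical, not the q_mqprio.c parser. The trailing %c conversion rejects tokens with extra characters.

```c
#include <stdio.h>

/* Hypothetical parser for one "count@offset" token.
 * Returns 0 on success, -1 on malformed input. */
static int parse_queue_range(const char *tok, unsigned int *count,
			     unsigned int *offset)
{
	char extra;

	/* exactly two conversions must succeed; a third (%c) means junk */
	if (sscanf(tok, "%u@%u%c", count, offset, &extra) != 2)
		return -1;
	return 0;
}
```

Under this reading, "queues 8@0 8@8" would give the first two traffic classes eight queues each, starting at queue 0 and queue 8.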
Stephen Hemminger 4d91e4f168 Merge branch 'for-2.6.39' of /home/shemminger/iproute2-net-next
Conflicts:
	include/linux/xfrm.h
	ip/iplink.c
2011-04-12 14:42:20 -07:00
Stephen Hemminger 242b8da71b Use INIT_NETDEV_GROUP
Now that headers are sanitized, use the define.
2011-04-12 14:40:14 -07:00
Ulrich Weber c0635644cd iproute2: parse flag XFRM_POLICY_ICMP
parse flag XFRM_POLICY_ICMP

Signed-off-by: Ulrich Weber <uweber@astaro.com>
2011-04-12 14:38:32 -07:00
Stephen Hemminger 7b032a1f77 Update README information
Change url's and describe current kernel header values.
2011-04-12 14:30:11 -07:00
John Fastabend 914953046a iproute2: tc add mqprio qdisc support
Add mqprio qdisc support. Output matches the following,

qdisc mq 0: dev eth1 root
qdisc mq 0: dev eth2 root
qdisc mqprio 8001: dev eth3 root  tc 8 map 0 1 2 3 4 5 6 7 1 1 1 1 1 1 1 1
             queues:(0:7) (8:15) (16:23) (24:31) (32:39) (40:47) (48:55) (56:63)

And usage is,

Usage: ... mclass [num_tc NUMBER] [map P0 P1...]
                  [offset txq0 txq1 ...] [count cnt0 cnt1 ...] [hw 1|0]

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
2011-04-12 14:28:19 -07:00
Brandon Philips 27b3f52444 doc: add pdf targets
Hello Stephen-

Here is one more patch that SUSE has been carrying.

Cheers, Brandon
2011-04-12 14:28:04 -07:00
Juliusz Chroboczek d7f3299d59 tc : SFB flow scheduler
Supports SFB qdisc (included in linux-2.6.39)

1) Setup phase : accept non default parameters

2) dump information

qdisc sfb 11: parent 1:11 limit 1 max 25 target 20
  increment 0.00050 decrement 0.00005 penalty rate 10 burst 20 (600000ms 60000ms)
 Sent 47991616 bytes 521648 pkt (dropped 549245, overlimits 549245 requeues 0)
 rate 7193Kbit 9774pps backlog 0b 0p requeues 0
  earlydrop 0 penaltydrop 0 bucketdrop 0 queuedrop 549245 childdrop 0 marked 0
  maxqlen 0 maxprob 0.00000 avgprob 0.00000

Signed-off-by: Juliusz Chroboczek <Juliusz.Chroboczek@pps.jussieu.fr>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2011-04-12 14:27:37 -07:00
Stephen Hemminger 876cd7fa10 Add README.devel 2011-04-12 14:24:15 -07:00
Stephen Hemminger 59a935d204 Update email address of netem 2011-04-12 14:24:01 -07:00
Brandon Philips 1f7190db39 ip: fix memory leak in ipmaddr.c
If the continue is taken, then there is a memory leak.

https://bugzilla.novell.com/show_bug.cgi?id=538996

Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Brandon Philips <bphilips@suse.de>
2011-04-12 14:23:52 -07:00
Stephen Hemminger d7ac9ad4f4 Fix warning in u32 from assignment in conditional 2011-04-12 14:23:39 -07:00
Stephen Hemminger 8988b02ee1 Fix snprintf with non format
snprintf was being called with an environment variable as the format string.
If the variable contained a format specifier (like %s), the program would crash.
2011-04-12 14:23:27 -07:00
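The general fix is to pass untrusted text as an argument to a constant "%s" format, never as the format itself. A minimal sketch follows. copy_untrusted() is a hypothetical helper.

```c
#include <stdio.h>

/* Hypothetical helper: copy an untrusted string (e.g. from getenv())
 * into buf. With a constant "%s" format, any '%' in src is copied
 * verbatim instead of being interpreted; output is truncated to fit. */
static void copy_untrusted(char *buf, size_t size, const char *src)
{
	snprintf(buf, size, "%s", src);
}
```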
Stephen Hemminger 38c867d2a8 Add checks for fgets() when reading proc
If expected proc headers are missing, catch and print error.
2011-04-12 14:23:17 -07:00
Stephen Hemminger 46dc73a57d Add no-strict-aliasing to genl
The genl code uses constructs which violate the strict aliasing
constraints of gcc 4.4. Disable the optimization to avoid warnings
and potential breakage.
2011-04-12 14:23:06 -07:00
Stephen Hemminger 21cfb5e1d9 update to 2.6.39-rc3 headers 2011-04-12 14:20:01 -07:00
Vlad Dogaru ac694c333f iproute2: support listing devices by group
User can specify device group to list by using the group keyword:

	ip link show group test

If no group is specified, 0 (default) is implied.

Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org>
2011-04-12 14:18:05 -07:00
Stephen Hemminger 77d1e6ab84 v2.6.38.1 2011-03-17 10:05:47 -07:00
Nicolas Dichtel aba383448c iproute2: allow to specify truncation bits on auth algo
Hi,

here is a patch against iproute2 to allow user to set a state with a specific
auth length.

Example:
$ ip xfrm state add src 10.16.0.72 dst 10.16.0.121 proto ah spi 0x10000000
auth-trunc "sha256" "azertyuiopqsdfghjklmwxcvbn123456" 96 mode tunnel
$ ip xfrm state
src 10.16.0.72 dst 10.16.0.121
         proto ah spi 0x10000000 reqid 0 mode tunnel
         replay-window 0
         auth-trunc hmac(sha256)
0x617a6572747975696f707173646667686a6b6c6d77786376626e313233343536 96
         sel src 0.0.0.0/0 dst 0.0.0.0/0

Regards,
Nicolas

>From 522ed7348cdf3b6f501af2a5a5d989de1696565a Mon Sep 17 00:00:00 2001
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Thu, 23 Dec 2010 06:48:12 -0500
Subject: [PATCH] iproute2: allow to specify truncation bits on auth algo

Attribute XFRMA_ALG_AUTH_TRUNC can be used to specify
truncation bits, so we add a new algo type: auth-trunc.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2011-03-17 10:02:02 -07:00
Vlad Dogaru 2c19bf6aaf iproute2: fix man page whitespace
Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org>
2011-03-17 10:01:37 -07:00
Gerrit Renker db6b0cfa51 iproute: rename 'get_jiffies' since it uses msecs
The get_jiffies() function retrieves rtt-type values in units of
milliseconds. This patch updates the function name accordingly,
following the pattern given by dst_metric() <=> dst_metric_rtt().
2011-03-17 10:01:22 -07:00
Gerrit Renker fca1dae821 iproute: fix unit conversion of rtt/rttvar/rto_min
Since July 2008 (2.6.27, c1e20f7c8b9), the kernel stores the values for
RTAX_{RTT{,VAR},RTO_MIN} in milliseconds. When using a kernel > 2.6.27 with
the current iproute2, conversion of these values is broken in both directions.

This patch
 * updates the code to pass and retrieve milliseconds;
 * since values < 1msec would be rounded up, also drops the usec/nsec variants;
 * since there is no way to query kernel HZ, also drops the jiffies variant.

Arguments such as
	rtt		3.23sec
	rto_min		0xff
	rto_min		0.200s
	rttvar		25ms
now all work as expected when reading back previously set values.
2011-03-17 10:01:09 -07:00
Gerrit Renker 897fb84fd9 utils: get_jiffies always uses base=0
get_jiffies() is in all places called in the same manner, with base=0;
simplify argument list by putting the constant value into the function.
2011-03-17 10:00:43 -07:00
Joy Latten 4bb75da2d0 xfrm security context support
Adds security context support to ip xfrm state.

Signed-off-by: Joy Latten <latten@austin.ibm.com>
2011-03-17 10:00:21 -07:00
Joy Latten e5055b591b xfrm security context support
Adds security context support to ip xfrm policy.

Signed-off-by: Joy Latten <latten@austin.ibm.com>
2011-03-17 10:00:07 -07:00
Joy Latten 2c319e1ab7 xfrm security context support
In the Linux kernel, ipsec policy and SAs can include a
security context to support MAC networking. This feature
is often referred to as "labeled ipsec".

This patchset adds security context support into ip xfrm
such that a security context can be included when
add/delete/display SAs and policies with the ip command.
The user provides the security context when adding
SAs and policies. If a policy or SA contains a security
context, the changes allow the security context to be displayed.

For example,
ip xfrm state
src 10.1.1.6 dst 10.1.1.2
	proto esp spi 0x00000301 reqid 0 mode transport
	replay-window 0
	auth hmac(digest_null) 0x3078
	enc cbc(des3_ede) 0x6970763672656164796c6f676f33646573636263696e3031
	security context root:system_r:unconfined_t:s0

Please let me know if all is ok with the patchset.
Thanks!!

regards,
Joy

Signed-off-by:  Joy Latten <latten@austin.ibm.com>
2011-03-17 09:58:23 -07:00
Sridhar Samudrala f0612d566b macvlan/macvtap: support 'passthru' mode
Add support for 'passthru' mode when creating a macvlan/macvtap device
which allows takeover of the underlying device and passing it to a KVM
guest using virtio with macvtap backend.

Only one macvlan device is allowed in passthru mode and it inherits
the mac address from the underlying device and sets it in promiscuous
mode to receive and forward all the packets.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
2011-03-16 17:01:58 -07:00
Stephen Hemminger fcae78992c v2.6.38 2011-03-15 19:27:36 -07:00
Diego Elio Pettenò 2230ac1d18 Remove -L flags from link
While the previous code was supposed to work nonetheless, it could be
messed up if further -L were used in LDFLAGS to list the path where glibc's
libutil was to be found.

References: https://bugs.gentoo.org/347489

Signed-off-by: Diego Elio Pettenò <flameeyes@gmail.com>
2011-03-09 10:18:03 -08:00
Nicolas Dichtel 98f5519cd9 iproute2: add support of flag XFRM_STATE_ALIGN4
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2011-03-02 11:50:09 -08:00
Stephen Hemminger d5b7420a26 Remove #ifdef's
The iproute package keeps its own headers, so there is no need
to pollute the code with #ifdef's.
2011-02-25 20:00:54 -08:00
Jiri Pirko a1e191b90c iplink: implement setting of master device 2011-02-25 19:55:19 -08:00
Nicolas Dichtel f323f2a32c iproute2: allow to specify truncation bits on auth algo
Hi,

here is a patch against iproute2 to allow user to set a state with a specific
auth length.

Example:
$ ip xfrm state add src 10.16.0.72 dst 10.16.0.121 proto ah spi 0x10000000
auth-trunc "sha256" "azertyuiopqsdfghjklmwxcvbn123456" 96 mode tunnel
$ ip xfrm state
src 10.16.0.72 dst 10.16.0.121
         proto ah spi 0x10000000 reqid 0 mode tunnel
         replay-window 0
         auth-trunc hmac(sha256)
0x617a6572747975696f707173646667686a6b6c6d77786376626e313233343536 96
         sel src 0.0.0.0/0 dst 0.0.0.0/0

Regards,
Nicolas

>From 522ed7348cdf3b6f501af2a5a5d989de1696565a Mon Sep 17 00:00:00 2001
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Thu, 23 Dec 2010 06:48:12 -0500
Subject: [PATCH] iproute2: allow to specify truncation bits on auth algo

Attribute XFRMA_ALG_AUTH_TRUNC can be used to specify
truncation bits, so we add a new algo type: auth-trunc.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2011-02-25 19:52:12 -08:00
Vlad Dogaru 678b99ee6d iproute2: fix man page whitespace
Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org>
2011-02-25 13:01:19 -08:00
Eric Dumazet f3f28c2126 sfq: add divisor support
In 2.6.39, we can build SFQ queues with a given hash table size.
2011-02-25 12:59:53 -08:00
Gerrit Renker 81d03dc356 iproute: rename 'get_jiffies' since it uses msecs
The get_jiffies() function retrieves rtt-type values in units of
milliseconds. This patch updates the function name accordingly,
following the pattern given by dst_metric() <=> dst_metric_rtt().
2011-02-25 12:54:37 -08:00
Gerrit Renker 9b2cdc00da iproute: fix unit conversion of rtt/rttvar/rto_min
Since July 2008 (2.6.27, c1e20f7c8b9), the kernel stores the values for
RTAX_{RTT{,VAR},RTO_MIN} in milliseconds. When using a kernel > 2.6.27 with
the current iproute2, conversion of these values is broken in both directions.

This patch
 * updates the code to pass and retrieve milliseconds;
 * since values < 1msec would be rounded up, also drops the usec/nsec variants;
 * since there is no way to query kernel HZ, also drops the jiffies variant.

Arguments such as
	rtt		3.23sec
	rto_min		0xff
	rto_min		0.200s
	rttvar		25ms
now all work as expected when reading back previously set values.
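A minimal C sketch of the parsing side, assuming that a bare number (decimal or 0x-prefixed hex) means milliseconds and that only the s/sec/secs and ms/msec/msecs suffixes are accepted. parse_rtt_ms() is a hypothetical name, not the function used in iproute2.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse an RTT-style time argument into milliseconds.
 * A bare number is taken as milliseconds (an assumption of this sketch).
 * Returns 0 on success, -1 on a malformed or out-of-range value. */
static int parse_rtt_ms(const char *arg, unsigned int *ms)
{
	char *end;
	double t = strtod(arg, &end);	/* C99 strtod also accepts "0xff" */

	if (end == arg || !(t >= 0))	/* no digits, negative, or NaN */
		return -1;

	if (*end == '\0' || strcmp(end, "ms") == 0 ||
	    strcmp(end, "msec") == 0 || strcmp(end, "msecs") == 0)
		;	/* already in milliseconds */
	else if (strcmp(end, "s") == 0 || strcmp(end, "sec") == 0 ||
		 strcmp(end, "secs") == 0)
		t *= 1000.0;	/* seconds -> milliseconds */
	else
		return -1;	/* unknown suffix */

	if (t > 4294967295.0)	/* must fit the kernel's 32-bit metric */
		return -1;
	*ms = (unsigned int)(t + 0.5);	/* round to the nearest millisecond */
	return 0;
}
```

With this sketch, "3.23sec" yields 3230, "0.200s" yields 200 and "25ms" yields 25; rounding to the nearest millisecond avoids losing a unit to floating-point error (3.23 * 1000 is not exactly 3230 in binary).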
2011-02-25 12:51:48 -08:00
Gerrit Renker 94089ef772 utils: get_jiffies always uses base=0
get_jiffies() is called the same way everywhere, with base=0;
simplify the argument list by moving the constant value into the function.
2011-02-25 12:49:42 -08:00
Joy Latten 0c7a594541 xfrm security context support
Adds security context support to ip xfrm state.

Signed-off-by: Joy Latten <latten@austin.ibm.com>
2011-02-25 12:45:58 -08:00
Joy Latten e4f054f017 xfrm security context support
Adds security context support to ip xfrm policy.

Signed-off-by: Joy Latten <latten@austin.ibm.com>
2011-02-25 12:45:49 -08:00
Joy Latten b2bb289a57 xfrm security context support
In the Linux kernel, ipsec policy and SAs can include a
security context to support MAC networking. This feature
is often referred to as "labeled ipsec".

This patchset adds security context support to ip xfrm
so that a security context can be included when
adding, deleting, and displaying SAs and policies with the ip command.
The user provides the security context when adding
SAs and policies. If a policy or SA contains a security
context, the changes allow the security context to be displayed.

For example,
ip xfrm state
src 10.1.1.6 dst 10.1.1.2
	proto esp spi 0x00000301 reqid 0 mode transport
	replay-window 0
	auth hmac(digest_null) 0x3078
	enc cbc(des3_ede) 0x6970763672656164796c6f676f33646573636263696e3031
	security context root:system_r:unconfined_t:s0

Please let me know if all is ok with the patchset.
Thanks!!

regards,
Joy

Signed-off-by:  Joy Latten <latten@austin.ibm.com>
2011-02-25 12:45:36 -08:00
Vlad Dogaru db02608b6f iproute2: support device group semantics
Add the group keyword to ip link set, which has the following meaning:
If both a group and a device name are present, we change the device's
group to the specified one. If only a group is present, then the
operation specified by the rest of the command should apply on an entire
group, not a single device.

So, to set eth0 to the default group, one would use
	ip link set dev eth0 group default

Conversely, to set all the devices in the default group down, use
	ip link set group default down

Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org>
2011-02-25 12:43:14 -08:00
Vlad Dogaru 26ad3aecfe iproute2: support device group semantics
Add the group keyword to ip link set, which has the following meaning:
If both a group and a device name are present, we change the device's
group to the specified one. If only a group is present, then the
operation specified by the rest of the command should apply on an entire
group, not a single device.

So, to set eth0 to the default group, one would use
	ip link set dev eth0 group default

Conversely, to set all the devices in the default group down, use
	ip link set group default down

Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org>
2011-02-25 12:43:07 -08:00
Vlad Dogaru f960c92aac iproute2: support listing devices by group
User can specify device group to list by using the group keyword:

	ip link show group test

If no group is specified, 0 (default) is implied.

Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org>
2011-02-25 12:38:50 -08:00
Stephen Hemminger dd4b25a0f9 Merge branch 'choke' into net-next
Conflicts:
	include/linux/pkt_sched.h
2011-02-25 12:35:01 -08:00
Stephen Hemminger 08dc32e130 update to net-next (2.6.39) headers 2011-02-25 12:34:00 -08:00
Stephen Hemminger 1551198b40 Merge branch 'master' into choke 2011-02-20 12:35:40 -08:00
Stephen Hemminger a4eca97cff CHOKe scheduler
TC commands for CHOKe qdisc
2011-01-31 09:09:50 -08:00
Stephen Hemminger 1598b9ef7b Revert "iproute2: add VF_PORT support"
This reverts commit 632110aa0d.

There seem to be some recent changes in the 802.1Qbh/bg specs which may
result in changes to this patch in the near future. It seems best
to ignore this patch for now.
I will re-spin at a later time when the changes in the specs converge.

BTW, Please let me know if I should CC netdev list and others on the
original email. I can resend this email.

Thanks,
Roopa
2011-01-13 14:53:02 -08:00
Roopa Prabhu 632110aa0d iproute2: add VF_PORT support
Resubmitting Scott Feldman's original patch with the changes below

- Fix port profile strlen which was off by 1
- Added function to convert IFLA_PORT_RESPONSE codes to string

Add support for IFLA_VF_PORTS.  VF port netlink msg layout is

        [IFLA_NUM_VF]
        [IFLA_VF_PORTS]
                [IFLA_VF_PORT]
                        [IFLA_PORT_*], ...
                [IFLA_VF_PORT]
                        [IFLA_PORT_*], ...
                ...
        [IFLA_PORT_SELF]
                [IFLA_PORT_*], ...

The iproute2 cmd line for link set is now:

Usage: ip link add link DEV [ name ] NAME
                   [ txqueuelen PACKETS ]
                   [ address LLADDR ]
                   [ broadcast LLADDR ]
                   [ mtu MTU ]
                   type TYPE [ ARGS ]
       ip link delete DEV type TYPE [ ARGS ]

       ip link set DEVICE [ { up | down } ]
                          [ arp { on | off } ]
                          [ dynamic { on | off } ]
                          [ multicast { on | off } ]
                          [ allmulticast { on | off } ]
                          [ promisc { on | off } ]
                          [ trailers { on | off } ]
                          [ txqueuelen PACKETS ]
                          [ name NEWNAME ]
                          [ address LLADDR ]
                          [ broadcast LLADDR ]
                          [ mtu MTU ]
                          [ netns PID ]
                          [ alias NAME ]
                          [ port MODE { PROFILE | VSI } ]
                          [ vf NUM [ mac LLADDR ]
                                   [ vlan VLANID [ qos VLAN-QOS ] ]
                                   [ rate TXRATE ]
                                   [ port MODE { PROFILE | VSI } ] ]
       ip link show [ DEVICE ]

TYPE := { vlan | veth | vcan | dummy | ifb | macvlan | can }
MODE := { assoc | preassoc | preassocrr | disassoc }
PROFILE := profile PROFILE
           [ instance UUID ]
           [ host UUID ]
VSI := vsi mgr MGRID type VTID ver VER
       [ instance UUID ]

Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
2011-01-13 14:50:46 -08:00
Stephen Hemminger 9351fec72d Update to latest kernel headers 2011-01-12 18:46:54 -08:00
Stephen Hemminger f2c45d7050 v2.6.37 2011-01-07 09:54:30 -08:00
Stephen Hemminger 8552b387df Update to 2.6.37-rc8 headers
Use sanitized headers from 2.6.37-rc8
2010-12-29 15:05:48 -08:00
Petr Sabata 5c68fc88c5 ip: Fix a few typos and grammar errors in the ip(8) manpage 2010-12-16 08:30:26 -08:00
Stephen Hemminger 4b3385f6c5 Cleanup ll_map
In preparation for adding name hash:
  * add const
  * use same types in cache as ifinfomsg
  * rename idxmap to ll_cache
2010-12-10 11:58:09 -08:00
Octavian Purdila 3056423728 iproute2: initialize the ll_map only once
Avoid initializing the LL map (which involves a costly RTNL dump)
multiple times. This can happen when running in batch mode.

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
2010-12-10 11:37:57 -08:00
Stephen Hemminger 1e21ea71a7 Increase size of ifindex hash heads
The default of 16 is too small for users with 10,000 interfaces.
2010-12-10 09:46:24 -08:00
Gerrit Renker 1da5f6b2ca tc-red: typo in man page 2010-12-09 09:39:47 -08:00
Petr Sabata d98e300c33 ss: Change "do now" to "do not" in ss(8), -n option
A small typo fix.
2010-12-08 07:55:34 -08:00
Dan Smith f4ff11e3e2 Add ip route save/restore
This patch adds save and restore commands to "ip route". Save dumps
the RTNL stream to stdout which can then be passed to restore later.
This may be helpful in some normal situations, and will allow C/R to
migrate the routing information in userspace.  Tweaking of the stream
can be done by userspace helpers to convert between versions and adjust
things like device indexes when restoring routes in a different
environment.

By factoring out some of the common bits of print_route() into
filter_nlmsg(), the "save" command can use the same selection logic
as "list," allowing the caller to save only specific routes as
necessary.

The only change since the RFC is the addition of manpage and doc
material.

Signed-off-by: Dan Smith <danms@us.ibm.com>
2010-12-01 11:24:58 -08:00
Gregoire Baron 3822cc986c tc: add ACT_CSUM action support (csum)
Add iproute2 support for the ACT_CSUM action. It can be used as
follows, typically in conjunction with the ACT_PEDIT action (pedit):

 # In order to DNAT (stateless) IPv4 packet from 192.168.1.100 to
 #  0x12345678 (18.52.86.120), and update the IPv4 header checksum and
 #  the UDP checksum (the last one, only if the packet is UDP).
tc filter add dev eth0 prio 1 protocol ip parent ffff: \
  u32 match ip src 192.168.1.100/32 flowid :1 \
    action pedit munge offset 16 u32 set 0x12345678 \
      pipe csum ip and udp

 # In order to alter destination address of IPv6 TCP packets from fc00::1
 #  and correct the TCP checksum (nothing happened? except maybe for
 #  checksums in the TCP payload ...).
tc filter add dev eth0 prio 1 protocol ipv6 parent ffff: \
  u32 match ip6 src fc00::1/128 match ip6 protocol 0x06 0xff flowid :1 \
    action pedit munge offset 24 u32 set 0x12345678 \
      pipe csum tcp
2010-12-01 11:17:46 -08:00
Ben Greear 64c7956061 Allow 'ip addr flush' to loop more than 10 times
The default remains at 10 for backwards compatibility.

For instance:
 # ip addr flush dev eth2
 *** Flush remains incomplete after 10 rounds. ***
 # ip -l 20 addr flush dev eth2
 *** Flush remains incomplete after 20 rounds. ***
 # ip -loops 0 addr flush dev eth2
 #

This is useful for getting rid of large numbers of IP
addresses in scripts.

Signed-off-by: Ben Greear <greearb@candelatech.com>
2010-12-01 11:13:51 -08:00
Sridhar Samudrala 3f0a7b4c4f Support 'mode' parameter when creating macvtap device
Add support for 'mode' parameter when creating a macvtap device.
This allows a macvtap device to be created in bridge, private or
the default vepa modes.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>

-------------------------------------------------------------------
Acked-by: Arnd Bergmann <arnd@arndb.de>
2010-11-30 10:01:41 -08:00
Andreas Schwab f66efadd79 iproute2: remove useless use of buffer
Print directly to the file instead of going through a buffer.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
2010-11-30 09:59:11 -08:00
Changli Gao 7162c92148 iproute2: tc: f_flow: add key rxhash
We can use rxhash to classify traffic into flows. Since rxhash may be
supplied by the NIC or by RPS, it is cheaper to use.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
2010-11-30 09:57:36 -08:00
Timo Teräs 4a9608e6ae iproute2: support xfrm upper protocol gre key
Similar to tunnel side: accept dotted-quad and number formats.
Use regular number for printing the key.

Signed-off-by: Timo Teräs <timo.teras@iki.fi>
2010-11-30 09:53:23 -08:00
Timo Teräs 6f4f7c464a iproute2: treat gre key as number
Print the GRE key as a regular number. It is not really an IPv4 address,
and this is also how Cisco and Juniper treat GRE keys. Keep
parsing the dotted-quad format for backwards compatibility.
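A minimal C sketch of accepting both input forms and normalizing them to one 32-bit key, assuming the dotted-quad form maps to the key in network byte order. parse_gre_key() is a hypothetical name, not the iproute2 helper.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: accept "0.0.1.0" or "256" (or "0x100") and return
 * the GRE key in host byte order. Returns 0 on success, -1 on error. */
static int parse_gre_key(const char *arg, uint32_t *key)
{
	if (strchr(arg, '.')) {
		struct in_addr a;

		/* legacy dotted-quad form, kept for backwards compatibility */
		if (inet_pton(AF_INET, arg, &a) != 1)
			return -1;
		*key = ntohl(a.s_addr);
		return 0;
	} else {
		char *end;
		unsigned long v = strtoul(arg, &end, 0);

		if (end == arg || *end != '\0' || v > UINT32_MAX)
			return -1;
		*key = (uint32_t)v;
		return 0;
	}
}
```

Printing then reduces to printf("%u", key), so "0.0.1.0" and "256" both display as 256.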

Signed-off-by: Timo Teräs <timo.teras@iki.fi>
2010-11-30 09:52:32 -08:00
Mike Frysinger be3c4d4f3c m_xt: stop using xtables_set_revision()
iptables dropped the xtables_set_revision() function around version 1.4.9,
so set the rev directly ourselves.  This should be compatible back to the
original version m_xt itself is designed for.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2010-11-30 09:48:38 -08:00
Petr Sabata 5d8056357a ss(8) improvements by Jiri Popelka <jpopelka@redhat.com> 2010-11-29 09:19:44 -08:00
Stephen Hemminger ea71beacac Use standard routines for interface name to index etc
Use the available libraries for mapping from interface index to name
or type. This should speed up display with lots of interfaces
2010-11-28 10:35:28 -08:00
Stephen Hemminger 82408fc17d Workaround for repeated distclean
The subdirectory makefiles need Config file to exist.
Therefore create it, then run make clean, then remove it.
2010-11-18 15:25:38 -08:00
Stephen Hemminger 3f5c1a01e6 Update to 2.6.36 headers
Use sanitized headers from 2.6.36 release
2010-10-20 17:38:04 -07:00
Ulrich Weber 66abc09072 iproute2: display xfrm socket policy direction
display socket policy direction

Signed-off-by: Ulrich Weber <uweber@astaro.com>
2010-09-13 08:23:01 -07:00
Stephen Hemminger cb4bd0ec8d Fix GRED options clearing
A bug was reported where the priorities of GRED DPs were ignored:
the option parsing set opt, and then a memset cleared these
values.
2010-08-25 09:04:55 -07:00
Eric Dumazet a571587d0b iproute2: add 64bit support to ifstat
On Monday 23 August 2010 at 10:33 -0700, Stephen Hemminger wrote:

> I think this breaks the wraparound detection code in this command.
>
>

OK, let's fix only the bug before adding 64-bit counter capabilities.

Thanks

[PATCH] iproute2: add 64bit arches support to ifstat

ifstat assumes IFLA_STATS fields are "unsigned long", but they are
__u32. This fix is needed to let ifstat run on 64bit arches.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2010-08-23 13:05:12 -07:00
Stephen Hemminger daf7bd5c73 Use correct rt_link_statistics
In recent kernels, net_device_stats is not exposed and the code
should have used rt_link_statistics. Also, fix the use of sprintf
with a user-supplied value.
2010-08-23 09:13:05 -07:00
Eric Dumazet b0373bfbbc ip: add RTA_MARK support
Adds support for RTA_MARK rt attribute added in linux-2.6.36

$ ip route get ADDR mark 4
192.168.20.110 dev eth1  src 192.168.20.108  mark 4
    cache  mtu 1500 advmss 1460 hoplimit 64

$ ip route get 192.168.20.108 from ADDR iif STRING mark 256
local 192.168.20.108 from 192.168.20.110 dev lo  src 192.168.20.108  mark 0x100
    cache <local,src-direct>  iif eth1

$ ip route list cache [ADDR] mark NUMBER

Marks >= 16 are printed in hexadecimal;
null (zero) marks are not displayed.
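The display rule above fits in a few lines of C. This is a minimal sketch; format_mark() is a hypothetical name and the exact spacing of the real ip route output may differ.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format an RTA_MARK value following the rule above.
 * A zero mark produces an empty string, marks below 16 are printed in
 * decimal, larger marks in hexadecimal. Returns the snprintf() result. */
static int format_mark(char *buf, size_t len, unsigned int mark)
{
	if (mark == 0) {
		if (len)
			buf[0] = '\0';	/* null marks are not displayed */
		return 0;
	}
	if (mark < 16)
		return snprintf(buf, len, "mark %u", mark);
	return snprintf(buf, len, "mark 0x%x", mark);
}
```

For example, a mark of 4 formats as "mark 4" and a mark of 256 as "mark 0x100", matching the two sample outputs above.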

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2010-08-23 08:41:25 -07:00
Stephen Hemminger 04a9fc0a50 Update kernel headers to 2.6.36-rc2 2010-08-23 08:35:08 -07:00
Stephen Hemminger daa10c8af6 Snapshot for 2.6.35.1 2010-08-23 08:14:38 -07:00
Ulrich Weber c73f3e02f8 iproute2: don't filter cached routes on iproute_get
iproute_get will return cloned routes for IPv4
and both cloned and non-cloned routes for IPv6.

Therefore the RTM_F_CLONED flag should not be checked
for iproute_get routes. The check in print_route will
always fail because valid values are 0 and 1.

Signed-off-by: Ulrich Weber <uweber@astaro.com>
2010-08-23 08:13:35 -07:00
Ben Greear 3bc1c4f297 iproute2: Fix filtering related to flushing IP addresses.
The old 'ip addr flush' logic had several flaws:

* It reversed the logic for primary vs. secondary flags
  (though it sort of worked anyway)

* The code tried to remove secondaries and then primaries,
  but in practice it always removed one primary per loop,
  which is not at all efficient.

* The filter logic in the core would run only the first
  filter in most cases.

* If you used '-s -s', the ifa_flags member would be
  modified, which could make future filters fail
  to function correctly.

This patch attempts to fix all of these issues.

Tested-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: Ben Greear <greearb@candelatech.com>
2010-08-23 08:10:32 -07:00
Stephen Hemminger a130b49b6c snapshot 100804 2010-08-04 10:45:59 -07:00
Stephen Hemminger e3d153c1fb Fix byte order of ether address match for u32
The u32 key match used the wrong byte order when matching on the ether
source or destination address.
2010-08-02 11:55:30 -07:00
Andreas Henriksson 02833d1b38 tc: make symbols loaded from tc action modules global.
Fixes problems with xtables based MARK target ("ipt" module).
When tc loaded the "ipt" (xt) module, it kept the symbols local,
which prevented libxtables from finding the required struct.

Currently ipt/xt is the only tc action module.
iproute2 never seems to call dlclose.
Hopefully the modules don't export more symbols than needed.

In this situation, hopefully the RTLD_GLOBAL flag won't hurt us.

I've been using this patch in the Debian package of iproute for
the last 3 weeks and no one has complained.
( This fixes http://bugs.debian.org/584898 )

Signed-off-by: Andreas Henriksson <andreas@fatal.se>
2010-08-02 09:54:59 -07:00
Steve Fink fbc0f876fa ss -p is much too slow
> On closer inspection, it appears that ss -p does a quadratic scan. It
> rescans every entry in /proc/*/fd/* repeatedly (once per listening
> port? per process? I don't remember what I figured out.)
>
> I humbly suggest that this is not a good idea.

Yep, this is junk.  Please give this patch a try:

ss: Avoid quadratic complexity with '-p'

Scan the process list of open sockets once, and store in a hash
table to be used by subsequent find_user() calls.
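The idea above (scan once, then look up by socket inode) can be sketched in C with a small chained hash table. The struct, table size and function names here are illustrative assumptions rather than the code in ss.c, and the /proc/*/fd scan that would call user_ent_add() is omitted.

```c
#include <stdlib.h>
#include <string.h>

#define USER_ENT_HASH_SIZE 256	/* hypothetical size; must be a power of two */

struct user_ent {
	struct user_ent *next;
	unsigned int ino;	/* socket inode found under /proc/PID/fd */
	int pid;
	char process[16];
};

static struct user_ent *user_ent_hash[USER_ENT_HASH_SIZE];

static unsigned int user_ent_hashfn(unsigned int ino)
{
	return (ino ^ (ino >> 8) ^ (ino >> 16)) & (USER_ENT_HASH_SIZE - 1);
}

/* Called once per socket fd during the single pass over /proc. */
static int user_ent_add(unsigned int ino, int pid, const char *process)
{
	struct user_ent *p = calloc(1, sizeof(*p));
	unsigned int h = user_ent_hashfn(ino);

	if (!p)
		return -1;
	p->ino = ino;
	p->pid = pid;
	/* calloc() zeroed the buffer, so the copy stays NUL-terminated */
	strncpy(p->process, process, sizeof(p->process) - 1);
	p->next = user_ent_hash[h];
	user_ent_hash[h] = p;
	return 0;
}

/* Average O(1) lookup instead of rescanning /proc for every socket. */
static const struct user_ent *find_user_ent(unsigned int ino)
{
	const struct user_ent *p;

	for (p = user_ent_hash[user_ent_hashfn(ino)]; p; p = p->next)
		if (p->ino == ino)
			return p;
	return NULL;
}
```

The total cost becomes one /proc scan plus one hash lookup per displayed socket, instead of one full scan per socket.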

Reported-by: Steve Fink <sphink@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-31 19:31:12 -07:00
Mike Frysinger 1a7943bcf3 netem: fix installs of dist files
The tc program searches LIBDIR by default for the .dist files, and that
defaults to /usr/lib.  But the netem subdir has /lib/ hardcoded which
means the default build+install results in the files not being found.

Further, these are plain text files which are read at runtime, so it
doesn't make sense to give them executable bits.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2010-07-31 19:31:04 -07:00
Stephen Hemminger 4b45abd1f0 Fix NULL pointer reference when using basic match
If basic match has no tree of matches underneath
then print_ematch would core dump.
2010-07-29 18:03:35 -07:00
Stephen Hemminger 4dbda0f482 Update ARP header type table
Add all current values. Since if_arp.h is included, get rid
of the ifdefs. Make the table constant.
2010-07-23 13:12:12 -07:00
Mike Frysinger 9ec0e899e1 dnet: fix strict aliasing warnings
Recent gcc doesn't like it when you cast char pointers to uint16_t
pointers and then dereference it.  So use memcpy() instead and let
gcc take care of optimizing things away (when appropriate). This
should also fix alignment issues on arches where the char pointer
may not be aligned to 16 bits.
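A minimal C sketch of the memcpy() pattern described above. load_u16()/store_u16() are hypothetical names, not the helpers used in the dnet code, and the value keeps host byte order.

```c
#include <stdint.h>
#include <string.h>

/* Read a 16-bit value from any byte offset without casting the char
 * pointer to uint16_t *, which would violate strict aliasing and may
 * fault on strict-alignment arches. */
static uint16_t load_u16(const unsigned char *p)
{
	uint16_t v;

	memcpy(&v, p, sizeof(v));	/* gcc usually turns this into one load */
	return v;
}

/* The matching store helper. */
static void store_u16(unsigned char *p, uint16_t v)
{
	memcpy(p, &v, sizeof(v));
}
```

Because memcpy() has no alignment requirement, these helpers work even when p is odd, e.g. buf + 1.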

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2010-07-23 12:30:48 -07:00
Petr Lautrbach 0156412215 iproute: fix tc generating ipv6 priority filter
This patch adds an IPv6 priority/traffic class filter function,
static int parse_ip6_class(int *argc_p, char ***argv_p, struct tc_u32_sel *sel),
which shifts the filter value into place (starting at the 5th bit) and ignores
"at", since the header position is fixed.

Signed-off-by: Petr Lautrbach <plautrba@redhat.com>
2010-07-23 12:29:35 -07:00
Mike Frysinger bf512683e0 tc: revert "echo" in install target
The recent commit "iproute2: add option to build m_xt as a tc module"
(ab814d6355) looks like it wrongly included debug changes in the
install target.  So drop the `echo` so the tc binary actually gets
installed again.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2010-07-23 12:28:25 -07:00
Arnd Hannemann dec01609dc iproute2: Add dsfield as alias for tos for ip rules
Bring ip rule parsing of "dsfield" in sync with ip route parsing and the manual page.

Signed-off-by: Arnd Hannemann <hannemann@nets.rwth-aachen.de>
2010-07-23 12:27:14 -07:00
Ben Greear 0d1c9b570a iproute2: Fix batch-mode for mrules.
The do_multirule logic was broken in batch mode because
it expected the preferred_family to be AF_INET or AF_INET6,
but it then assigned it to RTNL_FAMILY_IPMR or RTNL_FAMILY_IP6MR.
So, the next iteration of the batch processing, it failed
the check for AF_INET or AF_INET6.

Signed-off-by: Ben Greear <greearb@candelatech.com>
2010-07-23 09:03:12 -07:00
Ulrich Weber 62011a0b31 iproute2: use int instead of long for RTAX_HOPLIMIT compare
otherwise "if ((int)val == -1)" will never match on 64 bit systems

Signed-off-by: Ulrich Weber <uweber@astaro.com>
2010-07-23 09:01:01 -07:00
Ulrich Weber 2eca8d3d3e iproute2: use get_user_hz() for IPv6 print_route
as already done in IPv4 and metrics code part

Signed-off-by: Ulrich Weber <uweber@astaro.com>
2010-07-23 09:01:01 -07:00
Ulrich Weber 447928279c iproute2: filter routing entries based on clone flag
Previously, IPv6 routing cache entries were always displayed
if additional tables besides MAIN and LOCAL were installed.

Signed-off-by: Ulrich Weber <uweber@astaro.com>
2010-07-23 09:01:01 -07:00
Patrick McHardy b6c8e808fc ip: add support for multicast rules
commit 44a5293c1c47b8c32d9bb0756660ea5d4802acf2
Author: Patrick McHardy <kaber@trash.net>
Date:   Tue Apr 13 17:03:47 2010 +0200

    ip: add support for multicast rules

    Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-06-09 08:49:24 -07:00
Jan Engelhardt 8864ac9dc5 Add IFLA_STATS64 support
`ip -s link` shows interface counters truncated to 32 bit. This is
because interface statistics are transported only in 32-bit quantity
to userspace. This commit adds recognition for the new IFLA_STATS64
attribute that exports them in full 64 bit.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-05-19 09:06:03 -07:00
Stephen Hemminger b4397f580e Update kernel derived headers
Headers for the 2.6.35 version in -next
2010-05-19 08:58:13 -07:00
Stephen Hemminger d248a8fe23 v2.6.34 2010-05-19 08:32:43 -07:00
Stephen Hemminger 704f4df477 ip: add documentation for initrwnd
Cloned from ip-cref.tex
2010-05-19 08:30:09 -07:00
Brian Bloniarz 6299857dd5 ip: document initcwnd
Mention initcwnd in ip(8). Text taken from doc/ip-cref.tex.

Signed-off-by: Brian Bloniarz <bmb@athenacr.com>
2010-05-19 08:25:24 -07:00
Chris Wright 3fd8663087 iproute2: rework SR-IOV VF support
The kernel interface changed just before 2.6.34 was released.  This brings
iproute2 in line with the current changes.  The VF portion of setlink is
comprised of a set of nested attributes.

  IFLA_VFINFO_LIST (NESTED)
    IFLA_VF_INFO (NESTED)
      IFLA_VF_MAC
      IFLA_VF_VLAN
      IFLA_VF_TX_RATE
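To make the nesting above concrete, here is a minimal C sketch that builds nested rtattr attributes in a plain buffer using the RTA_* macros from <linux/rtnetlink.h>. The attrbuf type and the attr_put()/attr_nest()/attr_nest_end() helpers are hypothetical; iproute2's libnetlink has its own helpers that operate on a netlink message instead.

```c
#include <linux/rtnetlink.h>
#include <string.h>

/* Hypothetical attribute buffer; unsigned int storage keeps 4-byte alignment. */
struct attrbuf {
	unsigned int words[64];
	int len;	/* bytes used so far */
};

static struct rtattr *attr_tail(struct attrbuf *b)
{
	return (struct rtattr *)((char *)b->words + b->len);
}

/* Append one attribute with a payload; NULL if it would not fit. */
static struct rtattr *attr_put(struct attrbuf *b, unsigned short type,
			       const void *data, int size)
{
	struct rtattr *rta = attr_tail(b);
	int total = RTA_ALIGN(RTA_LENGTH(size));

	if (b->len + total > (int)sizeof(b->words))
		return NULL;
	rta->rta_type = type;
	rta->rta_len = (unsigned short)RTA_LENGTH(size);
	if (size)
		memcpy(RTA_DATA(rta), data, size);
	b->len += total;
	return rta;
}

/* Open a nest: a header-only attribute whose length is fixed up later. */
static struct rtattr *attr_nest(struct attrbuf *b, unsigned short type)
{
	return attr_put(b, type, NULL, 0);
}

/* Close a nest: its length now covers every attribute added inside it. */
static void attr_nest_end(struct attrbuf *b, struct rtattr *nest)
{
	nest->rta_len = (unsigned short)((char *)attr_tail(b) - (char *)nest);
}
```

Building IFLA_VFINFO_LIST { IFLA_VF_INFO { IFLA_VF_MAC, ... } } is then a matter of calling attr_nest() twice, adding the leaf attributes, and closing the nests in reverse order.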

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2010-05-18 08:12:31 -07:00
Stephen Hemminger df33d7a489 Add documentation for ip link add/delete sub-commands
Add some missing pieces. Still need to add documentation for the rest
of the vlan arguments.
2010-05-17 09:50:17 -07:00
Stephen Hemminger a171395410 Update kernel headers to 2.6.34 final version
Last minute IOV format change.
2010-05-17 08:57:24 -07:00
Florian Westphal 24abb62ee7 iproute2: fix addrlabel interface names handling
ip addrlabel outputs if%d names due to missing init call:
$ ip addrlabel s
prefix a::42/128 dev if4 label 1000

Also, ip did not accept "if%d" interfaces on input.

Signed-off-by: Florian Westphal <fw@strlen.de>
2010-05-13 09:23:46 -07:00
Bart Trojanowski 608a96c727 fix build issues with flex ver 2.5
When building on an old environment, the flex generated
tc/emp_ematch.lex.c file would not compile.  The error given was:

emp_ematch.lex.c:1686: error: expected ‘;’, ‘,’ or ‘)’ before numeric constant

The emp_ematch.l uses 'str' as a start symbol name, and  flex would create
a '#define str 1' statement.  This particular version of flex,
unfortunately, used 'str' as names of string variables in the generated
parser functions.  This is line 1686 in the generated file:

YY_BUFFER_STATE ematch__scan_string (yyconst char * str )

This patch just replaces 'str' with 'lexstr' in emp_ematch.l to avoid
the collision.
2010-04-22 15:27:42 -07:00
Stephen Hemminger 4ec1933dfd Update ip.8 man page to describe route table id values
2.6 kernel allows 2^32 route tables, but documentation stated only
255 values were possible.
2010-04-22 15:24:37 -07:00
Alexandre Cassen 3979ef91de Detect 6rd kernel missing support / 6rd tunnel scope
This patch fixes two issues:

* If the kernel does not support 6rd, the ioctl() call
  will return EINVAL; in that case, just skip the perror call.

* The 6rd scope is IPv6/IP tunnels. Don't try to fetch
  6rd tunnel parms if tunnel protocol != IPPROTO_IPV6.

Signed-off-by: Alexandre Cassen <acassen@freebox.fr>
2010-04-12 11:45:51 -07:00
Andreas Henriksson ab814d6355 iproute2: add option to build m_xt as a tc module (v3)
This will build the xt module (action ipt) of tc as a
shared object that is linked at runtime by tc if used,
rather than built into tc.

This is similar to how the atm qdisc support
is handled (q_atm.so).

Signed-off-by: Andreas Henriksson <andreas@xxxxxxxx>
2010-04-12 11:40:29 -07:00
Stephen Hemminger edaaa11e5a Workaround missing ALIGN() macro. 2010-03-29 17:37:49 -07:00
Stephen Hemminger 1b84ad557e Remove mirred debug message
Other commands are quiet if successful. mirred action had leftover
debug message.
2010-03-29 17:32:37 -07:00
Stephen Hemminger 609ceb807d Workaround missing ALIGN() macro
XT_ALIGN() calls ALIGN macro but ALIGN is in kernel source not userspace.
2010-03-29 15:17:48 -07:00
Stephen Hemminger 8881ece54f Update to 2.6.34-rc2 headers 2010-03-29 15:13:14 -07:00
Stephen Hemminger f411a6289e Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/shemminger/iproute2 2010-03-29 15:11:24 -07:00
Andreas Henriksson 12ddfff76c iproute2: detect iptables modules dir in configure.
Try to automatically detect iptables modules directory.

Make the configure script look for iptables modules.
This also makes it possible to specify it on the
command line while building via "make IPT_LIB_DIR=/foo/bar".

Signed-off-by: Andreas Henriksson <andreas@fatal.se>
2010-03-29 15:10:20 -07:00
Jan Engelhardt 800b444016 ip: correctly report tunnel link type
Up until now, "tun" tunnels were displayed as link/[65534].

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-03-29 15:03:19 -07:00
YOSHIFUJI Hideaki / 吉藤英明 697af1fcc6 gaiconf: /etc/gai.conf configuration helper.
This tool reads /etc/gai.conf, the configuration file for getaddrinfo(3),
and sets up the corresponding kernel parameters.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2010-03-29 13:59:28 -07:00
Michele Petrazzo - Unipex 1db61e022d Continue after errors in -batch
Allow ip to process the whole file passed with the -batch argument when
the -force switch is also given.

Signed-off-by: Michele Petrazzo <michele.petrazzo@unipex.it>
2010-03-09 07:50:19 -08:00
Stephen Hemminger 33ff9324de Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2 2010-03-04 08:31:13 -08:00
Wolfgang Grandegger 8a5179466a iproute2: netlink support for bus-error reporting and counters
This patch uses the new features of the kernel's netlink CAN interface
making the bus-error reporting configurable and allowing to retrieve
the CAN TX and RX bus error counters via netlink interface. Here is the
output of my test session showing how to use them:

# ip link set can0 up type can bitrate 500000 berr-reporting on
# ip -d -s link show can0
2: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
    link/can
    can <BERR-REPORTING> state ERROR-PASSIVE (berr-counter tx 128 rx 0) restart-ms 0
                              CAN bus error counter values ^^^^^^^^^^^
    bitrate 500000 sample-point 0.875
    tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
    sja1000: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
    clock 8000000
    re-started bus-errors arbit-lost error-warn error-pass bus-off
    0          54101      0          1          1          0
    RX: bytes  packets  errors  dropped overrun mcast
    432808     54101    54101   0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    0          0        0       0       0       0

# ifconfig can0 down
# ip link set can0 up type can berr-reporting off
# candump -t d any,0:0,#FFFFFFFF
 (0.000000)  can0  20000004  [8] 00 08 00 00 00 00 60 00   ERRORFRAME
 (0.000474)  can0  20000004  [8] 00 20 00 00 00 00 80 00   ERRORFRAME
                                                   ^^ ^^
						    \  \___ rxerr
						     \_____ txerr

Furthermore, the missing support for one-shot mode has been added.

Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
2010-03-03 16:45:10 -08:00
Jamal Hadi Salim c90cda9400 xfrm: add support for SA by mark
Add support for SA manipulation by mark

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
2010-03-03 16:37:29 -08:00
Jamal Hadi Salim f6fd52e626 xfrm: Introduce xfrm by mark
This patch carries basic infrastructure.
You need to make sure that the proper include/linux/xfrm.h is included
for it to compile.

Example:
2010-03-03 16:37:28 -08:00
Jamal Hadi Salim ee675e8714 xfrm: policy by mark
Add support for SP manipulation by mark

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
2010-03-03 16:37:26 -08:00
jamal e906975a53 skbedit: use get_u32 for parsing mark
parsing a mark as a classid allows for acceptance of strange
informal input.

cheers,
jamal
commit aad0da6507ff8a95a63ed8e529c05f52be5b0e75
Author: Jamal Hadi Salim <hadi@cyberus.ca>
Date:   Mon Feb 15 06:45:29 2010 -0500

    skbedit: use get_u32 for parsing mark

    get_u32 is the more appropriate parser for a mark.

    Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
2010-03-03 16:35:30 -08:00
Williams, Mitch A 6e46ec813b libnetlink: Modify the parser to track first duplicated attributes
Modify the parser to keep track of the first of any duplicated attributes,
instead of the last. This is required for VF configuration reporting, where
multiple attributes of the same type are added sequentially.
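A minimal C sketch of the "keep the first duplicate" rule, using the RTA_OK/RTA_NEXT macros from <linux/rtnetlink.h>. parse_rtattr_first() is a hypothetical name; it only illustrates the rule and is not the libnetlink code.

```c
#include <linux/rtnetlink.h>
#include <string.h>

/* Walk a run of attributes and record the FIRST attribute of each type
 * in tb[]; later attributes of the same type are left in place for the
 * caller to walk, instead of overwriting the earlier entry. */
static int parse_rtattr_first(struct rtattr *tb[], int max,
			      struct rtattr *rta, int len)
{
	memset(tb, 0, sizeof(struct rtattr *) * (max + 1));

	while (RTA_OK(rta, len)) {
		unsigned short type = rta->rta_type;

		if (type <= max && !tb[type])	/* ignore later duplicates */
			tb[type] = rta;
		rta = RTA_NEXT(rta, len);	/* also decrements len */
	}
	return 0;
}
```

Keeping the first entry gives the caller a starting point from which it can iterate over all the sequential VF attributes of that type.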

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2010-03-03 16:33:28 -08:00
Williams, Mitch A ae7229d5f9 ip: Add support for setting and showing SR-IOV virtual function link params
Add support to 'ip' for setting and showing SR-IOV virtual function
link parameters.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2010-03-03 16:33:26 -08:00
Williams, Mitch A 46dab6e925 Update man page to indicate current options
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2010-03-03 16:33:25 -08:00
Stephen Hemminger 3e4f6a380a Fix line numbering on batch commands
ip command should not keep track of lineno, that is done
in getcmdline().
2010-03-03 16:31:09 -08:00
Stephen Hemminger 8ecdcce083 Update headers for 2.6.33-net-next
Use sanitized headers from net-next tree.
2010-03-03 16:22:00 -08:00
laurent chavey f5fd80039f Add initrwnd to iproute2
Add initrwnd option parsing to iproute. This option uses the new
rtnetlink init_rcvwnd to set the TCP initial receive window size
advertised by passive and active TCP connections.

Signed-off-by: Laurent Chavey <chavey@google.com>
2010-03-03 16:19:47 -08:00
Hagen Paul Pfeifer f703129d34 tc: add new queue discipline: head drop fifo
This adds the changes required to gain access to
the classless head-drop queuing discipline named
pfifo_head_drop. Unlike pfifo or pfifo_fast,
this queuing discipline drops the first packet
in the queue when the queue is congested. As a result,
the queue always contains the freshest packets.
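The head-drop policy described above can be modelled with a small ring buffer in C. This is a toy sketch: the hd_queue type, HD_LIMIT and the integer "packets" are hypothetical and say nothing about how the kernel's pfifo_head_drop is implemented.

```c
#include <stdbool.h>
#include <stddef.h>

#define HD_LIMIT 4	/* hypothetical queue limit, in packets */

/* Toy head-drop FIFO: when full, the oldest packet (at the head) is
 * discarded so the newest one can always be enqueued. */
struct hd_queue {
	int pkt[HD_LIMIT];
	size_t head;		/* index of the oldest packet */
	size_t len;		/* packets currently queued */
	unsigned long drops;
};

static void hd_enqueue(struct hd_queue *q, int pkt)
{
	if (q->len == HD_LIMIT) {
		q->head = (q->head + 1) % HD_LIMIT;	/* drop from the head */
		q->len--;
		q->drops++;
	}
	q->pkt[(q->head + q->len) % HD_LIMIT] = pkt;	/* append at the tail */
	q->len++;
}

static bool hd_dequeue(struct hd_queue *q, int *pkt)
{
	if (q->len == 0)
		return false;
	*pkt = q->pkt[q->head];
	q->head = (q->head + 1) % HD_LIMIT;
	q->len--;
	return true;
}
```

Enqueuing packets 1 to 6 into this 4-slot queue drops 1 and 2, so the next dequeue returns 3; a tail-drop pfifo would instead keep 1 to 4 and reject 5 and 6.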

To replace the current root queueing discipline
for eth0:
$ tc qdisc replace dev eth0 root pfifo_head_drop

And show statistics:
$ tc -s qdisc show dev eth0

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
2010-03-03 16:15:44 -08:00
Stephen Hemminger 7cd96eee69 iproute2-10224
Final 2.6.33 version
2010-02-24 19:56:50 -08:00
Alexandre Cassen b88215c468 IPv6: 6rd iproute2 support
This patch provides iproute2 facilities to configure a 6rd tunnel. To
configure a 6rd tunnel, you need to configure a sit tunnel and set the
6rd prefix as follows:

  ip tunnel add sit1 mode sit local a.b.c.d ttl 64
  ip tunnel 6rd dev sit1 6rd-prefix xxxx:yyyy::/z

Optionally, you can provide a relay prefix:

  ip tunnel 6rd dev sit1 6rd-relay_prefix e.f.g.h/i

Finally, you can reset previous tunnel settings:

  ip tunnel 6rd dev sit1 6rd-reset

Signed-off-by: Alexandre Cassen <acassen@freebox.fr>
2010-02-09 14:01:57 -08:00
Brian Haley a1b9ffccc2 ip: print "temporary" for IPv6 temp addresses
IPv6 addresses that have IFA_F_SECONDARY set are actually temporary addresses,
hence the IFA_F_TEMPORARY equivalent.  Change the output in this case and
allow filtering on the word "temporary".
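A minimal sketch of the flag-to-word mapping this change (and the later dadfailed support) implies. The flag values are those of linux/if_addr.h, redefined so the sketch builds without kernel headers; `format_addr_flags()` and `append_word()` are hypothetical helpers, not ip's own code, and the word order is an illustrative choice.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>		/* AF_INET, AF_INET6 */

/* Flag bits as in linux/if_addr.h (IFA_F_TEMPORARY aliases SECONDARY). */
#ifndef IFA_F_SECONDARY
#define IFA_F_SECONDARY	0x01
#define IFA_F_TEMPORARY	IFA_F_SECONDARY
#define IFA_F_DADFAILED	0x08
#define IFA_F_TENTATIVE	0x40
#endif

/* Append word to buf, separated from existing text by one space. */
static void append_word(char *buf, size_t len, const char *word)
{
	size_t used = strlen(buf);

	snprintf(buf + used, len - used, "%s%s", used ? " " : "", word);
}

/* Hypothetical helper: on IPv6 the IFA_F_SECONDARY bit is printed as
 * "temporary"; on IPv4 it keeps its "secondary" meaning. */
static void format_addr_flags(char *buf, size_t len, int family,
			      unsigned int flags)
{
	buf[0] = '\0';
	if (flags & IFA_F_SECONDARY)
		append_word(buf, len,
			    family == AF_INET6 ? "temporary" : "secondary");
	if (flags & IFA_F_TENTATIVE)
		append_word(buf, len, "tentative");
	if (flags & IFA_F_DADFAILED)
		append_word(buf, len, "dadfailed");
}
```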

Signed-off-by: Brian Haley <brian.haley@hp.com>
2010-02-09 11:05:49 -08:00
Andreas Henriksson 63a0f20ac1 iproute2: drop equalize support
Currently you can configure "equalize" and it looks all fine and dandy.
The kernel has the interface defined, but apparently there's never actually
been any implementation for it (only a never merged patch in the 2.4 era).

I suggest dropping the code to give any potential users of this feature
the benefit of receiving a proper error message. I think it unlikely that
this will be implemented in the near future, but if it ever happens,
reviving the iproute2 side should be as easy as a git revert of this patch.

For more details see http://bugs.debian.org/149897
2010-02-09 10:58:51 -08:00
Stephen Hemminger a982e10a52 iproute2-100205 2010-02-05 12:02:38 -08:00
Florian Westphal 5080db330e tc: man: add man page for drr scheduler
With help from Patrick McHardy.

Signed-off-by: Florian Westphal <fw@strlen.de>
2010-01-21 11:27:23 -08:00
Florian Westphal 8d8de1139c tc: remove stale code
remove unused #define and "ok" statements.

Signed-off-by: Florian Westphal <fwestphal@astaro.com>
2010-01-21 10:13:01 -08:00
Florian Westphal ddf216c863 tc: red, gred, tbf: more helpful error messages
$ tc qdisc add dev eth1 root tbf
RTNETLINK answers: Invalid argument

$ tc qdisc add dev eth1 root red
RTNETLINK answers: Invalid argument

with patch:
$ tc qdisc add dev eth1 root red
Required parameter (min, max, burst, limit, avpkt) is missing

$ tc qdisc add dev eth1 root tbf
Usage: ... tbf limit BYTES burst BYTES[/BYTES] rate KBPS ...

Signed-off-by: Florian Westphal <fw@strlen.de>
2010-01-21 10:12:57 -08:00
Florian Westphal 9e318455a3 tc: man: SO_PRIORITY is described in socket documentation, not tc one
fix up reference: there is no tc(7) man page.

Signed-off-by: Florian Westphal <fw@strlen.de>
2010-01-21 10:12:54 -08:00
Florian Westphal 60de6507bb tc: man: add limit parameter to tc-sfq man page
Signed-off-by: Florian Westphal <fw@strlen.de>
2010-01-21 10:12:50 -08:00
Alex Badea e6e0b60f2a ip xfrm policy: allow different tmpl family
Allow tmpl IP addresses to have a different family than
selector addresses.  This is useful in conjunction with
XFRM_STATE_AF_UNSPEC.

Signed-off-by: Alex Badea <abadea@ixiacom.com>
2010-01-21 10:11:23 -08:00
Alex Badea 15bb82c6fb ip xfrm state: parse and print "icmp" and "af-unspec" flags
Convert to/from XFRM_STATE_ICMP and XFRM_STATE_AF_UNSPEC state flags.

Signed-off-by: Alex Badea <abadea@ixiacom.com>
2010-01-21 10:10:34 -08:00
Andreas Henriksson 14743a78eb iproute2: avoid using bashisms in configure script.
"function foo" should be "foo()" to work when sh is not bash.

Signed-off-by: Andreas Henriksson <andreas@fatal.se>
2010-01-21 10:09:22 -08:00
Mike Frysinger 73152614bc tc: respect LDFLAGS for %.so targets
Since there aren't any targets that currently use this pattern rule, this
is more of a proactive fix.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2010-01-21 10:05:39 -08:00
Arnd Bergmann d63a9b2b1e iproute2/iplink: add macvlan options for bridge mode
Macvlan can now optionally support forwarding between its
ports, if they are in "bridge" mode. This adds support
for this option to "ip link add", "ip link set" and "ip
-d link show".

The default mode in the kernel is now "vepa" mode, meaning
"virtual ethernet port aggregator". This mode is used
together with the "hairpin" mode of an ethernet bridge
that the parent of the macvlan device is connected to.
All frames still get sent out to the external interface,
but the adjacent bridge is able to send them back on
the same wire in hairpin mode, so the macvlan ports
are able to see each other, and the bridge can be
configured to monitor and control traffic between
all macvlan instances. Multicast traffic coming in
from the external interface is checked for the source
MAC address and only delivered to ports that have not
yet seen it.

In bridge mode, macvlan will send all multicast traffic
to other interfaces that are also in bridge mode but
not to those in vepa mode, which get them on the way
back from the hairpin.

The third supported mode is "private", which prevents
communication between macvlans even if the adjacent
bridge is in hairpin mode. This behavior is closer to
the original implementation of macvlan but strictly
maintains isolation.
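A minimal sketch of the three modes, assuming the IFLA_MACVLAN_MODE bit values from linux/if_link.h (private = 1, vepa = 2, bridge = 4). Both functions are hypothetical; the forwarding rule is a simplification of the description above and does not model hairpin return traffic or private-mode isolation.

```c
#include <string.h>

/* Mode bits as used for IFLA_MACVLAN_MODE in linux/if_link.h. */
enum macvlan_mode {
	MACVLAN_MODE_PRIVATE = 1,
	MACVLAN_MODE_VEPA = 2,
	MACVLAN_MODE_BRIDGE = 4,
};

/* Map a mode keyword to its value; -1 for an unknown keyword. */
static int macvlan_mode_from_str(const char *s)
{
	if (strcmp(s, "private") == 0)
		return MACVLAN_MODE_PRIVATE;
	if (strcmp(s, "vepa") == 0)
		return MACVLAN_MODE_VEPA;
	if (strcmp(s, "bridge") == 0)
		return MACVLAN_MODE_BRIDGE;
	return -1;
}

/* Simplified rule: only two ports that are both in bridge mode exchange
 * frames directly; all other traffic leaves via the lower device. */
static int macvlan_forwards_locally(int src, int dst)
{
	return src == MACVLAN_MODE_BRIDGE && dst == MACVLAN_MODE_BRIDGE;
}
```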

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2009-12-26 11:22:57 -08:00
Brian Haley a1f277943f Add dadfailed option to ip command
Fix support for IFA_F_DADFAILED and update ip.8 man page.

Signed-off-by: Brian Haley <brian.haley@hp.com>
2009-12-26 11:16:23 -08:00
Patrick McHardy 85eae222d2 iprule: add oif classification support
David Miller wrote:
> From: Patrick McHardy <kaber@trash.net>
> Date: Mon, 30 Nov 2009 19:00:14 +0100
>
>> This patch contains iproute support for iprule oif classification
>> for the send-to-self RFC I just sent out.
>
> Patrick, you need to submit a new version of this patch with
> the FIB_RULE_* macro fixed, just like the kernel version got
> fixed.

Thanks for reminding me of this. New patch attached.

commit 0fe5164cbaa1d65dda341075710be71bf1f32d10
Author: Patrick McHardy <kaber@trash.net>
Date:   Fri Dec 4 07:06:18 2009 +0100

    iprule: add oif classification support

    Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-12-26 11:14:22 -08:00
Jamal Hadi Salim e04dd30a38 skbedit: Add support to mark packets
This adds support for setting the skb mark.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
2009-12-26 11:12:43 -08:00
Patrick McHardy 2180b6b50b iplink_vlan: add support for VLAN loose binding flag
This patch adds iplink_vlan support for the VLAN loose binding flag,
which is supported in net-next.

commit 870970deb6cbea7a5d4881bdd717304d5284d315
Author: Patrick McHardy <kaber@trash.net>
Date:   Tue Dec 1 12:21:15 2009 +0100

    iplink_vlan: add support for VLAN loose binding flag

    Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-12-26 11:07:16 -08:00
Stephen Hemminger ab32267329 Update exported kernel headers
These correspond to 2.6.33-rc2
2009-12-26 11:02:25 -08:00
Stephen Hemminger abdd9bf7c4 iproute2-091226 2009-12-26 10:26:44 -08:00
Andreas Henriksson f1a0125bc0 Slightly improve the configure script.
Split it up into functions. Make the XT checks bail out if a previous
XT check was successful.

This improves the output of the configure script so that it no longer
reports using iptables merely because the last test failed (when previous
ones might already have succeeded).

Signed-off-by: Andreas Henriksson <andreas@fatal.se>
2009-12-26 10:24:06 -08:00
Stephen Hemminger 896ebd6c70 Fix warning about sprintf() and NSTAT_HIST
The environment variable could contain format characters, causing
problems. Better to just use it directly.
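A minimal sketch of the problem and its fix: passing an environment value as the format argument (`sprintf(buf, env)`) lets `%` sequences in NSTAT_HIST be interpreted, while `"%s"` copies it verbatim. `hist_name_from_env()` and its fallback path are hypothetical placeholders, not nstat's actual code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: pick the history file name. The environment
 * value is treated strictly as data, never as a format string.
 * Returns the length reported by snprintf(). */
static int hist_name_from_env(char *buf, size_t len)
{
	const char *env = getenv("NSTAT_HIST");

	if (env)
		return snprintf(buf, len, "%s", env);
	/* Illustrative fallback only. */
	return snprintf(buf, len, "/tmp/.nstat.u%d", (int)getuid());
}
```

With NSTAT_HIST set to `/tmp/%s%n`, the buffer receives those nine characters unchanged instead of triggering undefined behaviour.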
2009-12-26 10:21:13 -08:00
Stephen Hemminger 985f4578c6 Fix warning about strtod() return value 2009-12-26 10:20:50 -08:00
Simon Horman b49240ec7e flush secondary addresses before primary ones
Unless promote_secondaries is active, deleting the primary address of
an interface will automatically delete all the secondary addresses.

In the case where ip flush requests the primary then secondary addresses to
be removed - which is the order the addresses are returned by the kernel -
this will cause an error: by the time the request to remove a secondary
address is made, that address will already have been deleted along with
the primary address.

This approach to solving the problem orders requests for the
deletion of secondary addresses before primary ones by providing
rtnl_dump_filter_l(), a version of rtnl_dump_filter() that
iterates over a list of filters, and two specialised
filters, print_addrinfo_secondary() and print_addrinfo_primary().

rtnl_dump_filter_l() first iterates over all addresses using
print_addrinfo_secondary(), which appends secondary addresses to the
request buffer.  It then iterates again using print_addrinfo_primary(),
which appends primary addresses.

This approach should work regardless of whether promote_secondaries is
active, and regardless of whether any primary or secondary addresses
are present.
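A minimal sketch of the two-pass ordering, assuming a hypothetical `struct addr` in which `secondary` stands for IFA_F_SECONDARY in the dumped ifa_flags. It is not rtnl_dump_filter_l() itself; it only shows how two passes over the same dump yield secondaries first while preserving dump order within each group.

```c
#include <stddef.h>

/* Hypothetical dumped address record. */
struct addr {
	int id;
	int secondary;	/* nonzero if IFA_F_SECONDARY is set */
};

/* Write the ids of addrs[] to out[] in deletion order: every secondary
 * address first, then every primary address. Returns the count. */
static size_t flush_order(const struct addr *addrs, size_t n, int *out)
{
	size_t k = 0;
	int pass;

	for (pass = 0; pass < 2; pass++) {
		int want_secondary = (pass == 0);

		for (size_t i = 0; i < n; i++)
			if (!!addrs[i].secondary == want_secondary)
				out[k++] = addrs[i].id;
	}
	return k;
}
```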

Signed-off-by: Simon Horman <horms@verge.net.au>
2009-12-26 10:11:02 -08:00
Andreas Henriksson a36ceb85d7 Add new (iptables 1.4.5 compatible) tc/ipt/xt module.
Add a new cleaned up m_xt.c based on m_xt_old.c
The new m_xt.c has been updated to use the new names and new api
that xtables exposes in iptables 1.4.5.
All the old internal api cruft has also been dropped.

Additionally, a configure script test is added to check for
the new xtables api and set the TC_CONFIG_XT flag in Config.
(tc/Makefile already handles this flag since the previous commit.)

Signed-off-by: Andreas Henriksson <andreas@fatal.se>
2009-12-26 10:09:27 -08:00
Andreas Henriksson 80d689d055 Keep the old tc/ipt/xt module for compatibility.
Move the file and rename the configure flags.
The file is being kept around for iptables < 1.4.5 compatibility.

Signed-off-by: Andreas Henriksson <andreas@fatal.se>
2009-12-26 10:09:26 -08:00
Andreas Henriksson 7a96e19977 iproute: make ss --help output to stdout
Peter Palfrader said in http://bugs.debian.org/545008 that
"--help output, if explicitly requested, should go to stdout, not stderr."
which this patch fixes.

Additionally, the exit code was adjusted to success if help was
explicitly requested.

(Syntax error still outputs to stderr and has the same exit code.)

Signed-off-by: Andreas Henriksson <andreas@fatal.se>
2009-12-26 10:05:27 -08:00
Patrick McHardy c90308ffc7 f_fw: fix compat mode
The kernel takes a lack of options as indication that the fw classifier
should operate in compatibility mode, where marks are mapped directly to
classids.

Commit e22b42a (tc mask patch) broke this by adding an empty TCA_OPTIONS
attribute even if no handle is specified. Restore the old behaviour.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-12-01 16:20:01 -08:00
Andreas Henriksson 6837f771ed iproute2: use -fPIC in lib/
The static libnetlink.a library is exposed to other users in Debian via the
"iproute-dev" package. Apparently people are interested in using it in their
shared libraries and would like to see the code be position independent.

Patch below makes the code under lib/ build with -fPIC.

See http://bugs.debian.org/547602

Signed-off-by: Andreas Henriksson <andreas@fatal.se>
2009-12-01 16:17:59 -08:00
Mark Borst 080b3ad428 iproute: "ip mroute show" doesn't show all output interfaces
The command "ip mroute show" will only show the first Oif.

mark@flappie:~$ ip mroute show
(192.168.1.1, 224.0.0.123)       Iif: _rename    Oifs: eth1

mark@flappie:~$ cat /proc/net/ip_mr_cache
Group    Origin   Iif     Pkts    Bytes    Wrong Oifs
7B0000E0 0101A8C0 2          0        0        0  0:1    1:1

This shows 2 Oifs here. However, ipmroute.c, function read_mroute_list(), uses sscanf() with a %s mask for oiflist, which stops after the first whitespace (i.e. after Oif 0:1). The patch below fixes this to read until the newline (though I'm not sure whether this is the proper way to fix it).

After this patch:
mark@flappie:~/iproute-20090324/ip$ ./ip mroute show
(192.168.1.1, 224.0.0.123)       Iif: _rename    Oifs: eth1 eth0
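A minimal sketch of the parsing difference, assuming the /proc/net/ip_mr_cache column layout shown above. `parse_mr_line()` and the buffer sizes are hypothetical; the relevant part is that `%255[^\n]` keeps the rest of the line, whereas `%s` would stop at `0:1`.

```c
#include <stdio.h>
#include <string.h>

#define OIFS_LEN 256

/* Hypothetical parser for one data line of /proc/net/ip_mr_cache.
 * Returns the number of fields matched (7 on success). */
static int parse_mr_line(const char *line, char oifs[OIFS_LEN])
{
	char group[16], origin[16];
	int iif;
	unsigned long pkts, bytes, wrong;

	/* %255[^\n] reads every remaining Oif up to the newline. */
	return sscanf(line, "%15s %15s %d %lu %lu %lu %255[^\n]",
		      group, origin, &iif, &pkts, &bytes, &wrong, oifs);
}
```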

This patch originally submitted as http://bugs.debian.org/550097

Signed-off-by: Andreas Henriksson <andreas@fatal.se>
2009-12-01 16:15:15 -08:00
Brian Haley f4af851bac ipv6: Add IFA_F_DADFAILED flag
Add IFA_F_DADFAILED flag to denote an IPv6 address that has
failed Duplicate Address Detection, that way tools like
/sbin/ip can be more informative.

3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
    inet6 2001:db8::1/64 scope global tentative dadfailed
       valid_lft forever preferred_lft forever

Signed-off-by: Brian Haley <brian.haley@hp.com>
2009-12-01 15:58:44 -08:00
David Ward ee7ba9875d iproute2: Add ll_index_to_addr function
After calling ll_init_map, all of the information stored in the link-layer map
can be retrieved by function calls (ll_index_to_*), except for the link-layer
address. This patch fills the gap by adding a ll_index_to_addr function.
Changes welcome.
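A minimal sketch of an index-to-address lookup over a link-layer map whose address field is sized for 20-byte InfiniBand addresses. `struct idxmap_entry` and `index_to_addr()` are hypothetical simplifications, not iproute2's actual idxmap or the exact signature of ll_index_to_addr().

```c
#include <stddef.h>
#include <string.h>

/* Large enough for a 20-byte InfiniBand link-layer address. */
#define LL_ADDR_MAX 20

/* Hypothetical, simplified map entry. */
struct idxmap_entry {
	int index;
	unsigned int alen;
	unsigned char addr[LL_ADDR_MAX];
};

/* Copy at most len bytes of ifindex's address into buf. Returns the
 * number of bytes copied, or 0 if ifindex is not in the map. */
static unsigned int index_to_addr(const struct idxmap_entry *map, size_t n,
				  int ifindex, unsigned char *buf,
				  unsigned int len)
{
	for (size_t i = 0; i < n; i++) {
		if (map[i].index != ifindex)
			continue;
		if (len > map[i].alen)
			len = map[i].alen;
		memcpy(buf, map[i].addr, len);
		return len;
	}
	return 0;
}
```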

Signed-off-by: David Ward <david.ward@ll.mit.edu>
2009-12-01 15:54:16 -08:00
Gilad Ben-Yossef 71e5815105 iproute2 add hoplimit parsing and update usage and documentation
- Parse and handle the hoplimit ip route option and add it to the usage
  line and documentation.

- Add the missing reordering ip route option to the usage line.

- Add documentation for initcwnd ip route option.

Tested by setting hoplimit and retrieving it via "show".

Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
[ported to HEAD, fixed a bug with hoplimit lock handling, added documentation]
Signed-off-by: Ori Finkelman <ori@comsleep.com>
Signed-off-by: Yony Amit <yony@comsleep.com>
2009-12-01 15:51:44 -08:00
Stephen Hemminger 232642c28c Remove Changes: comments
Discourage developers from putting change log in comments
now that software has been under change control for 5 years.
2009-12-01 15:49:48 -08:00
David Ward e03dcc040d iproute2: Support 20-byte link layer address in idxmap
Extend the link-layer address field from 8 to 20 bytes to support InfiniBand.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
2009-12-01 15:41:39 -08:00
Stephen Hemminger 5a326efed0 iproute2-091117 2009-11-17 10:04:57 -08:00
Mike Frysinger bba2fcd3fa Ignore GDB related files
Revised version of Mike's original patch
2009-11-13 14:20:41 -08:00
Stephen Hemminger 5e2f74a75c Add more files to gitignore
Ignore files from cscope, patch, etc.
2009-11-13 14:18:35 -08:00
Mike Frysinger 05b4f8492b tc: remove dlfcn.h from files that dont need it
A bunch of source files look like they're copy & pasted from other files,
and some include header files that they don't actually need.  Since dlfcn
has very specific usage (and is a pain on a static-only system), drop it
where it isn't really needed.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2009-11-13 14:14:07 -08:00
Stephen Hemminger a6992a9c91 Add static-syms.h to ignore 2009-11-10 10:45:05 -08:00
Mike Frysinger f2e27cfb01 support static-only systems
The iptables code supports a "no shared libs" mode where it can be used
without requiring dlfcn related functionality.  This adds similar support
to iproute2 so that it can easily be used on systems like nommu Linux (but
obviously with a few limitations -- no dynamic plugins).

Rather than modify every location that uses dlfcn.h, I hooked the dlfcn.h
header with stub functions when shared library support is disabled.  Then
symbol lookup is done via a local static lookup table (which is generated
automatically at build time) so that internal symbols can be found.
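A minimal sketch of a dlsym()-style lookup over a static symbol table. In iproute2 such a table is generated at build time (static-syms.h); here it is written by hand, and the `demo_*` entry points are hypothetical stand-ins for real plugin symbols. Storing function pointers directly avoids casting them through `void *`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for plugin entry points. */
static int demo_qdisc_util(void) { return 1; }
static int demo_filter_util(void) { return 2; }

typedef int (*sym_fn)(void);

struct static_sym {
	const char *name;
	sym_fn fn;
};

/* Hand-written equivalent of a build-time generated table. */
static const struct static_sym static_syms[] = {
	{ "demo_qdisc_util", demo_qdisc_util },
	{ "demo_filter_util", demo_filter_util },
};

/* Look a symbol up by name; NULL if it is not in the table. */
static sym_fn static_lookup(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(static_syms) / sizeof(static_syms[0]); i++)
		if (strcmp(static_syms[i].name, name) == 0)
			return static_syms[i].fn;
	return NULL;
}
```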

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2009-11-10 10:44:20 -08:00
Mike Frysinger a7a9ddbb67 arpd/ifstat/nstat/rtacct: use daemon()
A bunch of misc utils basically reimplement the daemon() function (the
whole fork/close/chdir/etc...).  Rather than do that, use daemon() as
that will work under nommu Linux systems that lack fork().

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2009-11-10 10:41:44 -08:00
Patrick McHardy 7f03191fda iproute uses too small of a receive buffer
It now uses 1MB as the receive buffer limit by default (unless
/proc/sys/net/core/rmem_max is increased, the effective size will be
smaller) and allows the size to be specified manually using "-rcvbuf X"
(-r is already used, so you need to specify at least -rc).

Additionally rtnl_listen() continues on ENOBUFS after printing the
error message.
2009-11-10 09:14:33 -08:00
Sven Anders 24f3818244 Fix flushing code - rtnl_send_check
I experienced an error when trying to perform a

  ip route flush proto 4

with many routes in a complex environment, it
gave me the following error:

  Failed to send flush request: Success
  Flush terminated
2009-11-10 09:07:26 -08:00
Stephen Hemminger 8a1c7fcb27 Consolidate fprintf statements
Doing one item per call is like old MODULA2 code.
2009-11-10 09:01:57 -08:00
Stephen Hemminger 8007bfb5ad Update to 2.6.32 kernel headers 2009-11-10 08:51:17 -08:00
David Woodhouse 580fbd88f7 Add 'ip tuntap' support.
This patch provides support for 'ip tuntap', allowing creation and
deletion of persistent tun/tap devices.
2009-09-19 12:49:41 -07:00
Eric Dumazet daf49fd614 ss: adds a space before congestion string
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2009-09-11 08:06:53 -07:00
Eric Dumazet bbe3205336 ss: correct display of sk pointer
On 64-bit arches, the two 32-bit halves of the sk pointer were displayed in reverse order.
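A minimal sketch of rebuilding the pointer, assuming the inet_diag convention that `idiag_cookie[0]` carries the low 32 bits of the socket address and `idiag_cookie[1]` the high 32 bits. `sk_from_cookie()` is a hypothetical helper, not the ss code itself.

```c
#include <stdint.h>

/* Combine the two 32-bit cookie words into one 64-bit value: the high
 * word must be shifted up, not the low one. */
static uint64_t sk_from_cookie(const uint32_t cookie[2])
{
	return ((uint64_t)cookie[1] << 32) | cookie[0];
}
```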

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2009-09-11 08:06:07 -07:00
Stephen Hemminger f40554f687 Update kernel headers to 2.6.31
Final 2.6.31 released, so update sanitized headers.
2009-09-10 09:03:22 -07:00
Stephen Hemminger f0309aa493 add include/linux/if_arp.h 2009-08-26 09:41:02 -07:00
Mike Frysinger 729cbe84b8 tc/q_atm.so: respect LDFLAGS
The q_atm.so target defines its own link target, but it doesn't respect the
$(LDFLAGS) variable.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2009-08-06 14:50:08 -07:00
Andreas Henriksson 915fae193b iproute: typo in ip manpage
Fix spelling (s/commoand/command/) in ip(8) manpage.

Spotted by dann frazier <dannf@hp.com> - http://bugs.debian.org/539830
2009-08-06 14:49:31 -07:00
Stephen Hemminger c1cdf2d214 Fix typo in IPPROTO_DCCP 2009-08-06 14:38:18 -07:00
Stephen Hemminger c40bba6922 update kernel headers to 2.6.31-rc5 2009-08-06 14:38:13 -07:00
Stephen Hemminger 2d8240f8d9 Fix flushing of large number of entries
Checking for errors would cause some responses to be lost.
2009-07-13 10:15:23 -07:00
Stephen Hemminger 1558971d43 fix handling of GRED DPs args 2009-05-26 15:58:05 -07:00
Wolfgang Grandegger 5a2044782b iproute2: Support for the CAN netlink
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
2009-05-26 15:22:44 -07:00
Wolfgang Grandegger ed1af7e868 iproute2: Fixes an issue with cross-compilation
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
2009-05-26 15:22:20 -07:00
Sascha Hlusiak a07e991253 iproute2: ISATAP potential router list
Hi Stephen,

please review the attached patch to add support for in-kernel potential router
lists for ISATAP tunnels.

Usage:
ip tunnel add name isatap0 mode isatap local 192.168.1.100
ip tunnel prl dev isatap0 prl-default 192.168.1.1
ip tunnel prl dev isatap0 prl-nodefault 192.168.1.2
ip tunnel prl dev isatap0 prl-delete 192.168.1.1
ip tunnel show # pr and pdr will be listed as well

Patch based on http://osprey67.com/seal/iproute2_diff.v0_3.txt by Fred L.
Templin.

Thanks,
Sascha
2009-05-26 15:21:21 -07:00
Denys Fedoryshchenko f4a8b23d39 Filter class output by classid
When dividing bandwidth among classes, it is sometimes useful to watch
how a specific class is doing live.

With my simple patch it is possible to do
watch -n1 "tc -s -d class show dev eth0.2022 classid 1:1520"
and get live statistics for a specific class: how many packets are
queued or dropped, and how much bandwidth is used (if an estimator is
defined).
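A minimal sketch of turning a classid such as `1:1520` into the 32-bit handle that a class would be compared against, assuming both parts are hexadecimal with the major number in the upper 16 bits. `parse_classid()` is hypothetical and deliberately stricter than tc's own parser (it rejects `root`, `none` and an empty minor).

```c
#include <stdlib.h>

/* Hypothetical parser for "MAJOR:MINOR" (both hex, 16 bits each).
 * Returns 0 and stores the handle on success, -1 on malformed input. */
static int parse_classid(const char *s, unsigned int *h)
{
	char *end;
	unsigned long maj, min;

	maj = strtoul(s, &end, 16);
	if (end == s || *end != ':' || maj > 0xFFFF)
		return -1;
	s = end + 1;
	min = strtoul(s, &end, 16);
	if (end == s || *end != '\0' || min > 0xFFFF)
		return -1;
	*h = (unsigned int)((maj << 16) | min);
	return 0;
}
```

Under these assumptions, `1:1520` maps to the handle 0x00011520, and a filter only prints classes whose handle equals that value.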

Signed-off-by: Denys Fedoryshchenko <denys@visp.net.lb>
2009-05-26 15:20:26 -07:00
Andreas Henriksson cb2eb9997a Bug#526329: iproute: Segfault on garbage lladdr
On Thu, 2009-04-30 at 14:32 +0100, Timothy Baldwin wrote:
> Package: iproute
> Version: 20090324-1
> Severity: minor
>
>
> $ ip link set eth0 address help
> "help" is invalid lladdr.
> Segmentation fault
>
> Despite the invalid command line arguments, it shouldn't crash.
>

Callers need to check return value from ll_addr_a2n(). Patch below.
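A minimal sketch of the caller-side check, using a hypothetical `parse_lladdr()` as a stand-in for ll_addr_a2n() (it is not that function's implementation). The parser requires exactly two hex digits per byte and returns -1 on malformed input; the caller refuses to continue when that happens.

```c
#include <stdio.h>

static int hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Hypothetical parser for "aa:bb:..."; returns the byte count or -1. */
static int parse_lladdr(unsigned char *out, int max, const char *s)
{
	int n = 0;

	for (;;) {
		int hi = hexval(s[0]), lo;

		/* s[1] is only read when s[0] is a hex digit (not NUL). */
		if (hi < 0 || (lo = hexval(s[1])) < 0 || n == max)
			return -1;
		out[n++] = (unsigned char)(hi << 4 | lo);
		s += 2;
		if (*s == '\0')
			return n;
		if (*s++ != ':')
			return -1;
	}
}

/* Caller pattern: check the return value before using the address. */
static int set_lladdr(const char *arg, unsigned char *addr, int max)
{
	int len = parse_lladdr(addr, max, arg);

	if (len < 0) {
		fprintf(stderr, "\"%s\" is invalid lladdr.\n", arg);
		return -1;
	}
	return len;
}
```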

Signed-off-by: Andreas Henriksson <andreas@fatal.se>
2009-05-26 15:18:05 -07:00
Stephen Hemminger ebde878097 Allow default DP of zero in gred
To emulate WRED behaviour, allow default DP of zero.
2009-05-26 15:15:01 -07:00
Stephen Hemminger d13cee6d59 Add IPV6 match pretty print 2009-05-26 15:14:29 -07:00
Stephen Hemminger b4d41f41b6 Add u32 extension to match on ether source/destination
Use existing u32 mechanism to match based on Ethernet header.
No need for protocol that already exists.
2009-04-15 15:39:34 -07:00
Thomas Egerer b9ab720e33 Fix display of xfrm
When using iproute2 to display information on policies installed in the kernel
(ip x p s), the output is incorrect: IPv6 addresses are printed as IPv4 addresses.
This happens with inter-protocol policies, where the template's address
family differs from that of the policy itself.
The attached patch solves this problem.
2009-04-14 16:15:17 -07:00
Thomas Graf ff213c4bf2 cgroup support
Stephen,

This is the iproute2 part of the cgroup classifier, which has been included
upstream for a while. Please apply.
2009-04-13 13:38:33 -07:00
Oliver Hartkopp 685f3a9ffb iproute2: add vcan to ip link help text
Hello Stephen,

thanks for the commit of my last CAN patch.

Today I got a hint that the help text of 'ip link' can be improved as well.

Many thanks!

Oliver

Signed-Off-By: Oliver Hartkopp <oliver@hartkopp.net>
2009-04-13 13:38:05 -07:00
Stephen Hemminger 9fce67dd46 Remove goto chain
The selector logic is clearer with if / else if
2009-04-03 09:44:04 -07:00
Olaf Rempel e48f73d6a5 iproute2-2.6.14-051107: missing arpd directory
arpd requires a directory (/var/lib/arpd/) to run.
See the attached patch, which lets iproute create this directory during install.
2009-03-27 11:26:57 -07:00
Oliver Hartkopp 98f9a1d244 Add support Controller Area Network
It's not a big problem, but it makes a better show in 'ip link show' on
CAN interfaces :-)

I also moved __PF(CAN,can) in ll_proto.c to the same position where it
can be found in if_ether.h .

The only thing I did not know is whether the __PF(CAN,can) in ll_types.c
needs to be put in #ifdef ARPHDR_CAN like __PF(HWX25,hwx25) is. You
definitely know that better than me.
2009-03-27 11:21:29 -07:00
Srivats P c3651bf476 ip6tunnel: Fix no default display of ip4ip6 tunnels
"ip -6 tunnel show" displays only ip6ip6 tunnels not ip4ip6 tunnels
 - it should display all of them, irrespective of proto.

This is because the default tunnel proto is initialized to IPPROTO_IPV6 in ip6_tnl_parm_init(), which is fine for an 'add' command but not for 'show'. This patch overrides proto with 0, signifying 'mode any', as the default in the case of a 'show'.
2009-03-27 11:17:26 -07:00
Sascha Hlusiak eeef12c514 iptunnel: allow ISATAP with stateless autoconf
Please commit my patch below to the iproute package. It just fixes an
incorrect check so that adding an isatap tunnel with a remote works, which is
needed if one wants to use stateless autoconf. The current check makes tunnel
mode isatap unusable for all client users.
2009-03-27 11:14:00 -07:00
Andreas Henriksson 6cdbf37063 iproute2: drop equalize support.
Hello Stephen and netdev people!

Currently you can configure "equalize" and it looks all fine and dandy.
The kernel has the interface defined, but apparently there's never actually
been any implementation for it (only a never merged patch in the 2.4 era).

I suggest dropping the code to give any potential users of this feature
the benefit of receiving a proper error message. I think it unlikely that
this will be implemented in the near future, but if it ever happens,
reviving the iproute2 side should be as easy as a git revert of this patch.

For more details see http://bugs.debian.org/149897

Regards,
Andreas Henriksson
2009-03-27 11:11:12 -07:00
Varun Chandramohan 4b6e07d8fd Enable Type Labels For "ip monitor all"
This patch adds prefix labels to the "ip monitor all" command to simplify
understanding of the output.

Signed-off-by: Varun Chandramohan <varunc@linux.vnet.ibm.com>
2009-03-27 11:09:04 -07:00
Varun Chandramohan fb063322b4 Add Monitor Support For Neigh Table
This patch adds exclusive support to enable monitoring
neighbour table entries in ip command.

Signed-off-by: Varun Chandramohan <varunc@linux.vnet.ibm.com>
2009-03-27 11:09:04 -07:00
Stephen Hemminger 52d6a85050 remove duplicate limits.h 2009-03-27 11:07:46 -07:00
Petr Jediný 10494d2724 Changing commandline help text to be more uniform... 2009-03-27 11:05:44 -07:00
jamal 4cd23bdde9 ip: Allow for easier debug of buggy devices that dont send their names
patch attached this time..

On Fri, 2008-08-08 at 10:01 -0400, jamal wrote:
> wireless drivers using wext is a prime example if you need a test case.
>
> cheers,
> jamal

ip: Allow for easier debug of buggy devices that dont send their names

With the old message, one couldn't tell which device had the bug.
This patch provides at least an ifindex to narrow it down.
There's also no point in bailing out because of one bug; we
allow it to go on so we can dump as much info as possible.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
2009-03-27 10:59:25 -07:00
742 changed files with 153869 additions and 30731 deletions

130
.clang-format Normal file

@@ -0,0 +1,130 @@
# SPDX-License-Identifier: GPL-2.0
#
# clang-format configuration file. Intended for clang-format >= 4.
#
# For more information, see:
#
# Documentation/process/clang-format.rst
# https://clang.llvm.org/docs/ClangFormat.html
# https://clang.llvm.org/docs/ClangFormatStyleOptions.html
#
---
AccessModifierOffset: -4
AlignAfterOpenBracket: Align
AlignConsecutiveAssignments: false
AlignConsecutiveDeclarations: false
#AlignEscapedNewlines: Left # Unknown to clang-format-4.0
AlignOperands: true
AlignTrailingComments: false
AllowAllParametersOfDeclarationOnNextLine: false
AllowShortBlocksOnASingleLine: false
AllowShortCaseLabelsOnASingleLine: false
AllowShortFunctionsOnASingleLine: None
AllowShortIfStatementsOnASingleLine: false
AllowShortLoopsOnASingleLine: false
AlwaysBreakAfterDefinitionReturnType: None
AlwaysBreakAfterReturnType: None
AlwaysBreakBeforeMultilineStrings: false
AlwaysBreakTemplateDeclarations: false
BinPackArguments: true
BinPackParameters: true
BraceWrapping:
  AfterClass: false
  AfterControlStatement: false
  AfterEnum: false
  AfterFunction: true
  AfterNamespace: true
  AfterObjCDeclaration: false
  AfterStruct: false
  AfterUnion: false
  #AfterExternBlock: false # Unknown to clang-format-5.0
  BeforeCatch: false
  BeforeElse: false
  IndentBraces: false
  #SplitEmptyFunction: true # Unknown to clang-format-4.0
  #SplitEmptyRecord: true # Unknown to clang-format-4.0
  #SplitEmptyNamespace: true # Unknown to clang-format-4.0
BreakBeforeBinaryOperators: None
BreakBeforeBraces: Custom
#BreakBeforeInheritanceComma: false # Unknown to clang-format-4.0
BreakBeforeTernaryOperators: false
BreakConstructorInitializersBeforeComma: false
#BreakConstructorInitializers: BeforeComma # Unknown to clang-format-4.0
BreakAfterJavaFieldAnnotations: false
BreakStringLiterals: false
ColumnLimit: 80
CommentPragmas: '^ IWYU pragma:'
#CompactNamespaces: false # Unknown to clang-format-4.0
ConstructorInitializerAllOnOneLineOrOnePerLine: false
ConstructorInitializerIndentWidth: 8
ContinuationIndentWidth: 8
Cpp11BracedListStyle: false
DerivePointerAlignment: false
DisableFormat: false
ExperimentalAutoDetectBinPacking: false
#FixNamespaceComments: false # Unknown to clang-format-4.0
# Taken from:
# git grep -h '^#define [^[:space:]]*for_each[^[:space:]]*(' include/ \
# | sed "s,^#define \([^[:space:]]*for_each[^[:space:]]*\)(.*$, - '\1'," \
# | sort | uniq
ForEachMacros:
- 'list_for_each_entry'
- 'list_for_each_entry_safe'
- 'mnl_attr_for_each_nested'
- 'hlist_for_each'
- 'hlist_for_each_safe'
- 'hlist_for_each_entry'
#IncludeBlocks: Preserve # Unknown to clang-format-5.0
IncludeCategories:
  - Regex: '.*'
    Priority: 1
IncludeIsMainRegex: '(Test)?$'
IndentCaseLabels: false
#IndentPPDirectives: None # Unknown to clang-format-5.0
IndentWidth: 8
IndentWrappedFunctionNames: false
JavaScriptQuotes: Leave
JavaScriptWrapImports: true
KeepEmptyLinesAtTheStartOfBlocks: false
MacroBlockBegin: ''
MacroBlockEnd: ''
MaxEmptyLinesToKeep: 1
NamespaceIndentation: Inner
#ObjCBinPackProtocolList: Auto # Unknown to clang-format-5.0
ObjCBlockIndentWidth: 8
ObjCSpaceAfterProperty: true
ObjCSpaceBeforeProtocolList: true
# Taken from git's rules
#PenaltyBreakAssignment: 10 # Unknown to clang-format-4.0
PenaltyBreakBeforeFirstCallParameter: 30
PenaltyBreakComment: 10
PenaltyBreakFirstLessLess: 0
PenaltyBreakString: 10
PenaltyExcessCharacter: 100
PenaltyReturnTypeOnItsOwnLine: 60
PointerAlignment: Right
ReflowComments: false
SortIncludes: false
#SortUsingDeclarations: false # Unknown to clang-format-4.0
SpaceAfterCStyleCast: false
SpaceAfterTemplateKeyword: true
SpaceBeforeAssignmentOperators: true
#SpaceBeforeCtorInitializerColon: true # Unknown to clang-format-5.0
#SpaceBeforeInheritanceColon: true # Unknown to clang-format-5.0
SpaceBeforeParens: ControlStatements
#SpaceBeforeRangeBasedForLoopColon: true # Unknown to clang-format-5.0
SpaceInEmptyParentheses: false
SpacesBeforeTrailingComments: 1
SpacesInAngles: false
SpacesInContainerLiterals: false
SpacesInCStyleCastParentheses: false
SpacesInParentheses: false
SpacesInSquareBrackets: false
Standard: Cpp03
TabWidth: 8
UseTab: Always
...

36
.gitignore vendored

@@ -1,5 +1,41 @@
# locally generated
Config
static-syms.h
config.*
*.o
*.a
*.so
*~
\#*#
# cscope
cscope.*
ncscope.*
tags
TAGS
# git files that we don't want to ignore even it they are dot-files
!.gitignore
!.mailmap
# for patch generation
*.diff
*.patch
*.orig
*.rej
# for quilt
.pc
patches
series
# for gdb
.gdbinit
.gdb_history
*.gdb
# tests
testsuite/results
testsuite/iproute2/iproute2-this
testsuite/tools/generate_nlmsg
testsuite/tests/ip/link/dev_wo_vf_rate.nl

22
.mailmap Normal file

@@ -0,0 +1,22 @@
#
# This list is used by git-shortlog to fix a few botched name translations
# in the git archive, either because the author's full name was messed up
# and/or not always written the same way, making contributions from the
# same person appearing not to be so or badly displayed.
#
# Format
# Full name <goodaddress> <badaddress>
Steve Wise <larrystevenwise@gmail.com> <swise@opengridcomputing.com>
Steve Wise <larrystevenwise@gmail.com> <swise@chelsio.com>
Stephen Hemminger <stephen@networkplumber.org> <sthemmin@microsoft.com>
Stephen Hemminger <stephen@networkplumber.org> <shemming@brocade.com>
Stephen Hemminger <stephen@networkplumber.org> <stephen.hemminger@vyatta.com>
Stephen Hemminger <stephen@networkplumber.org> <shemminger@vyatta.com>
Stephen Hemminger <stephen@networkplumber.org> <shemminger>
Stephen Hemminger <stephen@networkplumber.org> <shemminger@linux-foundation.org>
Stephen Hemminger <stephen@networkplumber.org> <shemminger@osdl.org>
Stephen Hemminger <stephen@networkplumber.org> <osdl.org!shemminger>
Stephen Hemminger <stephen@networkplumber.org> <osdl.net!shemminger>
David Ahern <dsahern@gmail.com> <dsa@cumulusnetworks.com>

563
ChangeLog

@@ -1,563 +0,0 @@
2006-03-21 Stephen Hemminger <shemminger@freekitty.pdx.osdl.net>
* Back out the 2.4 utsname patch
2006-03-21 James Lentini <jlentini@netapp.com>
* Increase size of hw address allowed for ip neigh to allow
for IB.
2006-03-14 Russell Stuart <russell-lartc@stuart.id.au>
* Fix missing memset in tc sample
* Fixes for tc hash samples
* Add sample divisor
2006-03-10 Alpt <alpt@freaknet.org>
* Add more rt_proto values
2006-03-10 Dale Sedivec <darkness@caliginous.net>
* Warn when using "handle" instead of "classid" with "tc class"
2006-03-10 Jean Tourrilhes <jt@hpl.hp.com>
* Fix endless loop in netlink error handling
2006-03-10 Stephen Hemminger <shemminger@osdl.org>
* Change default lnstat count to 1
* Update to 2.6.16 headers
* Add fake version of include/linux/socket.h to fix warnings
2006-01-12 Patrick McHardy <kaber@trash.net>
* Handle DCCP in ipxfrm.c to allow using port numbers in the selector.
2006-01-10 Masahide NAKAMURA <nakam@linux-ipv6.org>
* Add ip link ntable
2006-01-10 Stephen Hemminger <shemminger@osdl.org>
* Update headers to sanitized kernel 2.6.15
* Fix ipv6 priority option in u32
2006-01-03 Alpt <alpt@freaknet.org>
* Ip man page addition
2006-01-03 Jamal Hadi Salim <hadi@znyx.com>
* Documentation for ifb
2005-12-09 Stephen Hemminger <shemminger@osdl.org>
* Add corrupt feature to netem
2005-12-02 Stephen Hemminger <shemminger@osdl.org>
* Back out ambiguous ip command matches
2005-11-22 Stephen Hemminger <shemminger@osdl.org>
* Handle ambiguous ip command matches
2005-11-22 Patrick McHardy <kaber@trash.net>
* Add back ip command aliases
2005-11-07 Masahide NAKAMURA <nakam@linux-ipv6.org>
* Updating for 2.6.14
- Show UPD{SA,POLICY} message information from kernel instead of error
- Add length check of delete messages from kernel
- Use macro for struct xfrm_user{sa,policy}_id
* Minor fix:
- Add fflush at the end of normal dump
2005-11-01 Jamal Hadi Salim <hadi@znyx.com>
* Fix handling of XFRM monitor and state
2005-11-01 Stephen Hemminger <shemminger@osdl.org>
* Update to 2.6.14 sanitized headers
2005-10-24 Patrick McHardy <kaber@trash.net>
* Fix ip command shortcuts
2005-10-12 Stephen Hemminger <shemminger@osdl.org>
* Add more CBQ examples from Fedora Core
* Fix buffer overrun in iproute because of bits vs. bytes confusion
2005-10-12 Jamal Hadi Salim <hadi@znyx.com>
* Fix ip rule flush, need to reopen rtnl
2005-10-07 Stephen Hemminger <shemminger@osdl.org>
* Reenable ip mroute
2005-10-07 Mike Frysinger <vapier@gentoo.org>
* Handle pfifo_fast that has no qopt without segfaulting
2005-10-05 Mads Martin Joergensen <mmj@suse.de>
* Trivial netem ccopts
2005-10-04 Jerome Borsboom <j.borsboom@erasmusmc.nl>
* Fix regression in ip addr (libnetlink) handling
2005-09-21 Stephen Hemminger <shemminger@osdl.org>
* Fix uninitialized memory and leaks with valgrind
Reported by Redhat
2005-09-01 Mike Frysinger <vapier@gentoo.org>
* Fix build issues with netem tables (parallel make and HOSTCC)
2005-09-01 Stephen Hemminger <shemminger@osdl.org>
* Integrate support for DCCP into 'ss' (from acme)
* Add -batch option to ip.
* Update to 2.6.14 headers
2005-09-01 Eric Dumazet <dada1@cosmosbay.com>
* Fix lnstat : First column should not be summed
2005-08-16 Stephen Hemminger <shemminger@osdl.org>
* Limit ip route flush to 10 rounds.
* Cleanup ip rule flush error message
2005-08-08 Stephen Hemminger <shemminger@osdl.org>
* Update to 2.6.13+ kernel headers
* Fix array overrun in paretonormal
* Fix ematch to not include dropped fields from skb.
2005-07-14 Thomas Graf <tgraf@suug.ch>
* Make ematch bison/lex build with common flex
2005-07-10 Stephen Hemminger <shemminger@osdl.org>
* Fix Gcc 4.0 build warnings signed/unsigned
2005-06-23 Jamal Hadi Salim <hadi@znyx.com>
* Fix for options process with ipt
2005-06-23 Thomas Graf <tgraf@suug.ch>
* Add extended matches (nbyte, cmp, u32, meta)
* Add basic classifier
* Fix clean/distclean makefile targets
* update local header file copies
* IPv4 multipath algorithm selection support
* cscope Makefile target
* Fix off-by-one while generating argument vector
in batched mode.
* Assume stdin if no argument is given to -batch
2005-06-22 Stephen Hemminger <shemminger@osdl.org>
* Update include files to 2.6.12
* Add ss support for TCP_CONG
2005-06-13 Steven Whitehouse <steve@chygwyn.com>
* DECnet docs update
2005-06-07 Stephen Hemminger <shemminger@osdl.org>
* Fix 'ip link' map to handle case where device gets autoloaded
by using if_nametoindex as fallback
* Device indices are unsigned not int.
2005-06-07 Masahide NAKAMURA <nakam@linux-ipv6.org>
* [ip] show timestamp when using '-t' option.
* [ip] remove duplicated code for expired message of xfrm.
* [ip] add "deleteall" command for xfrm;
"flush" uses kernel's flush interface and
"deleteall" uses legacy iproute2's flush feature like
getting-and-deleting-for-each.
2005-03-30 Stephen Hemminger <shemminger@osdl.org>
* include/linux/netfilter_ipv4/ip_tables.h don't include compiler.h
because it isn't needed and not on all systems
* Update rtnetlink.h and pkt_cls.h to be stripped versions
of headers from 2.6.12-rc1
2005-03-30 Jamal Hadi Salim <hadi@znyx.com>
* Proper version of iptables headers (from 1.3.1)
* Set revision file in m_ipt
* Fix action_util naming in mirred
* don't call ll_init_map in mirred
2005-03-19 Thomas Graf <tgraf@suug.ch>
* Warn about wildcard deletions and provide IFA_ADDRESS upon
deletions to enforce prefix length validation for IPv4.
* Fix netlink message alignment when the last routing attribute added
has a data length not aligned to RTA_ALIGNTO.
2005-03-30 Masahide NAKAMURA <nakam@linux-ipv6.org>
* ipv6 xfrm allocspi and monitor support.
2005-03-29 Stephen Hemminger <shemminger@osdl.org>
* switch to stack for netem tables
2005-03-18 Stephen Hemminger <shemminger@osdl.org>
* add -force option to batch mode
* handle midline comments in batch mode
* sum per cpu fields in lnstat correctly
2005-03-14 Stephen Hemminger <shemminger@osdl.org>
* cleanup batch mode, allow continuation, comments etc.
* recode reuse of netlink socket
2005-03-14 Boian Bonev <boian@bonev.com>
* enhancement to batch mode.
it does not exit on error, just reports it
tc reuses the already open netlink socket for subsequent command(s)
2005-03-14 Thomas Graf <tgraf@suug.ch>
* ip link command
print NO-CARRIER flag if there is no carrier and the link is up.
2005-03-14 Patrick McHardy <kaber@trash.net>
* bug: Use USER_HZ where necessary
2005-03-10 Jamal Hadi Salim <hadi@znyx.com>
* Fix bug with register_target
2005-03-10 Stephen Hemminger <shemminger@osdl.org>
* fix pkt_cls.h to have tc_u32_mark
* update include files to be stripped versions of 2.6.11
* add documentation about netem distributions [from nistnet]
* turn off nup in document make [from FC3]
* don't build with extra debug info (-g) [from FC3]
2005-03-10 Nix <nix@esperi.org.uk>
* make man3 directory
2005-03-10 Pasi <Pasi.Eronen@nokia.com>
* add ESP-in-UDP encapsulation
2005-03-10 Thomas Graf <tgraf@suug.ch>
* [NETEM] Fix off by one
* update local header file copies
* [NEIGH] print number of probes done so far (statistics mode only)
2005-03-10 Herbert Xu <herbert@gondor.apana.org.au>
* Trivial typo in ip help
2005-02-09 Stephen Hemminger <shemminger@osdl.org>
* netem distribution data reorganization
2005-02-09 Roland Dreier <roland@topspin.com>
* ip over infiniband address display
2005-02-09 Jim Gifford <lfs@jg555.com>
* make install fix for ip/
2005-02-07 Mads Martin Joergensen <mmj@suse.de>
* Don't mix address families when flushing
2005-02-07 Stephen Hemminger <shemminger@osdl.org>
* Validate classid is not too large to cause loss of bits.
2005-02-07 Jean-Marc Ranger <jmranger@sympatico.ca>
* need to call getline() with null for first usage
* don't overwrite const arg
2005-02-07 Stephen Hemminger <shemminger@osdl.org>
* Add experimental distribution
2005-01-18 Yun Mao <maoy@cis.upenn.edu>
* typo in ss
2005-01-18 Thomas Graf <tgraf@suug.ch>
* tc pedit/action cleanups
* add addraw_l
* rtattr_parse cleanups
2005-01-17 Jamal Hadi Salim <hadi@znyx.com>
* typo in m_mirred
* add support for pedit
2005-01-13 Jim Gifford <lfs@jg555.com>
* Fix allocation size error in normal and paretonormal generation
programs.
2005-01-12 Masahide Nakamura <nakam@linux-ipv6.org>
* ipmonitor shows IPv6 prefix list notification
* update to iproute2 xfrm for ipv6
2005-01-12 Stephen Hemminger <shemminger@osdl.org>
* Fix compile warnings when building 64bit system since
u64 is unsigned long, but format is %llu
2005-01-12 "Catalin(ux aka Dino) BOIE" <util@deuroconsult.ro>
* Add the possibility to use fwmark in u32 filters
2005-01-12 Andi Kleen <ak@suse.de>
* Add netlink manual page
2004-10-20 Stephen Hemminger <shemminger@osdl.org>
* Add warning about "ip route nat" no longer supported
2005-01-12 Thomas Graf <tgraf@suug.ch>
* Tc testsuite
2005-01-12 Jamal Hadi Salim <hadi@znyx.com>
* Add iptables tc support. This meant borrowing headers
from iptables *ugh*
2004-12-08 Jamal Hadi Salim <hadi@znyx.com>
* Add mirror and redirect actions
2004-10-20 Stephen Hemminger <shemminger@osdl.org>
* Don't include <asm/byteorder.h> since then we get dependent on
kernel headers on host machine
* Minor fix for building on old machine without IPPROTO_SCTP
2004-10-19 Harald Welte <laforge@gnumonks.org>
* Replace rtstat (and ctstat) with new lnstat
2004-10-19 Mads Martin Joergensen <mmj@suse.de>
* Ip is using the wrong structure in ipaddress.c when showing stats
* Make sure no buffer overflow in nstat
2004-10-19 Michal <md@lnet.pl>
* fix scaling in print_rates (for bits)
2004-09-28 Stephen Hemminger <shemminger@osdl.org>
* fix build problems with arpd and pthread
* add pkt_sched.h
2004-09-28 Mike Frysinger <vapier@gentoo.org>
* make man8 directory
* install ifcfg and rtpr scripts
2004-09-28 Andreas Haumer <andreas@xss.co.at>
* make install symlink fix.
2004-09-28 Masahide Nakamura <nakam@linux-ipv6.org>
* ICMP/ICMPv6's type and code in IPsec selector.
* fixes `ip xfrm`'s algorithm key when using hexadecimal
* support 'ip xfrm' protocol types
* flush message types for XFRM's policy/state
2004-09-01 Stephen Hemminger <shemminger@osdl.org>
* Fix ip command to not crash when interface name is too long.
always use strncpy(.., IFNAMSIZ)
2004-08-31 Stephen Hemminger <shemminger@osdl.org>
* Add gact documentation from jamal
* Change more arguments to rtnetlink API const
* Drop dead queuing disciplines
* Handle qdisc without xstats in core rather than
putting stubs everywhere
* Add requeue to tc_stats and handle new/old ABI issues
2004-08-30 Stephen Hemminger <shemminger@osdl.org>
* Make clean and install changes for man pages
* Patch from jamal to support gact
* Add support for loading distributions to netem
2004-08-23 Stephen Hemminger <shemminger@osdl.org>
* Update from jamal for all the parts that got broken in the
last classification patch.
* Hfsc/sc patch from patrick
2004-08-13 Stephen Hemminger <shemminger@osdl.org>
* Add jamal's tc extensions for classification
* Get rid of old Patches/ directory for tcp_diag module
* Make get_rate table based.
2004-08-11 Stephen Hemminger <shemminger@osdl.org>
* Add xfrm message formatting from
Masahide Nakamura <nakam@linux-ipv6.org>
2004-08-09 Stephen Hemminger <shemminger@osdl.org>
* Fix netem scheduler to handle case where psched us != real us
* Remove configuration for everything that can depend on
extracted kernel headers
* Add kernel headers required to include/linux
2004-08-04 Stephen Hemminger <shemminger@osdl.org>
* Get rid of old tcp_diag module, it is part of kernel.
* Add some kernel include files back (netlink, tcp_diag, pkt_sched)
2004-07-30 Stephen Hemminger <shemminger@osdl.org>
* Make ip xfrm stuff config option since it doesn't exist on 2.4
* HFSC doesn't exist on older 2.4 kernels so make it configurable
* HTB API changed and won't build with mismatched version.
Rather than sticking user with a build error, just don't
build it in on mismatch.
* Change configure script to make sure netem is the correct
version. I changed the structure def. a couple of times before
settling on the final API
2004-07-16 Stephen Hemminger <shemminger@osdl.org>
* Add htb mpu support
http://luxik.cdi.cz/~devik/qos/htb/v3/htb_tc_overhead.diff
* Three small xfrm updates
2004-07-07 Stephen Hemminger <shemminger@osdl.org>
* Fix if_ether.h to fix arpd build
* Add hfsc scheduler support
* Add ip xfrm support
* Add jitter (instead of rate) to netem scheduler
2004-07-02 Stephen Hemminger <shemminger@osdl.org>
* use compile to test for ATM libraries
* put TC layered scheduler hooks in /usr/lib/tc as shared lib
before it looked in standard search path (/lib;/usr/lib;...)
which seems out of place.
* build netem as shared library (more for testing/example)
* build ATM as shared library since libatm may be on build
machine but not on deployment machine
* fix make install to not install SCCS directories
2004-07-01 Stephen Hemminger <shemminger@osdl.org>
* add more link options to ip command (from Mark Smith)
* add rate and duplicate arguments to tc command
* add -iec flag for tc printout
* rename delay scheduler to netem
2004-06-25 Stephen Hemminger <shemminger@osdl.org>
* Add loss parameter to delay
* Rename delay qdisc to netsim
* Add autoconfiguration by building a Config file
and using it.
2004-06-09 Stephen Hemminger <shemminger@osdl.org>
* Report rates in K=1000 (requested by several people)
* Add GNU long style options
* For HTB use get_hz to pick up value of system HZ at runtime
* Delete unused funcs.
2004-06-08 Stephen Hemminger <shemminger@osdl.org>
* Cleanup ss
- use const char and local functions where possible
* Add man pages from SuSe
* SuSE patches
- path to db4.1
- don't hardcode path to /tmp in ifstat
Alternate fix: was to use TMPDIR
- handle non-root user calling ip route flush going into
an infinite loop.
Alternate fix: was to timeout if route table doesn't empty.
* Try and get rid of dependency on kernel include files
Get rid of having private glibc-include headers
2004-06-07 Stephen Hemminger <shemminger@osdl.org>
* Import patches that make sense from Fedora Core 2
- iproute2-2.4.7-hex
print fwmark in hex
- iproute2-2.4.7-netlink
handle getting right netlink mesg back
- iproute2-2.4.7-htb3-tc
add HTB scheduler
- iproute2-2.4.7-default
add entry default to rttable
2004-06-04 Stephen Hemminger <shemminger@osdl.org>
* Add support for vegas info to ss
2004-06-02 Stephen Hemminger <shemminger@osdl.org>
* Use const char in utility routines where appropriate
* Rearrange include files so can build with standard headers
* For "tc qdisc ls" see the default queuing discipline "pfifo_fast"
and understand it
* Get rid of private definitions of network headers which existed
only to handle old glibc
2004-04-15 Stephen Hemminger <shemminger@osdl.org>
* Add the delay (network simulation scheduler)
2004-04-15 Stephen Hemminger <shemminger@osdl.org>
* Starting point baseline based on iproute2-2.4.7-ss020116-try

Makefile

@ -1,80 +1,137 @@
DESTDIR=/usr/
LIBDIR=/usr/lib/
SBINDIR=/sbin
CONFDIR=/etc/iproute2
DOCDIR=/share/doc/iproute2
MANDIR=/share/man
# SPDX-License-Identifier: GPL-2.0
# Top level Makefile for iproute2
-include config.mk
ifeq ("$(origin V)", "command line")
VERBOSE = $(V)
endif
ifndef VERBOSE
VERBOSE = 0
endif
ifeq ($(VERBOSE),0)
MAKEFLAGS += --no-print-directory
endif
PREFIX?=/usr
SBINDIR?=/sbin
CONFDIR?=/etc/iproute2
NETNS_RUN_DIR?=/var/run/netns
NETNS_ETC_DIR?=/etc/netns
DATADIR?=$(PREFIX)/share
HDRDIR?=$(PREFIX)/include/iproute2
DOCDIR?=$(DATADIR)/doc/iproute2
MANDIR?=$(DATADIR)/man
ARPDDIR?=/var/lib/arpd
KERNEL_INCLUDE?=/usr/include
BASH_COMPDIR?=$(DATADIR)/bash-completion/completions
# Path to db_185.h include
DBM_INCLUDE:=/usr/include
DBM_INCLUDE:=$(DESTDIR)/usr/include
SHARED_LIBS = y
DEFINES= -DRESOLVE_HOSTNAMES -DLIBDIR=\"$(LIBDIR)\"
ifneq ($(SHARED_LIBS),y)
DEFINES+= -DNO_SHARED_LIBS
endif
#options if you have a bind>=4.9.4 libresolv (or, maybe, glibc)
LDLIBS=-lresolv
ADDLIB=
DEFINES+=-DCONFDIR=\"$(CONFDIR)\" \
-DNETNS_RUN_DIR=\"$(NETNS_RUN_DIR)\" \
-DNETNS_ETC_DIR=\"$(NETNS_ETC_DIR)\"
#options for decnet
ADDLIB+=dnet_ntop.o dnet_pton.o
#options for AX.25
ADDLIB+=ax25_ntop.o
#options for ipx
ADDLIB+=ipx_ntop.o ipx_pton.o
#options for ROSE
ADDLIB+=rose_ntop.o
CC = gcc
HOSTCC = gcc
CCOPTS = -D_GNU_SOURCE -O2 -Wstrict-prototypes -Wall
CFLAGS = $(CCOPTS) -I../include $(DEFINES)
#options for mpls
ADDLIB+=mpls_ntop.o mpls_pton.o
#options for NETROM
ADDLIB+=netrom_ntop.o
CC := gcc
HOSTCC ?= $(CC)
DEFINES += -D_GNU_SOURCE
# Turn on transparent support for LFS
DEFINES += -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE
CCOPTS = -O2 -pipe
WFLAGS := -Wall -Wstrict-prototypes -Wmissing-prototypes
WFLAGS += -Wmissing-declarations -Wold-style-definition -Wformat=2
CFLAGS := $(WFLAGS) $(CCOPTS) -I../include -I../include/uapi $(DEFINES) $(CFLAGS)
YACCFLAGS = -d -t -v
LDLIBS += -L../lib -lnetlink -lutil
SUBDIRS=lib ip tc bridge misc netem genl tipc devlink rdma dcb man vdpa
SUBDIRS=lib ip tc misc netem genl
LIBNETLINK=../lib/libutil.a ../lib/libnetlink.a
LDLIBS += $(LIBNETLINK)
LIBNETLINK=../lib/libnetlink.a ../lib/libutil.a
all: Config
all: config.mk
@set -e; \
for i in $(SUBDIRS); \
do $(MAKE) $(MFLAGS) -C $$i; done
do echo; echo $$i; $(MAKE) -C $$i; done
Config:
sh configure $(KERNEL_INCLUDE)
.PHONY: clean clobber distclean check cscope version
help:
@echo "Make Targets:"
@echo " all - build binaries"
@echo " clean - remove products of build"
@echo " distclean - remove configuration and build"
@echo " install - install binaries on local machine"
@echo " check - run tests"
@echo " cscope - build cscope database"
@echo " version - update version"
@echo ""
@echo "Make Arguments:"
@echo " V=[0|1] - set build verbosity level"
config.mk:
@if [ ! -f config.mk -o configure -nt config.mk ]; then \
sh configure $(KERNEL_INCLUDE); \
fi
install: all
install -m 0755 -d $(DESTDIR)$(SBINDIR)
install -m 0755 -d $(DESTDIR)$(CONFDIR)
install -m 0755 -d $(DESTDIR)$(DOCDIR)/examples
install -m 0755 -d $(DESTDIR)$(DOCDIR)/examples/diffserv
install -m 0644 README.iproute2+tc $(shell find examples -maxdepth 1 -type f) \
$(DESTDIR)$(DOCDIR)/examples
install -m 0644 $(shell find examples/diffserv -maxdepth 1 -type f) \
$(DESTDIR)$(DOCDIR)/examples/diffserv
@for i in $(SUBDIRS) doc; do $(MAKE) -C $$i install; done
install -m 0755 -d $(DESTDIR)$(ARPDDIR)
install -m 0755 -d $(DESTDIR)$(HDRDIR)
@for i in $(SUBDIRS); do $(MAKE) -C $$i install; done
install -m 0644 $(shell find etc/iproute2 -maxdepth 1 -type f) $(DESTDIR)$(CONFDIR)
install -m 0755 -d $(DESTDIR)$(MANDIR)/man8
install -m 0644 $(shell find man/man8 -maxdepth 1 -type f) $(DESTDIR)$(MANDIR)/man8
ln -sf tc-bfifo.8 $(DESTDIR)$(MANDIR)/man8/tc-pfifo.8
ln -sf lnstat.8 $(DESTDIR)$(MANDIR)/man8/rtstat.8
ln -sf lnstat.8 $(DESTDIR)$(MANDIR)/man8/ctstat.8
ln -sf rtacct.8 $(DESTDIR)$(MANDIR)/man8/nstat.8
ln -sf routel.8 $(DESTDIR)$(MANDIR)/man8/routef.8
install -m 0755 -d $(DESTDIR)$(MANDIR)/man3
install -m 0644 $(shell find man/man3 -maxdepth 1 -type f) $(DESTDIR)$(MANDIR)/man3
install -m 0755 -d $(DESTDIR)$(BASH_COMPDIR)
install -m 0644 bash-completion/tc $(DESTDIR)$(BASH_COMPDIR)
install -m 0644 bash-completion/devlink $(DESTDIR)$(BASH_COMPDIR)
install -m 0644 include/bpf_elf.h $(DESTDIR)$(HDRDIR)
snapshot:
echo "static const char SNAPSHOT[] = \""`date +%y%m%d`"\";" \
> include/SNAPSHOT.h
version:
echo "static const char version[] = \""`git describe --tags --long`"\";" \
> include/version.h
clean:
rm -f cscope.*
@for i in $(SUBDIRS) doc; \
do $(MAKE) $(MFLAGS) -C $$i clean; done
@for i in $(SUBDIRS) testsuite; \
do $(MAKE) -C $$i clean; done
clobber: clean
rm -f Config
clobber:
touch config.mk
$(MAKE) clean
rm -f config.mk cscope.*
distclean: clobber
check: all
$(MAKE) -C testsuite
$(MAKE) -C testsuite alltests
@if command -v man >/dev/null 2>&1; then \
echo "Checking manpages for syntax errors..."; \
$(MAKE) -C man check; \
else \
echo "man not installed, skipping checks for syntax errors."; \
fi
cscope:
cscope -b -q -R -Iinclude -sip -slib -smisc -snetem -stc

README

@ -1,36 +1,42 @@
Primary site is:
http://developer.osdl.org/dev/iproute2
This is a set of utilities for Linux networking.
Original FTP site is:
ftp://ftp.inr.ac.ru/ip-routing/
Information:
https://wiki.linuxfoundation.org/networking/iproute2
Download:
http://www.kernel.org/pub/linux/utils/net/iproute2/
Stable version repository:
git://git.kernel.org/pub/scm/network/iproute2/iproute2.git
Development repository:
git://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git
How to compile this.
--------------------
1. Look at start of Makefile and set correct values for:
1. libdbm
KERNEL_INCLUDE should point to correct linux kernel include directory.
The default (/usr/src/linux/include) is usually right.
arpd needs to have the db4 development libraries. For debian
users this is the package with a name like libdb4.x-dev.
arpd needs to have the berkeleydb development libraries. For Debian
users this is the package with a name like libdbX.X-dev.
DBM_INCLUDE points to the directory with db_185.h which
is the include file used by arpd to get to the old format Berkely
is the include file used by arpd to get to the old format Berkeley
database routines. Often this is in the db-devel package.
2. make
The makefile will automatically build a file Config which
contains whether or not ATM is available, etc.
The makefile will automatically build a config.mk file which
contains definitions of libraries that may or may not be available
on the system such as: ATM, ELF, MNL, and SELINUX.
3. To make the documentation, cd to the doc/ directory, then
look at the start of its Makefile and set correct values for
PAGESIZE=a4, e.g. a4, letter, ... (string)
PAGESPERPAGE=2, e.g. 1, 2, ... (numeric)
and run make there. It assumes that latex, dvips and psnup
are in your path.
3. include/uapi
This package includes matching sanitized kernel headers because
the build environment may not have up to date versions. See Makefile
if you have special requirements and need to point at different
kernel include files.
Stephen Hemminger
shemminger@osdl.org
stephen@networkplumber.org
Alexey Kuznetsov
kuznet@ms2.inr.ac.ru

@ -1,33 +0,0 @@
Here are a few quick points about DECnet support...
o iproute2 is the tool of choice for configuring the DECnet support for
Linux. For many features, it is the only tool which can be used to
configure them.
o No name resolution is available as yet, all addresses must be
entered numerically.
o Remember to set the hardware address of the interface using:
ip link set ethX address xx:xx:xx:xx:xx:xx
(where xx:xx:xx:xx:xx:xx is the MAC address for your DECnet node
address)
if your Ethernet card won't listen to more than one unicast
mac address at once. If the Linux DECnet stack doesn't talk to
any other DECnet nodes, then check this with tcpdump and if it's
a problem, change the mac address (but do this _before_ starting
any other network protocol on the interface)
o Whilst you can use ip addr add to add more than one DECnet address to an
interface, don't expect addresses which are not the same as the
kernel's node address to work properly with 2.4 kernels. This should
be fine with 2.6 kernels as the routing code has been extensively
modified and improved.
o The DECnet support is currently self-contained. It does not depend on
the libdnet library.
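Under the usual DECnet Phase IV convention, the MAC address for node address area.node is AA:00:04:00 followed by the 16-bit value area * 1024 + node, low byte first. A minimal C sketch of that arithmetic, for working out the value to pass to ip link set ... address; dn_node_to_mac() is a hypothetical helper, not part of iproute2:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the Phase IV MAC for DECnet address area.node into buf
 * (at least 18 bytes). Returns 0, or -1 if area/node is out of range. */
static int dn_node_to_mac(unsigned int area, unsigned int node,
			  char *buf, size_t len)
{
	unsigned int addr;

	assert(buf != NULL);
	if (area < 1 || area > 63 || node < 1 || node > 1023 || len < 18)
		return -1;
	addr = (area << 10) | node;	/* area * 1024 + node */
	snprintf(buf, len, "aa:00:04:00:%02x:%02x",
		 addr & 0xff, (addr >> 8) & 0xff);	/* low byte first */
	return 0;
}
```

For example, node 1.1 gives addr 1025 (0x0401) and therefore aa:00:04:00:01:04, which would then be set with "ip link set ethX address aa:00:04:00:01:04".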
Steve Whitehouse <steve@chygwyn.com>

README.devel Normal file

@ -0,0 +1,18 @@
Iproute2 development is closely tied to Linux kernel networking
development. Most new features require a kernel and a utility component.
Please submit both to the Linux networking mailing list
<netdev@vger.kernel.org>
The current source for the stable version is in the git repository:
git://git.kernel.org/pub/scm/network/iproute2/iproute2.git
The development git repository is available at the following address:
git://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git
The stable repository contains the source corresponding to the
current code in the Linux networking tree (net), which in turn is
aligned with the mainline Linux kernel (i.e. follows Linus).
The iproute2-next repository tracks the code intended for the next
release; it corresponds to the networking development tree (net-next)
in the kernel.

@ -1,95 +0,0 @@
I. About the distribution tables
The table used for "synthesizing" the distribution is essentially a scaled,
translated, inverse to the cumulative distribution function.
Here's how to think about it: Let F() be the cumulative distribution
function for a probability distribution X. We'll assume we've scaled
things so that X has mean 0 and standard deviation 1, though that's not
so important here. Then:
F(x) = P(X <= x) = \int_{-inf}^x f
where f is the probability density function.
F is monotonically increasing, so it has an inverse function G, with domain
0 to 1. Here, G(t) = the x such that P(X <= x) = t. (In general, G may
have singularities if X has point masses, i.e., points x such that
P(X = x) > 0.)
Now we create a tabular representation of G as follows: Choose some table
size N, and for the ith entry, put in G(i/N). Let's call this table T.
The claim now is, I can create a (discrete) random variable Y whose
distribution has the same approximate "shape" as X, simply by letting
Y = T(U), where U is a discrete uniform random variable with range 1 to N.
To see this, it's enough to show that Y's cumulative distribution function,
(let's call it H), is a discrete approximation to F. But
H(x) = P(Y <= x)
= (# of entries in T <= x) / N -- as Y chosen uniformly from T
= i/N, where i is the largest integer such that G(i/N) <= x
= i/N, where i is the largest integer such that i/N <= F(x)
-- since G and F are inverse functions (and F is
increasing)
= floor(N*F(x))/N
as desired.
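A small C sketch that checks this claim numerically: it builds T from a distribution whose inverse is known in closed form and compares H(x) with floor(N*F(x))/N. The uniform distribution on [-sqrt(3), sqrt(3)] (mean 0, standard deviation 1) and all function names are assumptions chosen for illustration; a bounded distribution is used so that G(N/N) = G(1) is finite.

```c
#include <assert.h>
#include <stdlib.h>

#define SQRT3 1.7320508075688772

/* Example X: uniform on [-sqrt(3), sqrt(3)], so mean 0 and sd 1. */
static double F(double x)
{
	if (x < -SQRT3)
		return 0.0;
	if (x > SQRT3)
		return 1.0;
	return (x + SQRT3) / (2.0 * SQRT3);
}

/* G = inverse of F on [0, 1]; bounded, so G(1) is finite. */
static double G(double t)
{
	assert(t >= 0.0 && t <= 1.0);
	return -SQRT3 + 2.0 * SQRT3 * t;
}

/* T[i] = G(i/N) for i = 1..N; T[0] is unused to keep the text's indexing. */
static void build_table(double *T, int n)
{
	for (int i = 1; i <= n; i++)
		T[i] = G((double)i / n);
}

/* H(x) = (# of entries in T <= x) / N */
static double H(const double *T, int n, double x)
{
	int count = 0;

	for (int i = 1; i <= n; i++)
		if (T[i] <= x)
			count++;
	return (double)count / n;
}

/* Y = T(U), with U uniform on 1..N (rand() is only good enough for a demo). */
static double sample_Y(const double *T, int n)
{
	return T[1 + rand() % n];
}
```

With N = 8, for instance, H(0.9) = 6/8, and floor(8*F(0.9))/8 = floor(6.08)/8 = 6/8 as well.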
II. How to create distribution tables (in theory)
How can we create this table in practice? In some cases, F may have a
simple expression which allows evaluating its inverse directly. The
Pareto distribution is one example of this. In other cases, and
especially for matching an experimentally observed distribution, it's
easiest simply to create a table for F and "invert" it. Here, we give
a concrete example, namely how the new "experimental" distribution was
created.
1. Collect enough data points to characterize the distribution. Here, I
collected 25,000 "ping" roundtrip times to a "distant" point (time.nist.gov).
That's far more data than is really necessary, but it was fairly painless to
collect it, so...
2. Normalize the data so that it has mean 0 and standard deviation 1.
3. Determine the cumulative distribution. The code I wrote creates a table
covering the range -10 to +10, with granularity 0.00005. Obviously, this
is absurdly over-precise, but since it's a one-time only computation, I
figured it hardly mattered.
4. Invert the table: for each table entry F(x) = y, make the y*TABLESIZE
(here, 4096) entry be x*TABLEFACTOR (here, 8192). This creates a table
for the ("normalized") inverse of size TABLESIZE, covering its domain 0
to 1 with granularity 1/TABLESIZE. Note that even with the granularity
used in creating the table for F, it's possible not all the entries in
the table for G will be filled in. So, make a pass through the
inverse's table, filling in any missing entries by linear interpolation.
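Steps 3 and 4 can be sketched in C as follows. This is a minimal sketch, not the actual maketable source: it assumes the data has already been normalized (step 2), walks x across -10..+10 with the granularity given above, computes F(x) on the fly instead of storing it, and writes x*TABLEFACTOR into slot F(x)*TABLESIZE (a later x overwrites an earlier one that rounds to the same slot). Empty slots are then filled by linear interpolation, and any gap at either end copies the nearest filled value. make_inverse() and the EMPTY marker are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define TABLESIZE   4096	/* inverse table size, as in the text */
#define TABLEFACTOR 8192	/* fixed-point scale for x, as in the text */
#define RANGE       10.0	/* F is evaluated over [-RANGE, +RANGE] */
#define STEP        0.00005	/* granularity used for F */
#define EMPTY       INT_MIN	/* marks slots not written by step 4 */

static int cmp_double(const void *a, const void *b)
{
	double x = *(const double *)a, y = *(const double *)b;

	return (x > y) - (x < y);
}

/* Round half away from zero without needing libm. */
static int round_int(double v)
{
	return v >= 0 ? (int)(v + 0.5) : -(int)(-v + 0.5);
}

/* data: n samples already normalized to mean 0, sd 1 (sorted in place). */
static void make_inverse(double *data, size_t n, int inverse[TABLESIZE])
{
	long steps = (long)(2 * RANGE / STEP + 0.5);
	size_t below = 0;	/* samples <= x, so F(x) = below / n */
	int i, prev = -1;

	assert(n > 0);
	qsort(data, n, sizeof(*data), cmp_double);
	for (i = 0; i < TABLESIZE; i++)
		inverse[i] = EMPTY;

	/* Steps 3-4: for each x, slot F(x)*TABLESIZE gets x*TABLEFACTOR. */
	for (long k = 0; k <= steps; k++) {
		double x = -RANGE + k * STEP;
		int slot;

		while (below < n && data[below] <= x)
			below++;
		slot = round_int((double)below / n * TABLESIZE);
		if (slot >= TABLESIZE)
			slot = TABLESIZE - 1;
		inverse[slot] = round_int(x * TABLEFACTOR);
	}

	/* Fill gaps: copy into a leading gap, interpolate between filled slots. */
	for (i = 0; i < TABLESIZE; i++) {
		if (inverse[i] == EMPTY)
			continue;
		for (int j = prev + 1; j < i; j++)
			inverse[j] = prev < 0 ? inverse[i] :
				inverse[prev] + (int)((long)(inverse[i] - inverse[prev]) *
						      (j - prev) / (i - prev));
		prev = i;
	}
	for (i = prev + 1; i < TABLESIZE; i++)	/* trailing gap: copy */
		inverse[i] = inverse[prev];
}
```

Because later x values overwrite earlier ones, every filled slot holds the largest x that maps to it, so the finished table is non-decreasing, as an inverse CDF should be.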
III. How to create distribution tables (in practice)
If you want to do all this yourself, I've provided several tools to help:
1. maketable does the steps 2-4 above, and then generates the appropriate
header file. So if you have your own time distribution, you can generate
the header simply by:
maketable < time.values > header.h
2. As explained in the other README file, the somewhat sleazy way I have
of generating correlated values needs correction. You can generate your
own correction tables by compiling makesigtable and makemutable with
your header file. Check the Makefile to see how this is done.
3. Warning: maketable, makesigtable and especially makemutable do
enormous amounts of floating point arithmetic. Don't try running
these on an old 486. (NIST Net itself will run fine on such a
system, since in operation, it just needs to do a few simple integral
calculations. But getting there takes some work.)
4. The tables produced are all normalized for mean 0 and standard
deviation 1. How do you know what values to use for real? Here, I've
provided a simple "stats" utility. Give it a series of floating point
values, and it will return their mean (mu), standard deviation (sigma),
and correlation coefficient (rho). You can then plug these values
directly into NIST Net.

@ -1,119 +0,0 @@
iproute2+tc*
It's the first release of Linux traffic control engine.
NOTES.
* The csz scheduler is not operational at the moment and will probably
never be repaired, but replaced with the h-pfq scheduler.
* To use "fw" classifier you will need ipfwchains patch.
* No manual is available. Ask me if you have problems (but try to guess
the answer yourself first 8)).
Micro-manual how to start it the first time
-------------------------------------------
A. Attach CBQ to eth1:
tc qdisc add dev eth1 root handle 1: cbq bandwidth 10Mbit allot 1514 cell 8 \
avpkt 1000 mpu 64
B. Add root class:
tc class add dev eth1 parent 1:0 classid 1:1 cbq bandwidth 10Mbit rate 10Mbit \
allot 1514 cell 8 weight 1Mbit prio 8 maxburst 20 avpkt 1000
C. Add default interactive class:
tc class add dev eth1 parent 1:1 classid 1:2 cbq bandwidth 10Mbit rate 1Mbit \
allot 1514 cell 8 weight 100Kbit prio 3 maxburst 20 avpkt 1000 split 1:0 \
defmap c0
D. Add default class:
tc class add dev eth1 parent 1:1 classid 1:3 cbq bandwidth 10Mbit rate 8Mbit \
allot 1514 cell 8 weight 800Kbit prio 7 maxburst 20 avpkt 1000 split 1:0 \
defmap 3f
etc. etc. etc. Well, that is enough to get started 8) The rest can be guessed 8)
Also look at the more elaborate example, ready to start rsvpd,
in rsvp/cbqinit.eth1.
Terminology and advice about setting CBQ parameters may be found in Sally Floyd's
papers.
Pairs X:Y are class handles; X:0 are qdisc handles.
weight should be proportional to rate for leaf classes
(I chose it ten times smaller, but that is not necessary).
defmap is a bitmap of the logical priorities served by this class.
E. Other qdiscs are simpler. E.g. let's attach TBF to class 1:2
tc qdisc add dev eth1 parent 1:2 tbf rate 64Kbit buffer 5Kb/8 limit 10Kb
F. Look at all that we created:
tc qdisc ls dev eth1
tc class ls dev eth1
G. Install the "route" classifier on the root of cbq and map destinations from realm
1 to class 1:2
tc filter add dev eth1 parent 1:0 protocol ip prio 100 route to 1 classid 1:2
H. Assign routes to 10.11.12.0/24 to realm 1
ip route add 10.11.12.0/24 dev eth1 via whatever realm 1
etc. The same thing can be made with rules.
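To see what "defmap c0" (class 1:2, step C) and "defmap 3f" (class 1:3, step D) select, treat the value as a bitmask in which bit p stands for logical priority p. A small C sketch; the PRIO_* names are local stand-ins whose values are copied from the TC_PRIO_* constants in linux/pkt_sched.h, and defmap_serves() is a hypothetical helper:

```c
#include <assert.h>

/* Logical priorities; values mirror TC_PRIO_* in linux/pkt_sched.h. */
enum {
	PRIO_BESTEFFORT = 0,
	PRIO_FILLER = 1,
	PRIO_BULK = 2,
	PRIO_INTERACTIVE_BULK = 4,
	PRIO_INTERACTIVE = 6,
	PRIO_CONTROL = 7,
};

/* Bit p of defmap set => the class is a default for logical priority p. */
static int defmap_serves(unsigned int defmap, unsigned int prio)
{
	assert(prio < 16);	/* TC_PRIO_MAX is 15 */
	return (defmap >> prio) & 1;
}
```

0xc0 has bits 6 and 7 set (interactive and control traffic), while 0x3f has bits 0 to 5 set, so the two default classes split priorities 0 to 7 between them without overlap.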
I have not tested ipchains yet, but it should work too.
Setting up the rsvp and u32 classifiers is more hairy.
If you read the RSVP specs, you will easily understand how the rsvp
classifier works. As for u32... here is an example:
#! /bin/sh
TC=/home/root/tc
# Setup classifier root on eth1 root (it is cbq)
$TC filter add dev eth1 parent 1:0 prio 5 protocol ip u32
# Create hash table of 256 slots with ID 1:
$TC filter add dev eth1 parent 1:0 prio 5 handle 1: u32 divisor 256
# Add a rule to slot 6 of the hash table to select tcp/telnet to 193.233.7.75,
# direct it to class 1:4, and make it fall back to best effort
# if the traffic violates TBF (32kbit,5K)
$TC filter add dev eth1 parent 1:0 prio 5 u32 ht 1:6: \
match ip dst 193.233.7.75 \
match tcp dst 0x17 0xffff \
flowid 1:4 \
police rate 32kbit buffer 5kb/8 mpu 64 mtu 1514 index 1
# Add a rule to slot 1 of the hash table to select icmp to 193.233.7.75,
# direct it to class 1:4, and make it fall back to best effort
# if the traffic violates TBF (10kbit,5K)
$TC filter add dev eth1 parent 1:0 prio 5 u32 ht 1:: \
sample ip protocol 1 0xff \
match ip dst 193.233.7.75 \
flowid 1:4 \
police rate 10kbit buffer 5kb/8 mpu 64 mtu 1514 index 2
# Look up the hash table if the frame is not fragmented;
# use the protocol as the hash key
$TC filter add dev eth1 parent 1:0 prio 5 handle ::1 u32 ht 800:: \
match ip nofrag \
offset mask 0x0F00 shift 6 \
hashkey mask 0x00ff0000 at 8 \
link 1:
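The arithmetic behind the last filter is worth spelling out. "offset mask 0x0F00 shift 6" is applied to the first 16-bit word of the IPv4 header (version, IHL, TOS): IHL counts 32-bit words, and (IHL << 8) >> 6 equals IHL * 4, the header length in bytes. "hashkey mask 0x00ff0000 at 8" picks the protocol byte out of the word holding TTL, protocol and checksum. A C sketch under simplifying assumptions (host byte order, power-of-two divisor); the function names are hypothetical, and the hash fold is a simplified model rather than the kernel's code:

```c
#include <assert.h>
#include <stdint.h>

/* offset mask 0x0F00 shift 6: IHL (bits 8-11) * 4 = header length in bytes. */
static unsigned int u32_next_offset(uint16_t first_word)
{
	return (first_word & 0x0F00u) >> 6;
}

/* Mask the key, shift it down by the mask's lowest set bit, then keep
 * divisor - 1 bits to get the hash table slot. */
static unsigned int u32_bucket(uint32_t word, uint32_t mask,
			       unsigned int divisor)
{
	unsigned int shift = 0;

	assert(mask != 0 && divisor != 0 && (divisor & (divisor - 1)) == 0);
	while (!((mask >> shift) & 1))
		shift++;
	return ((word & mask) >> shift) & (divisor - 1);
}
```

With divisor 256, a TCP packet (protocol 6) lands in slot 6 and an ICMP packet (protocol 1) in slot 1, matching the slots the comments in the script describe. A plain 20-byte header (first word 0x4500) gives an offset of 20.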
Alexey Kuznetsov
kuznet@ms2.inr.ac.ru

@ -1,81 +0,0 @@
lnstat - linux networking statistics
(C) 2004 Harald Welte <laforge@gnumonks.org>
======================================================================
This tool is a generalized and more feature-complete replacement for the old
'rtstat' program.
In addition to routing cache statistics, it supports any kind of statistics
the Linux kernel exports via a file in /proc/net/stat. In a stock 2.6.9
kernel, these are:
per-protocol neighbour cache statistics
(ipv4, ipv6, atm, decnet)
routing cache statistics
(ipv4)
connection tracking statistics
(ipv4)
Please note that lnstat will adapt to any additional statistics that might be
added to the kernel at some later point.
I personally always like examples more than any reference documentation, so I
list the following examples. If somebody wants to write a manpage, feel free
to send me a patch :)
EXAMPLES:
In order to get a list of supported statistics files, you can run
lnstat -d
It will display something like
/proc/net/stat/arp_cache:
1: entries
2: allocs
3: destroys
[...]
/proc/net/stat/rt_cache:
1: entries
2: in_hit
3: in_slow_tot
You can now select the files/keys you are interested in with something like:
lnstat -k arp_cache:entries,rt_cache:in_hit,arp_cache:destroys
arp_cach|rt_cache|arp_cach|
entries| in_hit|destroys|
6| 6| 0|
6| 0| 0|
6| 2| 0|
You can specify the interval (e.g. 10 seconds) by:
lnstat -i 10
You can specify to only use one particular statistics file:
lnstat -f ip_conntrack
You can specify individual field widths:
lnstat -k arp_cache:entries,rt_cache:entries -w 20,8
You can specify not to print a header at all:
lnstat -s 0
You can specify to print a header only at start of the program
lnstat -s 1
You can specify to print a header at start and every 20 lines:
lnstat -s 20
You can specify the number of samples you want to take (e.g. 5):
lnstat -c 5

168
RELNOTES

@ -1,168 +0,0 @@
[020116]
! 1. Compile with rh-7.2
! 2. What the hell some people blame on socklen_t defined in unistd.h? Check.
* Kim Woelders <kim@woelders.dk>, various useful fixups: compilation
with old kernels, cross-compiling, "all" == "any" in prefix spec.
* Collected from my disk, cleaned and packed to directory iproute2/misc/
several utilities: ss, nstat, ifstat, rtacct, arpd and module tcp_diag.
Writing some docs. me.
* prepared patchlet for pidentd to use tcp_diag.
* David Miller: 64bit (and even worse 64bit kernel/32 bit user :-) fixes
to above. tcp_diag is merged to main tree.
* Alexandr D. Kanevskiy <kad@blackcatlinux.com>: various flaws in ss
* Alexandr D. Kanevskiy <kad@blackcatlinux.com>: oops, more aggressive caching
of names opened old bugs: ip started to print garbage in some places.
* Robert Olsson, rt_cache_stat. Renamed to rtstat.
* An old bug in "ip maddr ls": redundant empty lines in output.
Seeing this crap for ages but lucky match of desire/ability to repair
and a huff about this happened only today. :-)
* "Mr. James W. Laferriere" <babydr@baby-dragons.com>
doc: option to produce ps output for non-a4 and not only 2 pages/sheet.
* Jamal's patch for ingress qdisc.
* Bernd Eckenfels <ecki@lina.inka.de>: deleted orphaned bogus #include
in include/utils.h.
* Julian Anastasov <ja@ssi.bg>: uninitialized fields in nexthop
producing funny "dead" nexthops in multipath routes.
Stupid me, look at the first line in [010803]... Was it difficult to guess
this at that time? People have been complaining for several months. :-)
Special thanks to bert hubert <ahu@ds9a.nl> who raised the issue in netdev.
Thanks and apologies to Terry Schmidt <terry@nycwireless.net>,
Ruben Puettmann <ruben.puettmann@freenet-ag.de>,
Mark Ivens <mivens@clara.net>.
* willy tarreau <wtarreau@yahoo.fr>: "make install" target.
* Tunable limit for sch_sfq. Patch to kernel activating this
is about to be submitted. Reminded by Adi Nugroho <Adi@iNterNUX.co.id>.
[010824]
* ip address add sets scope of loopback addresses to "host".
Advised by David Miller.
* ZIP! <zip@killerlabs.com> and David Ford <david@blue-labs.org>
Some strcpy's changed to strncpy's.
* David Ford <david@blue-labs.org>, test for compilation with gcc3.
* David Ford <david@blue-labs.org>. Damn, I broke rtnl_talk in previous
snapshot.
[010803]
* If "dev" is not specified in multipath route, ifindex remained
uninitialized. Grr. Thanks to Kunihiro Ishiguro <kunihiro@zebra.org>.
* Rafal Maszkowski <rzm@icm.edu.pl>, batch mode tc. The most old patch.
* Updates list of data protocol ids.
Lots of reporters. I bring my apologies.
* Jan Rekorajski <baggins@sith.mimuw.edu.pl>. Updated list of datalink types.
* Christina Chen <chenchristina@cwc.nus.edu.sg>. Bug in parsing IPv6 address match in u32.
* Pekka Savola <pekkas@netcore.fi>. ip -6 route flush dev lo stuck
on deleting root of the table.
* Werner. dsmark fixes.
* Alexander Demenshin <aldem-reply@aldem.net>. Old miraculous bug
in ip monitor. It was a puzzle: people kept complaining that
it printed some crap.
* Rui Prior <rprior@inescporto.pt>. f_route failed to resolve fromif.
Werner also noticed this and sent patch. Bad place... [RETHINK]
* Kim Woelders <kim@woelders.dk>.
- changes in Makefile for cross-compile
- understand "all" as alias for "any"
- bug in iprule.c
! [ NB. Also he sent patch for kernel. Do not forget! ]
* Werner. Fix to tc core files: wrong exits etc.
* Bernd Jendrissek <berndj@prism.co.za>. Some sanitizations of tc.c
!* Marian Jancar <marian.jancar@infonet.cz>. He says q_tbf prints the wrong latency!
! Seems he is wrong.
* Werner (and Nikolai Vladychevski <niko@isl.net.mx>) check ->print_copts
to avoid segfault.
[001007]
* Compiles under rh-7.0
[000928]
* Sorry. I have lost all the CVS with changes made since 000305.
If someone sent me a patch after this date, please, resubmit.
Restored from the last backup and mailboxes:
* Edit ip-cref.tex by raf <raf2@zip.com.au>.
* RTAX_REORDERING support.
* IFLA_MASTER support.
* Bug in rtnl_talk(), libnetlink.c. Reported by David P. Olshfski
<olshef@us.ibm.com>
[000305]
* Bugs in RESOLVE_HOSTNAMES. Bratislav Ilich <bilik@@zepter.ru>
* ARPHRD_IEEE802_TR
[000225]
* ECN in q_red.c.
[000221]
* diffserv update from Jamal Hadi Salim
* Some bits of IPX from Steve Whitehouse.
* ATM qdisc from Werner Almesberger
* Support for new attributes on routes in linux-2.3.
[991023]
No news, only several bugs are fixed.
* Since ss990630 "ip rule list" printed wrong prefix length.
Vladimir V. Ivanov <vlad@alis.tusur.ru>
* "ip rule" parsed >INT_MAX values of metric incorrectly.
Matthew G. Marsh <mgm@paktronix.com>
* Some improvements in doc/Makefile advised by
Andi Kleen and Werner Almesberger.
[990824]
* new attributes in "ip route": rtt, rttvar, cwnd, ssthresh and advmss.
* some updates in documentation to reflect new status.
[990630]
* DiffServ support.
Werner Almesberger <almesber@lrc.di.epfl.ch>
Jamal Hadi Salim <hadi@nortelnetworks.com>
* DECnet support.
Steve Whitehouse <SteveW@ACM.org>
* Some minor tweaks in docs and code.
[990530]
* routel script. Stephen R. van den Berg <srb@cuci.nl>
* Bug in tc/q_prio.c resetting priomap. Reported by
Ole Husgaard <sparre@login.dknet.dk> and
Jan Kasprzak <kas@informatics.muni.cz>
* IP command reference manual is published (ip-cref.tex).
I am sorry, but tc-cref.tex is still not ready, to be more
exact the draft does not describe current tc 8-)
* ip, rtmon, rtacct utilities are updated according to manual 8-)
Lots of changes:
- (MAIN) "flush" command for addr, neigh and route.
- error messages are sanitized; now it does not print
usage() page on each error.
- output format is improved.
- "oneline" mode is added.
- etc.
* Name databases; resolution ASCII <-> numeric is split out to lib/*
* scripts ifcfg, ifone and rtpr.
* examples/dhcp-client-script is copied from my patch to ISC dhcp.
* Makefile in doc/ directory.
[990417]
* "pmtudisc" flag to "ip tunnel". Phil Karn <karn@ka9q.ampr.org>
* bug in tc/q_tbf.c preventing setting peak_rate, Martin Mares <mj@ucw.cz>
* doc/flowlabels.tex
[990329]
* This snapshot fixes some compatibility problems, which I introduced
accidentally into previous snapshots.
* Namely, "allot" to "tc qdisc add ... cbq" is accepted but ignored.
* Other changes were supposed to appear in the next snapshot, but
because of trouble with "allot" I am forced to release a premature
version. Namely, "cell", "prio", "weight" etc. are optional now.
* doc/ip-tunnels.tex
[990327]
* History was not recorded.
[981002]
* Rani Assaf <rani@magic.metawire.com> contributed resolving
addresses to names.
BEWARE! DO NOT USE THIS OPTION, WHEN REPORTING BUGS IN
IPROUTE OR IN KERNEL. ALL THE BUG REPORTS MUST CONTAIN
ONLY NUMERIC ADDRESSES.
[981101]
* now it should compile for any libc.

1006
bash-completion/devlink Normal file

File diff suppressed because it is too large

809
bash-completion/tc Normal file

@ -0,0 +1,809 @@
# tc(8) completion -*- shell-script -*-
# Copyright 2016 6WIND S.A.
# Copyright 2016 Quentin Monnet <quentin.monnet@6wind.com>
QDISC_KIND=' choke codel bfifo pfifo pfifo_head_drop fq fq_codel gred hhf \
mqprio multiq netem pfifo_fast pie fq_pie red rr sfb sfq tbf atm \
cbq drr dsmark hfsc htb prio qfq '
FILTER_KIND=' basic bpf cgroup flow flower fw route rsvp tcindex u32 matchall '
ACTION_KIND=' gact mirred bpf sample '
# Takes a list of words in argument; each one of them is added to COMPREPLY if
# it is not already present on the command line. Returns no value.
_tc_once_attr()
{
local w subcword found
for w in $*; do
found=0
for (( subcword=3; subcword < ${#words[@]}-1; subcword++ )); do
if [[ $w == ${words[subcword]} ]]; then
found=1
break
fi
done
[[ $found -eq 0 ]] && \
COMPREPLY+=( $( compgen -W "$w" -- "$cur" ) )
done
}
# Takes a list of words in argument; each one of them is added to COMPREPLY if
# it is not already present on the command line from the provided index. Returns
# no value.
_tc_once_attr_from()
{
local w subcword found from=$1
shift
for w in $*; do
found=0
for (( subcword=$from; subcword < ${#words[@]}-1; subcword++ )); do
if [[ $w == ${words[subcword]} ]]; then
found=1
break
fi
done
[[ $found -eq 0 ]] && \
COMPREPLY+=( $( compgen -W "$w" -- "$cur" ) )
done
}
# Takes a list of words in argument; adds them all to COMPREPLY if none of them
# is already present on the command line. Returns no value.
_tc_one_of_list()
{
local w subcword
for w in $*; do
for (( subcword=3; subcword < ${#words[@]}-1; subcword++ )); do
[[ $w == ${words[subcword]} ]] && return 1
done
done
COMPREPLY+=( $( compgen -W "$*" -- "$cur" ) )
}
# Takes a list of words in argument; adds them all to COMPREPLY if none of them
# is already present on the command line from the provided index. Returns no
# value.
_tc_one_of_list_from()
{
local w subcword from=$1
shift
for w in $*; do
for (( subcword=$from; subcword < ${#words[@]}-1; subcword++ )); do
[[ $w == ${words[subcword]} ]] && return 1
done
done
COMPREPLY+=( $( compgen -W "$*" -- "$cur" ) )
}
# Returns "$cur ${cur}arg1 ${cur}arg2 ..."
_tc_expand_units()
{
[[ $cur =~ ^[0-9]+ ]] || return 1
local value=${cur%%[^0-9]*}
[[ $cur == $value ]] && echo $cur
echo ${@/#/$value}
}
# Complete based on given word, usually $prev (or possibly the word before),
# for when an argument or an option name has but a few possible arguments (so
# tc does not take particular commands into account here).
# Returns 0 if completion should stop after running this function, 1 otherwise.
_tc_direct_complete()
{
case $1 in
# Command options
dev)
_available_interfaces
return 0
;;
classid)
return 0
;;
estimator)
local list=$( _tc_expand_units 'secs' 'msecs' 'usecs' )
COMPREPLY+=( $( compgen -W "$list" -- "$cur" ) )
return 0
;;
handle)
return 0
;;
parent|flowid)
local i iface ids cmd
for (( i=3; i < ${#words[@]}-2; i++ )); do
if [[ ${words[i]} == dev ]]; then
iface=${words[i+1]}
break
fi
done
for cmd in qdisc class; do
if [[ -n $iface ]]; then
ids+=$( tc $cmd show dev $iface 2>/dev/null | \
cut -d ' ' -f 3 )" "
else
ids+=$( tc $cmd show 2>/dev/null | cut -d ' ' -f 3 )" "
fi
done
[[ $ids != " " ]] && \
COMPREPLY+=( $( compgen -W "$ids" -- "$cur" ) )
return 0
;;
protocol) # list comes from lib/ll_proto.c
COMPREPLY+=( $( compgen -W ' 802.1Q 802.1ad 802_2 802_3 LLDP aarp \
all aoe arp atalk atmfate atmmpoa ax25 bpq can control cust \
ddcmp dec diag dna_dl dna_rc dna_rt econet ieeepup ieeepupat \
ip ipv4 ipv6 ipx irda lat localtalk loop mobitex ppp_disc \
ppp_mp ppp_ses ppptalk pup pupat rarp sca snap tipc tr_802_2 \
wan_ppp x25' -- "$cur" ) )
return 0
;;
prio)
return 0
;;
stab)
COMPREPLY+=( $( compgen -W 'mtu tsize mpu overhead
linklayer' -- "$cur" ) )
;;
# Qdiscs and classes options
alpha|bands|beta|buckets|corrupt|debug|decrement|default|\
default_index|depth|direct_qlen|divisor|duplicate|ewma|flow_limit|\
flows|hh_limit|increment|indices|linklayer|non_hh_weight|num_tc|\
penalty_burst|penalty_rate|prio|priomap|probability|queues|r2q|\
reorder|vq|vqs)
return 0
;;
setup)
COMPREPLY+=( $( compgen -W 'vqs' -- "$cur" ) )
return 0
;;
hw)
COMPREPLY+=( $( compgen -W '1 0' -- "$cur" ) )
return 0
;;
distribution)
COMPREPLY+=( $( compgen -W 'uniform normal pareto
paretonormal' -- "$cur" ) )
return 0
;;
loss)
COMPREPLY+=( $( compgen -W 'random state gmodel' -- "$cur" ) )
return 0
;;
# Qdiscs and classes options options
gap|gmodel|state)
return 0
;;
# Filters options
map)
COMPREPLY+=( $( compgen -W 'key' -- "$cur" ) )
return 0
;;
hash)
COMPREPLY+=( $( compgen -W 'keys' -- "$cur" ) )
return 0
;;
indev)
_available_interfaces
return 0
;;
eth_type)
COMPREPLY+=( $( compgen -W 'ipv4 ipv6' -- "$cur" ) )
return 0
;;
ip_proto)
COMPREPLY+=( $( compgen -W 'tcp udp' -- "$cur" ) )
return 0
;;
# Filters options options
key|keys)
[[ ${words[@]} =~ graft ]] && return 1
COMPREPLY+=( $( compgen -W 'src dst proto proto-src proto-dst iif \
priority mark nfct nfct-src nfct-dst nfct-proto-src \
nfct-proto-dst rt-classid sk-uid sk-gid vlan-tag rxhash' -- \
"$cur" ) )
return 0
;;
# BPF options - used for filters, actions, and exec
export|bytecode|bytecode-file|object-file)
_filedir
return 0
;;
object-pinned|graft) # Pinned object is probably under /sys/fs/bpf/
[[ -n "$cur" ]] && _filedir && return 0
COMPREPLY=( $( compgen -G "/sys/fs/bpf/*" -- "$cur" ) ) || _filedir
compopt -o nospace
return 0
;;
section)
if (type objdump > /dev/null 2>&1) ; then
local fword objfile section_list
for (( fword=3; fword < ${#words[@]}-3; fword++ )); do
if [[ ${words[fword]} == object-file ]]; then
objfile=${words[fword+1]}
break
fi
done
section_list=$( objdump -h $objfile 2>/dev/null | \
sed -n 's/^ *[0-9]\+ \([^ ]*\) *.*/\1/p' )
COMPREPLY+=( $( compgen -W "$section_list" -- "$cur" ) )
fi
return 0
;;
import|run)
_filedir
return 0
;;
type)
COMPREPLY+=( $( compgen -W 'cls act' -- "$cur" ) )
return 0
;;
# Actions options
random)
_tc_one_of_list 'netrand determ'
return 0
;;
# Units for option arguments
bandwidth|maxrate|peakrate|rate)
local list=$( _tc_expand_units 'bit' \
'kbit' 'kibit' 'kbps' 'kibps' \
'mbit' 'mibit' 'mbps' 'mibps' \
'gbit' 'gibit' 'gbps' 'gibps' \
'tbit' 'tibit' 'tbps' 'tibps' )
COMPREPLY+=( $( compgen -W "$list" -- "$cur" ) )
;;
admit_bytes|avpkt|burst|cell|initial_quantum|limit|max|min|mtu|mpu|\
overhead|quantum|redflowlist)
local list=$( _tc_expand_units \
'b' 'kbit' 'k' 'mbit' 'm' 'gbit' 'g' )
COMPREPLY+=( $( compgen -W "$list" -- "$cur" ) )
;;
db|delay|evict_timeout|interval|latency|perturb|rehash|reset_timeout|\
target|tupdate)
local list=$( _tc_expand_units 'secs' 'msecs' 'usecs' )
COMPREPLY+=( $( compgen -W "$list" -- "$cur" ) )
;;
esac
return 1
}
# Complete with options names for qdiscs. Each qdisc has its own set of options
# and it seems we cannot really parse it from anywhere, so we add it manually
# in this function.
# Returns 0 if completion should stop after running this function, 1 otherwise.
_tc_qdisc_options()
{
case $1 in
choke)
_tc_once_attr 'limit bandwidth ecn min max burst'
return 0
;;
codel)
_tc_once_attr 'limit target interval'
_tc_one_of_list 'ecn noecn'
return 0
;;
bfifo|pfifo|pfifo_head_drop)
_tc_once_attr 'limit'
return 0
;;
fq)
_tc_once_attr 'limit flow_limit quantum initial_quantum maxrate \
buckets'
_tc_one_of_list 'pacing nopacing'
return 0
;;
fq_codel)
_tc_once_attr 'limit flows target interval quantum'
_tc_one_of_list 'ecn noecn'
return 0
;;
gred)
_tc_once_attr 'setup vqs default grio vq prio limit min max avpkt \
burst probability bandwidth ecn harddrop'
return 0
;;
hhf)
_tc_once_attr 'limit quantum hh_limit reset_timeout admit_bytes \
evict_timeout non_hh_weight'
return 0
;;
mqprio)
_tc_once_attr 'num_tc map queues hw'
return 0
;;
netem)
_tc_once_attr 'delay distribution corrupt duplicate loss ecn \
reorder rate'
return 0
;;
pie)
_tc_once_attr 'limit target tupdate alpha beta'
_tc_one_of_list 'bytemode nobytemode'
_tc_one_of_list 'ecn noecn'
_tc_one_of_list 'dq_rate_estimator no_dq_rate_estimator'
return 0
;;
fq_pie)
_tc_once_attr 'limit flows target tupdate \
alpha beta quantum memory_limit ecn_prob'
_tc_one_of_list 'ecn noecn'
_tc_one_of_list 'bytemode nobytemode'
_tc_one_of_list 'dq_rate_estimator no_dq_rate_estimator'
return 0
;;
red)
_tc_once_attr 'limit min max avpkt burst adaptive probability \
bandwidth ecn harddrop'
return 0
;;
rr|prio)
_tc_once_attr 'bands priomap multiqueue'
return 0
;;
sfb)
_tc_once_attr 'rehash db limit max target increment decrement \
penalty_rate penalty_burst'
return 0
;;
sfq)
_tc_once_attr 'limit perturb quantum divisor flows depth headdrop \
redflowlimit min max avpkt burst probability ecn harddrop'
return 0
;;
tbf)
_tc_once_attr 'limit burst rate mtu peakrate latency overhead \
linklayer'
return 0
;;
cbq)
_tc_once_attr 'bandwidth avpkt mpu cell ewma'
return 0
;;
dsmark)
_tc_once_attr 'indices default_index set_tc_index'
return 0
;;
hfsc)
_tc_once_attr 'default'
return 0
;;
htb)
_tc_once_attr 'default r2q direct_qlen debug'
return 0
;;
multiq|pfifo_fast|atm|drr|qfq)
return 0
;;
esac
return 1
}
# Complete with options names for BPF filters or actions.
# Returns 0 if completion should stop after running this function, 1 otherwise.
_tc_bpf_options()
{
[[ ${words[${#words[@]}-3]} == object-file ]] && \
_tc_once_attr 'section export'
[[ ${words[${#words[@]}-5]} == object-file ]] && \
[[ ${words[${#words[@]}-3]} =~ (section|export) ]] && \
_tc_once_attr 'section export'
_tc_one_of_list 'bytecode bytecode-file object-file object-pinned'
_tc_once_attr 'verbose index direct-action action classid'
return 0
}
# Complete with options names for filter actions.
# This function is recursive, thus allowing multiple action statements to be
# parsed.
# Returns 0 if completion should stop after running this function, 1 otherwise.
_tc_filter_action_options()
{
for ((acwd=$1; acwd < ${#words[@]}-1; acwd++));
do
if [[ action == ${words[acwd]} ]]; then
_tc_filter_action_options $((acwd+1)) && return 0
fi
done
local action acwd
for ((acwd=$1; acwd < ${#words[@]}-1; acwd++)); do
if [[ $ACTION_KIND =~ ' '${words[acwd]}' ' ]]; then
_tc_one_of_list_from $acwd action
_tc_action_options $acwd && return 0
fi
done
_tc_one_of_list_from $acwd $ACTION_KIND
return 0
}
# Complete with options names for filters.
# Returns 0 if completion should stop after running this function, 1 otherwise.
_tc_filter_options()
{
for ((acwd=$1; acwd < ${#words[@]}-1; acwd++));
do
if [[ action == ${words[acwd]} ]]; then
_tc_filter_action_options $((acwd+1)) && return 0
fi
done
filter=${words[$1]}
case $filter in
basic)
_tc_once_attr 'match action classid'
return 0
;;
bpf)
_tc_bpf_options
return 0
;;
cgroup)
_tc_once_attr 'match action'
return 0
;;
flow)
local i
for (( i=5; i < ${#words[@]}-1; i++ )); do
if [[ ${words[i]} =~ ^keys?$ ]]; then
_tc_direct_complete 'key'
COMPREPLY+=( $( compgen -W 'or and xor rshift addend' -- \
"$cur" ) )
break
fi
done
_tc_once_attr 'map hash divisor baseclass match action'
return 0
;;
matchall)
_tc_once_attr 'action classid skip_sw skip_hw'
return 0
;;
flower)
_tc_once_attr 'action classid indev dst_mac src_mac eth_type \
ip_proto dst_ip src_ip dst_port src_port'
return 0
;;
fw)
_tc_once_attr 'action classid'
return 0
;;
route)
_tc_one_of_list 'from fromif'
_tc_once_attr 'to classid action'
return 0
;;
rsvp)
_tc_once_attr 'ipproto session sender classid action tunnelid \
tunnel flowlabel spi/ah spi/esp u8 u16 u32'
[[ ${words[${#words[@]}-3]} == tunnel ]] && \
COMPREPLY+=( $( compgen -W 'skip' -- "$cur" ) )
[[ ${words[${#words[@]}-3]} =~ u(8|16|32) ]] && \
COMPREPLY+=( $( compgen -W 'mask' -- "$cur" ) )
[[ ${words[${#words[@]}-3]} == mask ]] && \
COMPREPLY+=( $( compgen -W 'at' -- "$cur" ) )
return 0
;;
tcindex)
_tc_once_attr 'hash mask shift classid action'
_tc_one_of_list 'pass_on fall_through'
return 0
;;
u32)
_tc_once_attr 'match link classid action offset ht hashkey sample'
COMPREPLY+=( $( compgen -W 'ip ip6 udp tcp icmp u8 u16 u32 mark \
divisor' -- "$cur" ) )
return 0
;;
esac
return 1
}
# Complete with options names for actions.
# Returns 0 if completion should stop after running this function, 1 otherwise.
_tc_action_options()
{
local from=$1
local action=${words[from]}
case $action in
bpf)
_tc_bpf_options
return 0
;;
mirred)
_tc_one_of_list_from $from 'ingress egress'
_tc_one_of_list_from $from 'mirror redirect'
_tc_once_attr_from $from 'index dev'
return 0
;;
sample)
_tc_once_attr_from $from 'rate'
_tc_once_attr_from $from 'trunc'
_tc_once_attr_from $from 'group'
return 0
;;
gact)
_tc_one_of_list_from $from 'reclassify drop continue pass'
_tc_once_attr_from $from 'random'
return 0
;;
esac
return 1
}
# Complete with options names for exec.
# Returns 0 if completion should stop after running this function, 1 otherwise.
_tc_exec_options()
{
case $1 in
import)
[[ ${words[${#words[@]}-3]} == import ]] && \
_tc_once_attr 'run'
return 0
;;
graft)
COMPREPLY+=( $( compgen -W 'key type' -- "$cur" ) )
[[ ${words[${#words[@]}-3]} == object-file ]] && \
_tc_once_attr 'type'
_tc_bpf_options
return 0
;;
esac
return 1
}
# Main completion function
# Logic is as follows:
# 1. Check if previous word is a global option; if so, propose arguments.
# 2. Check if current word is a global option; if so, propose completion.
# 3. Check for the presence of a main command (qdisc|class|filter|...). If
# there is one, first call _tc_direct_complete to see if previous word is
# waiting for a particular completion. If so, propose completion and exit.
# 4. Extract main command and -- if available -- its subcommand
# (add|delete|show|...).
# 5. Propose completion based on main and sub- command in use. Additional
# functions may be called for qdiscs, classes or filter options.
_tc()
{
local cur prev words cword
_init_completion || return
case $prev in
-V|-Version)
return 0
;;
-b|-batch|-cf|-conf)
_filedir
return 0
;;
-force)
COMPREPLY=( $( compgen -W '-batch' -- "$cur" ) )
return 0
;;
-nm|name)
[[ -r /etc/iproute2/tc_cls ]] || \
COMPREPLY=( $( compgen -W '-conf' -- "$cur" ) )
return 0
;;
-n|-net|-netns)
local nslist=$( ip netns list 2>/dev/null )
COMPREPLY+=( $( compgen -W "$nslist" -- "$cur" ) )
return 0
;;
-tshort)
_tc_once_attr '-statistics'
COMPREPLY+=( $( compgen -W 'monitor' -- "$cur" ) )
return 0
;;
-timestamp)
_tc_once_attr '-statistics -tshort'
COMPREPLY+=( $( compgen -W 'monitor' -- "$cur" ) )
return 0
;;
esac
# Search for main commands
local subcword cmd subcmd
for (( subcword=1; subcword < ${#words[@]}-1; subcword++ )); do
[[ ${words[subcword]} == -b?(atch) ]] && return 0
[[ -n $cmd ]] && subcmd=${words[subcword]} && break
[[ ${words[subcword]} != -* && \
${words[subcword-1]} != -@(n?(et?(ns))|c?(on)f) ]] && \
cmd=${words[subcword]}
done
if [[ -z $cmd ]]; then
case $cur in
-*)
local c='-Version -statistics -details -raw -pretty \
-iec -graph -batch -name -netns -timestamp'
[[ $cword -eq 1 ]] && c+=' -force'
COMPREPLY=( $( compgen -W "$c" -- "$cur" ) )
return 0
;;
*)
COMPREPLY=( $( compgen -W "help $( tc help 2>&1 | \
command sed \
-e '/OBJECT := /!d' \
-e 's/.*{//' \
-e 's/}.*//' \
-e 's/|//g' )" -- "$cur" ) )
return 0
;;
esac
fi
[[ $subcmd == help ]] && return 0
# For this set of commands we may create COMPREPLY just by analysing the
# previous word, if it expects a specific list of options or values.
if [[ $cmd =~ (qdisc|class|filter|action|exec) ]]; then
_tc_direct_complete $prev && return 0
if [[ ${words[${#words[@]}-3]} == estimator ]]; then
local list=$( _tc_expand_units 'secs' 'msecs' 'usecs' )
COMPREPLY+=( $( compgen -W "$list" -- "$cur" ) ) && return 0
fi
fi
# Completion depends on main command and subcommand in use.
case $cmd in
qdisc)
case $subcmd in
add|change|replace|link|del|delete)
if [[ $(($cword-$subcword)) -eq 1 ]]; then
COMPREPLY=( $( compgen -W 'dev' -- "$cur" ) )
return 0
fi
local qdisc qdwd
for ((qdwd=$subcword; qdwd < ${#words[@]}-1; qdwd++)); do
if [[ $QDISC_KIND =~ ' '${words[qdwd]}' ' ]]; then
qdisc=${words[qdwd]}
_tc_qdisc_options $qdisc && return 0
fi
done
_tc_one_of_list $QDISC_KIND
_tc_one_of_list 'root ingress parent clsact'
_tc_once_attr 'handle estimator stab'
;;
show)
_tc_once_attr 'dev'
_tc_one_of_list 'ingress clsact'
_tc_once_attr '-statistics -details -raw -pretty -iec \
-graph -name'
;;
help)
return 0
;;
*)
[[ $cword -eq $subcword ]] && \
COMPREPLY=( $( compgen -W 'help add delete change \
replace link show' -- "$cur" ) )
;;
esac
;;
class)
case $subcmd in
add|change|replace|del|delete)
if [[ $(($cword-$subcword)) -eq 1 ]]; then
COMPREPLY=( $( compgen -W 'dev' -- "$cur" ) )
return 0
fi
local qdisc qdwd
for ((qdwd=$subcword; qdwd < ${#words[@]}-1; qdwd++)); do
if [[ $QDISC_KIND =~ ' '${words[qdwd]}' ' ]]; then
qdisc=${words[qdwd]}
_tc_qdisc_options $qdisc && return 0
fi
done
_tc_one_of_list $QDISC_KIND
_tc_one_of_list 'root parent'
_tc_once_attr 'classid'
;;
show)
_tc_once_attr 'dev'
_tc_one_of_list 'root parent'
_tc_once_attr '-statistics -details -raw -pretty -iec \
-graph -name'
;;
help)
return 0
;;
*)
[[ $cword -eq $subcword ]] && \
COMPREPLY=( $( compgen -W 'help add delete change \
replace show' -- "$cur" ) )
;;
esac
;;
filter)
case $subcmd in
add|change|replace|del|delete)
if [[ $(($cword-$subcword)) -eq 1 ]]; then
COMPREPLY=( $( compgen -W 'dev' -- "$cur" ) )
return 0
fi
local filter fltwd
for ((fltwd=$subcword; fltwd < ${#words[@]}-1; fltwd++));
do
if [[ $FILTER_KIND =~ ' '${words[fltwd]}' ' ]]; then
_tc_filter_options $fltwd && return 0
fi
done
_tc_one_of_list $FILTER_KIND
_tc_one_of_list 'root ingress egress parent'
_tc_once_attr 'handle estimator pref protocol'
;;
show)
_tc_once_attr 'dev'
_tc_one_of_list 'root ingress egress parent'
_tc_once_attr '-statistics -details -raw -pretty -iec \
-graph -name'
;;
help)
return 0
;;
*)
[[ $cword -eq $subcword ]] && \
COMPREPLY=( $( compgen -W 'help add delete change \
replace show' -- "$cur" ) )
;;
esac
;;
action)
case $subcmd in
add|change|replace)
local action acwd
for ((acwd=$subcword; acwd < ${#words[@]}-1; acwd++)); do
if [[ $ACTION_KIND =~ ' '${words[acwd]}' ' ]]; then
_tc_action_options $acwd && return 0
fi
done
_tc_one_of_list $ACTION_KIND
;;
get|del|delete)
_tc_once_attr 'index'
;;
lst|list|flush|show)
_tc_one_of_list $ACTION_KIND
;;
*)
[[ $cword -eq $subcword ]] && \
COMPREPLY=( $( compgen -W 'help add delete change \
replace show list flush action' -- "$cur" ) )
;;
esac
;;
monitor)
COMPREPLY=( $( compgen -W 'help' -- "$cur" ) )
;;
exec)
case $subcmd in
bpf)
local excmd exwd EXEC_KIND=' import debug graft '
for ((exwd=$subcword; exwd < ${#words[@]}-1; exwd++)); do
if [[ $EXEC_KIND =~ ' '${words[exwd]}' ' ]]; then
excmd=${words[exwd]}
_tc_exec_options $excmd && return 0
fi
done
_tc_one_of_list $EXEC_KIND
;;
*)
[[ $cword -eq $subcword ]] && \
COMPREPLY=( $( compgen -W 'bpf' -- "$cur" ) )
;;
esac
;;
esac
} &&
complete -F _tc tc
# ex: ts=4 sw=4 et filetype=sh

1
bridge/.gitignore vendored Normal file

@ -0,0 +1 @@
bridge

15
bridge/Makefile Normal file

@ -0,0 +1,15 @@
# SPDX-License-Identifier: GPL-2.0
BROBJ = bridge.o fdb.o monitor.o link.o mdb.o vlan.o
include ../config.mk
all: bridge
bridge: $(BROBJ) $(LIBNETLINK)
	$(QUIET_LINK)$(CC) $^ $(LDFLAGS) $(LDLIBS) -o $@
install: all
	install -m 0755 bridge $(DESTDIR)$(SBINDIR)
clean:
	rm -f $(BROBJ) bridge

31
bridge/br_common.h Normal file

@ -0,0 +1,31 @@
/* SPDX-License-Identifier: GPL-2.0 */
#define MDB_RTA(r) \
((struct rtattr *)(((char *)(r)) + RTA_ALIGN(sizeof(struct br_mdb_entry))))
#define MDB_RTR_RTA(r) \
((struct rtattr *)(((char *)(r)) + RTA_ALIGN(sizeof(__u32))))
void print_vlan_info(struct rtattr *tb, int ifindex);
int print_linkinfo(struct nlmsghdr *n, void *arg);
int print_mdb_mon(struct nlmsghdr *n, void *arg);
int print_fdb(struct nlmsghdr *n, void *arg);
void print_stp_state(__u8 state);
int parse_stp_state(const char *arg);
int print_vlan_rtm(struct nlmsghdr *n, void *arg, bool monitor,
bool global_only);
void br_print_router_port_stats(struct rtattr *pattr);
int do_fdb(int argc, char **argv);
int do_mdb(int argc, char **argv);
int do_monitor(int argc, char **argv);
int do_vlan(int argc, char **argv);
int do_link(int argc, char **argv);
extern int preferred_family;
extern int show_stats;
extern int show_details;
extern int timestamp;
extern int compress_vlans;
extern int json;
extern struct rtnl_handle rth;

193
bridge/bridge.c Normal file

@ -0,0 +1,193 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Get/set/delete bridge with netlink
*
* Authors: Stephen Hemminger <shemminger@vyatta.com>
*/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>
#include <string.h>
#include <errno.h>
#include "version.h"
#include "utils.h"
#include "br_common.h"
#include "namespace.h"
#include "color.h"
struct rtnl_handle rth = { .fd = -1 };
int preferred_family = AF_UNSPEC;
int oneline;
int show_stats;
int show_details;
static int color;
int compress_vlans;
int json;
int timestamp;
static const char *batch_file;
int force;
static void usage(void) __attribute__((noreturn));
static void usage(void)
{
fprintf(stderr,
"Usage: bridge [ OPTIONS ] OBJECT { COMMAND | help }\n"
" bridge [ -force ] -batch filename\n"
"where OBJECT := { link | fdb | mdb | vlan | monitor }\n"
" OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] |\n"
" -o[neline] | -t[imestamp] | -n[etns] name |\n"
" -c[ompressvlans] -color -p[retty] -j[son] }\n");
exit(-1);
}
static int do_help(int argc, char **argv)
{
usage();
}
static const struct cmd {
const char *cmd;
int (*func)(int argc, char **argv);
} cmds[] = {
{ "link", do_link },
{ "fdb", do_fdb },
{ "mdb", do_mdb },
{ "vlan", do_vlan },
{ "monitor", do_monitor },
{ "help", do_help },
{ 0 }
};
static int do_cmd(const char *argv0, int argc, char **argv)
{
const struct cmd *c;
for (c = cmds; c->cmd; ++c) {
if (matches(argv0, c->cmd) == 0)
return c->func(argc-1, argv+1);
}
fprintf(stderr,
"Object \"%s\" is unknown, try \"bridge help\".\n", argv0);
return -1;
}
static int br_batch_cmd(int argc, char *argv[], void *data)
{
return do_cmd(argv[0], argc, argv);
}
static int batch(const char *name)
{
int ret;
if (rtnl_open(&rth, 0) < 0) {
fprintf(stderr, "Cannot open rtnetlink\n");
return EXIT_FAILURE;
}
rtnl_set_strict_dump(&rth);
ret = do_batch(name, force, br_batch_cmd, NULL);
rtnl_close(&rth);
return ret;
}
int
main(int argc, char **argv)
{
while (argc > 1) {
const char *opt = argv[1];
if (strcmp(opt, "--") == 0) {
argc--; argv++;
break;
}
if (opt[0] != '-')
break;
if (opt[1] == '-')
opt++;
if (matches(opt, "-help") == 0) {
usage();
} else if (matches(opt, "-Version") == 0) {
printf("bridge utility, %s\n", version);
exit(0);
} else if (matches(opt, "-stats") == 0 ||
matches(opt, "-statistics") == 0) {
++show_stats;
} else if (matches(opt, "-details") == 0) {
++show_details;
} else if (matches(opt, "-oneline") == 0) {
++oneline;
} else if (matches(opt, "-timestamp") == 0) {
++timestamp;
} else if (matches(opt, "-family") == 0) {
argc--;
argv++;
if (argc <= 1)
usage();
if (strcmp(argv[1], "inet") == 0)
preferred_family = AF_INET;
else if (strcmp(argv[1], "inet6") == 0)
preferred_family = AF_INET6;
else if (strcmp(argv[1], "help") == 0)
usage();
else
invarg("invalid protocol family", argv[1]);
} else if (strcmp(opt, "-4") == 0) {
preferred_family = AF_INET;
} else if (strcmp(opt, "-6") == 0) {
preferred_family = AF_INET6;
} else if (matches(opt, "-netns") == 0) {
NEXT_ARG();
if (netns_switch(argv[1]))
exit(-1);
} else if (matches_color(opt, &color)) {
} else if (matches(opt, "-compressvlans") == 0) {
++compress_vlans;
} else if (matches(opt, "-force") == 0) {
++force;
} else if (matches(opt, "-json") == 0) {
++json;
} else if (matches(opt, "-pretty") == 0) {
++pretty;
} else if (matches(opt, "-batch") == 0) {
argc--;
argv++;
if (argc <= 1)
usage();
batch_file = argv[1];
} else {
fprintf(stderr,
"Option \"%s\" is unknown, try \"bridge help\".\n",
opt);
exit(-1);
}
argc--; argv++;
}
_SL_ = oneline ? "\\" : "\n";
check_enable_color(color, json);
if (batch_file)
return batch(batch_file);
if (rtnl_open(&rth, 0) < 0)
exit(1);
rtnl_set_strict_dump(&rth);
if (argc > 1)
return do_cmd(argv[1], argc-1, argv+1);
rtnl_close(&rth);
usage();
}
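The option loop in main() accepts abbreviations. Each `matches(opt, "-name")` call succeeds when `opt` is a prefix of the full spelling, and the first test that succeeds wins. The sketch below is a hypothetical re-implementation of that behaviour. The names `prefix_matches`, `bridge_opts` and `resolve_bridge_opt` are illustrative and are not part of iproute2. It covers a subset of the options, in the same order as main(). It leaves out the strcmp()-matched `-4`/`-6` and the colour option, so some short prefixes, such as `-c`, resolve differently here than in the real tool.

```c
#include <stddef.h>

/* Same contract as the matches() calls above: returns 0 when 'prefix'
 * is a non-empty prefix of 'word', nonzero otherwise.
 * Illustrative re-implementation, not iproute2's source. */
static int prefix_matches(const char *prefix, const char *word)
{
	if (*prefix == '\0')
		return 1;
	while (*word != '\0' && *prefix == *word) {
		prefix++;
		word++;
	}
	return *prefix != '\0';
}

/* Subset of main()'s options, listed in the order main() tests them.
 * -4, -6 (strcmp) and the colour option are deliberately omitted. */
static const char *const bridge_opts[] = {
	"-help", "-Version", "-stats", "-statistics", "-details",
	"-oneline", "-timestamp", "-family", "-netns", "-compressvlans",
	"-force", "-json", "-pretty", "-batch",
};

/* Hypothetical resolver: returns the first entry that 'opt' abbreviates,
 * or NULL. Like main(), it folds "--name" into "-name". */
static const char *resolve_bridge_opt(const char *opt)
{
	size_t i;

	if (opt[0] != '-')
		return NULL;
	if (opt[1] == '-')
		opt++;
	for (i = 0; i < sizeof(bridge_opts) / sizeof(bridge_opts[0]); i++)
		if (prefix_matches(opt, bridge_opts[i]) == 0)
			return bridge_opts[i];
	return NULL;
}
```

Because `-stats` is tested before `-statistics`, both `-s` and `-stat` resolve to `-stats`. Only `-stati` or a longer prefix reaches the second entry, and main() treats the two options identically anyway.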

bridge/fdb.c Normal file

@@ -0,0 +1,695 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Get/set/delete fdb table with netlink
*
* TODO: merge/replace this with ip neighbour
*
* Authors: Stephen Hemminger <shemminger@vyatta.com>
*/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <netdb.h>
#include <time.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <net/if.h>
#include <netinet/in.h>
#include <linux/if_bridge.h>
#include <linux/if_ether.h>
#include <linux/neighbour.h>
#include <string.h>
#include <limits.h>
#include <stdbool.h>
#include "json_print.h"
#include "libnetlink.h"
#include "br_common.h"
#include "rt_names.h"
#include "utils.h"
static unsigned int filter_index, filter_dynamic, filter_master,
filter_state, filter_vlan;
static void usage(void)
{
fprintf(stderr,
"Usage: bridge fdb { add | append | del | replace } ADDR dev DEV\n"
" [ self ] [ master ] [ use ] [ router ] [ extern_learn ]\n"
" [ sticky ] [ local | static | dynamic ] [ vlan VID ]\n"
" { [ dst IPADDR ] [ port PORT] [ vni VNI ] | [ nhid NHID ] }\n"
" [ via DEV ] [ src_vni VNI ]\n"
" bridge fdb [ show [ br BRDEV ] [ brport DEV ] [ vlan VID ]\n"
" [ state STATE ] [ dynamic ] ]\n"
" bridge fdb get [ to ] LLADDR [ br BRDEV ] { brport | dev } DEV\n"
" [ vlan VID ] [ vni VNI ] [ self ] [ master ] [ dynamic ]\n");
exit(-1);
}
static const char *state_n2a(unsigned int s)
{
static char buf[32];
if (s & NUD_PERMANENT)
return "permanent";
if (s & NUD_NOARP)
return "static";
if (s & NUD_STALE)
return "stale";
if (s & NUD_REACHABLE)
return "";
if (is_json_context())
sprintf(buf, "%#x", s);
else
sprintf(buf, "state=%#x", s);
return buf;
}
static int state_a2n(unsigned int *s, const char *arg)
{
if (matches(arg, "permanent") == 0)
*s = NUD_PERMANENT;
else if (matches(arg, "static") == 0 || matches(arg, "temp") == 0)
*s = NUD_NOARP;
else if (matches(arg, "stale") == 0)
*s = NUD_STALE;
else if (matches(arg, "reachable") == 0 || matches(arg, "dynamic") == 0)
*s = NUD_REACHABLE;
else if (strcmp(arg, "all") == 0)
*s = ~0;
else if (get_unsigned(s, arg, 0))
return -1;
return 0;
}
static void fdb_print_flags(FILE *fp, unsigned int flags)
{
open_json_array(PRINT_JSON,
is_json_context() ? "flags" : "");
if (flags & NTF_SELF)
print_string(PRINT_ANY, NULL, "%s ", "self");
if (flags & NTF_ROUTER)
print_string(PRINT_ANY, NULL, "%s ", "router");
if (flags & NTF_EXT_LEARNED)
print_string(PRINT_ANY, NULL, "%s ", "extern_learn");
if (flags & NTF_OFFLOADED)
print_string(PRINT_ANY, NULL, "%s ", "offload");
if (flags & NTF_MASTER)
print_string(PRINT_ANY, NULL, "%s ", "master");
if (flags & NTF_STICKY)
print_string(PRINT_ANY, NULL, "%s ", "sticky");
close_json_array(PRINT_JSON, NULL);
}
static void fdb_print_stats(FILE *fp, const struct nda_cacheinfo *ci)
{
static int hz;
if (!hz)
hz = get_user_hz();
if (is_json_context()) {
print_uint(PRINT_JSON, "used", NULL,
ci->ndm_used / hz);
print_uint(PRINT_JSON, "updated", NULL,
ci->ndm_updated / hz);
} else {
fprintf(fp, "used %d/%d ", ci->ndm_used / hz,
ci->ndm_updated / hz);
}
}
int print_fdb(struct nlmsghdr *n, void *arg)
{
FILE *fp = arg;
struct ndmsg *r = NLMSG_DATA(n);
int len = n->nlmsg_len;
struct rtattr *tb[NDA_MAX+1];
__u16 vid = 0;
if (n->nlmsg_type != RTM_NEWNEIGH && n->nlmsg_type != RTM_DELNEIGH) {
fprintf(stderr, "Not RTM_NEWNEIGH: %08x %08x %08x\n",
n->nlmsg_len, n->nlmsg_type, n->nlmsg_flags);
return 0;
}
len -= NLMSG_LENGTH(sizeof(*r));
if (len < 0) {
fprintf(stderr, "BUG: wrong nlmsg len %d\n", len);
return -1;
}
if (r->ndm_family != AF_BRIDGE)
return 0;
if (filter_index && filter_index != r->ndm_ifindex)
return 0;
if (filter_state && !(r->ndm_state & filter_state))
return 0;
parse_rtattr(tb, NDA_MAX, NDA_RTA(r),
n->nlmsg_len - NLMSG_LENGTH(sizeof(*r)));
if (tb[NDA_VLAN])
vid = rta_getattr_u16(tb[NDA_VLAN]);
if (filter_vlan && filter_vlan != vid)
return 0;
if (filter_dynamic && (r->ndm_state & NUD_PERMANENT))
return 0;
open_json_object(NULL);
if (n->nlmsg_type == RTM_DELNEIGH)
print_bool(PRINT_ANY, "deleted", "Deleted ", true);
if (tb[NDA_LLADDR]) {
const char *lladdr;
SPRINT_BUF(b1);
lladdr = ll_addr_n2a(RTA_DATA(tb[NDA_LLADDR]),
RTA_PAYLOAD(tb[NDA_LLADDR]),
ll_index_to_type(r->ndm_ifindex),
b1, sizeof(b1));
print_color_string(PRINT_ANY, COLOR_MAC,
"mac", "%s ", lladdr);
}
if (!filter_index && r->ndm_ifindex) {
print_string(PRINT_FP, NULL, "dev ", NULL);
print_color_string(PRINT_ANY, COLOR_IFNAME,
"ifname", "%s ",
ll_index_to_name(r->ndm_ifindex));
}
if (tb[NDA_DST]) {
int family = AF_INET;
const char *dst;
if (RTA_PAYLOAD(tb[NDA_DST]) == sizeof(struct in6_addr))
family = AF_INET6;
dst = format_host(family,
RTA_PAYLOAD(tb[NDA_DST]),
RTA_DATA(tb[NDA_DST]));
print_string(PRINT_FP, NULL, "dst ", NULL);
print_color_string(PRINT_ANY,
ifa_family_color(family),
"dst", "%s ", dst);
}
if (vid)
print_uint(PRINT_ANY,
"vlan", "vlan %hu ", vid);
if (tb[NDA_PORT])
print_uint(PRINT_ANY,
"port", "port %u ",
rta_getattr_be16(tb[NDA_PORT]));
if (tb[NDA_VNI])
print_uint(PRINT_ANY,
"vni", "vni %u ",
rta_getattr_u32(tb[NDA_VNI]));
if (tb[NDA_SRC_VNI])
print_uint(PRINT_ANY,
"src_vni", "src_vni %u ",
rta_getattr_u32(tb[NDA_SRC_VNI]));
if (tb[NDA_IFINDEX]) {
unsigned int ifindex = rta_getattr_u32(tb[NDA_IFINDEX]);
if (tb[NDA_LINK_NETNSID])
print_uint(PRINT_ANY,
"viaIfIndex", "via ifindex %u ",
ifindex);
else
print_string(PRINT_ANY,
"viaIf", "via %s ",
ll_index_to_name(ifindex));
}
if (tb[NDA_NH_ID])
print_uint(PRINT_ANY, "nhid", "nhid %u ",
rta_getattr_u32(tb[NDA_NH_ID]));
if (tb[NDA_LINK_NETNSID])
print_uint(PRINT_ANY,
"linkNetNsId", "link-netnsid %d ",
rta_getattr_u32(tb[NDA_LINK_NETNSID]));
if (show_stats && tb[NDA_CACHEINFO])
fdb_print_stats(fp, RTA_DATA(tb[NDA_CACHEINFO]));
fdb_print_flags(fp, r->ndm_flags);
if (tb[NDA_MASTER])
print_string(PRINT_ANY, "master", "master %s ",
ll_index_to_name(rta_getattr_u32(tb[NDA_MASTER])));
print_string(PRINT_ANY, "state", "%s\n",
state_n2a(r->ndm_state));
close_json_object();
fflush(fp);
return 0;
}
static int fdb_linkdump_filter(struct nlmsghdr *nlh, int reqlen)
{
int err;
if (filter_index) {
struct ifinfomsg *ifm = NLMSG_DATA(nlh);
ifm->ifi_index = filter_index;
}
if (filter_master) {
err = addattr32(nlh, reqlen, IFLA_MASTER, filter_master);
if (err)
return err;
}
return 0;
}
static int fdb_dump_filter(struct nlmsghdr *nlh, int reqlen)
{
int err;
if (filter_index) {
struct ndmsg *ndm = NLMSG_DATA(nlh);
ndm->ndm_ifindex = filter_index;
}
if (filter_master) {
err = addattr32(nlh, reqlen, NDA_MASTER, filter_master);
if (err)
return err;
}
return 0;
}
static int fdb_show(int argc, char **argv)
{
char *filter_dev = NULL;
char *br = NULL;
int rc;
while (argc > 0) {
if ((strcmp(*argv, "brport") == 0) || strcmp(*argv, "dev") == 0) {
NEXT_ARG();
filter_dev = *argv;
} else if (strcmp(*argv, "br") == 0) {
NEXT_ARG();
br = *argv;
} else if (strcmp(*argv, "vlan") == 0) {
NEXT_ARG();
if (filter_vlan)
duparg("vlan", *argv);
filter_vlan = atoi(*argv);
} else if (strcmp(*argv, "state") == 0) {
unsigned int state;
NEXT_ARG();
if (state_a2n(&state, *argv))
invarg("invalid state", *argv);
filter_state |= state;
} else if (strcmp(*argv, "dynamic") == 0) {
filter_dynamic = 1;
} else {
if (matches(*argv, "help") == 0)
usage();
}
argc--; argv++;
}
if (br) {
int br_ifindex = ll_name_to_index(br);
if (br_ifindex == 0) {
fprintf(stderr, "Cannot find bridge device \"%s\"\n", br);
return -1;
}
filter_master = br_ifindex;
}
/* we'll keep around filter_dev for older kernels */
if (filter_dev) {
filter_index = ll_name_to_index(filter_dev);
if (!filter_index)
return nodev(filter_dev);
}
if (rth.flags & RTNL_HANDLE_F_STRICT_CHK)
rc = rtnl_neighdump_req(&rth, PF_BRIDGE, fdb_dump_filter);
else
rc = rtnl_fdb_linkdump_req_filter_fn(&rth, fdb_linkdump_filter);
if (rc < 0) {
perror("Cannot send dump request");
exit(1);
}
new_json_obj(json);
if (rtnl_dump_filter(&rth, print_fdb, stdout) < 0) {
fprintf(stderr, "Dump terminated\n");
exit(1);
}
delete_json_obj();
fflush(stdout);
return 0;
}
static int fdb_modify(int cmd, int flags, int argc, char **argv)
{
struct {
struct nlmsghdr n;
struct ndmsg ndm;
char buf[256];
} req = {
.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg)),
.n.nlmsg_flags = NLM_F_REQUEST | flags,
.n.nlmsg_type = cmd,
.ndm.ndm_family = PF_BRIDGE,
.ndm.ndm_state = NUD_NOARP,
};
char *addr = NULL;
char *d = NULL;
char abuf[ETH_ALEN];
int dst_ok = 0;
inet_prefix dst;
unsigned long port = 0;
unsigned long vni = ~0;
unsigned long src_vni = ~0;
unsigned int via = 0;
char *endptr;
short vid = -1;
__u32 nhid = 0;
while (argc > 0) {
if (strcmp(*argv, "dev") == 0) {
NEXT_ARG();
d = *argv;
} else if (strcmp(*argv, "dst") == 0) {
NEXT_ARG();
if (dst_ok)
duparg2("dst", *argv);
get_addr(&dst, *argv, preferred_family);
dst_ok = 1;
} else if (strcmp(*argv, "nhid") == 0) {
NEXT_ARG();
if (get_u32(&nhid, *argv, 0))
invarg("\"id\" value is invalid\n", *argv);
} else if (strcmp(*argv, "port") == 0) {
NEXT_ARG();
port = strtoul(*argv, &endptr, 0);
if (endptr && *endptr) {
struct servent *pse;
pse = getservbyname(*argv, "udp");
if (!pse)
invarg("invalid port\n", *argv);
port = ntohs(pse->s_port);
} else if (port > 0xffff)
invarg("invalid port\n", *argv);
} else if (strcmp(*argv, "vni") == 0) {
NEXT_ARG();
vni = strtoul(*argv, &endptr, 0);
if ((endptr && *endptr) ||
(vni >> 24) || vni == ULONG_MAX)
invarg("invalid VNI\n", *argv);
} else if (strcmp(*argv, "src_vni") == 0) {
NEXT_ARG();
src_vni = strtoul(*argv, &endptr, 0);
if ((endptr && *endptr) ||
(src_vni >> 24) || src_vni == ULONG_MAX)
invarg("invalid src VNI\n", *argv);
} else if (strcmp(*argv, "via") == 0) {
NEXT_ARG();
via = ll_name_to_index(*argv);
if (!via)
exit(nodev(*argv));
} else if (strcmp(*argv, "self") == 0) {
req.ndm.ndm_flags |= NTF_SELF;
} else if (matches(*argv, "master") == 0) {
req.ndm.ndm_flags |= NTF_MASTER;
} else if (matches(*argv, "router") == 0) {
req.ndm.ndm_flags |= NTF_ROUTER;
} else if (matches(*argv, "local") == 0 ||
matches(*argv, "permanent") == 0) {
req.ndm.ndm_state |= NUD_PERMANENT;
} else if (matches(*argv, "temp") == 0 ||
matches(*argv, "static") == 0) {
req.ndm.ndm_state |= NUD_REACHABLE;
} else if (matches(*argv, "dynamic") == 0) {
req.ndm.ndm_state |= NUD_REACHABLE;
req.ndm.ndm_state &= ~NUD_NOARP;
} else if (matches(*argv, "vlan") == 0) {
if (vid >= 0)
duparg2("vlan", *argv);
NEXT_ARG();
vid = atoi(*argv);
} else if (matches(*argv, "use") == 0) {
req.ndm.ndm_flags |= NTF_USE;
} else if (matches(*argv, "extern_learn") == 0) {
req.ndm.ndm_flags |= NTF_EXT_LEARNED;
} else if (matches(*argv, "sticky") == 0) {
req.ndm.ndm_flags |= NTF_STICKY;
} else {
if (strcmp(*argv, "to") == 0)
NEXT_ARG();
if (matches(*argv, "help") == 0)
usage();
if (addr)
duparg2("to", *argv);
addr = *argv;
}
argc--; argv++;
}
if (d == NULL || addr == NULL) {
fprintf(stderr, "Device and address are required arguments.\n");
return -1;
}
if (nhid && (dst_ok || port || vni != ~0)) {
fprintf(stderr, "dst, port, vni are mutually exclusive with nhid\n");
return -1;
}
/* Assume self */
if (!(req.ndm.ndm_flags&(NTF_SELF|NTF_MASTER)))
req.ndm.ndm_flags |= NTF_SELF;
/* Assume permanent */
if (!(req.ndm.ndm_state&(NUD_PERMANENT|NUD_REACHABLE)))
req.ndm.ndm_state |= NUD_PERMANENT;
if (sscanf(addr, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
abuf, abuf+1, abuf+2,
abuf+3, abuf+4, abuf+5) != 6) {
fprintf(stderr, "Invalid mac address %s\n", addr);
return -1;
}
addattr_l(&req.n, sizeof(req), NDA_LLADDR, abuf, ETH_ALEN);
if (dst_ok)
addattr_l(&req.n, sizeof(req), NDA_DST, &dst.data, dst.bytelen);
if (vid >= 0)
addattr16(&req.n, sizeof(req), NDA_VLAN, vid);
if (nhid > 0)
addattr32(&req.n, sizeof(req), NDA_NH_ID, nhid);
if (port) {
unsigned short dport;
dport = htons((unsigned short)port);
addattr16(&req.n, sizeof(req), NDA_PORT, dport);
}
if (vni != ~0)
addattr32(&req.n, sizeof(req), NDA_VNI, vni);
if (src_vni != ~0)
addattr32(&req.n, sizeof(req), NDA_SRC_VNI, src_vni);
if (via)
addattr32(&req.n, sizeof(req), NDA_IFINDEX, via);
req.ndm.ndm_ifindex = ll_name_to_index(d);
if (!req.ndm.ndm_ifindex)
return nodev(d);
if (rtnl_talk(&rth, &req.n, NULL) < 0)
return -1;
return 0;
}
static int fdb_get(int argc, char **argv)
{
struct {
struct nlmsghdr n;
struct ndmsg ndm;
char buf[1024];
} req = {
.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg)),
.n.nlmsg_flags = NLM_F_REQUEST,
.n.nlmsg_type = RTM_GETNEIGH,
.ndm.ndm_family = AF_BRIDGE,
};
char *d = NULL, *br = NULL;
struct nlmsghdr *answer;
unsigned long vni = ~0;
char abuf[ETH_ALEN];
int br_ifindex = 0;
char *addr = NULL;
short vlan = -1;
char *endptr;
while (argc > 0) {
if ((strcmp(*argv, "brport") == 0) || strcmp(*argv, "dev") == 0) {
NEXT_ARG();
d = *argv;
} else if (strcmp(*argv, "br") == 0) {
NEXT_ARG();
br = *argv;
} else if (strcmp(*argv, "dev") == 0) {
NEXT_ARG();
d = *argv;
} else if (strcmp(*argv, "vni") == 0) {
NEXT_ARG();
vni = strtoul(*argv, &endptr, 0);
if ((endptr && *endptr) ||
(vni >> 24) || vni == ULONG_MAX)
invarg("invalid VNI\n", *argv);
} else if (strcmp(*argv, "self") == 0) {
req.ndm.ndm_flags |= NTF_SELF;
} else if (matches(*argv, "master") == 0) {
req.ndm.ndm_flags |= NTF_MASTER;
} else if (matches(*argv, "vlan") == 0) {
if (vlan >= 0)
duparg2("vlan", *argv);
NEXT_ARG();
vlan = atoi(*argv);
} else if (matches(*argv, "dynamic") == 0) {
filter_dynamic = 1;
} else {
if (strcmp(*argv, "to") == 0)
NEXT_ARG();
if (matches(*argv, "help") == 0)
usage();
if (addr)
duparg2("to", *argv);
addr = *argv;
}
argc--; argv++;
}
if ((d == NULL && br == NULL) || addr == NULL) {
fprintf(stderr, "Device or master and address are required arguments.\n");
return -1;
}
if (sscanf(addr, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
abuf, abuf+1, abuf+2,
abuf+3, abuf+4, abuf+5) != 6) {
fprintf(stderr, "Invalid mac address %s\n", addr);
return -1;
}
addattr_l(&req.n, sizeof(req), NDA_LLADDR, abuf, ETH_ALEN);
if (vlan >= 0)
addattr16(&req.n, sizeof(req), NDA_VLAN, vlan);
if (vni != ~0)
addattr32(&req.n, sizeof(req), NDA_VNI, vni);
if (d) {
req.ndm.ndm_ifindex = ll_name_to_index(d);
if (!req.ndm.ndm_ifindex) {
fprintf(stderr, "Cannot find device \"%s\"\n", d);
return -1;
}
}
if (br) {
br_ifindex = ll_name_to_index(br);
if (!br_ifindex) {
fprintf(stderr, "Cannot find bridge device \"%s\"\n", br);
return -1;
}
addattr32(&req.n, sizeof(req), NDA_MASTER, br_ifindex);
}
if (rtnl_talk(&rth, &req.n, &answer) < 0)
return -2;
/*
* Initialize a json_writer and open an array object
* if -json was specified.
*/
new_json_obj(json);
if (print_fdb(answer, stdout) < 0) {
fprintf(stderr, "An error :-)\n");
return -1;
}
delete_json_obj();
return 0;
}
int do_fdb(int argc, char **argv)
{
ll_init_map(&rth);
if (argc > 0) {
if (matches(*argv, "add") == 0)
return fdb_modify(RTM_NEWNEIGH, NLM_F_CREATE|NLM_F_EXCL, argc-1, argv+1);
if (matches(*argv, "append") == 0)
return fdb_modify(RTM_NEWNEIGH, NLM_F_CREATE|NLM_F_APPEND, argc-1, argv+1);
if (matches(*argv, "replace") == 0)
return fdb_modify(RTM_NEWNEIGH, NLM_F_CREATE|NLM_F_REPLACE, argc-1, argv+1);
if (matches(*argv, "delete") == 0)
return fdb_modify(RTM_DELNEIGH, 0, argc-1, argv+1);
if (matches(*argv, "get") == 0)
return fdb_get(argc-1, argv+1);
if (matches(*argv, "show") == 0 ||
matches(*argv, "lst") == 0 ||
matches(*argv, "list") == 0)
return fdb_show(argc-1, argv+1);
if (matches(*argv, "help") == 0)
usage();
} else
return fdb_show(0, NULL);
fprintf(stderr, "Command \"%s\" is unknown, try \"bridge fdb help\".\n", *argv);
exit(-1);
}
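fdb_modify() and fdb_get() read the MAC address with six `%hhx` conversions. They treat a VNI as valid only if `strtoul()` consumed the whole argument and the value fits in 24 bits, meaning `vni >> 24` is zero. A minimal sketch of those two checks follows. The helper names `parse_mac` and `parse_vni` are hypothetical. The sketch is deliberately a little stricter than the code above in two ways:

- A `%n` conversion rejects text after the sixth octet. The plain `sscanf() != 6` test accepts `00:11:22:33:44:55:66`.
- An empty VNI string is refused. The original parses it as 0.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parse "aa:bb:cc:dd:ee:ff" into six octets.
 * %n records how many characters were consumed, so trailing text
 * after the last octet can be rejected. Returns 0 on success. */
static int parse_mac(const char *s, unsigned char mac[6])
{
	int consumed = -1;

	if (sscanf(s, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx%n",
		   &mac[0], &mac[1], &mac[2], &mac[3], &mac[4], &mac[5],
		   &consumed) != 6)
		return -1;
	return s[consumed] == '\0' ? 0 : -1;
}

/* Hypothetical helper mirroring the "vni" checks: the whole string must
 * be a number (base auto-detected, as with strtoul(..., 0)) that fits in
 * 24 bits. Negative input wraps to a huge value and is rejected too. */
static int parse_vni(const char *s, unsigned long *vni)
{
	char *end;
	unsigned long v = strtoul(s, &end, 0);

	if (end == s || *end != '\0' || (v >> 24) != 0)
		return -1;
	*vni = v;
	return 0;
}
```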

bridge/link.c Normal file

@@ -0,0 +1,585 @@
/* SPDX-License-Identifier: GPL-2.0 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <linux/if.h>
#include <linux/if_bridge.h>
#include <string.h>
#include <stdbool.h>
#include "json_print.h"
#include "libnetlink.h"
#include "utils.h"
#include "br_common.h"
static unsigned int filter_index;
static const char *stp_states[] = {
[BR_STATE_DISABLED] = "disabled",
[BR_STATE_LISTENING] = "listening",
[BR_STATE_LEARNING] = "learning",
[BR_STATE_FORWARDING] = "forwarding",
[BR_STATE_BLOCKING] = "blocking",
};
static const char *hw_mode[] = {
"VEB", "VEPA"
};
static void print_link_flags(FILE *fp, unsigned int flags, unsigned int mdown)
{
open_json_array(PRINT_ANY, is_json_context() ? "flags" : "<");
if (flags & IFF_UP && !(flags & IFF_RUNNING))
print_string(PRINT_ANY, NULL,
flags ? "%s," : "%s", "NO-CARRIER");
flags &= ~IFF_RUNNING;
#define _PF(f) if (flags&IFF_##f) { \
flags &= ~IFF_##f ; \
print_string(PRINT_ANY, NULL, flags ? "%s," : "%s", #f); }
_PF(LOOPBACK);
_PF(BROADCAST);
_PF(POINTOPOINT);
_PF(MULTICAST);
_PF(NOARP);
_PF(ALLMULTI);
_PF(PROMISC);
_PF(MASTER);
_PF(SLAVE);
_PF(DEBUG);
_PF(DYNAMIC);
_PF(AUTOMEDIA);
_PF(PORTSEL);
_PF(NOTRAILERS);
_PF(UP);
_PF(LOWER_UP);
_PF(DORMANT);
_PF(ECHO);
#undef _PF
if (flags)
print_hex(PRINT_ANY, NULL, "%x", flags);
if (mdown)
print_string(PRINT_ANY, NULL, ",%s", "M-DOWN");
close_json_array(PRINT_ANY, "> ");
}
void print_stp_state(__u8 state)
{
if (state <= BR_STATE_BLOCKING)
print_string(PRINT_ANY, "state",
"state %s ", stp_states[state]);
else
print_uint(PRINT_ANY, "state",
"state (%d) ", state);
}
int parse_stp_state(const char *arg)
{
size_t nstates = ARRAY_SIZE(stp_states);
int state;
for (state = 0; state < nstates; state++)
if (strcmp(stp_states[state], arg) == 0)
break;
if (state == nstates)
state = -1;
return state;
}
static void print_hwmode(__u16 mode)
{
if (mode >= ARRAY_SIZE(hw_mode))
print_0xhex(PRINT_ANY, "hwmode",
"hwmode %#llx ", mode);
else
print_string(PRINT_ANY, "hwmode",
"hwmode %s ", hw_mode[mode]);
}
static void print_protinfo(FILE *fp, struct rtattr *attr)
{
if (attr->rta_type & NLA_F_NESTED) {
struct rtattr *prtb[IFLA_BRPORT_MAX + 1];
parse_rtattr_nested(prtb, IFLA_BRPORT_MAX, attr);
if (prtb[IFLA_BRPORT_STATE])
print_stp_state(rta_getattr_u8(prtb[IFLA_BRPORT_STATE]));
if (prtb[IFLA_BRPORT_PRIORITY])
print_uint(PRINT_ANY, "priority",
"priority %u ",
rta_getattr_u16(prtb[IFLA_BRPORT_PRIORITY]));
if (prtb[IFLA_BRPORT_COST])
print_uint(PRINT_ANY, "cost",
"cost %u ",
rta_getattr_u32(prtb[IFLA_BRPORT_COST]));
if (!show_details)
return;
if (!is_json_context())
fprintf(fp, "%s ", _SL_);
if (prtb[IFLA_BRPORT_MODE])
print_on_off(PRINT_ANY, "hairpin", "hairpin %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_MODE]));
if (prtb[IFLA_BRPORT_GUARD])
print_on_off(PRINT_ANY, "guard", "guard %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_GUARD]));
if (prtb[IFLA_BRPORT_PROTECT])
print_on_off(PRINT_ANY, "root_block", "root_block %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_PROTECT]));
if (prtb[IFLA_BRPORT_FAST_LEAVE])
print_on_off(PRINT_ANY, "fastleave", "fastleave %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_FAST_LEAVE]));
if (prtb[IFLA_BRPORT_LEARNING])
print_on_off(PRINT_ANY, "learning", "learning %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_LEARNING]));
if (prtb[IFLA_BRPORT_LEARNING_SYNC])
print_on_off(PRINT_ANY, "learning_sync", "learning_sync %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_LEARNING_SYNC]));
if (prtb[IFLA_BRPORT_UNICAST_FLOOD])
print_on_off(PRINT_ANY, "flood", "flood %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_UNICAST_FLOOD]));
if (prtb[IFLA_BRPORT_MCAST_FLOOD])
print_on_off(PRINT_ANY, "mcast_flood", "mcast_flood %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_MCAST_FLOOD]));
if (prtb[IFLA_BRPORT_MCAST_TO_UCAST])
print_on_off(PRINT_ANY, "mcast_to_unicast", "mcast_to_unicast %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_MCAST_TO_UCAST]));
if (prtb[IFLA_BRPORT_NEIGH_SUPPRESS])
print_on_off(PRINT_ANY, "neigh_suppress", "neigh_suppress %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_NEIGH_SUPPRESS]));
if (prtb[IFLA_BRPORT_VLAN_TUNNEL])
print_on_off(PRINT_ANY, "vlan_tunnel", "vlan_tunnel %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_VLAN_TUNNEL]));
if (prtb[IFLA_BRPORT_BACKUP_PORT]) {
int ifidx;
ifidx = rta_getattr_u32(prtb[IFLA_BRPORT_BACKUP_PORT]);
print_string(PRINT_ANY,
"backup_port", "backup_port %s ",
ll_index_to_name(ifidx));
}
if (prtb[IFLA_BRPORT_ISOLATED])
print_on_off(PRINT_ANY, "isolated", "isolated %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_ISOLATED]));
} else
print_stp_state(rta_getattr_u8(attr));
}
/*
* This is reported by HW devices that have some bridging
* capabilities.
*/
static void print_af_spec(struct rtattr *attr, int ifindex)
{
struct rtattr *aftb[IFLA_BRIDGE_MAX+1];
parse_rtattr_nested(aftb, IFLA_BRIDGE_MAX, attr);
if (aftb[IFLA_BRIDGE_MODE])
print_hwmode(rta_getattr_u16(aftb[IFLA_BRIDGE_MODE]));
if (!show_details)
return;
if (aftb[IFLA_BRIDGE_VLAN_INFO])
print_vlan_info(aftb[IFLA_BRIDGE_VLAN_INFO], ifindex);
}
int print_linkinfo(struct nlmsghdr *n, void *arg)
{
FILE *fp = arg;
struct ifinfomsg *ifi = NLMSG_DATA(n);
struct rtattr *tb[IFLA_MAX+1];
unsigned int m_flag = 0;
int len = n->nlmsg_len;
const char *name;
len -= NLMSG_LENGTH(sizeof(*ifi));
if (len < 0) {
fprintf(stderr, "Message too short!\n");
return -1;
}
if (!(ifi->ifi_family == AF_BRIDGE || ifi->ifi_family == AF_UNSPEC))
return 0;
if (filter_index && filter_index != ifi->ifi_index)
return 0;
parse_rtattr_flags(tb, IFLA_MAX, IFLA_RTA(ifi), len, NLA_F_NESTED);
name = get_ifname_rta(ifi->ifi_index, tb[IFLA_IFNAME]);
if (!name)
return -1;
open_json_object(NULL);
if (n->nlmsg_type == RTM_DELLINK)
print_bool(PRINT_ANY, "deleted", "Deleted ", true);
print_int(PRINT_ANY, "ifindex", "%d: ", ifi->ifi_index);
m_flag = print_name_and_link("%s: ", name, tb);
print_link_flags(fp, ifi->ifi_flags, m_flag);
if (tb[IFLA_MTU])
print_int(PRINT_ANY,
"mtu", "mtu %u ",
rta_getattr_u32(tb[IFLA_MTU]));
if (tb[IFLA_MASTER]) {
int master = rta_getattr_u32(tb[IFLA_MASTER]);
print_string(PRINT_ANY, "master", "master %s ",
ll_index_to_name(master));
}
if (tb[IFLA_PROTINFO])
print_protinfo(fp, tb[IFLA_PROTINFO]);
if (tb[IFLA_AF_SPEC])
print_af_spec(tb[IFLA_AF_SPEC], ifi->ifi_index);
print_string(PRINT_FP, NULL, "%s", "\n");
close_json_object();
fflush(fp);
return 0;
}
static void usage(void)
{
fprintf(stderr,
"Usage: bridge link set dev DEV [ cost COST ] [ priority PRIO ] [ state STATE ]\n"
" [ guard {on | off} ]\n"
" [ hairpin {on | off} ]\n"
" [ fastleave {on | off} ]\n"
" [ root_block {on | off} ]\n"
" [ learning {on | off} ]\n"
" [ learning_sync {on | off} ]\n"
" [ flood {on | off} ]\n"
" [ mcast_flood {on | off} ]\n"
" [ mcast_to_unicast {on | off} ]\n"
" [ neigh_suppress {on | off} ]\n"
" [ vlan_tunnel {on | off} ]\n"
" [ isolated {on | off} ]\n"
" [ hwmode {vepa | veb} ]\n"
" [ backup_port DEVICE ] [ nobackup_port ]\n"
" [ self ] [ master ]\n"
" bridge link show [dev DEV]\n");
exit(-1);
}
static int brlink_modify(int argc, char **argv)
{
struct {
struct nlmsghdr n;
struct ifinfomsg ifm;
char buf[512];
} req = {
.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
.n.nlmsg_flags = NLM_F_REQUEST,
.n.nlmsg_type = RTM_SETLINK,
.ifm.ifi_family = PF_BRIDGE,
};
char *d = NULL;
int backup_port_idx = -1;
__s8 neigh_suppress = -1;
__s8 learning = -1;
__s8 learning_sync = -1;
__s8 flood = -1;
__s8 vlan_tunnel = -1;
__s8 mcast_flood = -1;
__s8 mcast_to_unicast = -1;
__s8 isolated = -1;
__s8 hairpin = -1;
__s8 bpdu_guard = -1;
__s8 fast_leave = -1;
__s8 root_block = -1;
__u32 cost = 0;
__s16 priority = -1;
__s8 state = -1;
__s16 mode = -1;
__u16 flags = 0;
struct rtattr *nest;
int ret;
while (argc > 0) {
if (strcmp(*argv, "dev") == 0) {
NEXT_ARG();
d = *argv;
} else if (strcmp(*argv, "guard") == 0) {
NEXT_ARG();
bpdu_guard = parse_on_off("guard", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "hairpin") == 0) {
NEXT_ARG();
hairpin = parse_on_off("hairpin", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "fastleave") == 0) {
NEXT_ARG();
fast_leave = parse_on_off("fastleave", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "root_block") == 0) {
NEXT_ARG();
root_block = parse_on_off("root_block", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "learning") == 0) {
NEXT_ARG();
learning = parse_on_off("learning", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "learning_sync") == 0) {
NEXT_ARG();
learning_sync = parse_on_off("learning_sync", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "flood") == 0) {
NEXT_ARG();
flood = parse_on_off("flood", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "mcast_flood") == 0) {
NEXT_ARG();
mcast_flood = parse_on_off("mcast_flood", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "mcast_to_unicast") == 0) {
NEXT_ARG();
mcast_to_unicast = parse_on_off("mcast_to_unicast", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "cost") == 0) {
NEXT_ARG();
cost = atoi(*argv);
} else if (strcmp(*argv, "priority") == 0) {
NEXT_ARG();
priority = atoi(*argv);
} else if (strcmp(*argv, "state") == 0) {
NEXT_ARG();
char *endptr;
state = strtol(*argv, &endptr, 10);
if (!(**argv != '\0' && *endptr == '\0')) {
state = parse_stp_state(*argv);
if (state == -1) {
fprintf(stderr,
"Error: invalid STP port state\n");
return -1;
}
}
} else if (strcmp(*argv, "hwmode") == 0) {
NEXT_ARG();
flags = BRIDGE_FLAGS_SELF;
if (strcmp(*argv, "vepa") == 0)
mode = BRIDGE_MODE_VEPA;
else if (strcmp(*argv, "veb") == 0)
mode = BRIDGE_MODE_VEB;
else {
fprintf(stderr,
"Mode argument must be \"vepa\" or \"veb\".\n");
return -1;
}
} else if (strcmp(*argv, "self") == 0) {
flags |= BRIDGE_FLAGS_SELF;
} else if (strcmp(*argv, "master") == 0) {
flags |= BRIDGE_FLAGS_MASTER;
} else if (strcmp(*argv, "neigh_suppress") == 0) {
NEXT_ARG();
neigh_suppress = parse_on_off("neigh_suppress", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "vlan_tunnel") == 0) {
NEXT_ARG();
vlan_tunnel = parse_on_off("vlan_tunnel", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "isolated") == 0) {
NEXT_ARG();
isolated = parse_on_off("isolated", *argv, &ret);
if (ret)
return ret;
} else if (strcmp(*argv, "backup_port") == 0) {
NEXT_ARG();
backup_port_idx = ll_name_to_index(*argv);
if (!backup_port_idx) {
fprintf(stderr, "Error: device %s does not exist\n",
*argv);
return -1;
}
} else if (strcmp(*argv, "nobackup_port") == 0) {
backup_port_idx = 0;
} else {
usage();
}
argc--; argv++;
}
if (d == NULL) {
fprintf(stderr, "Device is a required argument.\n");
return -1;
}
req.ifm.ifi_index = ll_name_to_index(d);
if (req.ifm.ifi_index == 0) {
fprintf(stderr, "Cannot find bridge device \"%s\"\n", d);
return -1;
}
/* Nested PROTINFO attribute. Contains: port flags, cost, priority and
* state.
*/
nest = addattr_nest(&req.n, sizeof(req),
IFLA_PROTINFO | NLA_F_NESTED);
/* Flags first */
if (bpdu_guard >= 0)
addattr8(&req.n, sizeof(req), IFLA_BRPORT_GUARD, bpdu_guard);
if (hairpin >= 0)
addattr8(&req.n, sizeof(req), IFLA_BRPORT_MODE, hairpin);
if (fast_leave >= 0)
addattr8(&req.n, sizeof(req), IFLA_BRPORT_FAST_LEAVE,
fast_leave);
if (root_block >= 0)
addattr8(&req.n, sizeof(req), IFLA_BRPORT_PROTECT, root_block);
if (flood >= 0)
addattr8(&req.n, sizeof(req), IFLA_BRPORT_UNICAST_FLOOD, flood);
if (mcast_flood >= 0)
addattr8(&req.n, sizeof(req), IFLA_BRPORT_MCAST_FLOOD,
mcast_flood);
if (mcast_to_unicast >= 0)
addattr8(&req.n, sizeof(req), IFLA_BRPORT_MCAST_TO_UCAST,
mcast_to_unicast);
if (learning >= 0)
addattr8(&req.n, sizeof(req), IFLA_BRPORT_LEARNING, learning);
if (learning_sync >= 0)
addattr8(&req.n, sizeof(req), IFLA_BRPORT_LEARNING_SYNC,
learning_sync);
if (cost > 0)
addattr32(&req.n, sizeof(req), IFLA_BRPORT_COST, cost);
if (priority >= 0)
addattr16(&req.n, sizeof(req), IFLA_BRPORT_PRIORITY, priority);
if (state >= 0)
addattr8(&req.n, sizeof(req), IFLA_BRPORT_STATE, state);
if (neigh_suppress != -1)
addattr8(&req.n, sizeof(req), IFLA_BRPORT_NEIGH_SUPPRESS,
neigh_suppress);
if (vlan_tunnel != -1)
addattr8(&req.n, sizeof(req), IFLA_BRPORT_VLAN_TUNNEL,
vlan_tunnel);
if (isolated != -1)
addattr8(&req.n, sizeof(req), IFLA_BRPORT_ISOLATED, isolated);
if (backup_port_idx != -1)
addattr32(&req.n, sizeof(req), IFLA_BRPORT_BACKUP_PORT,
backup_port_idx);
addattr_nest_end(&req.n, nest);
/* IFLA_AF_SPEC nested attribute. Contains IFLA_BRIDGE_FLAGS that
* designates master or self operation and IFLA_BRIDGE_MODE
* for hw 'vepa' or 'veb' operation modes. The hwmodes are
* only valid in 'self' mode on some devices so far.
*/
if (mode >= 0 || flags > 0) {
nest = addattr_nest(&req.n, sizeof(req), IFLA_AF_SPEC);
if (flags > 0)
addattr16(&req.n, sizeof(req), IFLA_BRIDGE_FLAGS, flags);
if (mode >= 0)
addattr16(&req.n, sizeof(req), IFLA_BRIDGE_MODE, mode);
addattr_nest_end(&req.n, nest);
}
if (rtnl_talk(&rth, &req.n, NULL) < 0)
return -1;
return 0;
}
static int brlink_show(int argc, char **argv)
{
char *filter_dev = NULL;
while (argc > 0) {
if (strcmp(*argv, "dev") == 0) {
NEXT_ARG();
if (filter_dev)
duparg("dev", *argv);
filter_dev = *argv;
}
argc--; argv++;
}
if (filter_dev) {
filter_index = ll_name_to_index(filter_dev);
if (!filter_index)
return nodev(filter_dev);
}
if (show_details) {
if (rtnl_linkdump_req_filter(&rth, PF_BRIDGE,
(compress_vlans ?
RTEXT_FILTER_BRVLAN_COMPRESSED :
RTEXT_FILTER_BRVLAN)) < 0) {
perror("Cannot send dump request");
exit(1);
}
} else {
if (rtnl_linkdump_req(&rth, PF_BRIDGE) < 0) {
perror("Cannon send dump request");
exit(1);
}
}
new_json_obj(json);
if (rtnl_dump_filter(&rth, print_linkinfo, stdout) < 0) {
fprintf(stderr, "Dump terminated\n");
exit(1);
}
delete_json_obj();
fflush(stdout);
return 0;
}
int do_link(int argc, char **argv)
{
ll_init_map(&rth);
if (argc > 0) {
if (matches(*argv, "set") == 0 ||
matches(*argv, "change") == 0)
return brlink_modify(argc-1, argv+1);
if (matches(*argv, "show") == 0 ||
matches(*argv, "lst") == 0 ||
matches(*argv, "list") == 0)
return brlink_show(argc-1, argv+1);
if (matches(*argv, "help") == 0)
usage();
} else
return brlink_show(0, NULL);
fprintf(stderr, "Command \"%s\" is unknown, try \"bridge link help\".\n", *argv);
exit(-1);
}
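brlink_modify() parses every on/off option into an `__s8` initialised to -1. It adds the matching IFLA_BRPORT_* attribute only when the value is 0 or greater. The `state` option accepts either a number or one of the names in `stp_states[]`. The sketch below restates both ideas in isolation. `parse_port_state`, `parse_toggle`, `toggles_to_emit` and `struct port_toggles` are hypothetical names. The state table assumes the BR_STATE_* numbering from `<linux/if_bridge.h>`, which runs from disabled = 0 through blocking = 4.

```c
#include <stdlib.h>
#include <string.h>

/* Names indexed by the BR_STATE_* values (assumed: DISABLED = 0 ...
 * BLOCKING = 4, matching the stp_states[] table above). */
static const char *const port_states[] = {
	"disabled", "listening", "learning", "forwarding", "blocking",
};

/* Hypothetical helper following the "state" branch of brlink_modify():
 * a fully numeric argument is used as-is, anything else is looked up
 * by name. Returns -1 for an unknown name. */
static int parse_port_state(const char *arg)
{
	char *end;
	long v = strtol(arg, &end, 10);
	size_t i;

	if (*arg != '\0' && *end == '\0')
		return (int)v;
	for (i = 0; i < sizeof(port_states) / sizeof(port_states[0]); i++)
		if (strcmp(port_states[i], arg) == 0)
			return (int)i;
	return -1;
}

/* Tri-state flags: -1 = not given, 0 = off, 1 = on. */
struct port_toggles {
	signed char learning;
	signed char flood;
};

/* Hypothetical on/off parser; leaves *out untouched on bad input. */
static int parse_toggle(const char *arg, signed char *out)
{
	if (strcmp(arg, "on") == 0)
		*out = 1;
	else if (strcmp(arg, "off") == 0)
		*out = 0;
	else
		return -1;
	return 0;
}

/* How many flag attributes a request would carry: only set ones count. */
static int toggles_to_emit(const struct port_toggles *t)
{
	return (t->learning >= 0) + (t->flood >= 0);
}
```

Keeping "not given" (-1) distinct from "off" (0) allows `learning off` to be sent as an explicit zero. An option that was never typed contributes no attribute to the RTM_SETLINK request at all.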

bridge/mdb.c Normal file

@@ -0,0 +1,585 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Get mdb table with netlink
*/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <netinet/in.h>
#include <linux/if_bridge.h>
#include <linux/if_ether.h>
#include <string.h>
#include <arpa/inet.h>
#include "libnetlink.h"
#include "utils.h"
#include "br_common.h"
#include "rt_names.h"
#include "json_print.h"
#ifndef MDBA_RTA
#define MDBA_RTA(r) \
((struct rtattr *)(((char *)(r)) + NLMSG_ALIGN(sizeof(struct br_port_msg))))
#endif
static unsigned int filter_index, filter_vlan;
static void usage(void)
{
fprintf(stderr,
"Usage: bridge mdb { add | del } dev DEV port PORT grp GROUP [src SOURCE] [permanent | temp] [vid VID]\n"
" bridge mdb {show} [ dev DEV ] [ vid VID ]\n");
exit(-1);
}
static bool is_temp_mcast_rtr(__u8 type)
{
return type == MDB_RTR_TYPE_TEMP_QUERY || type == MDB_RTR_TYPE_TEMP;
}
static const char *format_timer(__u32 ticks, int align)
{
struct timeval tv;
static char tbuf[32];
__jiffies_to_tv(&tv, ticks);
if (align)
snprintf(tbuf, sizeof(tbuf), "%4lu.%.2lu",
(unsigned long)tv.tv_sec,
(unsigned long)tv.tv_usec / 10000);
else
snprintf(tbuf, sizeof(tbuf), "%lu.%.2lu",
(unsigned long)tv.tv_sec,
(unsigned long)tv.tv_usec / 10000);
return tbuf;
}
void br_print_router_port_stats(struct rtattr *pattr)
{
struct rtattr *tb[MDBA_ROUTER_PATTR_MAX + 1];
parse_rtattr(tb, MDBA_ROUTER_PATTR_MAX, MDB_RTR_RTA(RTA_DATA(pattr)),
RTA_PAYLOAD(pattr) - RTA_ALIGN(sizeof(uint32_t)));
if (tb[MDBA_ROUTER_PATTR_TIMER]) {
__u32 timer = rta_getattr_u32(tb[MDBA_ROUTER_PATTR_TIMER]);
print_string(PRINT_ANY, "timer", " %s",
format_timer(timer, 1));
}
if (tb[MDBA_ROUTER_PATTR_TYPE]) {
__u8 type = rta_getattr_u8(tb[MDBA_ROUTER_PATTR_TYPE]);
print_string(PRINT_ANY, "type", " %s",
is_temp_mcast_rtr(type) ? "temp" : "permanent");
}
}
static void br_print_router_ports(FILE *f, struct rtattr *attr,
const char *brifname)
{
int rem = RTA_PAYLOAD(attr);
struct rtattr *i;
if (is_json_context())
open_json_array(PRINT_JSON, brifname);
else if (!show_stats)
fprintf(f, "router ports on %s: ", brifname);
for (i = RTA_DATA(attr); RTA_OK(i, rem); i = RTA_NEXT(i, rem)) {
uint32_t *port_ifindex = RTA_DATA(i);
const char *port_ifname = ll_index_to_name(*port_ifindex);
if (is_json_context()) {
open_json_object(NULL);
print_string(PRINT_JSON, "port", NULL, port_ifname);
if (show_stats)
br_print_router_port_stats(i);
close_json_object();
} else if (show_stats) {
fprintf(f, "router ports on %s: %s",
brifname, port_ifname);
br_print_router_port_stats(i);
fprintf(f, "\n");
} else {
fprintf(f, "%s ", port_ifname);
}
}
if (!show_stats)
print_nl();
close_json_array(PRINT_JSON, NULL);
}
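br_print_router_ports() walks the MDBA_ROUTER payload with the standard RTA_OK()/RTA_NEXT() loop from <linux/rtnetlink.h>. The sketch below builds a small attribute stream and walks it the same way. put_u32_attr() and collect_u32s() are hypothetical helpers. Unlike the original loop, collect_u32s() first checks that each payload is large enough to hold a u32 before reading it. It reads only the leading u32 ifindex and ignores the nested timer and type attributes that br_print_router_port_stats() parses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/rtnetlink.h>

/* Hypothetical writer: append one u32 attribute at the aligned end of the
 * used area. Returns the new used length, or 0 if the attribute would not
 * fit in cap bytes. buf must be at least 4-byte aligned. */
static size_t put_u32_attr(char *buf, size_t used, size_t cap,
			   unsigned short type, uint32_t val)
{
	size_t off = RTA_ALIGN(used);
	struct rtattr *rta;

	assert(buf != NULL);
	if (off + RTA_SPACE(sizeof(val)) > cap)
		return 0;
	rta = (struct rtattr *)(buf + off);
	rta->rta_type = type;
	rta->rta_len = RTA_LENGTH(sizeof(val));
	memcpy(RTA_DATA(rta), &val, sizeof(val));
	return off + RTA_SPACE(sizeof(val));
}

/* Walk the stream like br_print_router_ports(): RTA_OK() checks that a
 * whole attribute fits in the remaining length, and RTA_NEXT() advances
 * past the aligned attribute while shrinking len. */
static unsigned int collect_u32s(char *buf, int len, uint32_t *out,
				 unsigned int max)
{
	struct rtattr *rta;
	unsigned int n = 0;

	for (rta = (struct rtattr *)buf; RTA_OK(rta, len) && n < max;
	     rta = RTA_NEXT(rta, len)) {
		/* Skip attributes too short to carry an ifindex. */
		if ((size_t)RTA_PAYLOAD(rta) < sizeof(uint32_t))
			continue;
		memcpy(&out[n++], RTA_DATA(rta), sizeof(uint32_t));
	}
	return n;
}
```

Each u32 attribute takes RTA_SPACE(4) = 8 bytes, so two of them occupy 16 bytes of the stream.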
static void print_src_entry(struct rtattr *src_attr, int af, const char *sep)
{
struct rtattr *stb[MDBA_MDB_SRCATTR_MAX + 1];
SPRINT_BUF(abuf);
const char *addr;
__u32 timer_val;
parse_rtattr_nested(stb, MDBA_MDB_SRCATTR_MAX, src_attr);
if (!stb[MDBA_MDB_SRCATTR_ADDRESS] || !stb[MDBA_MDB_SRCATTR_TIMER])
return;
addr = inet_ntop(af, RTA_DATA(stb[MDBA_MDB_SRCATTR_ADDRESS]), abuf,
sizeof(abuf));
if (!addr)
return;
timer_val = rta_getattr_u32(stb[MDBA_MDB_SRCATTR_TIMER]);
open_json_object(NULL);
print_string(PRINT_FP, NULL, "%s", sep);
print_color_string(PRINT_ANY, ifa_family_color(af),
"address", "%s", addr);
print_string(PRINT_ANY, "timer", "/%s", format_timer(timer_val, 0));
close_json_object();
}
static void print_mdb_entry(FILE *f, int ifindex, const struct br_mdb_entry *e,
struct nlmsghdr *n, struct rtattr **tb)
{
const void *grp, *src;
const char *addr;
SPRINT_BUF(abuf);
const char *dev;
int af;
if (filter_vlan && e->vid != filter_vlan)
return;
if (!e->addr.proto) {
af = AF_PACKET;
grp = &e->addr.u.mac_addr;
} else if (e->addr.proto == htons(ETH_P_IP)) {
af = AF_INET;
grp = &e->addr.u.ip4;
} else {
af = AF_INET6;
grp = &e->addr.u.ip6;
}
dev = ll_index_to_name(ifindex);
open_json_object(NULL);
print_int(PRINT_JSON, "index", NULL, ifindex);
print_color_string(PRINT_ANY, COLOR_IFNAME, "dev", "dev %s", dev);
print_string(PRINT_ANY, "port", " port %s",
ll_index_to_name(e->ifindex));
/* The ETH_ALEN argument is ignored for all cases but AF_PACKET */
addr = rt_addr_n2a_r(af, ETH_ALEN, grp, abuf, sizeof(abuf));
if (!addr) {
close_json_object();
return;
}
print_color_string(PRINT_ANY, ifa_family_color(af),
"grp", " grp %s", addr);
if (tb && tb[MDBA_MDB_EATTR_SOURCE]) {
src = (const void *)RTA_DATA(tb[MDBA_MDB_EATTR_SOURCE]);
print_color_string(PRINT_ANY, ifa_family_color(af),
"src", " src %s",
inet_ntop(af, src, abuf, sizeof(abuf)));
}
print_string(PRINT_ANY, "state", " %s",
(e->state & MDB_PERMANENT) ? "permanent" : "temp");
if (show_details && tb) {
if (tb[MDBA_MDB_EATTR_GROUP_MODE]) {
__u8 mode = rta_getattr_u8(tb[MDBA_MDB_EATTR_GROUP_MODE]);
print_string(PRINT_ANY, "filter_mode", " filter_mode %s",
mode == MCAST_INCLUDE ? "include" :
"exclude");
}
if (tb[MDBA_MDB_EATTR_SRC_LIST]) {
struct rtattr *i, *attr = tb[MDBA_MDB_EATTR_SRC_LIST];
const char *sep = " ";
int rem;
open_json_array(PRINT_ANY, is_json_context() ?
"source_list" :
" source_list");
rem = RTA_PAYLOAD(attr);
for (i = RTA_DATA(attr); RTA_OK(i, rem);
i = RTA_NEXT(i, rem)) {
print_src_entry(i, af, sep);
sep = ",";
}
close_json_array(PRINT_JSON, NULL);
}
if (tb[MDBA_MDB_EATTR_RTPROT]) {
__u8 rtprot = rta_getattr_u8(tb[MDBA_MDB_EATTR_RTPROT]);
SPRINT_BUF(rtb);
print_string(PRINT_ANY, "protocol", " proto %s ",
rtnl_rtprot_n2a(rtprot, rtb, sizeof(rtb)));
}
}
open_json_array(PRINT_JSON, "flags");
if (e->flags & MDB_FLAGS_OFFLOAD)
print_string(PRINT_ANY, NULL, " %s", "offload");
if (e->flags & MDB_FLAGS_FAST_LEAVE)
print_string(PRINT_ANY, NULL, " %s", "fast_leave");
if (e->flags & MDB_FLAGS_STAR_EXCL)
print_string(PRINT_ANY, NULL, " %s", "added_by_star_ex");
if (e->flags & MDB_FLAGS_BLOCKED)
print_string(PRINT_ANY, NULL, " %s", "blocked");
close_json_array(PRINT_JSON, NULL);
if (e->vid)
print_uint(PRINT_ANY, "vid", " vid %u", e->vid);
if (show_stats && tb && tb[MDBA_MDB_EATTR_TIMER]) {
__u32 timer = rta_getattr_u32(tb[MDBA_MDB_EATTR_TIMER]);
print_string(PRINT_ANY, "timer", " %s",
format_timer(timer, 1));
}
print_nl();
close_json_object();
}
static void br_print_mdb_entry(FILE *f, int ifindex, struct rtattr *attr,
struct nlmsghdr *n)
{
struct rtattr *etb[MDBA_MDB_EATTR_MAX + 1];
struct br_mdb_entry *e;
struct rtattr *i;
int rem;
rem = RTA_PAYLOAD(attr);
for (i = RTA_DATA(attr); RTA_OK(i, rem); i = RTA_NEXT(i, rem)) {
e = RTA_DATA(i);
parse_rtattr_flags(etb, MDBA_MDB_EATTR_MAX, MDB_RTA(RTA_DATA(i)),
RTA_PAYLOAD(i) - RTA_ALIGN(sizeof(*e)),
NLA_F_NESTED);
print_mdb_entry(f, ifindex, e, n, etb);
}
}
static void print_mdb_entries(FILE *fp, struct nlmsghdr *n,
int ifindex, struct rtattr *mdb)
{
int rem = RTA_PAYLOAD(mdb);
struct rtattr *i;
for (i = RTA_DATA(mdb); RTA_OK(i, rem); i = RTA_NEXT(i, rem))
br_print_mdb_entry(fp, ifindex, i, n);
}
static void print_router_entries(FILE *fp, struct nlmsghdr *n,
int ifindex, struct rtattr *router)
{
const char *brifname = ll_index_to_name(ifindex);
if (n->nlmsg_type == RTM_GETMDB) {
if (show_details)
br_print_router_ports(fp, router, brifname);
} else {
struct rtattr *i = RTA_DATA(router);
uint32_t *port_ifindex = RTA_DATA(i);
const char *port_name = ll_index_to_name(*port_ifindex);
if (is_json_context()) {
open_json_array(PRINT_JSON, brifname);
open_json_object(NULL);
print_string(PRINT_JSON, "port", NULL,
port_name);
close_json_object();
close_json_array(PRINT_JSON, NULL);
} else {
fprintf(fp, "router port dev %s master %s\n",
port_name, brifname);
}
}
}
static int __parse_mdb_nlmsg(struct nlmsghdr *n, struct rtattr **tb)
{
struct br_port_msg *r = NLMSG_DATA(n);
int len = n->nlmsg_len;
if (n->nlmsg_type != RTM_GETMDB &&
n->nlmsg_type != RTM_NEWMDB &&
n->nlmsg_type != RTM_DELMDB) {
fprintf(stderr,
"Not RTM_GETMDB, RTM_NEWMDB or RTM_DELMDB: %08x %08x %08x\n",
n->nlmsg_len, n->nlmsg_type, n->nlmsg_flags);
return 0;
}
len -= NLMSG_LENGTH(sizeof(*r));
if (len < 0) {
fprintf(stderr, "BUG: wrong nlmsg len %d\n", len);
return -1;
}
if (filter_index && filter_index != r->ifindex)
return 0;
parse_rtattr(tb, MDBA_MAX, MDBA_RTA(r), n->nlmsg_len - NLMSG_LENGTH(sizeof(*r)));
return 1;
}
static int print_mdbs(struct nlmsghdr *n, void *arg)
{
struct br_port_msg *r = NLMSG_DATA(n);
struct rtattr *tb[MDBA_MAX+1];
FILE *fp = arg;
int ret;
ret = __parse_mdb_nlmsg(n, tb);
if (ret != 1)
return ret;
if (tb[MDBA_MDB])
print_mdb_entries(fp, n, r->ifindex, tb[MDBA_MDB]);
return 0;
}
static int print_rtrs(struct nlmsghdr *n, void *arg)
{
struct br_port_msg *r = NLMSG_DATA(n);
struct rtattr *tb[MDBA_MAX+1];
FILE *fp = arg;
int ret;
ret = __parse_mdb_nlmsg(n, tb);
if (ret != 1)
return ret;
if (tb[MDBA_ROUTER])
print_router_entries(fp, n, r->ifindex, tb[MDBA_ROUTER]);
return 0;
}
int print_mdb_mon(struct nlmsghdr *n, void *arg)
{
struct br_port_msg *r = NLMSG_DATA(n);
struct rtattr *tb[MDBA_MAX+1];
FILE *fp = arg;
int ret;
ret = __parse_mdb_nlmsg(n, tb);
if (ret != 1)
return ret;
if (n->nlmsg_type == RTM_DELMDB)
print_bool(PRINT_ANY, "deleted", "Deleted ", true);
if (tb[MDBA_MDB])
print_mdb_entries(fp, n, r->ifindex, tb[MDBA_MDB]);
if (tb[MDBA_ROUTER])
print_router_entries(fp, n, r->ifindex, tb[MDBA_ROUTER]);
return 0;
}
static int mdb_show(int argc, char **argv)
{
char *filter_dev = NULL;
while (argc > 0) {
if (strcmp(*argv, "dev") == 0) {
NEXT_ARG();
if (filter_dev)
duparg("dev", *argv);
filter_dev = *argv;
} else if (strcmp(*argv, "vid") == 0) {
NEXT_ARG();
if (filter_vlan)
duparg("vid", *argv);
filter_vlan = atoi(*argv);
}
argc--; argv++;
}
if (filter_dev) {
filter_index = ll_name_to_index(filter_dev);
if (!filter_index)
return nodev(filter_dev);
}
new_json_obj(json);
open_json_object(NULL);
/* get mdb entries */
if (rtnl_mdbdump_req(&rth, PF_BRIDGE) < 0) {
perror("Cannot send dump request");
return -1;
}
open_json_array(PRINT_JSON, "mdb");
if (rtnl_dump_filter(&rth, print_mdbs, stdout) < 0) {
fprintf(stderr, "Dump terminated\n");
return -1;
}
close_json_array(PRINT_JSON, NULL);
/* get router ports */
if (rtnl_mdbdump_req(&rth, PF_BRIDGE) < 0) {
perror("Cannot send dump request");
return -1;
}
open_json_object("router");
if (rtnl_dump_filter(&rth, print_rtrs, stdout) < 0) {
fprintf(stderr, "Dump terminated\n");
return -1;
}
close_json_object();
close_json_object();
delete_json_obj();
fflush(stdout);
return 0;
}
static int mdb_parse_grp(const char *grp, struct br_mdb_entry *e)
{
if (inet_pton(AF_INET, grp, &e->addr.u.ip4)) {
e->addr.proto = htons(ETH_P_IP);
return 0;
}
if (inet_pton(AF_INET6, grp, &e->addr.u.ip6)) {
e->addr.proto = htons(ETH_P_IPV6);
return 0;
}
if (ll_addr_a2n((char *)e->addr.u.mac_addr, sizeof(e->addr.u.mac_addr),
grp) == ETH_ALEN) {
e->addr.proto = 0;
return 0;
}
return -1;
}
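mdb_parse_grp() tries IPv4 first, then IPv6, then a MAC address. It records the matching EtherType in addr.proto, using 0 for MAC entries, which print_mdb_entry() later maps back to AF_PACKET. The sketch below follows the same order under two stated assumptions. First, struct grp_addr is a simplified, hypothetical stand-in for the group part of struct br_mdb_entry. Second, parse_mac() is a strict colon-separated parser used in place of iproute2's ll_addr_a2n(), which accepts more input forms.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <linux/if_ether.h>

/* Hypothetical stand-in for the group address in struct br_mdb_entry. */
struct grp_addr {
	unsigned short proto; /* network byte order; 0 means a MAC entry */
	union {
		struct in_addr ip4;
		struct in6_addr ip6;
		unsigned char mac[ETH_ALEN];
	} u;
};

/* Strict "xx:xx:xx:xx:xx:xx" parser; rejects trailing characters. */
static int parse_mac(const char *s, unsigned char mac[ETH_ALEN])
{
	int end = 0;

	if (sscanf(s, "%2hhx:%2hhx:%2hhx:%2hhx:%2hhx:%2hhx%n",
		   &mac[0], &mac[1], &mac[2], &mac[3], &mac[4], &mac[5],
		   &end) != 6 || s[end] != '\0')
		return -1;
	return 0;
}

/* Same order as mdb_parse_grp(). Returns the address family, or -1. */
static int parse_grp(const char *s, struct grp_addr *g)
{
	assert(s != NULL && g != NULL);
	memset(g, 0, sizeof(*g));

	if (inet_pton(AF_INET, s, &g->u.ip4) == 1) {
		g->proto = htons(ETH_P_IP);
		return AF_INET;
	}
	if (inet_pton(AF_INET6, s, &g->u.ip6) == 1) {
		g->proto = htons(ETH_P_IPV6);
		return AF_INET6;
	}
	if (parse_mac(s, g->u.mac) == 0) {
		g->proto = 0;
		return AF_PACKET;
	}
	return -1;
}
```

The order matters only for clarity here. A six-group string such as "01:00:5e:00:00:fb" is not valid IPv6, because it has fewer than eight groups and no "::", so it falls through to the MAC parser.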
static int mdb_modify(int cmd, int flags, int argc, char **argv)
{
struct {
struct nlmsghdr n;
struct br_port_msg bpm;
char buf[1024];
} req = {
.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct br_port_msg)),
.n.nlmsg_flags = NLM_F_REQUEST | flags,
.n.nlmsg_type = cmd,
.bpm.family = PF_BRIDGE,
};
char *d = NULL, *p = NULL, *grp = NULL, *src = NULL;
struct br_mdb_entry entry = {};
short vid = 0;
while (argc > 0) {
if (strcmp(*argv, "dev") == 0) {
NEXT_ARG();
d = *argv;
} else if (strcmp(*argv, "grp") == 0) {
NEXT_ARG();
grp = *argv;
} else if (strcmp(*argv, "port") == 0) {
NEXT_ARG();
p = *argv;
} else if (strcmp(*argv, "permanent") == 0) {
if (cmd == RTM_NEWMDB)
entry.state |= MDB_PERMANENT;
} else if (strcmp(*argv, "temp") == 0) {
;/* nothing */
} else if (strcmp(*argv, "vid") == 0) {
NEXT_ARG();
vid = atoi(*argv);
} else if (strcmp(*argv, "src") == 0) {
NEXT_ARG();
src = *argv;
} else {
if (matches(*argv, "help") == 0)
usage();
}
argc--; argv++;
}
if (d == NULL || grp == NULL || p == NULL) {
fprintf(stderr, "Device, group address and port name are required arguments.\n");
return -1;
}
req.bpm.ifindex = ll_name_to_index(d);
if (!req.bpm.ifindex)
return nodev(d);
entry.ifindex = ll_name_to_index(p);
if (!entry.ifindex)
return nodev(p);
if (mdb_parse_grp(grp, &entry)) {
fprintf(stderr, "Invalid address \"%s\"\n", grp);
return -1;
}
entry.vid = vid;
addattr_l(&req.n, sizeof(req), MDBA_SET_ENTRY, &entry, sizeof(entry));
if (src) {
struct rtattr *nest = addattr_nest(&req.n, sizeof(req),
MDBA_SET_ENTRY_ATTRS);
struct in6_addr src_ip6;
__be32 src_ip4;
nest->rta_type |= NLA_F_NESTED;
if (!inet_pton(AF_INET, src, &src_ip4)) {
if (!inet_pton(AF_INET6, src, &src_ip6)) {
fprintf(stderr, "Invalid source address \"%s\"\n", src);
return -1;
}
addattr_l(&req.n, sizeof(req), MDBE_ATTR_SOURCE, &src_ip6, sizeof(src_ip6));
} else {
addattr32(&req.n, sizeof(req), MDBE_ATTR_SOURCE, src_ip4);
}
addattr_nest_end(&req.n, nest);
}
if (rtnl_talk(&rth, &req.n, NULL) < 0)
return -1;
return 0;
}
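When a source address is given, mdb_modify() places MDBE_ATTR_SOURCE inside an MDBA_SET_ENTRY_ATTRS container and sets NLA_F_NESTED on that container. The request layout can be sketched without libnetlink as follows. add_attr(), nest_start() and nest_end() are hypothetical stand-ins for addattr_l(), addattr_nest() and addattr_nest_end(). The attribute type numbers used in the test are arbitrary placeholders, not the real MDBA_* or MDBE_* values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical request buffer, shaped like the req struct in mdb_modify()
 * (the struct br_port_msg member is folded into buf here). */
struct demo_req {
	struct nlmsghdr n;
	char buf[256];
};

/* addattr_l()-style append at the aligned tail of the message. Returns the
 * new attribute, or NULL if it would not fit in maxlen bytes. */
static struct rtattr *add_attr(struct nlmsghdr *n, size_t maxlen,
			       unsigned short type, const void *data,
			       size_t len)
{
	size_t off = NLMSG_ALIGN(n->nlmsg_len);
	struct rtattr *rta;

	assert(len == 0 || data != NULL);
	if (off + RTA_SPACE(len) > maxlen)
		return NULL;
	rta = (struct rtattr *)((char *)n + off);
	rta->rta_type = type;
	rta->rta_len = RTA_LENGTH(len);
	if (len)
		memcpy(RTA_DATA(rta), data, len);
	n->nlmsg_len = off + RTA_SPACE(len);
	return rta;
}

/* Open a container; NLA_F_NESTED marks its payload as more attributes. */
static struct rtattr *nest_start(struct nlmsghdr *n, size_t maxlen,
				 unsigned short type)
{
	return add_attr(n, maxlen, type | NLA_F_NESTED, NULL, 0);
}

/* Close a container: its length now spans everything appended after it. */
static void nest_end(struct nlmsghdr *n, struct rtattr *nest)
{
	nest->rta_len = (unsigned short)((char *)n + n->nlmsg_len - (char *)nest);
}
```

Take a message that starts at NLMSG_LENGTH(4) = 20 bytes. Adding an 8-byte entry attribute, an empty container, and a 4-byte IPv4 source inside the container yields a 44-byte message. The container's rta_len is 12.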
int do_mdb(int argc, char **argv)
{
ll_init_map(&rth);
if (argc > 0) {
if (matches(*argv, "add") == 0)
return mdb_modify(RTM_NEWMDB, NLM_F_CREATE|NLM_F_EXCL, argc-1, argv+1);
if (matches(*argv, "delete") == 0)
return mdb_modify(RTM_DELMDB, 0, argc-1, argv+1);
if (matches(*argv, "show") == 0 ||
matches(*argv, "lst") == 0 ||
matches(*argv, "list") == 0)
return mdb_show(argc-1, argv+1);
if (matches(*argv, "help") == 0)
usage();
} else
return mdb_show(0, NULL);
fprintf(stderr, "Command \"%s\" is unknown, try \"bridge mdb help\".\n", *argv);
exit(-1);
}

bridge/monitor.c (160 lines changed)

@ -0,0 +1,160 @@
/*
* brmonitor.c "bridge monitor"
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version
* 2 of the License, or (at your option) any later version.
*
* Authors: Stephen Hemminger <shemminger@vyatta.com>
*
*/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <net/if.h>
#include <netinet/in.h>
#include <linux/if_bridge.h>
#include <linux/neighbour.h>
#include <string.h>
#include "utils.h"
#include "br_common.h"
static void usage(void) __attribute__((noreturn));
static int prefix_banner;
static void usage(void)
{
fprintf(stderr, "Usage: bridge monitor [file | link | fdb | mdb | vlan | all]\n");
exit(-1);
}
static int accept_msg(struct rtnl_ctrl_data *ctrl,
struct nlmsghdr *n, void *arg)
{
FILE *fp = arg;
if (timestamp)
print_timestamp(fp);
switch (n->nlmsg_type) {
case RTM_NEWLINK:
case RTM_DELLINK:
if (prefix_banner)
fprintf(fp, "[LINK]");
return print_linkinfo(n, arg);
case RTM_NEWNEIGH:
case RTM_DELNEIGH:
if (prefix_banner)
fprintf(fp, "[NEIGH]");
return print_fdb(n, arg);
case RTM_NEWMDB:
case RTM_DELMDB:
if (prefix_banner)
fprintf(fp, "[MDB]");
return print_mdb_mon(n, arg);
case NLMSG_TSTAMP:
print_nlmsg_timestamp(fp, n);
return 0;
case RTM_NEWVLAN:
case RTM_DELVLAN:
if (prefix_banner)
fprintf(fp, "[VLAN]");
return print_vlan_rtm(n, arg, true, false);
default:
return 0;
}
}
int do_monitor(int argc, char **argv)
{
char *file = NULL;
unsigned int groups = ~RTMGRP_TC;
int llink = 0;
int lneigh = 0;
int lmdb = 0;
int lvlan = 0;
rtnl_close(&rth);
while (argc > 0) {
if (matches(*argv, "file") == 0) {
NEXT_ARG();
file = *argv;
} else if (matches(*argv, "link") == 0) {
llink = 1;
groups = 0;
} else if (matches(*argv, "fdb") == 0) {
lneigh = 1;
groups = 0;
} else if (matches(*argv, "mdb") == 0) {
lmdb = 1;
groups = 0;
} else if (matches(*argv, "vlan") == 0) {
lvlan = 1;
groups = 0;
} else if (strcmp(*argv, "all") == 0) {
groups = ~RTMGRP_TC;
lvlan = 1;
prefix_banner = 1;
} else if (matches(*argv, "help") == 0) {
usage();
} else {
fprintf(stderr, "Argument \"%s\" is unknown, try \"bridge monitor help\".\n", *argv);
exit(-1);
}
argc--; argv++;
}
if (llink)
groups |= nl_mgrp(RTNLGRP_LINK);
if (lneigh) {
groups |= nl_mgrp(RTNLGRP_NEIGH);
}
if (lmdb) {
groups |= nl_mgrp(RTNLGRP_MDB);
}
if (file) {
FILE *fp;
int err;
fp = fopen(file, "r");
if (fp == NULL) {
perror("Cannot fopen");
exit(-1);
}
err = rtnl_from_file(fp, accept_msg, stdout);
fclose(fp);
return err;
}
if (rtnl_open(&rth, groups) < 0)
exit(1);
if (lvlan && rtnl_add_nl_group(&rth, RTNLGRP_BRVLAN) < 0) {
fprintf(stderr, "Failed to add bridge vlan group to list\n");
exit(1);
}
ll_init_map(&rth);
if (rtnl_listen(&rth, accept_msg, stdout) < 0)
exit(2);
return 0;
}
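do_monitor() subscribes to rtnetlink groups in two ways. RTNLGRP_LINK, RTNLGRP_NEIGH and RTNLGRP_MDB are folded with nl_mgrp() into the bitmask passed to rtnl_open(). RTNLGRP_BRVLAN is joined afterwards with rtnl_add_nl_group(), which presumably issues setsockopt(NETLINK_ADD_MEMBERSHIP). The sketch below shows why two paths exist. It assumes that the legacy mask (sockaddr_nl.nl_groups) is 32 bits wide and can therefore only represent group numbers 1 to 32. legacy_group_bit() and mask_for() are hypothetical helpers, not iproute2's nl_mgrp().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/rtnetlink.h>

/* Bit for one RTNLGRP_* group in the legacy 32-bit membership mask, or 0
 * when the group number cannot be represented there. */
static uint32_t legacy_group_bit(unsigned int group)
{
	if (group == 0 || group > 32)
		return 0;
	return UINT32_C(1) << (group - 1);
}

/* Fold a list of groups into one mask; *needs_setsockopt is set when some
 * group must be joined separately with NETLINK_ADD_MEMBERSHIP. */
static uint32_t mask_for(const unsigned int *groups, size_t n,
			 int *needs_setsockopt)
{
	uint32_t mask = 0;
	size_t i;

	assert(needs_setsockopt != NULL);
	*needs_setsockopt = 0;
	for (i = 0; i < n; i++) {
		uint32_t bit = legacy_group_bit(groups[i]);

		if (bit)
			mask |= bit;
		else
			*needs_setsockopt = 1;
	}
	return mask;
}
```

The legacy RTMGRP_* constants follow the same rule. For example, RTNLGRP_NEIGH is group 3, and RTMGRP_NEIGH is bit 1 << 2.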

bridge/vlan.c (1359 lines changed)

File diff suppressed because it is too large.

configure (603 lines changed)

@ -1,14 +1,33 @@
#! /bin/bash
# This is not an autconf generated configure
#
INCLUDE=${1:-"$PWD/include"}
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
# This is not an autoconf generated configure
echo "# Generated config based on" $INCLUDE >Config
INCLUDE="$PWD/include"
PREFIX="/usr"
LIBDIR="\${prefix}/lib"
echo "TC schedulers"
# Output file which is input to Makefile
CONFIG=config.mk
echo -n " ATM "
cat >/tmp/atmtest.c <<EOF
# Make a temp directory in build tree.
TMPDIR=$(mktemp -d config.XXXXXX)
trap 'status=$?; rm -rf $TMPDIR; exit $status' EXIT HUP INT QUIT TERM
check_toolchain()
{
: ${PKG_CONFIG:=pkg-config}
: ${AR=ar}
: ${CC=gcc}
: ${YACC=bison}
echo "PKG_CONFIG:=${PKG_CONFIG}" >>$CONFIG
echo "AR:=${AR}" >>$CONFIG
echo "CC:=${CC}" >>$CONFIG
echo "YACC:=${YACC}" >>$CONFIG
}
check_atm()
{
cat >$TMPDIR/atmtest.c <<EOF
#include <atm.h>
int main(int argc, char **argv) {
struct atm_qos qos;
@ -16,20 +35,60 @@ int main(int argc, char **argv) {
return 0;
}
EOF
gcc -I$INCLUDE -o /tmp/atmtest /tmp/atmtest.c -latm >/dev/null 2>&1
if [ $? -eq 0 ]
then
echo "TC_CONFIG_ATM:=y" >>Config
echo yes
else
echo no
fi
rm -f /tmp/atmtest.c /tmp/atmtest
echo -n " IPT "
if $CC -I$INCLUDE -o $TMPDIR/atmtest $TMPDIR/atmtest.c -latm >/dev/null 2>&1; then
echo "TC_CONFIG_ATM:=y" >>$CONFIG
echo yes
else
echo no
fi
rm -f $TMPDIR/atmtest.c $TMPDIR/atmtest
}
#check if we need dont our internal header ..
cat >/tmp/ipttest.c <<EOF
check_xtables()
{
if ! ${PKG_CONFIG} xtables --exists; then
echo "TC_CONFIG_NO_XT:=y" >>$CONFIG
fi
}
check_xt()
{
#check if we have xtables from iptables >= 1.4.5.
cat >$TMPDIR/ipttest.c <<EOF
#include <xtables.h>
#include <linux/netfilter.h>
static struct xtables_globals test_globals = {
.option_offset = 0,
.program_name = "tc-ipt",
.program_version = XTABLES_VERSION,
.orig_opts = NULL,
.opts = NULL,
.exit_err = NULL,
};
int main(int argc, char **argv)
{
xtables_init_all(&test_globals, NFPROTO_IPV4);
return 0;
}
EOF
if $CC -I$INCLUDE $IPTC -o $TMPDIR/ipttest $TMPDIR/ipttest.c $IPTL \
$(${PKG_CONFIG} xtables --cflags --libs) -ldl >/dev/null 2>&1; then
echo "TC_CONFIG_XT:=y" >>$CONFIG
echo "using xtables"
fi
rm -f $TMPDIR/ipttest.c $TMPDIR/ipttest
}
check_xt_old()
{
# bail if previous XT checks have already succeeded.
grep -q TC_CONFIG_XT $CONFIG && return
#check if we don't need our internal header ..
cat >$TMPDIR/ipttest.c <<EOF
#include <xtables.h>
char *lib_dir;
unsigned int global_option_offset = 0;
@ -49,18 +108,21 @@ int main(int argc, char **argv) {
}
EOF
gcc -I$INCLUDE $IPTC -o /tmp/ipttest /tmp/ipttest.c $IPTL -ldl >/dev/null 2>&1
if [ $? -eq 0 ]
then
echo "TC_CONFIG_XT:=y" >>Config
echo "using xtables seems no need for internal.h"
else
echo "failed test 2"
fi
if $CC -I$INCLUDE $IPTC -o $TMPDIR/ipttest $TMPDIR/ipttest.c $IPTL -ldl >/dev/null 2>&1; then
echo "TC_CONFIG_XT_OLD:=y" >>$CONFIG
echo "using old xtables (no need for xt-internal.h)"
fi
rm -f $TMPDIR/ipttest.c $TMPDIR/ipttest
}
#check if we need our own internal.h
cat >/tmp/ipttest.c <<EOF
check_xt_old_internal_h()
{
# bail if previous XT checks have already succeeded.
grep -q TC_CONFIG_XT $CONFIG && return
#check if we need our own internal.h
cat >$TMPDIR/ipttest.c <<EOF
#include <xtables.h>
#include "xt-internal.h"
char *lib_dir;
@ -81,14 +143,481 @@ int main(int argc, char **argv) {
}
EOF
gcc -I$INCLUDE $IPTC -o /tmp/ipttest /tmp/ipttest.c $IPTL -ldl >/dev/null 2>&1
if $CC -I$INCLUDE $IPTC -o $TMPDIR/ipttest $TMPDIR/ipttest.c $IPTL -ldl >/dev/null 2>&1; then
echo "using old xtables with xt-internal.h"
echo "TC_CONFIG_XT_OLD_H:=y" >>$CONFIG
fi
rm -f $TMPDIR/ipttest.c $TMPDIR/ipttest
}
if [ $? -eq 0 ]
then
echo "using xtables instead of iptables (need for internal.h)"
echo "TC_CONFIG_XT_H:=y" >>Config
check_lib_dir()
{
LIBDIR=$(echo $LIBDIR | sed "s|\${prefix}|$PREFIX|")
echo -n "lib directory: "
echo "$LIBDIR"
echo "LIBDIR:=$LIBDIR" >> $CONFIG
}
check_ipt()
{
if ! grep TC_CONFIG_XT $CONFIG > /dev/null; then
echo "using iptables"
fi
}
check_ipt_lib_dir()
{
IPT_LIB_DIR=$(${PKG_CONFIG} --variable=xtlibdir xtables)
if [ -n "$IPT_LIB_DIR" ]; then
echo $IPT_LIB_DIR
echo "IPT_LIB_DIR:=$IPT_LIB_DIR" >> $CONFIG
return
fi
for dir in /lib /usr/lib /usr/local/lib; do
for file in "xtables" "iptables"; do
file="$dir/$file/lib*t_*so"
if [ -f $file ]; then
echo ${file%/*}
echo "IPT_LIB_DIR:=${file%/*}" >> $CONFIG
return
fi
done
done
echo "not found!"
}
check_setns()
{
cat >$TMPDIR/setnstest.c <<EOF
#include <sched.h>
int main(int argc, char **argv)
{
(void)setns(0,0);
return 0;
}
EOF
if $CC -I$INCLUDE -o $TMPDIR/setnstest $TMPDIR/setnstest.c >/dev/null 2>&1; then
echo "IP_CONFIG_SETNS:=y" >>$CONFIG
echo "yes"
echo "CFLAGS += -DHAVE_SETNS" >>$CONFIG
else
echo "no"
fi
rm -f $TMPDIR/setnstest.c $TMPDIR/setnstest
}
check_name_to_handle_at()
{
cat >$TMPDIR/name_to_handle_at_test.c <<EOF
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int main(int argc, char **argv)
{
struct file_handle *fhp;
int mount_id, flags, dirfd;
char *pathname;
name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
return 0;
}
EOF
if $CC -I$INCLUDE -o $TMPDIR/name_to_handle_at_test $TMPDIR/name_to_handle_at_test.c >/dev/null 2>&1; then
echo "yes"
echo "CFLAGS += -DHAVE_HANDLE_AT" >>$CONFIG
else
echo "no"
fi
rm -f $TMPDIR/name_to_handle_at_test.c $TMPDIR/name_to_handle_at_test
}
check_ipset()
{
cat >$TMPDIR/ipsettest.c <<EOF
#include <linux/netfilter/ipset/ip_set.h>
#ifndef IP_SET_INVALID
#define IPSET_DIM_MAX 3
typedef unsigned short ip_set_id_t;
#endif
#include <linux/netfilter/xt_set.h>
struct xt_set_info info;
#if IPSET_PROTOCOL == 6 || IPSET_PROTOCOL == 7
int main(void)
{
return IPSET_MAXNAMELEN;
}
#else
#error unknown ipset version
#endif
EOF
if $CC -I$INCLUDE -o $TMPDIR/ipsettest $TMPDIR/ipsettest.c >/dev/null 2>&1; then
echo "TC_CONFIG_IPSET:=y" >>$CONFIG
echo "yes"
else
echo "no"
fi
rm -f $TMPDIR/ipsettest.c $TMPDIR/ipsettest
}
check_elf()
{
if ${PKG_CONFIG} libelf --exists; then
echo "HAVE_ELF:=y" >>$CONFIG
echo "yes"
echo 'CFLAGS += -DHAVE_ELF' `${PKG_CONFIG} libelf --cflags` >> $CONFIG
echo 'LDLIBS += ' `${PKG_CONFIG} libelf --libs` >>$CONFIG
else
echo "no"
fi
}
have_libbpf_basic()
{
cat >$TMPDIR/libbpf_test.c <<EOF
#include <bpf/libbpf.h>
int main(int argc, char **argv) {
bpf_program__set_autoload(NULL, false);
bpf_map__ifindex(NULL);
bpf_map__set_pin_path(NULL, NULL);
bpf_object__open_file(NULL, NULL);
return 0;
}
EOF
$CC -o $TMPDIR/libbpf_test $TMPDIR/libbpf_test.c $LIBBPF_CFLAGS $LIBBPF_LDLIBS >/dev/null 2>&1
local ret=$?
rm -f $TMPDIR/libbpf_test.c $TMPDIR/libbpf_test
return $ret
}
have_libbpf_sec_name()
{
cat >$TMPDIR/libbpf_sec_test.c <<EOF
#include <bpf/libbpf.h>
int main(int argc, char **argv) {
void *ptr;
bpf_program__section_name(NULL);
return 0;
}
EOF
$CC -o $TMPDIR/libbpf_sec_test $TMPDIR/libbpf_sec_test.c $LIBBPF_CFLAGS $LIBBPF_LDLIBS >/dev/null 2>&1
local ret=$?
rm -f $TMPDIR/libbpf_sec_test.c $TMPDIR/libbpf_sec_test
return $ret
}
check_force_libbpf_on()
{
# If LIBBPF_FORCE=on is set but libbpf support is missing, exit the config
# process so that we don't build without libbpf.
if [ "$LIBBPF_FORCE" = on ]; then
echo " LIBBPF_FORCE=on set, but couldn't find a usable libbpf"
exit 1
fi
}
check_libbpf()
{
# If LIBBPF_FORCE=off is set, disable libbpf entirely
if [ "$LIBBPF_FORCE" = off ]; then
echo "no"
return
fi
if ! ${PKG_CONFIG} libbpf --exists && [ -z "$LIBBPF_DIR" ] ; then
echo "no"
check_force_libbpf_on
return
fi
if [ $(uname -m) = x86_64 ]; then
local LIBBPF_LIBDIR="${LIBBPF_DIR}/usr/lib64"
else
local LIBBPF_LIBDIR="${LIBBPF_DIR}/usr/lib"
fi
if [ -n "$LIBBPF_DIR" ]; then
LIBBPF_CFLAGS="-I${LIBBPF_DIR}/usr/include"
LIBBPF_LDLIBS="${LIBBPF_LIBDIR}/libbpf.a -lz -lelf"
LIBBPF_VERSION=$(PKG_CONFIG_LIBDIR=${LIBBPF_LIBDIR}/pkgconfig ${PKG_CONFIG} libbpf --modversion)
else
LIBBPF_CFLAGS=$(${PKG_CONFIG} libbpf --cflags)
LIBBPF_LDLIBS=$(${PKG_CONFIG} libbpf --libs)
LIBBPF_VERSION=$(${PKG_CONFIG} libbpf --modversion)
fi
if ! have_libbpf_basic; then
echo "no"
echo " libbpf version $LIBBPF_VERSION is too low, please update it to at least 0.1.0"
check_force_libbpf_on
return
else
echo "HAVE_LIBBPF:=y" >> $CONFIG
echo 'CFLAGS += -DHAVE_LIBBPF ' $LIBBPF_CFLAGS >> $CONFIG
echo "CFLAGS += -DLIBBPF_VERSION=\\\"$LIBBPF_VERSION\\\"" >> $CONFIG
echo 'LDLIBS += ' $LIBBPF_LDLIBS >> $CONFIG
if [ -z "$LIBBPF_DIR" ]; then
echo "CFLAGS += -DLIBBPF_DYNAMIC" >> $CONFIG
fi
fi
# bpf_program__title() is deprecated since libbpf 0.2.0; use
# bpf_program__section_name() instead if it is available
if have_libbpf_sec_name; then
echo "HAVE_LIBBPF_SECTION_NAME:=y" >> $CONFIG
echo 'CFLAGS += -DHAVE_LIBBPF_SECTION_NAME ' >> $CONFIG
fi
echo "yes"
echo " libbpf version $LIBBPF_VERSION"
}
check_selinux()
# SELinux is a compile time option in the ss utility
{
if ${PKG_CONFIG} libselinux --exists; then
echo "HAVE_SELINUX:=y" >>$CONFIG
echo "yes"
echo 'LDLIBS +=' `${PKG_CONFIG} --libs libselinux` >>$CONFIG
echo 'CFLAGS += -DHAVE_SELINUX' `${PKG_CONFIG} --cflags libselinux` >>$CONFIG
else
echo "no"
fi
}
check_mnl()
{
if ${PKG_CONFIG} libmnl --exists; then
echo "HAVE_MNL:=y" >>$CONFIG
echo "yes"
echo 'CFLAGS += -DHAVE_LIBMNL' `${PKG_CONFIG} libmnl --cflags` >>$CONFIG
echo 'LDLIBS +=' `${PKG_CONFIG} libmnl --libs` >> $CONFIG
else
echo "no"
fi
}
check_berkeley_db()
{
cat >$TMPDIR/dbtest.c <<EOF
#include <fcntl.h>
#include <stdlib.h>
#include <db_185.h>
int main(int argc, char **argv) {
dbopen("/tmp/xxx_test_db.db", O_CREAT|O_RDWR, 0644, DB_HASH, NULL);
return 0;
}
EOF
if $CC -I$INCLUDE -o $TMPDIR/dbtest $TMPDIR/dbtest.c -ldb >/dev/null 2>&1; then
echo "HAVE_BERKELEY_DB:=y" >>$CONFIG
echo "yes"
else
echo "no"
fi
rm -f $TMPDIR/dbtest.c $TMPDIR/dbtest
}
check_strlcpy()
{
cat >$TMPDIR/strtest.c <<EOF
#include <string.h>
int main(int argc, char **argv) {
char dst[10];
strlcpy(dst, "test", sizeof(dst));
return 0;
}
EOF
if $CC -I$INCLUDE -o $TMPDIR/strtest $TMPDIR/strtest.c >/dev/null 2>&1; then
echo "no"
else
if ${PKG_CONFIG} libbsd --exists; then
echo 'CFLAGS += -DHAVE_LIBBSD' `${PKG_CONFIG} libbsd --cflags` >>$CONFIG
echo 'LDLIBS +=' `${PKG_CONFIG} libbsd --libs` >> $CONFIG
echo "no"
else
echo 'CFLAGS += -DNEED_STRLCPY' >>$CONFIG
echo "yes"
fi
fi
rm -f $TMPDIR/strtest.c $TMPDIR/strtest
}
check_cap()
{
if ${PKG_CONFIG} libcap --exists; then
echo "HAVE_CAP:=y" >>$CONFIG
echo "yes"
echo 'CFLAGS += -DHAVE_LIBCAP' `${PKG_CONFIG} libcap --cflags` >>$CONFIG
echo 'LDLIBS +=' `${PKG_CONFIG} libcap --libs` >> $CONFIG
else
echo "no"
fi
}
quiet_config()
{
cat <<EOF
# user can control verbosity similar to kernel builds (e.g., V=1)
ifeq ("\$(origin V)", "command line")
VERBOSE = \$(V)
endif
ifndef VERBOSE
VERBOSE = 0
endif
ifeq (\$(VERBOSE),1)
Q =
else
echo "failed test 3 using iptables"
Q = @
endif
ifeq (\$(VERBOSE), 0)
QUIET_CC = @echo ' CC '\$@;
QUIET_AR = @echo ' AR '\$@;
QUIET_LINK = @echo ' LINK '\$@;
QUIET_YACC = @echo ' YACC '\$@;
QUIET_LEX = @echo ' LEX '\$@;
endif
EOF
}
usage()
{
cat <<EOF
Usage: $0 [OPTIONS]
--include_dir <dir> Path to iproute2 include dir
--libdir <dir> Path to iproute2 lib dir
--libbpf_dir <dir> Path to libbpf DESTDIR
--libbpf_force <on|off> Enable/disable libbpf by force. Available options:
on: require link against libbpf, quit config if no libbpf support
off: disable libbpf probing
--prefix <dir> Path prefix of the lib files to install
-h | --help Show this usage info
EOF
exit $1
}
# Compat with the old INCLUDE path setting method.
if [ $# -eq 1 ] && [ "$(echo $1 | cut -c 1)" != '-' ]; then
INCLUDE="$1"
else
while [ "$#" -gt 0 ]; do
case "$1" in
--include_dir)
shift
INCLUDE="$1" ;;
--include_dir=*)
INCLUDE="${1#*=}" ;;
--libdir)
shift
LIBDIR="$1" ;;
--libdir=*)
LIBDIR="${1#*=}" ;;
--libbpf_dir)
shift
LIBBPF_DIR="$1" ;;
--libbpf_dir=*)
LIBBPF_DIR="${1#*=}" ;;
--libbpf_force)
shift
LIBBPF_FORCE="$1" ;;
--libbpf_force=*)
LIBBPF_FORCE="${1#*=}" ;;
--prefix)
shift
PREFIX="$1" ;;
--prefix=*)
PREFIX="${1#*=}" ;;
-h | --help)
usage 0 ;;
--*)
;;
*)
usage 1 ;;
esac
[ "$#" -gt 0 ] && shift
done
fi
rm -f /tmp/ipttest.c /tmp/ipttest
[ -d "$INCLUDE" ] || usage 1
if [ "${LIBBPF_DIR-unused}" != "unused" ]; then
[ -d "$LIBBPF_DIR" ] || usage 1
fi
if [ "${LIBBPF_FORCE-unused}" != "unused" ]; then
if [ "$LIBBPF_FORCE" != 'on' ] && [ "$LIBBPF_FORCE" != 'off' ]; then
usage 1
fi
fi
[ -z "$PREFIX" ] && usage 1
[ -z "$LIBDIR" ] && usage 1
echo "# Generated config based on" $INCLUDE >$CONFIG
quiet_config >> $CONFIG
check_toolchain
echo "TC schedulers"
echo -n " ATM "
check_atm
check_xtables
if ! grep -q TC_CONFIG_NO_XT $CONFIG; then
echo -n " IPT "
check_xt
check_xt_old
check_xt_old_internal_h
check_ipt
echo -n " IPSET "
check_ipset
fi
echo
check_lib_dir
if ! grep -q TC_CONFIG_NO_XT $CONFIG; then
echo -n "iptables modules directory: "
check_ipt_lib_dir
fi
echo -n "libc has setns: "
check_setns
echo -n "libc has name_to_handle_at: "
check_name_to_handle_at
echo -n "SELinux support: "
check_selinux
echo -n "libbpf support: "
check_libbpf
echo -n "ELF support: "
check_elf
echo -n "libmnl support: "
check_mnl
echo -n "Berkeley DB: "
check_berkeley_db
echo -n "need for strlcpy: "
check_strlcpy
echo -n "libcap support: "
check_cap
echo >> $CONFIG
echo "%.o: %.c" >> $CONFIG
echo ' $(QUIET_CC)$(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(CPPFLAGS) -c -o $@ $<' >> $CONFIG
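check_strlcpy() compiles a probe program. If libc lacks strlcpy() and libbsd is not available, the probe adds -DNEED_STRLCPY, and iproute2 then presumably compiles its own copy. The sketch below shows the BSD semantics such a fallback has to provide. `fallback_strlcpy` is a hypothetical name, chosen so that it cannot clash with a libc that already ships strlcpy().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical fallback with BSD strlcpy() semantics: copy at most
 * size - 1 bytes, always NUL-terminate when size > 0, and return
 * strlen(src) so callers can detect truncation (return value >= size). */
static size_t fallback_strlcpy(char *dst, const char *src, size_t size)
{
	size_t srclen;

	assert(src != NULL);
	srclen = strlen(src);
	if (size) {
		size_t n = srclen >= size ? size - 1 : srclen;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return srclen;
}
```

Returning the full source length, rather than the number of bytes copied, lets a caller check `if (fallback_strlcpy(buf, s, sizeof(buf)) >= sizeof(buf))` to detect truncation.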

dcb/.gitignore (1 line changed)

@ -0,0 +1 @@
dcb

dcb/Makefile (31 lines changed)

@ -0,0 +1,31 @@
# SPDX-License-Identifier: GPL-2.0
include ../config.mk
TARGETS :=
ifeq ($(HAVE_MNL),y)
DCBOBJ = dcb.o \
dcb_app.o \
dcb_buffer.o \
dcb_dcbx.o \
dcb_ets.o \
dcb_maxrate.o \
dcb_pfc.o
TARGETS += dcb
LDLIBS += -lm
endif
all: $(TARGETS) $(LIBS)
dcb: $(DCBOBJ) $(LIBNETLINK)
$(QUIET_LINK)$(CC) $^ $(LDFLAGS) $(LDLIBS) -o $@
install: all
for i in $(TARGETS); \
do install -m 0755 $$i $(DESTDIR)$(SBINDIR); \
done
clean:
rm -f $(DCBOBJ) $(TARGETS)

dcb/dcb.c (611 lines changed)

@ -0,0 +1,611 @@
// SPDX-License-Identifier: GPL-2.0+
#include <inttypes.h>
#include <stdio.h>
#include <linux/dcbnl.h>
#include <libmnl/libmnl.h>
#include <getopt.h>
#include "dcb.h"
#include "mnl_utils.h"
#include "namespace.h"
#include "utils.h"
#include "version.h"
static int dcb_init(struct dcb *dcb)
{
dcb->buf = malloc(MNL_SOCKET_BUFFER_SIZE);
if (dcb->buf == NULL) {
perror("Netlink buffer allocation");
return -1;
}
dcb->nl = mnlu_socket_open(NETLINK_ROUTE);
if (dcb->nl == NULL) {
perror("Open netlink socket");
goto err_socket_open;
}
new_json_obj_plain(dcb->json_output);
return 0;
err_socket_open:
free(dcb->buf);
return -1;
}
static void dcb_fini(struct dcb *dcb)
{
delete_json_obj_plain();
mnl_socket_close(dcb->nl);
free(dcb->buf);
}
static struct dcb *dcb_alloc(void)
{
struct dcb *dcb;
dcb = calloc(1, sizeof(*dcb));
if (!dcb)
return NULL;
return dcb;
}
static void dcb_free(struct dcb *dcb)
{
free(dcb);
}
struct dcb_get_attribute {
struct dcb *dcb;
int attr;
void *payload;
__u16 payload_len;
};
static int dcb_get_attribute_attr_ieee_cb(const struct nlattr *attr, void *data)
{
struct dcb_get_attribute *ga = data;
if (mnl_attr_get_type(attr) != ga->attr)
return MNL_CB_OK;
ga->payload = mnl_attr_get_payload(attr);
ga->payload_len = mnl_attr_get_payload_len(attr);
return MNL_CB_STOP;
}
static int dcb_get_attribute_attr_cb(const struct nlattr *attr, void *data)
{
if (mnl_attr_get_type(attr) != DCB_ATTR_IEEE)
return MNL_CB_OK;
return mnl_attr_parse_nested(attr, dcb_get_attribute_attr_ieee_cb, data);
}
static int dcb_get_attribute_cb(const struct nlmsghdr *nlh, void *data)
{
return mnl_attr_parse(nlh, sizeof(struct dcbmsg), dcb_get_attribute_attr_cb, data);
}
static int dcb_get_attribute_bare_cb(const struct nlmsghdr *nlh, void *data)
{
/* Bare attributes (e.g. DCB_ATTR_DCBX) are not wrapped inside an IEEE
* container, so this does not have to go through unpacking in
* dcb_get_attribute_attr_cb().
*/
return mnl_attr_parse(nlh, sizeof(struct dcbmsg),
dcb_get_attribute_attr_ieee_cb, data);
}
struct dcb_set_attribute_response {
int response_attr;
};
static int dcb_set_attribute_attr_cb(const struct nlattr *attr, void *data)
{
struct dcb_set_attribute_response *resp = data;
uint16_t len;
uint8_t err;
if (mnl_attr_get_type(attr) != resp->response_attr)
return MNL_CB_OK;
len = mnl_attr_get_payload_len(attr);
if (len != 1) {
fprintf(stderr, "Response attribute expected to have size 1, not %d\n", len);
return MNL_CB_ERROR;
}
err = mnl_attr_get_u8(attr);
if (err) {
fprintf(stderr, "Error when attempting to set attribute: %s\n",
strerror(err));
return MNL_CB_ERROR;
}
return MNL_CB_STOP;
}
static int dcb_set_attribute_cb(const struct nlmsghdr *nlh, void *data)
{
return mnl_attr_parse(nlh, sizeof(struct dcbmsg), dcb_set_attribute_attr_cb, data);
}
static int dcb_talk(struct dcb *dcb, struct nlmsghdr *nlh, mnl_cb_t cb, void *data)
{
int ret;
ret = mnl_socket_sendto(dcb->nl, nlh, nlh->nlmsg_len);
if (ret < 0) {
perror("mnl_socket_sendto");
return -1;
}
return mnlu_socket_recv_run(dcb->nl, nlh->nlmsg_seq, dcb->buf, MNL_SOCKET_BUFFER_SIZE,
cb, data);
}
static struct nlmsghdr *dcb_prepare(struct dcb *dcb, const char *dev,
uint32_t nlmsg_type, uint8_t dcb_cmd)
{
struct dcbmsg dcbm = {
.cmd = dcb_cmd,
};
struct nlmsghdr *nlh;
nlh = mnlu_msg_prepare(dcb->buf, nlmsg_type, NLM_F_REQUEST, &dcbm, sizeof(dcbm));
mnl_attr_put_strz(nlh, DCB_ATTR_IFNAME, dev);
return nlh;
}
static int __dcb_get_attribute(struct dcb *dcb, int command,
const char *dev, int attr,
void **payload_p, __u16 *payload_len_p,
int (*get_attribute_cb)(const struct nlmsghdr *nlh,
void *data))
{
struct dcb_get_attribute ga;
struct nlmsghdr *nlh;
int ret;
nlh = dcb_prepare(dcb, dev, RTM_GETDCB, command);
ga = (struct dcb_get_attribute) {
.dcb = dcb,
.attr = attr,
.payload = NULL,
};
ret = dcb_talk(dcb, nlh, get_attribute_cb, &ga);
if (ret) {
perror("Attribute read");
return ret;
}
if (ga.payload == NULL) {
fprintf(stderr, "Attribute not found\n");
return -ENOENT;
}
*payload_p = ga.payload;
*payload_len_p = ga.payload_len;
return 0;
}
int dcb_get_attribute_va(struct dcb *dcb, const char *dev, int attr,
void **payload_p, __u16 *payload_len_p)
{
return __dcb_get_attribute(dcb, DCB_CMD_IEEE_GET, dev, attr,
payload_p, payload_len_p,
dcb_get_attribute_cb);
}
int dcb_get_attribute_bare(struct dcb *dcb, int cmd, const char *dev, int attr,
void **payload_p, __u16 *payload_len_p)
{
return __dcb_get_attribute(dcb, cmd, dev, attr,
payload_p, payload_len_p,
dcb_get_attribute_bare_cb);
}
int dcb_get_attribute(struct dcb *dcb, const char *dev, int attr, void *data, size_t data_len)
{
__u16 payload_len;
void *payload;
int ret;
ret = dcb_get_attribute_va(dcb, dev, attr, &payload, &payload_len);
if (ret)
return ret;
if (payload_len != data_len) {
fprintf(stderr, "Wrong len %d, expected %zu\n", payload_len, data_len);
return -EINVAL;
}
memcpy(data, payload, data_len);
return 0;
}
static int __dcb_set_attribute(struct dcb *dcb, int command, const char *dev,
int (*cb)(struct dcb *, struct nlmsghdr *, void *),
void *data, int response_attr)
{
struct dcb_set_attribute_response resp = {
.response_attr = response_attr,
};
struct nlmsghdr *nlh;
int ret;
nlh = dcb_prepare(dcb, dev, RTM_SETDCB, command);
ret = cb(dcb, nlh, data);
if (ret)
return ret;
ret = dcb_talk(dcb, nlh, dcb_set_attribute_cb, &resp);
if (ret) {
perror("Attribute write");
return ret;
}
return 0;
}
struct dcb_set_attribute_ieee_cb {
int (*cb)(struct dcb *dcb, struct nlmsghdr *nlh, void *data);
void *data;
};
static int dcb_set_attribute_ieee_cb(struct dcb *dcb, struct nlmsghdr *nlh, void *data)
{
struct dcb_set_attribute_ieee_cb *ieee_data = data;
struct nlattr *nest;
int ret;
nest = mnl_attr_nest_start(nlh, DCB_ATTR_IEEE);
ret = ieee_data->cb(dcb, nlh, ieee_data->data);
if (ret)
return ret;
mnl_attr_nest_end(nlh, nest);
return 0;
}
int dcb_set_attribute_va(struct dcb *dcb, int command, const char *dev,
int (*cb)(struct dcb *dcb, struct nlmsghdr *nlh, void *data),
void *data)
{
struct dcb_set_attribute_ieee_cb ieee_data = {
.cb = cb,
.data = data,
};
return __dcb_set_attribute(dcb, command, dev,
&dcb_set_attribute_ieee_cb, &ieee_data,
DCB_ATTR_IEEE);
}
struct dcb_set_attribute {
int attr;
const void *data;
size_t data_len;
};
static int dcb_set_attribute_put(struct dcb *dcb, struct nlmsghdr *nlh, void *data)
{
struct dcb_set_attribute *dsa = data;
mnl_attr_put(nlh, dsa->attr, dsa->data_len, dsa->data);
return 0;
}
int dcb_set_attribute(struct dcb *dcb, const char *dev, int attr, const void *data, size_t data_len)
{
struct dcb_set_attribute dsa = {
.attr = attr,
.data = data,
.data_len = data_len,
};
return dcb_set_attribute_va(dcb, DCB_CMD_IEEE_SET, dev,
&dcb_set_attribute_put, &dsa);
}
int dcb_set_attribute_bare(struct dcb *dcb, int command, const char *dev,
int attr, const void *data, size_t data_len,
int response_attr)
{
struct dcb_set_attribute dsa = {
.attr = attr,
.data = data,
.data_len = data_len,
};
return __dcb_set_attribute(dcb, command, dev,
&dcb_set_attribute_put, &dsa, response_attr);
}
void dcb_print_array_u8(const __u8 *array, size_t size)
{
SPRINT_BUF(b);
size_t i;
for (i = 0; i < size; i++) {
snprintf(b, sizeof(b), "%zu:%%d ", i);
print_uint(PRINT_ANY, NULL, b, array[i]);
}
}
void dcb_print_array_u64(const __u64 *array, size_t size)
{
SPRINT_BUF(b);
size_t i;
for (i = 0; i < size; i++) {
snprintf(b, sizeof(b), "%zu:%%" PRIu64 " ", i);
print_u64(PRINT_ANY, NULL, b, array[i]);
}
}
void dcb_print_array_on_off(const __u8 *array, size_t size)
{
SPRINT_BUF(b);
size_t i;
for (i = 0; i < size; i++) {
snprintf(b, sizeof(b), "%zu:%%s ", i);
print_on_off(PRINT_ANY, NULL, b, array[i]);
}
}
void dcb_print_array_kw(const __u8 *array, size_t array_size,
const char *const kw[], size_t kw_size)
{
SPRINT_BUF(b);
size_t i;
for (i = 0; i < array_size; i++) {
__u8 emt = array[i];
snprintf(b, sizeof(b), "%zu:%%s ", i);
if (emt < kw_size && kw[emt])
print_string(PRINT_ANY, NULL, b, kw[emt]);
else
print_string(PRINT_ANY, NULL, b, "???");
}
}
void dcb_print_named_array(const char *json_name, const char *fp_name,
const __u8 *array, size_t size,
void (*print_array)(const __u8 *, size_t))
{
open_json_array(PRINT_JSON, json_name);
print_string(PRINT_FP, NULL, "%s ", fp_name);
print_array(array, size);
close_json_array(PRINT_JSON, json_name);
}
int dcb_parse_mapping(const char *what_key, __u32 key, __u32 max_key,
const char *what_value, __u64 value, __u64 max_value,
void (*set_array)(__u32 index, __u64 value, void *data),
void *set_array_data)
{
bool is_all = key == (__u32) -1;
if (!is_all && key > max_key) {
fprintf(stderr, "In %s:%s mapping, %s is expected to be 0..%u\n",
what_key, what_value, what_key, max_key);
return -EINVAL;
}
if (value > max_value) {
fprintf(stderr, "In %s:%s mapping, %s is expected to be 0..%llu\n",
what_key, what_value, what_value, max_value);
return -EINVAL;
}
if (is_all) {
for (key = 0; key <= max_key; key++)
set_array(key, value, set_array_data);
} else {
set_array(key, value, set_array_data);
}
return 0;
}
void dcb_set_u8(__u32 key, __u64 value, void *data)
{
__u8 *array = data;
array[key] = value;
}
void dcb_set_u32(__u32 key, __u64 value, void *data)
{
__u32 *array = data;
array[key] = value;
}
void dcb_set_u64(__u32 key, __u64 value, void *data)
{
__u64 *array = data;
array[key] = value;
}
int dcb_cmd_parse_dev(struct dcb *dcb, int argc, char **argv,
int (*and_then)(struct dcb *dcb, const char *dev,
int argc, char **argv),
void (*help)(void))
{
const char *dev;
if (!argc || matches(*argv, "help") == 0) {
help();
return 0;
} else if (matches(*argv, "dev") == 0) {
NEXT_ARG();
dev = *argv;
if (check_ifname(dev)) {
invarg("not a valid ifname", *argv);
return -EINVAL;
}
NEXT_ARG_FWD();
return and_then(dcb, dev, argc, argv);
} else {
fprintf(stderr, "Expected `dev DEV', not `%s'\n", *argv);
help();
return -EINVAL;
}
}
static void dcb_help(void)
{
fprintf(stderr,
"Usage: dcb [ OPTIONS ] OBJECT { COMMAND | help }\n"
" dcb [ -f | --force ] { -b | --batch } filename [ -n | --netns ] netnsname\n"
"where OBJECT := { app | buffer | dcbx | ets | maxrate | pfc }\n"
" OPTIONS := [ -V | --Version | -i | --iec | -j | --json\n"
" | -N | --Numeric | -p | --pretty\n"
" | -s | --statistics | -v | --verbose ]\n");
}
static int dcb_cmd(struct dcb *dcb, int argc, char **argv)
{
if (!argc || matches(*argv, "help") == 0) {
dcb_help();
return 0;
} else if (matches(*argv, "app") == 0) {
return dcb_cmd_app(dcb, argc - 1, argv + 1);
} else if (matches(*argv, "buffer") == 0) {
return dcb_cmd_buffer(dcb, argc - 1, argv + 1);
} else if (matches(*argv, "dcbx") == 0) {
return dcb_cmd_dcbx(dcb, argc - 1, argv + 1);
} else if (matches(*argv, "ets") == 0) {
return dcb_cmd_ets(dcb, argc - 1, argv + 1);
} else if (matches(*argv, "maxrate") == 0) {
return dcb_cmd_maxrate(dcb, argc - 1, argv + 1);
} else if (matches(*argv, "pfc") == 0) {
return dcb_cmd_pfc(dcb, argc - 1, argv + 1);
}
fprintf(stderr, "Object \"%s\" is unknown\n", *argv);
return -ENOENT;
}
static int dcb_batch_cmd(int argc, char *argv[], void *data)
{
struct dcb *dcb = data;
return dcb_cmd(dcb, argc, argv);
}
static int dcb_batch(struct dcb *dcb, const char *name, bool force)
{
return do_batch(name, force, dcb_batch_cmd, dcb);
}
int main(int argc, char **argv)
{
static const struct option long_options[] = {
{ "Version", no_argument, NULL, 'V' },
{ "force", no_argument, NULL, 'f' },
{ "batch", required_argument, NULL, 'b' },
{ "iec", no_argument, NULL, 'i' },
{ "json", no_argument, NULL, 'j' },
{ "Numeric", no_argument, NULL, 'N' },
{ "pretty", no_argument, NULL, 'p' },
{ "statistics", no_argument, NULL, 's' },
{ "netns", required_argument, NULL, 'n' },
{ "help", no_argument, NULL, 'h' },
{ NULL, 0, NULL, 0 }
};
const char *batch_file = NULL;
bool force = false;
struct dcb *dcb;
int opt;
int err;
int ret;
dcb = dcb_alloc();
if (!dcb) {
fprintf(stderr, "Failed to allocate memory for dcb\n");
return EXIT_FAILURE;
}
while ((opt = getopt_long(argc, argv, "b:fhijn:psvNV",
long_options, NULL)) >= 0) {
switch (opt) {
case 'V':
printf("dcb utility, iproute2-%s\n", version);
ret = EXIT_SUCCESS;
goto dcb_free;
case 'f':
force = true;
break;
case 'b':
batch_file = optarg;
break;
case 'j':
dcb->json_output = true;
break;
case 'N':
dcb->numeric = true;
break;
case 'p':
pretty = true;
break;
case 's':
dcb->stats = true;
break;
case 'n':
if (netns_switch(optarg)) {
ret = EXIT_FAILURE;
goto dcb_free;
}
break;
case 'i':
dcb->use_iec = true;
break;
case 'h':
dcb_help();
ret = EXIT_SUCCESS;
goto dcb_free;
default:
fprintf(stderr, "Unknown option.\n");
dcb_help();
ret = EXIT_FAILURE;
goto dcb_free;
}
}
argc -= optind;
argv += optind;
err = dcb_init(dcb);
if (err) {
ret = EXIT_FAILURE;
goto dcb_free;
}
if (batch_file)
err = dcb_batch(dcb, batch_file, force);
else
err = dcb_cmd(dcb, argc, argv);
if (err) {
ret = EXIT_FAILURE;
goto dcb_fini;
}
ret = EXIT_SUCCESS;
dcb_fini:
dcb_fini(dcb);
dcb_free:
dcb_free(dcb);
return ret;
}

dcb/dcb.h Normal file

@ -0,0 +1,81 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __DCB_H__
#define __DCB_H__ 1
#include <libmnl/libmnl.h>
#include <stdbool.h>
#include <stddef.h>
/* dcb.c */
struct dcb {
char *buf;
struct mnl_socket *nl;
bool json_output;
bool stats;
bool use_iec;
bool numeric;
};
int dcb_parse_mapping(const char *what_key, __u32 key, __u32 max_key,
const char *what_value, __u64 value, __u64 max_value,
void (*set_array)(__u32 index, __u64 value, void *data),
void *set_array_data);
int dcb_cmd_parse_dev(struct dcb *dcb, int argc, char **argv,
int (*and_then)(struct dcb *dcb, const char *dev,
int argc, char **argv),
void (*help)(void));
void dcb_set_u8(__u32 key, __u64 value, void *data);
void dcb_set_u32(__u32 key, __u64 value, void *data);
void dcb_set_u64(__u32 key, __u64 value, void *data);
int dcb_get_attribute(struct dcb *dcb, const char *dev, int attr,
void *data, size_t data_len);
int dcb_set_attribute(struct dcb *dcb, const char *dev, int attr,
const void *data, size_t data_len);
int dcb_get_attribute_va(struct dcb *dcb, const char *dev, int attr,
void **payload_p, __u16 *payload_len_p);
int dcb_set_attribute_va(struct dcb *dcb, int command, const char *dev,
int (*cb)(struct dcb *dcb, struct nlmsghdr *nlh, void *data),
void *data);
int dcb_get_attribute_bare(struct dcb *dcb, int cmd, const char *dev, int attr,
void **payload_p, __u16 *payload_len_p);
int dcb_set_attribute_bare(struct dcb *dcb, int command, const char *dev,
int attr, const void *data, size_t data_len,
int response_attr);
void dcb_print_named_array(const char *json_name, const char *fp_name,
const __u8 *array, size_t size,
void (*print_array)(const __u8 *, size_t));
void dcb_print_array_u8(const __u8 *array, size_t size);
void dcb_print_array_u64(const __u64 *array, size_t size);
void dcb_print_array_on_off(const __u8 *array, size_t size);
void dcb_print_array_kw(const __u8 *array, size_t array_size,
const char *const kw[], size_t kw_size);
/* dcb_app.c */
int dcb_cmd_app(struct dcb *dcb, int argc, char **argv);
/* dcb_buffer.c */
int dcb_cmd_buffer(struct dcb *dcb, int argc, char **argv);
/* dcb_dcbx.c */
int dcb_cmd_dcbx(struct dcb *dcb, int argc, char **argv);
/* dcb_ets.c */
int dcb_cmd_ets(struct dcb *dcb, int argc, char **argv);
/* dcb_maxrate.c */
int dcb_cmd_maxrate(struct dcb *dcb, int argc, char **argv);
/* dcb_pfc.c */
int dcb_cmd_pfc(struct dcb *dcb, int argc, char **argv);
#endif /* __DCB_H__ */

dcb/dcb_app.c Normal file

@ -0,0 +1,795 @@
// SPDX-License-Identifier: GPL-2.0+
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <libmnl/libmnl.h>
#include <linux/dcbnl.h>
#include "dcb.h"
#include "utils.h"
#include "rt_names.h"
static void dcb_app_help_add(void)
{
fprintf(stderr,
"Usage: dcb app { add | del | replace } dev STRING\n"
" [ default-prio PRIO ]\n"
" [ ethtype-prio ET:PRIO ]\n"
" [ stream-port-prio PORT:PRIO ]\n"
" [ dgram-port-prio PORT:PRIO ]\n"
" [ port-prio PORT:PRIO ]\n"
" [ dscp-prio DSCP:PRIO ]\n"
"\n"
" where PRIO := { 0 .. 7 }\n"
" ET := { 0x600 .. 0xffff }\n"
" PORT := { 1 .. 65535 }\n"
" DSCP := { 0 .. 63 }\n"
"\n"
);
}
static void dcb_app_help_show_flush(void)
{
fprintf(stderr,
"Usage: dcb app { show | flush } dev STRING\n"
" [ default-prio ]\n"
" [ ethtype-prio ]\n"
" [ stream-port-prio ]\n"
" [ dgram-port-prio ]\n"
" [ port-prio ]\n"
" [ dscp-prio ]\n"
"\n"
);
}
static void dcb_app_help(void)
{
fprintf(stderr,
"Usage: dcb app help\n"
"\n"
);
dcb_app_help_show_flush();
dcb_app_help_add();
}
struct dcb_app_table {
struct dcb_app *apps;
size_t n_apps;
};
static void dcb_app_table_fini(struct dcb_app_table *tab)
{
free(tab->apps);
}
static int dcb_app_table_push(struct dcb_app_table *tab, struct dcb_app *app)
{
struct dcb_app *apps = realloc(tab->apps, (tab->n_apps + 1) * sizeof(*tab->apps));
if (apps == NULL) {
perror("Cannot allocate APP table");
return -ENOMEM;
}
tab->apps = apps;
tab->apps[tab->n_apps++] = *app;
return 0;
}
static void dcb_app_table_remove_existing(struct dcb_app_table *a,
const struct dcb_app_table *b)
{
size_t ia, ja;
size_t ib;
for (ia = 0, ja = 0; ia < a->n_apps; ia++) {
struct dcb_app *aa = &a->apps[ia];
bool found = false;
for (ib = 0; ib < b->n_apps; ib++) {
const struct dcb_app *ab = &b->apps[ib];
if (aa->selector == ab->selector &&
aa->protocol == ab->protocol &&
aa->priority == ab->priority) {
found = true;
break;
}
}
if (!found)
a->apps[ja++] = *aa;
}
a->n_apps = ja;
}
static void dcb_app_table_remove_replaced(struct dcb_app_table *a,
const struct dcb_app_table *b)
{
size_t ia, ja;
size_t ib;
for (ia = 0, ja = 0; ia < a->n_apps; ia++) {
struct dcb_app *aa = &a->apps[ia];
bool present = false;
bool found = false;
for (ib = 0; ib < b->n_apps; ib++) {
const struct dcb_app *ab = &b->apps[ib];
if (aa->selector == ab->selector &&
aa->protocol == ab->protocol)
present = true;
else
continue;
if (aa->priority == ab->priority) {
found = true;
break;
}
}
/* Entries that remain in A will be removed, so keep in the
* table only APP entries whose sel/pid is mentioned in B,
* but that do not have the full sel/pid/prio match.
*/
if (present && !found)
a->apps[ja++] = *aa;
}
a->n_apps = ja;
}
static int dcb_app_table_copy(struct dcb_app_table *a,
const struct dcb_app_table *b)
{
size_t i;
int ret;
for (i = 0; i < b->n_apps; i++) {
ret = dcb_app_table_push(a, &b->apps[i]);
if (ret != 0)
return ret;
}
return 0;
}
static int dcb_app_cmp(const struct dcb_app *a, const struct dcb_app *b)
{
if (a->protocol < b->protocol)
return -1;
if (a->protocol > b->protocol)
return 1;
return a->priority - b->priority;
}
static int dcb_app_cmp_cb(const void *a, const void *b)
{
return dcb_app_cmp(a, b);
}
static void dcb_app_table_sort(struct dcb_app_table *tab)
{
qsort(tab->apps, tab->n_apps, sizeof(*tab->apps), dcb_app_cmp_cb);
}
struct dcb_app_parse_mapping {
__u8 selector;
struct dcb_app_table *tab;
int err;
};
static void dcb_app_parse_mapping_cb(__u32 key, __u64 value, void *data)
{
struct dcb_app_parse_mapping *pm = data;
struct dcb_app app = {
.selector = pm->selector,
.priority = value,
.protocol = key,
};
if (pm->err)
return;
pm->err = dcb_app_table_push(pm->tab, &app);
}
static int dcb_app_parse_mapping_ethtype_prio(__u32 key, char *value, void *data)
{
__u8 prio;
if (key < 0x600) {
fprintf(stderr, "Protocol IDs < 0x600 are reserved for EtherType\n");
return -EINVAL;
}
if (get_u8(&prio, value, 0))
return -EINVAL;
return dcb_parse_mapping("ETHTYPE", key, 0xffff,
"PRIO", prio, IEEE_8021QAZ_MAX_TCS - 1,
dcb_app_parse_mapping_cb, data);
}
static int dcb_app_parse_dscp(__u32 *key, const char *arg)
{
if (parse_mapping_num_all(key, arg) == 0)
return 0;
if (rtnl_dsfield_a2n(key, arg) != 0)
return -1;
if (*key & 0x03) {
fprintf(stderr, "The value `%s' uses non-DSCP bits.\n", arg);
return -1;
}
/* Unshift the value to convert it from dsfield to DSCP. */
*key >>= 2;
return 0;
}
static int dcb_app_parse_mapping_dscp_prio(__u32 key, char *value, void *data)
{
__u8 prio;
if (get_u8(&prio, value, 0))
return -EINVAL;
return dcb_parse_mapping("DSCP", key, 63,
"PRIO", prio, IEEE_8021QAZ_MAX_TCS - 1,
dcb_app_parse_mapping_cb, data);
}
static int dcb_app_parse_mapping_port_prio(__u32 key, char *value, void *data)
{
__u8 prio;
if (key == 0) {
fprintf(stderr, "Port ID of 0 is invalid\n");
return -EINVAL;
}
if (get_u8(&prio, value, 0))
return -EINVAL;
return dcb_parse_mapping("PORT", key, 0xffff,
"PRIO", prio, IEEE_8021QAZ_MAX_TCS - 1,
dcb_app_parse_mapping_cb, data);
}
static int dcb_app_parse_default_prio(int *argcp, char ***argvp, struct dcb_app_table *tab)
{
int argc = *argcp;
char **argv = *argvp;
int ret = 0;
while (argc > 0) {
struct dcb_app app;
__u8 prio;
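if (get_u8(&prio, *argv, 0)) {
/* A word that is not a priority ends the list. It is
 * only an error if no priority followed "default-prio".
 */
if (argc == *argcp)
ret = 1;
break;
}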
app = (struct dcb_app){
.selector = IEEE_8021QAZ_APP_SEL_ETHERTYPE,
.protocol = 0,
.priority = prio,
};
ret = dcb_app_table_push(tab, &app);
if (ret != 0)
break;
argc--, argv++;
}
*argcp = argc;
*argvp = argv;
return ret;
}
static bool dcb_app_is_ethtype(const struct dcb_app *app)
{
return app->selector == IEEE_8021QAZ_APP_SEL_ETHERTYPE &&
app->protocol != 0;
}
static bool dcb_app_is_default(const struct dcb_app *app)
{
return app->selector == IEEE_8021QAZ_APP_SEL_ETHERTYPE &&
app->protocol == 0;
}
static bool dcb_app_is_dscp(const struct dcb_app *app)
{
return app->selector == IEEE_8021QAZ_APP_SEL_DSCP;
}
static bool dcb_app_is_stream_port(const struct dcb_app *app)
{
return app->selector == IEEE_8021QAZ_APP_SEL_STREAM;
}
static bool dcb_app_is_dgram_port(const struct dcb_app *app)
{
return app->selector == IEEE_8021QAZ_APP_SEL_DGRAM;
}
static bool dcb_app_is_port(const struct dcb_app *app)
{
return app->selector == IEEE_8021QAZ_APP_SEL_ANY;
}
static int dcb_app_print_key_dec(__u16 protocol)
{
return print_uint(PRINT_ANY, NULL, "%d:", protocol);
}
static int dcb_app_print_key_hex(__u16 protocol)
{
return print_uint(PRINT_ANY, NULL, "%x:", protocol);
}
static int dcb_app_print_key_dscp(__u16 protocol)
{
const char *name = rtnl_dsfield_get_name(protocol << 2);
if (!is_json_context() && name != NULL)
return print_string(PRINT_FP, NULL, "%s:", name);
return print_uint(PRINT_ANY, NULL, "%d:", protocol);
}
static void dcb_app_print_filtered(const struct dcb_app_table *tab,
bool (*filter)(const struct dcb_app *),
int (*print_key)(__u16 protocol),
const char *json_name,
const char *fp_name)
{
bool first = true;
size_t i;
for (i = 0; i < tab->n_apps; i++) {
struct dcb_app *app = &tab->apps[i];
if (!filter(app))
continue;
if (first) {
open_json_array(PRINT_JSON, json_name);
print_string(PRINT_FP, NULL, "%s ", fp_name);
first = false;
}
open_json_array(PRINT_JSON, NULL);
print_key(app->protocol);
print_uint(PRINT_ANY, NULL, "%d ", app->priority);
close_json_array(PRINT_JSON, NULL);
}
if (!first) {
close_json_array(PRINT_JSON, json_name);
print_nl();
}
}
static void dcb_app_print_ethtype_prio(const struct dcb_app_table *tab)
{
dcb_app_print_filtered(tab, dcb_app_is_ethtype, dcb_app_print_key_hex,
"ethtype_prio", "ethtype-prio");
}
static void dcb_app_print_dscp_prio(const struct dcb *dcb,
const struct dcb_app_table *tab)
{
dcb_app_print_filtered(tab, dcb_app_is_dscp,
dcb->numeric ? dcb_app_print_key_dec
: dcb_app_print_key_dscp,
"dscp_prio", "dscp-prio");
}
static void dcb_app_print_stream_port_prio(const struct dcb_app_table *tab)
{
dcb_app_print_filtered(tab, dcb_app_is_stream_port, dcb_app_print_key_dec,
"stream_port_prio", "stream-port-prio");
}
static void dcb_app_print_dgram_port_prio(const struct dcb_app_table *tab)
{
dcb_app_print_filtered(tab, dcb_app_is_dgram_port, dcb_app_print_key_dec,
"dgram_port_prio", "dgram-port-prio");
}
static void dcb_app_print_port_prio(const struct dcb_app_table *tab)
{
dcb_app_print_filtered(tab, dcb_app_is_port, dcb_app_print_key_dec,
"port_prio", "port-prio");
}
static void dcb_app_print_default_prio(const struct dcb_app_table *tab)
{
bool first = true;
size_t i;
for (i = 0; i < tab->n_apps; i++) {
if (!dcb_app_is_default(&tab->apps[i]))
continue;
if (first) {
open_json_array(PRINT_JSON, "default_prio");
print_string(PRINT_FP, NULL, "default-prio ", NULL);
first = false;
}
print_uint(PRINT_ANY, NULL, "%d ", tab->apps[i].priority);
}
if (!first) {
close_json_array(PRINT_JSON, "default_prio");
print_nl();
}
}
static void dcb_app_print(const struct dcb *dcb, const struct dcb_app_table *tab)
{
dcb_app_print_ethtype_prio(tab);
dcb_app_print_default_prio(tab);
dcb_app_print_dscp_prio(dcb, tab);
dcb_app_print_stream_port_prio(tab);
dcb_app_print_dgram_port_prio(tab);
dcb_app_print_port_prio(tab);
}
static int dcb_app_get_table_attr_cb(const struct nlattr *attr, void *data)
{
struct dcb_app_table *tab = data;
struct dcb_app *app;
int ret;
if (mnl_attr_get_type(attr) != DCB_ATTR_IEEE_APP) {
fprintf(stderr, "Unknown attribute in DCB_ATTR_IEEE_APP_TABLE: %d\n",
mnl_attr_get_type(attr));
return MNL_CB_OK;
}
if (mnl_attr_get_payload_len(attr) < sizeof(struct dcb_app)) {
fprintf(stderr, "DCB_ATTR_IEEE_APP payload expected to have size %zu, not %d\n",
sizeof(struct dcb_app), mnl_attr_get_payload_len(attr));
return MNL_CB_OK;
}
app = mnl_attr_get_payload(attr);
ret = dcb_app_table_push(tab, app);
if (ret != 0)
return MNL_CB_ERROR;
return MNL_CB_OK;
}
static int dcb_app_get(struct dcb *dcb, const char *dev, struct dcb_app_table *tab)
{
uint16_t payload_len;
void *payload;
int ret;
ret = dcb_get_attribute_va(dcb, dev, DCB_ATTR_IEEE_APP_TABLE, &payload, &payload_len);
if (ret != 0)
return ret;
ret = mnl_attr_parse_payload(payload, payload_len, dcb_app_get_table_attr_cb, tab);
if (ret != MNL_CB_OK)
return -EINVAL;
return 0;
}
struct dcb_app_add_del {
const struct dcb_app_table *tab;
bool (*filter)(const struct dcb_app *app);
};
static int dcb_app_add_del_cb(struct dcb *dcb, struct nlmsghdr *nlh, void *data)
{
struct dcb_app_add_del *add_del = data;
struct nlattr *nest;
size_t i;
nest = mnl_attr_nest_start(nlh, DCB_ATTR_IEEE_APP_TABLE);
for (i = 0; i < add_del->tab->n_apps; i++) {
const struct dcb_app *app = &add_del->tab->apps[i];
if (add_del->filter == NULL || add_del->filter(app))
mnl_attr_put(nlh, DCB_ATTR_IEEE_APP, sizeof(*app), app);
}
mnl_attr_nest_end(nlh, nest);
return 0;
}
static int dcb_app_add_del(struct dcb *dcb, const char *dev, int command,
const struct dcb_app_table *tab,
bool (*filter)(const struct dcb_app *))
{
struct dcb_app_add_del add_del = {
.tab = tab,
.filter = filter,
};
if (tab->n_apps == 0)
return 0;
return dcb_set_attribute_va(dcb, command, dev, dcb_app_add_del_cb, &add_del);
}
static int dcb_cmd_app_parse_add_del(struct dcb *dcb, const char *dev,
int argc, char **argv, struct dcb_app_table *tab)
{
struct dcb_app_parse_mapping pm = {
.tab = tab,
};
int ret;
if (!argc) {
dcb_app_help_add();
return 0;
}
do {
if (matches(*argv, "help") == 0) {
dcb_app_help_add();
return 0;
} else if (matches(*argv, "ethtype-prio") == 0) {
NEXT_ARG();
pm.selector = IEEE_8021QAZ_APP_SEL_ETHERTYPE;
ret = parse_mapping(&argc, &argv, false,
&dcb_app_parse_mapping_ethtype_prio,
&pm);
} else if (matches(*argv, "default-prio") == 0) {
NEXT_ARG();
ret = dcb_app_parse_default_prio(&argc, &argv, pm.tab);
if (ret != 0) {
fprintf(stderr, "Invalid default priority %s\n", *argv);
return ret;
}
} else if (matches(*argv, "dscp-prio") == 0) {
NEXT_ARG();
pm.selector = IEEE_8021QAZ_APP_SEL_DSCP;
ret = parse_mapping_gen(&argc, &argv,
&dcb_app_parse_dscp,
&dcb_app_parse_mapping_dscp_prio,
&pm);
} else if (matches(*argv, "stream-port-prio") == 0) {
NEXT_ARG();
pm.selector = IEEE_8021QAZ_APP_SEL_STREAM;
ret = parse_mapping(&argc, &argv, false,
&dcb_app_parse_mapping_port_prio,
&pm);
} else if (matches(*argv, "dgram-port-prio") == 0) {
NEXT_ARG();
pm.selector = IEEE_8021QAZ_APP_SEL_DGRAM;
ret = parse_mapping(&argc, &argv, false,
&dcb_app_parse_mapping_port_prio,
&pm);
} else if (matches(*argv, "port-prio") == 0) {
NEXT_ARG();
pm.selector = IEEE_8021QAZ_APP_SEL_ANY;
ret = parse_mapping(&argc, &argv, false,
&dcb_app_parse_mapping_port_prio,
&pm);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_app_help_add();
return -EINVAL;
}
if (ret != 0) {
fprintf(stderr, "Invalid mapping %s\n", *argv);
return ret;
}
if (pm.err)
return pm.err;
} while (argc > 0);
return 0;
}
static int dcb_cmd_app_add(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct dcb_app_table tab = {};
int ret;
ret = dcb_cmd_app_parse_add_del(dcb, dev, argc, argv, &tab);
if (ret != 0)
goto out;
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_SET, &tab, NULL);
out:
dcb_app_table_fini(&tab);
return ret;
}
static int dcb_cmd_app_del(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct dcb_app_table tab = {};
int ret;
ret = dcb_cmd_app_parse_add_del(dcb, dev, argc, argv, &tab);
if (ret != 0)
goto out;
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_DEL, &tab, NULL);
out:
dcb_app_table_fini(&tab);
return ret;
}
static int dcb_cmd_app_show(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct dcb_app_table tab = {};
int ret;
ret = dcb_app_get(dcb, dev, &tab);
if (ret != 0)
return ret;
dcb_app_table_sort(&tab);
open_json_object(NULL);
if (!argc) {
dcb_app_print(dcb, &tab);
goto out;
}
do {
if (matches(*argv, "help") == 0) {
dcb_app_help_show_flush();
goto out;
} else if (matches(*argv, "ethtype-prio") == 0) {
dcb_app_print_ethtype_prio(&tab);
} else if (matches(*argv, "default-prio") == 0) {
dcb_app_print_default_prio(&tab);
} else if (matches(*argv, "dscp-prio") == 0) {
dcb_app_print_dscp_prio(dcb, &tab);
} else if (matches(*argv, "stream-port-prio") == 0) {
dcb_app_print_stream_port_prio(&tab);
} else if (matches(*argv, "dgram-port-prio") == 0) {
dcb_app_print_dgram_port_prio(&tab);
} else if (matches(*argv, "port-prio") == 0) {
dcb_app_print_port_prio(&tab);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_app_help_show_flush();
ret = -EINVAL;
goto out;
}
NEXT_ARG_FWD();
} while (argc > 0);
out:
close_json_object();
dcb_app_table_fini(&tab);
return ret;
}
static int dcb_cmd_app_flush(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct dcb_app_table tab = {};
int ret;
ret = dcb_app_get(dcb, dev, &tab);
if (ret != 0)
return ret;
if (!argc) {
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_DEL, &tab, NULL);
goto out;
}
do {
if (matches(*argv, "help") == 0) {
dcb_app_help_show_flush();
goto out;
} else if (matches(*argv, "ethtype-prio") == 0) {
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_DEL, &tab,
&dcb_app_is_ethtype);
if (ret != 0)
goto out;
} else if (matches(*argv, "default-prio") == 0) {
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_DEL, &tab,
&dcb_app_is_default);
if (ret != 0)
goto out;
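} else if (matches(*argv, "dscp-prio") == 0) {
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_DEL, &tab,
&dcb_app_is_dscp);
if (ret != 0)
goto out;
} else if (matches(*argv, "stream-port-prio") == 0) {
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_DEL, &tab,
&dcb_app_is_stream_port);
if (ret != 0)
goto out;
} else if (matches(*argv, "dgram-port-prio") == 0) {
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_DEL, &tab,
&dcb_app_is_dgram_port);
if (ret != 0)
goto out;
} else if (matches(*argv, "port-prio") == 0) {
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_DEL, &tab,
&dcb_app_is_port);
if (ret != 0)
goto out;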
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_app_help_show_flush();
ret = -EINVAL;
goto out;
}
NEXT_ARG_FWD();
} while (argc > 0);
out:
dcb_app_table_fini(&tab);
return ret;
}
static int dcb_cmd_app_replace(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct dcb_app_table orig = {};
struct dcb_app_table tab = {};
struct dcb_app_table new = {};
int ret;
ret = dcb_app_get(dcb, dev, &orig);
if (ret != 0)
return ret;
ret = dcb_cmd_app_parse_add_del(dcb, dev, argc, argv, &tab);
if (ret != 0)
goto out;
/* Attempts to add an existing entry would be rejected, so drop
* these entries from tab.
*/
ret = dcb_app_table_copy(&new, &tab);
if (ret != 0)
goto out;
dcb_app_table_remove_existing(&new, &orig);
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_SET, &new, NULL);
if (ret != 0) {
fprintf(stderr, "Could not add new APP entries\n");
goto out;
}
/* Remove the obsolete entries. */
dcb_app_table_remove_replaced(&orig, &tab);
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_DEL, &orig, NULL);
if (ret != 0) {
fprintf(stderr, "Could not remove replaced APP entries\n");
goto out;
}
out:
dcb_app_table_fini(&new);
dcb_app_table_fini(&tab);
dcb_app_table_fini(&orig);
return ret;
}
int dcb_cmd_app(struct dcb *dcb, int argc, char **argv)
{
if (!argc || matches(*argv, "help") == 0) {
dcb_app_help();
return 0;
} else if (matches(*argv, "show") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_app_show, dcb_app_help_show_flush);
} else if (matches(*argv, "flush") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_app_flush, dcb_app_help_show_flush);
} else if (matches(*argv, "add") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_app_add, dcb_app_help_add);
} else if (matches(*argv, "del") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_app_del, dcb_app_help_add);
} else if (matches(*argv, "replace") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_app_replace, dcb_app_help_add);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_app_help();
return -EINVAL;
}
}

dcb/dcb_buffer.c Normal file

@ -0,0 +1,235 @@
// SPDX-License-Identifier: GPL-2.0+
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <linux/dcbnl.h>
#include "dcb.h"
#include "utils.h"
static void dcb_buffer_help_set(void)
{
fprintf(stderr,
"Usage: dcb buffer set dev STRING\n"
" [ prio-buffer PRIO-MAP ]\n"
" [ buffer-size SIZE-MAP ]\n"
"\n"
" where PRIO-MAP := [ PRIO-MAP ] PRIO-MAPPING\n"
" PRIO-MAPPING := { all | PRIO }:BUFFER\n"
" SIZE-MAP := [ SIZE-MAP ] SIZE-MAPPING\n"
" SIZE-MAPPING := { all | BUFFER }:INTEGER\n"
" PRIO := { 0 .. 7 }\n"
" BUFFER := { 0 .. 7 }\n"
"\n"
);
}
static void dcb_buffer_help_show(void)
{
fprintf(stderr,
"Usage: dcb buffer show dev STRING\n"
" [ prio-buffer ] [ buffer-size ] [ total-size ]\n"
"\n"
);
}
static void dcb_buffer_help(void)
{
fprintf(stderr,
"Usage: dcb buffer help\n"
"\n"
);
dcb_buffer_help_show();
dcb_buffer_help_set();
}
static int dcb_buffer_parse_mapping_prio_buffer(__u32 key, char *value, void *data)
{
struct dcbnl_buffer *buffer = data;
__u8 buf;
if (get_u8(&buf, value, 0))
return -EINVAL;
return dcb_parse_mapping("PRIO", key, IEEE_8021Q_MAX_PRIORITIES - 1,
"BUFFER", buf, DCBX_MAX_BUFFERS - 1,
dcb_set_u8, buffer->prio2buffer);
}
static int dcb_buffer_parse_mapping_buffer_size(__u32 key, char *value, void *data)
{
struct dcbnl_buffer *buffer = data;
unsigned int size;
if (get_size(&size, value)) {
fprintf(stderr, "%u:%s: Illegal value for buffer size\n", key, value);
return -EINVAL;
}
return dcb_parse_mapping("BUFFER", key, DCBX_MAX_BUFFERS - 1,
"INTEGER", size, -1,
dcb_set_u32, buffer->buffer_size);
}
static void dcb_buffer_print_total_size(const struct dcbnl_buffer *buffer)
{
print_size(PRINT_ANY, "total_size", "total-size %s ", buffer->total_size);
}
static void dcb_buffer_print_prio_buffer(const struct dcbnl_buffer *buffer)
{
dcb_print_named_array("prio_buffer", "prio-buffer",
buffer->prio2buffer, ARRAY_SIZE(buffer->prio2buffer),
dcb_print_array_u8);
}
static void dcb_buffer_print_buffer_size(const struct dcbnl_buffer *buffer)
{
size_t size = ARRAY_SIZE(buffer->buffer_size);
SPRINT_BUF(b);
size_t i;
open_json_array(PRINT_JSON, "buffer_size");
print_string(PRINT_FP, NULL, "buffer-size ", NULL);
for (i = 0; i < size; i++) {
snprintf(b, sizeof(b), "%zu:%%s ", i);
print_size(PRINT_ANY, NULL, b, buffer->buffer_size[i]);
}
close_json_array(PRINT_JSON, "buffer_size");
}
static void dcb_buffer_print(const struct dcbnl_buffer *buffer)
{
dcb_buffer_print_prio_buffer(buffer);
print_nl();
dcb_buffer_print_buffer_size(buffer);
print_nl();
dcb_buffer_print_total_size(buffer);
print_nl();
}
static int dcb_buffer_get(struct dcb *dcb, const char *dev, struct dcbnl_buffer *buffer)
{
return dcb_get_attribute(dcb, dev, DCB_ATTR_DCB_BUFFER, buffer, sizeof(*buffer));
}
static int dcb_buffer_set(struct dcb *dcb, const char *dev, const struct dcbnl_buffer *buffer)
{
return dcb_set_attribute(dcb, dev, DCB_ATTR_DCB_BUFFER, buffer, sizeof(*buffer));
}
static int dcb_cmd_buffer_set(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct dcbnl_buffer buffer;
int ret;
if (!argc) {
dcb_buffer_help_set();
return 0;
}
ret = dcb_buffer_get(dcb, dev, &buffer);
if (ret)
return ret;
do {
if (matches(*argv, "help") == 0) {
dcb_buffer_help_set();
return 0;
} else if (matches(*argv, "prio-buffer") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true,
&dcb_buffer_parse_mapping_prio_buffer, &buffer);
if (ret) {
fprintf(stderr, "Invalid priority mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "buffer-size") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true,
&dcb_buffer_parse_mapping_buffer_size, &buffer);
if (ret) {
fprintf(stderr, "Invalid buffer size mapping %s\n", *argv);
return ret;
}
continue;
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_buffer_help_set();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
return dcb_buffer_set(dcb, dev, &buffer);
}
static int dcb_cmd_buffer_show(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct dcbnl_buffer buffer;
int ret;
ret = dcb_buffer_get(dcb, dev, &buffer);
if (ret)
return ret;
open_json_object(NULL);
if (!argc) {
dcb_buffer_print(&buffer);
goto out;
}
do {
if (matches(*argv, "help") == 0) {
dcb_buffer_help_show();
return 0;
} else if (matches(*argv, "prio-buffer") == 0) {
dcb_buffer_print_prio_buffer(&buffer);
print_nl();
} else if (matches(*argv, "buffer-size") == 0) {
dcb_buffer_print_buffer_size(&buffer);
print_nl();
} else if (matches(*argv, "total-size") == 0) {
dcb_buffer_print_total_size(&buffer);
print_nl();
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_buffer_help_show();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
out:
close_json_object();
return 0;
}
int dcb_cmd_buffer(struct dcb *dcb, int argc, char **argv)
{
if (!argc || matches(*argv, "help") == 0) {
dcb_buffer_help();
return 0;
} else if (matches(*argv, "show") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_buffer_show, dcb_buffer_help_show);
} else if (matches(*argv, "set") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_buffer_set, dcb_buffer_help_set);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_buffer_help();
return -EINVAL;
}
}
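The buffer-size printer builds its per-entry format in two passes. First, `snprintf()` writes the index into a small buffer, where the doubled `%%s` collapses to a literal `%s`. The result is then passed to `print_size()` as the format for the value. Below is a minimal standalone sketch of the same two-pass trick. `format_indexed()` is a hypothetical helper, and the value is passed as an already formatted string rather than going through iproute2's `print_size()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "<index>:<value> " into out, using a format
 * string that is itself generated at run time (as in the buffer-size loop). */
static int format_indexed(char *out, size_t outlen, size_t i, const char *value)
{
	char fmt[32];

	assert(outlen > 0);

	/* Pass 1: "%%s" collapses to a literal "%s", giving e.g. "3:%s ". */
	snprintf(fmt, sizeof(fmt), "%zu:%%s ", i);

	/* Pass 2: use the generated string as the format for the value. */
	return snprintf(out, outlen, fmt, value);
}
```

Using `%zu` for the `size_t` index keeps the conversion specifier matched to the argument type.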

192
dcb/dcb_dcbx.c Normal file

@ -0,0 +1,192 @@
// SPDX-License-Identifier: GPL-2.0+
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <linux/dcbnl.h>
#include "dcb.h"
#include "utils.h"
static void dcb_dcbx_help_set(void)
{
fprintf(stderr,
"Usage: dcb dcbx set dev STRING\n"
" [ host | lld-managed ]\n"
" [ cee | ieee ] [ static ]\n"
"\n"
);
}
static void dcb_dcbx_help_show(void)
{
fprintf(stderr,
"Usage: dcb dcbx show dev STRING\n"
"\n"
);
}
static void dcb_dcbx_help(void)
{
fprintf(stderr,
"Usage: dcb dcbx help\n"
"\n"
);
dcb_dcbx_help_show();
dcb_dcbx_help_set();
}
struct dcb_dcbx_flag {
__u8 value;
const char *key_fp;
const char *key_json;
};
static struct dcb_dcbx_flag dcb_dcbx_flags[] = {
{DCB_CAP_DCBX_HOST, "host"},
{DCB_CAP_DCBX_LLD_MANAGED, "lld-managed", "lld_managed"},
{DCB_CAP_DCBX_VER_CEE, "cee"},
{DCB_CAP_DCBX_VER_IEEE, "ieee"},
{DCB_CAP_DCBX_STATIC, "static"},
};
static void dcb_dcbx_print(__u8 dcbx)
{
int bit;
int i;
while ((bit = ffs(dcbx))) {
bool found = false;
bit--;
for (i = 0; i < ARRAY_SIZE(dcb_dcbx_flags); i++) {
struct dcb_dcbx_flag *flag = &dcb_dcbx_flags[i];
if (flag->value == 1 << bit) {
print_bool(PRINT_JSON, flag->key_json ?: flag->key_fp,
NULL, true);
print_string(PRINT_FP, NULL, "%s ", flag->key_fp);
found = true;
break;
}
}
if (!found)
fprintf(stderr, "Unknown DCBX bit %#x.\n", 1 << bit);
dcbx &= ~(1 << bit);
}
print_nl();
}
static int dcb_dcbx_get(struct dcb *dcb, const char *dev, __u8 *dcbx)
{
__u16 payload_len;
void *payload;
int err;
err = dcb_get_attribute_bare(dcb, DCB_CMD_IEEE_GET, dev, DCB_ATTR_DCBX,
&payload, &payload_len);
if (err != 0)
return err;
if (payload_len != 1) {
fprintf(stderr, "DCB_ATTR_DCBX payload has size %d, expected 1.\n",
payload_len);
return -EINVAL;
}
*dcbx = *(__u8 *) payload;
return 0;
}
static int dcb_dcbx_set(struct dcb *dcb, const char *dev, __u8 dcbx)
{
return dcb_set_attribute_bare(dcb, DCB_CMD_SDCBX, dev, DCB_ATTR_DCBX,
&dcbx, 1, DCB_ATTR_DCBX);
}
static int dcb_cmd_dcbx_set(struct dcb *dcb, const char *dev, int argc, char **argv)
{
__u8 dcbx = 0;
__u8 i;
if (!argc) {
dcb_dcbx_help_set();
return 0;
}
do {
if (matches(*argv, "help") == 0) {
dcb_dcbx_help_set();
return 0;
}
for (i = 0; i < ARRAY_SIZE(dcb_dcbx_flags); i++) {
struct dcb_dcbx_flag *flag = &dcb_dcbx_flags[i];
if (matches(*argv, flag->key_fp) == 0) {
dcbx |= flag->value;
NEXT_ARG_FWD();
goto next;
}
}
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_dcbx_help_set();
return -EINVAL;
next:
;
} while (argc > 0);
return dcb_dcbx_set(dcb, dev, dcbx);
}
static int dcb_cmd_dcbx_show(struct dcb *dcb, const char *dev, int argc, char **argv)
{
__u8 dcbx;
int ret;
ret = dcb_dcbx_get(dcb, dev, &dcbx);
if (ret != 0)
return ret;
while (argc > 0) {
if (matches(*argv, "help") == 0) {
dcb_dcbx_help_show();
return 0;
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_dcbx_help_show();
return -EINVAL;
}
NEXT_ARG_FWD();
}
open_json_object(NULL);
dcb_dcbx_print(dcbx);
close_json_object();
return 0;
}
int dcb_cmd_dcbx(struct dcb *dcb, int argc, char **argv)
{
if (!argc || matches(*argv, "help") == 0) {
dcb_dcbx_help();
return 0;
} else if (matches(*argv, "show") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_dcbx_show, dcb_dcbx_help_show);
} else if (matches(*argv, "set") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_dcbx_set, dcb_dcbx_help_set);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_dcbx_help();
return -EINVAL;
}
}
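`dcb dcbx set` ORs together one bit per keyword. `dcb_dcbx_print()` walks the mask with `ffs()` and peels off the lowest set bit each round. The sketch below reproduces both directions outside iproute2, with a few assumptions and simplifications:

* It assumes the bit values of the `DCB_CAP_DCBX_*` constants from `<linux/dcbnl.h>` (0x01 through 0x10).
* It uses exact keyword matching instead of the prefix matching done by `matches()`.
* It prints unknown bits as `?` rather than warning on stderr.

The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <strings.h> /* ffs() */

/* Bit values assumed to mirror DCB_CAP_DCBX_* in <linux/dcbnl.h>. */
struct dcbx_flag {
	uint8_t value;
	const char *name;
};

static const struct dcbx_flag flags[] = {
	{ 0x01, "host" },
	{ 0x02, "lld-managed" },
	{ 0x04, "cee" },
	{ 0x08, "ieee" },
	{ 0x10, "static" },
};

#define NFLAGS (sizeof(flags) / sizeof(flags[0]))

/* Build a bitmask from keywords; returns -1 on an unknown keyword. */
static int dcbx_from_names(const char *const names[], size_t n, uint8_t *out)
{
	uint8_t dcbx = 0;

	for (size_t i = 0; i < n; i++) {
		size_t j;

		for (j = 0; j < NFLAGS; j++)
			if (strcmp(names[i], flags[j].name) == 0)
				break;
		if (j == NFLAGS)
			return -1;
		dcbx |= flags[j].value;
	}
	*out = dcbx;
	return 0;
}

/* Decode a bitmask lowest bit first, appending "name " for each set bit. */
static void dcbx_to_names(uint8_t dcbx, char *buf, size_t len)
{
	int bit;

	assert(len > 0);
	buf[0] = '\0';
	while ((bit = ffs(dcbx))) {
		uint8_t mask = (uint8_t)(1u << (bit - 1));
		const char *name = "?";
		size_t used;

		for (size_t j = 0; j < NFLAGS; j++)
			if (flags[j].value == mask)
				name = flags[j].name;
		used = strlen(buf);
		snprintf(buf + used, len - used, "%s ", name);
		dcbx &= (uint8_t)~mask; /* clear the bit just handled */
	}
}
```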

435
dcb/dcb_ets.c Normal file

@ -0,0 +1,435 @@
// SPDX-License-Identifier: GPL-2.0+
#include <errno.h>
#include <stdio.h>
#include <linux/dcbnl.h>
#include "dcb.h"
#include "utils.h"
static void dcb_ets_help_set(void)
{
fprintf(stderr,
"Usage: dcb ets set dev STRING\n"
" [ willing { on | off } ]\n"
" [ { tc-tsa | reco-tc-tsa } TSA-MAP ]\n"
" [ { pg-bw | tc-bw | reco-tc-bw } BW-MAP ]\n"
" [ { prio-tc | reco-prio-tc } PRIO-MAP ]\n"
"\n"
" where TSA-MAP := [ TSA-MAP ] TSA-MAPPING\n"
" TSA-MAPPING := { all | TC }:{ strict | cbs | ets | vendor }\n"
" BW-MAP := [ BW-MAP ] BW-MAPPING\n"
" BW-MAPPING := { all | TC }:INTEGER\n"
" PRIO-MAP := [ PRIO-MAP ] PRIO-MAPPING\n"
" PRIO-MAPPING := { all | PRIO }:TC\n"
" TC := { 0 .. 7 }\n"
" PRIO := { 0 .. 7 }\n"
"\n"
);
}
static void dcb_ets_help_show(void)
{
fprintf(stderr,
"Usage: dcb ets show dev STRING\n"
" [ willing ] [ ets-cap ] [ cbs ] [ tc-tsa ]\n"
" [ reco-tc-tsa ] [ pg-bw ] [ tc-bw ] [ reco-tc-bw ]\n"
" [ prio-tc ] [ reco-prio-tc ]\n"
"\n"
);
}
static void dcb_ets_help(void)
{
fprintf(stderr,
"Usage: dcb ets help\n"
"\n"
);
dcb_ets_help_show();
dcb_ets_help_set();
}
static const char *const tsa_names[] = {
[IEEE_8021QAZ_TSA_STRICT] = "strict",
[IEEE_8021QAZ_TSA_CB_SHAPER] = "cbs",
[IEEE_8021QAZ_TSA_ETS] = "ets",
[IEEE_8021QAZ_TSA_VENDOR] = "vendor",
};
static int dcb_ets_parse_mapping_tc_tsa(__u32 key, char *value, void *data)
{
__u8 tsa;
int ret;
tsa = parse_one_of("TSA", value, tsa_names, ARRAY_SIZE(tsa_names), &ret);
if (ret)
return ret;
return dcb_parse_mapping("TC", key, IEEE_8021QAZ_MAX_TCS - 1,
"TSA", tsa, -1U,
dcb_set_u8, data);
}
static int dcb_ets_parse_mapping_tc_bw(__u32 key, char *value, void *data)
{
__u8 bw;
if (get_u8(&bw, value, 0))
return -EINVAL;
return dcb_parse_mapping("TC", key, IEEE_8021QAZ_MAX_TCS - 1,
"BW", bw, 100,
dcb_set_u8, data);
}
static int dcb_ets_parse_mapping_prio_tc(unsigned int key, char *value, void *data)
{
__u8 tc;
if (get_u8(&tc, value, 0))
return -EINVAL;
return dcb_parse_mapping("PRIO", key, IEEE_8021QAZ_MAX_TCS - 1,
"TC", tc, IEEE_8021QAZ_MAX_TCS - 1,
dcb_set_u8, data);
}
static void dcb_print_array_tsa(const __u8 *array, size_t size)
{
dcb_print_array_kw(array, size, tsa_names, ARRAY_SIZE(tsa_names));
}
static void dcb_ets_print_willing(const struct ieee_ets *ets)
{
print_on_off(PRINT_ANY, "willing", "willing %s ", ets->willing);
}
static void dcb_ets_print_ets_cap(const struct ieee_ets *ets)
{
print_uint(PRINT_ANY, "ets_cap", "ets-cap %d ", ets->ets_cap);
}
static void dcb_ets_print_cbs(const struct ieee_ets *ets)
{
print_on_off(PRINT_ANY, "cbs", "cbs %s ", ets->cbs);
}
static void dcb_ets_print_tc_bw(const struct ieee_ets *ets)
{
dcb_print_named_array("tc_bw", "tc-bw",
ets->tc_tx_bw, ARRAY_SIZE(ets->tc_tx_bw),
dcb_print_array_u8);
}
static void dcb_ets_print_pg_bw(const struct ieee_ets *ets)
{
dcb_print_named_array("pg_bw", "pg-bw",
ets->tc_rx_bw, ARRAY_SIZE(ets->tc_rx_bw),
dcb_print_array_u8);
}
static void dcb_ets_print_tc_tsa(const struct ieee_ets *ets)
{
dcb_print_named_array("tc_tsa", "tc-tsa",
ets->tc_tsa, ARRAY_SIZE(ets->tc_tsa),
dcb_print_array_tsa);
}
static void dcb_ets_print_prio_tc(const struct ieee_ets *ets)
{
dcb_print_named_array("prio_tc", "prio-tc",
ets->prio_tc, ARRAY_SIZE(ets->prio_tc),
dcb_print_array_u8);
}
static void dcb_ets_print_reco_tc_bw(const struct ieee_ets *ets)
{
dcb_print_named_array("reco_tc_bw", "reco-tc-bw",
ets->tc_reco_bw, ARRAY_SIZE(ets->tc_reco_bw),
dcb_print_array_u8);
}
static void dcb_ets_print_reco_tc_tsa(const struct ieee_ets *ets)
{
dcb_print_named_array("reco_tc_tsa", "reco-tc-tsa",
ets->tc_reco_tsa, ARRAY_SIZE(ets->tc_reco_tsa),
dcb_print_array_tsa);
}
static void dcb_ets_print_reco_prio_tc(const struct ieee_ets *ets)
{
dcb_print_named_array("reco_prio_tc", "reco-prio-tc",
ets->reco_prio_tc, ARRAY_SIZE(ets->reco_prio_tc),
dcb_print_array_u8);
}
static void dcb_ets_print(const struct ieee_ets *ets)
{
dcb_ets_print_willing(ets);
dcb_ets_print_ets_cap(ets);
dcb_ets_print_cbs(ets);
print_nl();
dcb_ets_print_tc_bw(ets);
print_nl();
dcb_ets_print_pg_bw(ets);
print_nl();
dcb_ets_print_tc_tsa(ets);
print_nl();
dcb_ets_print_prio_tc(ets);
print_nl();
dcb_ets_print_reco_tc_bw(ets);
print_nl();
dcb_ets_print_reco_tc_tsa(ets);
print_nl();
dcb_ets_print_reco_prio_tc(ets);
print_nl();
}
static int dcb_ets_get(struct dcb *dcb, const char *dev, struct ieee_ets *ets)
{
return dcb_get_attribute(dcb, dev, DCB_ATTR_IEEE_ETS, ets, sizeof(*ets));
}
static int dcb_ets_validate_bw(const __u8 bw[], const __u8 tsa[], const char *what)
{
bool has_ets = false;
unsigned int total = 0;
unsigned int tc;
for (tc = 0; tc < IEEE_8021QAZ_MAX_TCS; tc++) {
if (tsa[tc] == IEEE_8021QAZ_TSA_ETS) {
has_ets = true;
break;
}
}
/* TC bandwidth is only intended for ETS, but 802.1Q-2018 only requires
* that the sum be 100, and individual entries 0..100. It explicitly
* notes that non-ETS TCs can have non-0 TC bandwidth during
* reconfiguration.
*/
for (tc = 0; tc < IEEE_8021QAZ_MAX_TCS; tc++) {
if (bw[tc] > 100) {
fprintf(stderr, "%d%% for TC %u of %s is not a valid bandwidth percentage, expected 0..100%%\n",
bw[tc], tc, what);
return -EINVAL;
}
total += bw[tc];
}
/* This is what 802.1Q-2018 requires. */
if (total == 100)
return 0;
/* But this requirement does not make sense for all-strict
* configurations. Anything else than 0 does not make sense: either BW
* has not been reconfigured for the all-strict allocation yet, at which
* point we expect sum of 100. Or it has already been reconfigured, at
* which point accept 0.
*/
if (!has_ets && total == 0)
return 0;
fprintf(stderr, "Bandwidth percentages in %s sum to %u%%, expected %d%%\n",
what, total, has_ets ? 100 : 0);
return -EINVAL;
}
static int dcb_ets_set(struct dcb *dcb, const char *dev, const struct ieee_ets *ets)
{
/* Do not validate pg-bw, which is not standard and has unclear
* meaning.
*/
if (dcb_ets_validate_bw(ets->tc_tx_bw, ets->tc_tsa, "tc-bw") ||
dcb_ets_validate_bw(ets->tc_reco_bw, ets->tc_reco_tsa, "reco-tc-bw"))
return -EINVAL;
return dcb_set_attribute(dcb, dev, DCB_ATTR_IEEE_ETS, ets, sizeof(*ets));
}
static int dcb_cmd_ets_set(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct ieee_ets ets;
int ret;
if (!argc) {
dcb_ets_help_set();
return 1;
}
ret = dcb_ets_get(dcb, dev, &ets);
if (ret)
return ret;
do {
if (matches(*argv, "help") == 0) {
dcb_ets_help_set();
return 0;
} else if (matches(*argv, "willing") == 0) {
NEXT_ARG();
ets.willing = parse_on_off("willing", *argv, &ret);
if (ret)
return ret;
} else if (matches(*argv, "tc-tsa") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true, &dcb_ets_parse_mapping_tc_tsa,
ets.tc_tsa);
if (ret) {
fprintf(stderr, "Invalid tc-tsa mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "reco-tc-tsa") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true, &dcb_ets_parse_mapping_tc_tsa,
ets.tc_reco_tsa);
if (ret) {
fprintf(stderr, "Invalid reco-tc-tsa mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "tc-bw") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true, &dcb_ets_parse_mapping_tc_bw,
ets.tc_tx_bw);
if (ret) {
fprintf(stderr, "Invalid tc-bw mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "pg-bw") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true, &dcb_ets_parse_mapping_tc_bw,
ets.tc_rx_bw);
if (ret) {
fprintf(stderr, "Invalid pg-bw mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "reco-tc-bw") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true, &dcb_ets_parse_mapping_tc_bw,
ets.tc_reco_bw);
if (ret) {
fprintf(stderr, "Invalid reco-tc-bw mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "prio-tc") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true, &dcb_ets_parse_mapping_prio_tc,
ets.prio_tc);
if (ret) {
fprintf(stderr, "Invalid prio-tc mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "reco-prio-tc") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true, &dcb_ets_parse_mapping_prio_tc,
ets.reco_prio_tc);
if (ret) {
fprintf(stderr, "Invalid reco-prio-tc mapping %s\n", *argv);
return ret;
}
continue;
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_ets_help_set();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
return dcb_ets_set(dcb, dev, &ets);
}
static int dcb_cmd_ets_show(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct ieee_ets ets;
int ret;
ret = dcb_ets_get(dcb, dev, &ets);
if (ret)
return ret;
open_json_object(NULL);
if (!argc) {
dcb_ets_print(&ets);
goto out;
}
do {
if (matches(*argv, "help") == 0) {
dcb_ets_help_show();
return 0;
} else if (matches(*argv, "willing") == 0) {
dcb_ets_print_willing(&ets);
print_nl();
} else if (matches(*argv, "ets-cap") == 0) {
dcb_ets_print_ets_cap(&ets);
print_nl();
} else if (matches(*argv, "cbs") == 0) {
dcb_ets_print_cbs(&ets);
print_nl();
} else if (matches(*argv, "tc-tsa") == 0) {
dcb_ets_print_tc_tsa(&ets);
print_nl();
} else if (matches(*argv, "reco-tc-tsa") == 0) {
dcb_ets_print_reco_tc_tsa(&ets);
print_nl();
} else if (matches(*argv, "tc-bw") == 0) {
dcb_ets_print_tc_bw(&ets);
print_nl();
} else if (matches(*argv, "pg-bw") == 0) {
dcb_ets_print_pg_bw(&ets);
print_nl();
} else if (matches(*argv, "reco-tc-bw") == 0) {
dcb_ets_print_reco_tc_bw(&ets);
print_nl();
} else if (matches(*argv, "prio-tc") == 0) {
dcb_ets_print_prio_tc(&ets);
print_nl();
} else if (matches(*argv, "reco-prio-tc") == 0) {
dcb_ets_print_reco_prio_tc(&ets);
print_nl();
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_ets_help_show();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
out:
close_json_object();
return 0;
}
int dcb_cmd_ets(struct dcb *dcb, int argc, char **argv)
{
if (!argc || matches(*argv, "help") == 0) {
dcb_ets_help();
return 0;
} else if (matches(*argv, "show") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv, dcb_cmd_ets_show, dcb_ets_help_show);
} else if (matches(*argv, "set") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv, dcb_cmd_ets_set, dcb_ets_help_set);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_ets_help();
return -EINVAL;
}
}
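`dcb_ets_validate_bw()` accepts a bandwidth table when every entry is in the range 0..100 and the entries sum to 100. There is one relaxation: if no TC uses the ETS algorithm, the table may also sum to 0. Below is a condensed, self-contained sketch of that rule. It assumes the usual values of 8 traffic classes and `IEEE_8021QAZ_TSA_ETS == 2`. `ets_bw_valid()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TCS 8    /* assumed to match IEEE_8021QAZ_MAX_TCS */
#define TSA_STRICT 0 /* assumed to match IEEE_8021QAZ_TSA_STRICT */
#define TSA_ETS 2    /* assumed to match IEEE_8021QAZ_TSA_ETS */

/* Returns 0 if the bandwidth table is acceptable, -1 otherwise. */
static int ets_bw_valid(const uint8_t bw[MAX_TCS], const uint8_t tsa[MAX_TCS])
{
	unsigned int total = 0;
	int has_ets = 0;

	for (int tc = 0; tc < MAX_TCS; tc++) {
		if (tsa[tc] == TSA_ETS)
			has_ets = 1;
		if (bw[tc] > 100) /* each entry must be a percentage */
			return -1;
		total += bw[tc];
	}
	if (total == 100) /* what 802.1Q-2018 requires */
		return 0;
	if (!has_ets && total == 0) /* all-strict table already zeroed */
		return 0;
	return -1;
}
```

Folding the ETS scan and the range check into one loop does not change the outcome. An out-of-range entry is rejected regardless of the TSA table.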

182
dcb/dcb_maxrate.c Normal file

@ -0,0 +1,182 @@
// SPDX-License-Identifier: GPL-2.0+
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <linux/dcbnl.h>
#include "dcb.h"
#include "utils.h"
static void dcb_maxrate_help_set(void)
{
fprintf(stderr,
"Usage: dcb maxrate set dev STRING\n"
" [ tc-maxrate RATE-MAP ]\n"
"\n"
" where RATE-MAP := [ RATE-MAP ] RATE-MAPPING\n"
" RATE-MAPPING := { all | TC }:RATE\n"
" TC := { 0 .. 7 }\n"
"\n"
);
}
static void dcb_maxrate_help_show(void)
{
fprintf(stderr,
"Usage: dcb [ -i ] maxrate show dev STRING\n"
" [ tc-maxrate ]\n"
"\n"
);
}
static void dcb_maxrate_help(void)
{
fprintf(stderr,
"Usage: dcb maxrate help\n"
"\n"
);
dcb_maxrate_help_show();
dcb_maxrate_help_set();
}
static int dcb_maxrate_parse_mapping_tc_maxrate(__u32 key, char *value, void *data)
{
__u64 rate;
if (get_rate64(&rate, value))
return -EINVAL;
return dcb_parse_mapping("TC", key, IEEE_8021QAZ_MAX_TCS - 1,
"RATE", rate, -1,
dcb_set_u64, data);
}
static void dcb_maxrate_print_tc_maxrate(struct dcb *dcb, const struct ieee_maxrate *maxrate)
{
size_t size = ARRAY_SIZE(maxrate->tc_maxrate);
SPRINT_BUF(b);
size_t i;
open_json_array(PRINT_JSON, "tc_maxrate");
print_string(PRINT_FP, NULL, "tc-maxrate ", NULL);
for (i = 0; i < size; i++) {
snprintf(b, sizeof(b), "%zu:%%s ", i);
print_rate(dcb->use_iec, PRINT_ANY, NULL, b, maxrate->tc_maxrate[i]);
}
close_json_array(PRINT_JSON, "tc_maxrate");
}
static void dcb_maxrate_print(struct dcb *dcb, const struct ieee_maxrate *maxrate)
{
dcb_maxrate_print_tc_maxrate(dcb, maxrate);
print_nl();
}
static int dcb_maxrate_get(struct dcb *dcb, const char *dev, struct ieee_maxrate *maxrate)
{
return dcb_get_attribute(dcb, dev, DCB_ATTR_IEEE_MAXRATE, maxrate, sizeof(*maxrate));
}
static int dcb_maxrate_set(struct dcb *dcb, const char *dev, const struct ieee_maxrate *maxrate)
{
return dcb_set_attribute(dcb, dev, DCB_ATTR_IEEE_MAXRATE, maxrate, sizeof(*maxrate));
}
static int dcb_cmd_maxrate_set(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct ieee_maxrate maxrate;
int ret;
if (!argc) {
dcb_maxrate_help_set();
return 0;
}
ret = dcb_maxrate_get(dcb, dev, &maxrate);
if (ret)
return ret;
do {
if (matches(*argv, "help") == 0) {
dcb_maxrate_help_set();
return 0;
} else if (matches(*argv, "tc-maxrate") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true,
&dcb_maxrate_parse_mapping_tc_maxrate, &maxrate);
if (ret) {
fprintf(stderr, "Invalid mapping %s\n", *argv);
return ret;
}
continue;
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_maxrate_help_set();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
return dcb_maxrate_set(dcb, dev, &maxrate);
}
static int dcb_cmd_maxrate_show(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct ieee_maxrate maxrate;
int ret;
ret = dcb_maxrate_get(dcb, dev, &maxrate);
if (ret)
return ret;
open_json_object(NULL);
if (!argc) {
dcb_maxrate_print(dcb, &maxrate);
goto out;
}
do {
if (matches(*argv, "help") == 0) {
dcb_maxrate_help_show();
return 0;
} else if (matches(*argv, "tc-maxrate") == 0) {
dcb_maxrate_print_tc_maxrate(dcb, &maxrate);
print_nl();
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_maxrate_help_show();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
out:
close_json_object();
return 0;
}
int dcb_cmd_maxrate(struct dcb *dcb, int argc, char **argv)
{
if (!argc || matches(*argv, "help") == 0) {
dcb_maxrate_help();
return 0;
} else if (matches(*argv, "show") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_maxrate_show, dcb_maxrate_help_show);
} else if (matches(*argv, "set") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_maxrate_set, dcb_maxrate_help_set);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_maxrate_help();
return -EINVAL;
}
}

286
dcb/dcb_pfc.c Normal file

@ -0,0 +1,286 @@
// SPDX-License-Identifier: GPL-2.0+
#include <errno.h>
#include <stdio.h>
#include <linux/dcbnl.h>
#include "dcb.h"
#include "utils.h"
static void dcb_pfc_help_set(void)
{
fprintf(stderr,
"Usage: dcb pfc set dev STRING\n"
" [ prio-pfc PFC-MAP ]\n"
" [ macsec-bypass { on | off } ]\n"
" [ delay INTEGER ]\n"
"\n"
" where PFC-MAP := [ PFC-MAP ] PFC-MAPPING\n"
" PFC-MAPPING := { all | PRIO }:PFC\n"
" PRIO := { 0 .. 7 }\n"
" PFC := { on | off }\n"
"\n"
);
}
static void dcb_pfc_help_show(void)
{
fprintf(stderr,
"Usage: dcb [ -s ] pfc show dev STRING\n"
" [ pfc-cap ] [ prio-pfc ] [ macsec-bypass ]\n"
" [ delay ] [ requests ] [ indications ]\n"
"\n"
);
}
static void dcb_pfc_help(void)
{
fprintf(stderr,
"Usage: dcb pfc help\n"
"\n"
);
dcb_pfc_help_show();
dcb_pfc_help_set();
}
static void dcb_pfc_to_array(__u8 array[IEEE_8021QAZ_MAX_TCS], __u8 pfc_en)
{
int i;
for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++)
array[i] = !!(pfc_en & (1 << i));
}
static void dcb_pfc_from_array(__u8 array[IEEE_8021QAZ_MAX_TCS], __u8 *pfc_en_p)
{
__u8 pfc_en = 0;
int i;
for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
if (array[i])
pfc_en |= 1 << i;
}
*pfc_en_p = pfc_en;
}
static int dcb_pfc_parse_mapping_prio_pfc(__u32 key, char *value, void *data)
{
struct ieee_pfc *pfc = data;
__u8 pfc_en[IEEE_8021QAZ_MAX_TCS];
bool enabled;
int ret;
dcb_pfc_to_array(pfc_en, pfc->pfc_en);
enabled = parse_on_off("PFC", value, &ret);
if (ret)
return ret;
ret = dcb_parse_mapping("PRIO", key, IEEE_8021QAZ_MAX_TCS - 1,
"PFC", enabled, -1,
dcb_set_u8, pfc_en);
if (ret)
return ret;
dcb_pfc_from_array(pfc_en, &pfc->pfc_en);
return 0;
}
static void dcb_pfc_print_pfc_cap(const struct ieee_pfc *pfc)
{
print_uint(PRINT_ANY, "pfc_cap", "pfc-cap %d ", pfc->pfc_cap);
}
static void dcb_pfc_print_macsec_bypass(const struct ieee_pfc *pfc)
{
print_on_off(PRINT_ANY, "macsec_bypass", "macsec-bypass %s ", pfc->mbc);
}
static void dcb_pfc_print_delay(const struct ieee_pfc *pfc)
{
print_uint(PRINT_ANY, "delay", "delay %d ", pfc->delay);
}
static void dcb_pfc_print_prio_pfc(const struct ieee_pfc *pfc)
{
__u8 pfc_en[IEEE_8021QAZ_MAX_TCS];
dcb_pfc_to_array(pfc_en, pfc->pfc_en);
dcb_print_named_array("prio_pfc", "prio-pfc",
pfc_en, ARRAY_SIZE(pfc_en), &dcb_print_array_on_off);
}
static void dcb_pfc_print_requests(const struct ieee_pfc *pfc)
{
open_json_array(PRINT_JSON, "requests");
print_string(PRINT_FP, NULL, "requests ", NULL);
dcb_print_array_u64(pfc->requests, ARRAY_SIZE(pfc->requests));
close_json_array(PRINT_JSON, "requests");
}
static void dcb_pfc_print_indications(const struct ieee_pfc *pfc)
{
open_json_array(PRINT_JSON, "indications");
print_string(PRINT_FP, NULL, "indications ", NULL);
dcb_print_array_u64(pfc->indications, ARRAY_SIZE(pfc->indications));
close_json_array(PRINT_JSON, "indications");
}
static void dcb_pfc_print(const struct dcb *dcb, const struct ieee_pfc *pfc)
{
dcb_pfc_print_pfc_cap(pfc);
dcb_pfc_print_macsec_bypass(pfc);
dcb_pfc_print_delay(pfc);
print_nl();
dcb_pfc_print_prio_pfc(pfc);
print_nl();
if (dcb->stats) {
dcb_pfc_print_requests(pfc);
print_nl();
dcb_pfc_print_indications(pfc);
print_nl();
}
}
static int dcb_pfc_get(struct dcb *dcb, const char *dev, struct ieee_pfc *pfc)
{
return dcb_get_attribute(dcb, dev, DCB_ATTR_IEEE_PFC, pfc, sizeof(*pfc));
}
static int dcb_pfc_set(struct dcb *dcb, const char *dev, const struct ieee_pfc *pfc)
{
return dcb_set_attribute(dcb, dev, DCB_ATTR_IEEE_PFC, pfc, sizeof(*pfc));
}
static int dcb_cmd_pfc_set(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct ieee_pfc pfc;
int ret;
if (!argc) {
dcb_pfc_help_set();
return 0;
}
ret = dcb_pfc_get(dcb, dev, &pfc);
if (ret)
return ret;
do {
if (matches(*argv, "help") == 0) {
dcb_pfc_help_set();
return 0;
} else if (matches(*argv, "prio-pfc") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true,
&dcb_pfc_parse_mapping_prio_pfc, &pfc);
if (ret) {
fprintf(stderr, "Invalid pfc mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "macsec-bypass") == 0) {
NEXT_ARG();
pfc.mbc = parse_on_off("macsec-bypass", *argv, &ret);
if (ret)
return ret;
} else if (matches(*argv, "delay") == 0) {
NEXT_ARG();
/* Do not support the size notations for delay.
* Delay is specified in "bit times", not bits, so
* it is not applicable. At the same time it would
* be confusing that 10Kbit does not mean 10240,
* but 1280.
*/
if (get_u16(&pfc.delay, *argv, 0)) {
fprintf(stderr, "Invalid delay `%s', expected an integer 0..65535\n",
*argv);
return -EINVAL;
}
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_pfc_help_set();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
return dcb_pfc_set(dcb, dev, &pfc);
}
static int dcb_cmd_pfc_show(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct ieee_pfc pfc;
int ret;
ret = dcb_pfc_get(dcb, dev, &pfc);
if (ret)
return ret;
open_json_object(NULL);
if (!argc) {
dcb_pfc_print(dcb, &pfc);
goto out;
}
do {
if (matches(*argv, "help") == 0) {
dcb_pfc_help_show();
return 0;
} else if (matches(*argv, "prio-pfc") == 0) {
dcb_pfc_print_prio_pfc(&pfc);
print_nl();
} else if (matches(*argv, "pfc-cap") == 0) {
dcb_pfc_print_pfc_cap(&pfc);
print_nl();
} else if (matches(*argv, "macsec-bypass") == 0) {
dcb_pfc_print_macsec_bypass(&pfc);
print_nl();
} else if (matches(*argv, "delay") == 0) {
dcb_pfc_print_delay(&pfc);
print_nl();
} else if (matches(*argv, "requests") == 0) {
dcb_pfc_print_requests(&pfc);
print_nl();
} else if (matches(*argv, "indications") == 0) {
dcb_pfc_print_indications(&pfc);
print_nl();
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_pfc_help_show();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
out:
close_json_object();
return 0;
}
int dcb_cmd_pfc(struct dcb *dcb, int argc, char **argv)
{
if (!argc || matches(*argv, "help") == 0) {
dcb_pfc_help();
return 0;
} else if (matches(*argv, "show") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_pfc_show, dcb_pfc_help_show);
} else if (matches(*argv, "set") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_pfc_set, dcb_pfc_help_set);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_pfc_help();
return -EINVAL;
}
}
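The kernel stores PFC enablement as the `pfc_en` bitmap, where bit N corresponds to entry N of the `prio-pfc` map. Because `prio-pfc` is parsed as a per-entry map, `dcb_pfc_parse_mapping_prio_pfc()` expands the bitmap into an array, updates one entry, and collapses it back. Below is a minimal sketch of that round trip, assuming 8 entries. `pfc_set_prio()` is a hypothetical stand-in for applying one `PRIO:on|off` mapping.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TCS 8 /* assumed to match IEEE_8021QAZ_MAX_TCS */

/* Expand the bitmap into one 0/1 flag per entry. */
static void pfc_to_array(uint8_t array[MAX_TCS], uint8_t pfc_en)
{
	for (int i = 0; i < MAX_TCS; i++)
		array[i] = !!(pfc_en & (1u << i));
}

/* Collapse the per-entry flags back into a bitmap. */
static uint8_t pfc_from_array(const uint8_t array[MAX_TCS])
{
	uint8_t pfc_en = 0;

	for (int i = 0; i < MAX_TCS; i++)
		if (array[i])
			pfc_en |= (uint8_t)(1u << i);
	return pfc_en;
}

/* Apply one "PRIO:on|off" mapping: expand, set one entry, collapse. */
static uint8_t pfc_set_prio(uint8_t pfc_en, unsigned int prio, int on)
{
	uint8_t array[MAX_TCS];

	assert(prio < MAX_TCS);
	pfc_to_array(array, pfc_en);
	array[prio] = on ? 1 : 0;
	return pfc_from_array(array);
}
```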

1
devlink/.gitignore vendored Normal file

@ -0,0 +1 @@
devlink

25
devlink/Makefile Normal file

@ -0,0 +1,25 @@
# SPDX-License-Identifier: GPL-2.0
include ../config.mk
TARGETS :=
ifeq ($(HAVE_MNL),y)
DEVLINKOBJ = devlink.o mnlg.o
TARGETS += devlink
LDLIBS += -lm
endif
all: $(TARGETS) $(LIBS)
devlink: $(DEVLINKOBJ) $(LIBNETLINK)
$(QUIET_LINK)$(CC) $^ $(LDFLAGS) $(LDLIBS) -o $@
install: all
for i in $(TARGETS); \
do install -m 0755 $$i $(DESTDIR)$(SBINDIR); \
done
clean:
rm -f $(DEVLINKOBJ) $(TARGETS)

9189
devlink/devlink.c Normal file

File diff suppressed because it is too large

155
devlink/mnlg.c Normal file

@ -0,0 +1,155 @@
/*
* mnlg.c Generic Netlink helpers for libmnl
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version
* 2 of the License, or (at your option) any later version.
*
* Authors: Jiri Pirko <jiri@mellanox.com>
*/
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <libmnl/libmnl.h>
#include <linux/genetlink.h>
#include "libnetlink.h"
#include "mnl_utils.h"
#include "utils.h"
#include "mnlg.h"
struct mnlg_socket {
struct mnl_socket *nl;
char *buf;
uint32_t id;
uint8_t version;
unsigned int seq;
};
int mnlg_socket_send(struct mnlu_gen_socket *nlg, const struct nlmsghdr *nlh)
{
return mnl_socket_sendto(nlg->nl, nlh, nlh->nlmsg_len);
}
struct group_info {
bool found;
uint32_t id;
const char *name;
};
static int parse_mc_grps_cb(const struct nlattr *attr, void *data)
{
const struct nlattr **tb = data;
int type = mnl_attr_get_type(attr);
if (mnl_attr_type_valid(attr, CTRL_ATTR_MCAST_GRP_MAX) < 0)
return MNL_CB_OK;
switch (type) {
case CTRL_ATTR_MCAST_GRP_ID:
if (mnl_attr_validate(attr, MNL_TYPE_U32) < 0)
return MNL_CB_ERROR;
break;
case CTRL_ATTR_MCAST_GRP_NAME:
if (mnl_attr_validate(attr, MNL_TYPE_STRING) < 0)
return MNL_CB_ERROR;
break;
}
tb[type] = attr;
return MNL_CB_OK;
}
static void parse_genl_mc_grps(struct nlattr *nested,
struct group_info *group_info)
{
struct nlattr *pos;
const char *name;
mnl_attr_for_each_nested(pos, nested) {
struct nlattr *tb[CTRL_ATTR_MCAST_GRP_MAX + 1] = {};
mnl_attr_parse_nested(pos, parse_mc_grps_cb, tb);
if (!tb[CTRL_ATTR_MCAST_GRP_NAME] ||
!tb[CTRL_ATTR_MCAST_GRP_ID])
continue;
name = mnl_attr_get_str(tb[CTRL_ATTR_MCAST_GRP_NAME]);
if (strcmp(name, group_info->name) != 0)
continue;
group_info->id = mnl_attr_get_u32(tb[CTRL_ATTR_MCAST_GRP_ID]);
group_info->found = true;
}
}
static int get_group_id_attr_cb(const struct nlattr *attr, void *data)
{
const struct nlattr **tb = data;
int type = mnl_attr_get_type(attr);
if (mnl_attr_type_valid(attr, CTRL_ATTR_MAX) < 0)
return MNL_CB_ERROR;
if (type == CTRL_ATTR_MCAST_GROUPS &&
mnl_attr_validate(attr, MNL_TYPE_NESTED) < 0)
return MNL_CB_ERROR;
tb[type] = attr;
return MNL_CB_OK;
}
static int get_group_id_cb(const struct nlmsghdr *nlh, void *data)
{
struct group_info *group_info = data;
struct nlattr *tb[CTRL_ATTR_MAX + 1] = {};
struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
mnl_attr_parse(nlh, sizeof(*genl), get_group_id_attr_cb, tb);
if (!tb[CTRL_ATTR_MCAST_GROUPS])
return MNL_CB_ERROR;
parse_genl_mc_grps(tb[CTRL_ATTR_MCAST_GROUPS], group_info);
return MNL_CB_OK;
}
int mnlg_socket_group_add(struct mnlu_gen_socket *nlg, const char *group_name)
{
struct nlmsghdr *nlh;
struct group_info group_info;
int err;
nlh = _mnlu_gen_socket_cmd_prepare(nlg, CTRL_CMD_GETFAMILY,
NLM_F_REQUEST | NLM_F_ACK,
GENL_ID_CTRL, 1);
mnl_attr_put_u16(nlh, CTRL_ATTR_FAMILY_ID, nlg->family);
err = mnlg_socket_send(nlg, nlh);
if (err < 0)
return err;
group_info.found = false;
group_info.name = group_name;
err = mnlu_gen_socket_recv_run(nlg, get_group_id_cb, &group_info);
if (err < 0)
return err;
if (!group_info.found) {
errno = ENOENT;
return -1;
}
err = mnl_socket_setsockopt(nlg->nl, NETLINK_ADD_MEMBERSHIP,
&group_info.id, sizeof(group_info.id));
if (err < 0)
return err;
return 0;
}
int mnlg_socket_get_fd(struct mnlu_gen_socket *nlg)
{
return mnl_socket_get_fd(nlg->nl);
}

23
devlink/mnlg.h Normal file

@ -0,0 +1,23 @@
/*
* mnlg.h Generic Netlink helpers for libmnl
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version
* 2 of the License, or (at your option) any later version.
*
* Authors: Jiri Pirko <jiri@mellanox.com>
*/
#ifndef _MNLG_H_
#define _MNLG_H_
#include <libmnl/libmnl.h>
struct mnlu_gen_socket;
int mnlg_socket_send(struct mnlu_gen_socket *nlg, const struct nlmsghdr *nlh);
int mnlg_socket_group_add(struct mnlu_gen_socket *nlg, const char *group_name);
int mnlg_socket_get_fd(struct mnlu_gen_socket *nlg);
#endif /* _MNLG_H_ */


@ -1,55 +0,0 @@
PSFILES=ip-cref.ps ip-tunnels.ps api-ip6-flowlabels.ps ss.ps nstat.ps arpd.ps rtstat.ps
# tc-cref.ps
# api-rtnl.tex api-pmtudisc.tex api-news.tex
# iki-netdev.ps iki-neighdst.ps
LATEX=latex
DVIPS=dvips
SGML2DVI=sgml2latex --output=dvi
SGML2HTML=sgml2html -s 0
LPR=lpr -Zsduplex
SHELL=bash
PAGESIZE=a4
PAGESPERPAGE=2
HTMLFILES=$(subst .sgml,.html,$(shell echo *.sgml))
DVIFILES=$(subst .ps,.dvi,$(PSFILES))
all: pstwocol
pstwocol: $(PSFILES)
html: $(HTMLFILES)
dvi: $(DVIFILES)
print: $(PSFILES)
$(LPR) $(PSFILES)
%.dvi: %.sgml
$(SGML2DVI) $<
%.dvi: %.tex
@set -e; pass=2; echo "Running LaTeX $<"; \
while [ `$(LATEX) $< </dev/null 2>&1 | \
grep -c '^\(LaTeX Warning: Label(s) may\|No file \|! Emergency stop\)'` -ge 1 ]; do \
if [ $$pass -gt 3 ]; then \
echo "Seems, something is wrong. Try by hands." ; exit 1 ; \
fi; \
echo "Re-running LaTeX $<, $${pass}d pass"; pass=$$[$$pass + 1]; \
done
%.ps: %.dvi
$(DVIPS) $< -o $@
%.html: %.sgml
$(SGML2HTML) $<
install:
install -m 0644 $(shell echo *.tex) $(DESTDIR)$(DOCDIR)
install -m 0644 $(shell echo *.sgml) $(DESTDIR)$(DOCDIR)
clean:
rm -f *.aux *.log *.toc $(PSFILES) $(DVIFILES) *.html


@ -1,16 +0,0 @@
Partially finished work.
1. User Reference manuals.
1.1 IP Command reference (ip-cref.tex, published)
1.2 TC Command reference (tc-cref.tex)
1.3 IP tunnels (ip-tunnels.tex, published)
2. Linux-2.2 Networking API
2.1 RTNETLINK (api-rtnl.tex)
2.2 Path MTU Discovery (api-pmtudisc.tex)
2.3 IPv6 Flow Labels (api-ip6-flowlabels.tex, published)
2.4 Miscellaneous extensions (api-misc.tex)
3. Linux-2.2 Networking Intra-Kernel Interfaces
3.1 NetDev --- Networking Devices and netdev... (iki-netdev.tex)
3.2 Neighbour cache and destination cache. (iki-neighdst.tex)

@ -1 +0,0 @@
\def\Draft{020116}

@ -6,8 +6,8 @@ What is it?
-----------
An extension to the filtering/classification architecture of Linux Traffic
Control.
Up to 2.6.8 the only action that could be "attached" to a filter was policing.
i.e. you could say something like:
-----
@ -17,11 +17,11 @@ tc filter add dev lo parent ffff: protocol ip prio 10 u32 match ip src \
which implies "if a packet is seen on the ingress of the lo device with
a source IP address of 127.0.0.1/32 we give it a classification id of 1:1 and
we execute a policing action which rate limits its bandwidth utilization
to 1.5Mbps".
The new extensions allow for more than just policing actions to be added.
They are also fully backward compatible. If you have a kernel that doesn't
understand them, then the effect is null, i.e. if you have a newer tc
but older kernel, the actions are not installed. Likewise if you
have a newer kernel but older tc, obviously the tc will use current
@ -29,9 +29,9 @@ syntax which will work fine. Of course to get the required effect you need
both newer tc and kernel. If you are reading this you have the
right tc ;->
A side effect is that we can now get stateless firewalling to work with tc.
Essentially this is now an alternative to iptables.
I won't go into details of my dislike for iptables at times, but
scalability is one of the main issues; however, if you need stateful
classification - use netfilter (for now).
@ -61,7 +61,7 @@ tc filter add dev lo parent 1:0 protocol ip prio 10 u32 \
match ip src 127.0.0.1/32 flowid 1:1 \
action police mtu 4000 rate 1500kbit burst 90k
"generic actions" (gact) at the moment are:
{ drop, pass, reclassify, continue}
(If you have others not listed here, give me a reason and we will add them.)
+drop says to drop the packet
@ -77,7 +77,7 @@ iptable target. I have only tested with mangler targets up to now.
In terms of hooks:
*ingress is mapped to pre-routing hook
*egress is mapped to post-routing hook
I don't see much value in the other hooks; if you see it and email me good
reasons, the addition is trivial.
Example syntax for iptables targets usage becomes:
@ -93,43 +93,43 @@ decimal 12, then use flowid 1:c.
3) A feature i call pipe
The motivation is derived from Unix pipe mechanism but applied to packets.
Essentially take a matching packet and pass it through
action1 | action2 | action3 etc.
You could do something similar to this with the tc policer and the "continue"
operator but this rather restricts it to just the policer and requires
multiple rules (and lookups, hence quite inefficient);
as an example -- and please note that this is just an example _not_ The
Word You've Been Waiting For (yes i have had problems giving examples
which ended up becoming dogma in documents and people modifying them a little
to look clever);
i selected the metering rates to be small so that i can show better how
things work.
The script below does the following:
- An incoming packet from 10.0.0.21 is first given a firewall mark of 1.
- It is then metered to make sure it does not exceed its allocated rate of
1Kbps. If it doesn't exceed the rate, this is where we terminate action execution.
- If it does exceed its rate, its "color" changes to a mark of 2 and it is
then passed through a second meter.
- The second meter is shared across all flows on that device [i am surprised
that this does not seem to be a well-known feature of the policer; Bert was telling
me that someone was writing a qdisc just to do sharing across multiple devices;
it must be the summer heat again; we've had someone doing that every year around
summer -- the key to sharing is to use the "index" operator in your policer
rules (example "index 20"). All your rules have to use the same index to
share.]
- If the second meter is exceeded, the color of the flow changes further to 3.
- We then pass the packet to another meter which is shared across all devices
in the system. If this meter is exceeded, we drop the packet.
Note the mark can be used further up the system to do things like policy
or more interesting things on the egress.
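The action list described above can be modelled in a few lines of C. This is a simplified sketch, not the kernel's policer: struct meter, meter_refill(), meter_conform() and run_pipe() are hypothetical names, tokens are counted in bytes and refilled explicitly by the caller, and MTU and peak rate are ignored. Handing the same struct meter to several callers stands in for sharing a policer through "index".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified token bucket. Tokens are bytes; a real tc
 * policer also accounts for time, MTU and an optional peak rate. */
struct meter {
    uint64_t rate;    /* refill rate in bytes per second */
    uint64_t burst;   /* bucket depth in bytes */
    uint64_t tokens;  /* bytes currently available (<= burst) */
};

enum verdict { VERDICT_PASS, VERDICT_DROP };

/* Credit the bucket for 'usec' microseconds of elapsed time, capped at burst. */
void meter_refill(struct meter *m, uint64_t usec)
{
    uint64_t add = m->rate * usec / 1000000u;

    if (m->burst - m->tokens < add)
        m->tokens = m->burst;
    else
        m->tokens += add;
}

/* A conforming packet consumes tokens; an exceeding one leaves them alone. */
bool meter_conform(struct meter *m, uint32_t len)
{
    if (m->tokens < len)
        return false;
    m->tokens -= len;
    return true;
}

/* The action list from the script: mark 1 and meter; on exceed mark 2 and
 * meter; on exceed mark 3 and meter; on exceed drop. A conforming packet
 * ends the list, like "police ... pipe" when the rate is not exceeded.
 * Passing the same meter from several filters models a shared "index". */
enum verdict run_pipe(struct meter *own, struct meter *per_dev,
                      struct meter *global, uint32_t len, uint32_t *mark)
{
    *mark = 1;
    if (meter_conform(own, len))
        return VERDICT_PASS;
    *mark = 2;
    if (meter_conform(per_dev, len))
        return VERDICT_PASS;
    *mark = 3;
    if (meter_conform(global, len))
        return VERDICT_PASS;
    return VERDICT_DROP;
}
```

With "rate 1kbit" (125 bytes per second) and three 100-byte buckets, four back-to-back 84-byte packets leave with marks 1, 2 and 3, and the fourth is dropped once every bucket is empty.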
------------------ cut here -------------------------------
@ -145,7 +145,7 @@ u32 match ip src 10.0.0.21/32 flowid 1:15 \
action ipt -j mark --set-mark 1 index 2 \
#
# then pass it through a policer which allows 1kbps; if the flow
# doesn't exceed that rate, this is where we stop, if it exceeds we
# pipe the packet to the next action
action police rate 1kbit burst 9k pipe \
#
@ -161,31 +161,31 @@ action ipt -j mark --set-mark 3 \
# and then attempt to borrow from a meter used by all devices in the
# system. Should this be exceeded, drop the packet on the floor.
action police index 20 mtu 5000 rate 1kbit burst 90k drop
---------------------------------
Now let's see the actions installed with
"tc filter show parent ffff: dev eth0"
-------- output -----------
jroot# tc filter show parent ffff: dev eth0
filter protocol ip pref 1 u32
filter protocol ip pref 1 u32 fh 800: ht divisor 1
filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:15
action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x1 index 2
action order 2: police 1 action pipe rate 1Kbit burst 9Kb mtu 2Kb
action order 3: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x2 index 1
action order 4: police 30 action pipe rate 1Kbit burst 10Kb mtu 5000b
action order 5: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x3 index 3
action order 6: police 20 action drop rate 1Kbit burst 90Kb mtu 5000b
match 0a000015/ffffffff at 12
-------------------------------
@ -209,31 +209,31 @@ Now lets take a look at the stats with "tc -s filter show parent ffff: dev eth0"
--------------
jroot# tc -s filter show parent ffff: dev eth0
filter protocol ip pref 1 u32
filter protocol ip pref 1 u32 fh 800: ht divisor 1
filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:15
action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x1 index 2
Sent 188832 bytes 2248 pkts (dropped 0, overlimits 0)
action order 2: police 1 action pipe rate 1Kbit burst 9Kb mtu 2Kb
Sent 188832 bytes 2248 pkts (dropped 0, overlimits 2122)
action order 3: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x2 index 1
Sent 178248 bytes 2122 pkts (dropped 0, overlimits 0)
action order 4: police 30 action pipe rate 1Kbit burst 10Kb mtu 5000b
Sent 178248 bytes 2122 pkts (dropped 0, overlimits 1945)
action order 5: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x3 index 3
Sent 163380 bytes 1945 pkts (dropped 0, overlimits 0)
action order 6: police 20 action drop rate 1Kbit burst 90Kb mtu 5000b
Sent 163380 bytes 1945 pkts (dropped 0, overlimits 437)
match 0a000015/ffffffff at 12
-------------------------------
@ -241,7 +241,7 @@ filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1
Neat, eh?
Want to write an action module?
------------------------------
It's easy. Either look at the code or send me an email. I will document it at
some point; will also accept documentation.
@ -254,4 +254,3 @@ At the moment the focus has been on getting the architecture in place.
Expect new things in the spare time i have to work on this
(particularly around the end of the year, when i typically get time off
from work).

@ -1,16 +1,16 @@
gact <ACTION> [RAND] [INDEX]
Where:
ACTION := reclassify | drop | continue | pass | ok
RAND := random <RANDTYPE> <ACTION> <VAL>
RANDTYPE := netrand | determ
VAL := value not exceeding 10000
INDEX := index value used
ACTION semantics
- pass and ok are equivalent to accept
- continue continues classification with the next filter
- drop drops packets
- reclassify restarts classification from the first filter
@ -42,14 +42,14 @@ filter u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:16 (rule hit 32 suc
random type none pass val 0
index 1 ref 1 bind 1 installed 59 sec used 35 sec
Sent 1680 bytes 20 pkts (dropped 20, overlimits 0 )
----
# example 2
# allow 1 out of 10 randomly using the netrand generator
tc filter add dev eth0 parent ffff: protocol ip prio 6 u32 match ip src \
10.0.0.9/32 flowid 1:16 action drop random netrand ok 10
ping -c 20 10.0.0.9
----
@ -59,14 +59,14 @@ filter protocol ip pref 6 u32 filter protocol ip pref 6 u32 fh 800: ht divisor 1
random type netrand pass val 10
index 5 ref 1 bind 1 installed 49 sec used 25 sec
Sent 1680 bytes 20 pkts (dropped 16, overlimits 0 )
--------
#alternative: deterministically accept every second packet
tc filter add dev eth0 parent ffff: protocol ip prio 6 u32 match ip src \
10.0.0.9/32 flowid 1:16 action drop random determ ok 2
ping -c 20 10.0.0.9
tc -s filter show parent ffff: dev eth0
-----
filter protocol ip pref 6 u32
filter protocol ip pref 6 u32 fh 800: ht divisor 1
filter protocol ip pref 6 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:16 (rule hit 20 success 20)
@ -76,4 +76,3 @@ filter protocol ip pref 6 u32 filter protocol ip pref 6 u32 fh 800: ht divisor 1
index 4 ref 1 bind 1 installed 118 sec used 82 sec
Sent 1680 bytes 20 pkts (dropped 10, overlimits 0 )
-----
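The two random modes can be modelled as below. This is a hypothetical sketch (struct gact_rand, gact_determ() and gact_netrand() are illustrative names, not the kernel's), assuming "determ" applies the second action to every VAL-th packet and "netrand" to roughly one packet in VAL; rand() only stands in for the kernel's random generator.

```c
#include <stdint.h>
#include <stdlib.h>

enum gact_verdict { GACT_DROP, GACT_OK };

/* Hypothetical model of "gact <ACTION> random <RANDTYPE> <ACTION> <VAL>". */
struct gact_rand {
    enum gact_verdict action;       /* main action, e.g. drop */
    enum gact_verdict rand_action;  /* action picked by the random part, e.g. ok */
    uint32_t val;                   /* VAL, at most 10000 */
    uint32_t packets;               /* packet counter used by "determ" */
};

/* determ: every val-th packet gets rand_action, the others get action. */
enum gact_verdict gact_determ(struct gact_rand *g)
{
    g->packets++;
    if (g->val == 0 || g->packets % g->val != 0)
        return g->action;
    return g->rand_action;
}

/* netrand: each packet gets rand_action with probability of about 1/val. */
enum gact_verdict gact_netrand(const struct gact_rand *g)
{
    if (g->val == 0 || (uint32_t)rand() % g->val != 0)
        return g->action;
    return g->rand_action;
}
```

With val set to 2, gact_determ() returns GACT_OK for 10 of 20 packets, which agrees with the 10 drops out of 20 in the determ output above.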

@ -6,47 +6,47 @@ with a _lot_ less code.
Known IMQ/IFB USES
------------------
As far as i know, the reasons listed below are why people use IMQ.
It would be nice to know of anything else that i missed.
1) qdiscs/policies that are per device as opposed to system wide.
IFB allows for sharing.
2) Allows for queueing incoming traffic for shaping instead of
dropping. I am not aware of any study that shows policing is
worse than shaping in achieving the end goal of rate control.
I would be interested if anyone is experimenting.
3) Very interesting use: if you are serving p2p you may want to give
preference to your own locally originated traffic (when responses come back)
vs someone using your system to do bittorrent. So QoSing based on state
comes in as the solution. What people did to achieve this was stick
the IMQ somewhere in the prelocal hook.
I think this is a pretty neat feature to have in Linux in general.
(i.e. not just for IMQ).
But i won't go back to putting netfilter hooks in the device to satisfy
this. I also don't think it's worth hacking ifb some more to be
aware of say L3 info and play ip rule tricks to achieve this.
--> Instead the plan is to have a conntrack related action. This action will
selectively either query/create conntrack state on incoming packets.
Packets could then be redirected to ifb based on what happens -> e.g.
on incoming packets; if we find they are of known state we could send to
a different queue than one which didn't have existing state. This
all however is dependent on whatever rules the admin enters.
At the moment this 3rd function does not exist yet. I have decided to
release the patch instead of sitting on it for another year; if there
is pressure i will add this feature.
Below is an example that provides the functionality most people use IMQ for:
--------
export TC="/sbin/tc"
$TC qdisc add dev ifb0 root handle 1: prio
$TC qdisc add dev ifb0 parent 1:1 handle 10: sfq
$TC qdisc add dev ifb0 parent 1:2 handle 20: tbf rate 20kbit buffer 1600 limit 3000
$TC qdisc add dev ifb0 parent 1:3 handle 30: sfq
$TC filter add dev ifb0 protocol ip pref 1 parent 1: handle 1 fw classid 1:1
$TC filter add dev ifb0 protocol ip pref 2 parent 1: handle 2 fw classid 1:2
@ -54,7 +54,7 @@ ifconfig ifb0 up
$TC qdisc add dev eth0 ingress
# redirect all IP packets arriving in eth0 to ifb0
# use mark 1 --> puts them onto class 1:1
$TC filter add dev eth0 parent ffff: protocol ip prio 10 u32 \
match u32 0 0 flowid 1:1 \
@ -77,44 +77,44 @@ PING 10.22 (10.0.0.22): 56 data bytes
--- 10.22 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.6/1.3/2.8 ms
[root@jzny action-tests]#
-----
Now look at some stats:
---
[root@jmandrake]:~# $TC -s filter show parent ffff: dev eth0
filter protocol ip pref 10 u32
filter protocol ip pref 10 u32 fh 800: ht divisor 1
filter protocol ip pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1
match 00000000/00000000 at 0
action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING
target MARK set 0x1
index 1 ref 1 bind 1 installed 4195sec used 27sec
Sent 252 bytes 3 pkts (dropped 0, overlimits 0)
action order 2: mirred (Egress Redirect to device ifb0) stolen
index 1 ref 1 bind 1 installed 165 sec used 27 sec
Sent 252 bytes 3 pkts (dropped 0, overlimits 0)
[root@jmandrake]:~# $TC -s qdisc
qdisc sfq 30: dev ifb0 limit 128p quantum 1514b
Sent 0 bytes 0 pkts (dropped 0, overlimits 0)
qdisc tbf 20: dev ifb0 rate 20Kbit burst 1575b lat 2147.5s
Sent 210 bytes 3 pkts (dropped 0, overlimits 0)
qdisc sfq 10: dev ifb0 limit 128p quantum 1514b
Sent 294 bytes 3 pkts (dropped 0, overlimits 0)
qdisc prio 1: dev ifb0 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 504 bytes 6 pkts (dropped 0, overlimits 0)
qdisc ingress ffff: dev eth0 ----------------
Sent 308 bytes 5 pkts (dropped 0, overlimits 0)
[root@jmandrake]:~# ifconfig ifb0
ifb0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
UP BROADCAST RUNNING NOARP MTU:1500 Metric:1
RX packets:6 errors:0 dropped:3 overruns:0 frame:0
TX packets:3 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:32
RX bytes:504 (504.0 b) TX bytes:252 (252.0 b)
-----

@ -7,10 +7,10 @@ flow to be mirrored. High end switches typically can select based
on more than just a port (eg a 5 tuple classifier). They may also be
capable of redirecting.
Usage:
mirred <DIRECTION> <ACTION> [index INDEX] <dev DEVICENAME>
where:
DIRECTION := <ingress | egress>
ACTION := <mirror | redirect>
INDEX is the specific policy instance id
@ -18,7 +18,7 @@ DEVICENAME is the devicename
Direction:
- Ingress is not supported at the moment. It will be in the
future as well as mirror/redirecting to a socket.
Action:
- Mirror takes a copy of the packet and sends it to specified
@ -26,17 +26,17 @@ dev ("port" in ethernet switch/bridging terminology)
- redirect
steals the packet and redirects it to the specified destination dev.
What NOT to do if you don't want your machine to crash:
------------------------------------------------------
Do not create loops!
Loops are not hard to create in the egress qdiscs.
Here are simple rules to follow if you don't want to get
hurt:
A) Do not have the same packet go to the same netdevice twice
in a single graph of policies. Your machine will just hang!
This is design intent _not a bug_ to teach you some lessons.
In the future if there are easy ways to do this in the kernel
without affecting other packets not interested in this feature
@ -51,7 +51,7 @@ B) Do not redirect from one IFB device to another.
Remember that IFB is a very specialized case of packet redirecting
device. Instead of redirecting, it puts packets at the exact spot
on the stack where it found them.
Redirecting from ifbX->ifbY will actually not crash your machine but your
packets will all be dropped (this is much simpler to detect
and resolve and is only affecting users of ifb as opposed to the
whole stack).
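A configuration that breaks rule A can be checked for before it is installed. The sketch below is hypothetical and simplified: it assumes each device redirects to at most one other device, described by a table redirect_to[] (with -1 meaning "no redirect"), and runs Floyd's tortoise-and-hare cycle detection along that chain to report whether a packet would reach some device twice. Real mirred graphs can fan out (mirror plus redirect), which this model ignores.

```c
#include <stdbool.h>

/* Hypothetical helper: next device in the chain, or -1 if none/out of range. */
static int next_dev(const int redirect_to[], int ndevs, int dev)
{
    return (dev >= 0 && dev < ndevs) ? redirect_to[dev] : -1;
}

/* Floyd's cycle detection: 'fast' moves two hops per step, 'slow' one.
 * They can only meet again if the redirect chain from 'start' loops. */
bool redirect_loops(const int redirect_to[], int ndevs, int start)
{
    int slow = start, fast = start;

    for (;;) {
        fast = next_dev(redirect_to, ndevs, fast);
        if (fast < 0)
            return false;
        fast = next_dev(redirect_to, ndevs, fast);
        if (fast < 0)
            return false;
        slow = next_dev(redirect_to, ndevs, slow);
        if (slow == fast)
            return true;   /* a packet would hit the same device twice */
    }
}
```

For example, eth0 -> eth1 -> eth2 -> eth0 (table {1, 2, 0}) is reported as a loop, while eth0 -> eth1 -> eth2 with no further redirect ({1, 2, -1}) is not.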
@ -64,7 +64,7 @@ Some examples:
1) Mirror all packets arriving on eth0 to be sent out on eth1.
You may have a sniffer or some accounting box hooked up on eth1.
---
tc qdisc add dev eth0 ingress
tc filter add dev eth0 parent ffff: protocol ip prio 10 u32 \
@ -100,7 +100,7 @@ stack (i.e ping would work).
3) Even more funky example:
#
# allow 1 out of 10 packets on ingress of lo to randomly make it to the
# host A (Randomness uses the netrand generator)
#
---
@ -111,9 +111,9 @@ action mirred egress mirror dev eth0
---
4)
# for packets from 10.0.0.9 going out on eth0 (could be local
# IP or something we are forwarding) -
# if exceeding a 100Kbps rate, then redirect to eth1
#
---
@ -129,7 +129,7 @@ so you could tcpdump them (dummy by defaults drops all packets it sees).
This is a very useful debug feature.
Let's say you are policing packets from alias 192.168.200.200/32
and you don't want those to exceed 100kbps going out.
---
tc qdisc add dev eth0 handle 1:0 root prio
@ -158,7 +158,7 @@ Essentially a good debugging/logging interface (sort of like
BSD's specialized log device does, without needing one).
If you replace mirror with redirect, those packets will be
blackholed and will never make it out.
cheers,
jamal

@ -1,429 +0,0 @@
\documentstyle[12pt,twoside]{article}
\def\TITLE{IPv6 Flow Labels}
\input preamble
\begin{center}
\Large\bf IPv6 Flow Labels in Linux-2.2.
\end{center}
\begin{center}
{ \large Alexey~N.~Kuznetsov } \\
\em Institute for Nuclear Research, Moscow \\
\verb|kuznet@ms2.inr.ac.ru| \\
\rm April 11, 1999
\end{center}
\vspace{5mm}
\tableofcontents
\section{Introduction.}
Every IPv6 packet carries 28 bits of flow information. RFC2460 splits
these bits into two fields: 8 bits of traffic class (or DS field, if you
prefer this term) and 20 bits of flow label. Currently there is
no well-defined API to manage IPv6 flow information. In this document
I describe an attempt to design the API for the Linux-2.2 IPv6 stack.
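As a worked illustration of this layout, the helpers below pack and unpack the 28 bits of flow information as they would appear in sin6_flowinfo: network byte order, traffic class in bits 20 to 27, flow label in bits 0 to 19. The helper and macro names are hypothetical; only the field widths come from the text above.

```c
#include <arpa/inet.h>  /* htonl, ntohl */
#include <stdint.h>

#define FLOWLABEL_BITS 20u           /* width of the flow label field */
#define FLOWLABEL_MASK 0x000FFFFFu   /* low 20 bits, host byte order */
#define TCLASS_MASK    0xFFu         /* 8-bit traffic class */

/* Build a network-order flowinfo word from a traffic class and a label. */
uint32_t flowinfo_make(uint8_t tclass, uint32_t label)
{
    return htonl(((uint32_t)tclass << FLOWLABEL_BITS) | (label & FLOWLABEL_MASK));
}

/* Extract the traffic class (DS field) from a network-order flowinfo word. */
uint8_t flowinfo_tclass(uint32_t flowinfo_net)
{
    return (uint8_t)((ntohl(flowinfo_net) >> FLOWLABEL_BITS) & TCLASS_MASK);
}

/* Extract the 20-bit flow label from a network-order flowinfo word. */
uint32_t flowinfo_label(uint32_t flowinfo_net)
{
    return ntohl(flowinfo_net) & FLOWLABEL_MASK;
}
```

For instance, traffic class 0xB8 with label 0x12345 gives the host-order word 0x0B812345.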
\vskip 1mm
The API must solve the following tasks:
\begin{enumerate}
\item To allow the user to set traffic class bits.
\item To allow the user to read traffic class bits of received packets.
This feature is not as useful as the first one; however, it will be
necessary, f.e.\ to implement ECN [RFC2481] for datagram oriented services
or to implement the receiver side of SRP or another end-to-end protocol
using traffic class bits.
\item To assign flow labels to packets sent by the user.
\item To get flow labels of received packets. I do not know
any applications of this feature, but it is possible that the receiver will
want to use flow labels to distinguish sub-flows.
\item To allocate flow labels in a way compliant with RFC2460. Namely:
\begin{itemize}
\item
Flow labels must be uniformly distributed (pseudo-)random numbers,
so that any subset of the 20 bits can be used as a hash key.
\item
Flows with coinciding source address and flow label must have identical
destination addresses and non-fragmentable extension headers (i.e.\
hop-by-hop options and all the headers up to and including the routing header,
if it is present).
\begin{NB}
There is a hole in the specs: some hop-by-hop options can be
defined only on a per-packet basis (f.e.\ the jumbo payload option).
Essentially, it means that such options cannot be present in packets
with flow labels.
\end{NB}
\begin{NB}
NB notes here and below reflect only my personal opinion;
they should be read with a smile or should not be read at all :-).
\end{NB}
\item
Flow labels have a finite lifetime, and the source is not allowed to reuse
a flow label for another flow until its maximal lifetime has expired,
so that intermediate nodes will be able to invalidate flow state before
the label is taken over by another flow.
Flow state, including lifetime, is propagated along the datagram path
by some application-specific method
(f.e.\ in RSVP PATH messages or in some hop-by-hop option).
\end{itemize}
\end{enumerate}
\section{Sending/receiving flow information.}
\paragraph{Discussion.}
\addcontentsline{toc}{subsection}{Discussion}
It was proposed (Where? I do not remember any explicit statement)
to solve the first four tasks using the
\verb|sin6_flowinfo| field added to \verb|struct| \verb|sockaddr_in6|
(see RFC2553).
\begin{NB}
This method is hard to consider reasonable, because it
puts additional overhead on all the services, even though only a
very small subset of them (none, to be more exact) really uses it.
It contradicts both the spirit and the letter of the IETF. Before RFC2553
one justification existed: IPv6 address alignment left a 4-byte
hole in \verb|sockaddr_in6| in any case. Now it has no justification.
\end{NB}
We have two problems with this method. The first one is common to all OSes:
if \verb|recvmsg()| initializes \verb|sin6_flowinfo| to the flow info
of the received packet, we lose one very important property of the BSD socket API,
namely, we can no longer use the received address for a reply directly
and have to mangle it, even if we are not interested in flowinfo subtleties.
\begin{NB}
RFC2553 adds a new requirement: to clear \verb|sin6_flowinfo|.
Certainly, it is not a solution but rather an attempt to force applications
to do unnecessary work. Well, as usual, one mistake in design
is followed by attempts to patch the hole and more mistakes...
\end{NB}
Another problem is Linux specific. Historically Linux IPv6 did not
initialize \verb|sin6_flowinfo| at all, so if the kernel does not
support flow labels, this field is not zero but a random number.
Some applications also did not take care of it.
\begin{NB}
Following RFC2553, such applications can be considered broken,
but I still think that they are right: clearing the whole address
before filling in the known fields is a robust but stupid solution.
Wasting CPU cycles and
memory bandwidth for nothing is not a good idea. Such patches are acceptable
as temporary hacks, but not as the standard of the future.
\end{NB}
\paragraph{Implementation.}
\addcontentsline{toc}{subsection}{Implementation}
By default Linux IPv6 does not read the \verb|sin6_flowinfo| field,
assuming that common applications are not obliged to initialize it
and are permitted to treat it as pure alignment padding.
To tell the kernel that the application
is aware of this field, it is necessary to set the socket option
\verb|IPV6_FLOWINFO_SEND|.
\begin{verbatim}
int on = 1;
setsockopt(sock, SOL_IPV6, IPV6_FLOWINFO_SEND,
(void*)&on, sizeof(on));
\end{verbatim}
The Linux kernel never stores received flow information in the \verb|sin6_flowinfo|
field when passing a message to user space, though kernels which support flow labels
initialize it to zero. If the user wants to get the received flowinfo, he
sets the option \verb|IPV6_FLOWINFO|; after this he will receive
flowinfo as an ancillary data object of type \verb|IPV6_FLOWINFO|
(cf.\ RFC2292).
\begin{verbatim}
int on = 1;
setsockopt(sock, SOL_IPV6, IPV6_FLOWINFO, (void*)&on, sizeof(on));
\end{verbatim}
Flowinfo received and latched by a connected TCP socket may also be fetched
with \verb|getsockopt()| \verb|IPV6_PKTOPTIONS| together with
other optional information.
Besides that, in the spirit of RFC2292, the option \verb|IPV6_FLOWINFO|
may be used as an alternative way to send flowinfo with \verb|sendmsg()| or
to latch it with \verb|IPV6_PKTOPTIONS|.
\paragraph{Note about IPv6 options and destination address.}
\addcontentsline{toc}{subsection}{IPv6 options and destination address}
If \verb|sin6_flowinfo| contains a nonzero flow label, the
destination address in \verb|sin6_addr| and the non-fragmentable
extension headers are ignored. Instead, the kernel uses the values
cached at flow setup (see below). However, for connected sockets
the kernel prefers the values set at connection time.
\paragraph{Example.}
\addcontentsline{toc}{subsection}{Example}
After setting the socket option \verb|IPV6_FLOWINFO|,
the flow label and DS field are received as an ancillary data object
of type \verb|IPV6_FLOWINFO| and level \verb|SOL_IPV6|.
When it is convenient to keep using \verb|recvfrom(2)|,
you can replace the library version with your own,
along these lines:
\begin{verbatim}
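#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/in6.h>      /* include after <netinet/in.h> */
#ifndef IPV6_FLOWINFO
#define IPV6_FLOWINFO 11    /* value from the Linux uapi headers */
#endif
/* Same prototype as the libc recvfrom(), so that it can replace it. */
ssize_t recvfrom(int fd, void *buf, size_t len, int flags,
                 struct sockaddr *addr, socklen_t *addrlen)
{
    ssize_t cc;                     /* signed, so that -1 is visible */
    char cbuf[128];                 /* room for ancillary data */
    struct cmsghdr *c;
    socklen_t alen = addrlen ? *addrlen : 0;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg = {
        .msg_name = addr, .msg_namelen = alen,
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = cbuf, .msg_controllen = sizeof(cbuf),
    };
    cc = recvmsg(fd, &msg, flags);
    if (cc < 0)
        return cc;
    if (addrlen)
        *addrlen = msg.msg_namelen;
    /* Touch sin6_flowinfo only if the caller's buffer can hold it. */
    if (addr == NULL || alen < sizeof(struct sockaddr_in6))
        return cc;
    ((struct sockaddr_in6 *)addr)->sin6_flowinfo = 0;
    for (c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
        uint32_t flowinfo;
        if (c->cmsg_level != SOL_IPV6 ||
            c->cmsg_type != IPV6_FLOWINFO)
            continue;
        memcpy(&flowinfo, CMSG_DATA(c), sizeof(flowinfo));
        ((struct sockaddr_in6 *)addr)->sin6_flowinfo = flowinfo;
    }
    return cc;
}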
\end{verbatim}
\section{Flow label management.}
\paragraph{Discussion.}
\addcontentsline{toc}{subsection}{Discussion}
The requirements of RFC2460 are pretty tough. In particular, lifetimes
longer than boot time require storing allocated labels in stable
storage, so the full implementation necessarily includes a user space flow
label manager. There are at least three different approaches:
\begin{enumerate}
\item {\bf ``Cooperative''. } We could leave flow label allocation wholly
to user space. When the user needs a label, he requests it from the manager directly. The approach
is valid, but like any ``cooperative'' approach it suffers from security problems.
\begin{NB}
One idea is to forbid unprivileged users to allocate flow
labels and instead have them pass the socket to the manager via an \verb|SCM_RIGHTS|
control message, so that the manager allocates a label and assigns it to the socket
itself. Hmm... the idea is interesting.
\end{NB}
\item {\bf ``Indirect''.} The kernel redirects requests to a user level daemon
and does not install the label until the daemon has acknowledged the request.
This approach is the most promising; it is especially pleasant to recognize
the parallel with the IPsec API [RFC2367,Craig]. Actually, it may share the API with
IPsec.
\item {\bf ``Stupid''.} Allocate labels in kernel space. It is the simplest
method, but it suffers from two serious flaws: first,
we cannot lease labels with lifetimes longer than boot time; second,
it is sensitive to DoS attacks. The kernel has to remember all the obsolete
labels until their expiration, and a malicious user may quickly eat up all the
flow label space.
\end{enumerate}
Certainly, I chose the most ``stupid'' method. It is the cheapest one
for the implementor (i.e.\ me), and since flow labels
still have no serious applications, it is not worth working on a more
advanced API, especially since we will eventually
get one for free together with IPsec.
\paragraph{Implementation.}
\addcontentsline{toc}{subsection}{Implementation}
The socket option \verb|IPV6_FLOWLABEL_MGR| lets the user
request the flow label manager to allocate a new flow label, to reuse
an already allocated one, or to delete an old flow label.
Its argument is \verb|struct| \verb|in6_flowlabel_req|:
\begin{verbatim}
struct in6_flowlabel_req
{
struct in6_addr flr_dst;
__u32 flr_label;
__u8 flr_action;
__u8 flr_share;
__u16 flr_flags;
__u16 flr_expires;
__u16 flr_linger;
__u32 __flr_reserved;
/* Options in format of IPV6_PKTOPTIONS */
};
\end{verbatim}
\begin{itemize}
\item \verb|dst| is the IPv6 destination address associated with the label.
\item \verb|label| is the flow label value in network byte order. If it is zero,
the kernel will allocate a new pseudo-random number. Otherwise, the kernel will try
to lease the flow label requested by the user. In this case, it is the user's task
to provide the necessary flow label randomness.
\item \verb|action| is the requested operation. Currently only three operations
are defined:
\begin{verbatim}
#define IPV6_FL_A_GET 0 /* Get flow label */
#define IPV6_FL_A_PUT 1 /* Release flow label */
#define IPV6_FL_A_RENEW 2 /* Update expire time */
\end{verbatim}
\item \verb|flags| are optional modifiers. Currently
only \verb|IPV6_FL_A_GET| has modifiers:
\begin{verbatim}
#define IPV6_FL_F_CREATE 1 /* Allowed to create new label */
#define IPV6_FL_F_EXCL 2 /* Fail if the label already exists */
\end{verbatim}
\item \verb|share| defines who is allowed to reuse the same flow label.
\begin{verbatim}
#define IPV6_FL_S_NONE 0 /* Not defined */
#define IPV6_FL_S_EXCL 1 /* Label is private */
#define IPV6_FL_S_PROCESS 2 /* May be reused by this process */
#define IPV6_FL_S_USER 3 /* May be reused by this user */
#define IPV6_FL_S_ANY 255 /* Anyone may reuse it */
\end{verbatim}
\item \verb|linger| is a time in seconds. After the last user releases the flow
label, it will not be reused with a different destination and options for at least
this time. If \verb|share| is not \verb|IPV6_FL_S_EXCL|, the label
can still be shared by other sockets. The current implementation does not allow
an unprivileged user to set linger longer than 60 sec.
\item \verb|expires| is a time in seconds. The flow label will be kept for at least
this time, but it will not be destroyed before the user has released it explicitly
or closed all the sockets using it. The current implementation does not allow
an unprivileged user to set a timeout longer than 60 sec. Privileged applications
MAY set longer lifetimes, but in this case they MUST save allocated
labels in stable storage and restore them after a reboot before the first
application allocates a new flow label.
\end{itemize}
This structure is followed by optional extension headers associated
with this flow label, in the format of \verb|IPV6_PKTOPTIONS|. Only
\verb|IPV6_HOPOPTS|, \verb|IPV6_RTHDR| and, if \verb|IPV6_RTHDR| is present,
\verb|IPV6_DSTOPTS| are allowed.
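A flow label is a 20-bit value (RFC~2460), and \verb|flr_label| carries it in
network byte order. The helper below is a minimal sketch, not part of the kernel
interface: the name \verb|flr_label_from_host| and the macro \verb|FL_LABEL_MASK|
are hypothetical. It checks that a host-order value fits into 20 bits and
converts it for \verb|flr_label|; zero passes through unchanged, since it asks the
kernel to choose a label.
\begin{verbatim}
#include <stdint.h>
#include <arpa/inet.h>

/* Hypothetical mask: the flow label is the low 20 bits (host order). */
#define FL_LABEL_MASK 0x000FFFFFu

/* Convert a host-order flow label into the network-order value
 * expected in flr_label.  Returns 0 on success, -1 if the value
 * does not fit into 20 bits.  Zero is accepted: it asks the kernel
 * to allocate a pseudo-random label. */
static int flr_label_from_host(uint32_t label, uint32_t *out)
{
    if (label & ~FL_LABEL_MASK)
        return -1;
    *out = htonl(label);
    return 0;
}
\end{verbatim}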
\paragraph{Example.}
\addcontentsline{toc}{subsection}{Example}
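The function \verb|get_flow_label| leases a private flow label for
destination \verb|dst| (if \verb|fl| is zero, the kernel chooses the value)
and enables sending of flow information on socket \verb|fd|.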
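\begin{verbatim}
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
/* struct in6_flowlabel_req and the IPV6_FL_*, IPV6_FLOWLABEL_MGR and
   IPV6_FLOWINFO_SEND constants are the kernel definitions described
   above; depending on the C library they may have to be taken from
   <linux/in6.h> or declared locally. */

int get_flow_label(int fd, struct sockaddr_in6 *dst, __u32 fl)
{
    int on = 1;
    struct in6_flowlabel_req freq;

    memset(&freq, 0, sizeof(freq));
    freq.flr_label = htonl(fl);          /* 0: let the kernel pick one */
    freq.flr_action = IPV6_FL_A_GET;
    freq.flr_flags = IPV6_FL_F_CREATE | IPV6_FL_F_EXCL;
    freq.flr_share = IPV6_FL_S_EXCL;     /* private label */
    memcpy(&freq.flr_dst, &dst->sin6_addr, sizeof(freq.flr_dst));
    if (setsockopt(fd, SOL_IPV6, IPV6_FLOWLABEL_MGR,
                   &freq, sizeof(freq)) == -1) {
        perror("can't lease flowlabel");
        return -1;
    }
    /* flr_label now holds the leased label; both it and sin6_flowinfo
       are in network byte order. */
    dst->sin6_flowinfo |= freq.flr_label;
    if (setsockopt(fd, SOL_IPV6, IPV6_FLOWINFO_SEND,
                   &on, sizeof(on)) == -1) {
        perror("can't send flowinfo");
        /* Release the label we have just leased. */
        freq.flr_action = IPV6_FL_A_PUT;
        setsockopt(fd, SOL_IPV6, IPV6_FLOWLABEL_MGR,
                   &freq, sizeof(freq));
        return -1;
    }
    return 0;
}
\end{verbatim}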
A slightly more complicated example, using a routing header, can be found
in the \verb|ping6| utility (\verb|iputils| package). The Linux rsvpd backend
contains an example of using the operation \verb|IPV6_FL_A_RENEW|.
\paragraph{Listing flow labels.}
\addcontentsline{toc}{subsection}{Listing flow labels}
The list of currently allocated
flow labels may be read from \verb|/proc/net/ip6_flowlabel|.
\begin{verbatim}
Label S Owner Users Linger Expires Dst Opt
A1BE5 1 0 0 6 3 3ffe2400000000010a0020fffe71fb30 0
\end{verbatim}
\begin{itemize}
\item \verb|Label| is the hexadecimal flow label value.
\item \verb|S| is the sharing style.
\item \verb|Owner| is the ID of the creator: zero, a pid or a uid, depending on the
sharing style.
\item \verb|Users| is the number of applications using the label now.
\item \verb|Linger| is the \verb|linger| of this label in seconds.
\item \verb|Expires| is the time until expiration of the label in seconds. It may
be negative if the label is in use.
\item \verb|Dst| is the IPv6 destination address.
\item \verb|Opt| is the length of the options associated with the label. Option
data are not accessible.
\end{itemize}
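The columns can be read back with \verb|sscanf|. The sketch below assumes the
whitespace-separated layout of the sample above, with \verb|Dst| printed as 32
hexadecimal digits without colons; \verb|struct fl_entry| and
\verb|parse_fl_line| are hypothetical names, not an existing interface. The
header line is rejected because it does not begin with a hexadecimal number.
\begin{verbatim}
#include <stdio.h>

/* One row of /proc/net/ip6_flowlabel (hypothetical structure). */
struct fl_entry {
    unsigned int label;   /* Label: hexadecimal flow label */
    int share;            /* S: sharing style */
    int owner;            /* Owner: zero, pid or uid */
    int users;            /* Users: applications using the label now */
    long linger;          /* Linger, seconds */
    long expires;         /* Expires, seconds; may be negative */
    char dst[33];         /* Dst: 32 hex digits plus terminator */
    int optlen;           /* Opt: length of associated options */
};

/* Returns 0 if all eight columns were parsed, -1 otherwise
 * (for example, for the header line). */
static int parse_fl_line(const char *line, struct fl_entry *e)
{
    int n = sscanf(line, "%x %d %d %d %ld %ld %32s %d",
                   &e->label, &e->share, &e->owner, &e->users,
                   &e->linger, &e->expires, e->dst, &e->optlen);
    return n == 8 ? 0 : -1;
}
\end{verbatim}
A caller would read the file line by line with \verb|fgets| and skip any line
for which \verb|parse_fl_line| returns $-1$.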
\paragraph{Flow labels and RSVP.}
\addcontentsline{toc}{subsection}{Flow labels and RSVP}
The RSVP daemon supports IPv6 flow labels
without any modifications to the standard ISI RAPI. The sender must allocate a
flow label, fill in the corresponding sender template and submit it to the local rsvp
daemon. rsvpd will check the label and start to announce it in PATH
messages. rsvpd on the sender node will renew the flow label, so that it will not
be reused before the path state expires and all the intermediate
routers and the receiver purge the flow state.
The \verb|rtap| utility is modified to parse flow labels. F.e.\ if a user has allocated
flow label \verb|0xA1234|, he may write:
\begin{verbatim}
RTAP> sender 3ffe:2400::1/FL0xA1234 <Tspec>
\end{verbatim}
The receiver makes a reservation with the command:
\begin{verbatim}
RTAP> reserve ff 3ffe:2400::1/FL0xA1234 <Flowspec>
\end{verbatim}
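The \verb|ADDRESS/FL0xLABEL| notation used with \verb|rtap| can be split with a
few lines of C. This is a hypothetical sketch, not the \verb|rtap| source:
\verb|parse_addr_fl| separates the address at \verb|/FL|, converts it with
\verb|inet_pton|, and accepts the label only if it fits into 20 bits.
\begin{verbatim}
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical parser for "ADDRESS/FL0xLABEL" as typed at the RTAP
 * prompt.  Returns 0 on success, -1 on malformed input. */
static int parse_addr_fl(const char *s, struct in6_addr *addr,
                         uint32_t *label)
{
    char buf[INET6_ADDRSTRLEN];
    const char *sep = strstr(s, "/FL");
    const char *num;
    char *end;
    unsigned long v;
    size_t len;

    if (sep == NULL)
        return -1;
    len = (size_t)(sep - s);
    if (len >= sizeof(buf))
        return -1;
    memcpy(buf, s, len);          /* copy the address part only */
    buf[len] = '\0';
    if (inet_pton(AF_INET6, buf, addr) != 1)
        return -1;

    num = sep + 3;                /* text after "/FL" */
    v = strtoul(num, &end, 0);    /* base 0: "0x" selects hexadecimal */
    if (end == num || *end != '\0' || v > 0xFFFFFUL)
        return -1;                /* empty, trailing junk, or > 20 bits */
    *label = (uint32_t)v;
    return 0;
}
\end{verbatim}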
\end{document}

@ -1,130 +0,0 @@
<!doctype linuxdoc system>
<article>
<title>ARPD Daemon
<author>Alexey Kuznetsov, <tt/kuznet@ms2.inr.ac.ru/
<date>some_negative_number, 20 Sep 2001
<abstract>
<tt/arpd/ is a daemon that collects gratuitous ARP information, saves
it on the local disk and feeds it to the kernel on demand, to avoid
redundant broadcasting caused by the limited size of the kernel ARP cache.
</abstract>
<p><bf/Description/
<p>The format of the command is:
<tscreen><verb>
arpd OPTIONS [ INTERFACE [ INTERFACE ... ] ]
</verb></tscreen>
<p> <tt/OPTIONS/ are:
<itemize>
<item><tt/-l/ - dump the <tt/arpd/ database to stdout and exit. The output consists
of three columns: interface index, IP address and MAC address.
Negative entries for dead hosts are also shown; in this case the MAC address
is replaced by the word <tt/FAILED/ followed by a colon and the time when
the host was last proven to be dead.
<item><tt/-f FILE/ - read and load the <tt/arpd/ database from <tt/FILE/
in a text format similar to that dumped by option <tt/-l/. Exit after loading,
listing the resulting database if option <tt/-l/ is also given.
If <tt/FILE/ is <tt/-/, the ARP table is read from <tt/stdin/.
<item><tt/-b DATABASE/ - location of the database file. The default location is
<tt>/var/lib/arpd/arpd.db</tt>.
<item><tt/-a NUMBER/ - <tt/arpd/ not only passively listens to ARP on the wire, but
also sends broadcast queries itself. <tt/NUMBER/ is the number of such queries
to make before the destination is considered dead. When <tt/arpd/ is started
as a kernel helper (i.e. with <tt/app_solicit/ enabled in <tt/sysctl/
or even with option <tt/-k/) without this option and has not yet learned enough
information, you can observe 1 second gaps in service. Not fatal, but
not good.
<item><tt/-k/ - suppress sending of broadcast queries by the kernel. It makes
sense only together with option <tt/-a/.
<item><tt/-n TIME/ - timeout of the negative cache. When resolution fails, <tt/arpd/
suppresses further attempts to resolve for this period. It makes sense
only together with option <tt/-k/. This timeout should not be much
longer than the boot time of a typical host not supporting gratuitous ARP.
The default value is 60 seconds.
<item><tt/-R RATE/ - maximum steady rate of broadcasts sent by <tt/arpd/
in packets per second. The default value is 1.
<item><tt/-B NUMBER/ - number of broadcasts sent by <tt/arpd/ back to back.
The default value is 3. Together with option <tt/-R/, this option allows
policing broadcasts so that they do not exceed <tt/B+R*T/ over any interval
of time <tt/T/.
</itemize>
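<p> The limit <tt/B+R*T/ is what a token bucket provides: up to <tt/B/ tokens
are stored, <tt/R/ tokens are added per second, and each broadcast consumes one.
The sketch below only illustrates that arithmetic; it is not taken from the
<tt/arpd/ source, and the names <tt/bcast_bucket/, <tt/bucket_init/ and
<tt/bucket_allow/ are hypothetical.
<tscreen><verb>
/* Hypothetical token bucket: at most burst + rate * T broadcasts
 * in any interval of length T seconds. */
struct bcast_bucket {
    double tokens;   /* broadcasts that may be sent right now */
    double burst;    /* B, as given with -B */
    double rate;     /* R, as given with -R, per second */
    double last;     /* time of the previous update, seconds */
};

static void bucket_init(struct bcast_bucket *b, double burst,
                        double rate, double now)
{
    b->tokens = burst;   /* start with a full bucket */
    b->burst = burst;
    b->rate = rate;
    b->last = now;
}

/* Returns 1 and consumes a token if a broadcast may be sent at time
 * now, otherwise returns 0.  now must not go backwards. */
static int bucket_allow(struct bcast_bucket *b, double now)
{
    b->tokens += (now - b->last) * b->rate;   /* refill at R per second */
    if (b->tokens > b->burst)
        b->tokens = b->burst;                 /* never store more than B */
    b->last = now;
    if (b->tokens >= 1.0) {
        b->tokens -= 1.0;
        return 1;
    }
    return 0;
}
</verb></tscreen>
<p> With the defaults (<tt/-B 3 -R 1/), this allows three back-to-back
broadcasts and then one per second.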
<p><tt/INTERFACE/ is the name of a network interface to watch.
If no interfaces are given, <tt/arpd/ monitors all interfaces.
In this case <tt/arpd/ does not adjust <tt/sysctl/ parameters;
the user is expected to do this after <tt/arpd/ is started.
<p><bf/Signals/
<p> <tt/arpd/ exits gracefully, syncing the database and restoring adjusted
<tt/sysctl/ parameters, when it receives <tt/SIGINT/ or <tt/SIGTERM/.
<tt/SIGHUP/ syncs the database to disk. <tt/SIGUSR1/ sends some statistics
to <tt/syslog/. The effect of other signals is undefined; they may corrupt the
database and leave <tt/sysctl/ parameters in an unpredictable state.
<p><bf/Note/
<p> For <tt/arpd/ to be able to serve as an ARP resolver, the kernel must be
compiled with the option <tt/CONFIG_ARPD/ and, when no interface list
is given on the command line, the variable <tt/app_solicit/
should be set on the interfaces of interest in <tt>/proc/sys/net/ipv4/neigh/*</tt>.
If this is not done, <tt/arpd/ still collects gratuitous ARP information
in its database.
<p><bf/Examples/
<enum>
<item> Start <tt/arpd/ to collect gratuitous ARP without interfering
with kernel functionality:
<tscreen><verb>
arpd -b /var/tmp/arpd.db
</verb></tscreen>
<item> Look at the result after some time:
<tscreen><verb>
killall arpd
arpd -l -b /var/tmp/arpd.db
</verb></tscreen>
<item> Enable the kernel helper, leaving the leading role to the kernel:
<tscreen><verb>
arpd -b /var/tmp/arpd.db -a 1 eth0 eth1
</verb></tscreen>
<item> Completely replace kernel resolution on interfaces <tt/eth0/
and <tt/eth1/. In this case the kernel still does unicast probing to
validate entries, but all broadcast activity is suppressed
and performed under the authority of <tt/arpd/:
<tscreen><verb>
arpd -b /var/tmp/arpd.db -a 3 -k eth0 eth1
</verb></tscreen>
This is the mode in which <tt/arpd/ is normally supposed to work.
It is not the default only to prevent accidentally enabling
an overly aggressive mode.
</enum>
</article>

@ -1,16 +0,0 @@
#! /bin/bash
# $1 = temporary (input) file (string)
# $2 = output file (string)
# $3 = page size, e.g. a4, letter ... (string)
# $4 = number of pages to fit on a single sheet (numeric)
if type psnup >&/dev/null; then
	echo "psnup -$4 -p$3 $1 $2"
	psnup "-$4" "-p$3" "$1" "$2"
elif type psmulti >&/dev/null; then
	echo "psmulti $1 > $2"
	psmulti "$1" > "$2"
else
	# No n-up tool available: copy the file unchanged.
	echo "cp $1 $2"
	cp "$1" "$2"
fi

File diff suppressed because it is too large

@ -1,469 +0,0 @@
\documentstyle[12pt,twoside]{article}
\def\TITLE{Tunnels over IP}
\input preamble
\begin{center}
\Large\bf Tunnels over IP in Linux-2.2
\end{center}
\begin{center}
{ \large Alexey~N.~Kuznetsov } \\
\em Institute for Nuclear Research, Moscow \\
\verb|kuznet@ms2.inr.ac.ru| \\
\rm March 17, 1999
\end{center}
\vspace{5mm}
\tableofcontents
\section{Instead of an introduction: micro-FAQ.}
\begin{itemize}
\item
Q: In linux-2.0.36 I used:
\begin{verbatim}
ifconfig tunl1 10.0.0.1 pointopoint 193.233.7.65
\end{verbatim}
to create a tunnel. It does not work in 2.2.0!
A: You are right, it does not work. The command written above is split into two commands.
\begin{verbatim}
ip tunnel add MY-TUNNEL mode ipip remote 193.233.7.65
\end{verbatim}
will create a tunnel device named \verb|MY-TUNNEL|. Now you may configure
it with:
\begin{verbatim}
ifconfig MY-TUNNEL 10.0.0.1
\end{verbatim}
Of course, if you prefer the name \verb|tunl1| to \verb|MY-TUNNEL|,
you may still use it.
\item
Q: In linux-2.0.36 I used:
\begin{verbatim}
ifconfig tunl0 10.0.0.1
route add -net 10.0.0.0 gw 193.233.7.65 dev tunl0
\end{verbatim}
to tunnel the network 10.0.0.0 via router 193.233.7.65. It does not
work in 2.2.0! Moreover, \verb|route| prints a funny error, something like
``network unreachable'', and afterwards I found a strange direct route
to 10.0.0.0 via \verb|tunl0| in the routing table.
A: Yes, in 2.2 the rule that a {\em normal} gateway must reside on a directly
connected network has no exceptions. You may tell the kernel that
this particular route is {\em abnormal}:
\begin{verbatim}
ifconfig tunl0 10.0.0.1 netmask 255.255.255.255
ip route add 10.0.0.0/8 via 193.233.7.65 dev tunl0 onlink
\end{verbatim}
Note the keyword \verb|onlink|: it is the magic key that tells the kernel
not to check the consistency of the gateway address.
After this explanation you have probably already guessed another method
to cheat the kernel:
\begin{verbatim}
ifconfig tunl0 10.0.0.1 netmask 255.255.255.255
route add -host 193.233.7.65 dev tunl0
route add -net 10.0.0.0 netmask 255.0.0.0 gw 193.233.7.65
route del -host 193.233.7.65 dev tunl0
\end{verbatim}
Well, if you like such tricks, nobody can prohibit you from using them.
Just do not forget
that between \verb|route add| and \verb|route del| host 193.233.7.65 is
unreachable.
\item
Q: In 2.0.36 I used to load the \verb|tunnel| device module and the \verb|ipip| module.
I cannot find any \verb|tunnel| in 2.2!
A: Linux-2.2 has a single module \verb|ipip| for both directions of tunneling
and for all IPIP tunnel devices.
\item
Q: \verb|traceroute| does not work over a tunnel! Well, stop... It works,
but skips some number of hops.
A: Yes. By default the tunnel driver copies the \verb|ttl| value from the
inner packet to the outer one. This means that the path traversed by tunneled
packets to the other endpoint is not hidden. If you dislike this, or if you
are going to use a routing protocol expecting that packets
with ttl 1 will reach the peering host (f.e.\ RIP, OSPF or EBGP)
and you are not afraid of
tunnel loops, you may append the option \verb|ttl 64| when creating the tunnel
with \verb|ip tunnel add|.
\item
Q: ... Well, this is where the list of things 2.0 was able to do ends.
\end{itemize}
\paragraph{Summary of differences between 2.2 and 2.0.}
\begin{itemize}
\item {\bf In 2.0} you could compile the tunnel device into the kernel
and get a set of 4 devices \verb|tunl0| ... \verb|tunl3| or,
alternatively, compile it as a module and load a new module
for each new tunnel. Also, the module \verb|ipip| was necessary
to receive tunneled packets.
{\bf 2.2} has {\em one\/} module \verb|ipip|. Loading it, you get the base
tunnel device \verb|tunl0|, and other tunnels may be created with the command
\verb|ip tunnel add|. These new devices may have arbitrary names.
\item {\bf In 2.0} you set the remote tunnel endpoint address with
the command \verb|ifconfig| ... \verb|pointopoint A|.
{\bf In 2.2} this command has the same semantics on all
interfaces: it sets not the tunnel endpoint,
but the address of the peering host, which is directly reachable
via this tunnel
rather than via the Internet. The actual tunnel endpoint address \verb|A|
should be set with \verb|ip tunnel add ... remote A|.
\item {\bf In 2.0} you created tunnel routes with the command:
\begin{verbatim}
route add -net 10.0.0.0 gw A dev tunl0
\end{verbatim}
{\bf 2.2} interprets this command identically for all device
kinds, and the gateway is required to be directly reachable via this tunnel
rather than via the Internet. You may still use \verb|ip route add ... onlink|
to override this behaviour.
\end{itemize}
\section{Tunnel setup: basics}
The standard Linux-2.2 kernel supports three flavors of tunnels,
listed in the following table:
\vspace{2mm}
\begin{tabular}{lll}
\vrule depth 0.8ex width 0pt\relax
Mode & Description & Base device \\
ipip & IP over IP & tunl0 \\
sit & IPv6 over IP & sit0 \\
gre & ANY over GRE over IP & gre0
\end{tabular}
\vspace{2mm}
\noindent All kinds of tunnels are created with a single command:
\begin{verbatim}
ip tunnel add <NAME> mode <MODE> [ local <S> ] [ remote <D> ]
\end{verbatim}
This command creates a new tunnel device named \verb|<NAME>|.
\verb|<NAME>| is an arbitrary string; in particular,
it may even be \verb|eth0|. The remaining parameters set
various tunnel characteristics.
\begin{itemize}
\item
\verb|mode <MODE>| sets the tunnel mode. Three modes are currently available:
\verb|ipip|, \verb|sit| and \verb|gre|.
\item
\verb|remote <D>| sets the remote endpoint of the tunnel to the IP
address \verb|<D>|.
\item
\verb|local <S>| sets a fixed local address for tunneled
packets. It must be an address on another interface of this host.
\end{itemize}
\let\thefootnote\oldthefootnote
Both \verb|remote| and \verb|local| may be omitted. In this case we
say that they are zero or wildcard. Two tunnels of one mode cannot
have the same \verb|remote| and \verb|local|. In particular, this means
that the base device or fallback tunnel cannot be replicated.\footnote{
This restriction is relaxed for keyed GRE tunnels.}
Tunnels are divided into two classes: {\bf pointopoint} tunnels, which
have a non-wildcard \verb|remote| address and deliver all packets
to this destination, and {\bf NBMA} (i.e.\ Non-Broadcast Multi-Access) tunnels,
which have no \verb|remote|. In particular, base devices (f.e.\ \verb|tunl0|)
are NBMA, because they have neither \verb|remote| nor
\verb|local| addresses.
After a tunnel device is created, you should configure it as you would
any other device. Certainly, the configuration of tunnels has
some peculiarities, because they work over the existing Internet
routing infrastructure and simultaneously create new virtual links,
which change this infrastructure. The danger that a careless
tunnel setup will result in tunnel loops,
routing collapse, or flooding of the network with an exponentially
growing number of tunneled fragments is very real.
Protocol setup on pointopoint tunnels does not differ from the configuration
of other devices. You should set a protocol address with \verb|ifconfig|
and add routes with the \verb|route| utility.
NBMA tunnels are different. To route something via an NBMA tunnel,
you have to tell the driver where it should deliver packets.
The only way to do this is to create special routes with a gateway
address pointing to the desired endpoint. F.e.\
\begin{verbatim}
ip route add 10.0.0.0/24 via <A> dev tunl0 onlink
\end{verbatim}
It is important to use the option \verb|onlink|; otherwise
the kernel will refuse a request to create a route via a gateway not directly
reachable over device \verb|tunl0|. With IPv6 the situation is much simpler:
when you start device \verb|sit0|, it automatically configures itself
with all IPv4 addresses mapped to IPv6 space, so that the whole IPv4
Internet is {\em really reachable} via \verb|sit0|! Excellent, the command
\begin{verbatim}
ip route add 3FFE::/16 via ::193.233.7.65 dev sit0
\end{verbatim}
will route \verb|3FFE::/16| via \verb|sit0|, sending all the packets
destined to this prefix to 193.233.7.65.
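The gateway \verb|::193.233.7.65| above is just the IPv4 address placed in the
low 32 bits of an otherwise zero IPv6 address (the ``IPv4-compatible'' form, not
the \verb|::ffff:a.b.c.d| IPv4-mapped form). The following C sketch builds such
an address; the function name \verb|v4compat| is hypothetical.
\begin{verbatim}
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: build the IPv4-compatible IPv6 address
 * ::a.b.c.d from an IPv4 address given in network byte order. */
static void v4compat(struct in6_addr *out, struct in_addr v4)
{
    memset(out, 0, sizeof(*out));               /* upper 96 bits are zero */
    memcpy(&out->s6_addr[12], &v4.s_addr, 4);   /* low 32 bits hold IPv4 */
}
\end{verbatim}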
\section{Tunnel setup: options}
The command \verb|ip tunnel add| has several additional options.
\begin{itemize}
\item \verb|ttl N| --- set a fixed TTL \verb|N| on tunneled packets.
\verb|N| is a number in the range 1--255. 0 is a special value,
meaning that packets inherit the TTL value.
The default value is \verb|inherit|.
\item \verb|tos T| --- set a fixed TOS \verb|T| on tunneled packets.
The default value is \verb|inherit|.
\item \verb|dev DEV| --- bind the tunnel to device \verb|DEV|, so that
tunneled packets will be routed only via this device and will
not be able to escape to another device when the route to the endpoint changes.
\item \verb|nopmtudisc| --- disable Path MTU Discovery on this tunnel.
It is enabled by default. Note that a fixed ttl is incompatible
with this option: tunnels with a fixed ttl always perform PMTU discovery.
\end{itemize}
\verb|ipip| and \verb|sit| tunnels have no further options. \verb|gre|
tunnels are more complicated:
\begin{itemize}
\item \verb|key K| --- use keyed GRE with key \verb|K|. \verb|K| is
either a number or an IP-address-like dotted quad.
\item \verb|csum| --- checksum tunneled packets.
\item \verb|seq| --- serialize packets.
\begin{NB}
I think this option does not
work. At least, I did not test it, did not debug it, and
do not even understand how it is supposed to work and for what
purpose Cisco planned to use it.
\end{NB}
\end{itemize}
Actually, these GRE options can be set separately for the input and
output directions by prefixing the corresponding keywords with the letter
\verb|i| or \verb|o|. F.e.\ \verb|icsum| orders the host to accept only
packets with a correct checksum, and \verb|ocsum| means that
our host will calculate and send a checksum.
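To see how the two spellings of \verb|key K| relate, the sketch below converts
either form into one 32-bit value. It assumes, without reproducing the \verb|ip|
source, that a dotted quad is read as four bytes in network order, so that
\verb|1.2.3.4| and \verb|16909060| denote the same key; the function name
\verb|gre_key_parse| is hypothetical.
\begin{verbatim}
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: parse a GRE key given either as a number or
 * as a dotted quad.  Stores the key in host byte order and returns
 * 0, or returns -1 if the string is malformed. */
static int gre_key_parse(const char *s, uint32_t *key)
{
    char *end;
    unsigned long v;

    if (strchr(s, '.') != NULL) {
        struct in_addr a;
        if (inet_pton(AF_INET, s, &a) != 1)
            return -1;
        *key = ntohl(a.s_addr);   /* bytes in the order they were written */
        return 0;
    }
    v = strtoul(s, &end, 0);      /* decimal, or hex with a 0x prefix */
    if (end == s || *end != '\0' || v > 0xFFFFFFFFUL)
        return -1;
    *key = (uint32_t)v;
    return 0;
}
\end{verbatim}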
The command \verb|ip tunnel add| is not the only operation
that can be performed on tunnels. Certainly, you may get a short help page
with:
\begin{verbatim}
ip tunnel help
\end{verbatim}
You may also view the list of installed tunnels with the command:
\begin{verbatim}
ip tunnel ls
\end{verbatim}
You may also look at statistics:
\begin{verbatim}
ip -s tunnel ls Cisco
\end{verbatim}
where \verb|Cisco| is the name of the tunnel device. The command
\begin{verbatim}
ip tunnel del Cisco
\end{verbatim}
destroys tunnel \verb|Cisco|. And, finally,
\begin{verbatim}
ip tunnel change Cisco mode sit local ME remote HE ttl 32
\end{verbatim}
changes its parameters.
\section{Differences between 2.2 and 2.0 tunnels revisited.}
Now we can discuss more subtle differences between tunneling in 2.0
and 2.2.
\begin{itemize}
\item In 2.0 all tunneled packets were received promiscuously
as soon as you loaded the module \verb|ipip|. 2.2 tries to select the best
tunnel device, and the packet appears to be received on it. F.e.\ if the host
receives an \verb|ipip| packet from host \verb|D| destined to our
local address \verb|S|, the kernel searches for matching tunnels
in this order:
\begin{tabular}{ll}
1 & \verb|remote| is \verb|D| and \verb|local| is \verb|S| \\
2 & \verb|remote| is \verb|D| and \verb|local| is wildcard \\
3 & \verb|remote| is wildcard and \verb|local| is \verb|S| \\
4 & \verb|tunl0|
\end{tabular}
If a tunnel exists but is not in the \verb|UP| state, it is ignored.
Note that if \verb|tunl0| is \verb|UP|, it receives all IPIP packets
not claimed by more specific tunnels.
Be careful: this means that, without carefully installed firewall rules,
anyone on the Internet may inject into your network any packets with
source addresses indistinguishable from local ones. It is not a bad idea
to design tunnels so as to enforce maximal route symmetry
and to enable the reverse path filter (\verb|rp_filter| sysctl option) on
tunnel devices.
\item In 2.2 you can monitor and debug tunnels with \verb|tcpdump|.
F.e.\ \verb|tcpdump| \verb|-i Cisco| \verb|-nvv| will dump the packets
that the kernel outputs via tunnel \verb|Cisco| and the packets received on it,
as seen from the kernel's viewpoint.
\end{itemize}
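The lookup order in the table above can be written as four passes over the
tunnel list, each requiring an exact \verb|(remote, local)| match, with zero
standing for the wildcard. This is a simplified sketch, not the kernel's
hash-based implementation: \verb|struct tnl| and \verb|tnl_lookup| are
hypothetical, and the base device is modelled as the tunnel whose \verb|remote|
and \verb|local| are both wildcards.
\begin{verbatim}
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tunnel record; an address of 0 means wildcard. */
struct tnl {
    uint32_t remote;
    uint32_t local;
    int up;             /* nonzero if the device is UP */
};

/* Select the tunnel receiving an IPIP packet sent by host D (src)
 * to our local address S (dst).  Returns the index in t[], or -1
 * if no UP tunnel matches. */
static int tnl_lookup(const struct tnl *t, size_t n,
                      uint32_t src, uint32_t dst)
{
    const uint32_t want[4][2] = {
        { src, dst },   /* 1: remote is D, local is S       */
        { src, 0 },     /* 2: remote is D, local wildcard   */
        { 0, dst },     /* 3: remote wildcard, local is S   */
        { 0, 0 },       /* 4: base device, e.g. tunl0       */
    };

    for (int pass = 0; pass < 4; pass++)
        for (size_t i = 0; i < n; i++)
            if (t[i].up && t[i].remote == want[pass][0] &&
                t[i].local == want[pass][1])
                return (int)i;   /* tunnels that are down are skipped */
    return -1;
}
\end{verbatim}
Because no two tunnels of one mode may share the same \verb|remote| and
\verb|local|, each pass can match at most one tunnel, so the order of the
entries in \verb|t[]| does not affect the result.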
\section{Linux and Cisco IOS tunnels.}
Among other tunnel types, Cisco IOS supports IPIP and GRE.
Essentially, the Cisco setup is a subset of the options available on Linux.
Let us consider the simplest example:
\begin{verbatim}
interface Tunnel0
tunnel mode gre ip
tunnel source 10.10.14.1
tunnel destination 10.10.13.2
\end{verbatim}
This command set translates to:
\begin{verbatim}
ip tunnel add Tunnel0 \
mode gre \
local 10.10.14.1 \
remote 10.10.13.2
\end{verbatim}
Any questions? No questions.
\section{Interaction of IPIP tunnels and DVMRP.}
DVMRP uses IPIP tunnels to route multicasts via the Internet.
\verb|mrouted| automatically creates the
IPIP tunnels listed in its configuration file.
From the kernel and user viewpoints there is no difference between
tunnels created in this way and tunnels created by \verb|ip tunnel|.
I.e.\ if \verb|mrouted| created some tunnel, it may be used to
route unicast packets, provided appropriate routes are added.
And vice versa, if the administrator has already created a tunnel,
it will be reused by \verb|mrouted| if it requests a DVMRP
tunnel with the same local and remote addresses.
Do not be surprised if your manually configured tunnel is
destroyed when mrouted exits.
\section{Broadcast GRE ``tunnels''.}
It is possible to set the \verb|remote| of a GRE tunnel to a multicast
address. Such a tunnel becomes a {\bf broadcast} tunnel (though the word
tunnel is not quite appropriate in this case; it is rather a virtual network).
\begin{verbatim}
ip tunnel add Universe local 193.233.7.65 \
remote 224.66.66.66 ttl 16
ip addr add 10.0.0.1/16 dev Universe
ip link set Universe up
\end{verbatim}
This tunnel is a true broadcast network, and broadcast packets are
sent to the multicast group 224.66.66.66. By default such a tunnel starts
to resolve both IP and IPv6 addresses via ARP/NDISC, so that
if multicast routing is supported in the surrounding network, all GRE nodes
will find one another automatically and will form a virtual Ethernet-like
broadcast network. If multicast routing does not work, it is an unpleasant
but not fatal flaw: the tunnel becomes an NBMA rather than a broadcast network.
You may disable dynamic ARPing by:
\begin{verbatim}
echo 0 > /proc/sys/net/ipv4/neigh/Universe/mcast_solicit
\end{verbatim}
and add the required information to the ARP table manually:
\begin{verbatim}
ip neigh add 10.0.0.2 lladdr 128.6.190.2 dev Universe nud permanent
\end{verbatim}
In this case packets sent to 10.0.0.2 will be encapsulated in GRE
and sent to 128.6.190.2. It is possible to facilitate address resolution
using methods typical of other NBMA networks, f.e.\ to start the user-level
\verb|arpd| daemon, which will maintain a database of hosts attached
to the GRE virtual network, or to ask a dedicated ARP or NHRP server
for the information.
Actually, such a setup is the most natural for tunneling:
it is really flexible, scalable and easily manageable, so
it is strongly recommended for GRE tunnels instead of the ugly
hack with NBMA mode and the \verb|onlink| modifier. Unfortunately,
for historical reasons broadcast mode is not supported by IPIP tunnels,
but this will probably change in the future.
\section{Traffic control issues.}
Tunnels are devices, hence all the power of Linux traffic control
applies to them. The simplest (and the most useful in practice)
example is limiting tunnel bandwidth. The following command:
\begin{verbatim}
tc qdisc add dev tunl0 root tbf \
rate 128Kbit burst 4K limit 10K
\end{verbatim}
will limit tunneled traffic to 128Kbit with a maximal burst size of 4K
and queue no more than 10K.
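As a rough check of what these numbers mean, assume the usual \verb|tc| units
(sizes given with \verb|K| are multiples of 1024 bytes, rates given in
\verb|Kbit| are multiples of 1000 bit/s) and the relation
limit $=$ rate $\times$ latency $+$ burst, which \verb|tc| uses when converting
between the \verb|limit| and \verb|latency| parameters of \verb|tbf|. The longest
time a packet can then wait in this queue is about
\[
\frac{(10 \times 1024 - 4 \times 1024) \times 8 \mbox{ bit}}{128\,000 \mbox{ bit/s}}
= \frac{49\,152}{128\,000} \mbox{ s} = 0.384 \mbox{ s}.
\]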
However, you should remember that tunnels are {\em virtual} devices
implemented in software, and true queue management is impossible for them
simply because they have no queues. Instead, it is better to create classes
on real physical interfaces and to map tunneled packets to them.
In the general case of dynamic routing, you should create such classes
on all outgoing interfaces or, alternatively,
use the option \verb|dev DEV| to bind the tunnel to a fixed physical device.
In the latter case packets will be routed only via the specified device,
and you need to set up the corresponding classes only on it.
You have to pay for this convenience, though:
if routing changes, your tunnel will fail.
Suppose that the CBQ class \verb|1:ABC| has been created on device \verb|eth0|
specially for tunnel \verb|Cisco| with endpoints \verb|S| and \verb|D|.
Now you can select IPIP packets with addresses \verb|S| and \verb|D|
with some classifier and map them to class \verb|1:ABC|. F.e.\
this is easy to do with the \verb|rsvp| classifier:
\begin{verbatim}
tc filter add dev eth0 pref 100 proto ip rsvp \
session D ipproto ipip filter S \
classid 1:ABC
\end{verbatim}
If you want a more detailed classification of sub-flows
transmitted via the tunnel, you can build a CBQ subtree
rooted at \verb|1:ABC| and attach to the subroot a set of rules parsing
IPIP packets more deeply.
\end{document}

@ -1,110 +0,0 @@
<!doctype linuxdoc system>
<article>
<title>NSTAT, IFSTAT and RTACCT Utilities
<author>Alexey Kuznetsov, <tt/kuznet@ms2.inr.ac.ru/
<date>some_negative_number, 20 Sep 2001
<abstract>
<tt/nstat/, <tt/ifstat/ and <tt/rtacct/ are simple tools for
monitoring kernel SNMP counters and network interface statistics.
</abstract>
<p> These utilities are very similar, so I describe
them together, using the name <tt/Xstat/ for things that apply
to all of them.
<p>The format of the command is:
<tscreen><verb>
Xstat [ OPTIONS ] [ PATTERN [ PATTERN ... ] ]
</verb></tscreen>
<p>
<tt/PATTERN/ is a shell-style pattern selecting the identifiers
of SNMP variables or interfaces to show. A variable is displayed
if one of the patterns matches its name. If no patterns are given,
<tt/Xstat/ assumes that the user wants to see all variables.
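<p> The selection rule above maps directly onto the POSIX <tt/fnmatch()/
function. The sketch below is only an illustration, not the code used by
<tt/Xstat/; the function name <tt/xstat_selected/ is hypothetical.
<tscreen><verb>
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the counter or interface name
 * should be shown, i.e. no patterns were given or at least one
 * shell-style pattern matches it; returns 0 otherwise. */
static int xstat_selected(const char *name, const char *const patterns[],
                          size_t npat)
{
    if (npat == 0)
        return 1;                        /* no patterns: show everything */
    for (size_t i = 0; i < npat; i++)
        if (fnmatch(patterns[i], name, 0) == 0)
            return 1;
    return 0;
}
</verb></tscreen>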
<p> <tt/OPTIONS/ is a list of single-letter options following common Unix
conventions.
<itemize>
<item><tt/-h/ - show help page
<item><tt/-?/ - the same, of course
<item><tt/-v/, <tt/-V/ - print version of <tt/Xstat/ and exit
<item><tt/-z/ - dump zero counters too. By default they are not shown.
<item><tt/-a/ - dump absolute values of counters. By default <tt/Xstat/
calculates increments since the previous use.
<item><tt/-s/ - do not update history, so that next time you will
see counters that also include the values accumulated up to the moment
of this measurement.
<item><tt/-n/ - do not display anything, only update history.
<item><tt/-r/ - reset history.
<item><tt/-d INTERVAL/ - run <tt/Xstat/ in daemon mode, collecting
statistics. <tt/INTERVAL/ is the interval between measurements
in seconds.
<item><tt/-t INTERVAL/ - time interval over which rates are averaged. The default value
is 60 seconds.
<item><tt/-e/ - display extended information about errors (<tt/ifstat/ only).
</itemize>
<p>
History is simply a dump saved in the file <tt>/tmp/.Xstat.uUID</tt>
or in the file given by the environment variables <tt/NSTAT_HISTORY/,
<tt/IFSTAT_HISTORY/ and <tt/RTACCT_HISTORY/.
Each time you use <tt/Xstat/, the values there are updated.
If you use patterns, only the values which you _really_ see
are updated. If you want to skip an uninteresting period,
use option <tt/-n/, or just send the output to <tt>/dev/null</tt>.
<p>
<tt/Xstat/ detects when the history has been invalidated by a system reboot,
or when the source of information has switched between different instances
of the <tt/Xstat/ daemon and the kernel SNMP tables, and does not
use invalid history.
<p> Beware: <tt/Xstat/ will not produce sane output
when many processes use it simultaneously. If several processes
under a single user need this utility, they should use the environment
variables to put their history in safe places,
or use it with options <tt/-a -s/.
<p>
Well, that's all. The utility is very simple, but nevertheless
very handy.
<p> <bf/Output of XSTAT/
<p> The first line of output is <tt/#/ followed by an identifier
of the source of information: the word <tt/kernel/ when <tt/Xstat/
gets information from the kernel, or a dotted decimal number followed
by parameters when it obtains information from a running <tt/Xstat/ daemon.
<p>In the case of <tt/nstat/ the rest of the output consists of three columns:
SNMP MIB identifier,
its value (or increment since the previous measurement) and the average
rate of increase of the counter per second. <tt/ifstat/ outputs the
interface name followed by pairs of counters and their rates of change.
<p> <bf/Daemonic Xstat/
<p> <tt/Xstat/ may be started as a daemon by any user. This makes sense
to avoid wrapped counters and to obtain reasonably long-term counters
over large time spans. The <tt/Xstat/ daemon also calculates average rates.
For the first goal the sampling interval (option <tt/-d/) may be fairly large:
f.e. at gigabit rates a 32-bit byte counter wraps at most once every
34 seconds or so (2^32 bytes at 125 MB/s), so you may select an interval of 20 seconds.
On the other hand, when <tt/Xstat/ is used for estimating rates, the
interval should be less than the averaging period (option <tt/-t/); otherwise
the estimate loses quality.
The <tt/Xstat/ client, before trying to get information from the kernel,
contacts the daemon started by this user, then it tries the system-wide
daemon, which is supposed to be started by the superuser. Only if
neither of them replies does it get information from the kernel.
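<p> A wrap-around between two samples is harmless as long as the counter wraps
at most once: subtracting two 32-bit counter values in unsigned 32-bit arithmetic
still gives the correct increment. The sketch below illustrates this and the
resulting per-second rate under that assumption; it is not taken from the
<tt/Xstat/ sources, and the names <tt/counter_delta32/ and <tt/counter_rate/ are
hypothetical.
<tscreen><verb>
#include <stdint.h>

/* Increment of a 32-bit counter between two samples.  Unsigned
 * arithmetic is modulo 2^32, so a single wrap-around is handled. */
static uint32_t counter_delta32(uint32_t prev, uint32_t cur)
{
    return cur - prev;
}

/* Average rate per second over an interval of secs seconds;
 * returns 0 for a non-positive interval. */
static double counter_rate(uint32_t prev, uint32_t cur, double secs)
{
    return secs > 0 ? counter_delta32(prev, cur) / secs : 0.0;
}
</verb></tscreen>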
<p> <bf/Environment/
<p> <tt/NSTAT_HISTORY/ - name of history file for <tt/nstat/.
<p> <tt/IFSTAT_HISTORY/ - name of history file for <tt/ifstat/.
<p> <tt/RTACCT_HISTORY/ - name of history file for <tt/rtacct/.
</article>

@ -1,26 +0,0 @@
\textwidth 6.0in
\textheight 8.5in
\input SNAPSHOT
\pagestyle{myheadings}
\markboth{\protect\TITLE}{}
\markright{{\protect\sc iproute2-ss\Draft}}
% To print it in compact form: both sides on one sheet (psnup -2)
\evensidemargin=\oddsidemargin
\newenvironment{NB}{\bgroup \vskip 1mm\leftskip 1cm \footnotesize \noindent NB.
}{\par\egroup \vskip 1mm}
\def\threeonly{[2.3.15+ only] }
\begin{document}
\makeatletter
\renewcommand{\@oddhead}{{\protect\sc iproute2-ss\Draft} \hfill \protect\arabic{page}}
\makeatother
\let\oldthefootnote\thefootnote
\def\thefootnote{}
\footnotetext{Copyright \copyright~1999 A.N.Kuznetsov}

@ -1,52 +0,0 @@
<!doctype linuxdoc system>
<article>
<title>RTACCT Utility
<author>Robert Olsson
<date>some_negative_number, 20 Dec 2001
<p>
Here is some code for monitoring the route cache. For systems handling high
network load (servers, routers, firewalls etc.), the route cache and its garbage
collection are crucial. Linux has a solid implementation.
<p>
The kernel patch (not required since linux-2.4.7) adds statistics counters
from route cache process into
/proc/net/rt_cache_stat. A companion user mode program presents the statistics
in a vmstat or iostat manner. The ratio between cache hits and misses gives
the flow length.
<p>
Hopefully it can help in understanding performance, DoS and other related
issues.
<p> A URL where newer versions of this utility can (probably) be found
is ftp://robur.slu.se/pub/Linux/net-development/rt_cache_stat/
<p><bf/Description/
<p>The format of the command is:
<tscreen><verb>
rtstat [ OPTIONS ]
</verb></tscreen>
<p> <tt/OPTIONS/ are:
<itemize>
<item><tt/-h/, <tt/-help/ - show help page and version of the utility.
<item><tt/-i INTERVAL/ - interval between snapshots, default value is
2 seconds.
<item><tt/-s NUMBER/ - controls printing of the header line: 0 suppresses it,
1 prints it once and 2 (the default setting) prints it again
every 20 lines.
</itemize>
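<p> The <tt/-s/ header policy above can be sketched in C as follows. The helper name is hypothetical, not rtstat's own code, and treating any value other than 0 or 1 like the default 2 is an assumption.

```c
#include <stdbool.h>

/* Decide whether to print the header before data line 'line_no'
 * (0-based count of data lines printed so far):
 *   0 - never,
 *   1 - once, before the first line,
 *   2 - before every block of 20 lines (the default). */
static bool rtstat_want_header(int mode, unsigned long line_no)
{
	switch (mode) {
	case 0:
		return false;
	case 1:
		return line_no == 0;
	default:
		return line_no % 20 == 0;
	}
}
```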
</article>

@ -1,525 +0,0 @@
<!doctype linuxdoc system>
<article>
<title>SS Utility: Quick Intro
<author>Alexey Kuznetsov, <tt/kuznet@ms2.inr.ac.ru/
<date>some_negative_number, 20 Sep 2001
<abstract>
<tt/ss/ is yet another utility to investigate sockets.
Functionally it is NOT better than <tt/netstat/ combined
with some perl/awk scripts, and though it is surely faster,
that is not enough to make it much better. :-)
So, stop reading this now and do not waste your time.
Well, certainly, it offers some functionality which the current
netstat is still not able to provide, but surely will soon.
</abstract>
<sect>Why?
<p> The <tt>/proc</tt> interface is, unfortunately, inadequate.
When the number of sockets is large enough, <tt/netstat/ or even
plain <tt>cat /proc/net/tcp</tt> causes nothing but pain and curses.
In linux-2.4 the disease became worse: even if the number
of sockets is small, reading <tt>/proc/net/tcp</tt> is rather slow.
This utility presents a new approach, which is supposed to scale
well. I am not going to describe technical details here and
will concentrate on a description of the command.
The only important thing to say is that it is not a bad idea
to load the module <tt/tcp_diag/, which can be found in the directory
<tt/Modules/ of <tt/iproute2/. If you do not, <tt/ss/
will still work, but it falls back to <tt>/proc</tt> and becomes slow
like <tt/netstat/, well, a bit faster still (see section "Some numbers").
<sect>Old news
<p>
In the simplest form <tt/ss/ is equivalent to netstat
with some small deviations.
<itemize>
<item><tt/ss -t -a/ dumps all TCP sockets
<item><tt/ss -u -a/ dumps all UDP sockets
<item><tt/ss -w -a/ dumps all RAW sockets
<item><tt/ss -x -a/ dumps all UNIX sockets
</itemize>
<p>
Option <tt/-o/ shows TCP timers state.
Option <tt/-e/ shows some extended information.
Etc. etc. etc. It seems that all the socket-related options of netstat
are supported, though not AX.25 and other oddities. :-)
If someone wants, he can add support for decnet and ipx.
Some rudimentary support for them is already present in iproute2 libutils,
and I will be glad to see these new members.
<p>
However, standard functionality is a bit different:
<p>
The first: without option <tt/-a/ sockets in states
<tt/TIME-WAIT/ and <tt/SYN-RECV/ are skipped too.
It is a more reasonable default, I think.
<p>
The second: the format of UNIX sockets is different. It coincides
with tcp/udp. Though the standard kernel still does not allow one to
see the write/read queues and peer address of connected UNIX sockets,
a patch doing this exists.
<p>
The third: the default is to dump only TCP sockets, rather than all types.
<p>
The next: by default it does not resolve numeric host addresses (like <tt/ip/)!
Resolving is enabled with option <tt/-r/. Service names, usually stored
in local files, are resolved by default. Also, if the service database
does not contain references to a port, <tt/ss/ queries the system
<tt/rpcbind/. RPC services are prefixed with <tt/rpc./
Resolution of services may be suppressed with option <tt/-n/.
<p>
It does not accept "long" options (I dislike them, sorry).
So, the address family is given with a family identifier following
option <tt/-f/, to be aligned with iproute2 conventions.
Mostly, this is to allow the option parser to parse
addresses correctly, but as a side effect it really limits dumping
to sockets supporting only the given family. Option <tt/-A/ followed
by a list of socket tables to dump is also supported.
Logically, the id of a socket table is different from the _address_ family, which is
another point of incompatibility. So, the id is one of
<tt/all/, <tt/tcp/, <tt/udp/,
<tt/raw/, <tt/inet/, <tt/unix/, <tt/packet/, <tt/netlink/. See?
Well, <tt/inet/ is just an abbreviation for <tt/tcp|udp|raw/
and it is not difficult to guess that <tt/packet/ allows
you to look at packet sockets. Actually, there are also some other abbreviations,
f.e. <tt/unix_dgram/ selects only datagram UNIX sockets.
<p>
The next: well, I still do not know. :-)
<sect>Time to talk about new functionality.
<p>It is built-in filtering of socket lists.
<sect1> Filtering by state.
<p>
<tt/ss/ can filter sockets by state, using the keywords
<tt/state/ and <tt/exclude/, followed by some state
identifier.
<p>
State identifiers are standard TCP state names (not listed,
they are useless for you if you do not already know them)
or abbreviations:
<itemize>
<item><tt/all/ - for all the states
<item><tt/bucket/ - for TCP minisockets (<tt/TIME-WAIT|SYN-RECV/)
<item><tt/big/ - all except for minisockets
<item><tt/connected/ - not closed and not listening
<item><tt/synchronized/ - connected and not <tt/SYN-SENT/
</itemize>
<p>
F.e. to dump all tcp sockets except <tt/SYN-RECV/:
<tscreen><verb>
ss exclude SYN-RECV
</verb></tscreen>
<p>
If neither <tt/state/ nor <tt/exclude/ directives
are present, the
state filter defaults to <tt/all/ with option <tt/-a/,
or otherwise to <tt/all/
excluding listening, syn-recv, time-wait and closed sockets.
<sect1> Filtering by addresses and ports.
<p>
The option list may contain an address/port filter.
It is a boolean expression which consists of the boolean operations
<tt/or/, <tt/and/, <tt/not/ and predicates.
Actually, all the flavors of names for boolean operations are accepted:
<tt/&amp/, <tt/&amp&amp/, <tt/|/, <tt/||/, <tt/!/, but do not forget
about the special meaning given to these symbols by unix shells, and escape
them correctly when used from the command line.
<p>
Predicates may be of the following kinds:
<itemize>
<item>A. Address/port match, where address is checked against mask
and port is either wildcard or exact. It is one of:
<tscreen><verb>
dst prefix:port
src prefix:port
src unix:STRING
src link:protocol:ifindex
src nl:channel:pid
</verb></tscreen>
Both prefix and port may be absent or replaced with <tt/*/,
which means wildcard. UNIX sockets use a more powerful scheme:
socket names are matched against shell wildcards. Also, the prefixes
unix: and link: may be omitted if the address family is evident
from context (with option <tt/-x/ or with <tt/-f unix/
or with the <tt/unix/ keyword).
<p>
F.e.
<tscreen><verb>
dst 10.0.0.1
dst 10.0.0.1:
dst 10.0.0.1/32:
dst 10.0.0.1:*
</verb></tscreen>
are equivalent and mean sockets connected to
any port on host 10.0.0.1
<tscreen><verb>
dst 10.0.0.0/24:22
</verb></tscreen>
sockets connected to port 22 on network
10.0.0.0...255.
<p>
Note that the port is separated from the address by a colon, which causes
trouble with IPv6 addresses. Generally, we interpret the last
colon as the start of the port. To allow IPv6 addresses to be given,
a trick like the one used in IPv6 HTTP URLs may be used:
<tscreen><verb>
dst [::1]
</verb></tscreen>
matches sockets connected to ::1 on any port.
<p>
Another way is <tt>dst ::1/128</tt>. The slash helps to understand that
the colon is part of the IPv6 address.
<p>
Now we can add another alias for <tt/dst 10.0.0.1/:
<tt/dst [10.0.0.1]/. :-)
<p> An address may be a DNS name. In this case all the addresses are looked
up (in all the address families, unless limited by option <tt/-f/
or the special address prefix <tt/inet:/ or <tt/inet6:/), and the resulting
expression is an <tt/or/ over all of them.
<item> B. Port expressions:
<tscreen><verb>
dport &gt= :1024
dport != :22
sport &lt :32000
</verb></tscreen>
etc.
All the relations: <tt/&lt/, <tt/&gt/, <tt/&lt=/, <tt/&gt=/, <tt/=/, <tt/==/,
<tt/!=/, <tt/eq/, <tt/ge/, <tt/lt/, <tt/ne/...
Use whichever variant you like more, but do not forget to escape special
characters when typing them in the command line. :-)
Note that the port number syntactically coincides with case A!
You may even add an IP address, but it will not participate
in the comparison, except for <tt/==/ and <tt/!=/, which are equivalent
to the corresponding predicates of type A. F.e.
<p>
<tt/dst 10.0.0.1:22/
is equivalent to <tt/dport eq 10.0.0.1:22/
and
<tt/not dst 10.0.0.1:22/ is equivalent to
<tt/dport neq 10.0.0.1:22/
<item>C. Keyword <tt/autobound/. It matches sockets bound automatically
on the local system.
</itemize>
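<p> To make predicates A and B concrete, here is a minimal C sketch for IPv4 only. It is not the <tt/ss/ implementation: the <tt/inet_pattern/ type, the use of port 0 to stand for the wildcard <tt/*/, and the <tt/relop/ enum are hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type: an IPv4 prefix with an optional port.
 * port == 0 stands for the wildcard "*" (or an omitted port). */
struct inet_pattern {
	uint32_t addr;     /* host byte order */
	unsigned int plen; /* prefix length, 0..32 */
	uint16_t port;     /* 0 = any port */
};

/* Case A: address checked against the mask, port wildcard or exact. */
static bool pattern_match(const struct inet_pattern *p, uint32_t addr,
			  uint16_t port)
{
	/* Avoid shifting by 32, which is undefined in C. */
	uint32_t mask = p->plen ? 0xFFFFFFFFu << (32 - p->plen) : 0;

	if ((addr & mask) != (p->addr & mask))
		return false;
	return p->port == 0 || p->port == port;
}

enum relop { OP_LT, OP_LE, OP_GT, OP_GE, OP_EQ, OP_NE };

/* Case B: compare a socket port against a number. */
static bool port_compare(uint16_t port, enum relop op, uint16_t ref)
{
	switch (op) {
	case OP_LT: return port < ref;
	case OP_LE: return port <= ref;
	case OP_GT: return port > ref;
	case OP_GE: return port >= ref;
	case OP_EQ: return port == ref;
	case OP_NE: return port != ref;
	}
	return false;
}
```

With this sketch, <tt>dst 10.0.0.0/24:22</tt> corresponds to a pattern of 0x0A000000, prefix length 24, port 22 checked against the remote end, and <tt/sport &lt :32000/ to port_compare(local_port, OP_LT, 32000).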
<sect> Examples
<p>
<itemize>
<item>1. List all the tcp sockets in state <tt/FIN-WAIT-1/ for our apache
to network 193.233.7/24 and look at their timers:
<tscreen><verb>
ss -o state fin-wait-1 \( sport = :http or sport = :https \) \
dst 193.233.7/24
</verb></tscreen>
Oops, forgot to say that a missing logical operator is
equivalent to <tt/and/.
<item> 2. Well, now look at the rest...
<tscreen><verb>
ss -o excl fin-wait-1
ss state fin-wait-1 \( sport neq :http and sport neq :https \) \
or not dst 193.233.7/24
</verb></tscreen>
Note that we have to make _two_ calls of ss to do this.
The state match is always ANDed with the address/port match.
The reason for this is purely technical: ss quickly skips
non-matching states before parsing addresses, and I consider the
ability to skip gobs of time-wait and syn-recv sockets quickly
more important than logical generality.
<item> 3. So, let's look at all our sockets using autobound ports:
<tscreen><verb>
ss -a -A all autobound
</verb></tscreen>
<item> 4. And finally find all the local processes connected
to local X servers:
<tscreen><verb>
ss -xp dst "/tmp/.X11-unix/*"
</verb></tscreen>
Pardon, this does not work with the current kernel; patching is required.
But we still can look at server side:
<tscreen><verb>
ss -x src "/tmp/.X11-unix/*"
</verb></tscreen>
</itemize>
<sect> Returning to ground: real manual
<p>
<sect1> Command arguments
<p> The general format of arguments to <tt/ss/ is:
<tscreen><verb>
ss [ OPTIONS ] [ STATE-FILTER ] [ ADDRESS-FILTER ]
</verb></tscreen>
<sect2><tt/OPTIONS/
<p> <tt/OPTIONS/ is a list of single-letter options, using common unix
conventions.
<itemize>
<item><tt/-h/ - show help page
<item><tt/-?/ - the same, of course
<item><tt/-v/, <tt/-V/ - print version of <tt/ss/ and exit
<item><tt/-s/ - print summary statistics. This option does not parse
socket lists; it obtains the summary from various sources. It is useful
when the number of sockets is so huge that parsing <tt>/proc/net/tcp</tt>
is painful.
<item><tt/-D FILE/ - do not display anything, just dump raw information
about TCP sockets to <tt/FILE/ after applying filters. If <tt/FILE/ is <tt/-/,
<tt/stdout/ is used.
<item><tt/-F FILE/ - read the continuation of the filter from <tt/FILE/.
Each line of <tt/FILE/ is interpreted like a single command line option.
If <tt/FILE/ is <tt/-/, <tt/stdin/ is used.
<item><tt/-r/ - try to resolve numeric addresses/ports
<item><tt/-n/ - do not try to resolve ports
<item><tt/-o/ - show some optional information, f.e. TCP timers
<item><tt/-i/ - show some information specific to TCP (RTO, congestion
window, slow start threshold etc.)
<item><tt/-e/ - show even more optional information
<item><tt/-m/ - show extended information on memory used by the socket.
It is available only with <tt/tcp_diag/ enabled.
<item><tt/-p/ - show list of processes owning the socket
<item><tt/-f FAMILY/ - default address family used for parsing addresses.
This option also limits listing to sockets supporting the
given address family. Currently the following families
are supported: <tt/unix/, <tt/inet/, <tt/inet6/, <tt/link/,
<tt/netlink/.
<item><tt/-4/ - alias for <tt/-f inet/
<item><tt/-6/ - alias for <tt/-f inet6/
<item><tt/-0/ - alias for <tt/-f link/
<item><tt/-A LIST-OF-TABLES/ - list of socket tables to dump, separated
by commas. The following identifiers are understood:
<tt/all/, <tt/inet/, <tt/tcp/, <tt/udp/, <tt/raw/,
<tt/unix/, <tt/packet/, <tt/netlink/, <tt/unix_dgram/,
<tt/unix_stream/, <tt/packet_raw/, <tt/packet_dgram/.
<item><tt/-x/ - alias for <tt/-A unix/
<item><tt/-t/ - alias for <tt/-A tcp/
<item><tt/-u/ - alias for <tt/-A udp/
<item><tt/-w/ - alias for <tt/-A raw/
<item><tt/-a/ - show sockets in all states. By default sockets
in states <tt/LISTEN/, <tt/TIME-WAIT/, <tt/SYN-RECV/
and <tt/CLOSE/ are skipped.
<item><tt/-l/ - show only sockets in state <tt/LISTEN/
</itemize>
<sect2><tt/STATE-FILTER/
<p><tt/STATE-FILTER/ allows you to construct an arbitrary set of
states to match. Its syntax is a sequence of the keywords <tt/state/
and <tt/exclude/, each followed by a state identifier.
Available identifiers are:
<p>
<itemize>
<item> All standard TCP states: <tt/established/, <tt/syn-sent/,
<tt/syn-recv/, <tt/fin-wait-1/, <tt/fin-wait-2/, <tt/time-wait/,
<tt/closed/, <tt/close-wait/, <tt/last-ack/, <tt/listen/ and <tt/closing/.
<item><tt/all/ - for all the states
<item><tt/connected/ - all the states except for <tt/listen/ and <tt/closed/
<item><tt/synchronized/ - all the <tt/connected/ states except for
<tt/syn-sent/
<item><tt/bucket/ - states, which are maintained as minisockets, i.e.
<tt/time-wait/ and <tt/syn-recv/.
<item><tt/big/ - opposite to <tt/bucket/
</itemize>
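<p> The state identifiers above behave like sets, so a bitmask with one bit per TCP state is a natural way to sketch them in C. This assumes the Linux kernel's TCP state numbering (established = 1 through closing = 11); the <tt/ST_/ and <tt/SF_/ names are hypothetical, and the sketch does not claim to reproduce how <tt/ss/ stores its filter internally.

```c
#include <stdbool.h>

/* Linux TCP state numbers (as in the kernel's tcp_states.h). */
enum tcp_state {
	ST_ESTABLISHED = 1,
	ST_SYN_SENT,
	ST_SYN_RECV,
	ST_FIN_WAIT1,
	ST_FIN_WAIT2,
	ST_TIME_WAIT,
	ST_CLOSE,
	ST_CLOSE_WAIT,
	ST_LAST_ACK,
	ST_LISTEN,
	ST_CLOSING,
};

/* One bit per state; the abbreviations are unions of bits. */
#define SF(s)           (1u << (s))
#define SF_ALL          (((1u << (ST_CLOSING + 1)) - 1) & ~1u) /* states 1..11 */
#define SF_BUCKET       (SF(ST_SYN_RECV) | SF(ST_TIME_WAIT))
#define SF_BIG          (SF_ALL & ~SF_BUCKET)
#define SF_CONNECTED    (SF_ALL & ~(SF(ST_LISTEN) | SF(ST_CLOSE)))
#define SF_SYNCHRONIZED (SF_CONNECTED & ~SF(ST_SYN_SENT))
/* Default without -a: all except listening, syn-recv, time-wait, closed. */
#define SF_DEFAULT      (SF_ALL & ~(SF(ST_LISTEN) | SF_BUCKET | SF(ST_CLOSE)))

/* "state X" adds X to the set, "exclude X" removes it. The caller picks
 * the starting set: 0 if the first keyword is "state", SF_ALL if it is
 * "exclude" (an assumption based on the "ss exclude SYN-RECV" example). */
static unsigned int apply_keyword(unsigned int mask, bool exclude,
				  unsigned int states)
{
	return exclude ? (mask & ~states) : (mask | states);
}

static bool state_selected(unsigned int mask, enum tcp_state s)
{
	return (mask & SF(s)) != 0;
}
```

Testing a single bit per socket is cheap, which fits the earlier remark that <tt/ss/ skips non-matching states before it parses any addresses.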
<sect2><tt/ADDRESS_FILTER/
<p><tt/ADDRESS_FILTER/ is a boolean expression with the operations <tt/and/, <tt/or/
and <tt/not/, which can be abbreviated in C style, f.e. as <tt/&amp/,
<tt/&amp&amp/.
<p>
Predicates check socket addresses, both local and remote.
There are the following kinds of predicates:
<itemize>
<item> <tt/dst ADDRESS_PATTERN/ - matches remote address and port
<item> <tt/src ADDRESS_PATTERN/ - matches local address and port
<item> <tt/dport RELOP PORT/ - compares remote port to a number
<item> <tt/sport RELOP PORT/ - compares local port to a number
<item> <tt/autobound/ - checks that socket is bound to an ephemeral
port
</itemize>
<p><tt/RELOP/ is one of <tt/&lt=/, <tt/&gt=/, <tt/==/ etc.
To make this more convenient for use in a unix shell, the alphabetic
FORTRAN-like notations <tt/le/, <tt/gt/ etc. are accepted as well.
<p>The format and semantics of <tt/ADDRESS_PATTERN/ depend on the address
family.
<itemize>
<item><tt/inet/ - <tt/ADDRESS_PATTERN/ consists of an IP prefix, optionally
followed by a colon and a port. If the prefix or port part is absent or replaced
with <tt/*/, this means a wildcard match.
<item><tt/inet6/ - The same as <tt/inet/, only the prefix refers to an IPv6
address. Unlike with <tt/inet/, the colon becomes ambiguous, so <tt/ss/ allows
a scheme like the one used in URLs, where the address is surrounded with
<tt/[/ ... <tt/]/.
<item><tt/unix/ - <tt/ADDRESS_PATTERN/ is a shell-style wildcard.
<item><tt/packet/ - the format looks like <tt/inet/, except that the interface index
takes the place of the port and the link layer protocol id takes the place of the address.
<item><tt/netlink/ - the format looks like <tt/inet/, except that the socket pid
takes the place of the port and the netlink channel takes the place of the address.
</itemize>
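<p> The rule for separating the port from an <tt/inet/ or <tt/inet6/ pattern (brackets delimit an IPv6 address, otherwise the last colon starts the port) can be sketched in C as below. This is a simplified illustration, not the parser <tt/ss/ uses; <tt/split_pattern()/ is a hypothetical name, and treating a slash after the last colon as "no port" is our reading of the <tt>dst ::1/128</tt> remark earlier in this document.

```c
#include <stddef.h>
#include <string.h>

/* Split an ADDRESS_PATTERN into host and port strings:
 *   "[::1]:80"       -> "::1", "80"   (brackets delimit IPv6)
 *   "[::1]"          -> "::1", ""     (empty port: any port)
 *   "10.0.0.0/24:22" -> "10.0.0.0/24", "22" (last colon starts the port)
 *   "::1/128"        -> "::1/128", "" (a '/' after the last colon means
 *                                      the colon belongs to the address)
 * Returns 0 on success, -1 if malformed or a buffer is too small. */
static int split_pattern(const char *pat, char *host, size_t hlen,
			 char *port, size_t plen)
{
	const char *host_start = pat, *host_end, *port_start;
	size_t n;

	if (pat[0] == '[') {
		host_start = pat + 1;
		host_end = strchr(host_start, ']');
		if (!host_end)
			return -1;
		if (host_end[1] == ':')
			port_start = host_end + 2;
		else if (host_end[1] == '\0')
			port_start = host_end + 1; /* empty string */
		else
			return -1;
	} else {
		host_end = strrchr(pat, ':');
		if (host_end && strchr(host_end, '/'))
			host_end = NULL;
		if (host_end) {
			port_start = host_end + 1;
		} else {
			host_end = pat + strlen(pat);
			port_start = host_end; /* empty string */
		}
	}

	n = (size_t)(host_end - host_start);
	if (n >= hlen || strlen(port_start) >= plen)
		return -1;
	memcpy(host, host_start, n);
	host[n] = '\0';
	strcpy(port, port_start);
	return 0;
}
```

An empty or <tt/*/ port string would then be treated as a wildcard by the caller.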
<p><tt/PORT/ is syntactically <tt/ADDRESS_PATTERN/ with wildcard
address part. Certainly, it is undefined for UNIX sockets.
<sect1> Environment variables
<p>
<tt/ss/ allows changing the source of information using various
environment variables:
<p>
<itemize>
<item> <tt/PROC_SLABINFO/ to override <tt>/proc/slabinfo</tt>
<item> <tt/PROC_NET_TCP/ to override <tt>/proc/net/tcp</tt>
<item> <tt/PROC_NET_UDP/ to override <tt>/proc/net/udp</tt>
<item> etc.
</itemize>
<p>
Variable <tt/PROC_ROOT/ allows changing the root of the whole <tt>/proc/</tt>
hierarchy.
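<p> A minimal C sketch of how such overrides could be resolved. It assumes that a specific variable like <tt/PROC_NET_TCP/ takes precedence over <tt/PROC_ROOT/, which in turn replaces the default <tt>/proc</tt>; the function names are hypothetical and the precedence is an assumption, not taken from the <tt/ss/ source.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Choose the file to read for a /proc entry such as "net/tcp".
 * 'override' is the value of a specific variable (e.g. PROC_NET_TCP) and
 * 'proc_root' the value of PROC_ROOT; either may be NULL.
 * Returns 0 on success, -1 if 'buf' is too small. */
static int resolve_proc_path(const char *override, const char *proc_root,
			     const char *name, char *buf, size_t len)
{
	int n;

	if (override) {
		size_t olen = strlen(override);

		if (olen >= len)
			return -1;
		memcpy(buf, override, olen + 1);
		return 0;
	}
	n = snprintf(buf, len, "%s/%s", proc_root ? proc_root : "/proc", name);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Same, reading the variables from the environment. */
static int proc_path_from_env(const char *var, const char *name,
			      char *buf, size_t len)
{
	return resolve_proc_path(getenv(var), getenv("PROC_ROOT"), name,
				 buf, len);
}
```

For example, proc_path_from_env("PROC_NET_TCP", "net/tcp", buf, sizeof(buf)) yields <tt>/proc/net/tcp</tt> when neither variable is set.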
<p>
Variable <tt/TCPDIAG_FILE/ instructs <tt/ss/ to open a file instead of
requesting the kernel to dump information about TCP sockets.
<p> This option is used mainly to investigate bug reports,
when dumps of files usually found in <tt>/proc/</tt> are received
by e-mail.
<sect1> Output format
<p>Six columns. The first is <tt/Netid/; it denotes the socket type and
transport protocol, when it is ambiguous: <tt/tcp/, <tt/udp/, <tt/raw/,
<tt/u_str/ is an abbreviation for <tt/unix_stream/, <tt/u_dgr/ for UNIX
datagram sockets, <tt/nl/ for netlink, <tt/p_raw/ and <tt/p_dgr/ for
raw and datagram packet sockets. This column is optional; it will
be hidden if the filter selects a unique netid.
<p>
The second column is <tt/State/. The socket state is displayed here.
The names are standard TCP names, except for <tt/UNCONN/, which
cannot happen for TCP, but is normal for unconnected sockets
of other types. Again, this column can be hidden.
<p>
Then two columns (<tt/Recv-Q/ and <tt/Send-Q/) show the amount of data
queued for receive and transmit.
<p>
And the last two columns display local address and port of the socket
and its peer address, if the socket is connected.
<p>
If options <tt/-o/, <tt/-e/ or <tt/-p/ were given, options are
displayed not in fixed positions but as space-separated pairs
<tt/option:value/. If the value is not a single number, it is presented
as a list of values, enclosed in <tt/(/ ... <tt/)/ and separated with
commas. F.e.
<tscreen><verb>
timer:(keepalive,111min,0)
</verb></tscreen>
is the typical format for a TCP timer (option <tt/-o/).
<tscreen><verb>
users:((X,113,3))
</verb></tscreen>
is typical for the list of users (option <tt/-p/).
<sect>Some numbers
<p>
Well, let us use <tt/pidentd/ and a tool <tt/ibench/ to measure
its performance. It is 30 requests per second here. Nothing to test,
it is too slow. OK, let us patch pidentd with the patch from the directory
Patches. After this it handles about 4300 requests per second
and becomes a handy tool to pollute socket tables with lots of timewait
buckets.
<p>
So, each test starts by polluting the tables with 30000 sockets,
then does a full dump of the table piped to wc, measuring
timings with time:
<p>Results:
<itemize>
<item> <tt/netstat -at/ - 15.6 seconds
<item> <tt/ss -atr/, but without <tt/tcp_diag/ - 5.4 seconds
<item> <tt/ss -atr/ with <tt/tcp_diag/ - 0.47 seconds
</itemize>
No comments. Though one comment is necessary: most of the time
without <tt/tcp_diag/ is wasted inside the kernel with completely
blocked networking. More than 10 seconds, yes. <tt/tcp_diag/
does the same work in 100 milliseconds of system time.
</article>

6
etc/iproute2/bpf_pinning Normal file
@ -0,0 +1,6 @@
#
# subpath mappings from mount point for pinning
#
#3 tracing
#4 foo/bar
#5 tc/cls1

@ -3,3 +3,6 @@
2 nbyte
3 u32
4 meta
7 canid
8 ipset
9 ipt

2
etc/iproute2/group Normal file
@ -0,0 +1,2 @@
# device group names
0 default

23
etc/iproute2/nl_protos Normal file
@ -0,0 +1,23 @@
# Netlink protocol names mapping
0 rtnl
1 unused
2 usersock
3 fw
4 tcpdiag
5 nflog
6 xfrm
7 selinux
8 iscsi
9 audit
10 fiblookup
11 connector
12 nft
13 ip6fw
14 dec-rt
15 uevent
16 genl
18 scsi-trans
19 ecryptfs
20 rdma
21 crypto

@ -1,17 +1,6 @@
0x00 default
0x10 lowdelay
0x08 throughput
0x04 reliability
# This value overlaps with ECT, do not use it!
0x02 mincost
# These values refuse to die; Cisco likes them for some strange reason.
0x20 priority
0x40 immediate
0x60 flash
0x80 flash-override
0xa0 critical
0xc0 internet
0xe0 network
# Differentiated field values
# These include the DSCP and unused bits
0x0 default
# Newer RFC2597 values
0x28 AF11
0x30 AF12
@ -25,3 +14,13 @@
0x88 AF41
0x90 AF42
0x98 AF43
# Older values RFC2474
0x20 CS1
0x40 CS2
0x60 CS3
0x80 CS4
0xA0 CS5
0xC0 CS6
0xE0 CS7
# RFC 2598
0xB8 EF
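The values in this table are whole TOS/Traffic Class bytes, not bare DSCP numbers: the 6-bit DSCP sits in the upper bits and the two low bits are zero. A small C sketch of that arithmetic, using the code point formulas from RFC 2597 (AFxy) and RFC 2474 (CSn); the function names are illustrative.

```c
#include <stdint.h>

/* Shift a 6-bit DSCP into the DS field byte; the two low bits
 * (later used for ECN) are left as zero. */
static uint8_t dscp_to_dsfield(uint8_t dscp)
{
	return (uint8_t)(dscp << 2);
}

/* Assured Forwarding AFxy: DSCP = 8*x + 2*y, with class x in 1..4
 * and drop precedence y in 1..3. */
static uint8_t af_dscp(unsigned int cls, unsigned int drop)
{
	return (uint8_t)(8 * cls + 2 * drop);
}

/* Class Selector CSn: DSCP = 8*n. */
static uint8_t cs_dscp(unsigned int n)
{
	return (uint8_t)(8 * n);
}
```

For example, AF11 is DSCP 10, giving 0x28; AF43 is DSCP 38, giving 0x98; and EF (DSCP 46) gives 0xB8, matching the entries above.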

@ -14,17 +14,12 @@
13 dnrouted
14 xorp
15 ntk
16 dhcp
#
# Used by me for gated
#
254 gated/aggr
253 gated/bgp
252 gated/ospf
251 gated/ospfase
250 gated/rip
249 gated/static
248 gated/conn
247 gated/inet
246 gated/default
16 dhcp
18 keepalived
42 babel
99 openr
186 bgp
187 isis
188 ospf
189 rip
192 eigrp

@ -0,0 +1,2 @@
Each file in this directory is an rt_protos configuration file. iproute2
commands scan this directory processing all files that end in '.conf'.
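A minimal POSIX C sketch of the directory scan described in these READMEs: visit every entry whose name ends in ".conf". The helper names are hypothetical, this is not the iproute2 implementation, and skipping a bare ".conf" with nothing before the suffix is an assumption.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* True if 'name' ends in ".conf" with at least one character before it. */
static int is_conf_file(const char *name)
{
	size_t len = strlen(name);

	return len > 5 && strcmp(name + len - 5, ".conf") == 0;
}

/* Call 'fn' with the path of every *.conf file in 'dirpath'.
 * Returns the number of files visited, or -1 if the directory cannot
 * be opened. Entries are visited in readdir() order, which is
 * unspecified. */
static int scan_conf_dir(const char *dirpath, void (*fn)(const char *path))
{
	char path[4096];
	struct dirent *de;
	int count = 0;
	DIR *d = opendir(dirpath);

	if (!d)
		return -1;
	while ((de = readdir(d)) != NULL) {
		if (!is_conf_file(de->d_name))
			continue;
		if (snprintf(path, sizeof(path), "%s/%s", dirpath,
			     de->d_name) >= (int)sizeof(path))
			continue; /* path too long: skip */
		fn(path);
		count++;
	}
	closedir(d);
	return count;
}
```

Because readdir() order is unspecified, a tool that needs a deterministic order across files would have to sort the names first.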

@ -0,0 +1,2 @@
Each file in this directory is an rt_tables configuration file. iproute2
commands scan this directory processing all files that end in '.conf'.

@ -1,122 +0,0 @@
# CHANGES
# -------
# v0.3a2- fixed bug in "if" operator. Thanks kad@dgtu.donetsk.ua.
# v0.3a- added TIME parameter. Example:
# TIME=00:00-19:00;64Kbit/6Kbit
# So, between 00:00 and 19:00 RATE will be 64Kbit.
# Just start "cbq.init timecheck" periodically from cron (every 10
# minutes for example).
# !!! Anyway you MUST start "cbq.init start" for CBQ initialize.
# v0.2 - Some cosmetic changes. It is now more compatible with
# older bash versions. Thanks to Stanislav V. Voronyi
# <stas@cnti.uanet.kharkov.ua>.
# v0.1 - First public release
#
# README
# ------
#
# First of all - this is just a SIMPLE EXAMPLE of CBQ power.
# Don't ask me "why" and "how" :)
#
# This is an example of using CBQ (Class Based Queueing) and policy-based
# filter for building smart ethernet shapers. All CBQ parameters are
# correct only for ETHERNET (eth0,1,2..) linux interfaces. It works for
# ARCNET too (just set bandwidth parameter to 2Mbit). It was tested
# on 2.1.125-2.1.129 linux kernels (KSI linux, Nostromo version) and
# ip-route utility by A.Kuznetsov (iproute2-ss981101 version).
# You can download ip-route from ftp://ftp.inr.ac.ru/ip-routing or
# get iproute2*.rpm (compiled with glibc) from ftp.ksi-linux.com.
#
#
# HOW IT WORKS
#
# Each shaper must be described by config file in $CBQ_PATH
# (/etc/sysconfig/cbq/) directory - one config file for each CBQ shaper.
#
# Some words about config file name:
# Each shaper has its own ID - a two-byte HEX number. In fact, the ID is
# the CBQ class.
# So, filename looks like:
#
# cbq-1280.My_first_shaper
# ^^^ ^^^ ^^^^^^^^^^^^^
# | | |______ Shaper name - any word
# | |___________________ ID (0000-FFFF), let the ID look like the shaper's rate
# |______________________ Filename must begin with "cbq-"
#
#
# Config file describes shaper parameters and source[destination]
# address[port].
# For example let's prepare /etc/sysconfig/cbq/cbq-1280.My_first_shaper:
#
# ----------8<---------------------
# DEVICE=eth0,10Mbit,1Mbit
# RATE=128Kbit
# WEIGHT=10Kbit
# PRIO=5
# RULE=192.168.1.0/24
# ----------8<---------------------
#
# This is minimal configuration, where:
# DEVICE: eth0 - device where we do control our traffic
# 10Mbit - REAL ethernet card bandwidth
# 1Mbit - "weight" of :1 class (parent for all shapers for eth0),
# as a rule of thumb weight=bandwidth/10.
# 100Mbit adapter's example: DEVICE=eth0,100Mbit,10Mbit
# *** If you want to build more than one shaper per device it's
# enough to describe bandwidth and weight once - cbq.init
# is smart :) You can put only 'DEVICE=eth0' into cbq-*
# config file for eth0.
#
# RATE: Shaper's speed - Kbit,Mbit or bps (bytes per second)
#
# WEIGHT: "weight" of shaper (CBQ class). Like for DEVICE - approx. RATE/10
#
# PRIO: shaper's priority from 1 to 8 where 1 is the highest one.
# I always use "5" for all my shapers.
#
# RULE: [source addr][:source port],[dest addr][:dest port]
# Some examples:
# RULE=10.1.1.0/24:80 - all traffic for network 10.1.1.0 to port 80
# will be shaped.
# RULE=10.2.2.5 - shaper works only for IP address 10.2.2.5
# RULE=:25,10.2.2.128/25:5000 - all traffic from any address and port 25 to
# address 10.2.2.128 - 10.2.2.255 and port 5000
# will be shaped.
# RULE=10.5.5.5:80, - shaper active only for traffic from port 80 of
# address 10.5.5.5
# Multiple RULE fields per one config file are allowed. For example:
# RULE=10.1.1.2:80
# RULE=10.1.1.2:25
# RULE=10.1.1.2:110
#
# *** ATTENTION!!!
# All shapers do work only for outgoing traffic!
# So, if you want to build a bidirectional shaper you must set it up for
# both ethernet cards. For example let's build a shaper for our linux box like:
#
# --------- 192.168.1.1
# BACKBONE -----eth0-| linux |-eth1------*[our client]
# ---------
#
# Let all traffic from the backbone to the client be shaped at 28Kbit and
# traffic from client to backbone - at 128Kbit. We need two config files:
#
# ---8<-----/etc/sysconfig/cbq/cbq-28.client-out----
# DEVICE=eth1,10Mbit,1Mbit
# RATE=28Kbit
# WEIGHT=2Kbit
# PRIO=5
# RULE=192.168.1.1
# ---8<---------------------------------------------
#
# ---8<-----/etc/sysconfig/cbq/cbq-128.client-in----
# DEVICE=eth0,10Mbit,1Mbit
# RATE=128Kbit
# WEIGHT=10Kbit
# PRIO=5
# RULE=192.168.1.1,
# ---8<---------------------------------------------
# ^pay attention to "," - this is source address!
#
# Enjoy.
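The RULE syntax above has one subtle point: text before the comma is the source, and without a comma the whole value is the destination. cbq.init itself is a shell script; the C sketch below only illustrates those rules, and the cbq_rule struct and function names are hypothetical.

```c
#include <string.h>

/* Hypothetical parsed RULE. Empty strings mean "not specified". */
struct cbq_rule {
	char src[64], src_port[8];
	char dst[64], dst_port[8];
};

/* Split the first 'n' bytes of "addr[:port]" into its two parts.
 * Returns -1 if a part does not fit its buffer. */
static int split_endpoint(const char *s, size_t n, char *addr, size_t alen,
			  char *port, size_t plen)
{
	const char *colon = memchr(s, ':', n);
	size_t an = colon ? (size_t)(colon - s) : n;
	size_t pn = colon ? n - an - 1 : 0;

	if (an >= alen || pn >= plen)
		return -1;
	memcpy(addr, s, an);
	addr[an] = '\0';
	memcpy(port, colon ? colon + 1 : "", pn);
	port[pn] = '\0';
	return 0;
}

/* RULE=[src][:sport],[dst][:dport] */
static int parse_cbq_rule(const char *val, struct cbq_rule *r)
{
	const char *comma = strchr(val, ',');
	const char *dst = comma ? comma + 1 : val;

	memset(r, 0, sizeof(*r));
	if (comma && split_endpoint(val, (size_t)(comma - val),
				    r->src, sizeof(r->src),
				    r->src_port, sizeof(r->src_port)) < 0)
		return -1;
	return split_endpoint(dst, strlen(dst), r->dst, sizeof(r->dst),
			      r->dst_port, sizeof(r->dst_port));
}
```

With this sketch, "10.5.5.5:80," fills only the source fields and "10.2.2.5" fills only the destination, matching the examples in the README.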

@ -1,49 +0,0 @@
#! /bin/sh -x
#
# sample script on using the ingress capabilities
# this script shows how one can rate limit incoming SYNs
# Useful for TCP-SYN attack protection. You can use
# IPchains to have more powerful additions to the SYN (eg
# in addition the subnet)
#
#path to various utilities;
#change to reflect yours.
#
IPROUTE=/root/DS-6-beta/iproute2-990530-dsing
TC=$IPROUTE/tc/tc
IP=$IPROUTE/ip/ip
IPCHAINS=/root/DS-6-beta/ipchains-1.3.9/ipchains
INDEV=eth2
#
# tag all incoming SYN packets through $INDEV as mark value 1
############################################################
$IPCHAINS -A input -i $INDEV -y -m 1
############################################################
#
# install the ingress qdisc on the ingress interface
############################################################
$TC qdisc add dev $INDEV handle ffff: ingress
############################################################
#
#
# SYN packets are 40 bytes (320 bits) so three SYNs equals
# 960 bits (approximately 1kbit); so we rate limit below
# the incoming SYNs to 3/sec (not very useful really; but
#serves to show the point - JHS
############################################################
$TC filter add dev $INDEV parent ffff: protocol ip prio 50 handle 1 fw \
police rate 1kbit burst 40 mtu 9k drop flowid :1
############################################################
#
echo "---- qdisc parameters Ingress ----------"
$TC qdisc ls dev $INDEV
echo "---- Class parameters Ingress ----------"
$TC class ls dev $INDEV
echo "---- filter parameters Ingress ----------"
$TC filter ls dev $INDEV parent ffff:
#deleting the ingress qdisc
#$TC qdisc del dev $INDEV ingress
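The comment in this script sizes the policer by hand: 40-byte SYNs are 320 bits each, so 1kbit admits about three per second. A small C sketch of that arithmetic, assuming (as the script does) 40-byte SYNs and treating 1kbit as 1000 bit/s; the function name is illustrative, and the burst allowance is ignored.

```c
/* Packets per second admitted by a policer of 'rate_bps' bits per second
 * for packets of 'pkt_bytes' bytes each. Ignores the burst allowance,
 * which lets short spikes above this rate through. */
static double policed_pps(double rate_bps, unsigned int pkt_bytes)
{
	return rate_bps / (pkt_bytes * 8.0);
}
```

policed_pps(1000.0, 40) is 3.125, which is where the "3/sec" figure in the script comes from.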

18
examples/bpf/README Normal file
@ -0,0 +1,18 @@
eBPF toy code examples (running in kernel) to familiarize yourself
with syntax and features:
- BTF defined map examples
  - bpf_graft.c -> Demo on altering runtime behaviour
  - bpf_shared.c -> Ingress/egress map sharing example
  - bpf_map_in_map.c -> Using map in map example
- legacy struct bpf_elf_map defined map examples
  - legacy/bpf_shared.c -> Ingress/egress map sharing example
  - legacy/bpf_tailcall.c -> Using tail call chains
  - legacy/bpf_cyclic.c -> Simple cycle as tail calls
  - legacy/bpf_graft.c -> Demo on altering runtime behaviour
  - legacy/bpf_map_in_map.c -> Using map in map example
Note: Users should use the new BTF way to define maps; the examples
in the legacy folder, which use struct bpf_elf_map defined maps, are not
recommended.

66
examples/bpf/bpf_graft.c Normal file
@ -0,0 +1,66 @@
#include "../../include/bpf_api.h"
/* This example demonstrates how classifier run-time behaviour
* can be altered with tail calls. We start out with an empty
* jmp_tc array, then add section aaa to the array slot 0, and
* later on atomically replace it with section bbb. Note that,
* as shown in other examples, the tc loader can prepopulate
* tail-called sections; here we start out with an empty one
* on purpose to show that it can also be done this way.
*
* tc filter add dev foo parent ffff: bpf obj graft.o
* tc exec bpf dbg
* [...]
* Socket Thread-20229 [001] ..s. 138993.003923: : fallthrough
* <idle>-0 [001] ..s. 138993.202265: : fallthrough
* Socket Thread-20229 [001] ..s. 138994.004149: : fallthrough
* [...]
*
* tc exec bpf graft m:globals/jmp_tc key 0 obj graft.o sec aaa
* tc exec bpf dbg
* [...]
* Socket Thread-19818 [002] ..s. 139012.053587: : aaa
* <idle>-0 [002] ..s. 139012.172359: : aaa
* Socket Thread-19818 [001] ..s. 139012.173556: : aaa
* [...]
*
* tc exec bpf graft m:globals/jmp_tc key 0 obj graft.o sec bbb
* tc exec bpf dbg
* [...]
* Socket Thread-19818 [002] ..s. 139022.102967: : bbb
* <idle>-0 [002] ..s. 139022.155640: : bbb
* Socket Thread-19818 [001] ..s. 139022.156730: : bbb
* [...]
*/
struct {
	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
	__uint(key_size, sizeof(uint32_t));
	__uint(value_size, sizeof(uint32_t));
	__uint(max_entries, 1);
	__uint(pinning, LIBBPF_PIN_BY_NAME);
} jmp_tc __section(".maps");

__section("aaa")
int cls_aaa(struct __sk_buff *skb)
{
	printt("aaa\n");
	return TC_H_MAKE(1, 42);
}

__section("bbb")
int cls_bbb(struct __sk_buff *skb)
{
	printt("bbb\n");
	return TC_H_MAKE(1, 43);
}

__section_cls_entry
int cls_entry(struct __sk_buff *skb)
{
	tail_call(skb, &jmp_tc, 0);
	printt("fallthrough\n");
	return BPF_H_DEFAULT;
}

BPF_LICENSE("GPL");

@ -0,0 +1,55 @@
#include "../../include/bpf_api.h"
struct inner_map {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(key_size, sizeof(uint32_t));
	__uint(value_size, sizeof(uint32_t));
	__uint(max_entries, 1);
} map_inner __section(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
	__uint(key_size, sizeof(uint32_t));
	__uint(value_size, sizeof(uint32_t));
	__uint(max_entries, 1);
	__uint(pinning, LIBBPF_PIN_BY_NAME);
	__array(values, struct inner_map);
} map_outer __section(".maps") = {
	.values = {
		[0] = &map_inner,
	},
};

__section("egress")
int emain(struct __sk_buff *skb)
{
	struct bpf_elf_map *map_inner;
	int key = 0, *val;

	map_inner = map_lookup_elem(&map_outer, &key);
	if (map_inner) {
		val = map_lookup_elem(map_inner, &key);
		if (val)
			lock_xadd(val, 1);
	}

	return BPF_H_DEFAULT;
}

__section("ingress")
int imain(struct __sk_buff *skb)
{
	struct bpf_elf_map *map_inner;
	int key = 0, *val;

	map_inner = map_lookup_elem(&map_outer, &key);
	if (map_inner) {
		val = map_lookup_elem(map_inner, &key);
		if (val)
			printt("map val: %d\n", *val);
	}

	return BPF_H_DEFAULT;
}

BPF_LICENSE("GPL");

53
examples/bpf/bpf_shared.c Normal file
@ -0,0 +1,53 @@
#include "../../include/bpf_api.h"
/* Minimal, stand-alone toy map pinning example:
*
* clang -target bpf -O2 [...] -o bpf_shared.o -c bpf_shared.c
* tc filter add dev foo parent 1: bpf obj bpf_shared.o sec egress
* tc filter add dev foo parent ffff: bpf obj bpf_shared.o sec ingress
*
* Both classifiers will share the very same map instance in this example,
* so map content can be accessed from ingress *and* egress side!
*
* This example has a pinning of PIN_OBJECT_NS, so it's private and
* thus shared among various program sections within the object.
*
* A setting of PIN_GLOBAL_NS would place it into a global namespace,
* so that it can be shared among different object files. A setting
* of PIN_NONE (= 0) means no sharing, so each tc invocation creates
* a new map instance.
*/
struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(key_size, sizeof(uint32_t));
	__uint(value_size, sizeof(uint32_t));
	__uint(max_entries, 1);
	__uint(pinning, LIBBPF_PIN_BY_NAME); /* or LIBBPF_PIN_NONE */
} map_sh __section(".maps");

__section("egress")
int emain(struct __sk_buff *skb)
{
	int key = 0, *val;

	val = map_lookup_elem(&map_sh, &key);
	if (val)
		lock_xadd(val, 1);

	return BPF_H_DEFAULT;
}

__section("ingress")
int imain(struct __sk_buff *skb)
{
	int key = 0, *val;

	val = map_lookup_elem(&map_sh, &key);
	if (val)
		printt("map val: %d\n", *val);

	return BPF_H_DEFAULT;
}

BPF_LICENSE("GPL");

@ -0,0 +1,35 @@
#include "../../../include/bpf_api.h"
/* Cyclic dependency example to test the kernel's runtime upper
* bound on loops. Also demonstrates how to use direct-action mode,
* loaded as: tc filter add [...] bpf da obj [...]
*/
#define JMP_MAP_ID 0xabccba

struct bpf_elf_map __section_maps jmp_tc = {
	.type = BPF_MAP_TYPE_PROG_ARRAY,
	.id = JMP_MAP_ID,
	.size_key = sizeof(uint32_t),
	.size_value = sizeof(uint32_t),
	.pinning = PIN_OBJECT_NS,
	.max_elem = 1,
};

__section_tail(JMP_MAP_ID, 0)
int cls_loop(struct __sk_buff *skb)
{
	printt("cb: %u\n", skb->cb[0]++);
	tail_call(skb, &jmp_tc, 0);

	skb->tc_classid = TC_H_MAKE(1, 42);
	return TC_ACT_OK;
}

__section_cls_entry
int cls_entry(struct __sk_buff *skb)
{
	tail_call(skb, &jmp_tc, 0);
	return TC_ACT_SHOT;
}

BPF_LICENSE("GPL");

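The cyclic example relies on two properties of tail calls: a successful tail call never returns to the caller, and a failed one (empty slot, or the kernel's per-packet chain limit reached) simply falls through to the next statement. The user-space simulation below models both with a trampoline loop. It is only an illustration: `jmp_tc`, `sim_tail_call()`, `sim_run()` and the other `sim_*` names are hypothetical, and `SIM_MAX_TAIL_CALLS = 32` is an assumed value, since the real kernel constant (`MAX_TAIL_CALL_CNT`) has differed between kernel versions.

```c
#include <stddef.h>
#include <stdint.h>

#define SIM_MAX_TAIL_CALLS 32	/* assumption: illustrative limit only */
#define SIM_PROG_ARRAY_SIZE 1	/* jmp_tc has max_elem = 1 */
#define SIM_TAIL_CALLED (-100)	/* internal marker: "jumped away", not a verdict */
#define SIM_TC_ACT_OK 0
#define SIM_TC_ACT_SHOT 2

struct sim_ctx {
	uint32_t cb0;		/* stands in for skb->cb[0] */
	uint32_t classid;	/* stands in for skb->tc_classid */
	int tail_calls;		/* tail calls taken so far for this packet */
	uint32_t next;		/* slot chosen by the last successful tail call */
};

typedef int (*sim_prog)(struct sim_ctx *);

/* Hypothetical stand-in for the jmp_tc program array. */
static sim_prog jmp_tc[SIM_PROG_ARRAY_SIZE];

/* Returns 0 if the jump will happen, -1 if execution falls through. */
static int sim_tail_call(struct sim_ctx *ctx, uint32_t idx)
{
	if (idx >= SIM_PROG_ARRAY_SIZE || jmp_tc[idx] == NULL)
		return -1;	/* empty slot: caller keeps running */
	if (ctx->tail_calls >= SIM_MAX_TAIL_CALLS)
		return -1;	/* limit reached: caller keeps running */
	ctx->tail_calls++;
	ctx->next = idx;
	return 0;
}

/* Mirrors cls_loop(): bump cb[0], then jump back into itself. */
static int sim_cls_loop(struct sim_ctx *ctx)
{
	ctx->cb0++;
	if (sim_tail_call(ctx, 0) == 0)
		return SIM_TAIL_CALLED;
	ctx->classid = 42;	/* only reached once the limit stops the loop */
	return SIM_TC_ACT_OK;
}

/* Mirrors cls_entry(): jump into slot 0, drop the packet if that fails. */
static int sim_cls_entry(struct sim_ctx *ctx)
{
	if (sim_tail_call(ctx, 0) == 0)
		return SIM_TAIL_CALLED;
	return SIM_TC_ACT_SHOT;
}

/* A real tail call never returns; this loop emulates that behaviour. */
static int sim_run(sim_prog entry, struct sim_ctx *ctx)
{
	int ret = entry(ctx);

	while (ret == SIM_TAIL_CALLED)
		ret = jmp_tc[ctx->next](ctx);
	return ret;
}
```

With `cls_loop` installed in slot 0, the loop body runs exactly `SIM_MAX_TAIL_CALLS` times before the fall-through path sets the classid and returns OK; with the slot empty, the entry program drops the packet.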

@@ -0,0 +1,66 @@
#include "../../../include/bpf_api.h"

/* This example demonstrates how classifier run-time behaviour
 * can be altered with tail calls. We start out with an empty
 * jmp_tc array, then add section aaa to the array slot 0, and
 * later on atomically replace it with section bbb. Note that
 * as shown in other examples, the tc loader can prepopulate
 * tail called sections, here we start out with an empty one
 * on purpose to show it can also be done this way.
 *
 * tc filter add dev foo parent ffff: bpf obj graft.o
 * tc exec bpf dbg
 * [...]
 * Socket Thread-20229 [001] ..s. 138993.003923: : fallthrough
 * <idle>-0 [001] ..s. 138993.202265: : fallthrough
 * Socket Thread-20229 [001] ..s. 138994.004149: : fallthrough
 * [...]
 *
 * tc exec bpf graft m:globals/jmp_tc key 0 obj graft.o sec aaa
 * tc exec bpf dbg
 * [...]
 * Socket Thread-19818 [002] ..s. 139012.053587: : aaa
 * <idle>-0 [002] ..s. 139012.172359: : aaa
 * Socket Thread-19818 [001] ..s. 139012.173556: : aaa
 * [...]
 *
 * tc exec bpf graft m:globals/jmp_tc key 0 obj graft.o sec bbb
 * tc exec bpf dbg
 * [...]
 * Socket Thread-19818 [002] ..s. 139022.102967: : bbb
 * <idle>-0 [002] ..s. 139022.155640: : bbb
 * Socket Thread-19818 [001] ..s. 139022.156730: : bbb
 * [...]
 */

struct bpf_elf_map __section_maps jmp_tc = {
	.type = BPF_MAP_TYPE_PROG_ARRAY,
	.size_key = sizeof(uint32_t),
	.size_value = sizeof(uint32_t),
	.pinning = PIN_GLOBAL_NS,
	.max_elem = 1,
};

__section("aaa")
int cls_aaa(struct __sk_buff *skb)
{
	printt("aaa\n");
	return TC_H_MAKE(1, 42);
}

__section("bbb")
int cls_bbb(struct __sk_buff *skb)
{
	printt("bbb\n");
	return TC_H_MAKE(1, 43);
}

__section_cls_entry
int cls_entry(struct __sk_buff *skb)
{
	tail_call(skb, &jmp_tc, 0);
	printt("fallthrough\n");
	return BPF_H_DEFAULT;
}

BPF_LICENSE("GPL");

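`cls_aaa()` and `cls_bbb()` return `TC_H_MAKE(1, 42)` and `TC_H_MAKE(1, 43)`, and `cls_loop()` stores `TC_H_MAKE(1, 42)` into `skb->tc_classid`. The sketch below reproduces the macro definitions from the Linux uapi header `<linux/pkt_sched.h>`, assuming `bpf_api.h` picks them up unchanged. Under that definition the major number is expected to be already shifted into the upper 16 bits, so `TC_H_MAKE(1, 42)` evaluates to major 0, minor 0x2a rather than `1:42`. Also keep in mind that tc reads and prints handle numbers in hexadecimal. `classid_make()`, `classid_major()` and `classid_minor()` are hypothetical helpers added for illustration.

```c
#include <stdint.h>

/* Reproduced from the Linux uapi header <linux/pkt_sched.h>. */
#define TC_H_MAJ_MASK (0xFFFF0000U)
#define TC_H_MIN_MASK (0x0000FFFFU)
#define TC_H_MAJ(h) ((h)&TC_H_MAJ_MASK)
#define TC_H_MIN(h) ((h)&TC_H_MIN_MASK)
#define TC_H_MAKE(maj,min) (((maj)&TC_H_MAJ_MASK)|((min)&TC_H_MIN_MASK))

/* Hypothetical helper: build the handle tc would print as "maj:min". */
static uint32_t classid_make(uint16_t maj, uint16_t min)
{
	return TC_H_MAKE((uint32_t)maj << 16, (uint32_t)min);
}

/* Hypothetical helpers: split a handle back into its two halves. */
static uint16_t classid_major(uint32_t h)
{
	return (uint16_t)(TC_H_MAJ(h) >> 16);
}

static uint16_t classid_minor(uint32_t h)
{
	return (uint16_t)TC_H_MIN(h);
}
```

For example, `classid_make(1, 0x42)` is the handle that `tc` writes as `1:42`, whereas the unshifted `TC_H_MAKE(1, 42)` has a major number of 0.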

@@ -0,0 +1,56 @@
#include "../../../include/bpf_api.h"

#define MAP_INNER_ID 42

struct bpf_elf_map __section_maps map_inner = {
	.type = BPF_MAP_TYPE_ARRAY,
	.size_key = sizeof(uint32_t),
	.size_value = sizeof(uint32_t),
	.id = MAP_INNER_ID,
	.inner_idx = 0,
	.pinning = PIN_GLOBAL_NS,
	.max_elem = 1,
};

struct bpf_elf_map __section_maps map_outer = {
	.type = BPF_MAP_TYPE_ARRAY_OF_MAPS,
	.size_key = sizeof(uint32_t),
	.size_value = sizeof(uint32_t),
	.inner_id = MAP_INNER_ID,
	.pinning = PIN_GLOBAL_NS,
	.max_elem = 1,
};

__section("egress")
int emain(struct __sk_buff *skb)
{
	struct bpf_elf_map *map_inner;
	int key = 0, *val;

	map_inner = map_lookup_elem(&map_outer, &key);
	if (map_inner) {
		val = map_lookup_elem(map_inner, &key);
		if (val)
			lock_xadd(val, 1);
	}

	return BPF_H_DEFAULT;
}

__section("ingress")
int imain(struct __sk_buff *skb)
{
	struct bpf_elf_map *map_inner;
	int key = 0, *val;

	map_inner = map_lookup_elem(&map_outer, &key);
	if (map_inner) {
		val = map_lookup_elem(map_inner, &key);
		if (val)
			printt("map val: %d\n", *val);
	}

	return BPF_H_DEFAULT;
}

BPF_LICENSE("GPL");

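`emain()` and `imain()` above perform a two-level lookup: first the outer `ARRAY_OF_MAPS` slot, then the inner array. Until an inner map sits in the outer slot, the first lookup returns NULL and both programs do nothing. The user-space analogy below models only that control flow. `struct sim_array`, `struct sim_array_of_maps` and the `sim_*` functions are hypothetical, and the atomic add assumes `lock_xadd()` maps to the `__sync_fetch_and_add()` builtin. How the legacy loader uses `.id`, `.inner_id` and `.inner_idx` to install `map_inner` is not modelled here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a BPF_MAP_TYPE_ARRAY map. */
struct sim_array {
	uint32_t max_entries;
	uint32_t *values;
};

/* Hypothetical stand-in for BPF_MAP_TYPE_ARRAY_OF_MAPS: each slot holds
 * a pointer to an inner array, or NULL while nothing is installed. */
struct sim_array_of_maps {
	uint32_t max_entries;
	struct sim_array **slots;
};

static uint32_t *sim_array_lookup(struct sim_array *map, uint32_t key)
{
	if (map == NULL || key >= map->max_entries)
		return NULL;
	return &map->values[key];
}

static struct sim_array *sim_outer_lookup(struct sim_array_of_maps *outer,
					   uint32_t key)
{
	if (outer == NULL || key >= outer->max_entries)
		return NULL;
	return outer->slots[key];
}

/* Mirrors emain(): outer lookup, then inner lookup, then atomic add. */
static void sim_emain(struct sim_array_of_maps *outer)
{
	uint32_t key = 0;
	struct sim_array *inner = sim_outer_lookup(outer, key);

	if (inner) {
		uint32_t *val = sim_array_lookup(inner, key);

		if (val)
			__sync_fetch_and_add(val, 1);
	}
}
```

Calling `sim_emain()` before the inner array is installed leaves the value untouched; after installing it, each call increments slot 0.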

@@ -0,0 +1,53 @@
#include "../../../include/bpf_api.h"

/* Minimal, stand-alone toy map pinning example:
 *
 * clang -target bpf -O2 [...] -o bpf_shared.o -c bpf_shared.c
 * tc filter add dev foo parent 1: bpf obj bpf_shared.o sec egress
 * tc filter add dev foo parent ffff: bpf obj bpf_shared.o sec ingress
 *
 * Both classifiers will share the very same map instance in this example,
 * so map content can be accessed from the ingress *and* egress side!
 *
 * This example has a pinning of PIN_OBJECT_NS, so it's private and
 * thus shared among various program sections within the object.
 *
 * A setting of PIN_GLOBAL_NS would place it into a global namespace,
 * so that it can be shared among different object files. A setting
 * of PIN_NONE (= 0) means no sharing, so each tc invocation creates
 * a new map instance.
 */
struct bpf_elf_map __section_maps map_sh = {
	.type = BPF_MAP_TYPE_ARRAY,
	.size_key = sizeof(uint32_t),
	.size_value = sizeof(uint32_t),
	.pinning = PIN_OBJECT_NS, /* or PIN_GLOBAL_NS, or PIN_NONE */
	.max_elem = 1,
};

__section("egress")
int emain(struct __sk_buff *skb)
{
	int key = 0, *val;

	val = map_lookup_elem(&map_sh, &key);
	if (val)
		lock_xadd(val, 1);

	return BPF_H_DEFAULT;
}

__section("ingress")
int imain(struct __sk_buff *skb)
{
	int key = 0, *val;

	val = map_lookup_elem(&map_sh, &key);
	if (val)
		printt("map val: %d\n", *val);

	return BPF_H_DEFAULT;
}

BPF_LICENSE("GPL");


@@ -0,0 +1,117 @@
/* SPDX-License-Identifier: GPL-2.0 */
#include "../../../include/bpf_api.h"

#define ENTRY_INIT 3
#define ENTRY_0 0
#define ENTRY_1 1
#define MAX_JMP_SIZE 2

#define FOO 42
#define BAR 43

/* This example doesn't really do anything useful, but its purpose is to
 * demonstrate eBPF tail calls in a very simple setting.
 *
 * cls_entry() is our classifier entry point; from there we jump based on
 * skb->hash into cls_case1() or cls_case2(). They are both part of the
 * program array jmp_tc. Since they are marked with __section_tail(), the
 * tc loader already populates the program arrays with the loaded file
 * descriptors.
 *
 * To demonstrate nested jumps, cls_case2() jumps within the same jmp_tc
 * array to cls_case1(). And whenever we arrive at cls_case1(), we jump
 * into cls_exit(), part of the jump array jmp_ex.
 *
 * Also, to show it's possible, all programs share map_sh and dump the value
 * that the entry point incremented. The sections that are loaded into a
 * program array can be atomically replaced during run-time, e.g. to change
 * classifier behaviour.
 */
struct bpf_elf_map __section_maps jmp_tc = {
	.type = BPF_MAP_TYPE_PROG_ARRAY,
	.id = FOO,
	.size_key = sizeof(uint32_t),
	.size_value = sizeof(uint32_t),
	.pinning = PIN_OBJECT_NS,
	.max_elem = MAX_JMP_SIZE,
};

struct bpf_elf_map __section_maps jmp_ex = {
	.type = BPF_MAP_TYPE_PROG_ARRAY,
	.id = BAR,
	.size_key = sizeof(uint32_t),
	.size_value = sizeof(uint32_t),
	.pinning = PIN_OBJECT_NS,
	.max_elem = 1,
};

struct bpf_elf_map __section_maps map_sh = {
	.type = BPF_MAP_TYPE_ARRAY,
	.size_key = sizeof(uint32_t),
	.size_value = sizeof(uint32_t),
	.pinning = PIN_OBJECT_NS,
	.max_elem = 1,
};

__section_tail(FOO, ENTRY_0)
int cls_case1(struct __sk_buff *skb)
{
	int key = 0, *val;

	val = map_lookup_elem(&map_sh, &key);
	if (val)
		printt("case1: map-val: %d from:%u\n", *val, skb->cb[0]);

	skb->cb[0] = ENTRY_0;
	tail_call(skb, &jmp_ex, ENTRY_0);

	return BPF_H_DEFAULT;
}

__section_tail(FOO, ENTRY_1)
int cls_case2(struct __sk_buff *skb)
{
	int key = 0, *val;

	val = map_lookup_elem(&map_sh, &key);
	if (val)
		printt("case2: map-val: %d from:%u\n", *val, skb->cb[0]);

	skb->cb[0] = ENTRY_1;
	tail_call(skb, &jmp_tc, ENTRY_0);

	return BPF_H_DEFAULT;
}

__section_tail(BAR, ENTRY_0)
int cls_exit(struct __sk_buff *skb)
{
	int key = 0, *val;

	val = map_lookup_elem(&map_sh, &key);
	if (val)
		printt("exit: map-val: %d from:%u\n", *val, skb->cb[0]);

	/* Termination point. */
	return BPF_H_DEFAULT;
}

__section_cls_entry
int cls_entry(struct __sk_buff *skb)
{
	int key = 0, *val;

	/* For transferring state, we can use skb->cb[0] ... skb->cb[4]. */
	val = map_lookup_elem(&map_sh, &key);
	if (val) {
		lock_xadd(val, 1);

		skb->cb[0] = ENTRY_INIT;
		tail_call(skb, &jmp_tc, skb->hash & (MAX_JMP_SIZE - 1));
	}

	printt("fallthrough\n");
	return BPF_H_DEFAULT;
}

BPF_LICENSE("GPL");

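The jump graph in the tail call example can be traced by hand. `skb->hash & (MAX_JMP_SIZE - 1)` picks slot 0 or 1, which equals `hash % MAX_JMP_SIZE` only because `MAX_JMP_SIZE` is a power of two. From there, case2 leads to case1, and case1 leads to exit. The simulation below records which program runs and what `skb->cb[0]` ("from:") it sees. It is a sketch under stated assumptions: every tail call is assumed to succeed (slots prepopulated by the loader), the `map_sh` counter is not modelled, and `sim_dispatch()`, `enum sim_prog` and `struct sim_hop` are hypothetical names.

```c
#include <stdint.h>

#define ENTRY_INIT 3
#define ENTRY_0 0
#define ENTRY_1 1
#define MAX_JMP_SIZE 2
#define SIM_MAX_HOPS 3

/* The mask only behaves like "hash % MAX_JMP_SIZE" for powers of two. */
_Static_assert((MAX_JMP_SIZE & (MAX_JMP_SIZE - 1)) == 0,
	       "MAX_JMP_SIZE must be a power of two");

enum sim_prog { SIM_CASE1, SIM_CASE2, SIM_EXIT, SIM_DONE };

struct sim_hop {
	enum sim_prog prog;	/* program that ran */
	uint32_t from;		/* skb->cb[0] as seen by that program */
};

/* Hypothetical simulation of cls_entry() and the jmp_tc/jmp_ex chain.
 * Fills trace[] and returns the number of programs that ran. */
static int sim_dispatch(uint32_t hash, struct sim_hop trace[SIM_MAX_HOPS])
{
	uint32_t cb0 = ENTRY_INIT;	/* cls_entry() sets cb[0] before jumping */
	enum sim_prog next = (hash & (MAX_JMP_SIZE - 1)) == ENTRY_0 ?
			     SIM_CASE1 : SIM_CASE2;
	int n = 0;

	while (next != SIM_DONE && n < SIM_MAX_HOPS) {
		trace[n].prog = next;
		trace[n].from = cb0;
		n++;

		switch (next) {
		case SIM_CASE1:		/* cls_case1(): jump into jmp_ex[ENTRY_0] */
			cb0 = ENTRY_0;
			next = SIM_EXIT;
			break;
		case SIM_CASE2:		/* cls_case2(): jump into jmp_tc[ENTRY_0] */
			cb0 = ENTRY_1;
			next = SIM_CASE1;
			break;
		default:		/* cls_exit(): termination point */
			next = SIM_DONE;
			break;
		}
	}
	return n;
}
```

An even hash visits case1 (from 3) and then exit (from 0). An odd hash visits case2 (from 3), case1 (from 1) and exit (from 0), matching the "from:" values the `printt()` calls would report.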

@@ -1,984 +0,0 @@
#!/bin/bash
#
# cbq.init v0.7.3
# Copyright (C) 1999 Pavel Golubev <pg@ksi-linux.com>
# Copyright (C) 2001-2004 Lubomir Bulej <pallas@kadan.cz>
#
# chkconfig: 2345 11 89
# description: sets up CBQ-based traffic control
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#
# To get the latest version, check on Freshmeat for actual location:
#
# http://freshmeat.net/projects/cbq.init
#
#
# VERSION HISTORY
# ---------------
# v0.7.3- Deepak Singhal <singhal at users.sourceforge.net>
# - fix timecheck to not ignore regular TIME rules after
# encountering a TIME rule that spans over midnight
# - Nathan Shafer <nicodemus at users.sourceforge.net>
# - allow symlinks to class files
# - Seth J. Blank <antifreeze at users.sourceforge.net>
# - replace hardcoded ip/tc location with variables
# - Mark Davis <mark.davis at gmx.de>
# - allow setting of PRIO_{MARK,RULE,REALM} in class file
# - Fernando Sanch <toptnc at users.sourceforge.net>
# - allow underscores in interface names
# v0.7.2- Paulo Sedrez
# - fix time2abs to allow hours with leading zero in TIME rules
# - Svetlin Simeonov <zvero at yahoo.com>
# - fix cbq_device_list to allow VLAN interfaces
# - Mark Davis <mark.davis at gmx.de>
# - ignore *~ backup files when looking for classes
# - Mike Boyer <boyer at administrative.com>
# - fix to allow arguments to be passed to "restart" command
# v0.7.1- Lubomir Bulej <pallas at kadan.cz>
# - default value for PERTURB
# - fixed small bug in RULE parser to correctly parse rules with
# identical source and destination fields
# - faster initial scanning of DEVICE fields
# v0.7 - Lubomir Bulej <pallas at kadan.cz>
# - lots of various cleanups and reorganizations; the parsing is now
# some 40% faster, but the class ID must be in range 0x0002-0xffff
# (again). Because of the number of internal changes and the above
# class ID restriction, I bumped the version to 0.7 to indicate
# something might have got broken :)
# - changed PRIO_{U32,FW,ROUTE} to PRIO_{RULE,MARK,REALM}
# for consistency with filter keywords
# - exposed "compile" command
# - Catalin Petrescu <taz at dntis.ro>
# - support for port masks in RULE (u32) filter
# - Jordan Vrtanoski <obeliks at mt.net.mk>
# - support for week days in TIME rules
# v0.6.4- Lubomir Bulej <pallas at kadan.cz>
# - added PRIO_* variables to allow easy control of filter priorities
# - added caching to speed up CBQ start, the cache is invalidated
# whenever any of the configuration files changes
# - updated the readme section + some cosmetic fixes
# v0.6.3- Lubomir Bulej <pallas at kadan.cz>
# - removed setup of (unnecessary) class 1:1 - all classes
# now use qdisc's default class 1:0 as their parent
# - minor fix in the timecheck branch - classes
# without leaf qdisc were not updated
# - minor fix to avoid timecheck failure when run
# at time with minutes equal to 08 or 09
# - respect CBQ_PATH setting in environment
# - made PRIO=5 default, rendering it optional in configs
# - added support for route filter, see notes about REALM keyword
# - added support for fw filter, see notes about MARK keyword
# - added filter display to "list" and "stats" commands
# - readme section update + various cosmetic fixes
# v0.6.2- Catalin Petrescu <taz at dntis.ro>
# - added tunnels interface handling
# v0.6.1- Pavel Golubev <pg at ksi-linux.com>
# - added sch_prio module loading
# (thanks johan at iglo.virtual.or.id for reminding)
# - resolved errors resulting from stricter syntax checking in bash2
# - Lubomir Bulej <pallas at kadan.cz>
# - various cosmetic fixes
# v0.6 - Lubomir Bulej <pallas at kadan.cz>
# - attempt to limit number of spawned processes by utilizing
# more of sed power (use sed instead of grep+cut)
# - simplified TIME parser, using bash builtins
# - added initial support for SFQ as leaf qdisc
# - reworked the documentation part a little
# - incorporated pending patches and ideas submitted by
# following people for versions 0.3 into version 0.6
# - Miguel Freitas <miguel at cetuc.puc-rio.br>
# - in case of overlapping TIME parameters, the last match is taken
# - Juanjo Ciarlante <jjo at mendoza.gov.ar>
# - chkconfig tags, list + stats startup parameters
# - optional tc & ip command logging (into /var/run/cbq-*)
# - Rafal Maszkowski <rzm at icm.edu.pl>
# - PEAK parameter for setting TBF's burst peak rate
# - fix for many config files (use find instead of ls)
# v0.5.1- Lubomir Bulej <pallas at kadan.cz>
# - fixed little but serious bug in RULE parser
# v0.5 - Lubomir Bulej <pallas at kadan.cz>
# - added options PARENT, LEAF, ISOLATED and BOUNDED. This allows
# (with some attention to config file ordering) for creating
# hierarchical structures of shapers with classes able (or unable)
# to borrow bandwidth from their parents.
# - class ID check allows hexadecimal numbers
# - rewritten & simplified RULE parser
# - cosmetic changes to improve readability
# - reorganization to avoid duplicate code (timecheck etc.)
# - timecheck doesn't check classes without TIME fields anymore
# v0.4 - Lubomir Bulej <pallas at kadan.cz>
# - small bugfix in RULE parsing code
# - simplified configuration parsing code
# - several small cosmetic changes
# - TIME parameter can be now specified more than once allowing you to
# differentiate RATE throughout the whole day. Time overlapping is
# not checked, first match is taken. Midnight wrap (eg. 20:00-6:00)
# is allowed and taken care of.
# v0.3a4- fixed small bug in IF operator. Thanks to
# Rafal Maszkowski <rzm at icm.edu.pl>
# v0.3a3- fixed grep bug when using more than 10 eth devices. Thanks to David
# Trcka <trcka at poda.cz>.
# v0.3a2- fixed bug in "if" operator. Thanks kad at dgtu.donetsk.ua.
# v0.3a - added TIME parameter. Example: TIME=00:00-19:00;64Kbit/6Kbit
# So, between 00:00 and 19:00 the RATE will be 64Kbit.
# Just start "cbq.init timecheck" periodically from cron
# (every 10 minutes for example). DON'T FORGET though, to run
# "cbq.init start" for CBQ to initialize.
# v0.2 - Some cosmetic changes. Now it is more compatible with old bash
# version. Thanks to Stanislav V. Voronyi <stas at cnti.uanet.kharkov.ua>.
# v0.1 - First public release
#
#
# README
# ------
#
# First of all - this is just a SIMPLE EXAMPLE of CBQ power.
# Don't ask me "why" and "how" :)
#
# This script is meant to simplify setup and management of relatively simple
# CBQ-based traffic control on Linux. Access to advanced networking features
# of Linux kernel is provided by "ip" and "tc" utilities from A. Kuznetsov's
# iproute2 package, available at ftp://ftp.inr.ac.ru/ip-routing. Because the
# utilities serve primarily to translate user wishes to RTNETLINK commands,
# their interface is rather spartan, intolerant and requires quite a lot of
# typing. And typing is what this script attempts to reduce :)
#
# The advanced networking stuff in Linux is pretty flexible and this script
# aims to bring some of its features to the not-so-hard-core Linux users. Of
# course, there is a tradeoff between simplicity and flexibility and you may
# realize that the flexibility suffered too much for your needs -- time to
# face "ip" and "tc" interface.
#
# To speed up the "start" command, simple caching was introduced in version
# 0.6.4. The caching works so that the sequence of "tc" commands for given
# configuration is stored in a file (/var/cache/cbq.init by default) which
# is used next time the "start" command is run to avoid repeated parsing of
# configuration files. This cache is invalidated whenever any of the CBQ
# configuration files changes. If you want to run "cbq.init start" without
# caching, run it as "cbq.init start nocache". If you want to force cache
# invalidation, run it as "cbq.init start invalidate". Caching is disabled
# if you have logging enabled (ie. CBQ_DEBUG is not empty).
#
# If you only want cbq.init to translate your configuration to "tc" commands,
# use "compile" command which will output "tc" commands required to build
# your configuration. Bear in mind that "compile" does not check if the "tc"
# commands were successful - this is done (in certain places) only when the
# "start nocache" command is used, which is also useful when creating the
# configuration to check whether it is completely valid.
#
# All CBQ parameters are valid for Ethernet interfaces only. The script was
# tested on various Linux kernel versions from series 2.1 to 2.4 and several
# distributions with KSI Linux (Nostromo version) as the premier one.
#
#
# HOW DOES IT WORK?
# -----------------
#
# Every traffic class must be described by a file in the $CBQ_PATH directory
# (/etc/sysconfig/cbq by default) - one file per class.
#
# The config file names must follow the mandatory format cbq-<clsid>.<name>,
# where <clsid> is a two-byte hexadecimal number in the range <0002-FFFF>
# (which is in fact the CBQ class ID) and <name> is the name of the class --
# anything to help you distinguish the configuration files. For a small
# number of classes it is often possible (and convenient) to let <clsid>
# resemble the bandwidth of the class.
#
# Example of valid config name:
# cbq-1280.My_first_shaper
#
#
# The configuration file may contain the following parameters:
#
### Device parameters
#
# DEVICE=<ifname>,<bandwidth>[,<weight>] mandatory
# DEVICE=eth0,10Mbit,1Mbit
#
# <ifname> is the name of the interface you want to control
# traffic on, e.g. eth0
# <bandwidth> is the physical bandwidth of the device, e.g. for
# ethernet 10Mbit or 100Mbit, for arcnet 2Mbit
# <weight> is a tuning parameter that should be proportional to
# <bandwidth>. As a rule of thumb: <weight> = <bandwidth> / 10
#
# When you have more classes on one interface, it is enough to specify
# <bandwidth> [and <weight>] only once; in the other files you only
# need to set DEVICE=<ifname>.
#
### Class parameters
#
# RATE=<speed> mandatory
# RATE=5Mbit
#
# Bandwidth allocated to the class. Traffic going through the class is
# shaped to conform to the specified rate. You can use Kbit, Mbit, bps,
# Kbps and Mbps as suffixes. If you don't specify any unit, bits/sec
# are used. Also note that "bps" means "bytes per second", not bits.
#
# WEIGHT=<speed> mandatory
# WEIGHT=500Kbit
#
# Tuning parameter that should be proportional to RATE. As a rule
# of thumb, use WEIGHT ~= RATE / 10.
#
# PRIO=<1-8> optional, default 5
# PRIO=5
#
# Priority of class traffic. The higher the number, the lower
# the priority. Priority of 5 is just fine.
#
# PARENT=<clsid> optional, default not set
# PARENT=1280
#
# Specifies the ID of the parent class to which you want this class to be
# attached. You might want to use LEAF=none for the parent class as
# mentioned below. By using this parameter and carefully ordering the
# configuration files, it is possible to create simple hierarchical
# structures of CBQ classes. The ordering is important so that parent
# classes are constructed prior to their children.
#
# LEAF=none|tbf|sfq optional, default "tbf"
#
# Tells the script to attach the specified leaf queueing discipline to the
# CBQ class. By default, TBF is used. Note that attaching TBF to a CBQ class
# shapes the traffic to conform to TBF parameters and prevents the class
# from borrowing bandwidth from its parent even if you have BOUNDED set
# to "no". To allow the class to borrow bandwidth (provided it is not
# bounded), you must set LEAF to "none" or "sfq".
#
# If you want to ensure (approximately) fair sharing of bandwidth among
# several hosts in the same class, you might want to specify LEAF=sfq to
# attach SFQ as leaf queueing discipline to that class.
#
# BOUNDED=yes|no optional, default "yes"
#
# If set to "yes", the class is not allowed to borrow bandwidth from
# its parent class in overlimit situation. If set to "no", the class
# will be allowed to borrow bandwidth from its parent.
#
# Note: Don't forget to set LEAF to "none" or "sfq", otherwise the class will
# have TBF attached to itself and will not be able to borrow unused
# bandwidth from its parent.
#
# ISOLATED=yes|no optional, default "no"
#
# If set to "yes", the class will not lend unused bandwidth to
# its children.
#
### TBF qdisc parameters
#
# BUFFER=<bytes>[/<bytes>] optional, default "10Kb/8"
#
# This parameter controls the depth of the token bucket. In other
# words it represents the maximal burst size the class can send.
# The optional part of the parameter is used to determine the length
# of intervals in packet sizes, for which the transmission times
# are kept.
#
# LIMIT=<bytes> optional, default "15Kb"
#
# This parameter determines the maximal length of backlog. If
# the queue contains more data than specified by LIMIT, the
# newly arriving packets are dropped. The length of backlog
# determines queue latency in case of congestion.
#
# PEAK=<speed> optional, default not set
#
# Maximal peak rate for short-term burst traffic. This allows you
# to control the absolute peak rate the class can send at, because
# a single TBF that allows 256Kbit/s would of course allow a rate of
# 512Kbit/s for half a second or 1Mbit/s for a quarter of a second.
#
# MTU=<bytes> optional, default "1500"
#
# Maximum number of bytes that can be sent at once over the
# physical medium. This parameter is required when you specify
# PEAK parameter. It defaults to the MTU of Ethernet - for other
# media types you might want to change it.
#
# Note: Setting TBF as leaf qdisc will effectively prevent the class from
# borrowing bandwidth from the ancestor class, because even if the
# class allows more traffic to pass through, it is then shaped to
# conform to TBF.
#
### SFQ qdisc parameters
#
# The SFQ queueing discipline is a cheap way for sharing class bandwidth
# among several hosts. As it is stochastic, the fairness is approximate but
# it will do the job in most cases. If you want real fairness, you should
# probably use WRR (weighted round robin) or WFQ queueing disciplines. Note
# that SFQ does not do any traffic shaping - the shaping is done by the CBQ
# class the SFQ is attached to.
#
# QUANTUM=<bytes> optional, default not set
#
# This parameter should not be set lower than the link MTU; for Ethernet
# it is 1500b, or (with MAC header) 1514b, which is the value used
# in Alexey Kuznetsov's examples.
#
# PERTURB=<seconds> optional, default "10"
#
# Period of hash function perturbation. If unset, hash reconfiguration
# will never take place, which is probably not what you want. The
# default value of 10 seconds is probably a good one.
#
### Filter parameters
#
# RULE=[[saddr[/prefix]][:port[/mask]],][daddr[/prefix]][:port[/mask]]
#
# These parameters make up "u32" filter rules that select traffic for
# each of the classes. You can use multiple RULE fields per config.
#
# The optional port mask should only be used by advanced users who
# understand how the u32 filter works.
#
# Some examples:
#
# RULE=10.1.1.0/24:80
# selects traffic going to port 80 in network 10.1.1.0
#
# RULE=10.2.2.5
# selects traffic going to any port on single host 10.2.2.5
#
# RULE=10.2.2.5:20/0xfffe
# selects traffic going to ports 20 and 21 on host 10.2.2.5
#
# RULE=:25,10.2.2.128/26:5000
# selects traffic going from anywhere on port 25 to
# port 5000 in network 10.2.2.128
#
# RULE=10.5.5.5:80,
# selects traffic going from port 80 of single host 10.5.5.5
#
#
#
# REALM=[srealm,][drealm]
#
# These parameters make up "route" filter rules that classify traffic
# according to packet source/destination realms. For information about
# realms, see Alexey Kuznetsov's IP Command Reference. This script
# does not define any realms; it just builds "tc filter" commands
# for you if you need to classify traffic this way.
#
# Realm is either a decimal number or a string referencing an entry in
# /etc/iproute2/rt_realms (usually).
#
# Some examples:
#
# REALM=russia,internet
# selects traffic going from realm "russia" to realm "internet"
#
# REALM=freenet,
# selects traffic going from realm "freenet"
#
# REALM=10
# selects traffic going to realm 10
#
#
#
# MARK=<mark>
#
# These parameters make up "fw" filter rules that select traffic for
# each of the classes according to the firewall "mark". The mark is a
# decimal number that packets are tagged with if firewall rules say so.
# You can use multiple MARK fields per config.
#
#
# Note: Rules for different filter types can be combined. Attention must be
# paid to the priority of filter rules, which can be set below using
# PRIO_{RULE,MARK,REALM} variables.
#
### Time ranging parameters
#
# TIME=[<dow>,<dow>, ...,<dow>/]<from>-<till>;<rate>/<weight>[/<peak>]
# TIME=0,1,2,5/18:00-06:00;256Kbit/25Kbit
# TIME=60123/18:00-06:00;256Kbit/25Kbit
# TIME=18:00-06:00;256Kbit/25Kbit
#
# This parameter allows you to differentiate the class bandwidth
# throughout the day. You can specify multiple TIME parameters; if
# the times overlap, the last match is taken. The fields <rate>, <weight>
# and <peak> correspond to parameters RATE, WEIGHT and PEAK (which
# is optional and applies to TBF leaf qdisc only).
#
# You can also specify the days of the week on which the TIME rule applies.
# <dow> is numeric: 0 corresponds to Sunday, 1 to Monday, etc.
#
###
#
# Sample configuration file: cbq-1280.My_first_shaper
#
# --------------------------------------------------------------------------
# DEVICE=eth0,10Mbit,1Mbit
# RATE=128Kbit
# WEIGHT=10Kbit
# PRIO=5
# RULE=192.168.1.0/24
# --------------------------------------------------------------------------
#
# The configuration says that we will control traffic on the 10Mbit Ethernet
# device eth0, and the traffic going to network 192.168.1.0 will be
# processed with priority 5 and shaped to a rate of 128Kbit.
#
# Note that you can control outgoing traffic only. If you want to control
# traffic in both directions, you must set up CBQ for both interfaces.
#
# Consider the following example:
#
# +---------+ 192.168.1.1
# BACKBONE -----eth0-| linux |-eth1------*-[client]
# +---------+
#
# Imagine you want to shape traffic from backbone to the client to 28Kbit
# and traffic in the opposite direction to 128Kbit. You need to set up CBQ
# on both eth0 and eth1 interfaces, thus you need two config files:
#
# cbq-028.backbone-client
# --------------------------------------------------------------------------
# DEVICE=eth1,10Mbit,1Mbit
# RATE=28Kbit
# WEIGHT=2Kbit
# PRIO=5
# RULE=192.168.1.1
# --------------------------------------------------------------------------
#
# cbq-128.client-backbone
# --------------------------------------------------------------------------
# DEVICE=eth0,10Mbit,1Mbit
# RATE=128Kbit
# WEIGHT=10Kbit
# PRIO=5
# RULE=192.168.1.1,
# --------------------------------------------------------------------------
#
# Pay attention to comma "," in the RULE field - it denotes source address!
#
# Enjoy.
#
#############################################################################
export LC_ALL=C
### Command locations
TC=/sbin/tc
IP=/sbin/ip
MP=/sbin/modprobe
### Default filter priorities (must be different)
PRIO_RULE_DEFAULT=${PRIO_RULE:-100}
PRIO_MARK_DEFAULT=${PRIO_MARK:-200}
PRIO_REALM_DEFAULT=${PRIO_REALM:-300}
### Default CBQ_PATH & CBQ_CACHE settings
CBQ_PATH=${CBQ_PATH:-/etc/sysconfig/cbq}
CBQ_CACHE=${CBQ_CACHE:-/var/cache/cbq.init}
### Uncomment to enable logfile for debugging
#CBQ_DEBUG="/var/run/cbq-$1"
### Modules to probe for. Uncomment the last CBQ_PROBE
### line if you have QoS support compiled into kernel
CBQ_PROBE="sch_cbq sch_tbf sch_sfq sch_prio"
CBQ_PROBE="$CBQ_PROBE cls_fw cls_u32 cls_route"
#CBQ_PROBE=""
### Keywords required for qdisc & class configuration
CBQ_WORDS="DEVICE|RATE|WEIGHT|PRIO|PARENT|LEAF|BOUNDED|ISOLATED"
CBQ_WORDS="$CBQ_WORDS|PRIO_MARK|PRIO_RULE|PRIO_REALM|BUFFER"
CBQ_WORDS="$CBQ_WORDS|LIMIT|PEAK|MTU|QUANTUM|PERTURB"
### Source AVPKT if it exists
[ -r /etc/sysconfig/cbq/avpkt ] && . /etc/sysconfig/cbq/avpkt
AVPKT=${AVPKT:-3000}
#############################################################################
############################# SUPPORT FUNCTIONS #############################
#############################################################################
### Get list of network devices
cbq_device_list () {
	ip link show| sed -n "/^[0-9]/ \
		{ s/^[0-9]\+: \([a-z0-9._]\+\)[:@].*/\1/; p; }"
} # cbq_device_list

### Remove root class from device $1
cbq_device_off () {
	tc qdisc del dev $1 root 2> /dev/null
} # cbq_device_off

### Remove CBQ from all devices
cbq_off () {
	for dev in `cbq_device_list`; do
		cbq_device_off $dev
	done
} # cbq_off

### Prefixed message
cbq_message () {
	echo -e "**CBQ: $@"
} # cbq_message

### Failure message
cbq_failure () {
	cbq_message "$@"
	exit 1
} # cbq_failure

### Failure w/ cbq-off
cbq_fail_off () {
	cbq_message "$@"
	cbq_off
	exit 1
} # cbq_fail_off

### Convert time to absolute value
cbq_time2abs () {
	local min=${1##*:}; min=${min##0}
	local hrs=${1%%:*}; hrs=${hrs##0}
	echo $[hrs*60 + min]
} # cbq_time2abs

### Display CBQ setup
cbq_show () {
	for dev in `cbq_device_list`; do
		[ `tc qdisc show dev $dev| wc -l` -eq 0 ] && continue
		echo -e "### $dev: queueing disciplines\n"
		tc $1 qdisc show dev $dev; echo

		[ `tc class show dev $dev| wc -l` -eq 0 ] && continue
		echo -e "### $dev: traffic classes\n"
		tc $1 class show dev $dev; echo

		[ `tc filter show dev $dev| wc -l` -eq 0 ] && continue
		echo -e "### $dev: filtering rules\n"
		tc $1 filter show dev $dev; echo
	done
} # cbq_show

### Check configuration and load DEVICES, DEVFIELDS and CLASSLIST from $1
cbq_init () {
	### Get a list of configured classes
	CLASSLIST=`find $1 \( -type f -or -type l \) -name 'cbq-*' \
		-not -name '*~' -maxdepth 1 -printf "%f\n"| sort`
	[ -z "$CLASSLIST" ] &&
		cbq_failure "no configuration files found in $1!"

	### Gather all DEVICE fields from $1/cbq-*
	DEVFIELDS=`find $1 \( -type f -or -type l \) -name 'cbq-*' \
		-not -name '*~' -maxdepth 1| xargs sed -n 's/#.*//; \
s/[[:space:]]//g; /^DEVICE=[^,]*,[^,]*\(,[^,]*\)\?/ \
{ s/.*=//; p; }'| sort -u`
	[ -z "$DEVFIELDS" ] &&
		cbq_failure "no DEVICE field found in $1/cbq-*!"

	### Check for different DEVICE fields for the same device
	DEVICES=`echo "$DEVFIELDS"| sed 's/,.*//'| sort -u`
	[ `echo "$DEVICES"| wc -l` -ne `echo "$DEVFIELDS"| wc -l` ] &&
		cbq_failure "different DEVICE fields for single device!\n$DEVFIELDS"
} # cbq_init

### Load class configuration from $1/$2
cbq_load_class () {
	CLASS=`echo $2| sed 's/^cbq-0*//; s/^\([0-9a-fA-F]\+\).*/\1/'`
	CFILE=`sed -n 's/#.*//; s/[[:space:]]//g; /^[[:alnum:]_]\+=[[:alnum:].,:;/*@-_]\+$/ p' $1/$2`

	### Check class number
	IDVAL=`/usr/bin/printf "%d" 0x$CLASS 2> /dev/null`
	[ $? -ne 0 -o $IDVAL -lt 2 -o $IDVAL -gt 65535 ] &&
		cbq_fail_off "class ID of $2 must be in range <0002-FFFF>!"

	### Set defaults & load class
	RATE=""; WEIGHT=""; PARENT=""; PRIO=5
	LEAF=tbf; BOUNDED=yes; ISOLATED=no
	BUFFER=10Kb/8; LIMIT=15Kb; MTU=1500
	PEAK=""; PERTURB=10; QUANTUM=""

	PRIO_RULE=$PRIO_RULE_DEFAULT
	PRIO_MARK=$PRIO_MARK_DEFAULT
	PRIO_REALM=$PRIO_REALM_DEFAULT

	eval `echo "$CFILE"| grep -E "^($CBQ_WORDS)="`

	### Require RATE/WEIGHT
	[ -z "$RATE" -o -z "$WEIGHT" ] &&
		cbq_fail_off "missing RATE or WEIGHT in $2!"

	### Class device
	DEVICE=${DEVICE%%,*}
	[ -z "$DEVICE" ] && cbq_fail_off "missing DEVICE field in $2!"

	BANDWIDTH=`echo "$DEVFIELDS"| sed -n "/^$DEVICE,/ \
		{ s/[^,]*,\([^,]*\).*/\1/; p; q; }"`

	### Convert to "tc" options
	PEAK=${PEAK:+peakrate $PEAK}
	PERTURB=${PERTURB:+perturb $PERTURB}
	QUANTUM=${QUANTUM:+quantum $QUANTUM}

	[ "$BOUNDED" = "no" ] && BOUNDED="" || BOUNDED="bounded"
	[ "$ISOLATED" = "yes" ] && ISOLATED="isolated" || ISOLATED=""
} # cbq_load_class
#############################################################################
#################################### INIT ###################################
#############################################################################
### Check for presence of ip-route2 in usual place
[ -x $TC -a -x $IP ] ||
cbq_failure "ip-route2 utilities not installed or executable!"
### ip/tc wrappers
if [ "$1" = "compile" ]; then
### no module probing
CBQ_PROBE=""
ip () {
$IP "$@"
} # ip
### echo-only version of "tc" command
tc () {
echo "$TC $@"
} # tc
elif [ -n "$CBQ_DEBUG" ]; then
echo -e "# `date`" > $CBQ_DEBUG
### Logging version of "ip" command
ip () {
echo -e "\n# ip $@" >> $CBQ_DEBUG
$IP "$@" 2>&1 | tee -a $CBQ_DEBUG
} # ip
### Logging version of "tc" command
tc () {
echo -e "\n# tc $@" >> $CBQ_DEBUG
$TC "$@" 2>&1 | tee -a $CBQ_DEBUG
} # tc
else
### Default wrappers
ip () {
$IP "$@"
} # ip
tc () {
$TC "$@"
} # tc
fi # ip/tc wrappers
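### Sketch (hypothetical helper, not called anywhere in this script):
### the RULE parsing done in the start/compile branch, reduced to one
### endpoint. "addr:port/mask" becomes u32 match options; a missing
### mask defaults to 0xffff and a "*" address means "any".
### Pass "src" or "dst" as $1 and the endpoint as $2.
cbq_u32_match_sketch () {
	case "$1" in
		src) _pkw=sport ;;
		*) _pkw=dport ;;
	esac
	_ep=$2
	### Split endpoint into address and port/mask, as in the RULE loop
	_addr=${_ep%%:*}; _tmp=${_ep##*:}
	[ "$_addr" = "$_ep" ] && _tmp=""
	_port=${_tmp%%/*}; _mask=${_tmp##*/}
	[ "$_port" = "$_tmp" ] && _mask="0xffff"
	_addr=${_addr#\*}
	_m="${_port:+match ip $_pkw $_port $_mask}"
	_m="${_addr:+match ip $1 $_addr} $_m"
	### Trim the blank left over when either part is empty
	_m="${_m# }"; _m="${_m% }"
	echo "$_m"
}
### e.g. cbq_u32_match_sketch dst 192.168.1.5:80 prints
### "match ip dst 192.168.1.5 match ip dport 80 0xffff"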
case "$1" in
#############################################################################
############################### START/COMPILE ###############################
#############################################################################
start|compile)
### Probe QoS modules (start only)
for module in $CBQ_PROBE; do
$MP $module || cbq_failure "failed to load module $module"
done
### If we are in compile/nocache/logging mode, don't bother with cache
if [ "$1" != "compile" -a "$2" != "nocache" -a -z "$CBQ_DEBUG" ]; then
VALID=1
### validate the cache
[ "$2" = "invalidate" -o ! -f $CBQ_CACHE ] && VALID=0
if [ $VALID -eq 1 ]; then
[ `find $CBQ_PATH -maxdepth 1 -newer $CBQ_CACHE| \
wc -l` -gt 0 ] && VALID=0
fi
### compile the config if the cache is invalid
if [ $VALID -ne 1 ]; then
$0 compile > $CBQ_CACHE ||
cbq_fail_off "failed to compile CBQ configuration!"
fi
### run the cached commands
exec /bin/sh $CBQ_CACHE 2> /dev/null
fi
### Load DEVICES, DEVFIELDS and CLASSLIST
cbq_init $CBQ_PATH
### Setup root qdisc on all configured devices
for dev in $DEVICES; do
### Retrieve device bandwidth and, optionally, weight
DEVTEMP=`echo "$DEVFIELDS"| sed -n "/^$dev,/ { s/$dev,//; p; q; }"`
DEVBWDT=${DEVTEMP%%,*}; DEVWGHT=${DEVTEMP##*,}
[ "$DEVBWDT" = "$DEVWGHT" ] && DEVWGHT=""
### Device bandwidth is required
if [ -z "$DEVBWDT" ]; then
cbq_message "could not determine bandwidth for device $dev!"
cbq_failure "please set up the DEVICE fields properly!"
fi
### Check if the device is there
ip link show $dev &> /dev/null ||
cbq_fail_off "device $dev not found!"
### Remove old root qdisc from device
cbq_device_off $dev
### Setup root qdisc + class for device
tc qdisc add dev $dev root handle 1 cbq \
bandwidth $DEVBWDT avpkt $AVPKT cell 8
### Set weight of the root class if set
[ -n "$DEVWGHT" ] &&
tc class change dev $dev root cbq weight $DEVWGHT allot 1514
[ "$1" = "compile" ] && echo
done # dev
### Setup traffic classes
for classfile in $CLASSLIST; do
cbq_load_class $CBQ_PATH $classfile
### Create the class
tc class add dev $DEVICE parent 1:$PARENT classid 1:$CLASS cbq \
bandwidth $BANDWIDTH rate $RATE weight $WEIGHT prio $PRIO \
allot 1514 cell 8 maxburst 20 avpkt $AVPKT $BOUNDED $ISOLATED ||
cbq_fail_off "failed to add class $CLASS with parent $PARENT on $DEVICE!"
### Create leaf qdisc if set
if [ "$LEAF" = "tbf" ]; then
tc qdisc add dev $DEVICE parent 1:$CLASS handle $CLASS tbf \
rate $RATE buffer $BUFFER limit $LIMIT mtu $MTU $PEAK
elif [ "$LEAF" = "sfq" ]; then
tc qdisc add dev $DEVICE parent 1:$CLASS handle $CLASS sfq \
$PERTURB $QUANTUM
fi
### Create fw filter for MARK fields
for mark in `echo "$CFILE"| sed -n '/^MARK/ { s/.*=//; p; }'`; do
### Attach fw filter to root class
tc filter add dev $DEVICE parent 1:0 protocol ip \
prio $PRIO_MARK handle $mark fw classid 1:$CLASS
done ### mark
### Create route filter for REALM fields
for realm in `echo "$CFILE"| sed -n '/^REALM/ { s/.*=//; p; }'`; do
### Split realm into source & destination realms
SREALM=${realm%%,*}; DREALM=${realm##*,}
[ "$SREALM" = "$DREALM" ] && SREALM=""
### Convert asterisks to empty strings
SREALM=${SREALM#\*}; DREALM=${DREALM#\*}
### Attach route filter to the root class
tc filter add dev $DEVICE parent 1:0 protocol ip \
prio $PRIO_REALM route ${SREALM:+from $SREALM} \
${DREALM:+to $DREALM} classid 1:$CLASS
done ### realm
### Create u32 filter for RULE fields
for rule in `echo "$CFILE"| sed -n '/^RULE/ { s/.*=//; p; }'`; do
### Split rule into source & destination
SRC=${rule%%,*}; DST=${rule##*,}
[ "$SRC" = "$rule" ] && SRC=""
### Split destination into address, port & mask fields
DADDR=${DST%%:*}; DTEMP=${DST##*:}
[ "$DADDR" = "$DST" ] && DTEMP=""
DPORT=${DTEMP%%/*}; DMASK=${DTEMP##*/}
[ "$DPORT" = "$DTEMP" ] && DMASK="0xffff"
### Split up source (if specified)
SADDR=""; SPORT=""
if [ -n "$SRC" ]; then
SADDR=${SRC%%:*}; STEMP=${SRC##*:}
[ "$SADDR" = "$SRC" ] && STEMP=""
SPORT=${STEMP%%/*}; SMASK=${STEMP##*/}
[ "$SPORT" = "$STEMP" ] && SMASK="0xffff"
fi
### Convert asterisks to empty strings
SADDR=${SADDR#\*}; DADDR=${DADDR#\*}
### Compose u32 filter rules
u32_s="${SPORT:+match ip sport $SPORT $SMASK}"
u32_s="${SADDR:+match ip src $SADDR} $u32_s"
u32_d="${DPORT:+match ip dport $DPORT $DMASK}"
u32_d="${DADDR:+match ip dst $DADDR} $u32_d"
### Uncomment the following if you want to see parsed rules
#echo "$rule: $u32_s $u32_d"
### Attach u32 filter to the appropriate class
tc filter add dev $DEVICE parent 1:0 protocol ip \
prio $PRIO_RULE u32 $u32_s $u32_d classid 1:$CLASS
done ### rule
[ "$1" = "compile" ] && echo
done ### classfile
;;
#############################################################################
################################# TIME CHECK ################################
#############################################################################
timecheck)
### Get time + weekday
TIME_TMP=`date +%w/%k:%M`
TIME_DOW=${TIME_TMP%%/*}
TIME_NOW=${TIME_TMP##*/}
### Load DEVICES, DEVFIELDS and CLASSLIST
cbq_init $CBQ_PATH
### Run through all classes
for classfile in $CLASSLIST; do
### Gather all TIME rules from class config
TIMESET=`sed -n 's/#.*//; s/[[:space:]]//g; /^TIME/ { s/.*=//; p; }' \
$CBQ_PATH/$classfile`
[ -z "$TIMESET" ] && continue
MATCH=0; CHANGE=0
for timerule in $TIMESET; do
TIME_ABS=`cbq_time2abs $TIME_NOW`
### Split TIME rule to pieces
TIMESPEC=${timerule%%;*}; PARAMS=${timerule##*;}
WEEKDAYS=${TIMESPEC%%/*}; INTERVAL=${TIMESPEC##*/}
BEG_TIME=${INTERVAL%%-*}; END_TIME=${INTERVAL##*-}
### Check the day-of-week (if present)
[ "$WEEKDAYS" != "$INTERVAL" -a \
-n "${WEEKDAYS##*$TIME_DOW*}" ] && continue
### Compute interval boundaries
BEG_ABS=`cbq_time2abs $BEG_TIME`
END_ABS=`cbq_time2abs $END_TIME`
### Midnight wrap fixup
if [ $BEG_ABS -gt $END_ABS ]; then
[ $TIME_ABS -le $END_ABS ] &&
TIME_ABS=$[TIME_ABS + 24*60]
END_ABS=$[END_ABS + 24*60]
fi
### If the time matches, remember params and set MATCH flag
if [ $TIME_ABS -ge $BEG_ABS -a $TIME_ABS -lt $END_ABS ]; then
TMP_RATE=${PARAMS%%/*}; PARAMS=${PARAMS#*/}
TMP_WGHT=${PARAMS%%/*}; TMP_PEAK=${PARAMS##*/}
[ "$TMP_PEAK" = "$TMP_WGHT" ] && TMP_PEAK=""
TMP_PEAK=${TMP_PEAK:+peakrate $TMP_PEAK}
MATCH=1
fi
done ### timerule
cbq_load_class $CBQ_PATH $classfile
### Get current RATE of CBQ class
RATE_NOW=`tc class show dev $DEVICE| sed -n \
"/cbq 1:$CLASS / { s/.*rate //; s/ .*//; p; q; }"`
[ -z "$RATE_NOW" ] && continue
### Time interval matched
if [ $MATCH -ne 0 ]; then
### Check if there is any change in class RATE
if [ "$RATE_NOW" != "$TMP_RATE" ]; then
NEW_RATE="$TMP_RATE"
NEW_WGHT="$TMP_WGHT"
NEW_PEAK="$TMP_PEAK"
CHANGE=1
fi
### Match not found, reset to default RATE if necessary
elif [ "$RATE_NOW" != "$RATE" ]; then
NEW_WGHT="$WEIGHT"
NEW_RATE="$RATE"
NEW_PEAK="$PEAK"
CHANGE=1
fi
### If there are no changes, go for next class
[ $CHANGE -eq 0 ] && continue
### Replace CBQ class
tc class replace dev $DEVICE classid 1:$CLASS cbq \
bandwidth $BANDWIDTH rate $NEW_RATE weight $NEW_WGHT prio $PRIO \
allot 1514 cell 8 maxburst 20 avpkt $AVPKT $BOUNDED $ISOLATED
### Replace leaf qdisc (if any)
if [ "$LEAF" = "tbf" ]; then
tc qdisc replace dev $DEVICE handle $CLASS tbf \
rate $NEW_RATE buffer $BUFFER limit $LIMIT mtu $MTU $NEW_PEAK
fi
cbq_message "$TIME_NOW: class $CLASS on $DEVICE changed rate ($RATE_NOW -> $NEW_RATE)"
done ### class file
;;
#############################################################################
################################## THE REST #################################
#############################################################################
stop)
cbq_off
;;
list)
cbq_show
;;
stats)
cbq_show -s
;;
restart)
shift
$0 stop
$0 start "$@"
;;
*)
echo "Usage: `basename $0` {start|compile|stop|restart|timecheck|list|stats}"
esac
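### Sketch (hypothetical helpers, not called anywhere in this script):
### the TIME interval test from the timecheck branch. Times are HH:MM,
### converted to minutes since midnight; when the interval wraps past
### midnight (begin > end), the end and an early "now" are shifted by
### 24h, as in the "Midnight wrap fixup" above. The minutes conversion
### is an assumed stand-in for cbq_time2abs.
cbq_hhmm2min_sketch () {
	_h=${1%%:*}; _mi=${1##*:}
	### Drop one leading zero so "08"/"09" are not parsed as octal
	_h=${_h#0}; _mi=${_mi#0}
	echo $(( ${_h:-0} * 60 + ${_mi:-0} ))
}
cbq_time_in_interval_sketch () {
	_n=$(cbq_hhmm2min_sketch "$1")
	_b=$(cbq_hhmm2min_sketch "$2")
	_e=$(cbq_hhmm2min_sketch "$3")
	### Interval crosses midnight: shift the end (and an early "now")
	if [ $_b -gt $_e ]; then
		[ $_n -le $_e ] && _n=$((_n + 24*60))
		_e=$((_e + 24*60))
	fi
	### Succeeds if begin <= now < end
	[ $_n -ge $_b -a $_n -lt $_e ]
}
### e.g. cbq_time_in_interval_sketch 05:00 22:00 06:00 succeeds,
### cbq_time_in_interval_sketch 12:00 22:00 06:00 fails.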

@ -1,76 +0,0 @@
#! /bin/sh
TC=/home/root/tc
IP=/home/root/ip
DEVICE=eth1
BANDWIDTH="bandwidth 10Mbit"
# Attach CBQ to $DEVICE. It will have handle 1:.
# $BANDWIDTH is the real bandwidth of $DEVICE (10Mbit).
# avpkt is the average packet size.
# mpu is the minimum packet size.
$TC qdisc add dev $DEVICE root handle 1: cbq \
$BANDWIDTH avpkt 1000 mpu 64
# Create root class with classid 1:1. This step is not necessary.
# bandwidth is the same as on CBQ itself.
# rate == all the bandwidth
# allot is MTU + MAC header
# maxburst measures the allowed class burstiness (please read the S. Floyd and V. Jacobson papers)
# est 1sec 8sec means that the kernel will estimate the average rate
# of this class with a period of 1sec and a time constant of 8sec.
# This rate is shown by "tc -s class ls dev $DEVICE"
$TC class add dev $DEVICE parent 1:0 classid :1 est 1sec 8sec cbq \
$BANDWIDTH rate 10Mbit allot 1514 maxburst 50 avpkt 1000
# Bulk.
# New parameters are:
# weight, which is set proportional to
# "rate". It is not strictly necessary; weight=1 will work as well.
# defmap and split say that best-effort traffic not classified
# by other means will fall into this class.
$TC class add dev $DEVICE parent 1:1 classid :2 est 1sec 8sec cbq \
$BANDWIDTH rate 4Mbit allot 1514 weight 500Kbit \
prio 6 maxburst 50 avpkt 1000 split 1:0 defmap ff3d
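# Sketch (hypothetical helper, not used by this example): defmap is a
# bitmap over the 16 logical packet priorities; bit N set means that
# unclassified traffic of priority N defaults to this class. Decoding
# ff3d gives every priority except 1 (TC_PRIO_FILLER), 6
# (TC_PRIO_INTERACTIVE) and 7 (TC_PRIO_CONTROL); the interactive class
# below takes 6 and 7 (defmap c0), the background class takes 1 (defmap 2).
defmap_prios_sketch () {
	_map=$((0x$1)); _bit=0; _out=""
	while [ $_bit -lt 16 ]; do
		# Collect the positions of the set bits
		[ $(( (_map >> _bit) & 1 )) -eq 1 ] && _out="$_out $_bit"
		_bit=$((_bit + 1))
	done
	echo "${_out# }"
}
# e.g. defmap_prios_sketch c0 prints "6 7"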
# OPTIONAL.
# Attach an "sfq" qdisc to this class; quantum is the MTU, and perturb
# gives the period of hash function perturbation in seconds.
#
$TC qdisc add dev $DEVICE parent 1:2 sfq quantum 1514b perturb 15
# Interactive-burst class
$TC class add dev $DEVICE parent 1:1 classid :3 est 2sec 16sec cbq \
$BANDWIDTH rate 1Mbit allot 1514 weight 100Kbit \
prio 2 maxburst 100 avpkt 1000 split 1:0 defmap c0
$TC qdisc add dev $DEVICE parent 1:3 sfq quantum 1514b perturb 15
# Background.
$TC class add dev $DEVICE parent 1:1 classid :4 est 1sec 8sec cbq \
$BANDWIDTH rate 100Kbit allot 1514 weight 10Mbit \
prio 7 maxburst 10 avpkt 1000 split 1:0 defmap 2
$TC qdisc add dev $DEVICE parent 1:4 sfq quantum 1514b perturb 15
# Realtime class for RSVP
$TC class add dev $DEVICE parent 1:1 classid 1:7FFE cbq \
rate 5Mbit $BANDWIDTH allot 1514b avpkt 1000 \
maxburst 20
# Reclassified realtime traffic
#
# New element: split is not 1:0 but 1:7FFE. This means
# that only real-time packets which violated policing filters
# or exceeded reshaping buffers will fall into it.
$TC class add dev $DEVICE parent 1:7FFE classid 1:7FFF est 4sec 32sec cbq \
rate 1Mbit $BANDWIDTH allot 1514b avpkt 1000 weight 10Kbit \
prio 6 maxburst 10 split 1:7FFE defmap ffff

@ -1,446 +0,0 @@
#!/bin/bash
#
# dhclient-script for Linux.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version
# 2 of the License, or (at your option) any later version.
#
# Authors: Alexey Kuznetsov, <kuznet@ms2.inr.ac.ru>
#
# Probably I did not understand what this funny "alias" feature
# means exactly. For now I assume that it is a static address which
# we should install and preserve.
#
exec >> /tmp/DHS.log 2>&1
echo dhc-script $* reason=$reason
set | grep "^\(old_\|new_\|check_\)"
LOG () {
echo LOG $* ;
}
# convert 8bit mask to length
# arg: $1 = mask
#
Mask8ToLen() {
local l=0;
while [ $l -le 7 ]; do
if [ $[ ( 1 << $l ) + $1 ] -eq 256 ]; then
return $[ 8 - $l ]
fi
l=$[ $l + 1 ]
done
return 0;
}
# convert inet dotted quad mask to length
# arg: $1 = dotquad mask
#
MaskToLen() {
local masklen=0
local mask8=$1
case $1 in
0.0.0.0)
return 0;
;;
255.*.0.0)
masklen=8
mask8=${mask8#255.}
mask8=${mask8%.0.0}
;;
255.255.*.0)
masklen=16
mask8=${mask8#255.255.}
mask8=${mask8%.0}
;;
255.255.255.*)
masklen=24
mask8=${mask8#255.255.255.}
;;
*)
return 255
;;
esac
Mask8ToLen $mask8
return $[ $? + $masklen ]
}
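# Sketch (hypothetical helper, nothing in this script calls it): the
# same dotted-quad -> prefix length conversion as MaskToLen, done by
# counting set bits instead of matching octet patterns. Unlike
# MaskToLen it echoes the result instead of returning it as the exit
# status; non-contiguous masks such as 255.0.255.0 print 255, mirroring
# MaskToLen's error value. Assumes bash 64-bit arithmetic and a
# well-formed dotted quad.
MaskToLenSketch() {
	local IFS=. oct mask=0 len=0 n=0 bits
	# Pack the four octets into one 32-bit value
	for oct in $1; do
		mask=$(( (mask << 8) | oct ))
		n=$(( n + 1 ))
	done
	[ $n -eq 4 ] || { echo 255; return 1; }
	# Count the set bits
	bits=$mask
	while [ $bits -gt 0 ]; do
		len=$(( len + (bits & 1) ))
		bits=$(( bits >> 1 ))
	done
	# The set bits must be contiguous from the top
	if [ $mask -ne $(( (0xffffffff << (32 - len)) & 0xffffffff )) ]; then
		echo 255; return 1
	fi
	echo $len
}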
# calculate ABC "natural" mask
# arg: $1 = dotquad address
#
ABCMask () {
local class;
class=${1%%.*}
if [ "$1" = "255.255.255.255" ]; then
echo $1
elif [ "$1" = "0.0.0.0" ]; then
echo $1
elif [ $class -ge 224 ]; then
echo 240.0.0.0
elif [ $class -ge 192 ]; then
echo 255.255.255.0
elif [ $class -ge 128 ]; then
echo 255.255.0.0
else
echo 255.0.0.0
fi
}
# calculate ABC "natural" mask length
# arg: $1 = dotquad address
#
ABCMaskLen () {
local class;
class=${1%%.*}
if [ "$1" = "255.255.255.255" ]; then
return 32
elif [ "$1" = "0.0.0.0" ]; then
return 0
elif [ $class -ge 224 ]; then
return 4;
elif [ $class -ge 192 ]; then
return 24;
elif [ $class -ge 128 ]; then
return 16;
else
return 8;
fi
}
# Delete IP address
# args: $1 = interface
# $2 = address
# $3 = mask
# $4 = broadcast
# $5 = label
#
DelINETAddr () {
local masklen=32
local addrid=$1
LOG DelINETAddr $*
if [ "$5" ]; then
addrid=$addrid:$5
fi
LOG ifconfig $addrid down
ifconfig $addrid down
}
# Add IP address
# args: $1 = interface
# $2 = address
# $3 = mask
# $4 = broadcast
# $5 = label
#
AddINETAddr () {
local mask_arg
local brd_arg
local addrid=$1
LOG AddINETAddr $*
if [ "$5" ]; then
addrid=$addrid:$5
fi
if [ "$3" ]; then
mask_arg="netmask $3"
fi
if [ "$4" ]; then
brd_arg="broadcast $4"
fi
LOG ifconfig $addrid $2 $mask_arg $brd_arg up
ifconfig $addrid $2 $mask_arg $brd_arg up
}
# Add default routes
# args: $1 = routers list
#
AddDefaultRoutes() {
local router
if [ "$1" ]; then
LOG AddDefaultRoutes $*
for router in $1; do
LOG route add default gw $router
route add default gw $router
done ;
fi
}
# Delete default routes
# args: $1 = routers list
#
DelDefaultRoutes() {
local router
if [ "$1" ]; then
LOG DelDefaultRoutes $*
for router in $1; do
LOG route del default gw $router
route del default gw $router
done
fi
}
# ping a host
# args: $1 = dotquad address of the host
#
PingNode() {
LOG PingNode $*
if ping -q -c 1 -w 2 $1 ; then
return 0;
fi
return 1;
}
# Check (and add route, if alive) default routers
# args: $1 = routers list
# returns: 0 if at least one router is alive.
#
CheckRouterList() {
local router
local succeed=1
LOG CheckRouterList $*
for router in $1; do
if PingNode $router ; then
succeed=0
route add default gw $router
fi
done
return $succeed
}
# Delete/create static routes.
# args: $1 = operation (del/add)
# $2 = routes list in format "dst1 nexthop1 dst2 ..."
#
# BEWARE: this DHCP feature is obsolete because it does not
# support subnetting.
#
X-StaticRouteList() {
local op=$1
local lst="$2"
local masklen
LOG X-StaticRouteList $*
if [ "$lst" ]; then
set $lst
while [ $# -gt 1 ]; do
route $op -net $1 netmask `ABCMask "$1"` gw $2
shift; shift;
done
fi
}
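# Sketch (hypothetical helper, nothing in this script calls it): walks
# a DHCP static-route list "dst1 nexthop1 dst2 nexthop2 ..." pairwise,
# the same way X-StaticRouteList does, but only prints the equivalent
# iproute2 command using the classful prefix length (the rules of
# ABCMaskLen, repeated here so the sketch stands alone). Nothing is
# executed; an unpaired trailing entry is ignored.
PrintStaticRouteListSketch() {
	local op=$1 a len
	set -- $2
	while [ $# -gt 1 ]; do
		a=${1%%.*}
		if [ "$1" = "0.0.0.0" ]; then len=0
		elif [ "$1" = "255.255.255.255" ]; then len=32
		elif [ $a -ge 224 ]; then len=4
		elif [ $a -ge 192 ]; then len=24
		elif [ $a -ge 128 ]; then len=16
		else len=8
		fi
		echo "ip route $op $1/$len via $2"
		shift 2
	done
}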
# Create static routes.
# arg: $1 = routes list in format "dst1 nexthop1 dst2 ..."
#
AddStaticRouteList() {
LOG AddStaticRouteList $*
X-StaticRouteList add "$1"
}
# Delete static routes.
# arg: $1 = routes list in format "dst1 nexthop1 dst2 ..."
#
DelStaticRouteList() {
LOG DelStaticRouteList $*
X-StaticRouteList del "$1"
}
# Broadcast unsolicited ARP to update neighbours' caches.
# args: $1 = interface
# $2 = address
#
UnsolicitedARP() {
if [ -f /sbin/arping ]; then
/sbin/arping -A -c 1 -I "$1" "$2" &
(sleep 2 ; /sbin/arping -U -c 1 -I "$1" "$2" ) &
fi
}
# Duplicate address detection.
# args: $1 = interface
# $2 = test address
# returns: 0, if DAD succeeded.
DAD() {
if [ -f /sbin/arping ]; then
/sbin/arping -c 2 -w 3 -D -I "$1" "$2"
return $?
fi
return 0
}
# Setup resolver.
# args: NO
# domain and nameserver list are passed in global variables.
#
# NOTE: we try to be careful not to break a user-supplied resolv.conf.
# The script replaces it only if it is missing or carries the DHCP magic signature.
#
UpdateDNS() {
local nameserver
local idstring="#### Generated by DHCPCD"
LOG UpdateDNS $*
if [ "$new_domain_name" = "" -a "$new_domain_name_servers" = "" ]; then
return 0;
fi
echo $idstring > /etc/resolv.conf.dhcp
if [ "$new_domain_name" ]; then
echo search $new_domain_name >> /etc/resolv.conf.dhcp
fi
echo options ndots:1 >> /etc/resolv.conf.dhcp
if [ "$new_domain_name_servers" ]; then
for nameserver in $new_domain_name_servers; do
echo nameserver $nameserver >> /etc/resolv.conf.dhcp
done
else
echo nameserver 127.0.0.1 >> /etc/resolv.conf.dhcp
fi
if [ -f /etc/resolv.conf ]; then
if [ "`head -1 /etc/resolv.conf`" != "$idstring" ]; then
return 0
fi
if [ "$old_domain_name" = "$new_domain_name" -a \
"$new_domain_name_servers" = "$old_domain_name_servers" ]; then
return 0
fi
fi
mv /etc/resolv.conf.dhcp /etc/resolv.conf
}
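# Sketch (hypothetical helper, nothing in this script calls it): prints
# the resolv.conf body that UpdateDNS builds, given a search domain ($1,
# may be empty) and the nameservers ($2 ...), without touching any file.
# As in UpdateDNS, the first line is the magic signature and 127.0.0.1
# is used when no nameserver was supplied; unlike UpdateDNS it does not
# return early when both domain and servers are empty.
ResolvConfSketch() {
	local domain=$1 ns
	shift
	echo "#### Generated by DHCPCD"
	[ -n "$domain" ] && echo "search $domain"
	echo "options ndots:1"
	if [ $# -gt 0 ]; then
		for ns in "$@"; do
			echo "nameserver $ns"
		done
	else
		echo "nameserver 127.0.0.1"
	fi
}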
case $reason in
NBI)
exit 1
;;
MEDIUM)
exit 0
;;
PREINIT)
ifconfig $interface:dhcp down
ifconfig $interface:dhcp1 down
if [ -d /proc/sys/net/ipv4/conf/$interface ]; then
ifconfig $interface:dhcp 10.10.10.10 netmask 255.255.255.255
ifconfig $interface:dhcp down
if [ -d /proc/sys/net/ipv4/conf/$interface ]; then
LOG The interface $interface already configured.
fi
fi
ifconfig $interface:dhcp up
exit 0
;;
ARPSEND)
exit 0
;;
ARPCHECK)
if DAD "$interface" "$check_ip_address" ; then
exit 0
fi
exit 1
;;
BOUND|RENEW|REBIND|REBOOT)
if [ "$old_ip_address" -a "$alias_ip_address" -a \
"$alias_ip_address" != "$old_ip_address" ]; then
DelINETAddr "$interface" "$alias_ip_address" "$alias_subnet_mask" "$alias_broadcast_address" dhcp1
fi
if [ "$old_ip_address" -a "$old_ip_address" != "$new_ip_address" ]; then
DelINETAddr "$interface" "$old_ip_address" "$old_subnet_mask" "$old_broadcast_address" dhcp
DelDefaultRoutes "$old_routers"
DelStaticRouteList "$old_static_routes"
fi
if [ "$old_ip_address" = "" -o "$old_ip_address" != "$new_ip_address" -o \
"$reason" = "BOUND" -o "$reason" = "REBOOT" ]; then
AddINETAddr "$interface" "$new_ip_address" "$new_subnet_mask" "$new_broadcast_address" dhcp
AddStaticRouteList "$new_static_routes"
AddDefaultRoutes "$new_routers"
UnsolicitedARP "$interface" "$new_ip_address"
fi
if [ "$new_ip_address" != "$alias_ip_address" -a "$alias_ip_address" ]; then
AddINETAddr "$interface" "$alias_ip_address" "$alias_subnet_mask" "$alias_broadcast_address" dhcp1
fi
UpdateDNS
exit 0
;;
EXPIRE|FAIL)
if [ "$alias_ip_address" ]; then
DelINETAddr "$interface" "$alias_ip_address" "$alias_subnet_mask" "$alias_broadcast_address" dhcp1
fi
if [ "$old_ip_address" ]; then
DelINETAddr "$interface" "$old_ip_address" "$old_subnet_mask" "$old_broadcast_address" dhcp
DelDefaultRoutes "$old_routers"
DelStaticRouteList "$old_static_routes"
fi
if [ "$alias_ip_address" ]; then
AddINETAddr "$interface" "$alias_ip_address" "$alias_subnet_mask" "$alias_broadcast_address" dhcp1
fi
exit 0
;;
TIMEOUT)
if [ "$alias_ip_address" ]; then
DelINETAddr "$interface" "$alias_ip_address" "$alias_subnet_mask" "$alias_broadcast_address" dhcp1
fi
# It seems that <null address> means that no more old leases were found.
# Or does it indicate a bug in dhcpcd? 8) Fail for now.
if [ "$new_ip_address" = "<null address>" ]; then
if [ "$old_ip_address" ]; then
DelINETAddr "$interface" "$old_ip_address" "$old_subnet_mask" "$old_broadcast_address" dhcp
fi
if [ "$alias_ip_address" ]; then
AddINETAddr "$interface" "$alias_ip_address" "$alias_subnet_mask" "$alias_broadcast_address" dhcp1
fi
exit 1
fi
if DAD "$interface" "$new_ip_address" ; then
AddINETAddr "$interface" "$new_ip_address" "$new_subnet_mask" "$new_broadcast_address" dhcp
UnsolicitedARP "$interface" "$new_ip_address"
if [ "$alias_ip_address" -a "$alias_ip_address" != "$new_ip_address" ]; then
AddINETAddr "$interface" "$alias_ip_address" "$alias_subnet_mask" "$alias_broadcast_address" dhcp1
UnsolicitedARP "$interface" "$alias_ip_address"
fi
if CheckRouterList "$new_routers" ; then
AddStaticRouteList "$new_static_routes"
UpdateDNS
exit 0
fi
fi
DelINETAddr "$interface" "$new_ip_address" "$new_subnet_mask" "$new_broadcast_address" dhcp
DelDefaultRoutes "$old_routers"
DelStaticRouteList "$old_static_routes"
if [ "$alias_ip_address" ]; then
AddINETAddr "$interface" "$alias_ip_address" "$alias_subnet_mask" "$alias_broadcast_address" dhcp1
fi
exit 1
;;
esac
exit 0

@ -1,68 +0,0 @@
#! /bin/sh -x
#
# sample script on using the ingress capabilities
# This script just tags packets on the ingress interface using ipchains;
# the result is used for fast classification and re-marking
# on the egress interface
#
#path to various utilities;
#change to reflect yours.
#
IPROUTE=/root/DS-6-beta/iproute2-990530-dsing
TC=$IPROUTE/tc/tc
IP=$IPROUTE/ip/ip
IPCHAINS=/root/DS-6-beta/ipchains-1.3.9/ipchains
INDEV=eth2
EGDEV="dev eth1"
#
# tag all incoming packets from host 10.2.0.24 to value 1
# tag all incoming packets from host 10.2.0.3 to value 2
# tag the rest of incoming packets from subnet 10.2.0.0/24 to value 3
# These values are used on the egress side
#
############################################################
$IPCHAINS -A input -s 10.2.0.0/24 -m 3
$IPCHAINS -A input -i $INDEV -s 10.2.0.24 -m 1
$IPCHAINS -A input -i $INDEV -s 10.2.0.3 -m 2
######################## Egress side ########################
# attach a dsmarker
#
$TC qdisc add $EGDEV handle 1:0 root dsmark indices 64 set_tc_index
#
# values of the DSCP to change depending on the class
#
#becomes EF
$TC class change $EGDEV classid 1:1 dsmark mask 0x3 \
value 0xb8
#becomes AF11
$TC class change $EGDEV classid 1:2 dsmark mask 0x3 \
value 0x28
#becomes AF21
$TC class change $EGDEV classid 1:3 dsmark mask 0x3 \
value 0x48
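#
# Sketch (hypothetical helper, not used by this script): dsmark
# rewrites the DS byte as (old & mask) | value. With mask 0x3 the two
# low (ECN) bits are kept and the six DSCP bits are replaced by value,
# so class 1:1 yields 0xb8 (EF) plus whatever ECN bits were set.
dsmark_rewrite_sketch () {
	# $1 = old DS byte, $2 = mask, $3 = value
	printf '0x%02x\n' $(( ($1 & $2) | $3 ))
}
# e.g. dsmark_rewrite_sketch 0x2b 0x3 0xb8 prints 0xbb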
#
#
# The class mapping
#
$TC filter add $EGDEV parent 1:0 protocol ip prio 4 handle 1 fw classid 1:1
$TC filter add $EGDEV parent 1:0 protocol ip prio 4 handle 2 fw classid 1:2
$TC filter add $EGDEV parent 1:0 protocol ip prio 4 handle 3 fw classid 1:3
#
#
echo "---- qdisc parameters Ingress ----------"
$TC qdisc ls dev $INDEV
echo "---- Class parameters Ingress ----------"
$TC class ls dev $INDEV
echo "---- filter parameters Ingress ----------"
$TC filter ls dev $INDEV parent 1:0
echo "---- qdisc parameters Egress ----------"
$TC qdisc ls $EGDEV
echo "---- Class parameters Egress ----------"
$TC class ls $EGDEV
echo "---- filter parameters Egress ----------"
$TC filter ls $EGDEV parent 1:0

@ -1,87 +0,0 @@
#! /bin/sh -x
#
# sample script on using the ingress capabilities
# This script sets fwmark tags on the ingress interface using ipchains;
# the result is used first for policing on the ingress interface, then
# for fast classification and re-marking
# on the egress interface
#
#path to various utilities;
#change to reflect yours.
#
IPROUTE=/root/DS-6-beta/iproute2-990530-dsing
TC=$IPROUTE/tc/tc
IP=$IPROUTE/ip/ip
IPCHAINS=/root/DS-6-beta/ipchains-1.3.9/ipchains
INDEV=eth2
EGDEV="dev eth1"
#
# tag all incoming packets from host 10.2.0.24 to value 1
# tag all incoming packets from host 10.2.0.3 to value 2
# tag the rest of incoming packets from subnet 10.2.0.0/24 to value 3
# These values are used on the egress side
############################################################
$IPCHAINS -A input -s 10.2.0.0/24 -m 3
$IPCHAINS -A input -i $INDEV -s 10.2.0.24 -m 1
$IPCHAINS -A input -i $INDEV -s 10.2.0.3 -m 2
############################################################
#
# install the ingress qdisc on the ingress interface
############################################################
$TC qdisc add dev $INDEV handle ffff: ingress
############################################################
#
# attach a fw classifier to the ingress which polices anything marked
# by ipchains with tag value 3 (the rest of the subnet packets, i.e. not
# tag 1 or 2) so that it does not go beyond 1.5Mbps.
# Allow bursts of up to about 60 packets (assuming a maximum packet
# size of 1.5 KB) in the long run and up to about 6 packets in the
# short run
############################################################
$TC filter add dev $INDEV parent ffff: protocol ip prio 50 handle 3 fw \
police rate 1500kbit burst 90k mtu 9k drop flowid :1
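#
# Sketch (hypothetical helper, not used by this script): the packet
# counts in the comment above, assuming tc reads the "k" size suffix
# as 1024 bytes and full-size packets of 1500 bytes.
police_pkts_sketch () {
	# $1 = size in kbytes, $2 = packet size in bytes
	echo $(( $1 * 1024 / $2 ))
}
# police_pkts_sketch 90 1500 prints 61 (burst 90k)
# police_pkts_sketch 9 1500 prints 6 (mtu 9k)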
############################################################
######################## Egress side ########################
# attach a dsmarker
#
$TC qdisc add $EGDEV handle 1:0 root dsmark indices 64
#
# values of the DSCP to change depending on the class
#
$TC class change $EGDEV classid 1:1 dsmark mask 0x3 \
value 0xb8
$TC class change $EGDEV classid 1:2 dsmark mask 0x3 \
value 0x28
$TC class change $EGDEV classid 1:3 dsmark mask 0x3 \
value 0x48
#
#
# The class mapping
#
$TC filter add $EGDEV parent 1:0 protocol ip prio 4 handle 1 fw classid 1:1
$TC filter add $EGDEV parent 1:0 protocol ip prio 4 handle 2 fw classid 1:2
$TC filter add $EGDEV parent 1:0 protocol ip prio 4 handle 3 fw classid 1:3
#
#
echo "---- qdisc parameters Ingress ----------"
$TC qdisc ls dev $INDEV
echo "---- Class parameters Ingress ----------"
$TC class ls dev $INDEV
echo "---- filter parameters Ingress ----------"
$TC filter ls dev $INDEV parent ffff:
echo "---- qdisc parameters Egress ----------"
$TC qdisc ls $EGDEV
echo "---- Class parameters Egress ----------"
$TC class ls $EGDEV
echo "---- filter parameters Egress ----------"
$TC filter ls $EGDEV parent 1:0
#
# deleting the ingress qdisc
#$TC qdisc del dev $INDEV ingress

@ -1,170 +0,0 @@
#! /bin/sh -x
#
# sample script on using the ingress capabilities using u32 classifier
# This script sets tcindex values based on metering on the ingress
# interface; the result is used for fast classification and re-marking
# on the egress interface
# This is an example of a color aware mode marker with PIR configured
# based on draft-wahjak-mcm-00.txt (section 3.1)
#
# The colors are defined using the Diffserv Fields
#path to various utilities;
#change to reflect yours.
#
IPROUTE=/usr/src/iproute2-current
TC=$IPROUTE/tc/tc
IP=$IPROUTE/ip/ip
INDEV=eth0
EGDEV="dev eth1"
CIR1=1500kbit
CIR2=1000kbit
# The CBS is about 60 MTU-sized packets
CBS1=90k
CBS2=90k
############################################################
#
# install the ingress qdisc on the ingress interface
$TC qdisc add dev $INDEV handle ffff: ingress
############################################################
#
# Create u32 filters
$TC filter add dev $INDEV parent ffff: protocol ip prio 4 handle 1: u32 \
divisor 1
############################################################
# The meters: Note that we have shared meters in this case as identified
# by the index parameter
meter1=" police index 1 rate $CIR1 burst $CBS1 "
meter2=" police index 2 rate $CIR2 burst $CBS1 "
meter3=" police index 3 rate $CIR2 burst $CBS2 "
meter4=" police index 4 rate $CIR1 burst $CBS2 "
meter5=" police index 5 rate $CIR1 burst $CBS2 "
# All packets are marked with a tcindex value which is used on the egress
# tcindex 1 maps to AF41, 2->AF42, 3->AF43, 4->BE
# *********************** AF41 ***************************
# AF41 (DSCP 0x22) is passed on with a tcindex value of 1
# if it doesn't exceed its CIR/CBS;
# policer 1 is used.
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 4 u32 \
match ip tos 0x88 0xfc \
$meter1 \
continue flowid :1
#
# if it exceeds the above but not the extra rate/burst below, it gets a
# tcindex value of 2
# policer 2 is used
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 5 u32 \
match ip tos 0x88 0xfc \
$meter2 \
continue flowid :2
#
# if it exceeds the above but not the rule below, it gets a tcindex value
# of 3 (policer 3)
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 6 u32 \
match ip tos 0x88 0xfc \
$meter3 \
drop flowid :3
#
# *********************** AF42 ***************************
# AF42 (DSCP 0x24) is passed on with a tcindex value of 2
# if it doesn't exceed its CIR/CBS;
# policer 2 is used. Note that this policer is shared with AF41.
#
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 5 u32 \
match ip tos 0x90 0xfc \
$meter2 \
continue flowid :2
#
# if it exceeds the above but not the rule below, it gets a tcindex value
# of 3 (policer 3)
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 6 u32 \
match ip tos 0x90 0xfc \
$meter3 \
drop flowid :3
#
# *********************** AF43 ***************************
#
# AF43 (DSCP 0x26) is passed on with a tcindex value of 3
# if it doesn't exceed its CIR/CBS;
# policer 3 is used. Note that this policer is shared with AF41 and AF42.
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 6 u32 \
match ip tos 0x98 0xfc \
$meter3 \
drop flowid :3
#
# *********************** BE ***************************
#
# Anything else (not AF4*) is discarded if it exceeds
# $CIR1 (meter4) and by default goes to BE if it doesn't.
# Note that the BE class is also used by AF4* traffic in the worst
# case
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 7 u32 \
match ip src 0/0 \
$meter4 \
drop flowid :4
######################## Egress side ########################
# attach a dsmarker
#
$TC qdisc add $EGDEV handle 1:0 root dsmark indices 64
#
# values of the DSCP to change depending on the class
# note that mask 0x3 preserves the two ECN bits
#
# AF41 (0x88 is 0x22 shifted left by two bits)
#
$TC class change $EGDEV classid 1:1 dsmark mask 0x3 \
value 0x88
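#
# Sketch (hypothetical helper, not used by this script): the DSCP sits
# in the upper six bits of the TOS/DS byte, so the byte value is the
# DSCP shifted left by two bits; the u32 "match ip tos 0x88 0xfc" rules
# above compare only those six bits.
dscp_to_tos_sketch () {
	printf '0x%02x\n' $(( $1 << 2 ))
}
# e.g. 0x22 (AF41) -> 0x88, 0x24 (AF42) -> 0x90, 0x26 (AF43) -> 0x98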
#AF42
$TC class change $EGDEV classid 1:2 dsmark mask 0x3 \
value 0x90
#AF43
$TC class change $EGDEV classid 1:3 dsmark mask 0x3 \
value 0x98
#BE
$TC class change $EGDEV classid 1:4 dsmark mask 0x3 \
value 0x0
#
#
# The class mapping
#
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 1 tcindex classid 1:1
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 2 tcindex classid 1:2
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 3 tcindex classid 1:3
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 4 tcindex classid 1:4
#
#
echo "---- qdisc parameters Ingress ----------"
$TC qdisc ls dev $INDEV
echo "---- Class parameters Ingress ----------"
$TC class ls dev $INDEV
echo "---- filter parameters Ingress ----------"
$TC filter ls dev $INDEV parent ffff:
echo "---- qdisc parameters Egress ----------"
$TC qdisc ls $EGDEV
echo "---- Class parameters Egress ----------"
$TC class ls $EGDEV
echo "---- filter parameters Egress ----------"
$TC filter ls $EGDEV parent 1:0
#
# deleting the ingress qdisc
#$TC qdisc del dev $INDEV ingress

@ -1,132 +0,0 @@
#! /bin/sh -x
#
# sample script on using the ingress capabilities
# This script combines ipchains fwmark tags with metering on the ingress
# interface; the result is used for fast classification and re-marking
# on the egress interface
# This is an example of a color blind mode marker with no PIR configured
# based on draft-wahjak-mcm-00.txt (section 3.1)
#
#path to various utilities;
#change to reflect yours.
#
IPROUTE=/root/DS-6-beta/iproute2-990530-dsing
TC=$IPROUTE/tc/tc
IP=$IPROUTE/ip/ip
IPCHAINS=/root/DS-6-beta/ipchains-1.3.9/ipchains
INDEV=eth2
EGDEV="dev eth1"
CIR1=1500kbit
CIR2=1000kbit
# The CBS is about 60 MTU-sized packets
CBS1=90k
CBS2=90k
meter1="police rate $CIR1 burst $CBS1 "
meter2="police rate $CIR1 burst $CBS2 "
meter3="police rate $CIR2 burst $CBS1 "
meter4="police rate $CIR2 burst $CBS2 "
meter5="police rate $CIR2 burst $CBS2 "
#
# tag all incoming packets from subnet 10.2.0.0/24 with fw value 1
# tag all incoming packets from any other subnet with fw value 2
############################################################
$IPCHAINS -A input -i $INDEV -s 0/0 -m 2
$IPCHAINS -A input -i $INDEV -s 10.2.0.0/24 -m 1
#
############################################################
# install the ingress qdisc on the ingress interface
$TC qdisc add dev $INDEV handle ffff: ingress
#
############################################################
# All packets are marked with a tcindex value which is used on the egress
# tcindex 1 maps to AF41, 2->AF42, 3->AF43, 4->BE
#
############################################################
#
# anything with fw tag 1 is passed on with a tcindex value of 1
# if it doesn't exceed its allocated rate (CIR/CBS)
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 4 handle 1 fw \
$meter1 \
continue flowid 4:1
#
# if it exceeds the above but not the extra rate/burst below, it gets a
#tcindex value of 2
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 5 handle 1 fw \
$meter2 \
continue flowid 4:2
#
# if it exceeds the above but not the rule below, it gets a tcindex value
# of 3
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 6 handle 1 fw \
$meter3 \
drop flowid 4:3
#
# Anything else (not from the subnet 10.2.0.0/24) is discarded if it
# exceeds 1Mbps and by default goes to BE if it doesn't
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 6 handle 2 fw \
$meter5 \
drop flowid 4:4
######################## Egress side ########################
# attach a dsmarker
#
$TC qdisc add $EGDEV handle 1:0 root dsmark indices 64
#
# values of the DSCP to change depending on the class
#note that mask 0x3 preserves the two ECN bits
#
#AF41 (0x88 is 0x22 shifted to the left by two bits)
#
$TC class change $EGDEV classid 1:1 dsmark mask 0x3 \
value 0x88
#AF42
$TC class change $EGDEV classid 1:2 dsmark mask 0x3 \
value 0x90
#AF43
$TC class change $EGDEV classid 1:3 dsmark mask 0x3 \
value 0x98
#BE
$TC class change $EGDEV classid 1:4 dsmark mask 0x3 \
value 0x0
#
#
# The class mapping (using tcindex; could easily have
# replaced it with the fw classifier instead)
#
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 1 tcindex classid 1:1
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 2 tcindex classid 1:2
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 3 tcindex classid 1:3
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 4 tcindex classid 1:4
#
#
echo "---- qdisc parameters Ingress ----------"
$TC qdisc ls dev $INDEV
echo "---- Class parameters Ingress ----------"
$TC class ls dev $INDEV
echo "---- filter parameters Ingress ----------"
$TC filter ls dev $INDEV parent ffff:
echo "---- qdisc parameters Egress ----------"
$TC qdisc ls $EGDEV
echo "---- Class parameters Egress ----------"
$TC class ls $EGDEV
echo "---- filter parameters Egress ----------"
$TC filter ls $EGDEV parent 1:0
#
#deleting the ingress qdisc
#$TC qdisc del dev $INDEV ingress

@ -1,198 +0,0 @@
#! /bin/sh -x
#
# sample script on using the ingress capabilities using u32 classifier
# This script tags tcindex based on metering on the ingress
# interface the result is used for fast classification and re-marking
# on the egress interface
# This is an example of a color aware mode marker with PIR configured
# based on draft-wahjak-mcm-00.txt (section 3.2)
#
# The colors are defined using the Diffserv Fields
#path to various utilities;
#change to reflect yours.
#
IPROUTE=/root/DS-6-beta/iproute2-990530-dsing
TC=$IPROUTE/tc/tc
IP=$IPROUTE/ip/ip
IPCHAINS=/root/DS-6-beta/ipchains-1.3.9/ipchains
INDEV=eth2
EGDEV="dev eth1"
CIR1=1000kbit
CIR2=500kbit
# the PIR is what is in excess of the CIR
PIR1=1000kbit
PIR2=500kbit
#The CBS is about 60 MTU sized packets
CBS1=90k
CBS2=90k
#the EBS is about 20 max sized packets
EBS1=30k
EBS2=30k
# The meters: Note that we have shared meters in this case as identified
# by the index parameter
meter1=" police index 1 rate $CIR1 burst $CBS1 "
meter1a=" police index 2 rate $PIR1 burst $EBS1 "
meter2=" police index 3 rate $CIR2 burst $CBS1 "
meter2a=" police index 4 rate $PIR2 burst $EBS1 "
meter3=" police index 5 rate $CIR2 burst $CBS2 "
meter3a=" police index 6 rate $PIR2 burst $EBS2 "
meter4=" police index 7 rate $CIR1 burst $CBS2 "
############################################################
#
# install the ingress qdisc on the ingress interface
$TC qdisc add dev $INDEV handle ffff: ingress
############################################################
#
# All packets are marked with a tcindex value which is used on the egress
# tcindex 1 maps to AF41, 2->AF42, 3->AF43, 4->BE
#
# *********************** AF41 ***************************
#AF41 (DSCP 0x22) traffic is passed on with a tcindex value 1
#if it doesn't exceed its CIR/CBS + PIR/EBS
#policer 1 is used.
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 1 u32 \
match ip tos 0x88 0xfc \
$meter1 \
continue flowid :1
$TC filter add dev $INDEV parent ffff: protocol ip prio 2 u32 \
match ip tos 0x88 0xfc \
$meter1a \
continue flowid :1
#
# if it exceeds the above but not the extra rate/burst below, it gets a
# tcindex value of 2
# policer 2 is used
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 3 u32 \
match ip tos 0x88 0xfc \
$meter2 \
continue flowid :2
$TC filter add dev $INDEV parent ffff: protocol ip prio 4 u32 \
match ip tos 0x88 0xfc \
$meter2a \
continue flowid :2
#
# if it exceeds the above but not the rule below, it gets a tcindex value
# of 3 (policer 3)
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 5 u32 \
match ip tos 0x88 0xfc \
$meter3 \
continue flowid :3
$TC filter add dev $INDEV parent ffff: protocol ip prio 6 u32 \
match ip tos 0x88 0xfc \
$meter3a \
drop flowid :3
#
# *********************** AF42 ***************************
#AF42 (DSCP 0x24) traffic is passed on with a tcindex value 2
#if it doesn't exceed its CIR/CBS + PIR/EBS
#policer 2 is used. Note that this is shared with the AF41
#
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 8 u32 \
match ip tos 0x90 0xfc \
$meter2 \
continue flowid :2
$TC filter add dev $INDEV parent ffff: protocol ip prio 9 u32 \
match ip tos 0x90 0xfc \
$meter2a \
continue flowid :2
#
# if it exceeds the above but not the rule below, it gets a tcindex value
# of 3 (policer 3)
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 10 u32 \
match ip tos 0x90 0xfc \
$meter3 \
continue flowid :3
$TC filter add dev $INDEV parent ffff: protocol ip prio 11 u32 \
match ip tos 0x90 0xfc \
$meter3a \
drop flowid :3
#
# *********************** AF43 ***************************
#
#AF43 (DSCP 0x26) traffic is passed on with a tcindex value 3
#if it doesn't exceed its CIR/CBS + PIR/EBS
#policer 3 is used. Note that this is shared with the AF41 and AF42
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 13 u32 \
match ip tos 0x98 0xfc \
$meter3 \
continue flowid :3
$TC filter add dev $INDEV parent ffff: protocol ip prio 14 u32 \
match ip tos 0x98 0xfc \
$meter3a \
drop flowid :3
#
## *********************** BE ***************************
##
## Anything else (not from the AF4*) gets discarded if it
## exceeds 1Mbps and by default goes to BE if it doesn't
## Note that the BE class is also used by the AF4* in the worst
## case
##
$TC filter add dev $INDEV parent ffff: protocol ip prio 16 u32 \
match ip src 0/0 \
$meter4 \
drop flowid :4
######################## Egress side ########################
# attach a dsmarker
#
$TC qdisc add $EGDEV handle 1:0 root dsmark indices 64
#
# values of the DSCP to change depending on the class
#note that mask 0x3 preserves the two ECN bits
#
#AF41 (0x88 is 0x22 shifted to the left by two bits)
#
$TC class change $EGDEV classid 1:1 dsmark mask 0x3 \
value 0x88
#AF42
$TC class change $EGDEV classid 1:2 dsmark mask 0x3 \
value 0x90
#AF43
$TC class change $EGDEV classid 1:3 dsmark mask 0x3 \
value 0x98
#BE
$TC class change $EGDEV classid 1:4 dsmark mask 0x3 \
value 0x0
#
#
# The class mapping
#
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 1 tcindex classid 1:1
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 2 tcindex classid 1:2
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 3 tcindex classid 1:3
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 4 tcindex classid 1:4
#
#
echo "---- qdisc parameters Ingress ----------"
$TC qdisc ls dev $INDEV
echo "---- Class parameters Ingress ----------"
$TC class ls dev $INDEV
echo "---- filter parameters Ingress ----------"
$TC filter ls dev $INDEV parent ffff:
echo "---- qdisc parameters Egress ----------"
$TC qdisc ls $EGDEV
echo "---- Class parameters Egress ----------"
$TC class ls $EGDEV
echo "---- filter parameters Egress ----------"
$TC filter ls $EGDEV parent 1:0
#
#deleting the ingress qdisc
#$TC qdisc del dev $INDEV ingress

@ -1,144 +0,0 @@
#! /bin/sh -x
#
# sample script on using the ingress capabilities
# This script fwmark tags(IPchains) based on metering on the ingress
# interface the result is used for fast classification and re-marking
# on the egress interface
# This is an example of a color blind mode marker with no PIR configured
# based on draft-wahjak-mcm-00.txt (section 3.1)
#
#path to various utilities;
#change to reflect yours.
#
IPROUTE=/root/DS-6-beta/iproute2-990530-dsing
TC=$IPROUTE/tc/tc
IP=$IPROUTE/ip/ip
IPCHAINS=/root/DS-6-beta/ipchains-1.3.9/ipchains
INDEV=eth2
EGDEV="dev eth1"
CIR1=1500kbit
CIR2=500kbit
#The CBS is about 60 MTU sized packets
CBS1=90k
CBS2=90k
meter1="police rate $CIR1 burst $CBS1 "
meter1a="police rate $CIR2 burst $CBS1 "
meter2="police rate $CIR1 burst $CBS2 "
meter2a="police rate $CIR2 burst $CBS2 "
meter3="police rate $CIR2 burst $CBS1 "
meter3a="police rate $CIR2 burst $CBS1 "
meter4="police rate $CIR2 burst $CBS2 "
meter5="police rate $CIR1 burst $CBS2 "
#
# tag the rest of incoming packets from subnet 10.2.0.0/24 to fw value 1
# tag all incoming packets from any other subnet to fw tag 2
############################################################
$IPCHAINS -A input -i $INDEV -s 0/0 -m 2
$IPCHAINS -A input -i $INDEV -s 10.2.0.0/24 -m 1
#
############################################################
# install the ingress qdisc on the ingress interface
$TC qdisc add dev $INDEV handle ffff: ingress
#
############################################################
# All packets are marked with a tcindex value which is used on the egress
# tcindex 1 maps to AF41, 2->AF42, 3->AF43, 4->BE
#
############################################################
#
# anything with fw tag of 1 is passed on with a tcindex value 1
#if it doesn't exceed its allocated rate (CIR/CBS)
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 1 handle 1 fw \
$meter1 \
continue flowid 4:1
$TC filter add dev $INDEV parent ffff: protocol ip prio 2 handle 1 fw \
$meter1a \
continue flowid 4:1
#
# if it exceeds the above but not the extra rate/burst below, it gets a
#tcindex value of 2
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 3 handle 1 fw \
$meter2 \
continue flowid 4:2
$TC filter add dev $INDEV parent ffff: protocol ip prio 4 handle 1 fw \
$meter2a \
continue flowid 4:2
#
# if it exceeds the above but not the rule below, it gets a tcindex value
# of 3
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 5 handle 1 fw \
$meter3 \
continue flowid 4:3
$TC filter add dev $INDEV parent ffff: protocol ip prio 6 handle 1 fw \
$meter3a \
drop flowid 4:3
#
# Anything else (not from the subnet 10.2.0.0/24) gets discarded if it
# exceeds 1.5Mbps (CIR1) and by default goes to BE if it doesn't
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 7 handle 2 fw \
$meter5 \
drop flowid 4:4
######################## Egress side ########################
# attach a dsmarker
#
$TC qdisc add $EGDEV handle 1:0 root dsmark indices 64
#
# values of the DSCP to change depending on the class
#note that mask 0x3 preserves the two ECN bits
#
#AF41 (0x88 is 0x22 shifted to the left by two bits)
#
$TC class change $EGDEV classid 1:1 dsmark mask 0x3 \
value 0x88
#AF42
$TC class change $EGDEV classid 1:2 dsmark mask 0x3 \
value 0x90
#AF43
$TC class change $EGDEV classid 1:3 dsmark mask 0x3 \
value 0x98
#BE
$TC class change $EGDEV classid 1:4 dsmark mask 0x3 \
value 0x0
#
#
# The class mapping (using tcindex; could easily have
# replaced it with the fw classifier instead)
#
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 1 tcindex classid 1:1
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 2 tcindex classid 1:2
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 3 tcindex classid 1:3
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 4 tcindex classid 1:4
#
#
echo "---- qdisc parameters Ingress ----------"
$TC qdisc ls dev $INDEV
echo "---- Class parameters Ingress ----------"
$TC class ls dev $INDEV
echo "---- filter parameters Ingress ----------"
$TC filter ls dev $INDEV parent ffff:
echo "---- qdisc parameters Egress ----------"
$TC qdisc ls $EGDEV
echo "---- Class parameters Egress ----------"
$TC class ls $EGDEV
echo "---- filter parameters Egress ----------"
$TC filter ls $EGDEV parent 1:0
#
#deleting the ingress qdisc
#$TC qdisc del dev $INDEV ingress

@ -1,145 +0,0 @@
#! /bin/sh
#
# sample script on using the ingress capabilities using u32 classifier
# This script tags tcindex based on metering on the ingress
# interface the result is used for fast classification and re-marking
# on the egress interface
# This is an example of a color blind mode marker with PIR configured
# based on draft-wahjak-mcm-00.txt (section 3.2)
#
#path to various utilities;
#change to reflect yours.
#
IPROUTE=/root/DS-6-beta/iproute2-990530-dsing
TC=$IPROUTE/tc/tc
IP=$IPROUTE/ip/ip
INDEV=eth2
EGDEV="dev eth1"
CIR1=1000kbit
CIR2=1000kbit
# The PIR is the excess rate on top of the CIR (i.e. if traffic always
# reaches the PIR, the average rate is CIR+PIR)
PIR1=1000kbit
PIR2=500kbit
#The CBS is about 60 MTU sized packets
CBS1=90k
CBS2=90k
#the EBS is about 10 max sized packets
EBS1=15k
EBS2=15k
# The meters
meter1=" police rate $CIR1 burst $CBS1 "
meter1a=" police rate $PIR1 burst $EBS1 "
meter2=" police rate $CIR2 burst $CBS1 "
meter2a="police rate $PIR2 burst $CBS1 "
meter3=" police rate $CIR2 burst $CBS2 "
meter3a=" police rate $PIR2 burst $EBS2 "
meter4=" police rate $CIR1 burst $CBS2 "
meter5=" police rate $CIR1 burst $CBS2 "
# install the ingress qdisc on the ingress interface
############################################################
$TC qdisc add dev $INDEV handle ffff: ingress
############################################################
#
############################################################
# All packets are marked with a tcindex value which is used on the egress
# NOTE: tcindex 1 maps to AF41, 2->AF42, 3->AF43, 4->BE
#
#anything from subnet 10.2.0.0/24 is passed on with a tcindex value 1
#if it doesn't exceed its CIR/CBS + PIR/EBS
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 1 u32 \
match ip src 10.2.0.0/24 $meter1 \
continue flowid :1
$TC filter add dev $INDEV parent ffff: protocol ip prio 2 u32 \
match ip src 10.2.0.0/24 $meter1a \
continue flowid :1
#
# if it exceeds the above but not the extra rate/burst below, it gets a
#tcindex value of 2
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 3 u32 \
match ip src 10.2.0.0/24 $meter2 \
continue flowid :2
$TC filter add dev $INDEV parent ffff: protocol ip prio 4 u32 \
match ip src 10.2.0.0/24 $meter2a \
continue flowid :2
#
# if it exceeds the above but not the rule below, it gets a tcindex value
# of 3
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 5 u32 \
match ip src 10.2.0.0/24 $meter3 \
continue flowid :3
$TC filter add dev $INDEV parent ffff: protocol ip prio 6 u32 \
match ip src 10.2.0.0/24 $meter3a \
drop flowid :3
#
#
# Anything else (not from the subnet 10.2.0.0/24) gets discarded if it
# exceeds 1Mbps and by default goes to BE if it doesn't
#
$TC filter add dev $INDEV parent ffff: protocol ip prio 7 u32 \
match ip src 0/0 $meter5 \
drop flowid :4
######################## Egress side ########################
# attach a dsmarker
#
$TC qdisc add $EGDEV handle 1:0 root dsmark indices 64
#
# values of the DSCP to change depending on the class
#note that mask 0x3 preserves the two ECN bits
#
#AF41 (0x88 is 0x22 shifted to the left by two bits)
#
$TC class change $EGDEV classid 1:1 dsmark mask 0x3 \
value 0x88
#AF42
$TC class change $EGDEV classid 1:2 dsmark mask 0x3 \
value 0x90
#AF43
$TC class change $EGDEV classid 1:3 dsmark mask 0x3 \
value 0x98
#BE
$TC class change $EGDEV classid 1:4 dsmark mask 0x3 \
value 0x0
#
#
# The class mapping
#
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 1 tcindex classid 1:1
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 2 tcindex classid 1:2
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 3 tcindex classid 1:3
$TC filter add $EGDEV parent 1:0 protocol ip prio 1 \
handle 4 tcindex classid 1:4
#
#
echo "---- qdisc parameters Ingress ----------"
$TC qdisc ls dev $INDEV
echo "---- Class parameters Ingress ----------"
$TC class ls dev $INDEV
echo "---- filter parameters Ingress ----------"
$TC filter ls dev $INDEV parent ffff:
echo "---- qdisc parameters Egress ----------"
$TC qdisc ls $EGDEV
echo "---- Class parameters Egress ----------"
$TC class ls $EGDEV
echo "---- filter parameters Egress ----------"
$TC filter ls $EGDEV parent 1:0
#
#deleting the ingress qdisc
#$TC qdisc del dev $INDEV ingress

@ -1,98 +0,0 @@
Note: all of these are merely examples, which can be customized to your needs.
AFCBQ
-----
AF PHB built using CBQ, DSMARK, GRED (default in GRIO mode), RED for BE
and the tcindex classifier with some algorithmic mapping.
EFCBQ
-----
EF PHB built using CBQ (for rate control and prioritization),
DSMARK (to remark DSCPs), the tcindex classifier and RED for the BE
traffic.
EFPRIO
------
EF PHB using the PRIO scheduler, Token Bucket to rate control EF,
tcindex classifier, DSMARK to remark, and RED for the BE traffic
EDGE scripts
==============
CB-3(1|2)-(u32/chains)
======================
The main difference between the variants is the classifier: u32 in the
-u32 scripts and IPchains in the -chains scripts. CB stands for Color
Blind; 31 denotes the mode where only a CIR and CBS are defined, whereas
32 denotes the mode where both CIR/CBS and PIR/EBS are defined.
Color Blind (CB)
================
For simplicity, we look only at one subnet of interest to demonstrate
the capability. Packets from that subnet are sent to AF4* or BE, or
end up dropped, depending on the metering results.
The algorithm overview is as follows:
*classify:
**case: subnet X
----------------
if !exceed meter1 tag as AF41
else
if !exceed meter2 tag as AF42
else
if !exceed meter 3 tag as AF43
else
drop
default case: Any other subnet
-------------------------------
if !exceed meter5 tag as BE
else
drop
On the egress side, change the DSCPs of the packets to reflect AF4* and
BE, based on the tags set on the ingress side.
-------------------------------------------------------------
Color Aware
===========
Define some policing meters and give them IDs, e.g.
meter1=police index 1 rate $CIR1 burst $CBS1
meter2=police index 2 rate $CIR2 burst $CBS2 etc
General overview:
classify based on the DSCPs and use the policer ids to decide tagging
*classify on ingress:
switch (dscp) {
case AF41: /* tos&0xfc == 0x88 */
if (!exceed meter1) {
tag as AF41;
break;
}
case AF42: /* tos&0xfc == 0x90 */
if (!exceed meter2) {
tag as AF42;
break;
}
case AF43: /* tos&0xfc == 0x98 */
if (!exceed meter3) {
tag as AF43;
break;
} else
drop;
default:
if (!exceed meter4) tag as BE;
else drop;
}
On the Egress side mark the proper AF tags

@ -1,105 +0,0 @@
#!/usr/bin/perl
#
#
# AF using CBQ for a single interface eth0
# 4 AF classes using GRED and one BE using RED
# Things you might want to change:
# - the device bandwidth (set at 10Mbits)
# - the bandwidth allocated for each AF class and the BE class
# - the drop probability associated with each AF virtual queue
#
# AF DSCP values used (based on AF draft 04)
# -----------------------------------------
# AF DSCP values
# AF1 1. 0x0a 2. 0x0c 3. 0x0e
# AF2 1. 0x12 2. 0x14 3. 0x16
# AF3 1. 0x1a 2. 0x1c 3. 0x1e
# AF4 1. 0x22 2. 0x24 3. 0x26
#
#
# A simple DSCP-class relationship formula used to generate
# values in the for loop of this script; $drop stands for the
# DP
# $dscp = ($class*8+$drop*2)
#
# if you use GRIO buffer sharing, then GRED priority is set as follows:
# $gprio=$drop+1;
#
$TC = "/usr/src/iproute2-current/tc/tc";
$DEV = "dev lo";
$DEV = "dev eth1";
$DEV = "dev eth0";
# the BE-class number
$beclass = "5";
#GRIO buffer sharing on or off?
$GRIO = "";
$GRIO = "grio";
# The bandwidth of your device
$linerate="10Mbit";
# The BE and AF rates
%rate_table=();
$berate="1500Kbit";
$rate_table{"AF1rate"}="1500Kbit";
$rate_table{"AF2rate"}="1500Kbit";
$rate_table{"AF3rate"}="1500Kbit";
$rate_table{"AF4rate"}="1500Kbit";
#
#
#
print "\n# --- General setup ---\n";
print "$TC qdisc add $DEV handle 1:0 root dsmark indices 64 set_tc_index\n";
print "$TC filter add $DEV parent 1:0 protocol ip prio 1 tcindex mask 0xfc " .
"shift 2 pass_on\n";
#"shift 2\n";
print "$TC qdisc add $DEV parent 1:0 handle 2:0 cbq bandwidth $linerate ".
"cell 8 avpkt 1000 mpu 64\n";
print "$TC filter add $DEV parent 2:0 protocol ip prio 1 tcindex ".
"mask 0xf0 shift 4 pass_on\n";
for $class (1..4) {
print "\n# --- AF Class $class specific setup---\n";
$AFrate=sprintf("AF%drate",$class);
print "$TC class add $DEV parent 2:0 classid 2:$class cbq ".
"bandwidth $linerate rate $rate_table{$AFrate} avpkt 1000 prio ".
(6-$class)." bounded allot 1514 weight 1 maxburst 21\n";
print "$TC filter add $DEV parent 2:0 protocol ip prio 1 handle $class ".
"tcindex classid 2:$class\n";
print "$TC qdisc add $DEV parent 2:$class gred setup DPs 3 default 2 ".
"$GRIO\n";
#
# per DP setup
#
for $drop (1..3) {
print "\n# --- AF Class $class DP $drop---\n";
$dscp = $class*8+$drop*2;
$tcindex = sprintf("1%x%x",$class,$drop);
print "$TC filter add $DEV parent 1:0 protocol ip prio 1 ".
"handle $dscp tcindex classid 1:$tcindex\n";
$prob = $drop*0.02;
if ($GRIO) {
$gprio = $drop+1;
print "$TC qdisc change $DEV parent 2:$class gred limit 60KB min 15KB ".
"max 45KB burst 20 avpkt 1000 bandwidth $linerate DP $drop ".
"probability $prob ".
"prio $gprio\n";
} else {
print "$TC qdisc change $DEV parent 2:$class gred limit 60KB min 15KB ".
"max 45KB burst 20 avpkt 1000 bandwidth $linerate DP $drop ".
"probability $prob \n";
}
}
}
#
#
print "\n#------BE Queue setup------\n";
print "$TC filter add $DEV parent 1:0 protocol ip prio 2 ".
"handle 0 tcindex mask 0 classid 1:1\n";
print "$TC class add $DEV parent 2:0 classid 2:$beclass cbq ".
"bandwidth $linerate rate $berate avpkt 1000 prio 6 " .
"bounded allot 1514 weight 1 maxburst 21 \n";
print "$TC filter add $DEV parent 2:0 protocol ip prio 1 handle 0 tcindex ".
"classid 2:5\n";
print "$TC qdisc add $DEV parent 2:5 red limit 60KB min 15KB max 45KB ".
"burst 20 avpkt 1000 bandwidth $linerate probability 0.4\n";

@ -1,25 +0,0 @@
#!/usr/bin/perl
$TC = "/root/DS-6-beta/iproute2-990530-dsing/tc/tc";
$DEV = "dev eth1";
$efrate="1.5Mbit";
$MTU="1.5kB";
print "$TC qdisc add $DEV handle 1:0 root dsmark indices 64 set_tc_index\n";
print "$TC filter add $DEV parent 1:0 protocol ip prio 1 tcindex ".
"mask 0xfc shift 2\n";
print "$TC qdisc add $DEV parent 1:0 handle 2:0 prio\n";
#
# EF class: Maximum about one MTU sized packet allowed on the queue
#
print "$TC qdisc add $DEV parent 2:1 tbf rate $efrate burst $MTU limit 1.6kB\n";
print "$TC filter add $DEV parent 2:0 protocol ip prio 1 ".
"handle 0x2e tcindex classid 2:1 pass_on\n";
#
# BE class
#
print "#BE class(2:2) \n";
print "$TC qdisc add $DEV parent 2:2 red limit 60KB ".
"min 15KB max 45KB burst 20 avpkt 1000 bandwidth 10Mbit ".
"probability 0.4\n";
#
print "$TC filter add $DEV parent 2:0 protocol ip prio 2 ".
"handle 0 tcindex mask 0 classid 2:2 pass_on\n";

@ -1,31 +0,0 @@
#!/usr/bin/perl
#
$TC = "/root/DS-6-beta/iproute2-990530-dsing/tc/tc";
$DEV = "dev eth1";
print "$TC qdisc add $DEV handle 1:0 root dsmark indices 64 set_tc_index\n";
print "$TC filter add $DEV parent 1:0 protocol ip prio 1 tcindex ".
"mask 0xfc shift 2\n";
print "$TC qdisc add $DEV parent 1:0 handle 2:0 cbq bandwidth ".
"10Mbit cell 8 avpkt 1000 mpu 64\n";
#
# EF class
#
print "$TC class add $DEV parent 2:0 classid 2:1 cbq bandwidth ".
"10Mbit rate 1500Kbit avpkt 1000 prio 1 bounded isolated ".
"allot 1514 weight 1 maxburst 10 \n";
# packet fifo for EF?
print "$TC qdisc add $DEV parent 2:1 pfifo limit 5\n";
print "$TC filter add $DEV parent 2:0 protocol ip prio 1 ".
"handle 0x2e tcindex classid 2:1 pass_on\n";
#
# BE class
#
print "#BE class(2:2) \n";
print "$TC class add $DEV parent 2:0 classid 2:2 cbq bandwidth ".
"10Mbit rate 5Mbit avpkt 1000 prio 7 allot 1514 weight 1 ".
"maxburst 21 borrow split 2:0 defmap 0xffff \n";
print "$TC qdisc add $DEV parent 2:2 red limit 60KB ".
"min 15KB max 45KB burst 20 avpkt 1000 bandwidth 10Mbit ".
"probability 0.4\n";
print "$TC filter add $DEV parent 2:0 protocol ip prio 2 ".
"handle 0 tcindex mask 0 classid 2:2 pass_on\n";

@ -1,125 +0,0 @@
These were the tests done to validate the Diffserv scripts.
This document will be updated continuously. If you do more
thorough validation testing, please post the details to the
diffserv mailing list.
Nevertheless, these tests should serve for basic validation.
AFCBQ, EFCBQ, EFPRIO
----------------------
Generate all possible DSCPs and observe that they
get sent to the proper classes and, in the case of AF, also
to the correct virtual queues.
Edge1
-----
generate TOS values 0x0,0x10,0xbb each with IP addresses
10.2.0.24 (mark 1), 10.2.0.3 (mark 2) and 10.2.0.30 (mark 3)
and observe that they get marked as expected.
Edge2
-----
-Repeat the tests in Edge1
-ftp with data direction from 10.2.0.2
*observe that the metering/policing works correctly (and the marking
as well). In this case the mark used will be 3
Edge31-cb-chains
----------------
-ftp with data direction from 10.2.0.2
*observe that the metering/policing works correctly (and the marking
as well). In this case the mark used will be 1.
Metering: The data throughput should not exceed 2*CIR1 + 2*CIR2
which is roughly: 5mbps
Marking: there should be a mix of marked packets:
AF41(TOS=0x88) AF42(0x90) AF43(0x98) and BE (0x0)
More tests required to see the interaction of several sources (other
than subnet 10.2.0.0/24).
Edge31-ca-u32
--------------
Generate data using modified tcpblast from 10.2.0.2 (behind eth2) to the
discard port of 10.1.0.2 (behind eth1)
1) generate with src tos = 0x88
Metering: Allocated throughput should not exceed 2*CIR1 + 2*CIR2
approximately 5mbps
Marking: Should vary between 0x88,0x90,0x98 and 0x0
2) generate with src tos = 0x90
Metering: Allocated throughput should not exceed CIR1 + 2*CIR2
approximately 3.5mbps
Marking: Should vary between 0x90,0x98 and 0x0
3) generate with src tos = 0x98
Metering: Allocated throughput should not exceed CIR1 + CIR2
approximately 2.5mbps
Marking: Should vary between 0x98 and 0x0
4) generate with src tos any other than the above
Metering: Allocated throughput should not exceed CIR1
approximately 1.5mbps
Marking: Should be consistent at 0x0
TODO: Testing on how each color shares when all 4 types of packets
are going through the edge device
Edge32-cb-u32, Edge32-cb-chains
-------------------------------
-ftp with data direction from 10.2.0.2
*observe that the metering/policing works correctly (and the marking
as well).
Metering:
The data throughput should not exceed 2*CIR1 + 2*CIR2
+ 2*PIR2 + PIR1 for u32 which is roughly: 6mbps
The data throughput should not exceed 2*CIR1 + 5*CIR2
for chains which is roughly: 6mbps
Marking: there should be a mix of marked packets:
AF41(TOS=0x88) AF42(0x90) AF43(0x98) and BE (0x0)
TODO:
-More tests required to see the interaction of several sources (other
than subnet 10.2.0.0/24).
-More tests needed to capture stats on how many times the CIR was exceeded
but the data was not remarked etc.
Edge32-ca-u32
--------------
Generate data using modified tcpblast from 10.2.0.2 (behind eth2) to the
discard port of 10.1.0.2 (behind eth1)
1) generate with src tos = 0x88
Metering: Allocated throughput should not exceed 2*CIR1 + 2*CIR2
+PIR1 -- approximately 4mbps
Marking: Should vary between 0x88,0x90,0x98 and 0x0
2) generate with src tos = 0x90
Metering: Allocated throughput should not exceed CIR1 + 2*CIR2
+ 2* PIR2 approximately 3mbps
Marking: Should vary between 0x90,0x98 and 0x0
3) generate with src tos = 0x98
Metering: Allocated throughput should not exceed PIR1+ CIR1 + CIR2
approximately 2.5mbps
Marking: Should vary between 0x98 and 0x0
4) generate with src tos any other than the above
Metering: Allocated throughput should not exceed CIR1
approximately 1mbps
Marking: Should be consistent at 0x0
TODO: Testing on how each color shares when all 4 types of packets
are going through the edge device

@ -1,6 +1,10 @@
# SPDX-License-Identifier: GPL-2.0
GENLOBJ=genl.o
include ../Config
include ../config.mk
SHARED_LIBS ?= y
CFLAGS += -fno-strict-aliasing
GENLMODULES :=
GENLMODULES += ctrl.o
@ -9,15 +13,30 @@ GENLOBJ += $(GENLMODULES)
GENLLIB :=
ifeq ($(SHARED_LIBS),y)
LDFLAGS += -Wl,-export-dynamic
LDLIBS += -lm -ldl
endif
all: genl
genl: $(GENLOBJ) $(LIBNETLINK) $(LIBUTIL) $(GENLLIB)
$(QUIET_LINK)$(CC) $^ $(LDFLAGS) $(LDLIBS) -o $@
install: all
install -m 0755 genl $(DESTDIR)$(SBINDIR)
clean:
rm -f $(GENLOBJ) $(GENLLIB) genl
ifneq ($(SHARED_LIBS),y)
genl: static-syms.o
static-syms.o: static-syms.h
static-syms.h: $(wildcard *.c)
files="$^" ; \
for s in `grep -B 3 '\<dlsym' $$files | sed -n '/snprintf/{s:.*"\([^"]*\)".*:\1:;s:%s::;p}'` ; do \
sed -n '/'$$s'[^ ]* =/{s:.* \([^ ]*'$$s'[^ ]*\) .*:extern char \1[] __attribute__((weak)); if (!strcmp(sym, "\1")) return \1;:;p}' $$files ; \
done > $@
endif

@ -13,7 +13,6 @@
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <syslog.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <netinet/in.h>
@ -29,90 +28,19 @@
static int usage(void)
{
fprintf(stderr,"Usage: ctrl <CMD>\n" \
"CMD := get <PARMS> | list | monitor\n" \
"CMD := get <PARMS> | list | monitor | policy <PARMS>\n" \
"PARMS := name <name> | id <id>\n" \
"Examples:\n" \
"\tctrl ls\n" \
"\tctrl monitor\n" \
"\tctrl get name foobar\n" \
"\tctrl get id 0xF\n");
"\tctrl get id 0xF\n"
"\tctrl policy name foobar\n"
"\tctrl policy id 0xF\n");
return -1;
}
int genl_ctrl_resolve_family(const char *family)
{
struct rtnl_handle rth;
struct nlmsghdr *nlh;
struct genlmsghdr *ghdr;
int ret = 0;
struct {
struct nlmsghdr n;
char buf[4096];
} req;
memset(&req, 0, sizeof(req));
nlh = &req.n;
nlh->nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN);
nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
nlh->nlmsg_type = GENL_ID_CTRL;
ghdr = NLMSG_DATA(&req.n);
ghdr->cmd = CTRL_CMD_GETFAMILY;
if (rtnl_open_byproto(&rth, 0, NETLINK_GENERIC) < 0) {
fprintf(stderr, "Cannot open generic netlink socket\n");
exit(1);
}
addattr_l(nlh, 128, CTRL_ATTR_FAMILY_NAME, family, strlen(family) + 1);
if (rtnl_talk(&rth, nlh, 0, 0, nlh, NULL, NULL) < 0) {
fprintf(stderr, "Error talking to the kernel\n");
goto errout;
}
{
struct rtattr *tb[CTRL_ATTR_MAX + 1];
struct genlmsghdr *ghdr = NLMSG_DATA(nlh);
int len = nlh->nlmsg_len;
struct rtattr *attrs;
if (nlh->nlmsg_type != GENL_ID_CTRL) {
fprintf(stderr, "Not a controller message, nlmsg_len=%d "
"nlmsg_type=0x%x\n", nlh->nlmsg_len, nlh->nlmsg_type);
goto errout;
}
if (ghdr->cmd != CTRL_CMD_NEWFAMILY) {
fprintf(stderr, "Unknown controller command %d\n", ghdr->cmd);
goto errout;
}
len -= NLMSG_LENGTH(GENL_HDRLEN);
if (len < 0) {
fprintf(stderr, "wrong controller message len %d\n", len);
return -1;
}
attrs = (struct rtattr *) ((char *) ghdr + GENL_HDRLEN);
parse_rtattr(tb, CTRL_ATTR_MAX, attrs, len);
if (tb[CTRL_ATTR_FAMILY_ID] == NULL) {
fprintf(stderr, "Missing family id TLV\n");
goto errout;
}
ret = *(__u16 *) RTA_DATA(tb[CTRL_ATTR_FAMILY_ID]);
}
errout:
rtnl_close(&rth);
return ret;
}
void print_ctrl_cmd_flags(FILE *fp, __u32 fl)
static void print_ctrl_cmd_flags(FILE *fp, __u32 fl)
{
fprintf(fp, "\n\t\tCapabilities (0x%x):\n ", fl);
if (!fl) {
@ -132,7 +60,7 @@ void print_ctrl_cmd_flags(FILE *fp, __u32 fl)
fprintf(fp, "\n");
}
static int print_ctrl_cmds(FILE *fp, struct rtattr *arg, __u32 ctrl_ver)
{
struct rtattr *tb[CTRL_ATTR_OP_MAX + 1];
@ -177,8 +105,8 @@ static int print_ctrl_grp(FILE *fp, struct rtattr *arg, __u32 ctrl_ver)
/*
* The controller sends one nlmsg per family
*/
static int print_ctrl(const struct sockaddr_nl *who, struct nlmsghdr *n,
void *arg)
static int print_ctrl(struct rtnl_ctrl_data *ctrl,
struct nlmsghdr *n, void *arg)
{
struct rtattr *tb[CTRL_ATTR_MAX + 1];
struct genlmsghdr *ghdr = NLMSG_DATA(n);
@ -197,7 +125,8 @@ static int print_ctrl(const struct sockaddr_nl *who, struct nlmsghdr *n,
ghdr->cmd != CTRL_CMD_DELFAMILY &&
ghdr->cmd != CTRL_CMD_NEWFAMILY &&
ghdr->cmd != CTRL_CMD_NEWMCAST_GRP &&
ghdr->cmd != CTRL_CMD_DELMCAST_GRP) {
ghdr->cmd != CTRL_CMD_DELMCAST_GRP &&
ghdr->cmd != CTRL_CMD_GETPOLICY) {
fprintf(stderr, "Unknown controller command %d\n", ghdr->cmd);
return 0;
}
@ -210,7 +139,7 @@ static int print_ctrl(const struct sockaddr_nl *who, struct nlmsghdr *n,
}
attrs = (struct rtattr *) ((char *) ghdr + GENL_HDRLEN);
parse_rtattr(tb, CTRL_ATTR_MAX, attrs, len);
parse_rtattr_flags(tb, CTRL_ATTR_MAX, attrs, len, NLA_F_NESTED);
if (tb[CTRL_ATTR_FAMILY_NAME]) {
char *name = RTA_DATA(tb[CTRL_ATTR_FAMILY_NAME]);
@ -233,6 +162,36 @@ static int print_ctrl(const struct sockaddr_nl *who, struct nlmsghdr *n,
__u32 *ma = RTA_DATA(tb[CTRL_ATTR_MAXATTR]);
fprintf(fp, " max attribs: %d ",*ma);
}
if (tb[CTRL_ATTR_OP_POLICY]) {
const struct rtattr *pos;
rtattr_for_each_nested(pos, tb[CTRL_ATTR_OP_POLICY]) {
struct rtattr *ptb[CTRL_ATTR_POLICY_DUMP_MAX + 1];
struct rtattr *pattrs = RTA_DATA(pos);
int plen = RTA_PAYLOAD(pos);
parse_rtattr_flags(ptb, CTRL_ATTR_POLICY_DUMP_MAX,
pattrs, plen, NLA_F_NESTED);
fprintf(fp, " op %d policies:",
pos->rta_type & ~NLA_F_NESTED);
if (ptb[CTRL_ATTR_POLICY_DO]) {
__u32 *v = RTA_DATA(ptb[CTRL_ATTR_POLICY_DO]);
fprintf(fp, " do=%d", *v);
}
if (ptb[CTRL_ATTR_POLICY_DUMP]) {
__u32 *v = RTA_DATA(ptb[CTRL_ATTR_POLICY_DUMP]);
fprintf(fp, " dump=%d", *v);
}
}
}
if (tb[CTRL_ATTR_POLICY])
nl_print_policy(tb[CTRL_ATTR_POLICY], fp);
/* end of family definitions .. */
fprintf(fp,"\n");
if (tb[CTRL_ATTR_OPS]) {
@ -281,34 +240,37 @@ static int print_ctrl(const struct sockaddr_nl *who, struct nlmsghdr *n,
return 0;
}
static int print_ctrl2(struct nlmsghdr *n, void *arg)
{
return print_ctrl(NULL, n, arg);
}
static int ctrl_list(int cmd, int argc, char **argv)
{
struct rtnl_handle rth;
struct nlmsghdr *nlh;
struct genlmsghdr *ghdr;
int ret = -1;
char d[GENL_NAMSIZ];
struct {
struct nlmsghdr n;
struct genlmsghdr g;
char buf[4096];
} req;
memset(&req, 0, sizeof(req));
nlh = &req.n;
nlh->nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN);
nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
nlh->nlmsg_type = GENL_ID_CTRL;
ghdr = NLMSG_DATA(&req.n);
ghdr->cmd = CTRL_CMD_GETFAMILY;
} req = {
.n.nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN),
.n.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
.n.nlmsg_type = GENL_ID_CTRL,
.g.cmd = CTRL_CMD_GETFAMILY,
};
struct nlmsghdr *nlh = &req.n;
struct nlmsghdr *answer = NULL;
if (rtnl_open_byproto(&rth, 0, NETLINK_GENERIC) < 0) {
fprintf(stderr, "Cannot open generic netlink socket\n");
exit(1);
}
if (cmd == CTRL_CMD_GETFAMILY) {
if (cmd == CTRL_CMD_GETFAMILY || cmd == CTRL_CMD_GETPOLICY) {
req.g.cmd = cmd;
if (argc != 2) {
fprintf(stderr, "Wrong number of params\n");
return -1;
@ -316,7 +278,7 @@ static int ctrl_list(int cmd, int argc, char **argv)
if (matches(*argv, "name") == 0) {
NEXT_ARG();
strncpy(d, *argv, sizeof (d) - 1);
strlcpy(d, *argv, sizeof(d));
addattr_l(nlh, 128, CTRL_ATTR_FAMILY_NAME,
d, strlen(d) + 1);
} else if (matches(*argv, "id") == 0) {
@ -333,34 +295,37 @@ static int ctrl_list(int cmd, int argc, char **argv)
fprintf(stderr, "Wrong params\n");
goto ctrl_done;
}
}
if (rtnl_talk(&rth, nlh, 0, 0, nlh, NULL, NULL) < 0) {
if (cmd == CTRL_CMD_GETFAMILY) {
if (rtnl_talk(&rth, nlh, &answer) < 0) {
fprintf(stderr, "Error talking to the kernel\n");
goto ctrl_done;
}
if (print_ctrl(NULL, nlh, (void *) stdout) < 0) {
if (print_ctrl2(answer, (void *) stdout) < 0) {
fprintf(stderr, "Dump terminated\n");
goto ctrl_done;
}
}
if (cmd == CTRL_CMD_UNSPEC) {
if (cmd == CTRL_CMD_UNSPEC || cmd == CTRL_CMD_GETPOLICY) {
nlh->nlmsg_flags = NLM_F_ROOT|NLM_F_MATCH|NLM_F_REQUEST;
nlh->nlmsg_seq = rth.dump = ++rth.seq;
if (rtnl_send(&rth, (const char *) nlh, nlh->nlmsg_len) < 0) {
if (rtnl_send(&rth, nlh, nlh->nlmsg_len) < 0) {
perror("Failed to send dump request\n");
goto ctrl_done;
}
rtnl_dump_filter(&rth, print_ctrl, stdout, NULL, NULL);
rtnl_dump_filter(&rth, print_ctrl2, stdout);
}
ret = 0;
ctrl_done:
free(answer);
rtnl_close(&rth);
return ret;
}
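The hunk above swaps `strncpy(d, *argv, sizeof (d) - 1)` for `strlcpy(d, *argv, sizeof(d))`. The sketch below shows the strlcpy contract the new call relies on. The name `sketch_strlcpy` is hypothetical and is not the helper iproute2 actually links against. The function copies at most `size - 1` bytes and always NUL-terminates when `size` is non-zero. It returns `strlen(src)`, so a caller can detect truncation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for strlcpy(): copies at most size - 1 bytes,
 * always NUL-terminates when size > 0, and returns strlen(src) so the
 * caller can detect truncation (return value >= size). */
static size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t srclen = strlen(src);

	if (size) {
		size_t n = srclen >= size ? size - 1 : srclen;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return srclen;
}
```

If `d` is sized `GENL_NAMSIZ`, a return value of `sizeof(d)` or more means the family name from the command line was cut short. `ctrl_list()` as shown does not check for that.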
@ -396,10 +361,12 @@ static int parse_ctrl(struct genl_util *a, int argc, char **argv)
matches(*argv, "show") == 0 ||
matches(*argv, "lst") == 0)
return ctrl_list(CTRL_CMD_UNSPEC, argc-1, argv+1);
if (matches(*argv, "policy") == 0)
return ctrl_list(CTRL_CMD_GETPOLICY, argc-1, argv+1);
if (matches(*argv, "help") == 0)
return usage();
fprintf(stderr, "ctrl command \"%s\" is unknown, try \"ctrl -help\".\n",
fprintf(stderr, "ctrl command \"%s\" is unknown, try \"ctrl help\".\n",
*argv);
return -1;
@ -408,5 +375,5 @@ static int parse_ctrl(struct genl_util *a, int argc, char **argv)
struct genl_util ctrl_genl_util = {
.name = "ctrl",
.parse_genlopt = parse_ctrl,
.print_genlopt = print_ctrl,
.print_genlopt = print_ctrl2,
};
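`print_ctrl()` now parses with `parse_rtattr_flags(..., NLA_F_NESTED)` and prints `pos->rta_type & ~NLA_F_NESTED` for each nested op policy. The sketch below performs the same kind of attribute walk on a plain byte buffer. It assumes the usual netlink layout: a 4-byte header with a 16-bit total length and a 16-bit type, a payload padded to 4 bytes, and the nested flag in bit 15. `struct tlv`, `TLV_F_NESTED` and `parse_tlvs()` are hypothetical local stand-ins for `struct rtattr`, `NLA_F_NESTED` and `parse_rtattr_flags()`, so the sketch builds without kernel headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct rtattr: 16-bit length (header + payload)
 * and 16-bit type; the next attribute starts at a 4-byte boundary. */
struct tlv {
	uint16_t len;
	uint16_t type;
};

#define TLV_ALIGN(n)  (((n) + 3u) & ~3u)
#define TLV_F_NESTED  (1u << 15)	/* same bit as NLA_F_NESTED */

/* Index attributes by type, masking off the nested flag. tb[] gets a
 * pointer to the start of the first attribute of each type (or NULL).
 * Returns the number of bytes left unparsed (0 for a well-formed buffer). */
static size_t parse_tlvs(const unsigned char *tb[], unsigned int max,
			 const unsigned char *buf, size_t len)
{
	for (unsigned int i = 0; i <= max; i++)
		tb[i] = NULL;

	while (len >= sizeof(struct tlv)) {
		struct tlv a;
		unsigned int type;
		size_t step;

		memcpy(&a, buf, sizeof(a));	/* avoid unaligned access */
		if (a.len < sizeof(struct tlv) || a.len > len)
			break;			/* malformed length: stop */
		type = a.type & ~TLV_F_NESTED;
		if (type <= max && !tb[type])
			tb[type] = buf;		/* first occurrence wins */
		step = TLV_ALIGN(a.len);
		if (step >= len) {		/* last attribute consumed */
			len = 0;
			break;
		}
		buf += step;
		len -= step;
	}
	return len;
}
```

The sketch keeps the first attribute of each type. A length that does not fit the remaining buffer ends the walk instead of reading past the end.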


@ -13,7 +13,6 @@
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <syslog.h>
#include <fcntl.h>
#include <dlfcn.h>
#include <sys/socket.h>
@ -23,21 +22,19 @@
#include <errno.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h> /* until we put our own header */
#include "SNAPSHOT.h"
#include "version.h"
#include "utils.h"
#include "genl_utils.h"
int show_stats = 0;
int show_details = 0;
int show_raw = 0;
int resolve_hosts = 0;
int show_stats;
int show_details;
int show_raw;
static void *BODY;
static struct genl_util * genl_list;
static struct genl_util *genl_list;
static int print_nofopt(const struct sockaddr_nl *who, struct nlmsghdr *n,
void *arg)
static int print_nofopt(struct nlmsghdr *n, void *arg)
{
fprintf((FILE *) arg, "unknown genl type ..\n");
return 0;
@ -46,15 +43,16 @@ static int print_nofopt(const struct sockaddr_nl *who, struct nlmsghdr *n,
static int parse_nofopt(struct genl_util *f, int argc, char **argv)
{
if (argc) {
fprintf(stderr, "Unknown genl \"%s\", hence option \"%s\" "
"is unparsable\n", f->name, *argv);
fprintf(stderr,
"Unknown genl \"%s\", hence option \"%s\" is unparsable\n",
f->name, *argv);
return -1;
}
return 0;
}
static struct genl_util *get_genl_kind(char *str)
static struct genl_util *get_genl_kind(const char *str)
{
void *dlh;
char buf[256];
@ -86,9 +84,8 @@ reg:
return f;
noexist:
f = malloc(sizeof(*f));
f = calloc(1, sizeof(*f));
if (f) {
memset(f, 0, sizeof(*f));
strncpy(f->name, str, 15);
f->parse_genlopt = parse_nofopt;
f->print_genlopt = print_nofopt;
@ -101,22 +98,15 @@ static void usage(void) __attribute__((noreturn));
static void usage(void)
{
fprintf(stderr, "Usage: genl [ OPTIONS ] OBJECT | help }\n"
"where OBJECT := { ctrl etc }\n"
" OPTIONS := { -s[tatistics] | -d[etails] | -r[aw] }\n");
fprintf(stderr,
"Usage: genl [ OPTIONS ] OBJECT [help] }\n"
"where OBJECT := { ctrl etc }\n"
" OPTIONS := { -s[tatistics] | -d[etails] | -r[aw] | -V[ersion] | -h[elp] }\n");
exit(-1);
}
int main(int argc, char **argv)
{
char *basename;
basename = strrchr(argv[0], '/');
if (basename == NULL)
basename = argv[0];
else
basename++;
while (argc > 1) {
if (argv[1][0] != '-')
break;
@ -128,34 +118,31 @@ int main(int argc, char **argv)
} else if (matches(argv[1], "-raw") == 0) {
++show_raw;
} else if (matches(argv[1], "-Version") == 0) {
printf("genl utility, iproute2-ss%s\n", SNAPSHOT);
printf("genl utility, iproute2-%s\n", version);
exit(0);
} else if (matches(argv[1], "-help") == 0) {
usage();
} else {
fprintf(stderr, "Option \"%s\" is unknown, try "
"\"genl -help\".\n", argv[1]);
fprintf(stderr,
"Option \"%s\" is unknown, try \"genl -help\".\n",
argv[1]);
exit(-1);
}
argc--; argv++;
}
if (argc > 1) {
struct genl_util *a;
int ret;
struct genl_util *a = NULL;
a = get_genl_kind(argv[1]);
if (NULL == a) {
fprintf(stderr,"bad genl %s\n",argv[1]);
if (!a) {
fprintf(stderr, "bad genl %s\n", argv[1]);
exit(-1);
}
ret = a->parse_genlopt(a, argc-1, argv+1);
return ret;
if (matches(argv[1], "help") == 0)
usage();
fprintf(stderr, "Object \"%s\" is unknown, try \"genl "
"-help\".\n", argv[1]);
exit(-1);
}
usage();


@ -1,17 +1,15 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _TC_UTIL_H_
#define _TC_UTIL_H_ 1
#include <linux/genetlink.h>
#include "utils.h"
#include "linux/genetlink.h"
struct genl_util
{
struct genl_util {
struct genl_util *next;
char name[16];
int (*parse_genlopt)(struct genl_util *fu, int argc, char **argv);
int (*print_genlopt)(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg);
int (*print_genlopt)(struct nlmsghdr *n, void *arg);
};
extern int genl_ctrl_resolve_family(const char *family);
#endif

genl/static-syms.c

@ -0,0 +1,15 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* This file creates a dummy version of dynamic loading
* for environments where dynamic linking
* is not used or available.
*/
#include <string.h>
#include "dlfcn.h"
void *_dlsym(const char *sym)
{
#include "static-syms.h"
return NULL;
}
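`static-syms.c` replaces `dlsym()` by textually including a generated `static-syms.h` inside the body of `_dlsym()`. The exact contents of that header are produced at build time and are not shown here. Assuming it expands to one name comparison per built-in symbol, the lookup reduces to something like the sketch below. `sketch_dlsym()`, `struct sketch_util` and the two objects are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical objects standing in for symbols such as ctrl_genl_util. */
struct sketch_util {
	const char *name;
};

static struct sketch_util ctrl_util = { "ctrl" };
static struct sketch_util other_util = { "other" };

/* Static replacement for dlsym(): one strcmp() per built-in symbol,
 * NULL for anything else, just like a failed dynamic lookup. */
static void *sketch_dlsym(const char *sym)
{
	if (!strcmp(sym, "ctrl_genl_util"))
		return &ctrl_util;
	if (!strcmp(sym, "other_genl_util"))
		return &other_util;
	return NULL;
}
```

Returning NULL for an unknown name mirrors a failed `dlsym()`, so callers need no separate code path for static builds.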


@ -1 +0,0 @@
static const char SNAPSHOT[] = "090324";

include/bpf_api.h

@ -0,0 +1,275 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __BPF_API__
#define __BPF_API__
/* Note:
*
* This file can be included into eBPF kernel programs. It contains
* a couple of useful helper functions, map/section ABI (bpf_elf.h),
* misc macros and some eBPF specific LLVM built-ins.
*/
#include <stdint.h>
#include <linux/pkt_cls.h>
#include <linux/bpf.h>
#include <linux/filter.h>
#include <asm/byteorder.h>
#include "bpf_elf.h"
/** libbpf pin type. */
enum libbpf_pin_type {
LIBBPF_PIN_NONE,
/* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
LIBBPF_PIN_BY_NAME,
};
/** Type helper macros. */
#define __uint(name, val) int (*name)[val]
#define __type(name, val) typeof(val) *name
#define __array(name, val) typeof(val) *name[]
/** Misc macros. */
#ifndef __stringify
# define __stringify(X) #X
#endif
#ifndef __maybe_unused
# define __maybe_unused __attribute__((__unused__))
#endif
#ifndef offsetof
# define offsetof(TYPE, MEMBER) __builtin_offsetof(TYPE, MEMBER)
#endif
#ifndef likely
# define likely(X) __builtin_expect(!!(X), 1)
#endif
#ifndef unlikely
# define unlikely(X) __builtin_expect(!!(X), 0)
#endif
#ifndef htons
# define htons(X) __constant_htons((X))
#endif
#ifndef ntohs
# define ntohs(X) __constant_ntohs((X))
#endif
#ifndef htonl
# define htonl(X) __constant_htonl((X))
#endif
#ifndef ntohl
# define ntohl(X) __constant_ntohl((X))
#endif
#ifndef __inline__
# define __inline__ __attribute__((always_inline))
#endif
/** Section helper macros. */
#ifndef __section
# define __section(NAME) \
__attribute__((section(NAME), used))
#endif
#ifndef __section_tail
# define __section_tail(ID, KEY) \
__section(__stringify(ID) "/" __stringify(KEY))
#endif
#ifndef __section_xdp_entry
# define __section_xdp_entry \
__section(ELF_SECTION_PROG)
#endif
#ifndef __section_cls_entry
# define __section_cls_entry \
__section(ELF_SECTION_CLASSIFIER)
#endif
#ifndef __section_act_entry
# define __section_act_entry \
__section(ELF_SECTION_ACTION)
#endif
#ifndef __section_lwt_entry
# define __section_lwt_entry \
__section(ELF_SECTION_PROG)
#endif
#ifndef __section_license
# define __section_license \
__section(ELF_SECTION_LICENSE)
#endif
#ifndef __section_maps
# define __section_maps \
__section(ELF_SECTION_MAPS)
#endif
/** Declaration helper macros. */
#ifndef BPF_LICENSE
# define BPF_LICENSE(NAME) \
char ____license[] __section_license = NAME
#endif
/** Classifier helper */
#ifndef BPF_H_DEFAULT
# define BPF_H_DEFAULT -1
#endif
/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
#ifndef __BPF_FUNC
# define __BPF_FUNC(NAME, ...) \
(* NAME)(__VA_ARGS__) __maybe_unused
#endif
#ifndef BPF_FUNC
# define BPF_FUNC(NAME, ...) \
__BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
#endif
/* Map access/manipulation */
static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
static int BPF_FUNC(map_update_elem, void *map, const void *key,
const void *value, uint32_t flags);
static int BPF_FUNC(map_delete_elem, void *map, const void *key);
/* Time access */
static uint64_t BPF_FUNC(ktime_get_ns);
/* Debugging */
/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
* llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
* It would require ____fmt to be made const, which generates a reloc
* entry (non-map).
*/
static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
#ifndef printt
# define printt(fmt, ...) \
({ \
char ____fmt[] = fmt; \
trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__); \
})
#endif
/* Random numbers */
static uint32_t BPF_FUNC(get_prandom_u32);
/* Tail calls */
static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
uint32_t index);
/* System helpers */
static uint32_t BPF_FUNC(get_smp_processor_id);
static uint32_t BPF_FUNC(get_numa_node_id);
/* Packet misc meta data */
static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
/* Packet redirection */
static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
uint32_t flags);
/* Packet manipulation */
static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
void *to, uint32_t len);
static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
const void *from, uint32_t len, uint32_t flags);
static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
uint32_t from, uint32_t to, uint32_t flags);
static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
uint32_t from, uint32_t to, uint32_t flags);
static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
const void *to, uint32_t to_size, uint32_t seed);
static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
uint32_t flags);
static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
uint32_t flags);
static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
/* Event notification */
static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
uint64_t index, const void *data, uint32_t size) =
(void *) BPF_FUNC_perf_event_output;
/* Packet vlan encap/decap */
static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
uint16_t vlan_tci);
static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
/* Packet tunnel encap/decap */
static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
const struct bpf_tunnel_key *from, uint32_t size,
uint32_t flags);
static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
void *to, uint32_t size);
static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
const void *from, uint32_t size);
/** LLVM built-ins, mem*() routines work for constant size */
#ifndef lock_xadd
# define lock_xadd(ptr, val) ((void) __sync_fetch_and_add(ptr, val))
#endif
#ifndef memset
# define memset(s, c, n) __builtin_memset((s), (c), (n))
#endif
#ifndef memcpy
# define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
#endif
#ifndef memmove
# define memmove(d, s, n) __builtin_memmove((d), (s), (n))
#endif
/* FIXME: __builtin_memcmp() is not yet fully useable unless llvm bug
* https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
* this one would generate a reloc entry (non-map), otherwise.
*/
#if 0
#ifndef memcmp
# define memcmp(a, b, n) __builtin_memcmp((a), (b), (n))
#endif
#endif
unsigned long long load_byte(void *skb, unsigned long long off)
asm ("llvm.bpf.load.byte");
unsigned long long load_half(void *skb, unsigned long long off)
asm ("llvm.bpf.load.half");
unsigned long long load_word(void *skb, unsigned long long off)
asm ("llvm.bpf.load.word");
#endif /* __BPF_API__ */
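The `__uint(name, val)` helper in bpf_api.h declares `int (*name)[val]`, which is a pointer to an array of `val` ints. The number therefore lives in the member's type rather than in its storage. A BPF loader would read that array bound from BTF type information. The plain C sketch below does not do that; it only shows that the value can be recovered from the type with `sizeof`, while each member stays a single pointer. `UINT_FIELD`, `UINT_FIELD_VAL` and `struct sketch_map_def` are hypothetical names.

```c
/* Same shape as bpf_api.h's __uint(name, val), under a hypothetical name:
 * val is encoded as the bound of a pointer-to-array type. */
#define UINT_FIELD(name, val) int (*name)[val]

/* Hypothetical map definition in the style the helper is meant for. */
struct sketch_map_def {
	UINT_FIELD(type, 1);
	UINT_FIELD(max_entries, 256);
};

/* sizeof does not evaluate its operand, so the pointer may be NULL:
 * for int (*p)[val], sizeof(*p) is val * sizeof(int). */
#define UINT_FIELD_VAL(def, field) (sizeof(*(def).field) / sizeof(int))
```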

include/bpf_elf.h

@ -0,0 +1,53 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __BPF_ELF__
#define __BPF_ELF__
#include <asm/types.h>
/* Note:
*
* Below ELF section names and bpf_elf_map structure definition
* are not (!) kernel ABI. It's rather a "contract" between the
* application and the BPF loader in tc. For compatibility, the
* section names should stay as-is. Introduction of aliases, if
* needed, are a possibility, though.
*/
/* ELF section names, etc */
#define ELF_SECTION_LICENSE "license"
#define ELF_SECTION_MAPS "maps"
#define ELF_SECTION_PROG "prog"
#define ELF_SECTION_CLASSIFIER "classifier"
#define ELF_SECTION_ACTION "action"
#define ELF_MAX_MAPS 64
#define ELF_MAX_LICENSE_LEN 128
/* Object pinning settings */
#define PIN_NONE 0
#define PIN_OBJECT_NS 1
#define PIN_GLOBAL_NS 2
/* ELF map definition */
struct bpf_elf_map {
__u32 type;
__u32 size_key;
__u32 size_value;
__u32 max_elem;
__u32 flags;
__u32 id;
__u32 pinning;
__u32 inner_id;
__u32 inner_idx;
};
#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val) \
struct ____btf_map_##name { \
type_key key; \
type_val value; \
}; \
struct ____btf_map_##name \
__attribute__ ((section(".maps." #name), used)) \
____btf_map_##name = { }
#endif /* __BPF_ELF__ */

include/bpf_scm.h

@ -0,0 +1,77 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __BPF_SCM__
#define __BPF_SCM__
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include "utils.h"
#include "bpf_elf.h"
#define BPF_SCM_AUX_VER 1
#define BPF_SCM_MAX_FDS ELF_MAX_MAPS
#define BPF_SCM_MSG_SIZE 1024
struct bpf_elf_st {
dev_t st_dev;
ino_t st_ino;
};
struct bpf_map_aux {
unsigned short uds_ver;
unsigned short num_ent;
char obj_name[64];
struct bpf_elf_st obj_st;
struct bpf_elf_map ent[BPF_SCM_MAX_FDS];
};
struct bpf_map_set_msg {
struct msghdr hdr;
struct iovec iov;
char msg_buf[BPF_SCM_MSG_SIZE];
struct bpf_map_aux aux;
};
static inline int *bpf_map_set_init(struct bpf_map_set_msg *msg,
struct sockaddr_un *addr,
unsigned int addr_len)
{
const unsigned int cmsg_ctl_len = sizeof(int) * BPF_SCM_MAX_FDS;
struct cmsghdr *cmsg;
msg->iov.iov_base = &msg->aux;
msg->iov.iov_len = sizeof(msg->aux);
msg->hdr.msg_iov = &msg->iov;
msg->hdr.msg_iovlen = 1;
msg->hdr.msg_name = (struct sockaddr *)addr;
msg->hdr.msg_namelen = addr_len;
BUILD_BUG_ON(sizeof(msg->msg_buf) < cmsg_ctl_len);
msg->hdr.msg_control = &msg->msg_buf;
msg->hdr.msg_controllen = cmsg_ctl_len;
cmsg = CMSG_FIRSTHDR(&msg->hdr);
cmsg->cmsg_len = msg->hdr.msg_controllen;
cmsg->cmsg_level = SOL_SOCKET;
cmsg->cmsg_type = SCM_RIGHTS;
return (int *)CMSG_DATA(cmsg);
}
static inline void bpf_map_set_init_single(struct bpf_map_set_msg *msg,
int num)
{
struct cmsghdr *cmsg;
msg->hdr.msg_controllen = CMSG_LEN(sizeof(int) * num);
msg->iov.iov_len = offsetof(struct bpf_map_aux, ent) +
sizeof(struct bpf_elf_map) * num;
cmsg = CMSG_FIRSTHDR(&msg->hdr);
cmsg->cmsg_len = msg->hdr.msg_controllen;
}
#endif /* __BPF_SCM__ */
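`bpf_map_set_init()` builds a `msghdr` in two parts. Its iovec carries the `bpf_map_aux` metadata, and its control buffer holds one `SCM_RIGHTS` message with an array of map file descriptors. Below is a minimal POSIX sketch of the same mechanism, reduced to one descriptor and a one-byte payload sent over a `socketpair()`. `send_fd()` and `recv_fd()` are hypothetical helpers, not functions from this tree. The sketch also sizes the control buffer with `CMSG_SPACE()` rather than a raw `sizeof(int) * n`.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one fd plus a one-byte payload over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
	char byte = 'F';
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces cmsghdr alignment */
	} ctl;
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = ctl.buf,
		.msg_controllen = sizeof(ctl.buf),
	};
	struct cmsghdr *cmsg;

	memset(&ctl, 0, sizeof(ctl));
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd; returns the newly installed descriptor or -1. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} ctl;
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = ctl.buf,
		.msg_controllen = sizeof(ctl.buf),
	};
	struct cmsghdr *cmsg;
	int fd;

	if (recvmsg(sock, &msg, 0) < 0)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The kernel installs a new descriptor in the receiving process. The number returned by `recv_fd()` therefore generally differs from the one sent, although both refer to the same open file.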

include/bpf_util.h

@ -0,0 +1,327 @@
/*
* bpf_util.h BPF common code
*
* This program is free software; you can distribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version
* 2 of the License, or (at your option) any later version.
*
* Authors: Daniel Borkmann <daniel@iogearbox.net>
* Jiri Pirko <jiri@resnulli.us>
*/
#ifndef __BPF_UTIL__
#define __BPF_UTIL__
#include <linux/bpf.h>
#include <linux/btf.h>
#include <linux/filter.h>
#include <linux/magic.h>
#include <linux/elf-em.h>
#include <linux/if_alg.h>
#include "utils.h"
#include "bpf_scm.h"
#define BPF_ENV_UDS "TC_BPF_UDS"
#define BPF_ENV_MNT "TC_BPF_MNT"
#ifndef BPF_MAX_LOG
# define BPF_MAX_LOG 4096
#endif
#define BPF_DIR_GLOBALS "globals"
#ifndef BPF_FS_MAGIC
# define BPF_FS_MAGIC 0xcafe4a11
#endif
#define BPF_DIR_MNT "/sys/fs/bpf"
#ifndef TRACEFS_MAGIC
# define TRACEFS_MAGIC 0x74726163
#endif
#define TRACE_DIR_MNT "/sys/kernel/tracing"
#ifndef AF_ALG
# define AF_ALG 38
#endif
#ifndef EM_BPF
# define EM_BPF 247
#endif
struct bpf_cfg_ops {
void (*cbpf_cb)(void *nl, const struct sock_filter *ops, int ops_len);
void (*ebpf_cb)(void *nl, int fd, const char *annotation);
};
enum bpf_mode {
CBPF_BYTECODE,
CBPF_FILE,
EBPF_OBJECT,
EBPF_PINNED,
BPF_MODE_MAX,
};
struct bpf_cfg_in {
const char *object;
const char *section;
const char *uds;
enum bpf_prog_type type;
enum bpf_mode mode;
__u32 ifindex;
bool verbose;
int argc;
char **argv;
struct sock_filter opcodes[BPF_MAXINSNS];
union {
int n_opcodes;
int prog_fd;
};
};
/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
#define BPF_ALU64_REG(OP, DST, SRC) \
((struct bpf_insn) { \
.code = BPF_ALU64 | BPF_OP(OP) | BPF_X, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = 0, \
.imm = 0 })
#define BPF_ALU32_REG(OP, DST, SRC) \
((struct bpf_insn) { \
.code = BPF_ALU | BPF_OP(OP) | BPF_X, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = 0, \
.imm = 0 })
/* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */
#define BPF_ALU64_IMM(OP, DST, IMM) \
((struct bpf_insn) { \
.code = BPF_ALU64 | BPF_OP(OP) | BPF_K, \
.dst_reg = DST, \
.src_reg = 0, \
.off = 0, \
.imm = IMM })
#define BPF_ALU32_IMM(OP, DST, IMM) \
((struct bpf_insn) { \
.code = BPF_ALU | BPF_OP(OP) | BPF_K, \
.dst_reg = DST, \
.src_reg = 0, \
.off = 0, \
.imm = IMM })
/* Short form of mov, dst_reg = src_reg */
#define BPF_MOV64_REG(DST, SRC) \
((struct bpf_insn) { \
.code = BPF_ALU64 | BPF_MOV | BPF_X, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = 0, \
.imm = 0 })
#define BPF_MOV32_REG(DST, SRC) \
((struct bpf_insn) { \
.code = BPF_ALU | BPF_MOV | BPF_X, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = 0, \
.imm = 0 })
/* Short form of mov, dst_reg = imm32 */
#define BPF_MOV64_IMM(DST, IMM) \
((struct bpf_insn) { \
.code = BPF_ALU64 | BPF_MOV | BPF_K, \
.dst_reg = DST, \
.src_reg = 0, \
.off = 0, \
.imm = IMM })
#define BPF_MOV32_IMM(DST, IMM) \
((struct bpf_insn) { \
.code = BPF_ALU | BPF_MOV | BPF_K, \
.dst_reg = DST, \
.src_reg = 0, \
.off = 0, \
.imm = IMM })
/* BPF_LD_IMM64 macro encodes single 'load 64-bit immediate' insn */
#define BPF_LD_IMM64(DST, IMM) \
BPF_LD_IMM64_RAW(DST, 0, IMM)
#define BPF_LD_IMM64_RAW(DST, SRC, IMM) \
((struct bpf_insn) { \
.code = BPF_LD | BPF_DW | BPF_IMM, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = 0, \
.imm = (__u32) (IMM) }), \
((struct bpf_insn) { \
.code = 0, /* zero is reserved opcode */ \
.dst_reg = 0, \
.src_reg = 0, \
.off = 0, \
.imm = ((__u64) (IMM)) >> 32 })
#ifndef BPF_PSEUDO_MAP_FD
# define BPF_PSEUDO_MAP_FD 1
#endif
/* pseudo BPF_LD_IMM64 insn used to refer to process-local map_fd */
#define BPF_LD_MAP_FD(DST, MAP_FD) \
BPF_LD_IMM64_RAW(DST, BPF_PSEUDO_MAP_FD, MAP_FD)
/* Direct packet access, R0 = *(uint *) (skb->data + imm32) */
#define BPF_LD_ABS(SIZE, IMM) \
((struct bpf_insn) { \
.code = BPF_LD | BPF_SIZE(SIZE) | BPF_ABS, \
.dst_reg = 0, \
.src_reg = 0, \
.off = 0, \
.imm = IMM })
/* Memory load, dst_reg = *(uint *) (src_reg + off16) */
#define BPF_LDX_MEM(SIZE, DST, SRC, OFF) \
((struct bpf_insn) { \
.code = BPF_LDX | BPF_SIZE(SIZE) | BPF_MEM, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = OFF, \
.imm = 0 })
/* Memory store, *(uint *) (dst_reg + off16) = src_reg */
#define BPF_STX_MEM(SIZE, DST, SRC, OFF) \
((struct bpf_insn) { \
.code = BPF_STX | BPF_SIZE(SIZE) | BPF_MEM, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = OFF, \
.imm = 0 })
/* Memory store, *(uint *) (dst_reg + off16) = imm32 */
#define BPF_ST_MEM(SIZE, DST, OFF, IMM) \
((struct bpf_insn) { \
.code = BPF_ST | BPF_SIZE(SIZE) | BPF_MEM, \
.dst_reg = DST, \
.src_reg = 0, \
.off = OFF, \
.imm = IMM })
/* Conditional jumps against registers, if (dst_reg 'op' src_reg) goto pc + off16 */
#define BPF_JMP_REG(OP, DST, SRC, OFF) \
((struct bpf_insn) { \
.code = BPF_JMP | BPF_OP(OP) | BPF_X, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = OFF, \
.imm = 0 })
/* Conditional jumps against immediates, if (dst_reg 'op' imm32) goto pc + off16 */
#define BPF_JMP_IMM(OP, DST, IMM, OFF) \
((struct bpf_insn) { \
.code = BPF_JMP | BPF_OP(OP) | BPF_K, \
.dst_reg = DST, \
.src_reg = 0, \
.off = OFF, \
.imm = IMM })
/* Raw code statement block */
#define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM) \
((struct bpf_insn) { \
.code = CODE, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = OFF, \
.imm = IMM })
/* Program exit */
#define BPF_EXIT_INSN() \
((struct bpf_insn) { \
.code = BPF_JMP | BPF_EXIT, \
.dst_reg = 0, \
.src_reg = 0, \
.off = 0, \
.imm = 0 })
int bpf_parse_common(struct bpf_cfg_in *cfg, const struct bpf_cfg_ops *ops);
int bpf_load_common(struct bpf_cfg_in *cfg, const struct bpf_cfg_ops *ops,
void *nl);
int bpf_parse_and_load_common(struct bpf_cfg_in *cfg,
const struct bpf_cfg_ops *ops, void *nl);
const char *bpf_prog_to_default_section(enum bpf_prog_type type);
int bpf_graft_map(const char *map_path, uint32_t *key, int argc, char **argv);
int bpf_trace_pipe(void);
void bpf_print_ops(struct rtattr *bpf_ops, __u16 len);
int bpf_prog_load_dev(enum bpf_prog_type type, const struct bpf_insn *insns,
size_t size_insns, const char *license, __u32 ifindex,
char *log, size_t size_log);
int bpf_program_load(enum bpf_prog_type type, const struct bpf_insn *insns,
size_t size_insns, const char *license, char *log,
size_t size_log);
int bpf_prog_attach_fd(int prog_fd, int target_fd, enum bpf_attach_type type);
int bpf_prog_detach_fd(int target_fd, enum bpf_attach_type type);
int bpf_program_attach(int prog_fd, int target_fd, enum bpf_attach_type type);
int bpf_dump_prog_info(FILE *f, uint32_t id);
#ifdef HAVE_ELF
int bpf_send_map_fds(const char *path, const char *obj);
int bpf_recv_map_fds(const char *path, int *fds, struct bpf_map_aux *aux,
unsigned int entries);
#ifdef HAVE_LIBBPF
int iproute2_bpf_elf_ctx_init(struct bpf_cfg_in *cfg);
int iproute2_bpf_fetch_ancillary(void);
int iproute2_get_root_path(char *root_path, size_t len);
bool iproute2_is_pin_map(const char *libbpf_map_name, char *pathname);
bool iproute2_is_map_in_map(const char *libbpf_map_name, struct bpf_elf_map *imap,
struct bpf_elf_map *omap, char *omap_name);
int iproute2_find_map_name_by_id(unsigned int map_id, char *name);
int iproute2_load_libbpf(struct bpf_cfg_in *cfg);
#endif /* HAVE_LIBBPF */
#else
static inline int bpf_send_map_fds(const char *path, const char *obj)
{
return 0;
}
static inline int bpf_recv_map_fds(const char *path, int *fds,
struct bpf_map_aux *aux,
unsigned int entries)
{
return -1;
}
#ifdef HAVE_LIBBPF
static inline int iproute2_load_libbpf(struct bpf_cfg_in *cfg)
{
fprintf(stderr, "No ELF library support compiled in.\n");
return -1;
}
#endif /* HAVE_LIBBPF */
#endif /* HAVE_ELF */
const char *get_libbpf_version(void);
#endif /* __BPF_UTIL__ */
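`BPF_LD_IMM64_RAW()` expands to two `struct bpf_insn` slots. The first carries the opcode, the registers and the low 32 bits of the immediate. The second has the reserved opcode 0 and the high 32 bits. The sketch below shows that split and its inverse, using a local mirror of the `bpf_insn` layout so it needs no kernel headers. `struct insn`, `ld_imm64()` and `imm64_of()` are hypothetical. The opcode `0x18` is `BPF_LD | BPF_DW | BPF_IMM`.

```c
#include <stdint.h>

/* Local mirror of struct bpf_insn's layout from linux/bpf.h. */
struct insn {
	uint8_t code;
	uint8_t dst_reg:4;
	uint8_t src_reg:4;
	int16_t off;
	int32_t imm;
};

/* Split a 64-bit immediate across two slots, as BPF_LD_IMM64_RAW does. */
static void ld_imm64(struct insn out[2], uint8_t code, uint8_t dst,
		     uint64_t imm)
{
	out[0] = (struct insn){ .code = code, .dst_reg = dst,
				.imm = (int32_t)(uint32_t)imm };
	out[1] = (struct insn){ .code = 0,	/* reserved opcode */
				.imm = (int32_t)(uint32_t)(imm >> 32) };
}

/* Reassemble the immediate from the two slots. */
static uint64_t imm64_of(const struct insn in[2])
{
	return (uint64_t)(uint32_t)in[0].imm |
	       ((uint64_t)(uint32_t)in[1].imm << 32);
}
```

`BPF_LD_MAP_FD()` uses the same two-slot form with `src_reg = BPF_PSEUDO_MAP_FD`. That value tells the kernel to treat the immediate as a map file descriptor rather than a literal.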

include/cg_map.h

@ -0,0 +1,6 @@
#ifndef __CG_MAP_H__
#define __CG_MAP_H__
const char *cg_id_to_path(__u64 id);
#endif /* __CG_MAP_H__ */

Some files were not shown because too many files have changed in this diff.