Commit Graph

2633 Commits

Author SHA1 Message Date
Stephen Hemminger bbe2abdf3d vv4.6.0 2016-05-18 11:56:02 -07:00
David Ahern b0a4ce620e ip link: Add support for kernel side filtering
Kernel gained support for filtering link dumps with commit dc599f76c22b
("net: Add support for filtering link dump by master device and kind").
Add support to ip link command. If a user passes master device or
kind to ip link command they are added to the link dump request message.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-05-18 11:52:14 -07:00
Jamal Hadi Salim fdf1bdd0f1 tc simple action update and breakage
Brings it closer to more serious actions (adding branching
and allowing for late binding)

Unfortunately this breaks old syntax of the simple action.
But because simple is a pedagogical example unlikely to be used
in production environments (i.e its role is to serve as an example
on how to write actions), then this is ok.

New syntax for simple has new keyword "sdata". Example usage is:

sudo tc actions add action simple sdata "foobar" index 1
or
tc filter add dev $DEV parent ffff: protocol ip prio 1 u32\
match ip dst 17.0.0.1/32 flowid 1:10 action simple sdata "foobar"

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-05-16 11:15:12 -07:00
Jamal Hadi Salim 43726b750a tc: don't ignore ok as an action branch
This is what used to happen before:

tc filter add dev tap1 parent ffff: protocol 0xfefe prio 10 \
     u32 match u32 0 0 flowid 1:16 \
     action ife decode allow mark ok

tc -s filter ls dev tap1 parent ffff:
filter protocol [65278] pref 10 u32
filter protocol [65278] pref 10 u32 fh 800: ht divisor 1
filter protocol [65278] pref 10 u32 fh 800::800 order 2048 key ht 800
bkt 0 flowid 1:16
  match 00000000/00000000 at 0
        action order 1: ife decode action pipe
         index 2 ref 1 bind 1 installed 4 sec used 4 sec
         type: 0x0
         Metadata: allow mark
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

        action order 2: gact action pass
         random type none pass val 0
         index 1 ref 1 bind 1 installed 4 sec used 4 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

Note the extra action added at the end..

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-05-16 11:13:58 -07:00
Jamal Hadi Salim d3e511223f tc: introduce IFE action
This action allows for a sending side to encapsulate arbitrary metadata
which is decapsulated by the receiving end.
The sender runs in encoding mode and the receiver in decode mode.
Both sender and receiver must specify the same ethertype.
At some point we hope to have a registered ethertype and we'll
then provide a default so the user doesnt have to specify it.
For now we enforce the user specify it.

Described in netdev01 paper:
   "Distributing Linux Traffic Control Classifier-Action Subsystem"
    Authors: Jamal Hadi Salim and Damascene M. Joachimpillai

Also refer to IETF draft-ietf-forces-interfelfb-04.txt

Lets show example usage where we encode icmp from a sender towards
a receiver with an skbmark of 17; both sender and receiver use
ethertype of 0xdead to interop.

YYYY: Lets start with Receiver-side policy config:
xxx: add an ingress qdisc
sudo tc qdisc add dev $ETH ingress

xxx: any packets with ethertype 0xdead will be subjected to ife decoding
xxx: we then restart the classification so we can match on icmp at prio 3
sudo $TC filter add dev $ETH parent ffff: prio 2 protocol 0xdead \
u32 match u32 0 0 flowid 1:1 \
action ife decode reclassify

xxx: on restarting the classification from above if it was an icmp
xxx: packet, then match it here and continue to the next rule at prio 4
xxx: which will match based on skb mark of 17
sudo tc filter add dev $ETH parent ffff: prio 3 protocol ip \
u32 match ip protocol 1 0xff flowid 1:1 \
action continue

xxx: match on skbmark of 0x11 (decimal 17) and accept
sudo tc filter add dev $ETH parent ffff: prio 4 protocol ip \
handle 0x11 fw flowid 1:1 \
action ok

xxx: Lets show the decoding policy
sudo tc -s filter ls dev $ETH parent ffff: protocol 0xdead
xxx:
filter pref 2 u32
filter pref 2 u32 fh 800: ht divisor 1
filter pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1  (rule hit 0 success 0)
  match 00000000/00000000 at 0 (success 0 )
	action order 1: ife decode action reclassify type 0x0
	 allow mark allow prio
	 index 11 ref 1 bind 1 installed 45 sec used 45 sec
	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

xxx:
Observe that above lists all metadatum it can decode. Typically these
submodules will already be compiled into a monolithic kernel or
loaded as modules

YYYY: Lets show the sender side now ..
xxx: Add an egress qdisc on the sender netdev
sudo tc qdisc add dev $ETH root handle 1: prio
xxx:
xxx: Match all icmp packets to 192.168.122.237/24, then
xxx: tag the packet with skb mark of decimal 17, then
xxx: Encode it with:
xxx:    ethertype 0xdead
xxx:    add skb->mark to whitelist of metadatum to send
xxx:    rewrite target dst MAC address to 02:15:15:15:15:15
xxx:
sudo $TC filter add dev $ETH parent 1: protocol ip prio 10  u32 \
match ip dst 192.168.122.237/24 \
match ip protocol 1 0xff \
flowid 1:2 \
action skbedit mark 17 \
action ife encode \
type 0xDEAD \
allow mark \
dst 02:15:15:15:15:15

xxx: Lets show the encoding policy
filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:2  (rule hit 118 success 0)
  match c0a87a00/ffffff00 at 16 (success 0 )
  match 00010000/00ff0000 at 8 (success 0 )
	action order 1:  skbedit mark 17
	 index 11 ref 1 bind 1 installed 3 sec used 3 sec
 	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

	action order 2: ife encode action pipe type 0xDEAD
	 allow mark dst 02:15:15:15:15:15
	 index 12 ref 1 bind 1 installed 3 sec used 3 sec
	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0
xxx:

Now test by sending ping from sender to destination

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-05-16 11:13:26 -07:00
Stephen Hemminger 29b7968926 add tc_ife.h 2016-05-16 11:13:05 -07:00
Stephen Hemminger c13363e4cd devlink: remove more unused code 2016-05-13 14:48:32 -07:00
subashab@codeaurora.org b38e740903 ss: Remove unused argument from kill_inet_sock
addr is not used here.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
2016-05-13 14:47:28 -07:00
Stephen Hemminger 719c443bb8 devlink: remove unused code
Unused code causes warnings, removed.
2016-05-13 14:42:06 -07:00
Stephen Hemminger 8a781d7e25 update kernel headers to 4.6-rc6
Close to final upstream headers
2016-05-13 14:41:45 -07:00
Stephen Hemminger 7aca60c0eb Revert "devlink: implement shared buffer support"
This reverts commit b56700bf8a.
2016-05-13 14:38:47 -07:00
Stephen Hemminger c3d25ec392 Revert "devlink: implement shared buffer occupancy control"
This reverts commit a60ebcb6f3.
2016-05-13 14:38:38 -07:00
Edward Cree 2642b6b03e geneve: fix IPv6 remote address reporting
Since we can only configure unicast, we probably want to be able to
display unicast, rather than multicast.

Fixes: 906ac5437a ("geneve: add support for IPv6 link partners")
Signed-off-by: Edward Cree <ecree@solarflare.com>
2016-05-13 14:31:55 -07:00
Jiri Benc 7c337e2c20 ip link gre: print only relevant info in external mode
Display only attributes that are relevant when a GRE interface is in
'external' mode instead of the default values (which are ignored by the
kernel even if passed back).

Fixes: 926b39e1fe ("gre: add support for collect metadata flag")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
2016-05-06 11:49:08 -07:00
Jiri Benc df217d5d5c ip link gre: create interfaces in external mode correctly
For GRE interfaces in 'external' mode, the kernel ignores all manual
settings like remote IP address or TTL. However, for some of those
attributes, kernel checks their value and does not allow them to be zero
(even though they're ignored later).

Currently, 'ip link' always includes all attributes in the netlink message.
This leads to problem with creating interfaces in 'external' mode. For
example, this command does not work:

ip link add gre1 type gretap external

and needs a bogus remote IP address to be specified, as the kernel enforces
remote IP address to be either not present, or not null.

Ignore the parameters that do not make sense in 'external' mode.
Unfortunately, we cannot error out, as there may be existing deployments
that workarounded the bug by specifying bogus values.

Fixes: 926b39e1fe ("gre: add support for collect metadata flag")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
2016-05-06 11:49:08 -07:00
Quentin Monnet 27d44f3a8a tc: add bash-completion function
Add function for command completion for tc in bash, and update Makefile
to install it under /usr/share/bash-completion/completions/.

Inside iproute2 repository, the completion code is in a new
`bash-completion` toplevel directory.

v2: Remove `if` statement in Makefile: do not try to install in
    /etc/bash_completion.d/ if /usr/share/bash-completion/completions/
    is not found; instead, the user can override the installation path
    with the specific environment variable.

Signed-off-by: Quentin Monnet <quentin.monnet@6wind.com>
2016-05-03 09:53:26 -07:00
Jiri Pirko 4bf138d6d2 devlink: add manpage for shared buffer
Manpage for devlink "sb" object.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-04-19 08:01:05 -07:00
Jiri Pirko a60ebcb6f3 devlink: implement shared buffer occupancy control
Use kernel shared buffer occupancy control commands to make snapshot and
clear occupancy watermarks. Also, allow to show occupancy values in a
nice way.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-04-19 08:01:05 -07:00
Jiri Pirko b56700bf8a devlink: implement shared buffer support
Implement kernel devlink shared buffer interface. Introduce new object
"sb" and allow to browse the shared buffer parameters and also change
configuration.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-04-19 08:01:05 -07:00
Jiri Pirko 2f85a9c535 devlink: allow to parse both devlink and port handle in the same time
For filtering purposes, it makes sense for used to either specify
devlink handle of port handle.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-04-19 08:01:05 -07:00
Jiri Pirko 707a91c549 devlink: introduce dump filtering function
This function is to be used from dump callbacks to decide if the output
currect output should be filtered off or not. Filtering is based on
previously parsed and stored command line options.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-04-19 08:01:05 -07:00
Jiri Pirko 6563a6eb53 devlink: split dl_argv_parse_put to parse and put parts
It is handy to have parsed cmdline data stored so they can be used for
dumps filtering. So split original dl_argv_parse_put into parse and put
parts.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-04-19 08:01:05 -07:00
Jiri Pirko 43f35be4eb devlink: introduce helper to print out nice names (ifnames)
By default, ifnames will be printed out. User can turn that off using
"-n" option on the command line.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-04-19 08:01:05 -07:00
Jiri Pirko 68cab0ba76 devlink: introduce pr_out_port_handle helper
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-04-19 08:01:05 -07:00
Jiri Pirko ebaf76b55e list: add list_add_tail helper
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-04-19 08:01:05 -07:00
Jiri Pirko f1239ca1f9 list: add list_for_each_entry_reverse macro
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-04-19 08:01:05 -07:00
Jiri Pirko ec7513faa9 devlink: fix "devlink port" help message
"dl" -> "devlink"

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-04-19 08:01:04 -07:00
Eric Dumazet d9ba887e9d ss: take care of unknown min_rtt
Kernel sets info->tcpi_min_rtt to ~0U when no RTT sample was ever
taken for the session, thus min_rtt is unknown.

Signed-off-by: Eric Dumazet <edumazet@google.com>
2016-04-19 07:56:54 -07:00
Phil Sutter e56a959e55 ss: Fix accidental state filter override
Passing a filter expression and selecting an address family using the
'-f' flag would overwrite the state filter by accident. Therefore
calling e.g. 'ss -nl -f inet '(sport = :22)' would not only print
listening sockets (as requested by '-l' flag) but connected ones, as
well.

Fix this by reusing the formerly ineffective call to filter_states_set()
to restore the state filter as it was before the call to
filter_af_set().

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-04-19 07:56:53 -07:00
Phil Sutter 9d320e1e92 ss: Drop silly assignment
An expression of the form '(a | b) & b' will evaluate to the value of b
for any value of a or b.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-04-19 07:56:53 -07:00
Jeff Harris 1c346dccf8 ip: neigh: Fix leftover attributes message during flush
Use the same rtnl_dump_request_n call as the show.  The rtnl_wilddump_request
assumes the type uses an ifinfomsg which is not the case for the neighbor
table.

Signed-off-by: Jeff Harris <jefftharris@gmail.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
2016-04-19 07:52:33 -07:00
Stephen Hemminger bbac6c6301 ip: whitespace cleanup
Fix whitespace
2016-04-11 22:13:55 +00:00
Phil Sutter fe9322781e ip-link: Support printing VF trust setting
This adds a new item to VF lines of a PF, stating whether the VF is
trusted or not.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-04-11 22:11:33 +00:00
Gustavo Zacarias 5c5a0f3df9 iproute2: tc_bpf.c: fix building with musl libc
We need limits.h for PATH_MAX, fixes:

tc_bpf.c: In function ‘bpf_map_selfcheck_pinned’:
tc_bpf.c:222:12: error: ‘PATH_MAX’ undeclared (first use in this
function)
  char file[PATH_MAX], buff[4096];

Signed-off-by: Gustavo Zacarias <gustavo@zacarias.com.ar>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2016-04-11 22:09:57 +00:00
Stephen Hemminger 11522e7d02 ip: only display phys attributes with details option
Since output of ip commands are already cluttered, move the physical port details
under a show_details option.
2016-04-11 22:07:51 +00:00
Nicolas Dichtel df590401d6 iplink: display IFLA_PHYS_PORT_NAME
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2016-04-11 22:02:21 +00:00
Daniel Borkmann 4dd3f50af4 tc, bpf: add support for map pre/allocation
Follow-up to kernel commit 6c9059817432 ("bpf: pre-allocate hash map
elements"). Add flags support, so that we can pass in BPF_F_NO_PREALLOC
flag for disallowing preallocation. Update examples accordingly and also
remove the BPF_* map helper macros from them as they were not very useful.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-04-11 21:54:47 +00:00
Daniel Borkmann afc1a2000b tc, bpf: further improve error reporting
Make it easier to spot issues when loading the object file fails. This
includes reporting in what pinned object specs differ, better indication
when we've reached instruction limits. Don't retry to load a non relo
program once we failed with bpf(2), and report out of bounds tail call key.

Also, add truncation of huge log outputs by default. Sometimes errors are
quite easy to spot by only looking at the tail of the verifier log, but
logs can get huge in size e.g. up to few MB (due to verifier checking all
possible program paths). Thus, by default limit output to the last 4096
bytes and indicate that it's truncated. For the full log, the verbose option
can be used.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-04-11 21:53:58 +00:00
Daniel Borkmann 0395711c52 tc, bpf: add new csum and tunnel signatures
Add new signatures for BPF_FUNC_csum_diff, BPF_FUNC_skb_get_tunnel_opt
and BPF_FUNC_skb_set_tunnel_opt.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-04-11 21:53:58 +00:00
Nikolay Aleksandrov 5a2d0201cc bridge: vlan: add support to filter by vlan id
Add the optional keyword "vid" to bridge vlan show so the user can
request filtering by a specific vlan id. Currently the filtering is
implemented only in user-space. The argument name has been chosen to
match the add/del one - "vid". This filtering can be used also with the
"-compressvlans" option to see in which range is a vlan (if in any).
Also this will be used to show only specific per-vlan statistics later
when support is added to the kernel for it.

Examples:
$ bridge vlan show vid 450
port	vlan ids
eth2	 450

$ bridge -c vlan show vid 450
port	vlan ids
eth2	 400-500

$ bridge vlan show vid 1
port	vlan ids
eth1	 1 PVID Egress Untagged
eth2	 1 PVID
br0	 1 PVID Egress Untagged

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-04-11 21:52:47 +00:00
Nikolay Aleksandrov 24687d678f bridge: mdb: add support to filter by vlan id
Add the optional keyword "vid" to bridge mdb show so the user can
request filtering by a specific vlan id. Currently the filtering is
implemented only in user-space. The argument name has been chosen to match
the add/del one - "vid".

Example:
$ bridge mdb show vid 200
dev br0 port eth2 grp 239.0.0.1 permanent vid 200

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-04-11 21:52:47 +00:00
Nikolay Aleksandrov ae6eb9075f bridge: fdb: add support to filter by vlan id
Add the optional keyword "vlan" to bridge fdb show so the user can request
filtering by a specific vlan id. Currently the filtering is implemented
only in user-space. The argument name has been chosen to match the
add/del one - "vlan".

Example:
$ bridge fdb show vlan 400
52:54:00:bf:57:16 dev eth2 vlan 400 master br0 permanent

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2016-04-11 21:52:47 +00:00
Eric Dumazet f1c656e5c0 iplink: display number of rx/tx queues
We can set the attributes, so would be nice to display them when
provided by the kernel.

Signed-off-by: Eric Dumazet <edumazet@google.com>
2016-04-11 21:51:28 +00:00
Stephen Hemminger 6268b08c13 update kernel headers
Update from 4.6-rc3
2016-04-11 13:40:40 -07:00
Stephen Hemminger 3273e3c132 devlink: ignore build result
devlink binary is built
2016-04-11 13:35:12 -07:00
Daniel Borkmann 29bb2373a8 geneve: add support to set flow label
Follow-up for kernel commit 8eb3b99554b8 ("geneve: support setting
IPv6 flow label") to allow setting the label for the device config.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-03-27 10:58:48 -07:00
Daniel Borkmann f8eb79a624 vxlan: add support to set flow label
Follow-up for kernel commit e7f70af111f0 ("vxlan: support setting
IPv6 flow label") to allow setting the label for the device config.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-03-27 10:58:48 -07:00
Jiri Pirko a3c4b484a1 add devlink tool
Add new tool called devlink which is userspace counterpart of devlink
Netlink socket.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-03-27 10:57:15 -07:00
Jiri Pirko 4952b45946 include: add linked list implementation from kernel
Rename hlist.h to list.h while adding it to be aligned with kernel

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-03-27 10:56:11 -07:00
Jesse Gross 325d02b44c geneve: Add support for configuring UDP checksums.
Enable support for configuring outer UDP checksums on Geneve tunnels:

ip link add type geneve id 10 remote 10.0.0.2 udpcsum

Signed-off-by: Jesse Gross <jesse@kernel.org>
2016-03-27 10:53:25 -07:00