Since commit 625df645b7 (Check user supplied interface name lengths)
iplink_parse() validates network device name using check_ifname()
helpers.
Remove redundant "name" length checks from iplink_parse() callers.
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Do not stop parameters processing after "alias" parameter: it might
not be a last one. Seems copy pasted from "type" parameter code.
Check it's length does not exceed IFALIASZ - 1. Better we warn
than get RTNL error.
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Correctly check for valid network device index supplied on
command line: indexes are always greather than zero. Check
for duplicate "index" argument.
Initialize @index to 0 to simplify handling it in iplink_modify().
Other callers (link_veth.c, iplink_vxcan.c) already did so.
No need to initialize ifi_index with 0 since it is already
initialized at the @struct req initialization time and not
modified in iplink_parse().
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Manual page ip-link(8) states that both local and remote accept
IPADDR not PREFIX. Use get_addr() instead of get_prefix() to
parse local/remote endpoint address correctly.
Force corresponding address family instead of using preferred_family
to catch weired cases as shown below.
Before this patch it is possible to create tunnel with commands:
ip li add dev ip6gre2 type ip6gre local fe80::1/64 remote fe80::2/64
ip -4 li add dev ip6gre2 type ip6gre local 10.0.0.1/24 remote 10.0.0.2/24
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
It is fully legal to submit zero (INADDR_ANY/IN6ADDR_ANY_INIT)
value for local and/or remote endpoints for all tunnel drivers:
no need additionally check this in userspace.
Note that all tunnel specific code already can pass zero address
to the kernel.
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
The patch adds 'external' option to support collect metadata
gre6 tunnel. The 'external' keyword is already used to set the
device into collect metadata mode such as vxlan, geneve, ipip,
etc. This patch extends support for ipv6 gre and gretap.
Example of L3 and L2 gre device:
bash:~# ip link add dev ip6gre123 type ip6gre external
bash:~# ip link add dev ip6gretap123 type ip6gretap external
Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Until kernel exports these, add GSO_MAX values into iplink
rather than assuming they are UINT_MAX + 1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Validate the upper limit for gso_max_size, valid range is [0-65,536]
inclusive. Fix minor whitespace in iplink man page.
Signed-off-by: Solio Sarabia <solio.sarabia@intel.com>
Add missing tag 'vxcan' inside the help text which was missing in commit
efe459c76d ('ip: link add vxcan support').
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Recently `external` support was added to the tunnel drivers, but there is no way
to introspect this from userspace. This adds support for that.
Now `ip -details link` shows it:
```
7: tunl60@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group
default qlen 1
link/tunnel6 :: brd :: promiscuity 0
ip6tnl external any remote :: local :: encaplimit 0 hoplimit 0 tclass 0x00 flowlabel 0x00000 (flowinfo 0x00000000) addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
```
Signed-off-by: Phil Dibowitz <phil@ipom.com>
This allows sending GSO maximum values when configuring a device.
The values are advisory. Most devices will ignore them but for some
pseudo devices such as veth pairs they can be set.
Example:
# ip link add dev vm1 type veth peer name vm2 gso_max_size 32768
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Specifying the IFLA_VXLAN_LINK attribute on a vxlan link modify is
optional in the kernel, so make the id argument optional for "ip link
set ..." to avoid a user needing to specify it when changing another
attribute.
Signed-off-by: Robert Shearman <rs823p@att.com>
Specifying "... ttl inherit" currently does nothing on a GRE link
modify since the previous ttl value is retrieved up front. Fix this by
explicitly setting ttl to 0 when "inherit" is specified for the
option, since 0 represents the semantics of inherit.
Signed-off-by: Robert Shearman <rs823p@att.com>
Looks like a typo: get_u8() returns 0 on success and -1 on error, so the
error checking here was ineffective.
Fixes: a11b7b71a6 ("link_gre6: really support encaplimit option")
Signed-off-by: Phil Sutter <phil@nwl.cc>
When xdpoffload option is used, communicate the ifindex down
to the kernel to trigger device-specific load.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
bpf_parse_common() parses and loads the program. Rename it
accordingly.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Program type is needed both for parsing and loading of
the program. Parsing may also induce the type based on
signatures from __bpf_prog_meta. Instead of passing
the type around keep it in struct bpf_cfg_in.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
For all files in iproute2 which do not have an obvious license
identification, mark them with SPDK GPL-2
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This patch adapts the tc command line interface to allow bandwidth limits
to be specified as a percentage of the interface's capacity.
Adding this functionality requires passing the specified device string to
each class/qdisc which changes the prototype for a couple of functions: the
.parse_qopt and .parse_copt interfaces. The device string is a required
parameter for tc-qdisc and tc-class, and when not specified, the kernel
returns ENODEV. In this patch, if the user tries to specify a bandwidth
percentage without naming the device, we return an error from userspace.
Signed-off-by: Nishanth Devarajan<ndev2021@gmail.com>
Expose identifier type and hook types in ILA configuraiton
and reporting. This adds support in both ip ila ILA LWT.
Signed-off-by: Tom Herbert <tom@quantonium.net>
Configuration support in both ip ila and ip LWT for checksum
neutral-map-auto. This is a mode of ILA where checksum
neutral mapping is assumed for packets (there is no C-bit
in the identifier to indicate checksum neutral).
Signed-off-by: Tom Herbert <tom@quantonium.net>
Add checksum neutral to ip ila configuration. This control whether
the C-bit is interpreted as checksum neutral bit.
Signed-off-by: Tom Herbert <tom@quantonium.net>
Sample output:
$ sudo ./ip/ip fou add port 111 ipproto 11
$ sudo ./ip/ip fou add port 222 ipproto 22 -6
$ ./ip/ip fou show
port 222 ipproto 22 -6
port 111 ipproto 11
Signed-off-by: Greg Greenway <ggreenway@apple.com>
As was reported [1], the iproute2 fails to compile on old systems,
in Cong's case, it was Fedora 19, in our case it was RedHat 7.2, which
failed with the following errors during compilation:
ipxfrm.c: In function ‘xfrm_selector_print’:
ipxfrm.c:479:7: error: ‘IPPROTO_MH’ undeclared (first use in this
function)
case IPPROTO_MH:
^
ipxfrm.c:479:7: note: each undeclared identifier is reported only once
for each function it appears in
ipxfrm.c: In function ‘xfrm_selector_upspec_parse’:
ipxfrm.c:1345:8: error: ‘IPPROTO_MH’ undeclared (first use in this
function)
case IPPROTO_MH:
^ make[1]: *** [ipxfrm.o] Error 1
The reason to it is the order of headers files. The IPPROTO_MH field is
set in kernel's UAPI header file (in6.h), but only in case
__UAPI_DEF_IPPROTO_V6 is set before. That define comes from other kernel's
header file (libc-compat.h) and is set in case there are no previous
libc relevant declarations.
In ip code, the include of <netdb.h> causes to indirect inclusion of
<netinet/in.h> and it sets __UAPI_DEF_IPPROTO_V6 to be zero and prevents from
IPPROTO_MH declaration.
This patch takes the simplest possible approach to fix the compilation
error by checking if IPPROTO_MH was defined before and in case it
wasn't, it defines it to be the same as in the kernel.
[1] https://www.spinics.net/lists/netdev/msg463980.html
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Riad Abo Raed <riada@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Any iproute utility that uses any function from lib/utils.c needs
to declare its own resolve_hosts variable instance although it does
not need/use hostname resolving functionality (currently only 'ip'
and 'ss' commands uses this).
The patch declares single common instance of resolve_hosts directly
in utils.c so the existing ones can be removed (the same approach
that is used for timestamp_short).
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Using 'ip deleteall' with policies that have marks, fails unless you
eplicitely specify the mark values. This is very uncomfortable when
bulk-deleting policies and states. With this patch all relevant states
and policies are wiped by 'ip deleteall' regardless of their mark
values.
Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
Socket polices are added to a socket using setsockopt(2). They cannot be
deleted by iproute2. The attempt to delete them causes an error
(EINVAL).
To avoid this unnecessary error message all socket policies are skipped
in xfrm_policy_keep.
Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
Listing policies on systems with a lot of socket policies can be
confusing due to the number of returned polices. Even if socket polices
are not of interest, they cannot be filtered. This patch adds an option
to filter all socket policies from the output.
Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
IPvlan supported bridge-only functionality prior to commits
a190d04db937 ('ipvlan: introduce 'private' attribute for all
existing modes.') and fe89aa6b250c ('ipvlan: implement VEPA mode').
These two commits allow to configure the VEPA and private modes now.
This patch adds those options in ip command.
e.g.
bash:~# ip link add link eth0 name ipvl0 type ipvlan mode l2 private
-or-
bash:~# ip link add link eth0 type ipvl0 type ipvlan mode l2 vepa
Also the output will reflect the mode and the mode-flag accordingly.
e.g.
bash:~# ip -details link show ipvl0
4: ipvl0@eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc ...
link/ether 00:1a:11:44:a5:3e brd ff:ff:ff:ff:ff:ff promiscuity 0
ipvlan mode l2 private addrgenmode eui64 numtxqueues 1 ...
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
This patch adds fastopen_no_cookie option to enable/disable TCP fastopen
without a cookie on a per-route basis.
Support in Linux was added with 71c02379c762 (tcp: Configure TFO without
cookie per socket and/or per route).
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Use strtol-based API to parse and validate integer input; atoi() does
not detect errors and may yield undefined behaviour if result can't be
represented.
v2: use get_unsigned() since network namespace is really an unsigned value.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
IP6_TNL_F_ALLOW_LOCAL_REMOTE allows tunnel traffic on ip6tnl devices
where the remote endpoint is a local host address.
Specifying "[no]allow-localremote" controls the
IP6_TNL_F_ALLOW_LOCAL_REMOTE flag on ip6tnl interfaces.
This is the user-space counterpart for kernel
commit 908d140a87a7 ("ip6_tunnel: Allow rcv/xmit even if remote address is a local address")
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
This config maps to IFLA_BRPORT_VLAN_TUNNEL bridge port netlink
flag attribute. This flag enables vlan to tunnel mapping on a bridge
port. It is off by default.
set vlan_tunnel attribute on bridge port vxlan0:
$ip link set dev vxlan0 type bridge_slave vlan_tunnel on
$ip link set dev vxlan0 type bridge_slave vlan_tunnel off
or via bridge command
$bridge link set dev vxlan0 vlan_tunnel on
$bridge link set dev vxlan0 vlan_tunnel off
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
This is an update for 460c03f3f3 ("iplink: double the buffer size also in
iplink_get()"). After update, we will not need to double the buffer size
every time when VFs number increased.
With call like rtnl_talk(&rth, &req.n, NULL, 0), we can simply remove the
length parameter.
With call like rtnl_talk(&rth, nlh, nlh, sizeof(req), I add a new variable
answer to avoid overwrite data in nlh, because it may has more info after
nlh. also this will avoid nlh buffer not enough issue.
We need to free answer after using.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Add neigh_suppress to the type help and document it in ip-link's man page.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Commit 530903dd90 ("ip: fix igmp parsing when iface is long") uses
variable len to keep trailing colon from interface name comparison. This
variable is local to loop body but we set it in one pass and use it in
following one(s) so that we are actually using (pseudo)random length for
comparison. This became apparent since commit b48a1161f5 ("ipmaddr: Avoid
accessing uninitialized data") always initializes len to zero so that the
name comparison is always true. As a result, "ip maddr show dev eth0" shows
IPv4 multicast addresses for all interfaces.
Instead of keeping the length, let's simply replace the trailing colon with
a null byte. The bonus is that we get correct interface name in ma.name.
Fixes: 530903dd90 ("ip: fix igmp parsing when iface is long")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Phil Sutter <phil@nwl.cc>
Acked-by: Petr Vorel <pvorel@suse.cz>
This patch adds the iproute2 support for getting and setting the
per-port group_fwd_mask. It also tries to resolve the value into a more
human friendly format by printing the known protocols instead of only
the raw value.
The man page is also updated with the new option.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
neigh suppression can be used to suppress arp and nd flood
to bridge ports. It maps to the recently added
kernel support for bridge port flag IFLA_BRPORT_NEIGH_SUPPRESS.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Since kernel net-next commit c7c0bbeae950 ("net: ipmr: Add MFC offload
indication") the kernel indicates on an MFC entry whether it was offloaded
using the RTNH_F_OFFLOAD flag. Update the "ip mroute show" command to
indicate when a route is offloaded, similarly to the "ip route show"
command.
Example output:
$ ip mroute
(0.0.0.0, 239.255.0.1) Iif: sw1p7 Oifs: t_br0 State: resolved offload
(192.168.1.1, 239.255.0.1) Iif: sw1p7 Oifs: sw1p4 State: resolved offload
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
The original problem was that something like:
| strncpy(ifr.ifr_name, *argv, IFNAMSIZ);
might leave ifr.ifr_name unterminated if length of *argv exceeds
IFNAMSIZ. In order to fix this, I thought about replacing all those
cases with (equivalent) calls to snprintf() or even introducing
strlcpy(). But as Ulrich Drepper correctly pointed out when rejecting
the latter from being added to glibc, truncating a string without
notifying the user is not to be considered good practice. So let's
excercise what he suggested and reject empty, overlong or otherwise
invalid interface names right from the start - this way calls to
strncpy() like shown above become safe and the user has a chance to
reconsider what he was trying to do.
Note that this doesn't add calls to check_ifname() to all places where
user supplied interface name is parsed. In many cases, the interface
must exist already and is therefore looked up using ll_name_to_index(),
so if_nametoindex() will perform the necessary checks already.
Signed-off-by: Phil Sutter <phil@nwl.cc>
In both files' parse_args() functions as well as in iptunnel's do_prl()
and do_6rd() functions, a user-supplied 'dev' parameter is uselessly
copied into a temporary buffer before passing it to ll_name_to_index()
or copying into a struct ifreq. Avoid this by just caching the argv
pointer value until the later lookup/strcpy.
Signed-off-by: Phil Sutter <phil@nwl.cc>
When SA is added manually using "ip xfrm state add", xfrm_state_modify()
uses alg_key_len field of struct xfrm_algo for the length of key passed to
kernel in the netlink message. However alg_key_len is bit length of the key
while we need byte length here. This is usually harmless as kernel ignores
the excess data but when the bit length of the key exceeds 512
(XFRM_ALGO_KEY_BUF_SIZE), it can result in buffer overflow.
We can simply divide by 8 here as the only place setting alg_key_len is in
xfrm_algo_parse() where it is always set to a multiple of 8 (and there are
already multiple places using "algo->alg_key_len / 8").
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
This fixes a corner-case for routes with a certain metric locked to
zero:
| ip route add 192.168.7.0/24 dev eth0 window 0
| ip route add 192.168.7.0/24 dev eth0 window lock 0
Since the kernel doesn't dump the attribute if it is zero, both routes
added above would appear as if they were equal although they are not.
Fix this by taking mxlock value for the given metric into account before
skipping it if it is not present.
Reported-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
As Stephen Hemminger mentioned on the last submission the new_json_obj
function is always called with fp == stdout, so right now, there's no
need of this extra argument.
The background for the rework is the following:
The ip monitor didn't call `new_json_obj` (even for in non json context),
so the static FILE* _fp variable wasn't initialized, thus raising a
SIGSEGV in ipaddress.c. This patch should fix this issue for good, new
paths won't have to call `new_json_obj`.
How to reproduce:
$ ip -t mon label link
(gdb) bt
.#0 _IO_vfprintf_internal (s=s@entry=0x0, format=format@entry=0x45460d “%d: “, ap=ap@entry=0x7fffffff7f18) at vfprintf.c:1278
.#1 0x0000000000451310 in color_fprintf (fp=0x0, attr=<optimized out>, fmt=0x45460d “%d: “) at color.c:108
.#2 0x000000000044a856 in print_color_int (t=t@entry=PRINT_ANY, color=color@entry=4294967295, key=key@entry=0x4545fc “ifindex”,
fmt=fmt@entry=0x45460d “%d: “, value=<optimized out>) at ip_print.c:132
.#3 0x000000000040ccd2 in print_int (value=<optimized out>, fmt=0x45460d “%d: “, key=0x4545fc “ifindex”, t=PRINT_ANY) at ip_common.h:189
.#4 print_linkinfo (who=<optimized out>, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipaddress.c:1107
.#5 0x0000000000422e13 in accept_msg (who=0x7fffffff8320, ctrl=0x7fffffff8310, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipmonitor.c:89
.#6 0x000000000044c58f in rtnl_listen (rtnl=0x672160 <rth>, handler=handler@entry=0x422c70 <accept_msg>, jarg=0x7ffff77a82a0 <_IO_2_1_stdout_>)
at libnetlink.c:761
.#7 0x00000000004233db in do_ipmonitor (argc=<optimized out>, argv=0x7fffffffe5a0) at ipmonitor.c:310
.#8 0x0000000000408f74 in do_cmd (argv0=0x7fffffffe7f5 “mon”, argc=3, argv=0x7fffffffe588) at ip.c:116
.#9 0x0000000000408a94 in main (argc=4, argv=0x7fffffffe580) at ip.c:311
Fixes: 6377572f ("ip: ip_print: add new API to print JSON or regular format output")
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
Move the json printer which is based on json writer into the
iproute2 library, so it can be used by library code and tools
other than ip. Should probably have been done from the beginning
like that given json writer is in the library already anyway.
No functional changes.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Julien Fortin <julien@cumulusnetworks.com>
Obviously, 'addr showdump' feature wasn't adjusted to json output
support. As a consequence, calls to print_string() in print_addrinfo()
tried to dereference a NULL FILE pointer.
Fixes: d0e720111a ("ip: ipaddress.c: add support for json output")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Consolidate dump of prog info to use bpf_dump_prog_info() when possible.
Moving forward, we want to have a consistent output for BPF progs when
being dumped. E.g. in cls/act case we used to dump tag as a separate
netlink attribute before we had BPF_OBJ_GET_INFO_BY_FD bpf(2) command.
Move dumping tag into bpf_dump_prog_info() as well, and only dump the
netlink attribute for older kernels. Also, reuse bpf_dump_prog_info()
for XDP case, so we can dump tag and whether program was jited, which
we currently don't show.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Commit 72b365e8e0 ("libnetlink: Double the dump buffer size") increased
the buffer size for "ip link show" command to 32 KB to handle NICs with
large number of VFs. With "dev" filter, a different code path is taken and
iplink_get() still uses only 16 KB buffer.
The size of 32768 is not very future-proof as NICs supporting 120-128 VFs
are already in use so that single RTM_NEWLINK message in the dump can
exceed 30000 bytes. But it's what rtnl_talk() and rtnl_dump_filter_l() use
so let's be consistent. Once this proves insufficient, all three sizes
should be increased.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
If message length exceeds maxlen argument of rtnl_talk(), it is truncated
to maxlen but unlike in the case of truncation to the length of local
buffer in rtnl_talk(), the caller doesn't get any indication of a problem.
In particular, iplink_get() passes the truncated message on and parsing it
results in various warnings and sometimes even a segfault (observed with
"ip link show dev ..." for a NIC with 125 VFs).
Handle message truncation in iplink_get() the same way as truncation in
rtnl_talk() would be handled: return an error.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
This patch converts spots where manual buffer termination was missing to
strlcpy() since that does what is needed.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Print the value analogous to flowlabel. While being at it, also break
the overlong lines to not exceed 80 characters boundary.
Signed-off-by: Phil Sutter <phil@nwl.cc>
When trying to change tclass or flowlabel of a GREv6 tunnel which has
the respective value set already, the code accidentally bitwise OR'ed
the old and the new value, leading to unexpected results. Fix this by
clearing the relevant bits of flowinfo variable prior to assigning the
new value.
Fixes: af89576d7a ("iproute2: GRE over IPv6 tunnel support.")
Signed-off-by: Phil Sutter <phil@nwl.cc>
This patch adds support for the L2ENCAP seg6 mode, enabling to encapsulate
L2 frames within SRv6 packets.
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
The original issue was that filter.name might end up unterminated if
user provided string was too long. But in fact it is not necessary to
copy the commandline parameter at all: just make filter.name point to it
instead.
Signed-off-by: Phil Sutter <phil@nwl.cc>
The patch adds ERSPAN type II tunnel support. The implementation is
based on the draft at
https://tools.ietf.org/html/draft-foschiano-erspan-01.
One of the purposes is for Linux box to be able to receive ERSPAN
monitoring traffic sent from the Cisco switch, by creating a ERSPAN
tunnel device. In addition, the patch also adds ERSPAN TX, so traffic
can also be encapsulated into ERSPAN and sent out.
The implementation reuses the key as ERSPAN session ID, and
field 'erspan' as ERSPAN Index fields:
./ip link add dev ers11 type erspan seq key 100 erspan 123 \
local 172.16.1.200 remote 172.16.1.100
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Meenakshi Vohra <mvohra@vmware.com>
This renames Config to config.mk and includes more Make input.
Now configure generates all the required CFLAGS and LDLIBS for
the optional libraries.
Also, use pkg-config to test for libelf, rather than using a test
program. This makes it consistent with other libraries.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
can_state_names array contains at most CAN_STATE_MAX fields, so allowing
an index to it to be equal to that number is wrong. While here, also
make sure the array is indeed that big so nothing bad happens if
CAN_STATE_MAX ever increases.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Covscan complained about dead code but after reading it, I assume the
author's intention was to prefix the interface list with 'Oifs: '.
Initializing first to 1 and setting it to 0 after above prefix was
printed should fix it.
Signed-off-by: Phil Sutter <phil@nwl.cc>
This variable is initialized at declaration and nowhere else does any
assignment to it happen, so just drop the check.
Signed-off-by: Phil Sutter <phil@nwl.cc>
ila_csum_name2mode() returning -1 on error but being declared as
returning __u8 doesn't make much sense. Change the code to correctly
detect this issue. Checking for __u8 overruns shouldn't be necessary
though since ila_csum_name2mode() return values are well-defined.
Signed-off-by: Phil Sutter <phil@nwl.cc>
This prevents word-splitting and therefore leads to more accurate error
message in case 'grep -c' prints something other than a number.
Signed-off-by: Phil Sutter <phil@nwl.cc>
To avoid code duplication and have a ligther impact on most of the files,
these functions were made to handle both stdout (FP context) or JSON
output. Using this api, the changes are easier to read and the code
stays as compact as possible.
includes json_writer.h in ip_common.h to make the lib/json_writer.c
functions available to the new "ip_print" api.
Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
This patch adds support for the seg6local lightweight tunnel
("ip route add ... encap seg6local ...").
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Commit 69fed534a5 ("change how Config is used in Makefile's") moved
HAVE_MNL specific CFLAGS/LDLIBS for building with libmnl out of the
top level Makefile into sub-Makefiles. However, it also removed the
HAVE_ELF specific CFLAGS/LDLIBS entirely, which breaks the BPF object
loader for tc and ip with "No ELF library support compiled in." despite
having libelf detected in configure script. Fix it similarly as in
69fed534a5 for HAVE_ELF.
Fixes: 69fed534a5 ("change how Config is used in Makefile's")
Reported-by: Jeffrey Panneman <jeffrey.panneman@tno.nl>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The ikey and okey value are normal u32 values. The input accepts
them in dotted, hex or decimal form. For output, hex seems like
the best form since they are not really addresses.
Suggested-by: Christian Langrock <christian.langrock@secunet.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
According to the IPv4 behavior of 'ip' it should be possible
to omit the arguments for local and remote address.
Without this patch omitting these parameters would lead to
uninitialized memory being interpreted as IPv6 addresses.
Reported-by: Christian Langrock <christian.langrock@secunet.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
When ip netns {add|delete} is first run, it bind-mounts /var/run/netns
on top of itself, then marks it as shared. However, if there are already
bind-mounts in the directory from other tools, these would not be
propagated. Fix this by recursively bind-mounting.
Signed-off-by: Casey Callendrello <casey.callendrello@coreos.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Since kernel commit 475abbf1ef67 ("ipv4: fib: Set offload indication
according to nexthop flags") offload indication is reported on a
per-nexthop basis.
Adjust iproute2 to display it.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
For the most of the address flags, use a table of values rather
than open coding every value. This allows for easier inevitable
expansion of flags.
This also fixes the missing stable-privacy flag.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
ip netns accepts invalid input as namespace name like an empty string or a
string longer than the maximum file name length.
Check that the netns name is not empty and less than or equal to NAME_MAX.
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Ability to change geneve device attributes was added to kernel through
commit 5b861f6baa3a ("geneve: add rtnl changelink support"), however one
cannot do the same through ip-link(8) command. Changing the allowed
geneve device attributes using 'ip link set <geneve_name> type geneve id
<geneve_id> <allowed_attributes>' currently fails with 'operation not
supported' error. This patch adds support for it.
Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
This patch replaces exits with returns in ip route
commands.
Allows to continue when invoked with ip -batch.
Signed-off-by: Élie Bouttier <elie@bouttier.eu>
In the presence of firewalls which improperly block ICMP Unreachable
(including Fragmentation Required) messages, Path MTU Discovery is
prevented from working.
The workaround is to handle IPv4 payloads opaquely, ignoring the DF
bit.
Kernel commit 22a59be8b7693eb2d0897a9638f5991f2f8e4ddd ("net: ipv4:
Add ability to have GRE ignore DF bit in IPv4 payloads") is
complemented by this user-space changeset which exposes control of
this setting.
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Philip Prindeville <philipp@redfish-solutions.com>
ip netns keeps track of created namespaces with bind mounts named
/var/run/netns/<namespace>. No input sanitization is done, allowing creation and
deletion of files relatives to /var/run/netns or, if the path is non existent or
invalid, allows to create "untracked" namespaces (invisible to the tool).
This commit denies creation or deletion of namespaces with names contaning
"/" or matching exactly "." or "..".
Signed-off-by: Matteo Croce <mcroce@redhat.com>
This patch extends route get to support mpls specific
route attributes like RTA_NEWDST.
Input:
RTA_DST - input label
RTA_NEWDST - labels in packet for multipath selection
By default the getroute handler returns matched
nexthop label, via and oif
With fibmatch keyword (RTM_F_FIB_MATCH flag), full matched
route is returned.
example:
$ip -f mpls route show
101
nexthop as to 102/103 via inet 172.16.2.2 dev virt1-2
nexthop as to 302/303 via inet 172.16.12.2 dev virt1-12
201
nexthop as to 202/203 via inet6 2001:db8:2::2 dev virt1-2
nexthop as to 402/403 via inet6 2001:db8:12::2 dev virt1-12
$ip -f mpls route get 103
RTNETLINK answers: Network is unreachable
$ip -f mpls route get 101
101 as to 102/103 via inet 172.16.2.2 dev virt1-2
$ip -f mpls route get as to 302/303 101
101 as to 302/303 via inet 172.16.12.2 dev virt1-12
$ip -f mpls route get fibmatch 103
RTNETLINK answers: Network is unreachable
$ip -f mpls route get fibmatch 101
101
nexthop as to 102/103 via inet 172.16.2.2 dev virt1-2
nexthop as to 302/303 via inet 172.16.12.2 dev virt1-12
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Let XDP link set command request that the program be offloaded.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Allow user to select XDP DRV_MODE flag by using xdpdrv keyword
instead of xdp or xdpgeneric.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Add interpretation of XDP_ATTACHED_HW mode on dump.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
This patch adds support to the newly added IFLA_XDP_PROG_ID.
./ip link show dev eth0
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric/id:2 qdisc [...]
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
If a header that includes linux/in6.h is included before
iproute's utils.h, then iproute2 fails to compile on older
glibc versions.
Fixes: e8493916a8 ("iproute: add support for SR-IPv6 lwtunnel encapsulation")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
After upstream commit 5071034e4af7 ('neigh: Really delete an arp/neigh entry
on "ip neigh delete" or "arp -d"'), we could delete a single FAILED neighbour
entry now. But `ip neigh flush` still skip the FAILED entry.
Move the filter after first round flush so we can flush FAILED entry on fixed
kernel and also do not keep retrying on old kernel.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
When the user specifies `table all` or `table 0` to
the `ip mroute show` command we dump the entirety of
the known mroute tables. Without some sort of
divisor to tell us what table we are looking at
the command is useless.
Add `Table: <vrf name>` to the output of 'ip mroute show table 0'
Follow the convention established by 'ip route show table 0'
for when to display
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
This option is documented in gre6 help, but was not supported.
Fixes: af89576d7a ("iproute2: GRE over IPv6 tunnel support.")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
When modifying a route we set the RTA_OIF attribute only if a device was
specified with "dev" or "oif" keyword. But for some unknown reason we
earlier alternatively check also for the presence of "nexthop" keyword,
even though it has no effect. So remove the pointless check.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Add IFLA_EVENT output so that event types can be viewed with
'monitor' command. This gives a little more information for why
a given message was received.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Uses newly introduced RTM_GETROUTE flag RTM_F_FIB_MATCH
to return a matching fib route. Introduces 'fibmatch'
keyword to ip route get.
ipv4:
----
$ip route show
default via 192.168.0.2 dev eth0
10.0.14.0/24
nexthop via 172.16.0.3 dev dummy0 weight 1
nexthop via 172.16.1.3 dev dummy1 weight 1
$ip route get 10.0.14.2
10.0.14.2 via 172.16.1.3 dev dummy1 src 172.16.1.1
cache
$ip route get fibmatch 10.0.14.2
10.0.14.0/24
nexthop via 172.16.0.3 dev dummy0 weight 1
nexthop via 172.16.1.3 dev dummy1 weight 1
ipv6:
----
$ip -6 route show
2001:db9:100::/120 metric 1024
nexthop via 2001:db8:2::2 dev dummy0 weight 1
nexthop via 2001:db8:12::2 dev dummy1 weight 1
$ip -6 route get 2001:db9:100::1
2001:db9:100::1 from :: via 2001:db8:12::2 dev dummy1 \
src 2001:db8:12::1 metric 1024 pref medium
$ip -6 route get fibmatch 2001:db9:100::1
2001:db9:100::/120 metric 1024
nexthop via 2001:db8:12::2 dev dummy1 weight 1
nexthop via 2001:db8:2::2 dev dummy0 weight 1
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: David Ahern <dsahern@gmail.com>
Add to usage message a description of how to configure Infiniband node
and port GUIDs. Also modify the man page to emphasize the GUIDs are
configured for Infiniband VFs.
Fixes: d91fb3f4c7 ("Add support for configuring Infiniband GUIDs")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Since commit a8f820a380a2a06 ('can: add Virtual CAN Tunnel driver (vxcan)')
for Linux 4.12 a virtual CAN tunnel driver analogue to veth is available in
Linux.
This patch adds the ability to create vxcan device pairs.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Change print_linkinfo_brief to take the filter as an input arg.
If the arg is NULL, use the global filter in ipaddress.c.
Signed-off-by: David Ahern <dsahern@gmail.com>
ipaddr_list_flush_or_save generates a list of nlmsg's for links and
optionally for addresses. Move the code into ip_linkaddr_list and
export it along with the supporting infrastructure.
API to use this function is:
struct nlmsg_chain linfo = { NULL, NULL};
struct nlmsg_chain ainfo = { NULL, NULL};
ip_linkaddr_list(family, filter_req, &linfo, &ainfo);
... error checking and code looping over linfo/ainfo ...
free_nlmsg_chain(&linfo);
free_nlmsg_chain(&ainfo);
Signed-off-by: David Ahern <dsahern@gmail.com>
Follow-up to d67b9cd28c1d ("xdp: refine xdp api with regards to
generic xdp") in order to update the XDP dumping part.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Including libc headers first helps as a workaround to redefinition of struct
ethhdr with a suitably patched musl libc that suppresses the kernel
if_ether.h.
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Ability to change vxlan device attributes was added to kernel through
commit 8bcdc4f3a20b ("vxlan: add changelink support"), however one
cannot do the same through ip(8) command. Changing the allowed vxlan
device attributes using 'ip link set dev <vxlan_name> type vxlan
<allowed_attributes>' currently fails with 'operation not supported'
error. This failure is due to the incorrect rtnetlink message
construction for the 'ip link set' operation.
The vxlan_parse_opt() callback function is called for parsing options
for both 'ip link add' and 'ip link set'. For the 'add' case, we pass
down default values for those attributes that were not provided as CLI
options. However, for the 'set' case we should be only passing down the
explicitly provided attributes and not any other (default) attributes.
Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
syntax:
ip xfrm state .... offload dev <if-name> dir <in or out>
Example to add inbound offload:
ip xfrm state .... offload dev mlx0 dir in
Example to add outbound offload:
ip xfrm state .... offload dev mlx0 dir out
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Follow-up to commit c7272ca720 ("bpf: add initial support for
attaching xdp progs") to also support generic XDP. This adds an
indicator for loaded generic XDP programs when programs are loaded
as shown in c7272ca720, but the driver still lacks native XDP
support.
# ip link
[...]
3: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric qdisc [...]
link/ether 0c:c4:7a:03:f9:25 brd ff:ff:ff:ff:ff:ff
[...]
In case the driver does support native XDP, but the user wants
to load the program as generic XDP (e.g. for testing purposes),
then this can be done with the same semantics as in c7272ca720,
but with 'xdpgeneric' instead of 'xdp' command for loading:
# ip -force link set dev eno1 xdpgeneric obj xdp.o
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David S. Miller <davem@davemloft.net>
As noticed by one of the few users of routel script, it ends up in an
infinite loop when they pull out the cable from the NIC used for some
route. This is caused by its parser expecting the line of "ip route show"
output consists of "key value" pairs (except for the initial target range),
together with an old trap of Bourne style shells that "shift 2" does
nothing if there is only one argument left. Some keywords, e.g. "linkdown",
are not followed by a value.
Improve the parser to
(1) only set variables for keywords we care about
(2) recognize (currently) known keywords without value
This is still far from perfect (and certainly not future proof) but to
fully fix the script, one would probably have to rewrite the logic
completely (and I'm not sure it's worth the effort).
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
This attribute allows the administrator to adjust the packet marking
attribute of tunnels that support policy based routing.
Signed-off-by: Craig Gallek <kraig@google.com>
This patch adds commands to support the tunnel source properties
("ip sr tunsrc") and the HMAC key -> secret, algorithm binding
("ip sr hmac").
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
'ip vrf pids' is used to list processes bound to a vrf, but it only
shows the pid leaving a lot of work for the user. Add the command
name to the output. With this patch you get the more user friendly:
$ ip vrf pids mgmt
1121 ntpd
1418 gdm-session-wor
1488 gnome-session
1491 dbus-launch
1492 dbus-daemon
1565 sshd
...
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
'ip vrf pids' is used to list processes bound to a vrf, but it only
shows the pid leaving a lot of work for the user. Add the command
name to the output. With this patch you get the more user friendly:
$ ip vrf pids mgmt
1121 ntpd
1418 gdm-session-wor
1488 gnome-session
1491 dbus-launch
1492 dbus-daemon
1565 sshd
...
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Currently specifying a device to ip netconf and it dumps only values
for IPv4. Change this to dump data for all families unless a specific
family is given.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Currently, 'ip netconf' only shows ipv4 and ipv6 netconf settings. If IPv6
is not enabled, the dump ends with
RTNETLINK answers: Operation not supported
when IPv6 request is attempted. Further, if the mpls_router module is also
loaded a separate request is needed to get MPLS settings.
To make this better going forward, use the new PF_UNSPEC dump all option
if the kernel supports it. If the kernel does not, it sets NLMSG_ERROR and
returns EOPNOTSUPP which is trapped and we fall back to the existing output
to maintain compatibility with existing kernels.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Add support for setting and displaying the ttl attribute
for MPLS IP lighweight tunnels.
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Add support for setting and displaying the ttl-propagation attribute
initially used by MPLS to control propagation of MPLS TTL to IPv4/IPv6
TTL/hop-limit on popping final label on a per-route basis.
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
These are basically stubs: The types which lacked their own help text
simply don't accept any options (yet). Still it might be a bit confusing
to users if they are presented with the generic 'ip link' help text
instead of something saying there are no type specific options.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Take help function in iplink_bridge.c as an example and make other link
types' help functions similar:
* Use a single fprintf() call (if possible).
* Don't state a full command line, just "... type OPTIONS".
* Put every option in it's own line, align options by column.
* List mandatory options first.
link_veth.c is intentionally left untouched because it's 'peer' option
eats all kinds of generic link options and the help text points this out
without duplicating all the options there again.
Signed-off-by: Phil Sutter <phil@nwl.cc>
When neither group or remote is specified (or if they are specified with
the any address), nothing is sent to the kernel. In this case, the
kernel defaults to IPv4. This makes impossible to use IPv6 with
unspecified unicast remote ("bridge fdb add" will return
EAFNOTSUPPORT).
If the user specifies a preferred address family (eg, "ip -6 link add"),
then send either IFLA_VXLAN_GROUP or IFLA_VXLAN_GROUP6 to enforce the
use of the appropriate family.
Signed-off-by: Vincent Bernat <vincent@bernat.im>
MPLS multipath routes are missing a space between 'nexthop' and 'via':
$ ip -net ns1 -f mpls ro ls
100
nexthopvia inet 172.16.2.2 dev virt12
nexthopvia inet 172.16.3.2 dev br0
Add it.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Add support for new afstats subcommand. This uses the new
IFLA_STATS_AF_SPEC attribute of RTM_GETSTATS messages to show
per-device, AF-specific stats. At the moment the kernel only supports
MPLS AF stats, so that is all that's implemented here.
The print_num function is exposed from ipaddress.c to be used for
printing the new stats so that the human-readable option, if set, can
be respected.
Example of use:
$ ./ip/ip -f mpls link afstats dev eth1
3: eth1
mpls:
RX: bytes packets errors dropped noroute
9016 98 0 0 0
TX: bytes packets errors dropped
7232 113 0 0
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Use the new helper functions rta_getattr_u* instead of direct
cast of RTA_DATA(). Where RTA_DATA() is a structure, then remove
the unnecessary cast since RTA_DATA() is void *
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Add support for MPLS netconf to ip monitor and ip netconf commands.
Changes to header files not included as those are typically pulled
in my a header sync with the kernel.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
This patch adds support to the bridge_slave link type for displaying
xstats by reusing the previously added bridge xstats callbacks.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
This patch adds support for a new xstats link subcommand which uses the
specified link type's new parse/print_ifla_xstats callbacks to display
extended statistics.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Since cgroups are not namespace aware, the directory heirarchy used by
ip vrf should account for network namespaces. In this case, change the
path from CGRP/BASE/vrf/NAME to CGRP/BASE/NETNS/vrf/NAME where CGRP is
the cgroup2 mount path, BASE in any base heirarchy inherited before VRF
is applied and NAME is the VRF name.
The intent is as follows: a user logs into the box into some namespace
with a name known to iproute2. Some other policy may have put the
process into a BASE heirarchy. From there the user executes a task in
a VRF and in doing so the task heirarchy becomes CGRP/BASE/NETNS/vrf/NAME.
The namespace level is omitted for the default namespace.
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Move guts of netns_identify into a standalone function that returns
the netns name in a given buffer.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Add support for VRF in a pre-existing hierarchy. For example, if the
current process is running in CGRP/foo/bar, the 'ip vrf exec NAME CMD'
should run CMD in the cgroup CGRP/foo/bar/vrf/NAME.
When listing process ids in a VRF, search for the directory vrf/NAME
regardless of base path (foo/bar/vrf/NAME and vrf/NAME) are still
running against the same vrf NAME.
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
This patch implements support for the IFLA_BRPORT_FLUSH attribute
in iproute2 so it can flush bridge slave's fdb dynamic entries.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
This patch implements support for the IFLA_BR_MCAST_MLD_VERSION
attribute in iproute2 so it can change the mcast mld version.
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
This patch implements support for the IFLA_BR_MCAST_IGMP_VERSION
attribute in iproute2 so it can change the mcast igmp version.
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
This patch implements support for the IFLA_BR_MCAST_STATS_ENABLED
attribute in iproute2 so it can enable/disable mcast stats accounting.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
This patch implements support for the IFLA_BR_VLAN_STATS_ENABLED
attribute in iproute2 so it can enable/disable vlan stats accounting.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
This patch implements support for the IFLA_BR_FDB_FLUSH attribute
in iproute2 so it can flush bridge fdb dynamic entries.
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
This patch adds a new field that is printed in the end of the line which
denotes the real entry state. Before this patch an entry's IIF could
disappear and it would look like an unresolved one (iif = unresolved):
(3.0.16.1, 225.11.16.1) Iif: unresolved
with no way to really distinguish it from an unresolved entry.
After the patch if the dumped entry has RTNH_F_UNRESOLVED set we get:
(3.0.16.1, 225.11.16.1) Iif: unresolved State: unresolved
for unresolved entries and:
(0.0.0.0, 225.11.11.11) Iif: eth4 Oifs: eth3 State: resolved
for resolved entries after the OIF list. Note that "State:" has ':' in
it so it cannot be mistaken for an interface name.
And for the example above, we'd get:
(0.0.0.0, 225.11.11.11) Iif: unresolved State: resolved
Also when dumping all routes via ip route show table all,
it will show up as:
multicast 225.11.16.1/32 from 3.0.16.1/32 table default proto 17 unresolved
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
To specify multiple nexthops in a route the user is expected to use the
"nexthop" keyword which ip route uses to create the RTA_MULTIPATH.
However, ip route always accepts multiple 'via' keywords where only the
last one is used in the route leading to confusion. For example, ip
accepts this syntax:
$ ip ro add vrf red 1.1.1.0/24 via 10.100.1.18 via 10.100.2.18
but the route entered inserted by the kernel is just the last gateway:
1.1.1.0/24 via 10.100.2.18 dev eth2
which is not the full request from the user. Detect the presense of
multiple 'via' and give the user a hint to add nexthop:
$ ip ro add vrf red 1.1.1.0/24 via 10.100.1.18 via 10.100.2.18
Error: argument "via" is wrong: use nexthop syntax to specify multiple via
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Fix "Policy buffer overflow" when trying to use deleteall with many
policies installed.
Signed-off-by: Alexander Heinlein <alexander.heinlein@secunet.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Entries with long vhost names in /proc/net/igmp have no whitespace
between name and colon, so sscanf() adds it to vhost and
'ip maddr show iface' doesn't include inet result.
Signed-off-by: Petr Vorel <pvorel@suse.cz>
Show ipv6 tunnel keys on presence of GRE_KEY flag for tunnel types
other than GRE. Aligns ipv6 behaviour with ipv4.
Signed-off-by: dforster@brocade.com
Next up a non-root user gets various bpf related error messages:
$ ip vrf exec mgmt bash
Failed to load BPF prog: 'Operation not permitted'
Kernel compiled with CGROUP_BPF enabled?
Catch the EPERM error and do not show the kernel config option.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
A vrf is local to a namespace. Drop any VRF association before trying
to exec a command in the new namespace.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Path in vrf_switch for "default" VRF is supposed to be MNT/vrf not
MNT/default. Also, default_vrf flag is redundant with ifindex. Remove
the flag in favor of ifindex != 0.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Split ipvrf_identify into arg processing and a function that does the
actual cgroup file parsing. The latter function is used in a follow
on patch.
In the process, convert the reading of the cgroups file to use fopen
and fgets just in case the file ever grows beyond 4k. Move printing
of any error message and the vrf name to the caller of the new
vrf_identify.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Move the hint about CGROUP_BPF enabled to prog_load failure since
it fails before the attach. Update the existing error message to
print to stderr.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
'ip vrf' follows the user semnatics established by 'ip netns'.
The 'ip vrf' subcommand supports 3 usages:
1. Run a command against a given vrf:
ip vrf exec NAME CMD
Uses the recently committed cgroup/sock BPF option. vrf directory
is added to cgroup2 mount. Individual vrfs are created under it. BPF
filter attached to vrf/NAME cgroup2 to set sk_bound_dev_if to the VRF
device index. From there the current process (ip's pid) is addded to
the cgroups.proc file and the given command is exected. In doing so
all AF_INET/AF_INET6 (ipv4/ipv6) sockets are automatically bound to
the VRF domain.
The association is inherited parent to child allowing the command to
be a shell from which other commands are run relative to the VRF.
2. Show the VRF a process is bound to:
ip vrf id
This command essentially looks at /proc/pid/cgroup for a "::/vrf/"
entry with the VRF name following.
3. Show process ids bound to a VRF
ip vrf pids NAME
This command dumps the file MNT/vrf/NAME/cgroup.procs since that file
shows the process ids in the particular vrf cgroup.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
iplink_vrf has 2 functions used to validate a user given device name is
a VRF device and to return the table id. If the user string is not a
device name ip commands with a vrf keyword show a confusing error
message: "RTNETLINK answers: No such device".
Add a variant of rtnl_talk that does not display the "RTNETLINK answers"
message and update iplink_vrf to use it.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Adds support to configure BPF programs as nexthop actions via the LWT
framework.
Example:
ip route add 192.168.253.2/32 \
encap bpf out obj lwt_len_hist_kern.o section len_hist \
dev veth0
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Now that we made the BPF loader generic as a library, reuse it
for loading XDP programs as well. This basically adds a minimal
start of a facility for iproute2 to load XDP programs. There
currently only exists the xdp1_user.c sample code in the kernel
tree that sets up netlink directly and an iovisor/bcc front-end.
Since we have all the necessary infrastructure in place already
from tc side, we can just reuse its loader back-end and thus
facilitate migration and usability among the two for people
familiar with tc/bpf already. Sharing maps, performing tail calls,
etc works the same way as with tc. Naturally, once kernel
configuration API evolves, we will extend new features for XDP
here as well, resp. extend dumping of related netlink attributes.
Minimal example:
clang -target bpf -O2 -Wall -c prog.c -o prog.o
ip [-force] link set dev em1 xdp obj prog.o # attaching
ip [-d] link # dumping
ip link set dev em1 xdp off # detaching
For the dump, intention is that in the first line for each ip
link entry, we'll see "xdp" to indicate that this device has an
XDP program attached. Once we dump some more useful information
via netlink (digest, etc), idea is that 'ip -d link' will then
display additional relevant program information below the "link/
ether [...]" output line for such devices, for example.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
In case of an older kernel that doesn't set L2TP_ATTR_UDP_ZERO_CSUM6_{RX,TX}
the old hard-coded value is being preserved, since the attribute flag will be
missing.
Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
L2TP_ATTR_UDP_CSUM is read by the kernel as a NLA_FLAG value,
but is validated as a NLA_U8, so we will write it as an u8,
but the value isn't actually being read by the kernel.
It is written by the kernel as a NLA_U8, so we will read as
such.
Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
L2TP_ATTR_RECV_SEQ and L2TP_ATTR_SEND_SEQ are declared as NLA_U8
attributes in the kernel, so let's threat them accordingly.
Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
udp6_csum_{tx,rx}, tunnel and session are the only ones
currently used.
recv_seq, send_seq, lns_mode and data_seq are partially
implemented in a useless way.
Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
Adjusting iproute2 utility to support new macvlan link type mode called
"source".
Example of commands that can be applied:
ip link add link eth0 name macvlan0 type macvlan mode source
ip link set link dev macvlan0 type macvlan macaddr add 00:11:11:11:11:11
ip link set link dev macvlan0 type macvlan macaddr del 00:11:11:11:11:11
ip link set link dev macvlan0 type macvlan macaddr flush
ip -details link show dev macvlan0
Based on previous work of Stefan Gula <steweg@gmail.com>
Signed-off-by: Michael Braun <michael-dev@fami-braun.de>
Cc: steweg@gmail.com
v5:
- rebase and fix checkpatch
v4:
- add MACADDR_SET support
- skip FLAG_UNICAST / FLAG_UNICAST_ALL as this is not upstream
- fix man page
- Support adding, deleting and showing IP rules with UID ranges.
- Support querying per-UID routes via "ip route get uid <UID>".
UID range routing was added to net-next in 4fb7450683 ("Merge
branch 'uid-routing'")
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Commit 7b8179c780 ("iproute2: Add new command to ip link to
enable/disable VF spoof check") tried to add support for
IFLA_VF_SPOOFCHK in a backwards-compatible manner, but aparently overdid
it: parse_rtattr_nested() handles missing attributes perfectly fine in
that it will leave the relevant field unassigned so calling code can
just compare against NULL. There is no need to layback from the previous
(IFLA_VF_TX_RATE) attribute to the next to check if IFLA_VF_SPOOFCHK is
present or not. To the contrary, it establishes a potentially incorrect
assumption of these two attributes directly following each other which
may not be the case (although up to now, kernel aligns them this way).
This patch cleans up the code to adhere to the common way of checking
for attribute existence. It has been tested to return correct results
regardless of whether the kernel exports IFLA_VF_SPOOFCHK or not.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Greg Rose <grose@lightfleet.com>
Recently a new per-port flag was added which controls the flooding of
unknown multicast, this patch adds support for controlling it via iproute2.
It also updates the man pages with information about the new flag.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Adjusting iproute2 utility to support new macvlan link type mode called
"source".
Example of commands that can be applied:
ip link add link eth0 name macvlan0 type macvlan mode source
ip link set link dev macvlan0 type macvlan macaddr add 00:11:11:11:11:11
ip link set link dev macvlan0 type macvlan macaddr del 00:11:11:11:11:11
ip link set link dev macvlan0 type macvlan macaddr flush
ip -details link show dev macvlan0
Based on previous work of Stefan Gula <steweg@gmail.com>
Signed-off-by: Michael Braun <michael-dev@fami-braun.de>
Cc: steweg@gmail.com
iprule_flush() and iprule_list_or_save() both call function
rtnl_wilddump_request() and rtnl_dump_filter(). So merge them
together just like other files do.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Introduce a new API that exposes a list of vlans per VF (IFLA_VF_VLAN_LIST),
giving the ability for user-space application to specify it for the VF as
an option to support 802.1ad (VST QinQ).
We introduce struct vf_vlan_info, which extends struct vf_vlan and adds
an optional VF VLAN proto parameter.
Default VLAN-protocol is 802.1Q.
Add IFLA_VF_VLAN_LIST in addition to IFLA_VF_VLAN to keep backward
compatibility with older kernel versions.
Suitable ip link tool command examples:
- Set vf vlan protocol 802.1ad (S-TAG)
ip link set eth0 vf 1 vlan 100 proto 802.1ad
- Set vf vlan S-TAG and vlan C-TAG (VST QinQ)
ip link set eth0 vf 1 vlan 100 proto 802.1ad vlan 30 proto 802.1Q
- Set vf to VST (802.1Q) mode
ip link set eth0 vf 1 vlan 100 proto 802.1Q
- Or by omitting the new parameter (backward compatible)
ip link set eth0 vf 1 vlan 100
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Add support to dump the mroute cache entry age if the show_stats (-s)
switch is provided.
Example:
$ ip -s mroute
(0.0.0.0, 239.10.10.10) Iif: eth0 Oifs: eth0
0 packets, 0 bytes, Age 245.44
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
The calling of netns_map_init() before command parsing introduced
a performance issue with large number of namespaces.
As commands such as add, del and exec do not need to iterate through
/var/run/netns it would be good not no build the cache before executing
these commands.
Example:
unpatched:
time seq 1 1000 | xargs -n 1 ip netns add
real 0m16.832s
user 0m1.350s
sys 0m15.029s
patched:
time seq 1 1000 | xargs -n 1 ip netns add
real 0m3.859s
user 0m0.132s
sys 0m3.205s
Signed-off-by: Anton Aksola <aakso@iki.fi>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
The original bond/bridge/vrf and slaves use same id, which make people
confused. Use bond/bridge/vrf_slave as id name will make code more clear.
Acked-by: Phil Sutter <psutter@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
In ip monitor, netns_map_init will check getnsid is supported or not.
But when /proc/self/ns/net does not exist, we just print out error
messages and exit. So user cannot use ip monitor anymore when
CONFIG_NET_NS is disabled:
# ip monitor
open("/proc/self/ns/net"): No such file or directory
If open "/proc/self/ns/net" failed, set have_rtnl_getnsid to false.
Fixes: d652ccbf81 ("netns: allow to dump and monitor nsid")
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
ftell() may return -1 in error case, which is not handled and
therefore pass a negative offset to fseek(). The return code of
fseek() is also not checked.
Reported-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
The new mode 'l3s' can be set like -
ip link add link <master> dev <IPvlan-slave> type ipvlan mode l3s
e.g. ip link add link eth0 dev ipvl0 type ipvlan mode l3s
Also did some trivial code restructuring.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
the maximum possible ICV length in a MACsec frame is 16 octects, not 32:
fix get_icvlen() accordingly, so that a proper error message is displayed
in case input 'icvlen' is greater than 16.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
use get_be64() in place of get_u64() when parsing input 'sci' parameter,
so that 'sci' can be entered using network byte order regardless the
endianness of target system; use ntohll() when printing out 'sci'. While
at it, improve documentation of 'sci' in ip-link.8.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
remove hardcoded base 10 parsing of 'port' parameter, update man page
and fix usage() functions as well. Fix misleading line in man page that
theoretically allowed specifying 'port' keyword right after 'sci' keyword.
Provide documentation of 'address' parameter in man pages and in usage()
functions as well.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Show which processes are using which tun/tap devices, e.g.:
$ ip -d tuntap
tun0: tun
Attached to processes: vpnc(9531)
vnet0: tap vnet_hdr
Attached to processes: qemu-system-x86(10442)
virbr0-nic: tap UNKNOWN_FLAGS:800
Attached to processes:
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
If we have multicast routes and do ip route show table all we'll get the
following output:
...
multicast ???/32 from ???/32 table default proto static iif eth0
The "???" are because the rtm_family is set to RTNL_FAMILY_IPMR instead
(or RTNL_FAMILY_IP6MR for ipv6). Add a simple workaround that returns the
real family based on the rtm_type (always RTN_MULTICAST for ipmr routes)
and the rtm_family. Similar workaround is already used in ipmroute, and
we can use this helper there as well.
After the patch the output is:
multicast 239.10.10.10/32 from 0.0.0.0/32 table default proto static iif eth0
Also fix a minor whitespace error and switch to tabs.
Reported-by: Satish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
The code is a bit messy, as it starts with space after text and at some
point switches to space before text. But either way, printing space
before *and* after text almost certainly leads to printing more
whitespace than necessary.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Prior to this patch, If one route entry's RTA_PREFSRC and RTA_GATEWAY
both were NULL, it was supposed to be restored ONLY as a local address.
But as it didn't check tb[RTA_PREFSRC] when restoring local networks,
rtattr_cmp would return a success if it was NULL, this route entry would
be restored again as a local network.
This patch is to add tb[RTA_PREFSRC] check when restoring local networks.
Fixes: 74af8dd962 ("ip route: restore route entries in correct order")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Tested-by: Phil Sutter <phil@nwl.cc>
Currently, the `ip ila` command tries to initialize a genl context
even when we just want to see the help for the command, which doesn't
require to talk to the kernel at all.
Delay genl initialization, which can fail if the module isn't loaded,
until the point where we will actually need it.
Fixes: ec71cae0bb ("ila: Support for configuring ila to use netfilter hook")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Currently, the `ip fou` command tries to initialize a genl context even
when we just want to see the help for the command, which doesn't require
to talk to the kernel at all.
Delay genl initialization, which can fail if the module isn't loaded,
until the point where we will actually need it.
Fixes: 6928747b6e ("ip fou: Support to configure foo-over-udp RX")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Currently, the `ip macsec` command tries to initialize a genl context
even when we just want to see the help for the command, which doesn't
require to talk to the kernel at all.
Delay genl initialization, which can fail if the module isn't loaded,
until the point where we will actually need it.
Fixes: b26fc590ce ("ip: add MACsec support")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
All users of genl have the same code to open a genl socket and resolve
the family for their specific protocol. Introduce a helper to initialize
the handle, and use it in all the genl code.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
since kernel driver has valid default values for 'cipher' and 'icvlen',
there is no need for requiring users to specify both of them when a new
link is added. Also, prompt an error message and exit with appropriate
exit status in case of unsupported cipher suite.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
fix output of "ip address help" and "ip link help". Update TYPE list in man
pages ip-address.8 and ip-link.8 as well.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Since parse_rtattr_flags() calls memset already, there is no need for
callers to do so themselves.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
This big patch was compiled by vimgrepping for memset calls and changing
to C99 initializer if applicable. One notable exception is the
initialization of union bpf_attr in tc/tc_bpf.c: changing it would break
for older gcc versions (at least <=3.4.6).
Calls to memset for struct rtattr pointer fields for parse_rtattr*()
were just dropped since they are not needed.
The changes here allowed the compiler to discover some unused variables,
so get rid of them, too.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Sometimes we cannot restore route entries, because in kernel
[1] fib_check_nh()
[2] fib_valid_prefsrc()
cause some routes to depend on existence of others while adding.
For example, we saved all the routes, and flushed all tables
[a] default via 192.168.122.1 dev eth0
[b] 192.168.122.0/24 dev eth0 src 192.168.122.21
[c] broadcast 127.0.0.0 dev lo table local src 127.0.0.1
[d] local 127.0.0.0/8 dev lo table local src 127.0.0.1
[e] local 127.0.0.1 dev lo table local src 127.0.0.1
[f] broadcast 127.255.255.255 dev lo table local src 127.0.0.1
[g] broadcast 192.168.122.0 dev eth0 table local src 192.168.122.21
[h] local 192.168.122.21 dev eth0 table local src 192.168.122.21
[i] broadcast 192.168.122.255 dev eth0 table local src 192.168.122.21
Now start to restore them:
If we want to add [a], we have to add [b] first, as [1] and
'via 192.168.122.1' in [a].
If we want to add [b], we have to add [h] first, as [2] and
'src 192.168.122.21' in [b].
So the correct order to restore should be like:
[e][h] -> [b][c][d][f][g][i] -> [a]
This patch fixes it by traversing the file 3 times, it only restores
part of them in each run according to the following conditions, to
make sure every entry can be restored successfully.
1. !gw && (!fib_prefsrc || fib_prefsrc == cfg->fc_dst)
2. !gw && (fib_prefsrc != cfg->fc_dst)
3. gw
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Add two NLA's that allow configuration of Infiniband node or port GUIDs
by referencing the IPoIB net device set over the physical function. The
format to be used is as follows:
ip link set dev ib0 vf 0 node_guid 00:02:c9:03:00:21:6e:70
ip link set dev ib0 vf 0 port_guid 00:02:c9:03:00:21:6e:78
Signed-off-by: Eli Cohen <eli@mellanox.com>
Add vrf keyword to 'ip route' commands. Allows:
1. Users can list routes by VRF name:
$ ip route show vrf NAME
VRF tables have all routes including local and broadcast routes.
The VRF keyword filters LOCAL and BROADCAST routes; to see all
routes the table option can be used. Or to see local routes only
for a VRF:
$ ip route show vrf NAME type local
2. Add or delete a route for a VRF:
$ ip route {add|delete} vrf NAME <route spec>
3. Do a route lookup for a VRF:
$ ip route get vrf NAME ADDRESS
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Add ipvrf_get_table to lookup table id for device name. Returns 0
on any error or if name is not a VRF device.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Add vrf keyword to 'ip neigh' commands. Allows listing neighbor
entries for all links associated with a given VRF.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Add vrf keyword to 'ip link' and 'ip addr' commands (common list code).
Allows:
1. Adding a link to a VRF
$ ip link set NAME vrf NAME
Removing a link from a VRF still uses 'ip link set NAME nomaster'
2. Showing links associated with a VRF:
$ ip link show vrf NAME
3. List addresses associated with links in a VRF
$ ip -br addr show vrf red
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Since the function won't ever change the data 'kind' is pointing at, it
can sanely be made const.
Fixes: e0513807f6 ("ip-address: Support filtering by slave type, too")
Suggested-by: Stephen Hemminger <shemming@brocade.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Currently a timeout is multiplied by HZ in user-space and
then it multiplied by HZ in kernel-space.
$ ./ip/ip r add 2002::0/64 dev veth1 expires 10
$ ./ip/ip -6 r
2002::/64 dev veth1 metric 1024 linkdown expires 996sec pref medium
Cc: Xin Long <lucien.xin@gmail.com>
Cc: Hangbin Liu <liuhangbin@gmail.com>
Cc: Stephen Hemminger <shemming@brocade.com>
Fixes: 68eede2505 ("route: allow routes to be configured with expire values")
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
This patch allows to query all interfaces enslaved to a bridge or bond
using the following syntax:
| ip addr show type bridge_slave
Filtering has to be done in userspace since the kernel does not support
filtering on IFLA_INFO_SLAVE_KIND.
Functionality introduced in this patch is not fully complete since it
does not allow to match on type and slave type at the same time, but it
doesn't prevent implementing a dedicated slave_type match, either.
Signed-off-by: Phil Sutter <phil@nwl.cc>
On Tue, Jun 21, 2016 at 06:18 PM CEST, Phil Sutter <phil@nwl.cc> wrote:
> By combining the attribute extraction and check for existence, the
> additional indentation level in the 'else' clause can be avoided.
>
> In addition to that, common actions for 'daddr' are combined since the
> function returns if neither of the branches are taken.
>
> Signed-off-by: Phil Sutter <phil@nwl.cc>
> ---
> ip/tcp_metrics.c | 45 ++++++++++++++++++---------------------------
> 1 file changed, 18 insertions(+), 27 deletions(-)
>
> diff --git a/ip/tcp_metrics.c b/ip/tcp_metrics.c
> index f82604f458ada..899830c127bcb 100644
> --- a/ip/tcp_metrics.c
> +++ b/ip/tcp_metrics.c
> @@ -112,47 +112,38 @@ static int process_msg(const struct sockaddr_nl *who, struct nlmsghdr *n,
> parse_rtattr(attrs, TCP_METRICS_ATTR_MAX, (void *) ghdr + GENL_HDRLEN,
> len);
>
> - a = attrs[TCP_METRICS_ATTR_ADDR_IPV4];
> - if (a) {
> + if ((a = attrs[TCP_METRICS_ATTR_ADDR_IPV4])) {
Copy the pointer inside the branch?
Same gain on indentation while keeping checkpatch happy.
I only compile-tested the patch below.
Thanks,
Jakub
I forgot to change the variable in the conditional, too.
Fixes: 8fe58d5894 ("iplink: Check address length via netlink")
Signed-off-by: Phil Sutter <phil@nwl.cc>
This is a feature which was lost during the conversion to netlink
interface: If the device exists and a user tries to change the link
layer address, query the kernel for the old address first and reject the
new one if sizes differ.
This patch adds the same check when setting VF address by assuming same
length as PF device.
Note that at least for VFs the check can't be done in kernel space since
struct ifla_vf_mac lacks a length field and due to netlink padding the
exact size can't be communicated to the kernel.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Kernel commit 96c63fa7393d ("net: Add l3mdev rule") added support for
the FRA_L3MDEV attribute. The attribute enables use of l3mdev rules
which mean 'get table id from l3 master device'. This patch adds
support to iproute2 to show, add and delete rules with this attribute.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Not sure why this was limited to ip-link before. It is semantically
equal to the 'master' keyword, which is not restricted at all.
The man page and help text adjustments include the 'master' keyword as
well since that is also supported but wasn't documented before.
Cc: Vadim Kochan <vadim4j@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Extend ip-link to create MACsec devices
ip link add link <master> <macsec> type macsec [options]
Add `ip macsec` command to configure receive-side secure channels and
secure associations within a macsec netdevice.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Phil Sutter <phil@nwl.cc>
Similar to the Linux kernel and perf add infrastructure to reduce the
amount of output tossed to a user during a build. Full build output
can be obtained with 'make V=1'
Builds go from:
make[1]: Leaving directory `/home/dsa/iproute2.git/lib'
make[1]: Entering directory `/home/dsa/iproute2.git/ip'
gcc -Wall -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -Wold-style-definition -Wformat=2 -O2 -I../include -DRESOLVE_HOSTNAMES -DLIBDIR=\"/usr/lib\" -DCONFDIR=\"/etc/iproute2\" -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -c -o ip.o ip.c
gcc -Wall -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -Wold-style-definition -Wformat=2 -O2 -I../include -DRESOLVE_HOSTNAMES -DLIBDIR=\"/usr/lib\" -DCONFDIR=\"/etc/iproute2\" -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -c -o ipaddress.o ipaddress.c
to:
...
AR libutil.a
ip
CC ip.o
CC ipaddress.o
...
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
A new HSR version was added in 4.7 that can be enabled
via iproute2. Per default the old version is selected,
however, with "ip link add [..] type hsr [..] version 1"
the newer version can be enabled.
Signed-off-by: Peter Heise <peter.heise@airbus.com>
Kernel gained support for filtering link dumps with commit dc599f76c22b
("net: Add support for filtering link dump by master device and kind").
Add support to ip link command. If a user passes master device or
kind to ip link command they are added to the link dump request message.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Since we can only configure unicast, we probably want to be able to
display unicast, rather than multicast.
Fixes: 906ac5437a ("geneve: add support for IPv6 link partners")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Display only attributes that are relevant when a GRE interface is in
'external' mode instead of the default values (which are ignored by the
kernel even if passed back).
Fixes: 926b39e1fe ("gre: add support for collect metadata flag")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
For GRE interfaces in 'external' mode, the kernel ignores all manual
settings like remote IP address or TTL. However, for some of those
attributes, kernel checks their value and does not allow them to be zero
(even though they're ignored later).
Currently, 'ip link' always includes all attributes in the netlink message.
This leads to problem with creating interfaces in 'external' mode. For
example, this command does not work:
ip link add gre1 type gretap external
and needs a bogus remote IP address to be specified, as the kernel enforces
remote IP address to be either not present, or not null.
Ignore the parameters that do not make sense in 'external' mode.
Unfortunately, we cannot error out, as there may be existing deployments
that workarounded the bug by specifying bogus values.
Fixes: 926b39e1fe ("gre: add support for collect metadata flag")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Use the same rtnl_dump_request_n call as the show. The rtnl_wilddump_request
assumes the type uses an ifinfomsg which is not the case for the neighbor
table.
Signed-off-by: Jeff Harris <jefftharris@gmail.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
It doesn't make sense to use external control plane and fill internal FDB at
the same time. It's even an illegal combination for VXLAN-GPE.
Just switch off learning when 'external' is specified.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Follow-up for kernel commit 8eb3b99554b8 ("geneve: support setting
IPv6 flow label") to allow setting the label for the device config.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Follow-up for kernel commit e7f70af111f0 ("vxlan: support setting
IPv6 flow label") to allow setting the label for the device config.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Enable support for configuring outer UDP checksums on Geneve tunnels:
ip link add type geneve id 10 remote 10.0.0.2 udpcsum
Signed-off-by: Jesse Gross <jesse@kernel.org>
On recent kernels, UDP checksum computation has become more efficient and
the default behavior was changed, however, the ip command overrides this
by always specifying a particular behavior.
If the user does not specify that UDP checksums should either be computed
or not then we don't need to send an explicit netlink message - the kernel
can just use its default behavior.
Signed-off-by: Jesse Gross <jesse@kernel.org>
There is only a single user who needs it to be reentrant (not really,
but it's safer like this), add rt_addr_n2a_r() for it to use.
Signed-off-by: Phil Sutter <phil@nwl.cc>
There are only three users which require it to be reentrant, the rest is
fine without. Instead, provide a reentrant format_host_r() for users
which need it.
Signed-off-by: Phil Sutter <phil@nwl.cc>
This adds two helper functions which map a given data field to a color,
so color_fprintf() statements don't have to be duplicated with only a
different color value depending on that data field's value. In order for
this to work in a generic way, COLOR_CLEAR has been added to serve as a
fallback default of uncolored output.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Add support for ignore route attribute, and refine the code to use
rta_getattr_* function to get attribute value.
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
To not make the output overly confusing, list them in a definition of
the STATE placeholder which is already used in the show/flush syntax but
wasn't explained before.
Signed-off-by: Phil Sutter <phil@nwl.cc>
This is a bit pedantic, but brackets ([]) show optional values and since
TYPE must not become empty, they're not suited to surround the type
keyword choices. Use curly braces instead.
Also add some missing whitespace to the parameter list above.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Neither 'list' nor 'flush' actions accept parameters, and with given
prefix the action keyword is not optional anymore.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Although the ip command accepts both "neighbor" and "neighbour" as
subcommand, I assume it's sufficient to list it in help text as just
"neigh" like ip.8 does.
Signed-off-by: Phil Sutter <phil@nwl.cc>
The help text was misleading: One could think it is possible to list
rules by selector, which would be nice but isn't. This change also
clarifies that 'ip rule' defaults to 'list' if no further arguments are
given.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Add IFLA_VF_TRUST message to trust the VF.
PF can accept some privileged operation from the trusted VF.
For example, ixgbe PF doesn't allow to enable VF promiscuous mode until
the VF is trusted because it may hurt performance.
To trust VF.
# ip link set dev eth0 vf 1 trust on
To untrust VF.
# ip link set dev eth0 vf 1 trust off
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Add support to be able to view and change IFLA_BRPORT_MULTICAST_ROUTER port
attribute.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Export all the read-only values that get returned about a bridge port
such as the timers, the ids, designated_port and cost,
topology_change_ack and config_pending. For the bridge ids the
br_dump_bridge_id function is exported from iplink_bridge.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
netns_map_add() does a malloc of (sizeof (struct nsid_cache) +
strlen(name)) and then proceed with strcpy() of name into the
zero-length member at the end of the nsid_cache structure. The
nul-terminator is written outside of the allocated memory and may
overwrite the allocator's internal structure.
This can trigger a segmentation fault on i386 uclibc with names of size 8:
after the corruption occurs, the call to closedir() on netns_map_init()
crashes while freeing the DIR structure.
Here is the relevant valgrind output:
==1251== Memcheck, a memory error detector
==1251== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==1251== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright
info
==1251== Command: ./ip netns
==1251==
==1251== Invalid write of size 1
==1251== at 0x4011975: strcpy (in
/usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==1251== by 0x8058B00: netns_map_add (ipnetns.c:181)
==1251== by 0x8058E2A: netns_map_init (ipnetns.c:226)
==1251== by 0x8058E79: do_netns (ipnetns.c:776)
==1251== by 0x804D9FF: do_cmd (ip.c:110)
==1251== by 0x804D814: main (ip.c:300)
Support for the new rx_nohandler statistic.
This code is designed to handle the case where the kernel reported statistic
structure is smaller than the larger structure in later releases (and vice versa).
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This patch adds support to add mpls multipath
routes.
example:
ip -f mpls route add 100 \
nexthop as 200 via inet 10.1.1.2 dev swp1 \
nexthop as 700 via inet 10.1.1.6 dev swp2
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
It seems that I've made a mistake when I exported these, instead of a
space in the end I've put a newline character which is wrong and breaks
the single line output.
Fixes: 7d6bc3b87a ("bonding: export 3ad actor and partner port state")
Reported-by: Sam Tannous <stannous@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
This patch implements support for the IFLA_BR_NF_CALL_(IP|IP6|ARP)TABLES
attributes in iproute2 so it can change their values.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
This patch implements support for the IFLA_BR_MCAST_STARTUP_QUERY_INTVL
attribute in iproute2 so it can change the startup query interval.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
This patch implements support for the IFLA_BR_MCAST_QUERY_RESPONSE_INTVL
attribute in iproute2 so it can change the query response interval.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
This patch implements support for the IFLA_BR_MCAST_QUERY_INTVL attribute
in iproute2 so it can change the query interval.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
This patch implements support for the IFLA_BR_MCAST_QUERIER_INTVL
attribute in iproute2 so it can change the querier interval.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
This patch implements support for the IFLA_BR_MCAST_MEMBERSHIP_INTVL
attribute in iproute2 so it can change the membership interval.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
This patch implements support for the IFLA_BR_MCAST_LAST_MEMBER_INTVL
attribute in iproute2 so it can change the last member interval.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
This patch implements support for the IFLA_BR_MCAST_STARTUP_QUERY_CNT
attribute in iproute2 so it can change the startup query count.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
This patch implements support for the IFLA_BR_MCAST_LAST_MEMBER_CNT
attribute in iproute2 so it can change the last member count value.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
This patch implements support for the IFLA_BR_MCAST_HASH_MAX attribute
in iproute2 so it can change the maximum hashed entries.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
This patch implements support for the IFLA_BR_MCAST_HASH_ELASTICTITY
attribute in iproute2 so it can change the hash elasticity value.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
This patch implements support for the IFLA_BR_MCAST_QUERIER attribute
in iproute2 so it can toggle the mcast querier value.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
This patch implements support for the IFLA_BR_MCAST_QUERY_USE_IFADDR
attribute in iproute2 so it can toggle the multicast_query_use_ifaddr val.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
This patch implements support for the IFLA_BR_MCAST_SNOOPING attribute
in iproute2 so it can change the multicast snooping value.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
This patch implements support for the IFLA_BR_MCAST_ROUTER attribute
in iproute2 so it can change the multicast router value.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
This patch implements support for the IFLA_BR_VLAN_DEFAULT_PVID
attribute in iproute2 so it can change the default pvid.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
This patch implements support for the IFLA_BR_GROUP_ADDR attribute
in iproute2 so it can change the group address.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
This patch implements support for the IFLA_BR_GROUP_FWD_MASK attribute
in iproute2 so it can change the group forwarding mask.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Netlink already provides hello_timer, tcn_timer, topology_change_timer
and gc_timer, so let's make them visible.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
This change add the ability to create lwt/flow based/externally
controlled geneve device and to select the udp destination port used
by a full geneve tunnel.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
'ip monitor all' is broken on older kernels.
This patch fixes 'ip monitor all' to match
'all' and not 'all-nsid'.
It moves parsing arg 'all-nsid' to after parsing
'all'.
Before:
$ip monitor all
NETLINK_LISTEN_ALL_NSID: Protocol not available
After:
$ip monitor all
[NEIGH]Deleted 10.0.0.1 dev eth1 lladdr c4:54:44:4f:b2:dd STALE
Fixes: 449b824ad1 ("ipmonitor: allows to monitor in several netns")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
the warning was:
iproute.c:301:12: warning: 'val' may be used uninitialized in this
function [-Wmaybe-uninitialized]
features &= ~RTAX_FEATURE_ECN;
^
iproute.c:575:10: note: 'val' was declared here
__u32 val;
^
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Options 'group' and 'remote' cannot take 'any' as value but 'local' can.
Signed-off-by: Thomas Faivre <thomas.faivre@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
This patch replaces exits with returns in iplink
command. Helps to continue on errors when
invoked with ip -force -batch.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
"random" is a new IPv6 addrgenmode, enabling "stable_secret" type
addresses with an auto-generated secret.
$ ip link set eth0 addrgenmode random
$ ip -d link show dev eth0
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000
link/ether 00:21:86:a3:25:7d brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode random
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
It is possible to switch to another addrgenmode after setting a
valid secret. Allow switching back without reconfiguring the
secret for completeness.
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
I repeatedly failed to get this right, so now I have to clean up my mess
afterwards.
Fixes: 7d6aadcd0a ("ip{,6}tunnel: have a shared stats parser/printer")
Signed-off-by: Phil Sutter <phil@nwl.cc>
This has a slight side-effect of not aborting when /proc/net/dev is
malformed, but OTOH stats are not parsed for uninteresting interfaces.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Currently ip6 encap support for lwtunnel is missing.
This patch implement it, mostly duplicating the ipv4 parts.
Also be sure to insert a space after the encap type, when
showing lwtunnel, to avoid the tunnel type and the following
argument being merged into a single word.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This patch add support for IFLA_VXLAN_COLLECT_METADATA via the
'external' keyword to the vxlan link.
Also enforce mutual exclusion between 'vni' and 'external'.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently parse_encap_ip() does not update correctly argv/argc;
if multiple lwtunnel arguments are provided, the parsing fails after
the first one, i.e.
ip route add 172.16.101.0/24 dev vxlan1 encap ip id 42 dst 192.168.255.1
fails with:
Error: either "to" is duplicate, or "dst" is a garbage.
This commit addresses the issue, stepping to next argument at each iteration
of the parsing loop.
Fixes: 1e5293056a ("lwtunnel: Add encapsulation support to ip route")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Commit 0f7543322c ("route: ignore RTAX_HOPLIMIT of value -1")
accidentally reordered fprintf statements. This patch restores the
original ordering.
Fixes: 0f7543322c ("route: ignore RTAX_HOPLIMIT of value -1")
Signed-off-by: Phil Sutter <phil@nwl.cc>
This patch:
- Adds a utility function for parsing a 64 bit address
- Adds a utility function for converting a 64 bit address to ASCII
- Adds and ILA encap type in lwt tunnels
Signed-off-by: Tom Herbert <tom@herbertland.com>
Currently, the table id for VRF devices requires an integer. Convert
it to use rtnl_rttable_a2n which handles table names from the iproute2
directory.
This also fixes a bug in the original commit where table name are not
properly handled.
Fixes: 15faa0a30b ("add support for VRF device")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Older kernels use -1 internally as indicator to use the sysctl default,
but they still export the setting. Newer kernels use 0 to indicate that
(which is why the conversion from -1 to 0 was done here), but they also
stopped exporting the value. Since the meaning of -1 is clear, treat it
equally like default on newer kernels (which is to not print anything).
Signed-off-by: Phil Sutter <phil@nwl.cc>
On 24.11.2015 02:26, Stephen Hemminger wrote:
> On Thu, 12 Nov 2015 21:10:08 +0000
> Konstantin Shemyak <konstantin@shemyak.com> wrote:
>
>> When creating an IP tunnel over IPv6, the address family must be passed in
>> the option, e.g.
>>
>> ip -6 tunnel add mode ip6gre local 1::1 remote 2::2
>>
>> This makes it impossible to create both IPv4 and IPv6 tunnels in one batch.
>>
>> In fact the address family option is redundant here, as each tunnel mode is
>> relevant for only one address family.
>> The patch determines whether the applicable address family is AF_INET6
>> instead of the default AF_INET and makes the "-6" option unnecessary for
>> "ip tunnel add".
>>
>> Signed-off-by: Konstantin Shemyak <konstantin@shemyak.com>
>> ---
>> ip/iptunnel.c | 26 ++++++++++++++++++++++++++
>> testsuite/tests/ip/tunnel/add_tunnel.t | 14 ++++++++++++++
>> 2 files changed, 40 insertions(+)
>> create mode 100755 testsuite/tests/ip/tunnel/add_tunnel.t
>>
>> diff --git a/ip/iptunnel.c b/ip/iptunnel.c
>> index 78fa988..7826a37 100644
>> --- a/ip/iptunnel.c
>> +++ b/ip/iptunnel.c
>> @@ -629,8 +629,34 @@ static int do_6rd(int argc, char **argv)
>> return tnl_6rd_ioctl(cmd, medium, &ip6rd);
>> }
>>
>> +static int tunnel_mode_is_ipv6(char *tunnel_mode) {
>> + char *ipv6_modes[] = {
>> + "ipv6/ipv6", "ip6ip6",
>> + "vti6",
>> + "ip/ipv6", "ipv4/ipv6", "ipip6", "ip4ip6",
>> + "ip6gre", "gre/ipv6",
>> + "any/ipv6", "any"
>> + };
>> + int i;
>> +
>> + for (i = 0; i < sizeof(ipv6_modes) / sizeof(char *); i++) {
>> + if (strcmp(ipv6_modes[i], tunnel_mode) == 0)
>> + return 1;
>> + }
>> + return 0;
>> +}
>> +
>
> The ipv6_modes table should be static const.
Thank you for the note! attached the corrected patch.
> Also is it possible to use strstr for ipv6 and ip6 or even strchr(tunnel_mode, '6')
> to simplify this?
There is IPv6 tunnel mode 'any', and IPv4 tunnel mode 'ipv6/ip' (aka
'sit'). It looks to me that attempts to find some substring match
would not make the code much shorter, but definitely less readable.
Konstantin Shemyak.
>From 42d27db0055c3a114fe6eb86d680bef9ec098ad4 Mon Sep 17 00:00:00 2001
From: Konstantin Shemyak <konstantin@shemyak.com>
Date: Thu, 12 Nov 2015 20:52:02 +0200
Subject: [PATCH] Tunnel address family is determined from the tunnel mode
When the tunnel mode already tells the IP address family, "ip tunnel"
command determines it and does not require option "-4"/"-6" to be passed.
This makes possible creating both IPv4 and IPv6 tunnels in one batch.
Signed-off-by: Konstantin Shemyak <konstantin@shemyak.com>
This patch adds support to remote checksum checksum offload
to VXLAN. This patch adds remcsumtx and remcsumrx to ip vxlan
configuration to enable remote checksum offload for transmit
and receive on the VXLAN tunnel.
https://tools.ietf.org/html/draft-herbert-vxlan-rco-00
Example:
ip link add name vxlan0 type vxlan id 42 group 239.1.1.1 dev eth0 \
udpcsum remcsumtx remcsumrx
Testing:
Ran single netperf over mlnx4 to illustrate the effest:
- Without RCO (UDP csum set to zero)
4335.99 Mbps
- With RCO enabled
7661.81 Mbps
Signed-off-by: Tom Herbert <tom@herbertland.com>
Technically, the range of possible hoplimit values are defined by IPv4
and IPv6 header formats. Both define the field to be eight bits in size,
which leads to a value range of [0;255]. Setting a packet's hoplimit
field to 0 though makes not much sense, as the next hop would
immediately drop the packet. Therefore Linux uses 0 as a special value
indicating to use the system's default hoplimit (configurable via
sysctl). In iproute, setting the hoplimit of a route to 0 is equivalent
to omitting the hoplimit parameter alltogether, so it is actually not
necessary to allow that value to be specified, but keep it anyway for
backwards compatibility.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Linux version 3.1 introduced a consistency check for netlink dumps in
commit 670dc28 ("netlink: advertise incomplete dumps"). This bites
iproute2 when flushing more addresses than can fit into a single
RTM_GETADDR response. To silence the spurious error message "Dump was
interrupted and may be inconsistent.", advise rtnl_dump_filter_l() to
not care about NLM_F_DUMP_INTR.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Since it's no longer relevant whether an IP address is primary or
secondary when flushing, ipaddr_flush() can be simplified a bit.
Signed-off-by: Phil Sutter <phil@nwl.cc>
I found recently that, if I disabled address promotion in the kernel, that
ip addr flush dev <dev>
would fail with an EADDRNOTAVAIL errno (though the flush operation would in fact
flush all addresses from an interface properly)
Whats happening is that, if I add a primary and multiple secondary addresses to
an interface, the flush operation first ennumerates them all with a GETADDR |
DUMP operation, then sends a delete request for each address. But the kernel,
having promotion disabled, deletes all secondary addresses when the primary is
removed. That means, that several delete requests may still be pending in the
netlink request for addresses that have been removed on our behalf, resulting in
EADDRNOTAVAIL return codes.
It seems the simplest thing to do is to understand that EADDRUNAVAIL isn't a
fatal outcome on a flush operation, as it just indicates that an address which
you want to remove is already removed, so it can safely be ignored.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Stephen Hemminger <stephen@networkplumber.org>
CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>