If multiple ip processes are ran at the same time to set up
separate network namespaces, and it is the first time so /run/netns
has to be set up first, and they end up doing it at the same time,
the processes might enter a recursive loop creating thousands of
mount points, which might crash the system depending on resources
available.
Try to take a flock on /run/netns before doing the mount() dance, to
ensure this cannot happen. But do not try too hard, and if it fails
continue after printing a warning, to avoid introducing regressions.
First reported on Debian: https://bugs.debian.org/949235
To reproduce (WARNING: run in a VM to avoid system lockups):
for i in {0..9}
do
strace -e trace=mount -e inject=mount:delay_exit=1000000 ip \
netns add "testnetns$i" 2>&1 | tee "$i.log" &
done
wait
The strace is to ensure the problem always reproduces, to add an
artificial synchronization point after the first mount().
Reported-by: Etienne Dechamps <etienne@edechamps.fr>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Do not hardcode /usr/lib/ip as a path and allow libraries path
configuration in run-time.
Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Gcc-10 complains about possible string length overflow.
This can't happen Ethernet address format is always limited to
18 characters or less. Just resize the temp buffer.
Fixes: 70dfb0b883 ("iplink: bridge: export bridge_id and designated_root")
Cc: nikolay@cumulusnetworks.com
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
There are directly calls in libbpf for bpf program load/attach.
So we could just use two wrapper functions for ipvrf and convert
them with libbpf support.
Function bpf_prog_load() is removed as it's conflict with libbpf
function name.
bpf.c is moved to bpf_legacy.c for later main libbpf support in
iproute2.
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
This patch aim to add basic checking functions for later iproute2
libbpf support.
First we add check_libbpf() in configure to see if we have bpf library
support. By default the system libbpf will be used, but static linking
against a custom libbpf version can be achieved by passing libbpf DESTDIR
to variable LIBBPF_DIR for configure.
Another variable LIBBPF_FORCE is used to control whether to build iproute2
with libbpf. If set to on, then force to build with libbpf and exit if
not available. If set to off, then force to not build with libbpf.
When dynamically linking against libbpf, we can't be sure that the
version we discovered at compile time is actually the one we are
using at runtime. This can lead to hard-to-debug errors. So we add
a new file lib/bpf_glue.c and a helper function get_libbpf_version()
to get correct libbpf version at runtime.
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Instead of rolling a custom on-off printer, use the one added to utils.c.
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Instead of rolling a custom on-off printer, use the one added to utils.c.
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Instead of rolling a custom on-off printer, use the one added to utils.c.
Note that _print_onoff() has an extra parameter for a JSON-specific flag
name. However that argument is not used, and never was. Therefore when
moving over to print_on_off(), drop this argument.
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Invoke parse_on_off() from bridge_slave_parse_on_off() instead of
hand-rolling one. Exit on failure, because the invarg that was ivoked here
before would.
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Invoke parse_on_off() instead of rolling a custom function.
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Currently, the nexthop flags are only printed when the nexthop has a
nexthop device. The offload / trap indication is therefore not printed
for nexthop groups.
Instead, always print the nexthop flags, regardless if the nexthop has a
nexthop device or not.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
The kernel can now signal that a nexthop is trapping packets instead of
forwarding them. Print the flag to help users understand the offload
state of each nexthop.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
The DCB tool will have to provide an interface to a number of fixed-size
arrays. Unlike the egress- and ingress-qos-map, it makes good sense to have
an interface to set all members to the same value. For example to set
strict priority on all TCs besides select few, or to reset allocated
bandwidth to all zeroes, again besides several explicitly-given ones.
To support this usage, extend the parse_mapping() with a boolean that
determines whether this special use is supported. If "all" is given and
recognized, mapping_cb is called with the key of -1.
Have iplink_vlan pass false for allow_all.
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
VLAN netdevices have two similar attributes: ingress-qos-map and
egress-qos-map. These attributes can be configured with a series of
802.1-priority-to-skb-priority (and vice versa) mappings. A reusable helper
along those lines will be handy for configuration of various
priority-to-tc, tc-to-algorithm, and other arrays in DCB.
Therefore extract the logic to a function parse_mapping(), move to utils.c,
and dispatch to utils.c from iplink_vlan.c. That necessitates extraction of
a VLAN-specific parse_qos_mapping(). Do that, and propagate addattr_l()
return value up, unlike the original.
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Take from the macsec code parse_one_of() and adapt so that it passes the
primary result as the main return value, and error result through a
pointer. That is the simplest way to make the code reusable across data
types without introducing extra magic.
Also from macsec take the specialization of parse_one_of() for parsing
specifically the strings "off" and "on".
Convert the macsec code to the new helpers.
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
The code for handling batches is largely the same across iproute2 tools.
Extract a helper to handle the batch, and adjust the tools to dispatch to
this helper. Sandwitch the invocation between prologue / epilogue code
specific for each tool.
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
`ip addr` when run under qemu-user-riscv64, fails. This likely is due
to qemu-5.1 not doing translation of RTM_GETNSID calls. Aborting ip
completely is not helpful for the user however. This patch reworks
the error handling.
Before:
rtest:/ # ip a
2: host0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
request send failed: Operation not supported
link/ether 46:3f:2d:88:3d:db brd ff:ff:ff:ff:ff:ffrtest:/ #
Afterwards:
rtest:/ # ip a
2: host0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
rtnl_send(RTM_GETNSID): Operation not supported. Continuing anyway.
link/ether 46:3f:2d:88:3d:db brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 192.168.72.147/28 brd 192.168.72.159 scope global host0
valid_lft forever preferred_lft forever
inet6 fe80::443f:2dff:fe88:3ddb/64 scope link
valid_lft forever preferred_lft forever
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The XFRMA_SET_MARK_MASK attribute can be set in states (4.19+)
It is optional and the kernel default is 0xffffffff
It is the mask of XFRMA_SET_MARK(a.k.a. XFRMA_OUTPUT_MARK in 4.18)
e.g.
./ip/ip xfrm state add output-mark 0x6 mask 0xab proto esp \
auth digest_null 0 enc cipher_null ''
ip xfrm state
src 0.0.0.0 dst 0.0.0.0
proto esp spi 0x00000000 reqid 0 mode transport
replay-window 0
output-mark 0x6/0xab
auth-trunc digest_null 0x30 0
enc ecb(cipher_null)
anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
sel src 0.0.0.0/0 dst 0.0.0.0/0
Signed-off-by: Antony Antony <antony@phenome.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
These were reported as IPv6-only and ignored:
# ip address add 192.0.2.2/24 dev dummy5 noprefixroute
Warning: noprefixroute option can be set only for IPv6 addresses
# ip address add 224.1.1.10/24 dev dummy5 autojoin
Warning: autojoin option can be set only for IPv6 addresses
This enables them back for IPv4.
Fixes: 9d59c86e57 ("iproute2: ip addr: Organize flag properties structurally")
Signed-off-by: Adel Belhouane <bugs.a.b@free.fr>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Used for tracking neighbour table overflows.
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Remove the extra space between the reported ipoib attrs - use only one
space instead of two.
Fixes: de0389935f ("iplink: Added support for the kernel IPoIB RTNL ops")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This patch adds support for recently
added link IFLA_PROTO_DOWN_REASON attribute.
IFLA_PROTO_DOWN_REASON enumerates reasons
for the already existing IFLA_PROTO_DOWN link
attribute.
$ cat /etc/iproute2/protodown_reasons.d/r.conf
0 mlag
1 evpn
2 vrrp
3 psecurity
$ ip link set dev vx10 protodown on protodown_reason vrrp on
$ip link show dev vx10
14: vx10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
DEFAULT group default qlen 1000
link/ether f2:32:28:b8:35:ff brd ff:ff:ff:ff:ff:ff protodown on
protodown_reason <vrrp>
$ip -p -j link show dev vx10
[ {
<snip>
"proto_down": true,
"proto_down_reason": [ "vrrp" ]
} ]
$ip link set dev vx10 protodown_reason mlag on
$ip link show dev vx10
14: vx10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
DEFAULT group default qlen 1000
link/ether f2:32:28:b8:35:ff brd ff:ff:ff:ff:ff:ff protodown on
protodown_reason <mlag,vrrp>
$ip -p -j link show dev vx10
[ {
<snip>
"proto_down": true,
"protodown_reason": [ "mlag","vrrp" ]
} ]
$ip -p -j link show dev vx10
$ip link set dev vx10 protodown off protodown_reason vrrp off
Error: Cannot clear protodown, active reasons.
$ip link set dev vx10 protodown off protodown_reason mlag off
$
Note: for somereason the json and non-json key for protodown
are different (protodown and proto_down). I have kept the
same for protodown reason for consistency (protodown_reason and
proto_down_reason).
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
The XFRMA_SET_MARK_MASK attribute is set in states (4.19+).
It is the mask of XFRMA_SET_MARK(a.k.a. XFRMA_OUTPUT_MARK in 4.18)
sample output: note the output-mark mask
ip xfrm state
src 192.1.2.23 dst 192.1.3.33
proto esp spi 0xSPISPI reqid REQID mode tunnel
replay-window 32 flag af-unspec
output-mark 0x3/0xffffff
aead rfc4106(gcm(aes)) 0xENCAUTHKEY 128
if_id 0x1
Signed-off-by: Antony Antony <antony@phenome.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Indenting of 'ip link set' options below 'link-netns' was wrong, they
should be on the same level as the above.
While being at it, fix closing brackets in vf-specific options. Also
write node/port_guid parameters in upper-case without curly braces: They
are supposed to be replaced by values, not put literally.
Fixes: 8589eb4efd ("treewide: refactor help messages")
Fixes: 5a3ec4ba64 ("iplink: Update usage in help message")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This patch enhances the iplink command to add a proto parameters to
create PRP device/interface similar to HSR. Both protocols are
quite similar and requires a pair of Ethernet interfaces. So re-use
the existing HSR iplink command to create PRP device/interface as
well. Use proto parameter to differentiate the two protocols.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
ip maddress add|del takes a MAC address as argument, so insist on
getting a length of ETH_ALEN bytes. This makes sure the passed argument
is actually a MAC address and especially not an IPv4 address which
was previously accepted and silently taken as a MAC address.
While at it, do not print *argv in the error path as this has been
modified by ll_addr_a2n() and doesn't contain the full string anymore,
which can lead to misleading error messages.
Also while at it, replace the hardcoded buffer size with the actual
buffer size using sizeof().
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Replace the iproute2 snapshot with a version string which is
autogenerated as part of the build process using git describe.
This will also allow seeing if the version of the command
is built from the same sources is as upstream.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This flag allows to create SA where sequence number can cycle in
outbound packets if set.
Signed-off-by: Petr Vaněk <pv@excello.cz>
Signed-off-by: David Ahern <dsahern@kernel.org>
According to 'ip mptcp help', 'endpoint show' can accept no argument:
ip mptcp endpoint show [ id ID ]
It makes sense to print all endpoints when no filter is used.
So here if the following command is used, all endpoints are printed:
ip mptcp endpoint show
Same as:
ip mptcp endpoint
Fixes: 7e0767cd ("add support for mptcp netlink interface")
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The XFRMA_IF_ID attribute is set in policies for them to be
associated with an XFRM interface (4.19+).
Add support for getting/deleting policies with this attribute.
For supporting 'deleteall' the XFRMA_IF_ID attribute needs to be
explicitly copied.
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
utils.h is included two times in ipaddress.c, there is no need for that.
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Follow the precedent of other parts of iproute2 follow the example of:
Standard libc headers
Linux headers
Iproute2 support headers
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Bareudp devices provide a generic L3 encapsulation for tunnelling
different protocols like MPLS, IP, NSH, etc. inside a UDP tunnel.
This patch is based on original work from Martin Varghese:
https://lore.kernel.org/netdev/1570532361-15163-1-git-send-email-martinvarghesenokia@gmail.com/
Examples:
- ip link add dev bareudp0 type bareudp dstport 6635 ethertype mpls_uc
This creates a bareudp tunnel device which tunnels L3 traffic with
ethertype 0x8847 (unicast MPLS traffic). The destination port of the
UDP header will be set to 6635. The device will listen on UDP port 6635
to receive traffic.
- ip link add dev bareudp0 type bareudp dstport 6635 ethertype ipv4 multiproto
Same as the MPLS example, but for IPv4. The "multiproto" keyword allows
the device to also tunnel IPv6 traffic.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
ip(8) accepts -family ipv6 (-6) option at the toplevel. It is
straightforward to support the existing option for modifying listener
on IPv6 addresses.
Maintain the backward compatibility by leaving ip fou -6 flag
implemented, while it's removed from the usage message.
Signed-off-by: Sorah Fukumori <her@sorah.jp>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
On some distros, i.e. rhel 7.6, compilation fails with the following:
ipaddress.c: In function ‘lookup_flag_data_by_name’:
ipaddress.c:1260:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < ARRAY_SIZE(ifa_flag_data); ++i) {
^
ipaddress.c:1260:2: note: use option -std=c99 or -std=gnu99 to compile your code
This commit fixes the single place needed for compilation to pass.
Fixes: 9d59c86e57 ("iproute2: ip addr: Organize flag properties structurally")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This patch adds support to add and delete
ecmp nexthops of type fdb. Such nexthops can
be linked to vxlan fdb entries.
$ip nexthop add id 12 via 172.16.1.2 fdb
$ip nexthop add id 13 via 172.16.1.3 fdb
$ip nexthop add id 102 group 12/13 fdb
$bridge fdb add 02:02:00:00:00:13 dev vx10 nhid 102 self
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Actually display that deletions are happening
when monitoring nexthops.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
optimistic DAD is controllable via sysctl for an interface
or all interfaces on the system. This would affect addresses
added by the kernel only.
Recent kernels, however, have enabled support for adding optimistic
address via userspace. This plumbs that support.
Signed-off-by: David Ahern <dsahern@gmail.com>
This creates a nice systematic way to check that the various flags are
mutable from userspace and that the address family is valid.
Mutability properties are preserved to avoid introducing any behavioral
change in this CL. However, previously, immutable flags were ignored and
fell through to this confusing error:
Error: either "local" is duplicate, or "dadfailed" is a garbage.
But now, they just warn more explicitly:
Warning: dadfailed option is not mutable from userspace
Signed-off-by: David Ahern <dsahern@gmail.com>
This patch adds support for rpl segment routing settings.
Example:
ip -n ns0 -6 route add 2001::3 encap rpl segs \
fe80::c8fe:beef:cafe:cafe,fe80::c8fe:beef:cafe:beef dev lowpan0
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
This patch prepares infrastructure for matching sockets by cgroups.
Two helper functions are added for transformation between cgroup v2 ID
and pathname. Cgroup v2 cache is implemented as hash table indexed by ID.
This cache is needed for faster lookups of socket cgroup.
v2:
- style fixes (David Ahern)
Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: David Ahern <dsahern@gmail.com>
This patch is to add LWTUNNEL_IP_OPTS_ERSPAN's parse and print to implement
erspan options support in iproute_lwtunnel.
Option is expressed as version:index:dir:hwid, dir and hwid will be parsed
when version is 2, while index will be parsed when version is 1. All of
these are numbers. erspan doesn't support multiple options.
With this patch, users can add and dump erspan options like:
# ip netns add a
# ip netns add b
# ip -n a link add eth0 type veth peer name eth0 netns b
# ip -n a link set eth0 up
# ip -n b link set eth0 up
# ip -n a addr add 10.1.0.1/24 dev eth0
# ip -n b addr add 10.1.0.2/24 dev eth0
# ip -n b link add erspan1 type erspan key 1 seq erspan 123 \
local 10.1.0.2 remote 10.1.0.1
# ip -n b addr add 1.1.1.1/24 dev erspan1
# ip -n b link set erspan1 up
# ip -n b route add 2.1.1.0/24 dev erspan1
# ip -n a link add erspan1 type erspan key 1 seq local 10.1.0.1 external
# ip -n a addr add 2.1.1.1/24 dev erspan1
# ip -n a link set erspan1 up
# ip -n a route add 1.1.1.0/24 encap ip id 1 \
erspan_opts 2:123:1:2 dst 10.1.0.2 dev erspan1
# ip -n a route show
# ip netns exec a ping 1.1.1.1 -c 1
1.1.1.0/24 encap ip id 1 src 0.0.0.0 dst 10.1.0.2 ttl 0 tos 0
erspan_opts 2:0:1:2 dev erspan1 scope link
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=0.124 ms
v1->v2:
- improve the changelog.
- use PRINT_ANY to support dumping with json format.
v2->v3:
- implement proper JSON object for opts instead of just bunch of strings.
v3->v4:
- keep the same format between input and output, json and non json.
- print version, index, dir and hwid as uint.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
This patch is to add LWTUNNEL_IP_OPTS_VXLAN's parse and print to implement
vxlan options support in iproute_lwtunnel.
Option is expressed a number for gbp only, and vxlan doesn't support
multiple options.
With this patch, users can add and dump vxlan options like:
# ip netns add a
# ip netns add b
# ip -n a link add eth0 type veth peer name eth0 netns b
# ip -n a link set eth0 up
# ip -n b link set eth0 up
# ip -n a addr add 10.1.0.1/24 dev eth0
# ip -n b addr add 10.1.0.2/24 dev eth0
# ip -n b link add vxlan1 type vxlan id 1 local 10.1.0.2 \
remote 10.1.0.1 dev eth0 ttl 64 gbp
# ip -n b addr add 1.1.1.1/24 dev vxlan1
# ip -n b link set vxlan1 up
# ip -n b route add 2.1.1.0/24 dev vxlan1
# ip -n a link add vxlan1 type vxlan local 10.1.0.1 dev eth0 ttl 64 \
gbp external
# ip -n a addr add 2.1.1.1/24 dev vxlan1
# ip -n a link set vxlan1 up
# ip -n a route add 1.1.1.0/24 encap ip id 1 \
vxlan_opts 1110 dst 10.1.0.2 dev vxlan1
# ip -n a route show
# ip netns exec a ping 1.1.1.1 -c 1
1.1.1.0/24 encap ip id 1 src 0.0.0.0 dst 10.1.0.2 ttl 0 tos 0
vxlan_opts 1110 dev vxlan1 scope link
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=0.111 ms
v1->v2:
- improve the changelog.
- get_u32 with base = 0 for gbp.
- use PRINT_ANY to support dumping with json format.
v2->v3:
- implement proper JSON array for opts.
v3->v4:
- keep the same format between input and output, json and non json.
- print gbp as uint.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
This patch is to add LWTUNNEL_IP(6)_OPTS and LWTUNNEL_IP_OPTS_GENEVE's
parse and print to implement geneve options support in iproute_lwtunnel.
Options are expressed as class:type:data and multiple options may be
listed using a comma delimiter, class and type are numbers and data
is a hex string.
With this patch, users can add and dump geneve options like:
# ip netns add a
# ip netns add b
# ip -n a link add eth0 type veth peer name eth0 netns b
# ip -n a link set eth0 up; ip -n b link set eth0 up
# ip -n a addr add 10.1.0.1/24 dev eth0
# ip -n b addr add 10.1.0.2/24 dev eth0
# ip -n b link add geneve1 type geneve id 1 remote 10.1.0.1 ttl 64
# ip -n b addr add 1.1.1.1/24 dev geneve1
# ip -n b link set geneve1 up
# ip -n b route add 2.1.1.0/24 dev geneve1
# ip -n a link add geneve1 type geneve external
# ip -n a addr add 2.1.1.1/24 dev geneve1
# ip -n a link set geneve1 up
# ip -n a route add 1.1.1.0/24 encap ip id 1 geneve_opts \
1:1:1212121234567890,1:1:1212121234567890,1:1:1212121234567890 \
dst 10.1.0.2 dev geneve1
# ip -n a route show
# ip netns exec a ping 1.1.1.1 -c 1
1.1.1.0/24 encap ip id 1 src 0.0.0.0 dst 10.1.0.2 ttl 0 tos 0
geneve_opts 1:1:1212121234567890,1:1:1212121234567890 ...
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=0.079 ms
v1->v2:
- improve the changelog.
- use PRINT_ANY to support dumping with json format.
v2->v3:
- implement proper JSON array for opts instead of just bunch of strings.
v3->v4:
- keep the same format between input and output, json and non json.
- print class and type as uint and print data as hex string.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
The Type I ERSPAN frame format is based on the barebones
IP + GRE(4-byte) encapsulation on top of the raw mirrored frame.
Both type I and II use 0x88BE as protocol type. Unlike type II
and III, no sequence number or key is required.
To creat a type I erspan tunnel device:
$ ip link add dev erspan11 type erspan \
local 172.16.1.100 remote 172.16.1.200 \
erspan_ver 0
CC: Dmitriy Andreyevskiy <dandreye@cisco.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Implement basic commands to:
- manipulate MPTCP endpoints list
- manipulate MPTCP connection limits
Examples:
1. Allows multiple subflows per MPTCP connection
$ ip mptcp limits set subflows 2
2. Accept ADD_ADDR announcement from the peer (server):
$ ip mptcp limits set add_addr_accepted 2
3. Add a ipv4 address to be annunced for backup subflows:
$ ip mptcp endpoint add 10.99.1.2 signal backup
4. Add an ipv6 address used as source for additional subflows:
$ ip mptcp endpoint add 2001::2 subflow
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
As commit f9d696cf41 ("xfrm: not try to delete ipcomp states when using
deleteall") does, this patch is to fix the same issue for ip6 state where
xsinfo->id.proto == IPPROTO_IPV6.
# ip xfrm state add src 2000::1 dst 2000::2 spi 0x1000 \
proto comp comp deflate mode tunnel sel src 2000::1 dst \
2000::2 proto gre
# ip xfrm sta deleteall
Failed to send delete-all request
: Operation not permitted
Note that the xsinfo->proto in common states can never be IPPROTO_IPV6.
Fixes: f9d696cf41 ("xfrm: not try to delete ipcomp states when using deleteall")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This patch adds support for configuring offload mode upon MACsec
device creation.
If offload mode is not specified, then netlink attribute is not
added. Default behavior on the kernel side in this case is
backward-compatible (offloading is disabled by default).
Example:
$ ip link add link eth0 macsec0 type macsec port 11 encrypt on offload mac
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
This patch enables MAC HW offload usage in iproute, since MACSec
implementation supports it now.
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
In the commit referenced below, ip link started sending ERSPAN-specific
attributes even for GRE and gretap tunnels. Fix by more carefully
distinguishing between the GRE/tap and ERSPAN modes. Do not show
ERSPAN-related help in GRE/tap mode, likewise do not accept ERSPAN
arguments, or send ERSPAN attributes.
Fixes: 83c543af87 ("erspan: set erspan_ver to 1 by default")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
nh_dump_filter is missing a return value check in two cases.
Fix this simply adding an assignment to the proper variable.
Fixes: 63df8e8543 ("Add support for nexthop objects")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This patch adds an accessor for the validate_str array, to handle future
changes adding a member.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
MacSEC can now be offloaded to specialized hardware devices. Offloading
is off by default when creating a new MACsec interface, but the mode can
be updated at runtime. This patch adds a new subcommand,
`ip macsec offload`, to allow users to select the offloading mode of a
MACsec interface. It takes the mode to switch to as an argument, which
can for now either be 'off' or 'phy':
# ip macsec offload macsec0 phy
# ip macsec offload macsec0 off
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
This patch adds support to report the MACsec offloading mode currently
being enabled, which as of now can either be 'off' or 'phy'. This
information is reported through the `ip macsec show` command:
# ip macsec show
18: macsec0: protect on validate strict sc off sa off encrypt on send_sci on end_station off scb off replay off
cipher suite: GCM-AES-128, using ICV length 16
TXSC: 3e5035b67c860001 on SA 0
0: PN 1, state on, key 00000000000000000000000000000000
RXSC: b4969112700f0001, state on
0: PN 1, state on, key 01000000000000000000000000000000
offload: phy
19: macsec1: protect on validate strict sc off sa off encrypt on send_sci on end_station off scb off replay off
cipher suite: GCM-AES-128, using ICV length 16
TXSC: 3e5035b67c880001 on SA 0
1: PN 1, state on, key 00000000000000000000000000000000
RXSC: b4969112700f0001, state on
1: PN 1, state on, key 01000000000000000000000000000000
offload: off
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
In kernel space, ipcomp(sub) states used by main states are not
allowed to be deleted by users, they would be freed only when
all main states are destroyed and no one uses them.
In user space, ip xfrm sta deleteall doesn't filter these ipcomp
states out, and it causes errors:
# ip xfrm state add src 192.168.0.1 dst 192.168.0.2 spi 0x1000 \
proto comp comp deflate mode tunnel sel src 192.168.0.1 dst \
192.168.0.2 proto gre
# ip xfrm sta deleteall
Failed to send delete-all request
: Operation not permitted
This patch is to fix it by filtering ipcomp states with a check
xsinfo->id.proto == IPPROTO_IPIP.
Fixes: c7699875be ("Import patch ipxfrm-20040707_2.diff")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Currently `ip -6 route show` gives us this output:
sharpd@eva ~/i/ip (master)> ip -6 route show
::1 dev lo proto kernel metric 256 pref medium
4:5::6:7 nhid 18 proto static metric 20
nexthop via fe80::99 dev enp39s0 weight 1
nexthop via fe80::44 dev enp39s0 weight 1 pref medium
Displaying `pref medium` as the last bit of output implies
that the RTA_PREF is a per nexthop value, when it is infact
a per route piece of data.
Change the output to display RTA_PREF and RTA_TTL_PROPAGATE
before the RTA_MULTIPATH data is shown:
sharpd@eva ~/i/ip (master)> ./ip -6 route show
::1 dev lo proto kernel metric 256 pref medium
4:5::6:7 nhid 18 proto static metric 20 pref medium
nexthop via fe80::99 dev enp39s0 weight 1
nexthop via fe80::44 dev enp39s0 weight 1
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Commit 2897636267 ("erspan: add erspan version II support")
breaks the command:
# ip link add erspan1 type erspan key 1 seq erspan 123 \
local 10.1.0.2 remote 10.1.0.1
as erspan_ver is set to 0 by default, then IFLA_GRE_ERSPAN_INDEX
won't be set in gre_parse_opt().
# ip -d link show erspan1
...
erspan remote 10.1.0.1 local 10.1.0.2 ... erspan_index 0 erspan_ver 1
^^^^^^^^^^^^^^
This patch is to change to set erspan_ver to 1 by default.
Fixes: 2897636267 ("erspan: add erspan version II support")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This restore the string format we have before jsonification, adding a
missing space between v2 and v3 on TX IGMP reports string.
Fixes: a9bc23a792 ("ip: bridge: add xstats json support")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
After commit 8589eb4efd ("treewide: refactor help messages") help
messages for xfrm state and policy are broken, printing many times the
same protocol in UPSPEC section:
$ ip xfrm state help
[...]
UPSPEC := proto { { tcp | tcp | tcp | tcp } [ sport PORT ] [ dport PORT ] |
{ icmp | icmp | icmp } [ type NUMBER ] [ code NUMBER ] |
gre [ key { DOTTED-QUAD | NUMBER } ] | PROTO }
This happens because strxf_proto function is non-reentrant and gets called
multiple times in the same fprintf instruction.
This commit fix the issue avoiding calls to strxf_proto() with a constant
param, just hardcoding strings for protocol names.
Fixes: 8589eb4efd ("treewide: refactor help messages")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
While at it, convert xfrm_xfrma_print and xfrm_encap_type_parse to use
the UAPI macros for encap_type as suggested by David Ahern, and add the
UAPI udp.h header (sync'd from ipsec-next to get the TCP_ENCAP_ESPINTCP
definition).
Co-developed-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
$ make CCOPTS=-fno-common
gcc ... -o ip
ld: rt_names.o (symbol from plugin): in function "rtnl_rtprot_n2a":
(.text+0x0): multiple definition of "numeric"; ip.o (symbol from plugin):(.text+0x0): first defined here
gcc ... -o tipc
ld: ../lib/libutil.a(utils.o):(.bss+0xc): multiple definition of `pretty';
tipc.o:tipc.c:28: first defined here
References: https://bugzilla.opensuse.org/1160244
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The kernel now signals the offload state of a route using the
'RTM_F_OFFLOAD' and 'RTM_F_TRAP' flags. Print these to help users
understand the offload state of each route. The "rt_" prefix is used in
order to distinguish it from the offload state of nexthops.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
The 802.3ad/LACP actor/partner operating states are only printed as
numbers, e.g,
ad_actor_oper_port_state 15
Add an additional output in ip link show that prints a string describing
the individual 3ad bit meanings in the following way:
ad_actor_oper_port_state_str <active,short_timeout,aggregating,in_sync>
JSON output is also supported, the field becomes a json array:
"ad_actor_oper_port_state_str":
["active","short_timeout","aggregating","in_sync"]
Signed-off-by: Andy Roulin <aroulin@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
if_id is u32, error on -ve values instead of setting to 0
after :
ip link add ipsec1 type xfrm dev lo if_id -10
Error: argument "-10" is wrong: if_id value is invalid
before : note xfrm if_id 0
ip link add ipsec1 type xfrm dev lo if_id -10
ip -d link show dev ipsec1
9: ipsec1@lo: <NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/none 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500
xfrm if_id 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
Fixes: 286446c1e8 ("ip: support for xfrm interfaces")
Signed-off-by: Antony Antony <antony@phenome.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Add support for the BRIDGE_XSTATS_STP xstats, as follow:
# ip link xstats type bridge_slave dev lan4 stp
lan4
STP BPDU: RX: 0 TX: 61
STP TCN: RX: 0 TX: 0
STP Transitions: Blocked: 2 Forwarding: 1
Or below as JSON:
# ip -j -p link xstats type bridge_slave dev lan0 stp
[ {
"ifname": "lan0",
"stp": {
"rx_bpdu": 0,
"tx_bpdu": 500,
"rx_tcn": 0,
"tx_tcn": 0,
"transition_blk": 0,
"transition_fwd": 0
}
} ]
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Display permanent hardware address of an interface in output of
"ip link show" and "ip addr show". To reduce noise, permanent address is
only shown if it is different from current one.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David Ahern <dsahern@gmail.com>
it allows to specify also the table name in addition to the table number in
SRv6 End.DT* behaviors.
To add an End.DT6 behavior route specifying the table by name:
$ ip -6 route add 2001:db8::1 encap seg6local action End.DT6 table main dev eth0
The ip route show to print output this route:
$ ip -6 route show 2001:db8::1
2001:db8::1 encap seg6local action End.DT6 table main dev eth0 metric 1024 pref medium
The JSON output:
$ ip -6 -j -p route show 2001:db8::1
[ {
"dst": "2001:db8::1",
"encap": "seg6local",
"action": "End.DT6",
"table": "main",
"dev": "eth0",
"metric": 1024,
"flags": [ ],
"pref": "medium"
} ]
Signed-off-by: Paolo Lungaroni <paolo.lungaroni@cnit.it>
Signed-off-by: David Ahern <dsahern@gmail.com>
Ip tool oneline option should output each record on a single line. While
oneline option is active the variable _SL_ replaces line feeds with the
'\' character. However, at the end of print_linkinfo() the variable _SL_
shouldn't be used, otherwise the whole output is on a single line.
Before this fix:
$ip -o link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode
DEFAULT group default qlen 1000\ link/loopback 00:00:00:00:00:00 brd
00:00:00:00:00:00\2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
qdisc fq_codel state UP mode DEFAULT group default qlen 1000\
link/ether 52:54:00:60:0a:db brd ff:ff:ff:ff:ff:ff\3: eth1:
<BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode
DEFAULT group default qlen 1000\ link/ether 00:50:56:1b:05:cd brd
ff:ff:ff:ff:ff:ff\
After this fix:
$ip -o link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode
DEFAULT group default qlen 1000\ link/loopback 00:00:00:00:00:00 brd
00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state
UP mode DEFAULT group default qlen 1000\ link/ether 52:54:00:60:0a:db
brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state
UP mode DEFAULT group default qlen 1000\ link/ether 00:50:56:1b:05:cd
brd ff:ff:ff:ff:ff:ff
Fixes: 3aa0e51be6 ("ip: add support for alternative name addition/deletion/list")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Extend iplink to show VF GUIDs (IFLA_VF_IB_NODE_GUID, IFLA_VF_IB_PORT_GUID),
giving the ability for user-space application to print GUID values.
This ability is added to the one of setting new node GUID and port GUID values.
Suitable ip link command:
- ip link show <device>
For example:
- ip link set ib4 vf 0 node_guid 22:44:33:00:33:11:00:33
- ip link set ib4 vf 0 port_guid 10:21:33:12:00:11:22:10
- ip link show ib4
ib4: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN mode DEFAULT group default qlen 256
link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
vf 0 link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff,
spoof checking off, NODE_GUID 22:44:33:00:33:11:00:33, PORT_GUID 10:21:33:12:00:11:22:10, link-state disable, trust off, query_rss off
Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Since invarg() automatically adds a '\n' character, having one in the
error message generates an extra blank line.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Don't output the nsid and current-nsid json keys if they're not set.
Otherwise a parser would have to special case the "not-assigned"
string.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Negative values are invalid netns ids. Ensure that helper functions
don't accidentally try to process them.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
These attributes are signed (with -1 meaning NETNSA_NSID_NOT_ASSIGNED).
So let's use rta_getattr_s32() and print_int() instead of their
unsigned counterpart to avoid confusion.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Extend ll_name_to_index() to get the index of a netdevice using
alternative interface name. Allow alternative long names to pass checks
in couple of ip link/addr commands.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Implement addition/deletion of lists of properties, currently
alternative ifnames. Also extent the ip link show command to list them.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
When `-all' argument is specified netns runs cmd on all namespaces
and NAME is not used, but netns nevertheless checks if argv[1] is a
valid namespace name ignoring the fact that argv[1] contains cmd
and not NAME. This results in bug where user cannot specify
absolute path to command.
# ip -all netns exec /usr/bin/whoami
Invalid netns name "/usr/bin/whoami"
This forces user to have his command in PATH.
Solution is simply to not validate argv[1] when `-all' argument is
specified.
Signed-off-by: Michał Łyszczek <michal.lyszczek@bofc.pl>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This patch adds support to lookup a neigh entry
using recently added support in the kernel using RTM_GETNEIGH
example:
$ip neigh get 10.0.2.4 dev test-dummy0
10.0.2.4 dev test-dummy0 lladdr de:ad:be:ef:13:37 PERMANENT
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Tested-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Conflicts:
devlink/devlink.c
Fixed the conflict by updating the numbering for all new attributes
after the ones in master branch.
Signed-off-by: David Ahern <dsahern@gmail.com>
Since linux commit 22d6552f827e ("xfrm interface: fix management of
phydev"), phydev is not mandatory anymore.
Note that it also could be useful before the above commit to not force the
user to put a phydev (the kernel was checking it anyway).
For example, it was useful to not set it in case of x-netns, because the
phydev is not available in the current netns:
Before the patch:
$ ip netns add foo
$ ip link add xfrm1 type xfrm dev eth1 if_id 1
$ ip link set xfrm1 netns foo
$ ip -n foo link set xfrm1 type xfrm dev eth1 if_id 2
Cannot find device "eth1"
$ ip -n foo link set xfrm1 type xfrm if_id 2
must specify physical device
Fixes: 286446c1e8 ("ip: support for xfrm interfaces")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Matt Ellison <matt@arroyo.io>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Add a space after 'blackhole' is missing to properly separate the
protocol when it is given.
Fixes: 63df8e8543 ("Add support for nexthop objects")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>