Currently, calling 'ip xfrm monitor all' will
actually invoke the 'all-nsid' command because the
soft-match for 'all-nsid' occurs before the precise
match for 'all'. This patch rearranges the checks
so that the 'all' command, itself an alias for
invoking 'ip xfrm monitor' with no argument, can
be called consistent with the syntax for other ip
commands that accept an 'all'.
Signed-off-by: Nathan Harold <nharold@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
A recent commit changed rtnl_talk_* to return the response message in
allocated memory so callers need to free it. The change to name_is_vrf
did not save the device index which is pointing to a struct inside the
now allocated and freed memory resulting in garbage getting returned
in some cases.
Fix by using a stack variable to save the return value and only set
it to ifi->ifi_index after all checks are done and before the answer
buffer is freed.
Fixes: 86bf43c7c2 ("lib/libnetlink: update rtnl_talk to support malloc buff at run time")
Cc: Hangbin Liu <liuhangbin@gmail.com>
Cc: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The ip command would always lookup the network device index
even when not necessary. This slows down operations like creating
lots of VLAN's.
David reported the original issue, this is an alternative patch
that solves it in a slightly more general method.
Using iproute2 to create a bridge and add 4094 vlans to it can take from
2 to 3 *minutes*. The reason is the extraneous call to ll_name_to_index.
ll_name_to_index results in an ioctl(SIOCGIFINDEX) call which in turn
invokes dev_load. If the index does not exist, which it won't when
creating a new link, dev_load calls modprobe twice -- once for
netdev-NAME and again for NAME. This is unnecessary overhead for each
link create.
When ip link is invoked for a new device, there is no reason to
call ll_name_to_index for the new device. With this patch, creating
a bridge and adding 4094 vlans takes less than 3 *seconds*.
old:
# time ip -batch ip-vlan.batch
real 3m13.727s
user 0m0.076s
sys 0m1.959s
new:
# time ip -batch ip-vlan.batch
real 0m3.222s
user 0m0.044s
sys 0m1.777s
Reported-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
rta_expires is a signed int; print it as one.
Fixes: 663c3cb231 ("iproute: implement JSON and color output")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Currently NETNS_RUN_DIR is hardcoded and refers to /var/run/netns.
However, some systems (e.g. Android) doesn't have /var
which results in error attempts to create network namespaces on these
systems. This change makes NETNS_RUN_DIR configurable at build time
by allowing to pass environment variable to make command.
Also, this change makes /etc/netns directory configurable through
NETNS_ETC_DIR environment variable.
For example: ./configure && NETNS_RUN_DIR=/mnt/vendor/netns make
Tested: verified that iproute2 with configuration mentioned above
creates namespaces in /mnt/vendor/netns
Signed-off-by: Pavel Maltsev <pavelm@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Allowing 0% is sometimes useful for example in netem loss and drop
or perhaps dropping all traffic in a HTB bin.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199745
Reported-by: stuartmarsden@gmail.com
Fixes: 927e3cfb52 ("tc: B.W limits can now be specified in %.")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
As the kernel code says, limit is actually the amount of packets it can
hold queued at a time, as per:
static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
struct sk_buff **to_free)
{
...
if (unlikely(sch->q.qlen >= sch->limit))
return qdisc_drop_all(skb, sch, to_free);
So lets fix the description of the field in the man page.
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Users have reported a regression due to ip now dropping capabilities
unconditionally.
zerotier-one VPN and VirtualBox use ambient capabilities in their
binary and then fork out to ip to set routes and links, and this
does not work anymore.
As a workaround, do not drop caps if CAP_NET_ADMIN (the most common
capability used by ip) is set with the INHERITABLE flag.
Users that want ip vrf exec to work do not need to set INHERITABLE,
which will then only set when the calling program had privileges to
give itself the ambient capability.
Fixes: ba2fc55b99 ("Drop capabilities if not running ip exec vrf with libcap")
Signed-off-by: Luca Boccassi <bluca@debian.org>
Ss was using slabinfo to try and intuit TCP statistics.
The slabinfo changed several times since 2.4 and all these statistics
are broken by renames and slab merging. Plus slabinfo does not exist
at all if kernel is compiled with SLUB option.
Rather than trying to fix kernel, just trim away the no longer
valid statistics.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The iproute2 header files must be complete to allow builds on
other places where some of the headers are not present.
For example, iproute2 is built on Windows Services for Linux
as a test tool. With the partial addition of rdma it was broken.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This makes rdma/include/uapi/rdma headers align with those produced
by doing make headers_install from upstream (Linus) tree.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Continue parsing a multipath payload as long as another nexthop can fit
in the payload.
# ip route add 192.0.2.0/24 nexthop dev dummy0 nexthop dev dummy1
Before:
# ip route show 192.0.2.0/24
192.0.2.0/24
nexthop dev dummy0 weight 1
After:
# ip route show 192.0.2.0/24
192.0.2.0/24
nexthop dev dummy0 weight 1
nexthop dev dummy1 weight 1
Fixes: f48e14880a ("iproute: refactor multipath print")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Explicit link with pthread is not needed when linking dynamically. Even
static link with recent libdb does not pull in the code that uses
pthread. Finally, the configure check introduced in commit a25df4887d
(configure: Check for Berkeley DB for arpd compilation) does not add
-lpthread to its link command.
This change allows arpd build with toolchains that do not provide
threads support.
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Debian does not distribute libdb4.x-dev for quite some time now. Current
stable carries libdb5.3-dev. Update the wording accordingly.
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
print_uint() will silently promote its variable type to uint64_t, but there
is nothing that ensures that the format string specifier passed along with
it fits (and the function name suggest to pass "%u").
Fix this by changing print_uint() to use a native 'unsigned int' type, and
introduce a separate print_u64() function for printing 64-bit values. All
call sites that were actually printing 64-bit values using print_uint() are
converted to use print_u64() instead.
Since print_int() was already using native int types, just add a
print_s64() to match, but don't convert any call sites. For symmetry,
also add a print_luint() method (with no users).
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The dash printed by the ingress qdisc breaks JSON output, so only print it
in regular output mode.
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Commit 6c4b672738 ("iplink_geneve: Get rid of inet_get_addr()")
inadvertently changed the parameter to addattr_l() resulting in:
addattr_l ERROR: message exceeded bound of 4
when remote is specified.
Fixes: 6c4b672738 ("iplink_geneve: Get rid of inet_get_addr()")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
In theory, the path for BPF could exceed the 4K PATH_MAX.
In practice, not really possible. But shut up gcc.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Thomas reported a change in behavior with respect to autodectecting
address families. Specifically, 'ip ro add default via fe80::1'
syntax was failing to treat fe80::1 as an IPv6 address as it did in
prior releases. The root causes appears to be a change in family when
the default keyword is parsed.
'default', 'any' and 'all' are relevant outside of AF_INET. Leave the
family arg as is for these when setting addr.
Fixes: 93fa12418d ("utils: Always specify family and ->bytelen in get_prefix_1()")
Reported-by: Thomas Deutschmann <whissi@gentoo.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Serhey Popovych <serhe.popovych@gmail.com>
Attempt to add a multipath route where a nexthop definition refers to a
non-existent device causes 'ip' to crash and burn due to stack buffer
overflow:
# ip -6 route add fd00::1/64 nexthop dev fake1
Cannot find device "fake1"
Cannot find device "fake1"
Cannot find device "fake1"
...
Segmentation fault (core dumped)
Don't ignore errors from the helper routine that parses the nexthop
definition, and abort immediately if parsing fails.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
No 'g' to hairpin.
Fixes: 64108901b7 ("bridge: Add support for setting bridge port attributes")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The offset and peer_offset parameters are only printed to avoid
confusing external scripts that may parse "ip l2tp show session"
output. There's no reason to keep them in JSON.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Commit 9fd3f0b255 ("tc: enable json output for actions") added JSON
support for tc-actions at the expense of breaking other use cases that
reach tc_print_action(), as the latter don't expect the 'actions' array
to be a new object.
Consider the following taken duringrun of tc_chain.sh selftest,
and see the latter command output is broken:
$ ./tc/tc -j -p actions list action gact | grep -C 3 actions
[ {
"total acts": 1
},{
"actions": [ {
"order": 0,
$ ./tc/tc -p -j -s filter show dev enp3s0np2 ingress | grep -C 3 actions
},
"skip_hw": true,
"not_in_hw": true,{
"actions": [ {
"order": 1,
"kind": "gact",
"control_action": {
Relocate the open/close of the JSON object to declare the object only
for the case that needs it.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Tested-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Ignore options "peer-offset" and "offset" when creating sessions. Keep
them when dumping sessions in order to avoid breaking external scripts.
"peer-offset" has always been a noop in iproute2. "offset" is now
ignored in Linux 4.16 (and was broken before that).
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The check if netlink attributes supplied more than maximum supported
is to strict and may lead to backward compatibility issues with old
application with a newer kernel that supports new attribute.
CC: Steve Wise <swise@opengridcomputing.com>
Fixes: 74bd75c2b6 ("rdma: Add basic infrastructure for RDMA tool")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Part of upstream commit
4bbb3e0e8239 ("net: Fix vlan untag for bridge and vlan_dev with reorder_hdr off")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
(u64)-1 essentially means the size is unlimited. Print as 'unlimited'
as opposed to the current unsigned int range of 4294967295.
Signed-off-by: David Ahern <dsahern@gmail.com>
Steve Wise says:
====================
This series enhances the iproute2 rdma tool to include dumping of
connection manager id (cm_id), completion queue (cq), memory region (mr),
and protection domain (pd) rdma resources. It is the user-space part of
the kernel resource tracking series merged into rdma-next for 4.17 [1]
and [2].
Changes since v3:
- replaced rdma_cma.h inclusion with UAPI rdma_user_cm.h
- display only device names instead of device/port for cq, mr, and pd
since they are not associated with a specific port.
Changes since v2:
- pull in rdma-core:include/rdma/rdma_cma.h
- 80 column reformat
- add reviewed-by tags
Changes since v1/RFC:
- removed RFC tag
- initialize rd properly to avoid passing a garbage port number
- revert accidental change to qp_valid_filters
- removed cm_id dev/network/transport types
- cm_id ip addrs now passed up as __kernel_sockaddr_storage
- cm_id ip address ports printed as "address:port" strings
- only parse/display memory keys and iova if available
- filter on "users" for cqs and pds
- fixed memory leaks
- removed PD_FLAGS attribute
- filter on "mrlen" for mrs
- filter on "poll-ctx" for cqs
- don't require addrs or qp_type for parsing cm_ids
- only filter optional attrs if they are present
- remove PGSIZE MR attr to match kernel
[1] https://www.spinics.net/lists/linux-rdma/msg61720.html
[2] https://www.spinics.net/lists/linux-rdma/msg62979.htmlhttps://www.spinics.net/lists/linux-rdma/msg62980.html
====================
Signed-off-by: David Ahern <dsahern@gmail.com>
Sample output:
Without CAP_NET_ADMIN capability:
dev mlx4_0 users 0 pid 0 comm [ib_srpt]
dev mlx4_0 users 0 pid 0 comm [ib_srp]
dev mlx4_0 users 1 pid 0 comm [ib_core]
dev cxgb4_0 users 0 pid 0 comm [ib_srp]
With CAP_NET_ADMIN capability:
dev mlx4_0 local_dma_lkey 0x8000 users 0 pid 0 comm [ib_srpt]
dev mlx4_0 local_dma_lkey 0x8000 users 0 pid 0 comm [ib_srp]
dev mlx4_0 local_dma_lkey 0x8000 users 1 pid 0 comm [ib_core]
dev cxgb4_0 local_dma_lkey 0x0 users 0 pid 0 comm [ib_srp]
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Initialize the rd struct so port_idx is 0 unless set otherwise.
Otherwise, strict_port queries end up passing an uninitialized PORT
nlattr.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Pull in the latest rdma_netlink.h which has support for
the rdma nldev resource tracking objects being added
with this patch series.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Add initial support for oneline mode in tc; actions, filters and qdiscs
will be gradually updated in the follow-up patches.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Jon Maloy says:
====================
1: We introduce ability to set/get 128-bit node identities
2: We rename 'net id' to 'cluster id' in the command API,
of course in a compatible way.
3: We print out all 32-bit node addresses as an integer in hex format,
i.e., we remove the assumption about an internal structure.
====================
Signed-off-by: David Ahern <dsahern@gmail.com>