ip route uses ll_name_to_index and ll_index_to_name to convert between
device names and indices. At the moment both use for the ioctl based glibc
functions if_nametoindex and if_indextoname and does not cache the result.
When using a batch file or dumping large number of routes this means the
same device lookups can be done repeatedly adding unnecessary overhead
(socket + ioctl + close for each device lookup).
Add a new function, ll_link_get, to send a netlink based RTM_GETLINK. If
successful, cache the result in idx_head and name_head so future lookups
can re-use the entry. Update ll_name_to_index and ll_index_to_name to use
ll_link_get and only fallback to the glibc functions if it fails.
With this change the time to install 720,022 routes with 2 ecmp nexthops
where the nexthop device is given is reduced from 31.4 seconds to 19.2
seconds. A dump of those routes drops from 13.3 to 2.8 seconds.
Signed-off-by: David Ahern <dsahern@gmail.com>
In the past, we tried to increase the buffer size up to 32 KB in order
to reduce number of syscalls per dump.
Commit 2d34851cd3 ("lib/libnetlink: re malloc buff if size is not enough")
brought the size back to 4KB because the kernel can not know the application
is ready to receive bigger requests.
See kernel commits 9063e21fb026 ("netlink: autosize skb lengthes") and
d35c99ff77ec ("netlink: do not enter direct reclaim from netlink_dump()")
for more details.
Fixes: 2d34851cd3 ("lib/libnetlink: re malloc buff if size is not enough")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hangbin Liu <liuhangbin@gmail.com>
Cc: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
in this way, a useless cast to unsigned int is avoided in bpf_print_ops()
and print_tunnel().
Tested with:
# ./tdc.py -c bpf
Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Cc: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
The issue is discovered for bpf selftest test_skb_cgroup.sh.
Currently we have,
$ ./test_skb_cgroup_id.sh
Wait for testing link-local IP to become available ... OK
Object has unknown BTF type: 13!
[PASS]
In the above the BTF type 13 refers to BTF kind
BTF_KIND_FUNC_PROTO.
This patch added support of BTF_KIND_FUNC_PROTO and
BTF_KIND_FUNC during type parsing.
With this patch, I got
$ ./test_skb_cgroup_id.sh
Wait for testing link-local IP to become available ... OK
[PASS]
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
While iproute2 correctly uses ifinfomsg struct as the ancillary header
when requesting an FDB dump on old kernels, it sets the message type to
RTM_GETLINK. This results in wrong reply being returned.
Fix this by using RTM_GETNEIGH instead.
Before:
$ bridge fdb show brport dummy0
Not RTM_NEWNEIGH: 00000158 00000010 00000002
After:
$ bridge fdb show brport dummy0
2a:0b:41:1c:92:d3 vlan 1 master br0 permanent
2a:0b:41:1c:92:d3 master br0 permanent
33:33:00:00:00:01 self permanent
01:00:5e:00:00:01 self permanent
Fixes: 05880354c2 ("bridge: fdb: Fix filtering with strict checking disabled")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: LiLiang <liali@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Without this fix, the VF info can't be showed using command
"ip link".
146: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 24:8a:07:ad:78:52 brd ff:ff:ff:ff:ff:ff
vf 0 MAC 02:25:d0:12:01:01, spoof checking off, link-state auto, trust off, query_rss off
vf 1 MAC 02:25:d0:12:01:02, spoof checking off, link-state auto, trust off, query_rss off
Fixes: d97b16b2c9 ("libnetlink: linkdump_req: Only AF_UNSPEC family expects an ext_filter_mask")
Signed-off-by: Chris Mi <chrism@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The bridge command 'vlan show' calls rtnl_linkdump_req_filter for
family AF_BRIDGE. Update rtnl_linkdump_req_filter to send the filter
for that family as well.
Fixes: d97b16b2c9 ("libnetlink: linkdump_req: Only AF_UNSPEC family expects an ext_filter_mask")
Reported-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Add RTNL_HANDLE_F_STRICT_CHK flag and set in rth flags to let know
commands know if the kernel supports strict checking.
Extracted from patch from Ido to fix filtering with strict checking
enabled.
Cc: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Add filter function to rtnl_neighdump_req and a buffer to the
request for the filter functions to append attributes.
Signed-off-by: David Ahern <dsahern@gmail.com>
iproute2 has been updated for the new strict policy in the kernel. Add a
helper to call setsockopt to enable the feature. Add a call to ip.c and
bridge.c
The setsockopt fails on older kernels and the error can be safely ignored
- any new fields or attributes are ignored by the older kernel.
Signed-off-by: David Ahern <dsahern@gmail.com>
Add a filter function to rtnl_addrdump_req to set device index in the
address dump request if the user is filtering addresses by device. In
addition, add a new ipaddr_link_get to do a single RTM_GETLINK request
instead of a device dump yet still store the data in the linfo list.
Signed-off-by: David Ahern <dsahern@gmail.com>
Add a filter option to rtnl_routedump_req and use it to set rtm_flags
removing the need for rtnl_rtcache_request for dump requests.
Signed-off-by: David Ahern <dsahern@gmail.com>
Only AF_UNSPEC handled by rtnl_dump_ifinfo expects an ext_filter_mask
on a dump request. Update the linkdump request functions to only set
and send ext_filter_mask for AF_UNSPEC.
Signed-off-by: David Ahern <dsahern@gmail.com>
Change nlmsg_len from sizeof(req) to use NLMSG_LENGTH on the header.
2 of the inner headers are not 4-byte aligned, so add a 0-length buf
after the header with the __aligned(NLMSG_ALIGNTO) to ensure the size
of the request is large enough. Use NLMSG_ALIGN in NLMSG_LENGTH to set
nlmsg_len.
Signed-off-by: David Ahern <dsahern@gmail.com>
Print any extack message that has been appended to a NLMSG_DONE message.
To avoid duplication, move the existing print code to a new helper.
Signed-off-by: David Ahern <dsahern@gmail.com>
DECnet belongs in the history museum of dead protocols along
with Appletalk and IPX.
Linux support has outlived its natural life and the time has
come to remove it from iproute2. Dead code is a source
of bugs and exploits.
If anyone actually has DECnet running on some old distribution
they can just keep to the old version of iproute2.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Function was not used unlesss HAVE_ELF causing:
bpf.c:105:13: warning: ‘bpf_map_offload_neutral’ defined but not used [-Wunused-function]
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
When no error is reported in the first iov, do not prematurely return,
but process further iovs. This fixes batch processing.
Fixes: c60389e4f9 ("libnetlink: fix leak and using unused memory on error")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
IPX has been depracted then removed from upstream kernels.
Drop support from ip route as well.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
In order to compare BPF map symbol type correctly in regard to the
latest LLVM, commit 7a04dd84a7 ("bpf: check map symbol type properly
with newer llvm compiler") compares map symbol type to both NOTYPE and
OBJECT. To do so, it first retrieves the type from "sym.st_info" and
stores it into a temporary variable.
However, the type is collected from the symbol "sym" before this latter
symbol is actually updated. gelf_getsym() is called after that and
updates "sym", and when comparison with OBJECT or NOTYPE happens it is
done on the type of the symbol collected in the previous passage of the
loop (or on an uninitialised symbol on the first passage). This may
eventually break map collection from the ELF file.
Fix this by assigning the type to the temporary variable only after the
call to gelf_getsym().
Fixes: 7a04dd84a7 ("bpf: check map symbol type properly with newer llvm compiler")
Reported-by: Ron Philip <ron.philip@netronome.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
rntl_talk_extack and parse_rtattr_index not used in current code.
rtnl_dump_filter_l is only used in this file.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This is simpler and cleaner, and avoids having to include the header
from every file where the functions are used. The prototypes of the
internal implementation are in this header, so utils.h will have to be
included anyway for those.
Fixes: 508f3c231e ("Use libbsd for strlcpy if available")
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
If libc does not provide strlcpy check for libbsd with pkg-config to
avoid relying on inline version.
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
With llvm 7.0 or earlier, the map symbol type is STT_NOTYPE.
-bash-4.4$ cat t.c
__attribute__((section("maps"))) int g;
-bash-4.4$ clang -target bpf -O2 -c t.c
-bash-4.4$ readelf -s t.o
Symbol table '.symtab' contains 2 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 NOTYPE GLOBAL DEFAULT 3 g
The following llvm commit enables BPF target to generate
proper symbol type and size.
commit bf6ec206615b9718869d48b4e5400d0c6e3638dd
Author: Yonghong Song <yhs@fb.com>
Date: Wed Sep 19 16:04:13 2018 +0000
[bpf] Symbol sizes and types in object file
Clang-compiled object files currently don't include the symbol sizes and
types. Some tools however need that information. For example, ctfconvert
uses that information to generate FreeBSD's CTF representation from ELF
files.
With this patch, symbol sizes and types are included in object files.
Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
Reported-by: Yutaro Hayakawa <yhayakawa3720@gmail.com>
Hence, for llvm 8.0.0 (currently trunk), symbol type will be not NOTYPE, but OBJECT.
-bash-4.4$ clang -target bpf -O2 -c t.c
-bash-4.4$ readelf -s t.o
Symbol table '.symtab' contains 3 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS t.c
2: 0000000000000000 4 OBJECT GLOBAL DEFAULT 3 g
This patch makes sure bpf library accepts both NOTYPE and OBJECT types
of global map symbols.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
No function, filter, or print function uses the sockaddr_nl arg,
so just drop it.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
iproute2 walks through the list of available tunnels using netlink
protocol in order to get device info instead of reading
them from proc filesystem. However the kernel reports device statistics
using IFLA_INET6_STATS/IFLA_INET6_ICMP6STATS attributes nested in
IFLA_PROTINFO one but iproutes expects these info in
IFLA_STATS64/IFLA_STATS attributes.
The issue can be triggered with the following reproducer:
$ip link add ip6d0 type ip6tnl mode ip6ip6 local 1111::1 remote 2222::1
$ip -6 -d -s tunnel show ip6d0
ip6d0: ipv6/ipv6 remote 2222::1 local 1111::1 encaplimit 4 hoplimit 64
tclass 0x00 flowlabel 0x00000 (flowinfo 0x00000000)
Dump terminated
Fix the issue introducing IFLA_INET6_STATS attribute parsing
Fixes: 3e95393871 ("iptunnel/ip6tunnel: Use netlink to walk through
tunnels list")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Stephen converted macsec's sci to use 0xhex, but 0xhex handles
unsigned int's, not 64 bits ints. Thus, the output of the "ip macsec
show" command is mangled, with half of the SCI replaced with 0s:
# ip macsec show
11: macsec0: [...]
cipher suite: GCM-AES-128, using ICV length 16
TXSC: 0000000001560001 on SA 0
# ip -d link show macsec0
11: macsec0@ens3: [...]
link/ether 52:54:00:12:01:56 brd ff:ff:ff:ff:ff:ff promiscuity 0
macsec sci 5254001201560001 [...]
where TXSC and sci should match.
Fixes: c0b904de62 ("macsec: support JSON")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
In __rtnl_talk_iov() main loop, err is a pointer to memory in dynamically
allocated 'buf' that is used to store netlink messages. If netlink message
is an error message, buf is deallocated before returning with error code.
However, on return err->error code is checked one more time to generate
return value, after memory which err points to has already been
freed. Save error code in temporary variable and use the variable to
generate return value.
Fixes: c60389e4f9 ("libnetlink: fix leak and using unused memory on error")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Add this helper to read signed 64-bit integers from a string.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
rtnl_wilddump_stats_req_filter only takes RTM_GETSTATS as the type argument
so rename to rtnl_statsdump_req_filter for consistency with other request
functions and hardcode the type argument.
Signed-off-by: David Ahern <dsahern@gmail.com>
Rename rtnl_wilddump_req_filter to rtnl_linkdump_req_filter,
rtnl_wilddump_request to rtnl_linkdump_req and
rtnl_wilddump_req_filter_fn to rtnl_linkdump_req_filter_fn.
In all cases drop the type argument which at this point is only
RTM_GETLINK and hardcode in the functions.
Signed-off-by: David Ahern <dsahern@gmail.com>
Add rtnl_nsiddump_req for namespace id dumps using the proper rtgenmsg
as the header. Convert existing RTM_GETNSID dumps to use it.
Signed-off-by: David Ahern <dsahern@gmail.com>
Add rtnl_neightbldump_req for neighbor table dumps using the proper ndtmsg
as the header. Convert existing RTM_GETNEIGHTBL dumps to use it.
Signed-off-by: David Ahern <dsahern@gmail.com>